Refined memory/cpu cost models for ValueData and UnValueData
#7500
base: master
Conversation
Force-pushed 1bd6254 to 43bbe0a
Add new memory-analysis executable with modules for analyzing memory behavior of Plutus builtins. Includes plotting utilities, regression analysis, and experiment framework for deriving accurate memory models from empirical measurements.
Introduce DataNodeCount newtype that measures Data memory via lazy node traversal rather than serialization size. This provides more accurate memory accounting for UnValueData builtin which operates on the Data structure directly without serializing. The wrapper separates concerns: node counting logic here, cost coefficients in JSON models.
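As a rough sketch of the idea (simplified stand-ins for the real types; the actual Data and CostRose live in PlutusCore.Data and PlutusCore.Evaluation.Machine.ExMemoryUsage):

```haskell
-- Sketch only: illustrates the shape of a lazy "one cost unit per node"
-- traversal, not the real plutus-core implementation.
data Data
  = Constr Integer [Data]
  | Map [(Data, Data)]
  | List [Data]
  | I Integer
  | B [Integer]

data CostRose = CostRose Integer [CostRose]

-- Charge one unit per node; child roses are only built on demand, so the
-- structure is not forced any further than the consumer requires.
nodeRose :: Data -> CostRose
nodeRose d = CostRose 1 (map nodeRose (children d))
  where
    children (Constr _ ds) = ds
    children (Map kvs)     = concat [[k, v] | (k, v) <- kvs]
    children (List ds)     = ds
    children _             = []
```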
Add KnownTypeAst and builtin marshalling instances for DataNodeCount. This enables using the new memory model in builtin definitions while maintaining type safety through the universe system. Also includes minor refactoring (void instead of (() <$)) for clarity.
Force-pushed 43bbe0a to 073fe97
Apply ValueTotalSize to ValueData and DataNodeCount to UnValueData, replacing plain Value/Data types. This enables accurate memory accounting: ValueData uses total serialized size, UnValueData uses node count for measuring input Data complexity.
Update ValueData and UnValueData benchmarks to use createOneTermBuiltinBenchWithWrapper with appropriate memory measurement wrappers (ValueTotalSize and DataNodeCount). This ensures benchmarks measure the same memory behavior as production builtins.
Replace constant memory costs with linear models derived from empirical measurements:
- ValueData memory = 38 × size + 6 (was constant 1)
- UnValueData memory = 8 × nodes + 0 (was constant 1)
- UnValueData CPU = 290658 × nodes + 1000 (was 43200 × arg + 1000)

The linear models better reflect actual memory behavior: ValueData scales with serialized size, UnValueData scales with node count. Benchmark data regenerated with the new memory measurement approach.
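For concreteness, these formulas can be read as simple linear functions (illustrative Haskell; coefficients copied from the models above):

```haskell
-- Illustrative only: the linear cost models quoted above as functions.
valueDataMem :: Integer -> Integer
valueDataMem size = 38 * size + 6             -- memory for valueData

unValueDataMem :: Integer -> Integer
unValueDataMem nodes = 8 * nodes + 0          -- memory for unValueData

unValueDataCpu :: Integer -> Integer
unValueDataCpu nodes = 290658 * nodes + 1000  -- CPU for unValueData

-- e.g. a 100-node Data input: unValueDataMem 100 == 800
```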
Force-pushed 073fe97 to 026e835
@@ -0,0 +1,121 @@
module PlutusBenchmark.RegressionInteger (integerBestFit) where
I don't think we should implement our own linear regression algorithm when R is the domain standard for this.
Neither do I!
  "cpu": {
-   "arguments": 194713,
+   "arguments": 164434,
    "type": "constant_cost"
Obviously, ValueData cannot be constant time!
See the Slack discussion about this. We'll need to do something like using nf instead of whnf for the CPU costing for valueData, the problem being that the implementation of valueData begins with Map . ... and whnf won't cause the stuff in ... (which does all the hard work) to be evaluated.
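A toy illustration of the whnf/nf point (hypothetical names, not the actual benchmark code): forcing to weak head normal form stops at the outermost constructor, so the work hidden inside it is never timed.

```haskell
-- Toy sketch only. 'Wrap' stands in for the Map constructor that valueData
-- applies first; the list inside stands in for the expensive conversion.
data Result = Wrap [Integer]

mkResult :: Integer -> Result
mkResult n = Wrap (map (* 2) [1 .. n])   -- the "hard work" is inside Wrap

-- seq (mkResult 1000000) ()   -- WHNF: stops at Wrap, list never evaluated
-- A benchmark using whnf therefore times almost nothing; using nf (via
-- NFData/deepseq) would force the whole list and measure the real cost.
```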
kwxm
left a comment
Can you open a new PR that just updates the costs and doesn't include the memory cost inference stuff in PlutusBenchmark? I think that determining the memory costs empirically instead of just waving our hands and coming up with a rough estimate is a promising idea, but it's quite a big change and it's kind of orthogonal to what we're trying to achieve at the moment. As long as we have a measure of the memory cost it doesn't matter too much where it came from (although I do have some small reservations: see the comments). Keep the memory usage inference code somewhere though! We can come back and think about this more carefully after the pressure to get everything ready for the HF has relaxed.
However, the main thing that needs to be changed is the CPU costing for valueData: the current constant cost is definitely wrong, but that should be fixable.
| ValueContains'cpu'arguments'intercept
| ValueContains'cpu'arguments'slope
| ValueContains'memory'arguments
| ValueData'cpu'arguments
This will need to have an intercept and slope since the CPU cost of valueData should be linear (and similarly in the other ParamName files).
{-# INLINE memoryUsage #-}

-- Should be 72
-- Helper function to count nodes in a Data object, returning a lazy CostRose
I don't think you need all this. It should be enough to count the nodes (so the same thing, but replacing s with 1), and then runOneArgumentModel will do the scaling for you.
No, hold on. The only time this is called is with s=1 anyway, so why the extra generality?
-- Should be 72
-- Helper function to count nodes in a Data object, returning a lazy CostRose
-- with the slope applied per node. The intercept is applied once at the root
Is this comment correct? The ExMemoryUsage instance doesn't mention the intercept.
The actual memory formula (slope × nodeCount + intercept) is applied in the JSON cost model. -}
newtype DataNodeCount = DataNodeCount Data

instance ExMemoryUsage DataNodeCount where
I was a bit worried about what would happen here when we extend Data to have a Value field and valueData and unValueData use that. However I think that'll be OK. For instance we won't need to care about the number of nodes in the input to unValueData any more, but we can deal with that by updating the CPU and memory costing functions to have zero slope, effectively making them constant (although now I'm wondering if we'll still have to traverse the entire CostRose).
-valueDataBenchmark gen = createOneTermBuiltinBench ValueData [] (generateTestValues gen)
+valueDataBenchmark gen =
+  createOneTermBuiltinBenchWithWrapper
+    ValueTotalSize
Strictly we don't need this because ValueTotalSize gives the same result as the default memory usage instance, but I think we should keep the wrapper because it makes the size measure explicit. So this change is good!
 import PlutusCore.Evaluation.Machine.ExMemoryUsage
-  ( ValueLogOuterSizeAddLogMaxInnerSize (..)
+  ( DataNodeCount (..)
+  , ValueLogOuterSizeAddLogMaxInnerSize (..)
Note that @ana-pantilie's PR changes this to ValueMaxDepth, which is less cumbersome and a lot clearer.
memU :: ExMemoryUsage a => a -> Integer
memU x = fromSatInt (sumCostStream (flattenCostRose (memoryUsage x)))

-- | Measure size by walking the object graph in 64-bit words; resistant to heap churn
I'm a little dubious about this approach because (if I understand correctly) it measures the total size occupied by the result, which, because the result may share some data with the input, may be considerably larger than the amount of new heap space that has to be allocated. For example, in the case of valueData and unValueData there's no need to make new copies of the keys and quantities: if you call valueData v then (I think) the resulting data object will contain pointers to the keys and quantities in the input value v, not new copies of them (and v itself will I think only contain pointers to the actual keys and quantities). Keys take up 4 words in the worst case and quantities take up 2 words, and we probably need to subtract those numbers from the memory usage reported by measureGraphWords to get the true amount of new heap space allocated.
Maybe this isn't really a problem: the numbers returned by the analysis should definitely be upper bounds for the "true" memory usage, so the bounds will be safe. Also, the existing memory costs are generally pretty crude anyway, so a little bit of extra inaccuracy may be tolerable.
I think this issue would be more of a problem for things like tailList, where there really is a lot of sharing: if you have a list with 1000 elements then tailList will just return a pointer to the tail, not a new list with 999 elements. Similarly, bytestrings are implemented as C arrays in the heap together with a pointer to the start of the bytestring and an integer containing the length: sliceByteString doesn't copy any of the bytes in the bytestring, it just returns a small object containing the same array but with the pointer and the length updated. I have a vague memory that we didn't understand this at first and so overestimated the memory usage of sliceByteString. You'll see that the memory usage function is a linear function with slope zero, and I think that's because it originally had a nonzero slope but then we changed it when we realised what was going on.
These examples suggest that it'd be difficult to automatically infer the memory allocated by a builtin because you have to look at the implementation (which may change!) to see how much sharing there is in the inputs.
However, it's quite possible that I've misunderstood what's going on here, so let me know if that's the case.
I'll add that the PR description is very lengthy, but it doesn't explain exactly how the memory inference works out the total memory allocation. A crucial point is that it uses
Context
This PR refines the cost models for the ValueData and UnValueData builtins.

Problem Statement

The current cost models for ValueData and UnValueData use constant memory costs of 1 unit, regardless of input size. This is inaccurate because:
- ValueData converts Value to Data, so memory should scale with serialized size
- UnValueData converts Data to Value, so memory should scale with the node count of the Data structure

Inaccurate memory models can lead to budget misestimation in smart contracts.
Solution Approach
This PR implements a comprehensive solution in 6 logical commits:
1. Add a memory-analysis executable with plotting, regression, and experiment modules
2. Introduce the DataNodeCount newtype for node-based memory tracking
3. Integrate DataNodeCount into the DefaultUni type system
4. Apply memory measurement wrappers (ValueTotalSize, DataNodeCount) to the builtins
5. Update the benchmarks to use the same wrappers
6. Replace the constant cost models with linear models derived from measurements

Memory Measurement Strategy

- ValueTotalSize wrapper (already exists): measures total serialized size
- DataNodeCount wrapper: performs a lazy node traversal of the Data structure

This approach separates concerns:
- node/byte counting logic lives in ExMemoryUsage.hs (slope applied per node/byte)
- cost coefficients live in the JSON cost models

Design Decisions
Why node count for UnValueData?
- UnValueData converts Data → Value by traversing the Data tree structure
- Lazy node counting (via CostRose) ensures accurate accounting

Why separate wrappers?
- Each wrapper makes the size measure used by its builtin explicit
Changes
Memory Analysis Tooling
New executable:
plutus-benchmark:memory-analysis
- PlutusBenchmark.MemoryAnalysis: main analysis framework
- PlutusBenchmark.MemoryAnalysis.Experiments: memory behavior experiments
- PlutusBenchmark.MemoryAnalysis.Generators: test data generators
- PlutusBenchmark.Plotting: chart generation utilities
- PlutusBenchmark.RegressionInteger: regression with asymmetric loss

This tooling enabled empirical measurement of memory behavior to derive the coefficients used in the cost models.
Core Memory Tracking
plutus-core/src/PlutusCore/Evaluation/Machine/ExMemoryUsage.hs
- New DataNodeCount newtype wrapping Data
- ExMemoryUsage instance using countNodesRoseScaled
- New extensions: AllowAmbiguousTypes, BlockArguments, InstanceSigs, KindSignatures, ScopedTypeVariables

plutus-core/src/PlutusCore/Default/Universe.hs
- KnownTypeAst instance for DataNodeCount
- MakeKnownIn and ReadKnownIn instances for marshalling
- void instead of (() <$) for clarity

Builtin Updates
plutus-core/src/PlutusCore/Default/Builtins.hs
- ValueData: changed signature from Value -> Data to ValueTotalSize -> Data
- UnValueData: changed signature from Data -> BuiltinResult Value to DataNodeCount -> BuiltinResult Value

Benchmark Alignment
plutus-core/cost-model/budgeting-bench/Benchmarks/Values.hs
- Updated valueDataBenchmark to use createOneTermBuiltinBenchWithWrapper with ValueTotalSize
- Updated unValueDataBenchmark to use createOneTermBuiltinBenchWithWrapper with DataNodeCount

Cost Model Data
plutus-core/cost-model/data/builtinCostModel{A,B,C}.json
Updated models (all three variants updated identically):
- ValueData memory = 38 × size + 6 (was constant 1)
- UnValueData memory = 8 × nodes + 0 (was constant 1)
- UnValueData CPU = 290658 × nodes + 1000 (was 43200 × arg + 1000)
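For reference, a hedged sketch of what the updated unValueData entry in these JSON files might look like, inferred from the constant_cost entries visible in the diffs above (the exact field layout may differ):

```json
"unValueData": {
  "cpu":    { "arguments": { "intercept": 1000, "slope": 290658 }, "type": "linear_in_x" },
  "memory": { "arguments": { "intercept": 0, "slope": 8 }, "type": "linear_in_x" }
}
```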
plutus-core/cost-model/data/benching-conway.csv
Regenerated benchmark data (404 lines changed) with new memory measurement approach.
Impact
Budget Changes
Scripts using ValueData or UnValueData will see different memory budget consumption.

Conformance Tests

Expect budget differences in conformance tests that use these builtins. The new costs are more accurate than the previous constant models.
4. Memory Analysis
The memory-analysis executable can reproduce the experiments. This generates plots and regression analysis in plutus-benchmark/memory-analysis/data/.

5. Conformance Tests
cabal test plutus-conformance

Expect budget differences but correct behavior.
Notes for Reviewers
Commit Structure
The PR is organized as 6 atomic commits following dependency order.
Each commit is buildable and represents a logical unit of change.
The updated CPU models for ValueData and UnValueData can be previewed here.