perf: eliminate redundant node clones during Trie::recalculate_root_hash()#6935

Open

cylewitruk-stacks wants to merge 5 commits into stacks-network:develop from cylewitruk-stacks:perf/marf-put-write-elision

Conversation

@cylewitruk-stacks (Contributor) commented Feb 26, 2026

Description

Background

Trie::recalculate_root_hash() is called on every MARF insert (regardless of trie hashing mode).

Currently, all ancestor nodes are pre-flushed with a zero-hash prior to being written with their final hash, which results in two write_nodetype() calls (and two full-node clones) per node.

The pre-flush exists to materialize a node's child pointers before get_trie_root_hash() performs Merkle skiplist ancestor lookups, but that lookup only happens for root nodes, and only when update_skiplist == true (i.e., non-Deferred hashing modes). Non-root ancestors never trigger ancestor lookups, and neither does the root in Deferred mode.

The Change

This PR gates the two-write pattern on is_root_node && update_skiplist, eliminating one full-node clone per ancestor per insert.
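The gating described above can be sketched as a small decision function. This is an illustrative model, not the actual stackslib API — `WritePlan` and `plan_write` are hypothetical names:

```rust
// Hypothetical model of the write decision in recalculate_root_hash().
// Only the root node in a skiplist-updating hash mode still needs the
// pre-flush; every other ancestor gets a single final write.
#[derive(Debug, PartialEq)]
enum WritePlan {
    /// Pre-flush child pointers with a zero hash (needed for the Merkle
    /// skiplist ancestor lookup), then finish with a hash-only update.
    PreflushThenHashOnly,
    /// Write the node once with its final hash: one clone instead of two.
    SingleFinalWrite,
}

fn plan_write(is_root_node: bool, update_skiplist: bool) -> WritePlan {
    if is_root_node && update_skiplist {
        WritePlan::PreflushThenHashOnly
    } else {
        WritePlan::SingleFinalWrite
    }
}
```

Non-root ancestors and the Deferred-mode root both fall into the single-write branch, which is where the per-node clone is saved.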

Applicable issues

  • Not tied to a specific issue; part of the 100x chain optimization effort.

Additional info (benefits, drawbacks, caveats)

Core Optimization:

  • New TrieStorageConnection::write_node_hash() function which updates only a node's stored hash without cloning and overwriting the full node body.
  • Non-root ancestor nodes: write final hash directly via write_nodetype() — one clone instead of two (both hash modes).
  • Root nodes:
    • In Deferred mode: write final hash directly via write_nodetype() — one clone instead of two (no skiplist lookup, no pre-flush needed).
    • In Immediate (and All) modes: child pointers are still pre-flushed via write_nodetype() before skiplist hash derivation, but the final write is now a hash-only update via write_node_hash() — one clone instead of two.
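To illustrate why a hash-only write avoids a clone, here is a toy model of uncommitted trie state. The real TrieStorageConnection is far more involved; this sketch (with a made-up `UncommittedState` type) only shows the shape of the optimization:

```rust
use std::collections::HashMap;

// Toy model of uncommitted trie state: ptr -> (node body bytes, node hash).
struct UncommittedState {
    nodes: HashMap<u32, (Vec<u8>, [u8; 32])>,
}

impl UncommittedState {
    // Full write: clones the node body into storage along with the hash.
    fn write_nodetype(&mut self, ptr: u32, body: &[u8], hash: [u8; 32]) {
        self.nodes.insert(ptr, (body.to_vec(), hash));
    }

    // Hash-only update: mutates the stored hash in place; the body bytes
    // are untouched and no clone of the node body is made.
    fn write_node_hash(&mut self, ptr: u32, hash: [u8; 32]) -> Result<(), String> {
        match self.nodes.get_mut(&ptr) {
            Some(entry) => {
                entry.1 = hash;
                Ok(())
            }
            None => Err(format!("no node at ptr {}", ptr)),
        }
    }
}
```

In this model the Immediate-mode root still pays one `write_nodetype()` for the pre-flush, but its final write touches only the 32-byte hash.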

Minor Hygiene:

  • Minor trace! and comment cleanup.

Benchmark Diff:

Using marf-bench from PR #6932: write-path allocation counts are down 3–8%, allocated bytes down 3–18% (with synthetic benchmark data), and timings show minor improvements.

[marf-bench] Comparison summary
values: base:0317850e7f042de98e7bc6a1f26f6183e7d20f98 / target:36c78bd17a8cde53fd5f3700ad117d21dbc6ac5c / %delta
benchmark  name                                       total(ms) b/t       Δ  alloc_count b/t      Δ  alloc_bytes b/t       Δ
----------------------------------------------------------------------------------------------------------------------------
write      node256/depth=1/begin_block                  0.253/0.201  -20.6%          280/272  -2.9%    955098/929754   -2.7%
write      node256/depth=1/commit_flush                 8.959/9.705   +8.3%            12/12  +0.0%      68704/68704   +0.0%
write      node256/depth=1/fill_node16_to_capacity      0.134/0.084  -37.3%          296/272  -8.1%    415320/339288  -18.3%
write      node256/depth=1/fill_node48_to_capacity      0.362/0.229  -36.7%         1034/968  -6.4%  1346504/1141896  -15.2%
write      node256/depth=1/fill_node4_to_capacity       0.035/0.023  -34.3%            80/74  -7.5%     104028/85020  -18.3%
write      node256/depth=1/insert_first_leaf            0.023/0.015  -34.8%            56/54  -3.6%      75396/69060   -8.4%
write      node256/depth=1/promote_node16_to_node48     0.027/0.018  -33.3%            72/66  -8.3%      84288/69760  -17.2%
write      node256/depth=1/promote_node48_to_node256    0.027/0.020  -25.9%            78/72  -7.7%    125696/106688  -15.1%
write      node256/depth=1/promote_node4_to_node16      0.024/0.016  -33.3%            56/52  -7.1%      69440/56768  -18.2%
write      node256/depth=1/seal                         0.119/0.082  -31.1%          168/168  +0.0%      65700/65700   +0.0%
write      node256/depth=1/split_leaf_to_node4          0.028/0.019  -32.1%            70/66  -5.7%      69868/57196  -18.1%
write      noop/depth=1/begin_block                     0.334/0.276  -17.4%          274/266  -2.9%    945898/920554   -2.7%
write      noop/depth=1/commit_flush                    9.410/8.641   -8.2%            12/12  +0.0%      68704/68704   +0.0%
write      noop/depth=1/fill_node16_to_capacity         0.146/0.126  -13.7%          296/272  -8.1%    415320/339288  -18.3%
write      noop/depth=1/fill_node48_to_capacity         0.396/0.336  -15.2%         1034/968  -6.4%  1346504/1141896  -15.2%
write      noop/depth=1/fill_node4_to_capacity          0.039/0.033  -15.4%            80/74  -7.5%     104028/85020  -18.3%
write      noop/depth=1/insert_first_leaf               0.029/0.024  -17.2%            63/60  -4.8%      72632/63128  -13.1%
write      noop/depth=1/promote_node16_to_node48        0.030/0.025  -16.7%            72/66  -8.3%      84288/69760  -17.2%
write      noop/depth=1/promote_node48_to_node256       0.034/0.028  -17.6%            78/72  -7.7%    125696/106688  -15.1%
write      noop/depth=1/promote_node4_to_node16         0.029/0.024  -17.2%            63/59  -6.3%      69661/56989  -18.2%
write      noop/depth=1/seal                            0.133/0.114  -14.3%          168/168  +0.0%      65697/65697   +0.0%
write      noop/depth=1/split_leaf_to_node4             0.030/0.025  -16.7%            63/59  -6.3%      69654/56982  -18.2%
----------------------------------------------------------------------------------------------------------------------------

[marf-bench] Repeated comparison stats
baseline: base:0317850e7f042de98e7bc6a1f26f6183e7d20f98
comparison: target:36c78bd17a8cde53fd5f3700ad117d21dbc6ac5c
values: median/min/max %delta across 10 repeats
benchmark  name                                        total Δ med   total Δ min   total Δ max   count Δ med   bytes Δ med  repeats
-----------------------------------------------------------------------------------------------------------------------------------
write      node256/depth=1/begin_block                       -4.1%        -20.6%         +9.1%         -2.9%         -2.7%       10
write      node256/depth=1/commit_flush                      +1.8%        -21.7%        +12.5%         +0.0%         +0.0%       10
write      node256/depth=1/fill_node16_to_capacity           -3.0%        -37.3%         +6.3%         -8.1%        -18.3%       10
write      node256/depth=1/fill_node48_to_capacity           -2.3%        -36.7%         +1.8%         -6.4%        -15.2%       10
write      node256/depth=1/fill_node4_to_capacity            +0.0%        -34.3%         +6.3%         -7.5%        -18.3%       10
write      node256/depth=1/insert_first_leaf                 +0.0%        -34.8%         +9.1%         -3.6%         -8.4%       10
write      node256/depth=1/promote_node16_to_node48          +0.0%        -33.3%        +16.7%         -8.3%        -17.2%       10
write      node256/depth=1/promote_node48_to_node256         -3.6%        -25.9%        +21.4%         -7.7%        -15.1%       10
write      node256/depth=1/promote_node4_to_node16           +0.0%        -33.3%         +0.0%         -7.1%        -18.2%       10
write      node256/depth=1/seal                              +0.0%        -31.1%         +5.4%         +0.0%         +0.0%       10
write      node256/depth=1/split_leaf_to_node4               +0.0%        -32.1%         +0.0%         -5.7%        -18.1%       10
write      noop/depth=1/begin_block                         -10.5%        -21.9%        +15.7%         -2.9%         -2.7%       10
write      noop/depth=1/commit_flush                         +1.5%        -10.8%        +11.6%         +0.0%         +0.0%       10
write      noop/depth=1/fill_node16_to_capacity              -2.5%        -13.7%         +1.7%         -8.1%        -18.3%       10
write      noop/depth=1/fill_node48_to_capacity              -2.1%        -15.3%         +5.3%         -6.4%        -15.2%       10
write      noop/depth=1/fill_node4_to_capacity               -2.9%        -30.4%        +12.5%         -7.5%        -18.3%       10
write      noop/depth=1/insert_first_leaf                    -7.1%        -36.8%         +8.3%         -4.8%        -13.1%       10
write      noop/depth=1/promote_node16_to_node48             -3.6%        -16.7%        +21.4%         -8.3%        -17.2%       10
write      noop/depth=1/promote_node48_to_node256           -10.0%        -33.3%         +7.1%         -7.7%        -15.1%       10
write      noop/depth=1/promote_node4_to_node16              +0.0%        -17.2%         +8.3%         -6.3%        -18.2%       10
write      noop/depth=1/seal                                 -3.3%        -14.3%         +3.5%         +0.0%         +0.0%       10
write      noop/depth=1/split_leaf_to_node4                  +0.0%        -82.1%         +0.0%         -6.3%        -18.2%       10
-----------------------------------------------------------------------------------------------------------------------------------

[marf-bench] Repeat confidence summary
baseline: base:0317850e7f042de98e7bc6a1f26f6183e7d20f98
comparison: target:36c78bd17a8cde53fd5f3700ad117d21dbc6ac5c
values: total_ms stability across 10 repeats
rows: total=22 stable=9 high-jitter=13  (high-jitter means min<0<max and spread>=30.0%)
top high-jitter rows (by spread):
  write / node256/depth=1/promote_node16_to_node48  median=+0.0%  min=-33.3%  max=+16.7%  spread=50.0%
  write / node256/depth=1/promote_node48_to_node256  median=-3.6%  min=-25.9%  max=+21.4%  spread=47.4%
  write / noop/depth=1/insert_first_leaf  median=-7.1%  min=-36.8%  max=+8.3%  spread=45.2%
  write / node256/depth=1/insert_first_leaf  median=+0.0%  min=-34.8%  max=+9.1%  spread=43.9%
  write / node256/depth=1/fill_node16_to_capacity  median=-3.0%  min=-37.3%  max=+6.3%  spread=43.7%
  write / noop/depth=1/fill_node4_to_capacity  median=-2.9%  min=-30.4%  max=+12.5%  spread=42.9%
  write / node256/depth=1/fill_node4_to_capacity  median=+0.0%  min=-34.3%  max=+6.3%  spread=40.5%
  write / noop/depth=1/promote_node48_to_node256  median=-10.0%  min=-33.3%  max=+7.1%  spread=40.5%
  write / node256/depth=1/fill_node48_to_capacity  median=-2.3%  min=-36.7%  max=+1.8%  spread=38.5%
  write / noop/depth=1/promote_node16_to_node48  median=-3.6%  min=-16.7%  max=+21.4%  spread=38.1%
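The high-jitter rule quoted in the summary above (min < 0 < max with spread ≥ 30.0%) can be expressed as a small predicate. This is a sketch, not marf-bench source code:

```rust
// A row is high-jitter when the timing delta flips sign across repeats
// (min < 0 < max) and the min-to-max spread is at least 30 percentage
// points, matching the definition in the confidence summary.
fn is_high_jitter(min_pct: f64, max_pct: f64) -> bool {
    min_pct < 0.0 && max_pct > 0.0 && (max_pct - min_pct) >= 30.0
}
```

For example, the node256 promote_node16_to_node48 row (min −33.3%, max +16.7%) qualifies, while a row whose max never goes positive does not, however wide its spread.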

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated

Copilot AI left a comment
Pull request overview

This PR optimizes MARF trie insert performance by reducing redundant node cloning/writes during Trie::recalculate_root_hash(), especially when computing the root hash with skiplist updates enabled.

Changes:

  • Gate the “pre-flush then final write” behavior to only the root node when update_skiplist == true.
  • Add TrieStorageConnection::write_node_hash() to update only a node’s stored hash in the uncommitted trie state (avoiding a full node clone/write).
  • Minor logging/message and comment hygiene updates in trie operations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
stackslib/src/chainstate/stacks/index/trie.rs Avoids redundant pre-flush writes for non-root ancestors; uses hash-only update for the root’s final write when skiplist updating.
stackslib/src/chainstate/stacks/index/storage.rs Introduces TrieStorageConnection::write_node_hash() to support hash-only updates in uncommitted state.


Comment thread stackslib/src/chainstate/stacks/index/trie.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/storage.rs
codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 72.41379% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.02%. Comparing base (b61a766) to head (f6f9f82).

Files with missing lines Patch % Lines
stackslib/src/chainstate/stacks/index/storage.rs 63.63% 4 Missing ⚠️
stackslib/src/chainstate/stacks/index/trie.rs 77.77% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #6935       +/-   ##
============================================
+ Coverage    59.40%   70.02%   +10.61%     
============================================
  Files          412      412               
  Lines       218187   218190        +3     
  Branches       338      338               
============================================
+ Hits        129610   152778    +23168     
+ Misses       88577    65412    -23165     
Files with missing lines Coverage Δ
stackslib/src/chainstate/stacks/index/storage.rs 80.79% <63.63%> (+0.36%) ⬆️
stackslib/src/chainstate/stacks/index/trie.rs 87.09% <77.77%> (+4.01%) ⬆️

... and 317 files with indirect coverage changes



@federico-stacks (Contributor) left a comment

The change makes sense to me. Just added a nit.

Comment threads (7) stackslib/src/chainstate/stacks/index/trie.rs Outdated
@jcnelson (Member) left a comment
Couple nits but LGTM. Will approve once resolved. Thanks @cylewitruk-stacks!

Comment thread stackslib/src/chainstate/stacks/index/trie.rs
);
debug!("Next root hash is {h} (update_skiplist={update_skiplist})");

storage.write_nodetype(child_ptr.ptr(), &node, h)?;
Contributor
sanity-check: could write_node_hash be used here?

Contributor Author
Yeah, I think we can, since we're not modifying the node in this branch (like we do below) 👍 Will try it and make sure things are green.

@jcnelson (Member) left a comment
LGTM!

@benjamin-stacks (Contributor) left a comment
I'm an absolute noob in our MARF implementation code, but this change makes a lot of sense to me. Maybe take me as half an approval, given that there are probably a lot of edge cases I may not be aware of 😅

One thing I'm wondering: How good is the code coverage of the surface area of this? I assume that any regression would (at the latest) be caught during a full replay test, but I don't have a good intuition on the validity of that assumption.

@dhaney-stacks (Contributor)

@cylewitruk-stacks please merge when able.
