perf: eliminate redundant node clones during Trie::recalculate_root_hash()#6935

Open

cylewitruk-stacks wants to merge 5 commits into stacks-network:develop from cylewitruk-stacks:perf/marf-put-write-elision

Conversation

@cylewitruk-stacks (Contributor) commented Feb 26, 2026

Description

Background

Trie::recalculate_root_hash() is called on every MARF insert (regardless of trie hashing mode).

Currently, all ancestor nodes are pre-flushed with a zero-hash prior to being written with their final hash, which results in two write_nodetype() calls (and two full-node clones) per node.

The pre-flush exists to materialize a node's child pointers before get_trie_root_hash() performs Merkle skiplist ancestor lookups, but that lookup only happens for root nodes, and only when update_skiplist == true (i.e., non-Deferred hashing modes). Non-root ancestors never trigger ancestor lookups, and neither does the root in Deferred mode.

The Change

This PR gates the two-write pattern on is_root_node && update_skiplist, eliminating one full-node clone per ancestor per insert.
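The gating described above can be sketched as a small decision function. This is an illustrative model, not the actual stackslib API — `WritePlan` and `plan_write` are hypothetical names:

```rust
// Hypothetical model of the write decision in recalculate_root_hash().
// Only the root node in a skiplist-updating hash mode still needs the
// pre-flush; every other ancestor gets a single final write.
#[derive(Debug, PartialEq)]
enum WritePlan {
    /// Pre-flush child pointers with a zero hash (needed for the Merkle
    /// skiplist ancestor lookup), then finish with a hash-only update.
    PreflushThenHashOnly,
    /// Write the node once with its final hash: one clone instead of two.
    SingleFinalWrite,
}

fn plan_write(is_root_node: bool, update_skiplist: bool) -> WritePlan {
    if is_root_node && update_skiplist {
        WritePlan::PreflushThenHashOnly
    } else {
        WritePlan::SingleFinalWrite
    }
}
```

Non-root ancestors and the Deferred-mode root both fall into the single-write branch, which is where the per-node clone is saved.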

Applicable issues

  • Not tied to a specific issue; part of the 100x chain optimization effort.

Additional info (benefits, drawbacks, caveats)

Core Optimization:

  • New TrieStorageConnection::write_node_hash() function which updates only a node's stored hash without cloning and overwriting the full node body.
  • Non-root ancestor nodes: write final hash directly via write_nodetype() — one clone instead of two (both hash modes).
  • Root nodes:
    • In Deferred mode: write final hash directly via write_nodetype() — one clone instead of two (no skiplist lookup, no pre-flush needed).
    • In Immediate (and All) modes: child pointers are still pre-flushed via write_nodetype() before skiplist hash derivation, but the final write is now a hash-only update via write_node_hash() — one clone instead of two.
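To illustrate why a hash-only write avoids a clone, here is a toy model of uncommitted trie state. The real TrieStorageConnection is far more involved; this sketch (with a made-up `UncommittedState` type) only shows the shape of the optimization:

```rust
use std::collections::HashMap;

// Toy model of uncommitted trie state: ptr -> (node body bytes, node hash).
struct UncommittedState {
    nodes: HashMap<u32, (Vec<u8>, [u8; 32])>,
}

impl UncommittedState {
    // Full write: clones the node body into storage along with the hash.
    fn write_nodetype(&mut self, ptr: u32, body: &[u8], hash: [u8; 32]) {
        self.nodes.insert(ptr, (body.to_vec(), hash));
    }

    // Hash-only update: mutates the stored hash in place; the body bytes
    // are untouched and no clone of the node body is made.
    fn write_node_hash(&mut self, ptr: u32, hash: [u8; 32]) -> Result<(), String> {
        match self.nodes.get_mut(&ptr) {
            Some(entry) => {
                entry.1 = hash;
                Ok(())
            }
            None => Err(format!("no node at ptr {}", ptr)),
        }
    }
}
```

In this model the Immediate-mode root still pays one `write_nodetype()` for the pre-flush, but its final write touches only the 32-byte hash.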

Minor Hygiene:

  • Minor trace! and comment cleanup.

Benchmark Diff:

Using marf-bench from PR #6932: write-path allocation counts are down 3–8%, allocated bytes down 3–18% (with synthetic benchmark data), and timings show minor improvements.

[marf-bench] Comparison summary
values: base:0317850e7f042de98e7bc6a1f26f6183e7d20f98 / target:36c78bd17a8cde53fd5f3700ad117d21dbc6ac5c / %delta
benchmark  name                                       total(ms) b/t       Δ  alloc_count b/t      Δ  alloc_bytes b/t       Δ
----------------------------------------------------------------------------------------------------------------------------
write      node256/depth=1/begin_block                  0.253/0.201  -20.6%          280/272  -2.9%    955098/929754   -2.7%
write      node256/depth=1/commit_flush                 8.959/9.705   +8.3%            12/12  +0.0%      68704/68704   +0.0%
write      node256/depth=1/fill_node16_to_capacity      0.134/0.084  -37.3%          296/272  -8.1%    415320/339288  -18.3%
write      node256/depth=1/fill_node48_to_capacity      0.362/0.229  -36.7%         1034/968  -6.4%  1346504/1141896  -15.2%
write      node256/depth=1/fill_node4_to_capacity       0.035/0.023  -34.3%            80/74  -7.5%     104028/85020  -18.3%
write      node256/depth=1/insert_first_leaf            0.023/0.015  -34.8%            56/54  -3.6%      75396/69060   -8.4%
write      node256/depth=1/promote_node16_to_node48     0.027/0.018  -33.3%            72/66  -8.3%      84288/69760  -17.2%
write      node256/depth=1/promote_node48_to_node256    0.027/0.020  -25.9%            78/72  -7.7%    125696/106688  -15.1%
write      node256/depth=1/promote_node4_to_node16      0.024/0.016  -33.3%            56/52  -7.1%      69440/56768  -18.2%
write      node256/depth=1/seal                         0.119/0.082  -31.1%          168/168  +0.0%      65700/65700   +0.0%
write      node256/depth=1/split_leaf_to_node4          0.028/0.019  -32.1%            70/66  -5.7%      69868/57196  -18.1%
write      noop/depth=1/begin_block                     0.334/0.276  -17.4%          274/266  -2.9%    945898/920554   -2.7%
write      noop/depth=1/commit_flush                    9.410/8.641   -8.2%            12/12  +0.0%      68704/68704   +0.0%
write      noop/depth=1/fill_node16_to_capacity         0.146/0.126  -13.7%          296/272  -8.1%    415320/339288  -18.3%
write      noop/depth=1/fill_node48_to_capacity         0.396/0.336  -15.2%         1034/968  -6.4%  1346504/1141896  -15.2%
write      noop/depth=1/fill_node4_to_capacity          0.039/0.033  -15.4%            80/74  -7.5%     104028/85020  -18.3%
write      noop/depth=1/insert_first_leaf               0.029/0.024  -17.2%            63/60  -4.8%      72632/63128  -13.1%
write      noop/depth=1/promote_node16_to_node48        0.030/0.025  -16.7%            72/66  -8.3%      84288/69760  -17.2%
write      noop/depth=1/promote_node48_to_node256       0.034/0.028  -17.6%            78/72  -7.7%    125696/106688  -15.1%
write      noop/depth=1/promote_node4_to_node16         0.029/0.024  -17.2%            63/59  -6.3%      69661/56989  -18.2%
write      noop/depth=1/seal                            0.133/0.114  -14.3%          168/168  +0.0%      65697/65697   +0.0%
write      noop/depth=1/split_leaf_to_node4             0.030/0.025  -16.7%            63/59  -6.3%      69654/56982  -18.2%
----------------------------------------------------------------------------------------------------------------------------

[marf-bench] Repeated comparison stats
baseline: base:0317850e7f042de98e7bc6a1f26f6183e7d20f98
comparison: target:36c78bd17a8cde53fd5f3700ad117d21dbc6ac5c
values: median/min/max %delta across 10 repeats
benchmark  name                                        total Δ med   total Δ min   total Δ max   count Δ med   bytes Δ med  repeats
-----------------------------------------------------------------------------------------------------------------------------------
write      node256/depth=1/begin_block                       -4.1%        -20.6%         +9.1%         -2.9%         -2.7%       10
write      node256/depth=1/commit_flush                      +1.8%        -21.7%        +12.5%         +0.0%         +0.0%       10
write      node256/depth=1/fill_node16_to_capacity           -3.0%        -37.3%         +6.3%         -8.1%        -18.3%       10
write      node256/depth=1/fill_node48_to_capacity           -2.3%        -36.7%         +1.8%         -6.4%        -15.2%       10
write      node256/depth=1/fill_node4_to_capacity            +0.0%        -34.3%         +6.3%         -7.5%        -18.3%       10
write      node256/depth=1/insert_first_leaf                 +0.0%        -34.8%         +9.1%         -3.6%         -8.4%       10
write      node256/depth=1/promote_node16_to_node48          +0.0%        -33.3%        +16.7%         -8.3%        -17.2%       10
write      node256/depth=1/promote_node48_to_node256         -3.6%        -25.9%        +21.4%         -7.7%        -15.1%       10
write      node256/depth=1/promote_node4_to_node16           +0.0%        -33.3%         +0.0%         -7.1%        -18.2%       10
write      node256/depth=1/seal                              +0.0%        -31.1%         +5.4%         +0.0%         +0.0%       10
write      node256/depth=1/split_leaf_to_node4               +0.0%        -32.1%         +0.0%         -5.7%        -18.1%       10
write      noop/depth=1/begin_block                         -10.5%        -21.9%        +15.7%         -2.9%         -2.7%       10
write      noop/depth=1/commit_flush                         +1.5%        -10.8%        +11.6%         +0.0%         +0.0%       10
write      noop/depth=1/fill_node16_to_capacity              -2.5%        -13.7%         +1.7%         -8.1%        -18.3%       10
write      noop/depth=1/fill_node48_to_capacity              -2.1%        -15.3%         +5.3%         -6.4%        -15.2%       10
write      noop/depth=1/fill_node4_to_capacity               -2.9%        -30.4%        +12.5%         -7.5%        -18.3%       10
write      noop/depth=1/insert_first_leaf                    -7.1%        -36.8%         +8.3%         -4.8%        -13.1%       10
write      noop/depth=1/promote_node16_to_node48             -3.6%        -16.7%        +21.4%         -8.3%        -17.2%       10
write      noop/depth=1/promote_node48_to_node256           -10.0%        -33.3%         +7.1%         -7.7%        -15.1%       10
write      noop/depth=1/promote_node4_to_node16              +0.0%        -17.2%         +8.3%         -6.3%        -18.2%       10
write      noop/depth=1/seal                                 -3.3%        -14.3%         +3.5%         +0.0%         +0.0%       10
write      noop/depth=1/split_leaf_to_node4                  +0.0%        -82.1%         +0.0%         -6.3%        -18.2%       10
-----------------------------------------------------------------------------------------------------------------------------------

[marf-bench] Repeat confidence summary
baseline: base:0317850e7f042de98e7bc6a1f26f6183e7d20f98
comparison: target:36c78bd17a8cde53fd5f3700ad117d21dbc6ac5c
values: total_ms stability across 10 repeats
rows: total=22 stable=9 high-jitter=13  (high-jitter means min<0<max and spread>=30.0%)
top high-jitter rows (by spread):
  write / node256/depth=1/promote_node16_to_node48  median=+0.0%  min=-33.3%  max=+16.7%  spread=50.0%
  write / node256/depth=1/promote_node48_to_node256  median=-3.6%  min=-25.9%  max=+21.4%  spread=47.4%
  write / noop/depth=1/insert_first_leaf  median=-7.1%  min=-36.8%  max=+8.3%  spread=45.2%
  write / node256/depth=1/insert_first_leaf  median=+0.0%  min=-34.8%  max=+9.1%  spread=43.9%
  write / node256/depth=1/fill_node16_to_capacity  median=-3.0%  min=-37.3%  max=+6.3%  spread=43.7%
  write / noop/depth=1/fill_node4_to_capacity  median=-2.9%  min=-30.4%  max=+12.5%  spread=42.9%
  write / node256/depth=1/fill_node4_to_capacity  median=+0.0%  min=-34.3%  max=+6.3%  spread=40.5%
  write / noop/depth=1/promote_node48_to_node256  median=-10.0%  min=-33.3%  max=+7.1%  spread=40.5%
  write / node256/depth=1/fill_node48_to_capacity  median=-2.3%  min=-36.7%  max=+1.8%  spread=38.5%
  write / noop/depth=1/promote_node16_to_node48  median=-3.6%  min=-16.7%  max=+21.4%  spread=38.1%
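The high-jitter rule quoted in the summary above (min < 0 < max with spread ≥ 30.0%) can be expressed as a small predicate. This is a sketch, not marf-bench source code:

```rust
// A row is high-jitter when the timing delta flips sign across repeats
// (min < 0 < max) and the min-to-max spread is at least 30 percentage
// points, matching the definition in the confidence summary.
fn is_high_jitter(min_pct: f64, max_pct: f64) -> bool {
    min_pct < 0.0 && max_pct > 0.0 && (max_pct - min_pct) >= 30.0
}
```

For example, the node256 promote_node16_to_node48 row (min −33.3%, max +16.7%) qualifies, while a row whose max never goes positive does not, however wide its spread.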

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated

Copilot AI left a comment
Pull request overview

This PR optimizes MARF trie insert performance by reducing redundant node cloning/writes during Trie::recalculate_root_hash(), especially when computing the root hash with skiplist updates enabled.

Changes:

  • Gate the “pre-flush then final write” behavior to only the root node when update_skiplist == true.
  • Add TrieStorageConnection::write_node_hash() to update only a node’s stored hash in the uncommitted trie state (avoiding a full node clone/write).
  • Minor logging/message and comment hygiene updates in trie operations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
stackslib/src/chainstate/stacks/index/trie.rs Avoids redundant pre-flush writes for non-root ancestors; uses hash-only update for the root’s final write when skiplist updating.
stackslib/src/chainstate/stacks/index/storage.rs Introduces TrieStorageConnection::write_node_hash() to support hash-only updates in uncommitted state.


Comment thread stackslib/src/chainstate/stacks/index/trie.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/storage.rs
codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 72.41379% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.02%. Comparing base (b61a766) to head (f6f9f82).

Files with missing lines Patch % Lines
stackslib/src/chainstate/stacks/index/storage.rs 63.63% 4 Missing ⚠️
stackslib/src/chainstate/stacks/index/trie.rs 77.77% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #6935       +/-   ##
============================================
+ Coverage    59.40%   70.02%   +10.61%     
============================================
  Files          412      412               
  Lines       218187   218190        +3     
  Branches       338      338               
============================================
+ Hits        129610   152778    +23168     
+ Misses       88577    65412    -23165     
Files with missing lines Coverage Δ
stackslib/src/chainstate/stacks/index/storage.rs 80.79% <63.63%> (+0.36%) ⬆️
stackslib/src/chainstate/stacks/index/trie.rs 87.09% <77.77%> (+4.01%) ⬆️

... and 317 files with indirect coverage changes



@federico-stacks (Contributor) left a comment

The change makes sense to me. Just added a nit.

Comment threads (7) stackslib/src/chainstate/stacks/index/trie.rs Outdated
@jcnelson (Member) left a comment
Couple nits but LGTM. Will approve once resolved. Thanks @cylewitruk-stacks!

Comment thread stackslib/src/chainstate/stacks/index/trie.rs
);
debug!("Next root hash is {h} (update_skiplist={update_skiplist})");

storage.write_nodetype(child_ptr.ptr(), &node, h)?;
Contributor
sanity-check: could write_node_hash be used here?

Contributor Author
Yeah, I think we can, since we're not modifying the node in this branch (like we do below) 👍 Will try it and make sure things are green.

@jcnelson (Member) left a comment
LGTM!

@benjamin-stacks (Contributor) left a comment
I'm an absolute noob in our MARF implementation code, but this change makes a lot of sense to me. Maybe take me as half an approval, given that there are probably a lot of edge cases I may not be aware of 😅

One thing I'm wondering: How good is the code coverage of the surface area of this? I assume that any regression would (at the latest) be caught during a full replay test, but I don't have a good intuition on the validity of that assumption.

@dhaney-stacks (Contributor)

@cylewitruk-stacks please merge when able.
