Speed up initial in-memory Soroban state population #5252
Open
drebelsky wants to merge 3 commits into stellar:master
Pull request overview
This PR speeds up startup-time reconstruction of the in-memory Soroban state by changing live-state discovery from per-bucket deduping to a merged scan across all buckets, and by deferring bucket-merge restart until after full state population. It fits into the ledger/bucket startup path that rebuilds Soroban state from the BucketList on node startup.
Changes:
- Replace `initializeStateFromSnapshot`'s per-type bucket scans with a new "current live entries" scan that returns only the latest live version of each key.
- Add bucket-snapshot support for k-way merged live-entry scanning, including a new ledger-key comparator used by the loser-tree merge.
- Split bucket merge restart out of `assumeState` and invoke it later in full startup mode.
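To illustrate what "only the latest live version of each key" means for a merged scan, here is a minimal sketch. It is not the PR's code: `Entry`, `scanCurrentLive`, and the string keys are hypothetical, the sketch assumes each bucket is sorted by key with `buckets[0]` being the newest, and it uses `std::priority_queue` for brevity where the PR describes a loser tree.

```cpp
#include <cstddef>
#include <optional>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for a bucket entry.
struct Entry
{
    std::string key;
    bool live; // false models a dead entry (tombstone)
    int value;
};

// Assumption: every bucket is sorted by key and buckets[0] is the newest.
// Returns the newest version of each key, skipping keys whose newest
// version is dead.
std::vector<Entry>
scanCurrentLive(std::vector<std::vector<Entry>> const& buckets)
{
    struct Cursor
    {
        std::size_t bucket;
        std::size_t pos;
    };
    // priority_queue is a max-heap, so "later" must return true when a
    // should be popped after b: larger key first, then older bucket.
    auto later = [&](Cursor const& a, Cursor const& b) {
        auto const& ka = buckets[a.bucket][a.pos].key;
        auto const& kb = buckets[b.bucket][b.pos].key;
        return std::tie(ka, a.bucket) > std::tie(kb, b.bucket);
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(
        later);
    for (std::size_t b = 0; b < buckets.size(); ++b)
    {
        if (!buckets[b].empty())
        {
            heap.push(Cursor{b, 0});
        }
    }

    std::vector<Entry> out;
    std::optional<std::string> lastKey;
    while (!heap.empty())
    {
        Cursor c = heap.top();
        heap.pop();
        Entry const& e = buckets[c.bucket][c.pos];
        // The first time a key appears it comes from the newest bucket
        // that contains it; older versions of the same key are skipped.
        if (!lastKey || *lastKey != e.key)
        {
            lastKey = e.key;
            if (e.live)
            {
                out.push_back(e);
            }
        }
        if (++c.pos < buckets[c.bucket].size())
        {
            heap.push(c);
        }
    }
    return out;
}
```

Because equal keys come out of the merge adjacent to each other, deduplication only needs to remember the last key seen rather than a hash set of every key visited so far.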
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/ledger/LedgerManagerImpl.cpp` | Defers restarting bucket merges until after full Soroban state setup. |
| `src/ledger/InMemorySorobanState.cpp` | Switches snapshot initialization to current-live scans for Soroban entry types. |
| `src/ledger/ImmutableLedgerView.h` | Exposes a new current-live scan API on immutable/apply ledger views. |
| `src/ledger/ImmutableLedgerView.cpp` | Wires the new ledger-view scan API to the live bucket snapshot. |
| `src/bucket/LedgerCmp.h` | Declares a 3-way comparator for LedgerKey ordering. |
| `src/bucket/LedgerCmp.cpp` | Implements LedgerKey comparison logic used by merged scanning. |
| `src/bucket/BucketManager.h` | Adds an explicit `restartMerges` API. |
| `src/bucket/BucketManager.cpp` | Refactors merge restart out of `assumeState` into a separate method. |
| `src/bucket/BucketListSnapshot.h` | Adds snapshot API for scanning only current live entries of a type. |
| `src/bucket/BucketListSnapshot.cpp` | Implements the loser-tree/k-way merge scan over bucket entry streams. |
Related to #4902. State churn has continued since then, so population now takes ~70s on a dev watcher. This PR changes the live state calculation from walking the buckets one by one with a hash map to a k-way merge across all the buckets. The merge uses a loser tree, which needs about half as many comparisons as a heap. On a dev watcher, population time drops from ~70s to ~30s.
Time for 3 runs on upstream vs patch
Doing the k-way merge also has nicer memory scaling characteristics than the current approach: the amount of memory we use scales with the live state + number of buckets, instead of the current approach that scales with churn.
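For readers unfamiliar with loser trees, a minimal sketch over sorted `int` runs follows. It is illustrative only and not the code in this PR: `LoserTree` and `mergeRuns` are hypothetical names, and plain integers stand in for bucket entries. Each internal node stores the loser of its match and slot 0 stores the overall winner, so replaying after a pop costs one comparison per tree level, whereas a binary heap sift-down typically compares against both children. The tree itself holds one index per run, which is where the "number of buckets" term in the memory cost comes from.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical loser tree over k sorted runs of ints.
// mTree[1..k-1] hold the losing run index at each internal node,
// mTree[0] holds the current overall winner. Run s is the leaf at node s + k.
class LoserTree
{
  public:
    explicit LoserTree(std::vector<std::vector<int>> runs)
        : mRuns(std::move(runs)), mPos(mRuns.size(), 0), mTree(mRuns.size(), 0)
    {
        if (!mRuns.empty())
        {
            mTree[0] = build(1);
        }
    }

    bool
    done() const
    {
        return mRuns.empty() || exhausted(mTree[0]);
    }

    // Precondition: !done(). Returns the smallest remaining value.
    int
    pop()
    {
        std::size_t cur = mTree[0];
        int v = mRuns[cur][mPos[cur]++];
        // Replay only the path from the winner's leaf to the root:
        // one comparison per level.
        for (std::size_t n = (cur + mRuns.size()) / 2; n >= 1; n /= 2)
        {
            if (less(mTree[n], cur))
            {
                std::swap(cur, mTree[n]);
            }
        }
        mTree[0] = cur;
        return v;
    }

  private:
    bool
    exhausted(std::size_t s) const
    {
        return mPos[s] >= mRuns[s].size();
    }

    // True if run a's head should be emitted before run b's head.
    // Exhausted runs sort last; ties go to the lower run index.
    bool
    less(std::size_t a, std::size_t b) const
    {
        if (exhausted(a))
            return false;
        if (exhausted(b))
            return true;
        int x = mRuns[a][mPos[a]], y = mRuns[b][mPos[b]];
        return x != y ? x < y : a < b;
    }

    // Plays the matches below node n, records losers, returns the winner.
    std::size_t
    build(std::size_t n)
    {
        std::size_t k = mRuns.size();
        if (n >= k)
            return n - k; // leaf
        std::size_t l = build(2 * n), r = build(2 * n + 1);
        if (less(r, l))
            std::swap(l, r); // l is now the winner
        mTree[n] = r;
        return l;
    }

    std::vector<std::vector<int>> mRuns;
    std::vector<std::size_t> mPos;
    std::vector<std::size_t> mTree;
};

std::vector<int>
mergeRuns(std::vector<std::vector<int>> runs)
{
    LoserTree tree(std::move(runs));
    std::vector<int> out;
    while (!tree.done())
    {
        out.push_back(tree.pop());
    }
    return out;
}
```

The tie-break on run index keeps equal values in run order, which is the property a newest-first bucket merge would rely on to see the latest version of a key first.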
Additionally, the PR disables bucket merges until after the in-memory state is populated.