perf: optimize division to uint32 to make re-target-pow calculation much faster#7352
Conversation
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
WalkthroughThis PR adds scalar division operator overloads to the Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/arith_uint256.cpp (1)
76-77: ⚡ Quick winParameter name
b32is misleading foruint64_ttype.The parameter is named
b32but the function acceptsuint64_t. This inconsistency appears in both the declaration (line 161 of arith_uint256.h) and implementation. Consider renaming tob64orbfor clarity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/arith_uint256.cpp` around lines 76 - 77, Rename the misleading parameter b32 in the division operator to a name that matches its type and intent (e.g., b64 or simply b) in both the declaration and implementation of template<unsigned int BITS> base_uint<BITS>::operator/=(uint64_t b32); update the parameter name in the function signature and all uses inside the implementation so they remain consistent (search for base_uint<BITS>::operator/= and the declaration in the header to change both places).src/arith_uint256.h (1)
161-161: ⚡ Quick winParameter name
b32is misleading foruint64_ttype.The parameter is named
b32but acceptsuint64_t. While the implementation optimizes for values ≤uint32_tmax, the parameter type is 64-bit and the name could cause confusion. Consider renaming tob64or simplyb(as used in line 211) for clarity.✏️ Suggested naming improvement
- base_uint& operator/=(uint64_t b32); + base_uint& operator/=(uint64_t b);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/arith_uint256.h` at line 161, Rename the misleading parameter `b32` in the declaration of operator/= to a clearer name (e.g., `b` or `b64`) and update the corresponding definition/implementation to use the same name; specifically modify the signature `base_uint& operator/=(uint64_t b32);` to `base_uint& operator/=(uint64_t b);` (or `b64`) and change the implementation function parameter and any internal references to match, and then update any callers/forward-declarations that reference the old name to keep declarations and definitions consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/arith_uint256.cpp`:
- Around line 76-77: Rename the misleading parameter b32 in the division
operator to a name that matches its type and intent (e.g., b64 or simply b) in
both the declaration and implementation of template<unsigned int BITS>
base_uint<BITS>::operator/=(uint64_t b32); update the parameter name in the
function signature and all uses inside the implementation so they remain
consistent (search for base_uint<BITS>::operator/= and the declaration in the
header to change both places).
In `@src/arith_uint256.h`:
- Line 161: Rename the misleading parameter `b32` in the declaration of
operator/= to a clearer name (e.g., `b` or `b64`) and update the corresponding
definition/implementation to use the same name; specifically modify the
signature `base_uint& operator/=(uint64_t b32);` to `base_uint&
operator/=(uint64_t b);` (or `b64`) and change the implementation function
parameter and any internal references to match, and then update any
callers/forward-declarations that reference the old name to keep declarations
and definitions consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: bf974529-fc8b-450b-857a-fe88d01cd909
📒 Files selected for processing (3)
src/arith_uint256.cppsrc/arith_uint256.hsrc/test/arith_uint256_tests.cpp
|
✅ Review complete (commit e8435d3) |
| } | ||
|
|
||
| template <unsigned int BITS> | ||
| base_uint<BITS>& base_uint<BITS>::operator/=(uint64_t b32) |
There was a problem hiding this comment.
nit: is it a typo?
| base_uint<BITS>& base_uint<BITS>::operator/=(uint64_t b32) | |
| base_uint<BITS>& base_uint<BITS>::operator/=(uint64_t b64) |
same in other places
There was a problem hiding this comment.
not really a typo, because it's supposed optimized for b32 only. If you think that's confusing I'd rename it.
It is supposed to be uint32_t b32 but compiler won't say a word if that's called with uint64_t, so, I keep it mixed.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Small, well-scoped performance PR: adds a uint64_t divisor overload to base_uint with a fast word-wise long-division path when the divisor fits in uint32_t, falling back to the existing base_uint divisor otherwise. The schoolbook radix-2^32 loop is correct (the running remainder is always < b32 ≤ 2^32, so (rem<<32)|pn[i] fits in uint64_t), zero is rejected, and the regression test asserts bit-identity against the widened arith_uint256 divisor across both code paths plus the zero-throw case. Both agents and CodeRabbit found no in-scope issues.
…calculation much faster e8435d3 test: add regression test for faster division of base_uint<256> to uint32_t (Konstantin Akimov) de1e56a perf: optimize division to uint32 to make re-target-pow calculation much faster (Konstantin Akimov) Pull request description: ## Issue being fixed or feature implemented Dash retargets difficulty every block via DarkGravityWave (24-block loop), so every header processed during sync runs several 256-bit divisions. Bitcoin only retargets once per 2016 blocks. Every division in DGW is by a small integer (nCountBlocks+1) * nTargetTimespan (3600). But the code routes through the only available overload, `operator/=(const base_uint&)`, which does bit-by-bit long division: ~224 iterations, each doing a full-width `div >>= 1` During header sync up to 35% of CPU is wasted on this division on release build (accordingly `perf`): ``` 20.36% d-msghand dash-qt [.] base_uint<256u>::operator>>=(unsigned int) 15.00% d-msghand dash-qt [.] base_uint<256u>::operator/=(base_uint<256u> const&) ``` ## What was done? Implemented optimization for division to unsigned values that could fit inside uint32_t. ## How Has This Been Tested? Got GUIX build from changes in this PR and develop: ``` [develop] 5af9f57/src/qt/dash-qt -datadir=/tmp/dd -reindex real 7m42.152s user 7m0.396s sys 0m3.328s [PR] 8608a3c3bbcd/src/qt/dash-qt -datadir=/tmp/dd -reindex real 4m17.294s user 2m55.474s sys 0m2.731s ``` Perf stats are updated as expected: ``` - 1.61% 0.06% d-msghand dash-qt [.] base_uint<256u>::operator/=(base_uint<256u> const&) - 1.55% base_uint<256u>::operator/=(base_uint<256u> const&) 0.92% base_uint<256u>::operator>>=(unsigned int) + 1.48% 0.57% d-msghand dash-qt [.] base_uint<256u>::operator>>=(unsigned int) ``` So total header sync just a half of the time. Please note, that header sync will be slightly slower due to #7320 so overall win is smaller. ## Breaking Changes N/A ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: UdjinM6: utACK e8435d3 Tree-SHA512: 797c446cbfad6a1eecb98bee893d530e88db7576d0d15faa36c0d9ccb6d3872313c29f977194ba42cfdedf837a7227018dc750fa87b1f376106831204b08f650
01640b9 chore: prepare v23.1.5 release (PastaClaw) c3e3b9d fix: correct v23.1.4 checkpoint hash (PastaClaw) 10b6309 docs: use standard bug fixes wording (PastaClaw) 0495b47 docs: refine v23.1.4 release intro (PastaClaw) 3826688 docs: fix v23.1.4 release notes title (PastaClaw) 2068176 Merge #7317: fix: drop `virtual` specifier for `CZMQNotificationInterface` dtor (pasta) 9a463a2 test: early bail-out for huge QSIGSHARESINV and QGETSIGSHARES (Konstantin Akimov) 3426458 fix: early bail-out for huge QSIGSHARESINV and QGETSIGSHARES (Konstantin Akimov) b9c6b95 chore: version bump, man pages, flatpak, release notes for release v23.1.4 (Konstantin Akimov) 6f7a0ef chore: version bump, man pages, flatpak, release notes for release v23.1.4 (Konstantin Akimov) 78bb4c4 Merge #7359: fix: make huge asset lock tx non-standard (pasta) 5962016 chore: update min-chainwork, checkpoints and seeds for v23.1.4 (Konstantin Akimov) 99305fb test: add test for duplicated platform-node-id in mempool (Konstantin Akimov) 78b8274 fix: don't let 2 protx with the same platform-id to be presented in mempool (Konstantin Akimov) 54187cd perf: avoid re-validation of ehf signals during block-connect (Konstantin Akimov) 47fa631 Merge #7352: perf: optimize division to uint32 to make re-target-pow calculation much faster (pasta) d93f755 Merge #7332: fix: remove cs_main from MaybePunishNodeForTx, missing changes from bitcoin#19607 (pasta) Pull request description: # chore: merge master 23.1.5 back into develop ## Issue being fixed or feature implemented Backmerge current `master` into `develop` so the active development branch includes the `v23.1.4` and `v23.1.5` release-line changes. ## What was done? Merged `upstream/master` (`v23.1.5`, `5dde61050d20c23c3e179c34e663a025f56843f3`) into `upstream/develop`. Carried forward release metadata and generated artifacts for `v23.1.4` and `v23.1.5`, including version metadata, man pages, release notes, checkpoints, chainTxData, minimum chain work, assume-valid data, and seed updates. Resolved conflicts in Dash-specific code while preserving develop-only refactors: - Kept develop's `CChainstateHelper` ownership layout and `SuperblockManager` integration. - Carried forward the EHF block-connect optimization without reintroducing the removed `CMNHFManager` quorum-manager member. - Carried forward the LLMQ signing-share inventory bounds into develop's `NetSigning` split. - Carried forward the Platform node ID mempool conflict handling and its functional test coverage. ## How Has This Been Tested? - Ran `git diff --check upstream/develop...HEAD`. - Scanned changed files for conflict markers. - Verified the corrected `CMNHFManager` constructor call after conflict resolution. - Ran the required pre-PR `code-review` gate against the backmerge branch; final recommendation was `ship` with no significant issues found. No local build was run; this is a release backmerge and CI should run the full Dash Core PR build/test matrix. ## Breaking Changes None expected. This is a backmerge from the release branch into `develop`. ## Checklist - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [x] I have added or updated relevant unit/integration/functional/e2e tests - [x] I have made corresponding changes to the documentation - [ ] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ Top commit has no ACKs. Tree-SHA512: 1a828c9acd5fe87617f3ab11ca1b5b13656adc61e591b58d68b685bba7404696afa664c53cc752dfbaca7fe08439d9428f33116f475c18c0959293460062b284
Issue being fixed or feature implemented
Dash retargets difficulty every block via DarkGravityWave (24-block loop), so every header processed during sync runs several 256-bit divisions. Bitcoin only retargets once per 2016 blocks.
Every division in DGW is by a small integer (nCountBlocks+1) * nTargetTimespan (3600).
But the code routes through the only available overload,
operator/=(const base_uint&), which does bit-by-bit long division: ~224 iterations, each doing a full-widthdiv >>= 1During header sync up to 35% of CPU is wasted on this division on release build (accordingly
perf):What was done?
Implemented optimization for division to unsigned values that could fit inside uint32_t.
How Has This Been Tested?
Got GUIX build from changes in this PR and develop:
Perf stats are updated as expected:
So total header sync takes just a half of the time. Please note, that header sync will be slightly slower due to #7320 so overall win is smaller.
Breaking Changes
N/A
Checklist: