Optimize is_permutation for vector<bool>#6148
Conversation
|
I believe this could be generalized further. For any ranges where the value types are |
|
Generalized, benchmark results are about the same |
There was a problem hiding this comment.
Pull request overview
This PR optimizes std::is_permutation when operating on bool ranges (notably vector<bool> iterators) by replacing the general-purpose O(N²) match-counting approach with a linear-time mismatch + count(true) strategy.
Changes:
- Added an internal helper (
_Is_permutation_of_bool) and fast-paths inis_permutationoverloads when the predicate is an equality predicate and both ranges arebool. - Added tests validating
is_permutationbehavior forvector<bool>and mixedvector<bool>/raw-bool[]iterator combinations. - Added a new microbenchmark to measure
is_permutationperformance onvector<bool>and on rawbool[].
Show a summary per file
| File | Description |
|---|---|
stl/inc/algorithm |
Introduces bool-specific is_permutation fast-paths using count(true) (and mismatch when neither iterator is vector<bool>). |
tests/std/tests/GH_000625_vector_bool_optimization/test.cpp |
Adds constexpr/runtime coverage for is_permutation with vector<bool> and bool[]. |
benchmarks/src/vector_bool_permute.cpp |
Adds benchmarks for is_permutation on vector<bool> and bool[]. |
benchmarks/CMakeLists.txt |
Registers the new vector_bool_permute benchmark target. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 4/4 changed files
- Comments generated: 1
| template <class _FwdIt1, class _FwdIt2> | ||
| _NODISCARD _CONSTEXPR20 bool _Is_permutation_of_bool(_FwdIt1 _First1, _FwdIt1 _Last1, _FwdIt2 _First2, _FwdIt2 _Last2) { |
There was a problem hiding this comment.
I think this one is premature. I don't attempt to handle std::ranges::is_permutation yet, so there's no distinct sentinel type. When we handle it, we'll see how it should be done.
| if constexpr (is_same_v<_Iter_value_t<decltype(_UFirst1)>, bool> | ||
| && is_same_v<_Iter_value_t<decltype(_UFirst2)>, bool> // | ||
| && _Is_ranges_random_iter_v<decltype(_UFirst1)> // | ||
| && _Is_ranges_random_iter_v<decltype(_UFirst2)> |
There was a problem hiding this comment.
The 4-arg version checks the wrapped iterator instead. The check it pre-existing.
Maybe I need to change it to check the unwrapped iterators too though?
There was a problem hiding this comment.
Checking the unwrapped iterators.
| if constexpr (!_Is_vb_iterator<_FwdIt1> && !_Is_vb_iterator<_FwdIt2>) { | ||
| auto _Pair = _STD mismatch(_First1, _Last1, _First2); |
There was a problem hiding this comment.
No.
This is intentional optimization decision. It is explained in the PR description.
mismatch is good for non-vb iterators (either vectorized, or better than count), but for just one vb iterator it flips as count is SWAR with popcount and mismatch is individual bit matching.
I was looking into bit tricks for
next_permutation/prev_permutation.Seems like that there will be no impressive results, as these operations perform already at ~15 ns order of magnitude.
I may further look into these though.
But let's go for the easiest optimization here that gains 100x and more!
The optimized algorithm does mismatch, and then counts
truevalues in each of the remaining ranges.Mismatch
The
mismatchpart serves to save the equality case optimization that is implied by the standard, asking for N**2 comparisons generally, but just N if the ranges are equal. If none of the ranges isvector<bool>then we have these cases:mismatchturns to vector mismatch, andcountto vector countmismatchis slightly fastermismatchcall that will only likely misalign the input for thecountcall due to advancing few elementsmismatchthe number of comparison is twice smaller for equal cases.counts is vectorized, somismatchis not that much fasterSo overall
mismatchshould be there.One
vector<bool>or both of them flip it:mismatchforvector<bool>is not optimized currentlycountis optimized equally well for aligned and misaligned casesBenchmark results
Interim version is without
mismatch. Its timings are not used is speedup calculation.Array:
vector<bool>, Interim column is irrelevant, it does not show any data different from After.