Core: Optimize RoaringPositionBitmap.setRange with native range API#15791
Conversation
Replace the O(n) per-position loop in setRange() with native RoaringBitmap.add(long, long) bulk range insertion. The new implementation splits 64-bit ranges at key boundaries and delegates to the native range add API, reducing cost from O(range_size) to O(containers_touched) and eliminating per-position validation and bitmap allocation checks.
…y ranges Add tests that verify every position in a range via contains() for both the same-key and cross-key code paths. Strengthen the empty range test to cover start == end and start > end cases with cardinality and contains() assertions.
… tests Document that setRange does nothing when start >= end. Add assertThatThrownBy tests for setRange with negative start and end beyond MAX_POSITION to match existing set/contains coverage.
| if (posStartInclusive >= posEndExclusive) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
I think the original implementation had an unintended side-effect of returning in this case, but we should probably have a precondition check and throw as setting this wrong indicates there's a problem.
There was a problem hiding this comment.
Thanks, I changed it to error out instead.
A reversed range (start > end) indicates a bug in the caller and should fail fast rather than silently succeed. An empty range (start == end) remains a valid no-op. This makes setRange consistent with set() which already validates its input via Preconditions.checkArgument.
|
Since i was asked for a benchmark. Here is the results, and links to related PRs in the other subpackages: Related PRs:
Benchmark results
|
Replace the per-position loop in `RoaringPositionBitmap::AddRange` with direct calls to `roaring::Roaring::addRange`, matching the optimization in the Java implementation (apache/iceberg#15791). Added tests for single-position ranges, large contiguous ranges, three-key spanning ranges, and invalid inputs.
|
Thanks @Baunsgaard! |
|
Thanks for reviewing and merging ! |
Replace the per-position loop in
RoaringPositionBitmap.setRange()withRoaringBitmap.add(long, long)bulk range insertion.The new implementation splits the 64-bit range at key boundaries and delegates each segment to the native API, which operates at the container level internally. This also eliminates per-position validation and bitmap array growth checks.