Skip to content

Core: Optimize RoaringPositionBitmap.setRange with native range API#15791

Merged
danielcweeks merged 4 commits intoapache:mainfrom
Baunsgaard:optimize-setrange-roaring-bitmap
Apr 15, 2026
Merged

Core: Optimize RoaringPositionBitmap.setRange with native range API#15791
danielcweeks merged 4 commits intoapache:mainfrom
Baunsgaard:optimize-setrange-roaring-bitmap

Conversation

@Baunsgaard
Copy link
Copy Markdown
Contributor

Replace the per-position loop in RoaringPositionBitmap.setRange() with RoaringBitmap.add(long, long) bulk range insertion.

The new implementation splits the 64-bit range at key boundaries and delegates each segment to the native API, which operates at the container level internally. This also eliminates per-position validation and bitmap array growth checks.

Replace the O(n) per-position loop in setRange() with native
RoaringBitmap.add(long, long) bulk range insertion. The new
implementation splits 64-bit ranges at key boundaries and delegates
to the native range add API, reducing cost from O(range_size) to
O(containers_touched) and eliminating per-position validation and
bitmap allocation checks.
…y ranges

Add tests that verify every position in a range via contains() for
both the same-key and cross-key code paths. Strengthen the empty
range test to cover start == end and start > end cases with
cardinality and contains() assertions.
@github-actions github-actions Bot added the core label Mar 27, 2026
… tests

Document that setRange does nothing when start >= end. Add
assertThatThrownBy tests for setRange with negative start and
end beyond MAX_POSITION to match existing set/contains coverage.
Comment on lines +89 to +91
if (posStartInclusive >= posEndExclusive) {
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original implementation had an unintended side-effect of returning in this case, but we should probably have a precondition check and throw as setting this wrong indicates there's a problem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I changed it to error out instead.

A reversed range (start > end) indicates a bug in the caller and
should fail fast rather than silently succeed. An empty range
(start == end) remains a valid no-op.

This makes setRange consistent with set() which already validates
its input via Preconditions.checkArgument.
@Baunsgaard
Copy link
Copy Markdown
Contributor Author

Since i was asked for a benchmark. Here is the results, and links to related PRs in the other subpackages:

Related PRs:

Benchmark results

setRange() (native bulk range) vs a loop of individual set() calls
(JDK 17, 5 iterations after 3 warmup):

Scenario Before (set loop) After (setRange) Speedup
100 positions 37.8 µs 2.4 µs 16x
10k positions 618.8 µs 2.4 µs 258x
200k positions 2.53 ms 4.9 µs 514x
1M positions 7.38 ms 21.5 µs 344x
10k cross-key boundary 208.9 µs 2.0 µs 104x

The cross-key boundary scenario inserts 10k positions spanning the 32-bit key boundary (e.g., (1LL << 32) - 5000 to (1LL << 32) + 5000). This exercises the key-splitting logic where the range is divided across two internal 32-bit Roaring bitmaps.

wgtmac pushed a commit to apache/iceberg-cpp that referenced this pull request Apr 4, 2026
Replace the per-position loop in `RoaringPositionBitmap::AddRange` with
direct calls to `roaring::Roaring::addRange`, matching the optimization
in the Java implementation
(apache/iceberg#15791).

Added tests for single-position ranges, large contiguous ranges,
three-key spanning ranges, and invalid inputs.
@Baunsgaard Baunsgaard requested a review from danielcweeks April 7, 2026 10:01
@danielcweeks danielcweeks merged commit 4414526 into apache:main Apr 15, 2026
35 checks passed
@danielcweeks
Copy link
Copy Markdown
Contributor

Thanks @Baunsgaard!

@Baunsgaard
Copy link
Copy Markdown
Contributor Author

Thanks for reviewing and merging !

@Baunsgaard Baunsgaard deleted the optimize-setrange-roaring-bitmap branch April 16, 2026 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants