feat/perf: add insert_range and contains to DeleteVector #2292
@@ -45,6 +45,26 @@ impl DeleteVector {

```rust
        self.inner.insert(pos)
    }

    /// Inserts all positions in the range [start, end) into the delete vector.
    /// If start == end, this method does nothing and returns 0.
    ///
    /// # Panics
    ///
    /// Panics if start > end (a reversed range indicates a bug in the caller).
    ///
    /// Returns the number of newly inserted positions.
    #[allow(unused)]
    pub fn insert_range(&mut self, start: u64, end: u64) -> u64 {
```
Contributor: Should these be `pub` if `delete_vector` is

Author: Hmm, I'm not sure. I chose to mirror the signature of the existing `insert` function.
```rust
        assert!(
            start <= end,
            "insert_range requires start <= end, got [{start}, {end})"
        );
        if start == end {
            return 0;
        }
        self.inner.insert_range(start..end)
    }

    /// Marks the given `positions` as deleted and returns the number of elements appended.
    ///
    /// The input slice must be strictly ordered in ascending order, and every value must be greater than all existing values already in the set.
```
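To make the documented contract of `insert_range` concrete (half-open range, no-op when `start == end`, panic when `start > end`, and a return value that counts only positions not already present), here is a minimal sketch against a hypothetical `ToyDeleteVector` backed by a `BTreeSet<u64>`. It is not the real `DeleteVector`, which delegates to its `inner` bitmap; the per-position loop here exists only to illustrate the semantics.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for `DeleteVector`: a plain ordered set of u64
/// positions, used only to model the documented `insert_range` contract.
#[derive(Default)]
struct ToyDeleteVector {
    inner: BTreeSet<u64>,
}

impl ToyDeleteVector {
    /// Inserts every position in the half-open range [start, end) and
    /// returns how many of them were not already present.
    fn insert_range(&mut self, start: u64, end: u64) -> u64 {
        assert!(
            start <= end,
            "insert_range requires start <= end, got [{start}, {end})"
        );
        let mut newly_inserted = 0;
        // An empty range (start == end) skips the loop and returns 0.
        for pos in start..end {
            // `BTreeSet::insert` returns true only if the value was absent.
            if self.inner.insert(pos) {
                newly_inserted += 1;
            }
        }
        newly_inserted
    }

    fn contains(&self, pos: u64) -> bool {
        self.inner.contains(&pos)
    }

    fn len(&self) -> u64 {
        self.inner.len() as u64
    }
}
```

With this sketch, `insert_range(10, 20)` returns 10, a following `insert_range(15, 25)` returns 5 because positions 15 through 19 were already present, and `insert_range(30, 30)` returns 0.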
@@ -64,6 +84,12 @@ impl DeleteVector {

```rust
        Ok(positions.len())
    }

    /// Returns true if the given position is present in the delete vector.
    #[allow(unused)]
    pub fn contains(&self, pos: u64) -> bool {
        self.inner.contains(pos)
    }

    #[allow(unused)]
    pub fn len(&self) -> u64 {
        self.inner.len()
```
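`contains` simply forwards to `self.inner`. The key-boundary tests in this PR suggest `inner` is a 64-bit Roaring treemap, which (in the `roaring` crate's `RoaringTreemap`) keys each value by its high 32 bits and stores the low 32 bits in a 32-bit bitmap. The hypothetical `ToyTreemap` below models only that two-level lookup, with a `BTreeSet<u32>` standing in for each 32-bit bitmap; it is a sketch of the layout, not the crate's code.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical two-level layout: the high 32 bits of a position select a
/// bucket (key) and the low 32 bits are stored inside that bucket.
/// `BTreeSet<u32>` stands in for a 32-bit Roaring bitmap.
#[derive(Default)]
struct ToyTreemap {
    buckets: BTreeMap<u32, BTreeSet<u32>>,
}

/// Splits a 64-bit position into (high 32 bits, low 32 bits).
fn split(pos: u64) -> (u32, u32) {
    // `as u32` keeps only the low 32 bits.
    ((pos >> 32) as u32, pos as u32)
}

impl ToyTreemap {
    /// Returns true if the position was newly inserted.
    fn insert(&mut self, pos: u64) -> bool {
        let (hi, lo) = split(pos);
        self.buckets.entry(hi).or_default().insert(lo)
    }

    /// Membership test: one lookup for the bucket, then one inside it.
    fn contains(&self, pos: u64) -> bool {
        let (hi, lo) = split(pos);
        self.buckets.get(&hi).map_or(false, |b| b.contains(&lo))
    }
}
```

For example, position `(1 << 32) + 5` lands in bucket 1 at offset 5, so it is found by `contains((1 << 32) + 5)` but does not make `contains(5)` true.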
@@ -198,4 +224,81 @@ mod tests {

```rust
        let res = dv.insert_positions(&positions);
        assert!(res.is_err());
    }

    #[test]
    fn test_insert_range_single_key() {
        let mut dv = DeleteVector::default();
        assert_eq!(dv.insert_range(10, 20), 10);
        assert_eq!(dv.len(), 10);
        for pos in 10..20 {
            assert!(dv.iter().any(|p| p == pos), "missing {pos}");
        }
        assert!(!dv.iter().any(|p| p == 9));
        assert!(!dv.iter().any(|p| p == 20));
    }

    #[test]
    fn test_insert_range_single_position() {
        let mut dv = DeleteVector::default();
        assert_eq!(dv.insert_range(42, 43), 1);
        assert_eq!(dv.len(), 1);
        assert!(dv.iter().any(|p| p == 42));
        assert!(!dv.iter().any(|p| p == 41));
        assert!(!dv.iter().any(|p| p == 43));
    }

    #[test]
    fn test_insert_range_across_keys() {
        let mut dv = DeleteVector::default();
        let start = (1u64 << 32) - 5;
        let end = (1u64 << 32) + 5;
        assert_eq!(dv.insert_range(start, end), 10);
        assert_eq!(dv.len(), 10);
        for pos in start..end {
            assert!(dv.iter().any(|p| p == pos), "missing {pos}");
        }
        assert!(!dv.iter().any(|p| p == start - 1));
        assert!(!dv.iter().any(|p| p == end));
    }

    #[test]
    fn test_insert_range_spanning_three_keys() {
        let mut dv = DeleteVector::default();
        let start = 0xFFFFFFF0u64;
        let end = (2u64 << 32) | 0x10;
        let inserted = dv.insert_range(start, end);
        assert_eq!(inserted, end - start);
        assert_eq!(dv.len(), end - start);
        assert!(dv.contains(start));
        assert!(dv.contains(end - 1));
        assert!(dv.contains(1u64 << 32));
        assert!(dv.contains((1u64 << 32) | 0xFFFFFFF0));
        assert!(!dv.contains(start - 1));
        assert!(!dv.contains(end));
    }

    #[test]
    fn test_insert_range_empty_when_start_equals_end() {
        let mut dv = DeleteVector::default();
        assert_eq!(dv.insert_range(100, 100), 0);
        assert_eq!(dv.len(), 0);
    }

    #[test]
    #[should_panic(expected = "insert_range requires start <= end")]
    fn test_insert_range_reversed_panics() {
        let mut dv = DeleteVector::default();
        dv.insert_range(100, 50);
    }

    #[test]
    fn test_insert_range_large_contiguous() {
        let mut dv = DeleteVector::default();
        assert_eq!(dv.insert_range(500, 200_500), 200_000);
        assert_eq!(dv.len(), 200_000);
        assert!(dv.iter().any(|p| p == 500));
        assert!(dv.iter().any(|p| p == 200_499));
        assert!(!dv.iter().any(|p| p == 499));
        assert!(!dv.iter().any(|p| p == 200_500));
    }
}
```
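The `across_keys` and `spanning_three_keys` tests target ranges that cross the 2^32 boundaries between treemap buckets. As a hedged sketch, not the `roaring` crate's actual algorithm, the hypothetical helper `split_range_by_key` shows how a half-open range decomposes into per-bucket pieces. Each piece uses an inclusive upper bound because a full bucket covers low values `0..=u32::MAX`, which cannot be written as a half-open `u32` range.

```rust
/// Splits the half-open range [start, end) into per-bucket pieces, where a
/// bucket is identified by the high 32 bits of a position. Each piece is
/// (bucket, first_low, last_low) with an inclusive upper bound.
fn split_range_by_key(start: u64, end: u64) -> Vec<(u32, u32, u32)> {
    assert!(start <= end, "requires start <= end, got [{start}, {end})");
    let mut pieces = Vec::new();
    let mut cur = start;
    while cur < end {
        let key = cur >> 32;
        // Last position in this bucket, clamped to the last position in range.
        let bucket_last = (key << 32) | 0xFFFF_FFFF;
        let last = bucket_last.min(end - 1);
        pieces.push((key as u32, cur as u32, last as u32));
        // `last <= end - 1 < u64::MAX`, so this cannot overflow.
        cur = last + 1;
    }
    pieces
}

/// Number of positions covered by one inclusive piece.
fn piece_len(p: &(u32, u32, u32)) -> u64 {
    (p.2 - p.1) as u64 + 1
}
```

For the `spanning_three_keys` inputs (`start = 0xFFFFFFF0`, `end = (2 << 32) | 0x10`), this yields 16 positions in bucket 0, a full bucket 1 of 2^32 positions, and 16 positions in bucket 2, for a total of 2^32 + 32, which equals `end - start`.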
Do we expect to be able to use the range function? What is the cost of constructing a from a set of arbitrary rows?
Good question. I added the range insert to help with exactly this case!
I've already opened the follow-up PR in Java: apache/iceberg#16052.
TL;DR: we can get speedups when processing a sorted list of rows that contains ranges of deleted rows.
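On the question about arbitrary rows: one way to benefit from `insert_range`, assuming the caller already has deleted row positions in ascending order, is to coalesce consecutive positions into half-open runs and issue one `insert_range` call per run. The hypothetical helper `coalesce_runs` below sketches that step; it is not taken from this PR or from apache/iceberg#16052.

```rust
/// Groups an ascending slice of row positions into maximal half-open
/// runs [start, end) of consecutive positions.
/// Assumes every position is below u64::MAX (so `pos + 1` cannot overflow).
fn coalesce_runs(sorted: &[u64]) -> Vec<(u64, u64)> {
    let mut runs: Vec<(u64, u64)> = Vec::new();
    for &pos in sorted {
        if let Some(last) = runs.last_mut() {
            // `pos` directly follows the current run: extend it.
            if last.1 == pos {
                last.1 += 1;
                continue;
            }
        }
        // Otherwise start a new run containing only `pos`.
        runs.push((pos, pos + 1));
    }
    runs
}
```

Each run would then feed `dv.insert_range(start, end)`. When deleted rows are scattered with no adjacent positions, every run has length 1 and the number of calls matches per-position `insert`, so the benefit depends on the input containing contiguous runs.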