-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix:[9018]Fixed RunArray slice offsets #9036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Looking into test(intermittent) case failure. |
b17d380 to
d9e54fb
Compare
|
Need help on. |
Can ignore, it's an unrelated issue: #9024 |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up. Left some comments, also changed PR body to not close original issue as I still need to check what other kernels/code might need fixing.
| .take(len) | ||
| .map(move |run_end| { | ||
| let value = *run_end - offset; | ||
| value.min(length) + adjustment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this min necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essential for handling sliced RunArray, where the last physical run extends beyond the logical slice boundary. Added test case with analysis of this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment to that effect here? Something along the lines of:
min required for cases where slice ends in the middle of the run at the end
d9e54fb to
6fb02cd
Compare
Thanks for reviewing and feedback. If you permit, once this PR is good to go, can take this. |
| .take(len) | ||
| .map(move |run_end| { | ||
| let value = *run_end - offset; | ||
| value.min(length) + adjustment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment to that effect here? Something along the lines of:
min required for cases where slice ends in the middle of the run at the end
| // Without min(length), its run end would be 5 - 0 = 5. | ||
| // With min(length), it becomes min(5 - 0, 3) = 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are nice, but they are quite separated from the implementation so they are a bit confusing (if someone were to look at this test, how would they understand what min(length) is referring to here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added more explanation note, also added comment in code as suggested above.
arrow-select/src/filter.rs
Outdated
| let filter_values = predicate.filter.values(); | ||
| let run_ends = run_ends.inner(); | ||
|
|
||
| let pred: BooleanArray = BooleanBuffer::collect_bool(run_ends.len(), |i| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still thinking about this a bit; it seems odd that we still iterate through all of the run_ends including the values before/after the slice. I wonder if there is a better way to handle it, or maybe I misunderstand how this logic is working 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for keeping performance aspect in mind. Changed the run for physical length only.
| /// Similar to [`values`] but accounts for logical slicing, returning only the values | ||
| /// that are part of the logical slice of this buffer. | ||
| /// | ||
| /// [`values`]: Self::values | ||
| pub fn values_slice(&self) -> &[E] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this docstring we also need a clear warning note that the first & last run values may be inaccurate as they don't take into account the logical slicing. I wonder if there is a better way to handle this without too much overhead 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added warning but not sure logical run ends API(logical_run_ends) is needed or not, but added it. Please feel free to suggest to remove it.
6fb02cd to
57e6a3c
Compare
|
@Jefffrey Please help to close this. Thanks |
|
I'll try take another look soon. |
Which issue does this PR close?
RunArrays #9018 .Rationale for this change
To consider offset in slicing of RunArray.
What changes are included in this PR?
Are these changes tested?
yes
Are there any user-facing changes?
Yes, extended API to access RunArray slices directly than getting it from index.