Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter()#6306
Conversation
…_iter() and suffix_iter()
| /// Returns an iterator over the first `prefix_len` bytes of each array | ||
| /// element, including null values. | ||
| /// | ||
| /// If `prefix_len` is larger than the element's length, the iterator will |
There was a problem hiding this comment.
This feels rather surprising to me?
There was a problem hiding this comment.
Is that a statement or a question?
If you have a suggestion of how to make it less surprising perhaps @xinlifoobar would be willing to give it a try
There was a problem hiding this comment.
I guess I would have thought returning Option<&[u8]> would be perhaps more intuitive and likely perform similarly? That being said I am surprised that skipping the null mask yields performance returns, as this hasn't been my experience with regular StringArray, so my intuition about performance may be off
There was a problem hiding this comment.
Thanks for the suggestions! @tustvold.
I created a new PR #6312 to put this discussion. I did this in the early version but didn't see performance gains like this time - it did well for inline values. We might do a combination based on the bench result, e.g., null masks for prefix_iter and slice for suffix_iter.
There was a problem hiding this comment.
The results on #6312 seem to show that using null masks does indeed reduce performance
So is the conclusion that these APIs are good to go?
There was a problem hiding this comment.
I left some comments, I think this API would be better if it returned None for a prefix too long, even if it continues to not inspect null masks
tustvold
left a comment
There was a problem hiding this comment.
The docs look good, we can always update should we alter the signature
|
Ok, merging in these docs so they match the current code. As @tustvold says if the code needs to be updated we can also update the docs |
Which issue does this PR close?
Follow on to #6231 from @xinlifoobar
Rationale for this change
These new APIs were added in #6231 and I wanted to clarify their behavior (so we can potentailly use them in DataFusion)
What changes are included in this PR?
Update doc comments
Are there any user-facing changes?
Documentation, no functional change