feat: rework SegmentFileBuilderV10 to organize internal containers by projection to be friendly to partial loading/eviction#19375
Conversation
… projection to be friendly to partial loading changes: * adds new method `SegmentFileBuilder.startFileGroup(@nullable String)` which is a hint to the file builder about how to group the files a caller is adding * `SegmentFileBuilderV10` replaced the internal `FileSmoosher` with a purpose-built container writer that rolls on group changes or when a file won't fit, organizing containers by the new file group concept * `IndexMergerV10`/`IndexMergerBase` use the new file group method to organize v10 segments by projection * removed stuff from `FileSmoosher` that was only needed by v10 builder delegating to it * `AggregateProjectionSpec` validates that the projection cannot be named `__base`
|
Findings that could not be attached inline:
|
gianm
left a comment
There was a problem hiding this comment.
Seems ok to me although I'm wondering about Frank's comment.
| if (name == null || name.isEmpty()) { | ||
| throw InvalidInput.exception("projection name cannot be null or empty"); | ||
| } | ||
| if (Projections.BASE_TABLE_PROJECTION_NAME.equals(name)) { |
There was a problem hiding this comment.
Might as well reserve all names starting with __, just in case we need another one later.
There was a problem hiding this comment.
done, moved to a method on Projections
Its like kind of real, but also maybe unlikely to ever happen in practice given standard uses of nested writers. The usage is like a column serializer is open and writing and needs to open a nested writer to write some of its parts to separate files which needs some temp files to do that get merged back in later. The problem described would occur only i think if someone closed the outer thingy but not the nested, then set a new group, then close the nested ones (which i'm not entirely sure how would occur with our current index merger paths). Nothing does this, because its pretty strange? I went ahead and fixed the problem. Well, with the same limitations as the builder currently has here, that if the files aren't added to a group contiguously then they get stored in different containers, but it does at least ensure that containers have at most 1 group. So, if someone manages to do ^, it will put them into separate container associated with the group at the time of delegation instead of the current active group. 🤷 |
FrankChen021
left a comment
There was a problem hiding this comment.
The changes LGTM, no correctness issues found.
Description
Rework
SegmentFileBuilderV10so V10 containers are scoped by a new declared file group concept instead of size alone. Groups provide a layout hint: a new container rolls when the declared group changes, giving readers a clean 1-to-N mapping between groups and containers that will support per-group partial loading in the future. Projections are the primary caller today, but the mechanism is deliberately generic (shared data across columns, internal metadata, etc.).changes:
SegmentFileBuilder.startFileGroup(@Nullable String)which is a hint to the file builder about how to group the files a caller is addingSegmentFileBuilderV10replaced the internalFileSmoosherwith a purpose-built container writer that rolls on group changes or when a file won't fit, organizing containers by the new file group conceptIndexMergerV10/IndexMergerBaseuse the new file group method to organize v10 segments by projectionFileSmoosherthat was only needed by v10 builder delegating to itAggregateProjectionSpecvalidates that the projection cannot be named__base