[WIP] OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4)#18643
Draft
tarun11Mavani wants to merge 12 commits into
Draft
[WIP] OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4)#18643tarun11Mavani wants to merge 12 commits into
tarun11Mavani wants to merge 12 commits into
Conversation
7f254e4 to
d0e436f
Compare
tarun11Mavani
commented
Jun 1, 2026
| _dataSources = | ||
| new Object2ObjectOpenHashMap<>(segmentMetadata.getColumnMetadataMap().size()); | ||
|
|
||
| Map<String, Map<String, DataSource>> openStructDenseChildren = new HashMap<>(); |
Contributor
Author
There was a problem hiding this comment.
Use Map.of() for all
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18643 +/- ##
============================================
+ Coverage 64.41% 64.44% +0.02%
- Complexity 1282 1291 +9
============================================
Files 3362 3373 +11
Lines 207907 208683 +776
Branches 32463 32588 +125
============================================
+ Hits 133923 134476 +553
- Misses 63221 63393 +172
- Partials 10763 10814 +51
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…x (PR 2/4) Storage layer for the OPEN_STRUCT column type: a self-describing column whose keys are discovered at ingest time and stored columnar in two tiers. Mutable (consuming) path: - MutableOpenStructIndex / MutableKeyColumn: per-key dictionary-encoded forward index + presence bitmap, with 3-level type resolution (declared child FieldSpec, else value-based inference) and maxDenseKeys capping. - MutableOpenStructDataSource exposes per-key DataSources to the query layer via the OpenStructDataSource SPI; implemented as a MutableIndex. Seal / offline path: - OpenStructColumnSplitter classifies keys into dense vs sparse by fill rate (plus explicit denseKeys and maxDenseKeys), writes each dense key as a standard materialized column (col$key) with optional dictionary/inverted/ null-vector, and packs the rest into a single sparse JSON column (col$__sparse__). Emits per-child and parent column metadata. - BaseSegmentCreator merges the splitter's per-child metadata into the segment properties so each child loads as its own column. Immutable (sealed) path: - ImmutableSegmentImpl groups materialized children (parentColumn metadata) under an ImmutableOpenStructDataSource in a single pass over column metadata. - ImmutableSegmentLoader keeps materialized children that are absent from the user-facing schema. Wiring / validation: - OpenStructIndexType + OpenStructIndexPlugin register the index; reader factory is a no-op (children load as standard columns). - TableConfigUtils rejects user columns containing the reserved '$' separator when any field is OPEN_STRUCT. - RealtimeSegmentStatsContainer returns EmptyColumnStatistics for the parent. - V1Constants: PARENT_COLUMN, HAS_SPARSE_COLUMN; OpenStructNaming helpers including shared value->DataType inference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c6a3db5 to
72cf061
Compare
A BIG_DECIMAL-valued OPEN_STRUCT key aborted segment seal: BIG_DECIMAL is its own stored type (unlike BOOLEAN->INT, TIMESTAMP->LONG), so the splitter's type switches hit `default: throw`. It is reachable via inferDataType for any java.math.BigDecimal or an explicit child FieldSpec. Add BIG_DECIMAL as a variable-length stored type alongside STRING/BYTES in OpenStructColumnSplitter: getDefaultValue (BigDecimal.ZERO), dictionary build, raw var-byte forward index (putBigDecimal), and dictionary-vs-raw sizing. Dictionary dedup uses BigDecimal.equals (matching SegmentDictionaryCreator's equals-keyed indexOfSV), not compareTo, so scale-differing equal values (1.0 vs 1.00) stay distinct instead of silently resolving to dict id 0. The realtime mutable path and segment min/max metadata already handle BIG_DECIMAL and are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tisticsCollector Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… stats collector Treats absent docs as null docs holding the default value (standard model). For keys with absent docs, CARDINALITY now counts the default and MIN/MAX include it. Intended. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…MetadataInfo Replaces the hand-written emitVirtualColumnMetadata with the standard metadata writer plus the OPEN_STRUCT-specific keys (PARENT_COLUMN, hasNullValue, hasInvertedIndex). The shared writer additionally emits standard keys the old code omitted, producing a superset of the previous metadata. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…istinctValuesPerKey Replaces the _distinctValuesPerKey distinct-string-set tracking with the sealed stats collector's cardinality and longest-element length. _totalRawBytesPerKey is retained for the var-length raw-size estimate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dictElementSize is already 0 on the raw path (only assigned when a dictionary creator exists), so the conditional alias was a no-op. Pass it directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reators Drives each dense materialized child's forward, dictionary-id forward, and inverted index through the standard ForwardIndexCreator / DictionaryBasedInvertedIndexCreator obtained from StandardIndexes, using an IndexCreationContext built from the sealed stats collector (no TableConfig). The dict-vs-raw decision now mirrors BaseSegmentCreator.createDictionaryForColumn (standard default flags), and absent docs store the Pinot dimension null default. Deletes writeRawForwardIndex, shouldUseDictionary, getDefaultValue, and _totalRawBytesPerKey. Behavior changes (intended, unreleased feature): absent-doc defaults are now dimension nulls (STRING "null", INT MIN_VALUE, ...); the local size-ratio dict downgrade is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…riteDenseKeyColumn Split the ~144-line writeDenseKeyColumn into a focused orchestrator plus two private helpers, mirroring how BaseSegmentCreator decomposes column creation: - resolveUseDictionary(...): the three-step dict-vs-raw decision. - writeForwardAndInvertedIndexes(...): dictionary + forward + inverted creation, the per-doc add loop, and the nested resource handling; returns the dictionary element size for metadata. Behavior-preserving — no logic, ordering, or resource-management change. Covered by the existing OpenStructColumnSplitterTest (16 tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o OpenStructTypeInference inferDataType performs value->DataType inference, an orthogonal concern to OpenStructNaming's column-name string mapping (it uses none of the naming constants). Relocate it to a dedicated OpenStructTypeInference class in the same package so each class has a single responsibility; update the two callers. Behavior-preserving — the method body is moved verbatim. The core Object classification still delegates to PinotDataType.getSingleValueType; the remaining PinotDataType->DataType switch is OPEN_STRUCT-specific policy that exists nowhere else. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…taType Data-driven test locking the value->DataType mapping policy now that it lives in its own class: each return branch (INT via all four widening inputs, LONG, FLOAT, DOUBLE, BIG_DECIMAL, BOOLEAN, TIMESTAMP, STRING via all four folding inputs, BYTES) plus the unrepresentable cases (Map, bare Object) returning null. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Instructions:
featurebugfixperformanceuibackward-incompatrelease-notes(**)(*) Other labels to consider:
testingdependenciesdockerkubernetesobservabilitysecuritycode-styleextension-pointrefactorcleanup(**) Use
release-noteslabel for scenarios like: