refactor: centralize Lance metadata column definitions#452
Open
wombatu-kun wants to merge 1 commit intolance-format:mainfrom
Open
refactor: centralize Lance metadata column definitions#452wombatu-kun wants to merge 1 commit intolance-format:mainfrom
wombatu-kun wants to merge 1 commit intolance-format:mainfrom
Conversation
Extract the five Spark `MetadataColumn` instances (`_rowid`, `_rowaddr`, `_row_created_at_version`, `_row_last_updated_at_version`, `_fragid`) from `LanceDataset` into a new `LanceMetadataColumns` registry class, and replace the hand-maintained exclusion filter and if-ladder in `LanceFragmentScanner.getColumnNames` with iteration over the registry. Before this change, adding a virtual column required edits in three coupled sites: a new anonymous `MetadataColumn` constant in `LanceDataset`, an entry in the `METADATA_COLUMNS` array, a clause in the "exclude from regular columns" filter, and a stanza in the "append to scanner projection" if-ladder. All four had to stay in lock-step; forgetting one produced silent bugs (column in schema but not in scanner output, or Lance rejecting an unknown name). After this change, adding a column is a one-line entry in the registry. Call sites that only needed the column name string (`LanceDataset.X_COLUMN .name()`) now reference `LanceConstant.X` directly — the `MetadataColumn` object was never used at those sites, only its name. Pure refactor: no behavior change. All existing tests pass on `lance-spark-3.5_2.12` (the default module) — the invariant that `LanceFragmentScannerTest` exercises around column ordering is preserved by the new `PROJECTABLE` list, which intentionally excludes `_fragid` (computed per-fragment outside the scanner's projection list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
MetadataColumninstances (_rowid,_rowaddr,_row_created_at_version,_row_last_updated_at_version,_fragid) fromLanceDatasetinto a newLanceMetadataColumnsregistry classLanceFragmentScanner.getColumnNameswith iteration overALL/PROJECTABLELanceDataset.X_COLUMN.name()→LanceConstant.X)Motivation
Before this change, adding a virtual column took edits in three coupled sites: a new anonymous
MetadataColumnconstant inLanceDataset, an entry in theMETADATA_COLUMNSarray, a clause in the "exclude from regular columns" filter, and a stanza in the "append to scanner projection" if-ladder. All had to stay in lock-step; forgetting one produced silent bugs (column in schema but not in scanner output, or Lance rejecting an unknown name). After, adding a column is a one-line registry entry.This refactor unblocks the
_scorecolumn PR (stacked in the FTS PR series); future virtual columns such as_score_explainbenefit from the same one-line shape.Pure refactor — no user-visible behavior change. The asymmetry between
ALL(exclusion + Spark registration) andPROJECTABLE(scanner projection) pins the invariant that_fragidis computed per-fragment outside the scanner, preserving the original behavior.Test plan
make test SPARK_VERSION=3.5 SCALA_VERSION=2.12— full regression passesmake lint— checkstyle + spotless clean🤖 Generated with Claude Code