
refactor: migrate filter pushdown from V1 Filter to V2 Predicate API#448

Open
wombatu-kun wants to merge 1 commit into lance-format:main from wombatu-kun:v1-v2-pushdown-migration

Conversation

@wombatu-kun
Contributor

Summary

  • Replace SupportsPushDownFilters with SupportsPushDownV2Filters in LanceScanBuilder, and migrate the downstream filter-analysis utilities (FilterPushDown, RowAddressFilterAnalyzer, ZonemapFragmentPruner) from the sealed org.apache.spark.sql.sources.Filter hierarchy to the extensible org.apache.spark.sql.connector.expressions.filter.Predicate.
  • V1 Filter is legacy in Spark 3.3+; V2 Predicate is the current recommended API and supports arbitrary named predicates, making future extensions (e.g. custom functions) straightforward.
  • No external behavior changes — SQL filter semantics, zonemap pruning, _rowaddr fragment pruning, and the compiled SQL WHERE clause are preserved.
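To make the contract concrete, below is a rough sketch of the two-method shape of SupportsPushDownV2Filters that LanceScanBuilder now implements: Spark offers an array of predicates, the builder accepts what it can evaluate and returns the remainder for Spark to re-apply after the scan. This is plain Java with stand-in types (PushdownSketch, ScanBuilderSketch, and the local Predicate class are illustrative, not Spark's actual classes), so the support check is a toy; the real builder compiles accepted predicates into the SQL WHERE clause mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownSketch {
  // Stand-in for org.apache.spark.sql.connector.expressions.filter.Predicate:
  // V2 predicates are named nodes (e.g. "=", ">", "AND") with child expressions.
  static final class Predicate {
    final String name;
    Predicate(String name) { this.name = name; }
  }

  // Mirrors the two-method shape of SupportsPushDownV2Filters:
  // pushPredicates(...) returns the predicates that could NOT be pushed down;
  // pushedPredicates() reports the ones the builder accepted.
  static final class ScanBuilderSketch {
    private final List<Predicate> pushed = new ArrayList<>();

    Predicate[] pushPredicates(Predicate[] offered) {
      List<Predicate> rejected = new ArrayList<>();
      for (Predicate p : offered) {
        if (isSupported(p)) {
          pushed.add(p);
        } else {
          rejected.add(p);
        }
      }
      return rejected.toArray(new Predicate[0]);
    }

    Predicate[] pushedPredicates() {
      return pushed.toArray(new Predicate[0]);
    }

    // Toy support check for the sketch; a real builder would walk the
    // predicate tree before compiling it into a SQL WHERE clause.
    private static boolean isSupported(Predicate p) {
      switch (p.name) {
        case "=": case ">": case "<": case ">=": case "<=":
          return true;
        default:
          return false;
      }
    }
  }

  public static void main(String[] args) {
    ScanBuilderSketch builder = new ScanBuilderSketch();
    Predicate[] rejected = builder.pushPredicates(new Predicate[] {
        new Predicate("="), new Predicate("STARTS_WITH")
    });
    System.out.println("pushed=" + builder.pushedPredicates().length
        + " rejected=" + rejected.length);
  }
}
```

Because V2 predicates are identified by name rather than by a sealed class hierarchy, extending support later (e.g. to custom functions) only means handling more names in the support check.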

Notable details

  • V2 Literal.value() returns UTF8String for strings; normalized to String in ZonemapFragmentPruner before compareTo against zone min/max stats (which store native String).
  • Column access via NamedReference.fieldNames() joined with "." for nested paths (a drop-in replacement for the flat names returned by V1 Filter.references()).
  • Test helper TestPredicates builds canonical V2 predicates with the same shape Spark produces via SupportsPushDownV2Filters, including UTF8String/Date-epoch-days/Timestamp-epoch-micros conversion matching Spark's V2 literal representation.
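The conversions described above can be sketched as follows. This is plain Java with hypothetical helper names (LiteralNormalizationSketch and its methods are illustrative); the UTF8String case is simulated with CharSequence, since Spark's org.apache.spark.unsafe.types.UTF8String is not assumed on the classpath here.

```java
import java.time.Instant;
import java.time.LocalDate;

public class LiteralNormalizationSketch {
  // NamedReference.fieldNames() yields one String per path segment;
  // joining with "." recovers the dotted column name that the V1
  // Filter.references() API exposed as a single flat string.
  static String columnName(String[] fieldNames) {
    return String.join(".", fieldNames);
  }

  // V2 string literals arrive as UTF8String; normalize to java.lang.String
  // before compareTo against zone min/max stats stored as native String.
  // (CharSequence stands in for UTF8String in this sketch.)
  static Object normalize(Object literalValue) {
    if (literalValue instanceof CharSequence) {
      return literalValue.toString();
    }
    return literalValue;
  }

  // Spark's V2 literal representation for dates and timestamps:
  // days since 1970-01-01 (int) and microseconds since the epoch (long).
  static int toEpochDays(LocalDate date) {
    return (int) date.toEpochDay();
  }

  static long toEpochMicros(Instant ts) {
    return Math.multiplyExact(ts.getEpochSecond(), 1_000_000L)
        + ts.getNano() / 1_000L;
  }

  public static void main(String[] args) {
    System.out.println(columnName(new String[] {"address", "zip"}));
    System.out.println(toEpochDays(LocalDate.of(1970, 1, 2)));
    System.out.println(toEpochMicros(Instant.ofEpochSecond(1)));
  }
}
```

The normalization matters because UTF8String.compareTo(String) does not exist: comparing mixed representations against the String-typed zone statistics would fail without the conversion step.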

Test plan

  • make install — builds successfully
  • make test — 619 tests pass (0 failures, 0 errors)
  • make lint — checkstyle + spotless pass
  • No remaining references to V1 Filter API anywhere in the codebase

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Regression: all 286 base-module unit tests pass unchanged.