
refactor: migrate filter pushdown from V1 Filter to V2 Predicate API#448

Open
wombatu-kun wants to merge 1 commit into lance-format:main from wombatu-kun:v1-v2-pushdown-migration

Conversation

@wombatu-kun
Contributor

Summary

  • Replace SupportsPushDownFilters with SupportsPushDownV2Filters in LanceScanBuilder, and migrate the downstream filter-analysis utilities (FilterPushDown, RowAddressFilterAnalyzer, ZonemapFragmentPruner) from the sealed org.apache.spark.sql.sources.Filter hierarchy to the extensible org.apache.spark.sql.connector.expressions.filter.Predicate.
  • V1 Filter is legacy in Spark 3.3+; V2 Predicate is the current recommended API and supports arbitrary named predicates, making future extensions (e.g. custom functions) straightforward.
  • No external behavior changes — SQL filter semantics, zonemap pruning, _rowaddr fragment pruning, and the compiled SQL WHERE clause are preserved.
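To make the contract concrete, below is a rough sketch of the two-method shape of SupportsPushDownV2Filters that LanceScanBuilder now implements: Spark offers an array of predicates, the builder accepts what it can evaluate and returns the remainder for Spark to re-apply after the scan. This is plain Java with stand-in types (PushdownSketch, ScanBuilderSketch, and the local Predicate class are illustrative, not Spark's actual classes), so the support check is a toy; the real builder compiles accepted predicates into the SQL WHERE clause mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownSketch {
  // Stand-in for org.apache.spark.sql.connector.expressions.filter.Predicate:
  // V2 predicates are named nodes (e.g. "=", ">", "AND") with child expressions.
  static final class Predicate {
    final String name;
    Predicate(String name) { this.name = name; }
  }

  // Mirrors the two-method shape of SupportsPushDownV2Filters:
  // pushPredicates(...) returns the predicates that could NOT be pushed down;
  // pushedPredicates() reports the ones the builder accepted.
  static final class ScanBuilderSketch {
    private final List<Predicate> pushed = new ArrayList<>();

    Predicate[] pushPredicates(Predicate[] offered) {
      List<Predicate> rejected = new ArrayList<>();
      for (Predicate p : offered) {
        if (isSupported(p)) {
          pushed.add(p);
        } else {
          rejected.add(p);
        }
      }
      return rejected.toArray(new Predicate[0]);
    }

    Predicate[] pushedPredicates() {
      return pushed.toArray(new Predicate[0]);
    }

    // Toy support check for the sketch; a real builder would walk the
    // predicate tree before compiling it into a SQL WHERE clause.
    private static boolean isSupported(Predicate p) {
      switch (p.name) {
        case "=": case ">": case "<": case ">=": case "<=":
          return true;
        default:
          return false;
      }
    }
  }

  public static void main(String[] args) {
    ScanBuilderSketch builder = new ScanBuilderSketch();
    Predicate[] rejected = builder.pushPredicates(new Predicate[] {
        new Predicate("="), new Predicate("STARTS_WITH")
    });
    System.out.println("pushed=" + builder.pushedPredicates().length
        + " rejected=" + rejected.length);
  }
}
```

Because V2 predicates are identified by name rather than by a sealed class hierarchy, extending support later (e.g. to custom functions) only means handling more names in the support check.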

Notable details

  • V2 Literal.value() returns UTF8String for strings; normalized to String in ZonemapFragmentPruner before compareTo against zone min/max stats (which store native String).
  • Column access via NamedReference.fieldNames() joined with "." for nested paths (a drop-in replacement for the flat names returned by V1 Filter.references()).
  • Test helper TestPredicates builds canonical V2 predicates with the same shape Spark produces via SupportsPushDownV2Filters, including UTF8String/Date-epoch-days/Timestamp-epoch-micros conversion matching Spark's V2 literal representation.
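The conversions described above can be sketched as follows. This is plain Java with hypothetical helper names (LiteralNormalizationSketch and its methods are illustrative); the UTF8String case is simulated with CharSequence, since Spark's org.apache.spark.unsafe.types.UTF8String is not assumed on the classpath here.

```java
import java.time.Instant;
import java.time.LocalDate;

public class LiteralNormalizationSketch {
  // NamedReference.fieldNames() yields one String per path segment;
  // joining with "." recovers the dotted column name that the V1
  // Filter.references() API exposed as a single flat string.
  static String columnName(String[] fieldNames) {
    return String.join(".", fieldNames);
  }

  // V2 string literals arrive as UTF8String; normalize to java.lang.String
  // before compareTo against zone min/max stats stored as native String.
  // (CharSequence stands in for UTF8String in this sketch.)
  static Object normalize(Object literalValue) {
    if (literalValue instanceof CharSequence) {
      return literalValue.toString();
    }
    return literalValue;
  }

  // Spark's V2 literal representation for dates and timestamps:
  // days since 1970-01-01 (int) and microseconds since the epoch (long).
  static int toEpochDays(LocalDate date) {
    return (int) date.toEpochDay();
  }

  static long toEpochMicros(Instant ts) {
    return Math.multiplyExact(ts.getEpochSecond(), 1_000_000L)
        + ts.getNano() / 1_000L;
  }

  public static void main(String[] args) {
    System.out.println(columnName(new String[] {"address", "zip"}));
    System.out.println(toEpochDays(LocalDate.of(1970, 1, 2)));
    System.out.println(toEpochMicros(Instant.ofEpochSecond(1)));
  }
}
```

The normalization matters because UTF8String.compareTo(String) does not exist: comparing mixed representations against the String-typed zone statistics would fail without the conversion step.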

Test plan

  • make install — builds successfully
  • make test — 619 tests pass (0 failures, 0 errors)
  • make lint — checkstyle + spotless pass
  • No remaining references to V1 Filter API anywhere in the codebase

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Regression: all 286 base-module unit tests pass unchanged.