Detect contradictory predicates in Expressions.and() to enable scan p…#16071

Open
arthurli-dotcom wants to merge 1 commit into apache:main from arthurli-dotcom:expression-contradiction-detection

@arthurli-dotcom

…runing

When Spark pushes predicates through UNION ALL views to Iceberg, contradictory filters like `col = 'a' AND col != 'a'` can end up combined in a single scan. Neither Spark's Catalyst optimizer nor Iceberg's expression system detected these contradictions, causing Iceberg to plan tens of thousands of file tasks that would return zero rows.

This adds contradiction detection to Expressions.and() for:

  • EQ(col, a) AND NOT_EQ(col, a) with same value
  • EQ(col, a) AND EQ(col, b) with different values
  • EQ(col, val) AND NOT_IN(col, {val, ...})
  • EQ(col, val) AND IN(col, set) where val not in set
  • Contradictions through nested AND expressions

Handles both bound predicates (post-binding) and unbound predicates (pre-binding, as used by SparkScanBuilder).
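
The cases listed above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR and does not use Iceberg's classes: `Pred`, `And`, `Op`, `contradicts`, and `isAlwaysFalse` are hypothetical stand-ins, and values are compared with plain `equals` rather than Iceberg's literal comparators. It only shows the general idea of flattening nested ANDs and checking pairs of predicates on the same column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of predicates; not Iceberg's Expression API.
class ContradictionSketch {
  enum Op { EQ, NOT_EQ, IN, NOT_IN }

  interface Expr {}

  // A leaf predicate on one column; EQ and NOT_EQ carry exactly one value.
  static final class Pred implements Expr {
    final Op op;
    final String col;
    final Set<Object> values;

    Pred(Op op, String col, Object... values) {
      this.op = op;
      this.col = col;
      this.values = new HashSet<>(Arrays.asList(values));
    }
  }

  static final class And implements Expr {
    final Expr left;
    final Expr right;

    And(Expr left, Expr right) {
      this.left = left;
      this.right = right;
    }
  }

  // Collect the leaf predicates of an arbitrarily nested AND tree.
  static void flatten(Expr e, List<Pred> out) {
    if (e instanceof And) {
      flatten(((And) e).left, out);
      flatten(((And) e).right, out);
    } else {
      out.add((Pred) e);
    }
  }

  // True when a and b can never both hold for the same column value.
  static boolean contradicts(Pred a, Pred b) {
    if (!a.col.equals(b.col)) {
      return false;
    }
    if (a.op != Op.EQ && b.op == Op.EQ) {
      return contradicts(b, a); // normalize so the EQ side comes first
    }
    if (a.op != Op.EQ) {
      return false; // only pairs involving an EQ are handled in this sketch
    }
    Object v = a.values.iterator().next();
    switch (b.op) {
      case EQ:     return !b.values.contains(v); // EQ(a) AND EQ(b), a != b
      case NOT_EQ: return b.values.contains(v);  // EQ(a) AND NOT_EQ(a)
      case IN:     return !b.values.contains(v); // EQ(v) AND IN(set), v not in set
      case NOT_IN: return b.values.contains(v);  // EQ(v) AND NOT_IN({v, ...})
      default:     return false;
    }
  }

  // Check every pair of leaves under the (possibly nested) AND.
  static boolean isAlwaysFalse(Expr e) {
    List<Pred> leaves = new ArrayList<>();
    flatten(e, leaves);
    for (int i = 0; i < leaves.size(); i++) {
      for (int j = i + 1; j < leaves.size(); j++) {
        if (contradicts(leaves.get(i), leaves.get(j))) {
          return true;
        }
      }
    }
    return false;
  }
}
```

In this sketch, `isAlwaysFalse(new And(new Pred(Op.EQ, "col", "a"), new Pred(Op.NOT_EQ, "col", "a")))` reports a contradiction, while the same pair on two different columns does not. The pairwise check is quadratic in the number of leaves, which is assumed to be acceptable for the small filter lists a scan typically carries.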

@github-actions github-actions Bot added the API label Apr 21, 2026