Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter#20961
Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter#20961kosiew wants to merge 64 commits intoapache:mainfrom
Conversation
Introduce a test case to assert non-restricting behavior when evaluating the predicate a > b, focusing on join keys that only include a. This directly tests the new early-return branch in the is_restrict_null_predicate function in utils.rs, enhancing overall code coverage.
Extract the column-membership check into a new helper function called `predicate_uses_only_columns` in utils.rs. Update the current implementation at utils.rs:91 to use this new helper, improving code readability and maintainability.
Add call-site contract comment in push_down_filter.rs to specify that only Ok(true) is treated as null-restricting. State that both Ok(false) and Err(_) are considered non-restricting and will be skipped during processing.
Inline iterator predicate in utils.rs and streamline the null-restrict handling in push_down_filter.rs. This reduces indirections and lines of code while maintaining the same logic and behavior. No public interface or behavior changes intended.
…te_uses_only_columns function
|
run benchmark sql_planner_extended |
|
🤖 Criterion benchmark running (GKE) | trigger |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)). No pending jobs. |
|
run benchmark sql_planner_extended |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
🤖 Criterion benchmark running (GKE) | trigger |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
Implement fast path for syntactic null-restriction in utils.rs to classify predicates without evaluating physical expressions. Enhance SQL boolean handling with a large supporting evaluator, including CASE management. Retain existing branch helper styles and expand test coverage for constant simple CASE and outside-join-key fast paths. Fix correctness edge case for simple CASE indeterminate comparisons, ensuring proper tracking and fallback to Unknown.
|
I think the benchmark never completes or gets killed because it's too heavy. |
|
run benchmark sql_planner_extended --sample-size 10 |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
🤖 Criterion benchmark running (GKE) | trigger |
|
🤖 Criterion benchmark running (GKE) | trigger |
|
Benchmark for this request failed. Last 20 lines of output: Click to expand |
|
🤖 Criterion benchmark running (GKE) | trigger |
|
Benchmark for this request failed. Last 20 lines of output: Click to expand |
Update the function name to specify its relevance to scalar subquery cross-joins. Add an intent comment for better understanding of its purpose. Replace the old function call in join predicate handling for improved readability.
Clarify supported expression/operator families in the syntactic evaluator. Emphasize that returning None indicates deferral to authoritative evaluation, rather than "non-restricting." Ensure unsupported variants also return None for consistency.
Broaden derived-relation detection to include projection wrappers over derived relations. Add regression tests to cover alias/projection shape changes and ensure mixed-side filters are preserved. Implement a panic-path robustness test to confirm that eval mode resets properly, even on closure panic using catch_unwind.
Combine plan-wrapper traversal and cross-join shape detection. Shorten join-column replacement scan and share authoritative null-result decoding. Remove unused helpers and reorganize strict-null operator list behind a classifier helper. Public interfaces remain unchanged.
authoritative_restrict_null_predicate Restore evaluates_to_null behavior for general expressions without boolean-downcasting arrays. Fix scalar_subquery_with_non_strong_project regression. Update authoritative_restrict_null_predicate to handle predicate results directly, treating scalar NULL as null-restricting, resolving the CASE ... ELSE NULL test failure. Maintain non-join-column fast path functionality.
Consolidate repeated join-input wrapper inspections into a single JoinInputShape classifier. Hoist scalar-subquery cross-join shape check out of the predicate loop and unify repeated left/right predicate bucketing. Remove temporary Vec in join-column replacement inference and narrow test-only null-restriction mode support into its own helper module. Share column-subset check path and extract helper for authoritative null-evaluation results. Reduce repetition in syntactic null-restriction evaluator by factoring strict-null-preserving unary cases.
6d3f368 to
106e963
Compare
Eliminate JoinInputShape/PredicateDestination layers and integrate scalar-subquery handling into the main join-pushdown flow. Revert join-inference replacement logic to an explicit loop. Remove unnecessary helper functions and simplify the null-restriction evaluator for cleaner code.
Flatten single-use helper layers and tighten small utility helpers. In push_down_filter, fold the join-input classifier and bind predicate.column_refs() once for join inference. Shorten test-only import surface in utils.rs and compact mapping helpers in null_restriction.rs without changing semantics.
Tighten scalar-subquery guard, remove redundant empty-loop scaffolding, and share more of the null-evaluation flow. Simplify small null-restriction helpers and match arms.
Extract shared helpers in push_down_filter_regressions.rs and push_down_filter.rs to reduce code duplication. Consolidate optimizer-delta test assertions and create specific plan builders for common expressions. Add a utility in utils.rs to evaluate predicates under different null-restriction modes, streamlining mode-comparison tests and enhancing maintainability.
Restore is_restrict_null_predicate to its original behavior by removing the broader syntactic null-restriction path and its associated test scaffolding. Maintain early rejection for the push_down_filter caller while eliminating extra tree-walk work on common paths. Confirmed changes by running tests and formatting.
dfdd011 to
d0f0976
Compare
Reinstate the deleted fast path for is_restrict_null_predicate in null_restriction.rs. Implement a two-stage evaluation process: return early false for mixed-reference predicates, perform syntactic evaluation for supported join-key-only predicates, and ensure authoritative fallback is applied only when necessary.
d0f0976 to
847a8fa
Compare
d79cc72 to
6ef1c84
Compare
Update binary_boolean_value to return None for mixed boolean states instead of asserting. This change allows is_restrict_null_predicate to use the authoritative evaluator, preventing panics from unsupported cases. Add regression tests to ensure proper behavior in unsupported boolean-wrapper scenarios, maintaining consistent functionality between auto mode and authoritative mode.
86609f2 to
a355cb0
Compare
- Refactored `classify_join_input` to `strip_plan_wrappers` for clearer handling of logical plan wrappers. - Introduced helper functions `is_scalar_aggregate_subquery` and `is_derived_relation` to check for specific logical plan structures. - Enhanced `is_scalar_subquery_cross_join` to streamline the evaluation logic of joins with scalar subqueries. - Added `should_keep_filter_above_scalar_subquery_cross_join` to manage filter preservation based on join conditions. - Adjusted the predicate handling in `push_down_all_join` to utilize the new helper functions, improving filter application logic. - Updated tests for window operations over scalar subquery cross joins to maintain correct behavior and refactored test setup for clarity.
a355cb0 to
1f2b598
Compare
Which issue does this PR close?
Rationale for this change
PushDownFilterhas been identified as a major planning-time bottleneck, with profiling showing significant time spent repeatedly evaluating expression types while reasoning about predicate pushdown and null-restriction behavior. This PR addresses a slice of that problem by reducing unnecessary work in null-restriction evaluation and by tightening join-filter conversion behavior to avoid incorrect rewrites in scalar-side join cases.In particular, the patch focuses on two goals:
Together, these changes both improve optimizer robustness and target one of the hot paths called out in the profiling for
PushDownFilter.What changes are included in this PR?
This PR includes the following changes:
Adds regression SQL tests under
datafusion/core/tests/sql/push_down_filter_regressions.rsand wires the new module into the SQL test suite.Adds end-to-end regression coverage for:
push_down_filterenabled/disabledINsubqueriesNATURAL JOINcombined withUNION ALLAdds an optimizer regression test to ensure a post-join filter is retained for a cross join where one side is scalar/aggregated.
Changes
push_down_filterso filters are only converted into join conditions for inner joins when it is safe to do so, specifically avoiding that conversion for joins with emptyONclauses when either side is known to be scalar (max_rows() == Some(1)).Updates inferred predicate handling to treat null-restriction evaluation failures conservatively via
unwrap_or(false).Refactors null-restriction evaluation in
optimizer::utilsby:NullRestrictionEvalModeand a test-only guard for controlling evaluation modeExpands unit test coverage for
is_restrict_null_predicateand for agreement between the syntactic fast path and the authoritative evaluator.Are these changes tested?
Yes.
The PR adds both logical optimizer tests and SQL-level regression coverage:
is_restrict_null_predicate, including additionalCASE,BETWEEN,IN, and null-related predicate formspush_down_filterenabled and disabledThese tests verify both correctness and protection against the regressions addressed by this patch.
Are there any user-facing changes?
There are no intended user-facing API changes.
This PR improves optimizer behavior and planner performance characteristics internally, and fixes incorrect or fragile filter-pushdown behavior for certain query shapes. Users may observe more stable planning behavior and preserved correctness for affected queries, but no documentation or API changes should be required.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.