Refactor PushDownFilter benchmark suite: add fast default mode, validation, and unified A/B harness #21029
Open
kosiew wants to merge 9 commits into apache:main
Contributor (Author):
run benchmark sql_planner_extended
Contributor (Author):
show benchmark queue
Hi @kosiew, you asked to view the benchmark queue (#21029 (comment)).
🤖 Criterion benchmark running (GKE) | trigger
…rganization
- Refactored functions to build left join DataFrames with push down filters.
- Created `bench_push_down_filter_ab` for streamlined benchmarking of push down filter impact.
- Updated benchmark groups to use the new building functions, enhancing readability and maintainability.
…n_df_with_push_down_filter`
This commit removes the duplicate `build_case_heavy_left_join_df_with_push_down_filter` function from the `sql_planner_extended.rs` benchmark file.
… checks
- Added assertions to verify inference candidates in case-heavy left join data frames.
- Introduced helper functions `find_filter_predicates` and `assert_case_heavy_left_join_inference_candidates` for better structure and readability.
- Updated join logic in `build_case_heavy_left_join_query` for more complex case handling.
This update improves the robustness of benchmarks by ensuring correctness in filter predicate references related to join keys.
…ery handling
- Removed unnecessary functions and constants related to push down filter sweep configurations.
- Simplified the logic for constructing test DataFrames, focusing on essential parameters for benchmarks.
- Enhanced clarity of the benchmarks by differentiating cases for `with_push_down_filter` and `without_push_down_filter`.
- Updated the implementation to improve readability and maintainability.
- Updated `find_filter_predicates` function to streamline the code by removing unnecessary line breaks and retaining clarity in the error message when the expected structure is not met.
- Ensured that the function continues to accurately identify and handle logical plans with projections.
…ests
- Refactored case heavy and non-case left join benchmark functions to include push down filter tests.
- Added utility functions to configure benchmark sweeps for push down filters, making it customizable via environment variables.
- Improved assertions for filter predicates in case heavy left join inference.
- Cleaned up and organized existing benchmark code for clarity and reuse.
074c431 to e2af627
Which issue does this PR close?
Rationale for this change
The existing PushDownFilter benchmark suite is difficult to use for iteration and debugging due to:
These issues make it hard to distinguish between slow execution and incorrect behavior, reducing developer productivity when investigating optimizer performance.
This PR improves usability, correctness, and maintainability of the benchmark suite.
What changes are included in this PR?
- Introduced a lightweight default sweep (`DEFAULT_SWEEP_POINTS`) for faster local iteration
- Added optional full sweep mode controlled by `DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP`
- Refactored benchmark loops into a reusable helper (`bench_push_down_filter_ab`) to unify A/B comparisons
- Unified DataFrame construction via `build_left_join_df_with_push_down_filter`
- Added validation helpers:
  - `find_filter_predicates`
  - `assert_case_heavy_left_join_inference_candidates`
- Improved CASE expression generation to better simulate realistic predicate shapes and ensure join-key involvement
- Ensured predicates reference both join keys and non-join columns for meaningful PushDownFilter evaluation
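The default/full sweep switch and the A/B loop listed above could be sketched roughly as follows. This is a minimal std-only illustration, not the code in `sql_planner_extended.rs`: the sweep values, the reading of each point as (predicate count, CASE depth), the rule for parsing the environment variable, and the helpers `is_enabled`, `full_sweep_points`, `sweep_points`, and `bench_ab` are all assumptions made for the example, and plain `Instant` timing stands in for the Criterion harness.

```rust
use std::env;
use std::time::{Duration, Instant};

// Environment variable named in the PR description.
const FULL_SWEEP_ENV: &str = "DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP";

// Hypothetical reduced sweep: (predicate count, CASE depth) pairs.
const DEFAULT_SWEEP_POINTS: &[(usize, usize)] = &[(10, 1), (30, 2), (60, 3)];

// Hypothetical full sweep: every combination of the values below.
fn full_sweep_points() -> Vec<(usize, usize)> {
    let mut points = Vec::new();
    for predicates in [10, 20, 30, 40, 60] {
        for depth in 1..=3 {
            points.push((predicates, depth));
        }
    }
    points
}

// Assumed parsing rule: any non-empty value other than "0" or "false" enables the full sweep.
fn is_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim();
            !v.is_empty() && v != "0" && !v.eq_ignore_ascii_case("false")
        }
        None => false,
    }
}

// Picks the fast default sweep unless the environment variable opts into the full one.
fn sweep_points() -> Vec<(usize, usize)> {
    let value = env::var(FULL_SWEEP_ENV).ok();
    if is_enabled(value.as_deref()) {
        full_sweep_points()
    } else {
        DEFAULT_SWEEP_POINTS.to_vec()
    }
}

// Runs `work` once per sweep point for each variant and returns labelled timings.
fn bench_ab<F>(points: &[(usize, usize)], mut work: F) -> Vec<(String, Duration)>
where
    F: FnMut(bool, usize, usize),
{
    let variants = [
        ("with_push_down_filter", true),
        ("without_push_down_filter", false),
    ];
    let mut results = Vec::new();
    for &(predicates, depth) in points {
        for (label, enabled) in variants {
            let start = Instant::now();
            work(enabled, predicates, depth);
            let id = format!("{label}/predicates={predicates},depth={depth}");
            results.push((id, start.elapsed()));
        }
    }
    results
}
```

Calling `bench_ab(&sweep_points(), |enabled, predicates, depth| { /* build and optimize the plan */ })` would then time both variants at every sweep point. Keeping the two variants in one loop means they always run against identical sweep points, which is the property an A/B comparison depends on.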
Are these changes tested?
Yes.
Assertions were added to validate that generated predicates:
- reference a join key (`l.c0` or `r.c0`)
These validations act as correctness checks during benchmark construction and help prevent silent logical errors.
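The kind of check described here can be illustrated with a toy model. The sketch below does not use DataFusion's `Expr` or `LogicalPlan` types; the `Expr` enum, `collect_columns`, and `check_inference_candidate` are hypothetical stand-ins, and the second condition (at least one non-join column) is an assumption drawn from the change list rather than from the actual assertion code.

```rust
use std::collections::BTreeSet;

// Join keys named in the PR description.
const JOIN_KEYS: [&str; 2] = ["l.c0", "r.c0"];

// Toy expression tree, a stand-in for a real planner expression type.
#[derive(Debug)]
enum Expr {
    Column(&'static str),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Case {
        when: Box<Expr>,
        then: Box<Expr>,
        otherwise: Box<Expr>,
    },
}

// Recursively gathers every column name referenced by `expr`.
fn collect_columns(expr: &Expr, out: &mut BTreeSet<&'static str>) {
    match expr {
        Expr::Column(name) => {
            out.insert(*name);
        }
        Expr::Literal(_) => {}
        Expr::Gt(left, right) => {
            collect_columns(left, out);
            collect_columns(right, out);
        }
        Expr::Case { when, then, otherwise } => {
            collect_columns(when, out);
            collect_columns(then, out);
            collect_columns(otherwise, out);
        }
    }
}

// Returns an error unless the predicate references a join key and a non-join column.
fn check_inference_candidate(predicate: &Expr) -> Result<(), String> {
    let mut columns = BTreeSet::new();
    collect_columns(predicate, &mut columns);
    let has_join_key = columns.iter().any(|c| JOIN_KEYS.contains(c));
    let has_other = columns.iter().any(|c| !JOIN_KEYS.contains(c));
    if !has_join_key {
        return Err(format!("{predicate:?} does not reference l.c0 or r.c0"));
    }
    if !has_other {
        return Err(format!("{predicate:?} references no non-join column"));
    }
    Ok(())
}
```

Running such a check while the benchmark input is being built makes a malformed predicate fail immediately, instead of silently producing a plan where the optimizer rule has nothing to infer.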
Are there any user-facing changes?
No user-facing API changes.
However, benchmark execution behavior has improved.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.