feat: adaptive filter selectivity tracking for Parquet row filters #19639

Open
adriangb wants to merge 2 commits into apache:main from pydantic:filter-pushdown-dynamic

@adriangb
Contributor

adriangb commented Jan 4, 2026

Summary

This PR implements cross-file tracking of filter selectivity in ParquetSource to adaptively reorder and demote low-selectivity filters, as discussed in #3463 (comment).

Key changes:

  • Add SelectivityTracker to track filter effectiveness across files, using an ExprKey wrapper for structural equality
  • Each ParquetOpener queries the shared stats to partition filters into row filters (pushed down) vs post-scan filters (applied inline)
  • Post-scan filters are added to the projection, applied inline in the stream via apply_post_scan_filters(), and then the filter columns are removed from the output
  • SelectivityUpdatingStream wrapper updates the tracker when the stream completes
  • build_row_filter_with_metrics() returns per-filter metrics for selectivity tracking
  • Filters are reordered by observed effectiveness (most selective first)
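
To make the shared-tracker idea concrete, here is a minimal sketch using only the standard library. It is not the code in selectivity.rs: `Stats`, `FilterKey`, and `Tracker` are hypothetical stand-ins for SelectivityStats, ExprKey, and SelectivityTracker, and keying on the filter's canonical text is an assumption made only to get structural equality without depending on DataFusion types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Running counts for one filter, accumulated across files.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Stats {
    pub rows_total: u64,
    pub rows_matched: u64,
}

impl Stats {
    /// Fraction of rows filtered out; `None` until at least one row was seen.
    pub fn effectiveness(&self) -> Option<f64> {
        if self.rows_total == 0 {
            None
        } else {
            Some(1.0 - self.rows_matched as f64 / self.rows_total as f64)
        }
    }
}

/// Hypothetical stand-in for `ExprKey`: the filter's canonical text, so that
/// structurally identical filters seen in different files hash equal.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FilterKey(pub String);

/// Cloneable handle to statistics shared by every file opener in a scan.
#[derive(Debug, Default, Clone)]
pub struct Tracker {
    inner: Arc<Mutex<HashMap<FilterKey, Stats>>>,
}

impl Tracker {
    /// Fold the counts observed while scanning one file into the shared totals.
    pub fn update(&self, key: &FilterKey, rows_matched: u64, rows_total: u64) {
        let mut map = self.inner.lock().unwrap();
        let stats = map.entry(key.clone()).or_default();
        stats.rows_matched += rows_matched;
        stats.rows_total += rows_total;
    }

    /// Observed effectiveness for a filter, if any rows have been recorded.
    pub fn effectiveness(&self, key: &FilterKey) -> Option<f64> {
        self.inner.lock().unwrap().get(key).and_then(|s| s.effectiveness())
    }

    /// Order filters most effective first; filters without stats go last.
    pub fn reorder(&self, mut filters: Vec<FilterKey>) -> Vec<FilterKey> {
        let map = self.inner.lock().unwrap();
        let score = |k: &FilterKey| map.get(k).and_then(|s| s.effectiveness()).unwrap_or(-1.0);
        filters.sort_by(|a, b| score(b).total_cmp(&score(a)));
        filters
    }
}
```

Each opener would hold a clone of the same `Tracker`, call `update` when its stream finishes, and call `reorder` before building the row filter for the next file.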

Configuration:

  • parquet_options.filter_effectiveness_threshold (default: 0.8)
  • Effectiveness = 1 - (rows_matched / rows_total) = fraction of rows filtered out
  • Filters with effectiveness < threshold are demoted to post-scan
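
The threshold rule can be sketched as follows. This is an illustration rather than the PR's partition_filters(): the function names, the `Partitioned` struct, and the tuple input are hypothetical, and sending filters with no observations to the post-scan side is an assumption based on the author's later comment in this thread that "we know nothing" is treated as "nothing is selective".

```rust
/// Effectiveness as defined above: the fraction of rows a filter removes.
/// Returns `None` when no rows have been observed yet.
pub fn effectiveness(rows_matched: u64, rows_total: u64) -> Option<f64> {
    if rows_total == 0 {
        return None;
    }
    Some(1.0 - rows_matched as f64 / rows_total as f64)
}

/// Result of splitting filters by the configured threshold.
#[derive(Debug, PartialEq)]
pub struct Partitioned<'a> {
    /// Filters pushed down as Parquet row filters.
    pub row_filters: Vec<&'a str>,
    /// Filters demoted to inline application after the scan.
    pub post_scan: Vec<&'a str>,
}

/// `observed` holds (filter name, rows_matched, rows_total) per filter.
pub fn partition_filters<'a>(
    observed: &[(&'a str, u64, u64)],
    threshold: f64,
) -> Partitioned<'a> {
    let mut out = Partitioned { row_filters: Vec::new(), post_scan: Vec::new() };
    for &(name, matched, total) in observed {
        match effectiveness(matched, total) {
            // Effective enough: keep it as a row filter.
            Some(e) if e >= threshold => out.row_filters.push(name),
            // Below the threshold, or no data yet (assumed): demote.
            _ => out.post_scan.push(name),
        }
    }
    out
}
```

With the default threshold of 0.8, a filter that keeps 10 of 100 rows (effectiveness 0.9) stays a row filter, while one that keeps 25 of 100 (effectiveness 0.75) is demoted.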

Files added:

  • datafusion/datasource-parquet/src/selectivity.rs - Core tracking infrastructure

Files modified:

  • opener.rs - Filter partitioning, post-scan application, SelectivityUpdatingStream
  • row_filter.rs - FilterMetrics, RowFilterWithMetrics, effectiveness-based reordering
  • source.rs - selectivity_tracker field and builder methods
  • config.rs - filter_effectiveness_threshold config option

Test plan

  • Unit tests for ExprKey hash/eq consistency
  • Unit tests for SelectivityStats::effectiveness() edge cases
  • Unit tests for SelectivityTracker::partition_filters() threshold logic
  • Existing test suite passes
  • Integration tests for post-scan filter application
  • End-to-end tests for adaptive behavior across files
  • Performance benchmarks

🤖 Generated with Claude Code

@github-actions github-actions bot added the sqllogictest, common, datasource, and proto labels Jan 4, 2026
@adriangb
Contributor Author

adriangb commented Jan 4, 2026

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (3065a0e) to 955fd41 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@adriangb
Contributor Author

adriangb commented Jan 4, 2026

show benchmark queue

@alamb-ghbot

🤖 Hi @adriangb, you asked to view the benchmark queue (#19639 (comment)).

Job User Benchmarks Comment
19639_3708500887.sh adriangb default https://github.com/apache/datafusion/pull/19639#issuecomment-3708500887

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ filter-pushdown-dynamic ┃         Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2417.81 ms │              2359.51 ms │      no change │
│ QQuery 1     │   909.35 ms │               930.52 ms │      no change │
│ QQuery 2     │  1873.59 ms │              1891.90 ms │      no change │
│ QQuery 3     │  1150.33 ms │              1160.92 ms │      no change │
│ QQuery 4     │  2297.04 ms │              2302.27 ms │      no change │
│ QQuery 5     │ 28141.18 ms │             28259.93 ms │      no change │
│ QQuery 6     │  3995.68 ms │               230.39 ms │ +17.34x faster │
│ QQuery 7     │  3748.71 ms │              3945.39 ms │   1.05x slower │
└──────────────┴─────────────┴─────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 44533.69ms │
│ Total Time (filter-pushdown-dynamic)   │ 41080.84ms │
│ Average Time (HEAD)                    │  5566.71ms │
│ Average Time (filter-pushdown-dynamic) │  5135.11ms │
│ Queries Faster                         │          1 │
│ Queries Slower                         │          1 │
│ Queries with No Change                 │          6 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ filter-pushdown-dynamic ┃         Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.41 ms │                 1.46 ms │      no change │
│ QQuery 1     │    50.80 ms │                51.07 ms │      no change │
│ QQuery 2     │   133.96 ms │               133.81 ms │      no change │
│ QQuery 3     │   153.01 ms │               151.42 ms │      no change │
│ QQuery 4     │  1056.19 ms │              1088.14 ms │      no change │
│ QQuery 5     │  1351.78 ms │              1406.79 ms │      no change │
│ QQuery 6     │     1.45 ms │                 1.44 ms │      no change │
│ QQuery 7     │    55.47 ms │                69.68 ms │   1.26x slower │
│ QQuery 8     │  1441.54 ms │              1478.63 ms │      no change │
│ QQuery 9     │  1874.93 ms │              1844.17 ms │      no change │
│ QQuery 10    │   340.79 ms │               479.23 ms │   1.41x slower │
│ QQuery 11    │   398.06 ms │               547.82 ms │   1.38x slower │
│ QQuery 12    │  1254.38 ms │              1489.86 ms │   1.19x slower │
│ QQuery 13    │  2007.28 ms │              2138.37 ms │   1.07x slower │
│ QQuery 14    │  1233.55 ms │              1461.97 ms │   1.19x slower │
│ QQuery 15    │  1253.12 ms │              1280.44 ms │      no change │
│ QQuery 16    │  2582.68 ms │              2557.03 ms │      no change │
│ QQuery 17    │  2577.73 ms │              2584.75 ms │      no change │
│ QQuery 18    │  5810.75 ms │              4857.70 ms │  +1.20x faster │
│ QQuery 19    │   122.85 ms │               142.04 ms │   1.16x slower │
│ QQuery 20    │  1938.15 ms │              1881.92 ms │      no change │
│ QQuery 21    │  2253.41 ms │              2312.49 ms │      no change │
│ QQuery 22    │  3794.87 ms │              3262.80 ms │  +1.16x faster │
│ QQuery 23    │ 19266.34 ms │              1268.50 ms │ +15.19x faster │
│ QQuery 24    │   213.96 ms │               298.77 ms │   1.40x slower │
│ QQuery 25    │   473.16 ms │               621.61 ms │   1.31x slower │
│ QQuery 26    │   232.57 ms │               328.22 ms │   1.41x slower │
│ QQuery 27    │  2702.33 ms │              2516.97 ms │  +1.07x faster │
│ QQuery 28    │ 23553.89 ms │             21856.71 ms │  +1.08x faster │
│ QQuery 29    │   976.86 ms │               972.07 ms │      no change │
│ QQuery 30    │  1328.89 ms │              1332.73 ms │      no change │
│ QQuery 31    │  1373.19 ms │              1355.09 ms │      no change │
│ QQuery 32    │  5179.01 ms │              4582.04 ms │  +1.13x faster │
│ QQuery 33    │  5626.50 ms │              5289.73 ms │  +1.06x faster │
│ QQuery 34    │  5864.31 ms │              5660.80 ms │      no change │
│ QQuery 35    │  1928.59 ms │              1940.03 ms │      no change │
│ QQuery 36    │    66.34 ms │                14.39 ms │  +4.61x faster │
│ QQuery 37    │    43.87 ms │                14.31 ms │  +3.07x faster │
│ QQuery 38    │    64.91 ms │                13.90 ms │  +4.67x faster │
│ QQuery 39    │   104.78 ms │                12.40 ms │  +8.45x faster │
│ QQuery 40    │    26.49 ms │                15.59 ms │  +1.70x faster │
│ QQuery 41    │    22.49 ms │                14.04 ms │  +1.60x faster │
│ QQuery 42    │    19.10 ms │                13.66 ms │  +1.40x faster │
└──────────────┴─────────────┴─────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 100755.75ms │
│ Total Time (filter-pushdown-dynamic)   │  79344.60ms │
│ Average Time (HEAD)                    │   2343.16ms │
│ Average Time (filter-pushdown-dynamic) │   1845.22ms │
│ Queries Faster                         │          14 │
│ Queries Slower                         │          10 │
│ Queries with No Change                 │          19 │
│ Queries with Failure                   │           0 │
└────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 121.20 ms │               115.82 ms │     no change │
│ QQuery 2     │  29.95 ms │                27.73 ms │ +1.08x faster │
│ QQuery 3     │  38.04 ms │                32.55 ms │ +1.17x faster │
│ QQuery 4     │  29.65 ms │                29.55 ms │     no change │
│ QQuery 5     │  87.21 ms │                86.04 ms │     no change │
│ QQuery 6     │  19.65 ms │                19.50 ms │     no change │
│ QQuery 7     │ 215.75 ms │               214.50 ms │     no change │
│ QQuery 8     │  33.85 ms │                32.06 ms │ +1.06x faster │
│ QQuery 9     │  93.47 ms │                96.83 ms │     no change │
│ QQuery 10    │  62.68 ms │                63.08 ms │     no change │
│ QQuery 11    │  17.67 ms │                18.66 ms │  1.06x slower │
│ QQuery 12    │  50.57 ms │                50.22 ms │     no change │
│ QQuery 13    │  46.61 ms │                46.59 ms │     no change │
│ QQuery 14    │  13.05 ms │                13.65 ms │     no change │
│ QQuery 15    │  24.12 ms │                24.02 ms │     no change │
│ QQuery 16    │  24.23 ms │                23.95 ms │     no change │
│ QQuery 17    │ 148.81 ms │               149.81 ms │     no change │
│ QQuery 18    │ 279.28 ms │               269.82 ms │     no change │
│ QQuery 19    │  37.62 ms │                37.50 ms │     no change │
│ QQuery 20    │  50.02 ms │                50.24 ms │     no change │
│ QQuery 21    │ 312.84 ms │               305.36 ms │     no change │
│ QQuery 22    │  17.35 ms │                17.24 ms │     no change │
└──────────────┴───────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 1753.60ms │
│ Total Time (filter-pushdown-dynamic)   │ 1724.72ms │
│ Average Time (HEAD)                    │   79.71ms │
│ Average Time (filter-pushdown-dynamic) │   78.40ms │
│ Queries Faster                         │         3 │
│ Queries Slower                         │         1 │
│ Queries with No Change                 │        18 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

@adriangb
Contributor Author

adriangb commented Jan 4, 2026

run benchmarks tpch

@alamb-ghbot

🤖 Hi @adriangb, thanks for the request (#19639 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, external_aggr, tpcds, tpch, tpch10, tpch_mem, tpch_mem10
  • Criterion: aggregate_query_sql, aggregate_vectorized, case_when, character_length, in_list, range_and_generate_series, sort, sql_planner, strpos, with_hashes

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

@adriangb
Contributor Author

adriangb commented Jan 4, 2026

run benchmark tpch

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (3065a0e) to 955fd41 diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ filter-pushdown-dynamic ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 203.54 ms │               196.22 ms │    no change │
│ QQuery 2     │  91.58 ms │               128.48 ms │ 1.40x slower │
│ QQuery 3     │ 126.46 ms │               195.37 ms │ 1.54x slower │
│ QQuery 4     │  77.57 ms │               125.63 ms │ 1.62x slower │
│ QQuery 5     │ 172.74 ms │               334.33 ms │ 1.94x slower │
│ QQuery 6     │  67.70 ms │               109.04 ms │ 1.61x slower │
│ QQuery 7     │ 216.22 ms │               256.53 ms │ 1.19x slower │
│ QQuery 8     │ 162.62 ms │               266.87 ms │ 1.64x slower │
│ QQuery 9     │ 228.36 ms │               396.32 ms │ 1.74x slower │
│ QQuery 10    │ 182.90 ms │               273.24 ms │ 1.49x slower │
│ QQuery 11    │  73.68 ms │                97.62 ms │ 1.32x slower │
│ QQuery 12    │ 115.03 ms │               259.93 ms │ 2.26x slower │
│ QQuery 13    │ 212.04 ms │               202.51 ms │    no change │
│ QQuery 14    │  91.21 ms │               105.22 ms │ 1.15x slower │
│ QQuery 15    │ 119.99 ms │               167.31 ms │ 1.39x slower │
│ QQuery 16    │  56.96 ms │                86.84 ms │ 1.52x slower │
│ QQuery 17    │ 281.93 ms │               276.68 ms │    no change │
│ QQuery 18    │ 317.53 ms │               664.90 ms │ 2.09x slower │
│ QQuery 19    │ 135.10 ms │               179.47 ms │ 1.33x slower │
│ QQuery 20    │ 125.82 ms │               153.15 ms │ 1.22x slower │
│ QQuery 21    │ 264.64 ms │               328.90 ms │ 1.24x slower │
│ QQuery 22    │  42.98 ms │                67.05 ms │ 1.56x slower │
└──────────────┴───────────┴─────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 3366.60ms │
│ Total Time (filter-pushdown-dynamic)   │ 4871.61ms │
│ Average Time (HEAD)                    │  153.03ms │
│ Average Time (filter-pushdown-dynamic) │  221.44ms │
│ Queries Faster                         │         0 │
│ Queries Slower                         │        19 │
│ Queries with No Change                 │         3 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

@adriangb
Contributor Author

adriangb commented Jan 4, 2026

run benchmark tpch

@github-actions github-actions bot added the documentation and core labels Jan 4, 2026
@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (6af7b28) to 955fd41 diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 201.63 ms │                    FAIL │  incomparable │
│ QQuery 2     │  95.11 ms │                    FAIL │  incomparable │
│ QQuery 3     │ 124.15 ms │                    FAIL │  incomparable │
│ QQuery 4     │  76.53 ms │                    FAIL │  incomparable │
│ QQuery 5     │ 173.60 ms │                    FAIL │  incomparable │
│ QQuery 6     │  68.07 ms │                    FAIL │  incomparable │
│ QQuery 7     │ 214.34 ms │              9433.89 ms │ 44.01x slower │
│ QQuery 8     │ 161.99 ms │                    FAIL │  incomparable │
│ QQuery 9     │ 225.75 ms │                    FAIL │  incomparable │
│ QQuery 10    │ 187.86 ms │                    FAIL │  incomparable │
│ QQuery 11    │  73.04 ms │                    FAIL │  incomparable │
│ QQuery 12    │ 118.45 ms │                    FAIL │  incomparable │
│ QQuery 13    │ 212.77 ms │                    FAIL │  incomparable │
│ QQuery 14    │  91.33 ms │                    FAIL │  incomparable │
│ QQuery 15    │ 120.45 ms │                    FAIL │  incomparable │
│ QQuery 16    │  56.10 ms │                    FAIL │  incomparable │
│ QQuery 17    │ 271.89 ms │                    FAIL │  incomparable │
│ QQuery 18    │ 317.17 ms │              9294.69 ms │ 29.30x slower │
│ QQuery 19    │ 134.01 ms │                    FAIL │  incomparable │
│ QQuery 20    │ 127.50 ms │                    FAIL │  incomparable │
│ QQuery 21    │ 263.41 ms │                    FAIL │  incomparable │
│ QQuery 22    │  43.88 ms │                    FAIL │  incomparable │
└──────────────┴───────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │   531.52ms │
│ Total Time (filter-pushdown-dynamic)   │ 18728.57ms │
│ Average Time (HEAD)                    │   265.76ms │
│ Average Time (filter-pushdown-dynamic) │  9364.29ms │
│ Queries Faster                         │          0 │
│ Queries Slower                         │          2 │
│ Queries with No Change                 │          0 │
│ Queries with Failure                   │         20 │
└────────────────────────────────────────┴────────────┘

@adriangb
Contributor Author

adriangb commented Jan 5, 2026

This is probably not a good issue to pick up. This is a draft PR for an unproven idea.

@adriangb
Contributor Author

adriangb commented Jan 5, 2026

run benchmark tpch

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (435e83f) to 955fd41 diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 199.58 ms │               195.16 ms │     no change │
│ QQuery 2     │  94.23 ms │               149.31 ms │  1.58x slower │
│ QQuery 3     │ 122.60 ms │               193.79 ms │  1.58x slower │
│ QQuery 4     │  76.28 ms │                80.36 ms │  1.05x slower │
│ QQuery 5     │ 170.05 ms │               446.09 ms │  2.62x slower │
│ QQuery 6     │  66.15 ms │                56.58 ms │ +1.17x faster │
│ QQuery 7     │ 213.98 ms │               389.72 ms │  1.82x slower │
│ QQuery 8     │ 167.91 ms │               522.12 ms │  3.11x slower │
│ QQuery 9     │ 223.74 ms │               997.35 ms │  4.46x slower │
│ QQuery 10    │ 178.95 ms │               473.07 ms │  2.64x slower │
│ QQuery 11    │  74.02 ms │                92.17 ms │  1.25x slower │
│ QQuery 12    │ 113.71 ms │               191.25 ms │  1.68x slower │
│ QQuery 13    │ 214.39 ms │               213.82 ms │     no change │
│ QQuery 14    │  93.74 ms │               328.15 ms │  3.50x slower │
│ QQuery 15    │ 114.82 ms │               123.16 ms │  1.07x slower │
│ QQuery 16    │  56.52 ms │                80.61 ms │  1.43x slower │
│ QQuery 17    │ 270.57 ms │               302.50 ms │  1.12x slower │
│ QQuery 18    │ 312.20 ms │               678.27 ms │  2.17x slower │
│ QQuery 19    │ 132.32 ms │               172.23 ms │  1.30x slower │
│ QQuery 20    │ 123.55 ms │               257.72 ms │  2.09x slower │
│ QQuery 21    │ 267.67 ms │               386.39 ms │  1.44x slower │
│ QQuery 22    │  42.97 ms │                60.79 ms │  1.41x slower │
└──────────────┴───────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 3329.95ms │
│ Total Time (filter-pushdown-dynamic)   │ 6390.60ms │
│ Average Time (HEAD)                    │  151.36ms │
│ Average Time (filter-pushdown-dynamic) │  290.48ms │
│ Queries Faster                         │         1 │
│ Queries Slower                         │        19 │
│ Queries with No Change                 │         2 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

@adriangb
Contributor Author

adriangb commented Jan 5, 2026

run benchmark tpch

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (78a587d) to 955fd41 diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 199.59 ms │               198.00 ms │     no change │
│ QQuery 2     │  93.78 ms │               149.46 ms │  1.59x slower │
│ QQuery 3     │ 123.68 ms │               195.58 ms │  1.58x slower │
│ QQuery 4     │  75.93 ms │                80.22 ms │  1.06x slower │
│ QQuery 5     │ 166.95 ms │               444.15 ms │  2.66x slower │
│ QQuery 6     │  68.71 ms │                58.23 ms │ +1.18x faster │
│ QQuery 7     │ 208.39 ms │               401.46 ms │  1.93x slower │
│ QQuery 8     │ 161.03 ms │               542.26 ms │  3.37x slower │
│ QQuery 9     │ 227.68 ms │               962.95 ms │  4.23x slower │
│ QQuery 10    │ 180.52 ms │               469.65 ms │  2.60x slower │
│ QQuery 11    │  74.00 ms │                95.13 ms │  1.29x slower │
│ QQuery 12    │ 114.07 ms │               190.16 ms │  1.67x slower │
│ QQuery 13    │ 215.88 ms │               217.12 ms │     no change │
│ QQuery 14    │  91.12 ms │               323.06 ms │  3.55x slower │
│ QQuery 15    │ 119.72 ms │               118.02 ms │     no change │
│ QQuery 16    │  55.77 ms │                85.25 ms │  1.53x slower │
│ QQuery 17    │ 267.21 ms │               306.83 ms │  1.15x slower │
│ QQuery 18    │ 308.46 ms │               660.48 ms │  2.14x slower │
│ QQuery 19    │ 132.97 ms │               174.91 ms │  1.32x slower │
│ QQuery 20    │ 124.63 ms │               260.62 ms │  2.09x slower │
│ QQuery 21    │ 260.58 ms │               373.49 ms │  1.43x slower │
│ QQuery 22    │  42.65 ms │                58.52 ms │  1.37x slower │
└──────────────┴───────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 3313.34ms │
│ Total Time (filter-pushdown-dynamic)   │ 6365.57ms │
│ Average Time (HEAD)                    │  150.61ms │
│ Average Time (filter-pushdown-dynamic) │  289.34ms │
│ Queries Faster                         │         1 │
│ Queries Slower                         │        18 │
│ Queries with No Change                 │         3 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Jan 5, 2026

Did you find any evidence that the selectivity of predicates changes over the course of the query (or put another way that reordering them during execution would help?)

@adriangb
Contributor Author

adriangb commented Jan 5, 2026

> Did you find any evidence that the selectivity of predicates changes over the course of the query (or put another way that reordering them during execution would help?)

One case where that for sure happens is dynamic filters 😉 (although we treat each version of it as a different filter, the point is that we need to be dynamic about new filters showing up mid query). But I think it's also not unusual to have unevenly distributed data across files. I do agree that if we change how we apply a filter between files that probably captures 95% of the benefit (as opposed to within a scan of a single file).

But the main point is that we start from "we know nothing" which we're treating as "nothing is selective" and then once we know a filter is selective enough we move it over to be a row filter.

@sdf-jkl
Contributor

sdf-jkl commented Jan 5, 2026

Hey @adriangb, I've been thinking about something like this since the New Year. It's really cool to see you putting together a draft for it.

I haven't had a chance to go through your code in full, but I wanted to share some research I did earlier that might be relevant:

Before seeing your PR and the comments in #3463, I was thinking about using simpler heuristics for sorting predicates.

  • col type -> size
  • cardinality of the predicate operator -> = first, then (>, <), then (>=, <=), then !=, etc.
  • how simple/complex the predicate -> how long/ how much CPU it takes to evaluate
  • col encoding -> if it supports random access, we could filter without decoding (API for filtering / evaluation directly on encoded data arrow-rs#8842)
  • prioritize using indexes first too
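
As a rough illustration of how static heuristics like these could be combined, the sketch below sorts predicates by a lexicographic key. Everything in it is hypothetical: the `Predicate` fields, the `OpClass` enum, and especially the priority given to each criterion (index first, then operator class, then evaluation cost, then column width), which the list above does not specify.

```rust
/// Operator classes in the order suggested above; deriving `Ord` on an enum
/// ranks variants by declaration order, so `Eq` sorts first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum OpClass {
    Eq,
    Range,          // > or <
    RangeInclusive, // >= or <=
    NotEq,
}

/// Static facts about a predicate, known before any rows are read.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Predicate {
    pub name: &'static str,
    pub uses_index: bool,
    pub op: OpClass,
    /// Hypothetical relative CPU cost of evaluating the expression.
    pub eval_cost: u32,
    /// Hypothetical width of the referenced column, in bytes per value.
    pub column_bytes: u32,
}

/// Sort predicates so the ones expected to be cheapest and most selective
/// run first. The key order is an assumption, not a measured result.
pub fn static_order(mut preds: Vec<Predicate>) -> Vec<Predicate> {
    preds.sort_by_key(|p| (!p.uses_index, p.op, p.eval_cost, p.column_bytes));
    preds
}
```

A scheme like this needs no runtime feedback, so it could serve as the initial order before any selectivity has been observed.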

From a quick skim of the original ClickHouse PR, they still rely on some simple heuristics when column statistics aren't available.

I would like to give your PR a proper review once I'm home, but I already love the direction you're taking.

@Dandandan
Contributor

Seeing the results in #19694 (comment) I think something like this is needed.

However I also see some regressions there that are related to the execution order of the filter expressions (such as regex / string matching functions coming earlier now). I guess we also need some adaptiveness for that (e.g. based on measured time or some static heuristic).

@Dandandan
Contributor

Dandandan commented Feb 4, 2026

Perhaps that would also help with other expensive expressions, such as in join filter pushdown: we materialize columns using cheap/selective filters first and then evaluate expensive ones like case when 1 then ... 2... 3....
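
One way to fold evaluation cost into the ordering is the classic rank used for independent predicates: the fraction of rows removed divided by the per-row cost, highest first. The sketch below assumes such measurements are available; the `Observed` struct and its `nanos_per_row` field are hypothetical and not something this PR records.

```rust
/// Measurements for one filter (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Observed {
    pub name: &'static str,
    /// Fraction of rows removed, in 0.0..=1.0.
    pub effectiveness: f64,
    /// Measured evaluation time per input row, in nanoseconds.
    pub nanos_per_row: f64,
}

/// Rows removed per unit of work; higher means "run earlier".
pub fn rank(f: &Observed) -> f64 {
    // Guard against a zero cost measurement to avoid dividing by zero.
    f.effectiveness / f.nanos_per_row.max(f64::MIN_POSITIVE)
}

/// Order filters by descending rank.
pub fn order_by_rank(mut filters: Vec<Observed>) -> Vec<Observed> {
    filters.sort_by(|a, b| rank(b).total_cmp(&rank(a)));
    filters
}
```

Under this rank, an integer comparison that removes half the rows at 1 ns per row runs before a regex that removes 99% of rows at 200 ns per row: evaluating the comparison first costs about 1 + 0.5 × 200 = 101 ns per input row, versus about 200 ns the other way around.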

@adriangb
Contributor Author

adriangb commented Feb 4, 2026

Yeah makes sense to me 👍🏻. But if we can't show via benchmarks this is an immediate win... can we justify moving forward with it?

@Dandandan
Contributor

In #19694 (comment) it seems this adaptivity is better than main (almost no regression on tpch and clickbench) when also setting DATAFUSION_OPTIMIZER_ENABLE_JOIN_DYNAMIC_FILTER_PUSHDOWN to false.
The remaining part seems mostly about getting the expression order right...

@adriangb adriangb force-pushed the filter-pushdown-dynamic branch from c0b86d7 to b13bd64 February 14, 2026 16:27
@adriangb
Contributor Author

run benchmark tpcds
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true
DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true
DATAFUSION_OPTIMIZER_ENABLE_JOIN_DYNAMIC_FILTER_PUSHDOWN=false

@adriangb
Contributor Author

run benchmark clickbench
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true
DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true
DATAFUSION_OPTIMIZER_ENABLE_JOIN_DYNAMIC_FILTER_PUSHDOWN=false

@alamb-ghbot

🤖 Hi @adriangb, thanks for the request (#19639 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, external_aggr, tpcds, tpch, tpch10, tpch_mem, tpch_mem10
  • Criterion: aggregate_query_sql, aggregate_vectorized, case_when, character_length, in_list, left, plan_reuse, range_and_generate_series, replace, reset_plan_states, sort, sql_planner, strpos, substr_index, with_hashes

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: clickbench.

@adriangb
Contributor Author

run benchmark clickbench_partitioned
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true
DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true
DATAFUSION_OPTIMIZER_ENABLE_JOIN_DYNAMIC_FILTER_PUSHDOWN=false

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (b27a9c3) to af5f470 diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    72.51 ms │                73.25 ms │     no change │
│ QQuery 2  │   208.80 ms │               213.23 ms │     no change │
│ QQuery 3  │   151.99 ms │               157.30 ms │     no change │
│ QQuery 4  │  2134.03 ms │              2168.18 ms │     no change │
│ QQuery 5  │   257.11 ms │               267.68 ms │     no change │
│ QQuery 6  │  1226.54 ms │              1237.84 ms │     no change │
│ QQuery 7  │   668.45 ms │               658.41 ms │     no change │
│ QQuery 8  │   170.07 ms │               174.58 ms │     no change │
│ QQuery 9  │   812.88 ms │               863.56 ms │  1.06x slower │
│ QQuery 10 │   175.79 ms │               169.77 ms │     no change │
│ QQuery 11 │  1326.73 ms │              1341.39 ms │     no change │
│ QQuery 12 │    70.47 ms │                71.04 ms │     no change │
│ QQuery 13 │   737.89 ms │               753.91 ms │     no change │
│ QQuery 14 │  1950.11 ms │              1935.28 ms │     no change │
│ QQuery 15 │   125.42 ms │               124.98 ms │     no change │
│ QQuery 16 │   184.17 ms │               183.93 ms │     no change │
│ QQuery 17 │   450.58 ms │               442.27 ms │     no change │
│ QQuery 18 │   534.63 ms │               535.66 ms │     no change │
│ QQuery 19 │   211.08 ms │               211.44 ms │     no change │
│ QQuery 20 │   114.01 ms │               114.13 ms │     no change │
│ QQuery 21 │   395.89 ms │               389.65 ms │     no change │
│ QQuery 22 │   870.94 ms │               840.81 ms │     no change │
│ QQuery 23 │  1967.01 ms │              1957.82 ms │     no change │
│ QQuery 24 │   681.25 ms │               671.87 ms │     no change │
│ QQuery 25 │   525.96 ms │               518.13 ms │     no change │
│ QQuery 26 │   444.47 ms │               438.60 ms │     no change │
│ QQuery 27 │   658.98 ms │               667.76 ms │     no change │
│ QQuery 28 │   503.03 ms │               490.19 ms │     no change │
│ QQuery 29 │   457.72 ms │               449.04 ms │     no change │
│ QQuery 30 │    74.89 ms │                73.46 ms │     no change │
│ QQuery 31 │   295.31 ms │               302.08 ms │     no change │
│ QQuery 32 │   132.18 ms │               133.80 ms │     no change │
│ QQuery 33 │   201.25 ms │               221.68 ms │  1.10x slower │
│ QQuery 34 │   159.22 ms │               160.21 ms │     no change │
│ QQuery 35 │   198.79 ms │               200.22 ms │     no change │
│ QQuery 36 │   281.65 ms │               281.00 ms │     no change │
│ QQuery 37 │   942.49 ms │               980.90 ms │     no change │
│ QQuery 38 │   173.69 ms │               177.75 ms │     no change │
│ QQuery 39 │  1706.21 ms │              1669.16 ms │     no change │
│ QQuery 40 │   186.44 ms │               189.96 ms │     no change │
│ QQuery 41 │    26.86 ms │                27.37 ms │     no change │
│ QQuery 42 │   139.53 ms │               144.92 ms │     no change │
│ QQuery 43 │   127.10 ms │               129.12 ms │     no change │
│ QQuery 44 │    29.15 ms │                28.13 ms │     no change │
│ QQuery 45 │    87.08 ms │                83.79 ms │     no change │
│ QQuery 46 │   323.93 ms │               326.96 ms │     no change │
│ QQuery 47 │  1080.92 ms │              1080.38 ms │     no change │
│ QQuery 48 │   606.53 ms │               614.04 ms │     no change │
│ QQuery 49 │   533.92 ms │               547.54 ms │     no change │
│ QQuery 50 │   356.05 ms │               348.40 ms │     no change │
│ QQuery 51 │   308.85 ms │               313.84 ms │     no change │
│ QQuery 52 │   142.38 ms │               144.93 ms │     no change │
│ QQuery 53 │   141.49 ms │               146.69 ms │     no change │
│ QQuery 54 │   222.97 ms │               225.98 ms │     no change │
│ QQuery 55 │   140.78 ms │               144.44 ms │     no change │
│ QQuery 56 │   208.11 ms │               206.75 ms │     no change │
│ QQuery 57 │   449.28 ms │               444.83 ms │     no change │
│ QQuery 58 │   498.79 ms │               501.64 ms │     no change │
│ QQuery 59 │   290.94 ms │               292.54 ms │     no change │
│ QQuery 60 │   209.78 ms │               212.35 ms │     no change │
│ QQuery 61 │   249.39 ms │               254.07 ms │     no change │
│ QQuery 62 │  1313.50 ms │              1255.19 ms │     no change │
│ QQuery 63 │   146.02 ms │               147.47 ms │     no change │
│ QQuery 64 │  1135.21 ms │              1145.34 ms │     no change │
│ QQuery 65 │   353.30 ms │               365.45 ms │     no change │
│ QQuery 66 │   389.74 ms │               387.84 ms │     no change │
│ QQuery 67 │   548.81 ms │               535.32 ms │     no change │
│ QQuery 68 │   370.03 ms │               379.57 ms │     no change │
│ QQuery 69 │   167.41 ms │               169.50 ms │     no change │
│ QQuery 70 │   488.50 ms │               488.25 ms │     no change │
│ QQuery 71 │   182.73 ms │               185.64 ms │     no change │
│ QQuery 72 │ 12970.55 ms │             13115.33 ms │     no change │
│ QQuery 73 │   155.54 ms │               153.51 ms │     no change │
│ QQuery 74 │   836.28 ms │               865.13 ms │     no change │
│ QQuery 75 │   408.75 ms │               418.90 ms │     no change │
│ QQuery 76 │   198.19 ms │               206.75 ms │     no change │
│ QQuery 77 │   277.19 ms │               280.48 ms │     no change │
│ QQuery 78 │   691.63 ms │               695.66 ms │     no change │
│ QQuery 79 │   324.21 ms │               332.26 ms │     no change │
│ QQuery 80 │   534.60 ms │               536.42 ms │     no change │
│ QQuery 81 │    59.64 ms │                57.94 ms │     no change │
│ QQuery 82 │   972.25 ms │               989.86 ms │     no change │
│ QQuery 83 │    78.96 ms │                77.59 ms │     no change │
│ QQuery 84 │    67.66 ms │                70.22 ms │     no change │
│ QQuery 85 │   315.94 ms │               323.98 ms │     no change │
│ QQuery 86 │    57.75 ms │                57.18 ms │     no change │
│ QQuery 87 │   180.39 ms │               182.15 ms │     no change │
│ QQuery 88 │   256.71 ms │               264.75 ms │     no change │
│ QQuery 89 │   160.08 ms │               165.93 ms │     no change │
│ QQuery 90 │    44.92 ms │                44.07 ms │     no change │
│ QQuery 91 │   144.70 ms │               136.94 ms │ +1.06x faster │
│ QQuery 92 │    78.38 ms │                79.39 ms │     no change │
│ QQuery 93 │   292.94 ms │               290.44 ms │     no change │
│ QQuery 94 │    86.36 ms │                88.47 ms │     no change │
│ QQuery 95 │   244.75 ms │               241.87 ms │     no change │
│ QQuery 96 │   113.65 ms │               115.39 ms │     no change │
│ QQuery 97 │   226.73 ms │               233.61 ms │     no change │
│ QQuery 98 │   217.10 ms │               220.05 ms │     no change │
│ QQuery 99 │ 14433.39 ms │             14270.52 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 68138.90ms │
│ Total Time (filter-pushdown-dynamic)   │ 68304.95ms │
│ Average Time (HEAD)                    │   688.27ms │
│ Average Time (filter-pushdown-dynamic) │   689.95ms │
│ Queries Faster                         │          1 │
│ Queries Slower                         │          2 │
│ Queries with No Change                 │         96 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (b27a9c3) to af5f470 diff using: clickbench_partitioned
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.62 ms │                 2.64 ms │     no change │
│ QQuery 1  │    54.29 ms │                54.00 ms │     no change │
│ QQuery 2  │   139.32 ms │               131.79 ms │ +1.06x faster │
│ QQuery 3  │   157.90 ms │               159.57 ms │     no change │
│ QQuery 4  │  1023.55 ms │              1031.84 ms │     no change │
│ QQuery 5  │  1300.49 ms │              1287.39 ms │     no change │
│ QQuery 6  │    16.27 ms │                14.56 ms │ +1.12x faster │
│ QQuery 7  │    68.30 ms │                70.15 ms │     no change │
│ QQuery 8  │  1414.88 ms │              1411.41 ms │     no change │
│ QQuery 9  │  1762.87 ms │              1796.19 ms │     no change │
│ QQuery 10 │   484.09 ms │               476.85 ms │     no change │
│ QQuery 11 │   517.43 ms │               519.10 ms │     no change │
│ QQuery 12 │  1371.26 ms │              1422.57 ms │     no change │
│ QQuery 13 │  2071.77 ms │              2084.33 ms │     no change │
│ QQuery 14 │  1412.70 ms │              1396.25 ms │     no change │
│ QQuery 15 │  1158.06 ms │              1205.29 ms │     no change │
│ QQuery 16 │  2494.84 ms │              2495.92 ms │     no change │
│ QQuery 17 │  2421.90 ms │              2458.61 ms │     no change │
│ QQuery 18 │  4934.94 ms │              4663.00 ms │ +1.06x faster │
│ QQuery 19 │   136.32 ms │               140.24 ms │     no change │
│ QQuery 20 │  1946.06 ms │              1898.34 ms │     no change │
│ QQuery 21 │  2314.07 ms │              2338.94 ms │     no change │
│ QQuery 22 │  4098.86 ms │              3249.58 ms │ +1.26x faster │
│ QQuery 23 │  1085.25 ms │              1327.49 ms │  1.22x slower │
│ QQuery 24 │   247.11 ms │               209.19 ms │ +1.18x faster │
│ QQuery 25 │   641.40 ms │               647.15 ms │     no change │
│ QQuery 26 │   360.93 ms │               291.90 ms │ +1.24x faster │
│ QQuery 27 │  2970.13 ms │              2503.54 ms │ +1.19x faster │
│ QQuery 28 │ 25581.91 ms │             24355.34 ms │     no change │
│ QQuery 29 │   978.83 ms │               976.84 ms │     no change │
│ QQuery 30 │  1295.60 ms │              1272.14 ms │     no change │
│ QQuery 31 │  1334.83 ms │              1302.15 ms │     no change │
│ QQuery 32 │  4160.84 ms │              4166.91 ms │     no change │
│ QQuery 33 │  5340.68 ms │              5219.89 ms │     no change │
│ QQuery 34 │  6201.84 ms │              5949.84 ms │     no change │
│ QQuery 35 │  1915.14 ms │              1837.58 ms │     no change │
│ QQuery 36 │   191.59 ms │               180.57 ms │ +1.06x faster │
│ QQuery 37 │    94.81 ms │                86.02 ms │ +1.10x faster │
│ QQuery 38 │    95.22 ms │                93.91 ms │     no change │
│ QQuery 39 │   302.85 ms │               310.73 ms │     no change │
│ QQuery 40 │    61.46 ms │                59.89 ms │     no change │
│ QQuery 41 │    52.71 ms │                55.73 ms │  1.06x slower │
│ QQuery 42 │    38.45 ms │                39.10 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 84254.38ms │
│ Total Time (filter-pushdown-dynamic)   │ 81194.46ms │
│ Average Time (HEAD)                    │  1959.40ms │
│ Average Time (filter-pushdown-dynamic) │  1888.24ms │
│ Queries Faster                         │          9 │
│ Queries Slower                         │          2 │
│ Queries with No Change                 │         32 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

@adriangb
Contributor Author

run benchmark clickbench_partitioned
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true
DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (b27a9c3) to af5f470 diff using: clickbench_partitioned
Results will be posted here when complete

@adriangb
Contributor Author

run benchmark tpcds
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true
DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true

@adriangb adriangb marked this pull request as ready for review February 14, 2026 19:07
@adriangb
Contributor Author

@Dandandan @alamb these numbers are looking pretty good

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.59 ms │                 2.63 ms │     no change │
│ QQuery 1  │    55.45 ms │                52.36 ms │ +1.06x faster │
│ QQuery 2  │   147.16 ms │               132.93 ms │ +1.11x faster │
│ QQuery 3  │   155.28 ms │               159.63 ms │     no change │
│ QQuery 4  │  1006.83 ms │              1044.41 ms │     no change │
│ QQuery 5  │  1295.85 ms │              1268.86 ms │     no change │
│ QQuery 6  │    18.02 ms │                14.52 ms │ +1.24x faster │
│ QQuery 7  │    70.16 ms │                68.96 ms │     no change │
│ QQuery 8  │  1392.99 ms │              1387.51 ms │     no change │
│ QQuery 9  │  1763.73 ms │              1699.82 ms │     no change │
│ QQuery 10 │   474.73 ms │               469.21 ms │     no change │
│ QQuery 11 │   515.79 ms │               517.90 ms │     no change │
│ QQuery 12 │  1380.58 ms │              1388.35 ms │     no change │
│ QQuery 13 │  2068.06 ms │              1997.90 ms │     no change │
│ QQuery 14 │  1413.11 ms │              1403.19 ms │     no change │
│ QQuery 15 │  1157.26 ms │              1177.77 ms │     no change │
│ QQuery 16 │  2442.67 ms │              2436.55 ms │     no change │
│ QQuery 17 │  2416.65 ms │              2421.16 ms │     no change │
│ QQuery 18 │  4656.00 ms │              4789.27 ms │     no change │
│ QQuery 19 │   141.35 ms │               142.45 ms │     no change │
│ QQuery 20 │  1908.47 ms │              1891.88 ms │     no change │
│ QQuery 21 │  2353.18 ms │              2367.93 ms │     no change │
│ QQuery 22 │  4047.30 ms │              3256.70 ms │ +1.24x faster │
│ QQuery 23 │  1097.02 ms │              1301.34 ms │  1.19x slower │
│ QQuery 24 │   242.99 ms │               203.91 ms │ +1.19x faster │
│ QQuery 25 │   645.90 ms │               639.50 ms │     no change │
│ QQuery 26 │   341.98 ms │               314.57 ms │ +1.09x faster │
│ QQuery 27 │  3013.13 ms │              2526.00 ms │ +1.19x faster │
│ QQuery 28 │ 24783.63 ms │             24401.91 ms │     no change │
│ QQuery 29 │   956.82 ms │               970.88 ms │     no change │
│ QQuery 30 │  1340.82 ms │              1283.11 ms │     no change │
│ QQuery 31 │  1320.37 ms │              1314.20 ms │     no change │
│ QQuery 32 │  4278.84 ms │              4515.13 ms │  1.06x slower │
│ QQuery 33 │  5114.64 ms │              5190.63 ms │     no change │
│ QQuery 34 │  5754.13 ms │              5823.29 ms │     no change │
│ QQuery 35 │  1865.97 ms │              1856.49 ms │     no change │
│ QQuery 36 │   177.67 ms │               183.88 ms │     no change │
│ QQuery 37 │    91.66 ms │                94.01 ms │     no change │
│ QQuery 38 │    93.74 ms │                91.08 ms │     no change │
│ QQuery 39 │   303.73 ms │               297.59 ms │     no change │
│ QQuery 40 │    58.49 ms │                58.75 ms │     no change │
│ QQuery 41 │    50.42 ms │                50.85 ms │     no change │
│ QQuery 42 │    38.18 ms │                37.61 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 82453.33ms │
│ Total Time (filter-pushdown-dynamic)   │ 81246.66ms │
│ Average Time (HEAD)                    │  1917.52ms │
│ Average Time (filter-pushdown-dynamic) │  1889.46ms │
│ Queries Faster                         │          7 │
│ Queries Slower                         │          2 │
│ Queries with No Change                 │         34 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing filter-pushdown-dynamic (b27a9c3) to af5f470 diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and filter-pushdown-dynamic
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ filter-pushdown-dynamic ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    85.92 ms │                88.20 ms │     no change │
│ QQuery 2  │   242.54 ms │               252.54 ms │     no change │
│ QQuery 3  │   160.18 ms │               164.62 ms │     no change │
│ QQuery 4  │  1665.42 ms │              1682.98 ms │     no change │
│ QQuery 5  │   311.58 ms │               301.07 ms │     no change │
│ QQuery 6  │   431.75 ms │               429.27 ms │     no change │
│ QQuery 7  │   539.45 ms │               537.77 ms │     no change │
│ QQuery 8  │   238.58 ms │               247.73 ms │     no change │
│ QQuery 9  │   824.16 ms │               868.72 ms │  1.05x slower │
│ QQuery 10 │   169.95 ms │               164.29 ms │     no change │
│ QQuery 11 │  1044.10 ms │              1069.46 ms │     no change │
│ QQuery 12 │    57.19 ms │                55.60 ms │     no change │
│ QQuery 13 │   855.32 ms │               868.19 ms │     no change │
│ QQuery 14 │  2546.73 ms │              2528.43 ms │     no change │
│ QQuery 15 │    32.31 ms │                31.64 ms │     no change │
│ QQuery 16 │    65.08 ms │                64.00 ms │     no change │
│ QQuery 17 │   302.23 ms │               293.93 ms │     no change │
│ QQuery 18 │   477.25 ms │               488.30 ms │     no change │
│ QQuery 19 │   192.74 ms │               192.49 ms │     no change │
│ QQuery 20 │    27.66 ms │                27.51 ms │     no change │
│ QQuery 21 │    47.66 ms │                48.23 ms │     no change │
│ QQuery 22 │   777.31 ms │               753.22 ms │     no change │
│ QQuery 23 │  2527.93 ms │              2530.41 ms │     no change │
│ QQuery 24 │   256.95 ms │               254.55 ms │     no change │
│ QQuery 25 │   456.59 ms │               451.10 ms │     no change │
│ QQuery 26 │   275.87 ms │               286.97 ms │     no change │
│ QQuery 27 │   547.53 ms │               544.51 ms │     no change │
│ QQuery 28 │   498.76 ms │               484.31 ms │     no change │
│ QQuery 29 │   388.83 ms │               394.11 ms │     no change │
│ QQuery 30 │    92.99 ms │                95.57 ms │     no change │
│ QQuery 31 │   289.31 ms │               306.91 ms │  1.06x slower │
│ QQuery 32 │    76.97 ms │                74.76 ms │     no change │
│ QQuery 33 │   193.79 ms │               196.70 ms │     no change │
│ QQuery 34 │   192.71 ms │               202.87 ms │  1.05x slower │
│ QQuery 35 │   258.47 ms │               252.22 ms │     no change │
│ QQuery 36 │   354.53 ms │               343.95 ms │     no change │
│ QQuery 37 │   959.12 ms │              1019.77 ms │  1.06x slower │
│ QQuery 38 │   183.43 ms │               181.22 ms │     no change │
│ QQuery 39 │   218.58 ms │               214.02 ms │     no change │
│ QQuery 40 │   196.74 ms │               194.13 ms │     no change │
│ QQuery 41 │    26.43 ms │                26.41 ms │     no change │
│ QQuery 42 │   152.90 ms │               144.47 ms │ +1.06x faster │
│ QQuery 43 │   166.86 ms │               169.08 ms │     no change │
│ QQuery 44 │    29.24 ms │                28.91 ms │     no change │
│ QQuery 45 │    71.61 ms │                70.77 ms │     no change │
│ QQuery 46 │   325.08 ms │               321.10 ms │     no change │
│ QQuery 47 │  1259.33 ms │              1246.52 ms │     no change │
│ QQuery 48 │   728.97 ms │               734.84 ms │     no change │
│ QQuery 49 │   545.52 ms │               555.76 ms │     no change │
│ QQuery 50 │   743.89 ms │               748.74 ms │     no change │
│ QQuery 51 │   362.21 ms │               361.91 ms │     no change │
│ QQuery 52 │   145.96 ms │               150.55 ms │     no change │
│ QQuery 53 │   221.21 ms │               211.98 ms │     no change │
│ QQuery 54 │   181.83 ms │               183.74 ms │     no change │
│ QQuery 55 │   146.08 ms │               143.52 ms │     no change │
│ QQuery 56 │   195.19 ms │               192.69 ms │     no change │
│ QQuery 57 │   320.35 ms │               316.03 ms │     no change │
│ QQuery 58 │   616.47 ms │               603.38 ms │     no change │
│ QQuery 59 │   370.36 ms │               370.93 ms │     no change │
│ QQuery 60 │   204.07 ms │               205.64 ms │     no change │
│ QQuery 61 │   267.36 ms │               271.51 ms │     no change │
│ QQuery 62 │  1353.17 ms │              1348.46 ms │     no change │
│ QQuery 63 │   215.63 ms │               212.22 ms │     no change │
│ QQuery 64 │ 29835.87 ms │             29326.73 ms │     no change │
│ QQuery 65 │   443.14 ms │               456.41 ms │     no change │
│ QQuery 66 │   285.00 ms │               285.52 ms │     no change │
│ QQuery 67 │   871.59 ms │               866.72 ms │     no change │
│ QQuery 68 │   348.88 ms │               347.49 ms │     no change │
│ QQuery 69 │   172.10 ms │               171.81 ms │     no change │
│ QQuery 70 │   657.55 ms │               654.35 ms │     no change │
│ QQuery 71 │   177.72 ms │               173.72 ms │     no change │
│ QQuery 72 │  1778.52 ms │              1789.50 ms │     no change │
│ QQuery 73 │   182.30 ms │               176.25 ms │     no change │
│ QQuery 74 │   753.34 ms │               760.75 ms │     no change │
│ QQuery 75 │   459.03 ms │               468.48 ms │     no change │
│ QQuery 76 │   403.50 ms │               397.76 ms │     no change │
│ QQuery 77 │   339.76 ms │               337.09 ms │     no change │
│ QQuery 78 │   686.79 ms │               692.78 ms │     no change │
│ QQuery 79 │   373.88 ms │               375.40 ms │     no change │
│ QQuery 80 │   534.01 ms │               535.21 ms │     no change │
│ QQuery 81 │    58.32 ms │                58.16 ms │     no change │
│ QQuery 82 │   257.64 ms │               255.07 ms │     no change │
│ QQuery 83 │   107.34 ms │               106.23 ms │     no change │
│ QQuery 84 │    98.57 ms │                95.52 ms │     no change │
│ QQuery 85 │   411.41 ms │               405.68 ms │     no change │
│ QQuery 86 │    73.81 ms │                73.40 ms │     no change │
│ QQuery 87 │   190.00 ms │               186.15 ms │     no change │
│ QQuery 88 │   319.45 ms │               311.55 ms │     no change │
│ QQuery 89 │   238.29 ms │               244.93 ms │     no change │
│ QQuery 90 │    49.97 ms │                49.23 ms │     no change │
│ QQuery 91 │   155.84 ms │               150.72 ms │     no change │
│ QQuery 92 │    75.11 ms │                73.60 ms │     no change │
│ QQuery 93 │   286.46 ms │               287.42 ms │     no change │
│ QQuery 94 │    93.80 ms │                94.15 ms │     no change │
│ QQuery 95 │   318.58 ms │               317.20 ms │     no change │
│ QQuery 96 │   129.17 ms │               125.50 ms │     no change │
│ QQuery 97 │   245.32 ms │               243.50 ms │     no change │
│ QQuery 98 │   162.59 ms │               157.96 ms │     no change │
│ QQuery 99 │ 14544.65 ms │             14592.30 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 83833.23ms │
│ Total Time (filter-pushdown-dynamic)   │ 83449.68ms │
│ Average Time (HEAD)                    │   846.80ms │
│ Average Time (filter-pushdown-dynamic) │   842.93ms │
│ Queries Faster                         │          1 │
│ Queries Slower                         │          4 │
│ Queries with No Change                 │         94 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘

/// promoted. A value of 0.0 means all filters will be promoted.
/// Because there can be a high I/O cost to pushing down ineffective filters,
/// recommended values are in the range [0.8, 0.95], depending on random I/O costs.
pub filter_effectiveness_threshold: f64, default = 0.8
Contributor

Can we check 0.5 as well here?
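
To make concrete what a lower threshold would change, here is a minimal sketch based only on the definition in the PR summary (effectiveness = 1 - rows_matched / rows_total, with filters below the threshold demoted to post-scan). `FilterStats` and `is_promoted` are hypothetical names for illustration, not the PR's actual types, and returning `None` when no rows have been observed is an assumption.

```rust
/// Hypothetical per-filter counters; the PR's `SelectivityStats` may differ.
#[derive(Debug, Clone, Copy)]
struct FilterStats {
    rows_matched: u64,
    rows_total: u64,
}

impl FilterStats {
    /// Fraction of rows filtered out: 1 - rows_matched / rows_total.
    /// Returns `None` when no rows have been observed yet (an assumption;
    /// the PR may treat this edge case differently).
    fn effectiveness(&self) -> Option<f64> {
        if self.rows_total == 0 {
            None
        } else {
            Some(1.0 - self.rows_matched as f64 / self.rows_total as f64)
        }
    }
}

/// `Some(true)` if the filter would stay a pushed-down row filter at this
/// threshold, `Some(false)` if it would be demoted to a post-scan filter,
/// and `None` if there are no observations to decide on.
fn is_promoted(stats: FilterStats, threshold: f64) -> Option<bool> {
    stats.effectiveness().map(|e| e >= threshold)
}
```

Under this sketch, a filter that keeps 30 of every 100 rows has effectiveness 0.7, so it would be demoted at the default 0.8 but pushed down at 0.5. A threshold of 0.0 promotes every filter with observations, which agrees with the doc comment above.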

@Dandandan
Contributor

Is it just me, or is there no really large difference in performance? (I would have expected more.)

@adriangb
Contributor Author

}

if reorder_predicates {
candidates_with_exprs.sort_unstable_by(|(_, c1), (_, c2)| {
Contributor

AFAIK build_row_filter_with_metrics only runs once, when a file is opened?
That means it has no effect for files/partitions that are all opened right at startup.

(E.g. for TPCH / TPCDS this does not help much, since the number of files is limited; it only helps when partitions are started one after another.)

clickbench_partitioned consists of 100 files, so it might help more there.
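
A simplified model of this concern, as a sketch only: it assumes files are opened in waves of `concurrency`, that each file consults the shared tracker once at open time, and that the tracker is only updated when a stream completes. `Tracker` and `opens_with_stats` are hypothetical names, and the counter values are placeholders, not measurements.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a shared tracker:
/// filter name -> (rows_matched, rows_total).
#[derive(Default)]
struct Tracker {
    stats: HashMap<String, (u64, u64)>,
}

impl Tracker {
    /// True once at least one completed stream has reported rows for `filter`.
    fn has_stats(&self, filter: &str) -> bool {
        self.stats.get(filter).map_or(false, |&(_, total)| total > 0)
    }

    /// Accumulate counters reported by a completed stream.
    fn update(&mut self, filter: &str, matched: u64, total: u64) {
        let entry = self.stats.entry(filter.to_string()).or_insert((0, 0));
        entry.0 += matched;
        entry.1 += total;
    }
}

/// Counts how many file opens can see any statistics, assuming files are
/// opened in waves of `concurrency` and stats arrive only on completion.
fn opens_with_stats(num_files: usize, concurrency: usize) -> usize {
    let concurrency = concurrency.max(1); // avoid a zero-sized wave
    let mut tracker = Tracker::default();
    let mut informed = 0;
    let mut opened = 0;
    while opened < num_files {
        let wave = concurrency.min(num_files - opened);
        // Every file in the wave builds its row filter before any of them finish.
        for _ in 0..wave {
            if tracker.has_stats("filter") {
                informed += 1;
            }
        }
        // Streams complete and report placeholder counters.
        for _ in 0..wave {
            tracker.update("filter", 10, 100);
        }
        opened += wave;
    }
    informed
}
```

In this model, 8 files opened 8 at a time never see statistics, while 100 files opened 8 at a time leave 92 opens that can use them. Real scheduling is not wave-based, so the numbers only illustrate the direction of the effect.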

Labels

common (Related to common crate), core (Core DataFusion crate), datasource (Changes to the datasource crate), documentation (Improvements or additions to documentation), proto (Related to proto crate), sqllogictest (SQL Logic Tests (.slt))

5 participants