feat: adaptive filter selectivity tracking for Parquet row filters #19639
adriangb wants to merge 2 commits into apache:main
Conversation
run benchmarks

show benchmark queue

🤖 Hi @adriangb, you asked to view the benchmark queue (#19639 (comment)).

🤖: Benchmark completed Details
run benchmarks tpch

🤖 Hi @adriangb, thanks for the request (#19639 (comment)). Please choose one or more of these with …

run benchmark tpch

🤖: Benchmark completed Details

run benchmark tpch

🤖: Benchmark completed Details
This is probably not a good issue to pick up. This is a draft PR for an unproven idea.

run benchmark tpch

🤖: Benchmark completed Details

run benchmark tpch

🤖: Benchmark completed Details
Did you find any evidence that the selectivity of predicates changes over the course of the query (or, put another way, that reordering them during execution would help)?

One case where that happens for sure is dynamic filters 😉 (although we treat each version of one as a different filter, the point is that we need to be dynamic about new filters showing up mid-query). But I also don't think it's unusual to have unevenly distributed data across files. I do agree that if we change how we apply a filter between files, that probably captures 95% of the benefit (as opposed to within a scan of a single file). But the main point is that we start from "we know nothing", which we treat as "nothing is selective", and then once we know a filter is selective enough we move it over to be a row filter.
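A minimal sketch of that promotion rule (the FilterStats type and should_promote helper here are illustrative stand-ins, not the PR's API): with no observations the effectiveness is treated as 0.0, so a filter stays out of the row-filter path until data from earlier files proves it selective.

```rust
/// Hypothetical per-filter statistics; illustrative only, not the PR's types.
#[derive(Default)]
struct FilterStats {
    rows_evaluated: u64,
    rows_passed: u64,
}

impl FilterStats {
    /// Fraction of rows the filter removed so far (0.0 = removed nothing).
    fn effectiveness(&self) -> f64 {
        if self.rows_evaluated == 0 {
            // "We know nothing" is treated as "not selective": the filter stays
            // a post-scan filter until earlier files prove otherwise.
            0.0
        } else {
            1.0 - self.rows_passed as f64 / self.rows_evaluated as f64
        }
    }

    /// Promote to a Parquet row filter only once the filter has shown it
    /// removes enough rows to be worth the pushdown I/O cost.
    fn should_promote(&self, threshold: f64) -> bool {
        self.effectiveness() >= threshold
    }
}

fn main() {
    let mut stats = FilterStats::default();
    assert!(!stats.should_promote(0.8)); // unknown => not pushed down yet

    // After one file: 1_000 rows evaluated, 50 survived => 95% filtered out.
    stats.rows_evaluated += 1_000;
    stats.rows_passed += 50;
    assert!(stats.should_promote(0.8)); // now worth promoting to a row filter
}
```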
Hey @adriangb, I've been thinking about something like this since the New Year. It's really cool to see you putting together a draft for it. I haven't had a chance to take a full pass through your code, but I wanted to share some research I did earlier that might be relevant:
Before seeing your PR and the comments in #3463 I was thinking about using simpler heuristics for sorting predicates.
From a quick skim of the original ClickHouse PR, they still rely on some simple heuristics when column statistics aren't available. I would like to give your PR a proper review once I'm home, but I already love the direction you're taking.
Seeing the results in #19694 (comment), I think something like this is needed. However, I also see some regressions there that are related to the execution order of the filter expressions (such as regex / string-matching functions coming earlier now). I guess we also need some adaptiveness for that (e.g. based on measured time or some static heuristic).
Perhaps that should also help with other expensive expressions, like join filter pushdown: we materialize columns using cheap/selective filters first and then expensive ones like …
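One way to read "adaptiveness based on measured time" is to rank predicates by selectivity per unit of measured evaluation cost rather than by selectivity alone. The sketch below is not part of this PR, and PredicateCost / rank are made-up names, but it shows why a cheap equality check can come out ahead of a slightly more selective regex:

```rust
use std::cmp::Ordering;

/// Made-up bookkeeping per predicate; not part of this PR.
struct PredicateCost {
    name: &'static str,
    /// Fraction of rows removed, observed so far.
    selectivity: f64,
    /// Measured evaluation cost in nanoseconds per row.
    ns_per_row: f64,
}

impl PredicateCost {
    /// Rows removed per nanosecond spent: a simple "bang for the buck" score.
    fn rank(&self) -> f64 {
        self.selectivity / self.ns_per_row.max(1e-9)
    }
}

/// Order predicates so cheap, selective ones run before expensive ones.
fn order_predicates(preds: &mut [PredicateCost]) {
    preds.sort_by(|a, b| b.rank().partial_cmp(&a.rank()).unwrap_or(Ordering::Equal));
}

fn main() {
    let mut preds = [
        PredicateCost { name: "url ~ regex", selectivity: 0.90, ns_per_row: 500.0 },
        PredicateCost { name: "id = 42", selectivity: 0.80, ns_per_row: 5.0 },
    ];
    order_predicates(&mut preds);
    // The equality predicate wins despite lower selectivity, because it is ~100x cheaper.
    assert_eq!(preds[0].name, "id = 42");
}
```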
Yeah, makes sense to me 👍🏻. But if we can't show via benchmarks that this is an immediate win... can we justify moving forward with it?
In #19694 (comment) it seems this adaptivity is better than main (almost no regression on tpch and clickbench) when also setting …
Force-pushed from c0b86d7 to b13bd64.
run benchmark tpcds

run benchmark clickbench
🤖 Hi @adriangb, thanks for the request (#19639 (comment)). Please choose one or more of these with … You can also set environment variables on subsequent lines. Unsupported benchmarks: clickbench.
run benchmark clickbench_partitioned

🤖: Benchmark completed Details

🤖: Benchmark completed Details

run benchmark clickbench_partitioned

run benchmark tpcds
@Dandandan @alamb these numbers are looking pretty good

🤖: Benchmark completed Details

🤖: Benchmark completed Details
/// promoted. A value of 0.0 means all filters will be promoted.
/// Because there can be a high I/O cost to pushing down ineffective filters,
/// recommended values are in the range [0.8, 0.95], depending on random I/O costs.
pub filter_effectiveness_threshold: f64, default = 0.8
Can we check 0.5 as well here?
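For experimenting with different thresholds, something like the following should work once the option lands. The datafusion.execution.parquet.* key is an assumption based on how the other parquet options are exposed, and the table name/path are placeholders:

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Assumed key: the PR adds parquet_options.filter_effectiveness_threshold,
    // so it would presumably surface like the other parquet options.
    ctx.sql("SET datafusion.execution.parquet.filter_effectiveness_threshold = 0.5")
        .await?;

    // Placeholder table/path, just to have something the filters can run against.
    ctx.register_parquet("hits", "data/hits.parquet", ParquetReadOptions::default())
        .await?;

    // Lower threshold => more filters get promoted to Parquet row filters;
    // 0.0 promotes everything, values near 1.0 promote almost nothing.
    let df = ctx
        .sql("SELECT count(*) FROM hits WHERE \"URL\" LIKE '%google%'")
        .await?;
    df.show().await?;
    Ok(())
}
```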
Is it just me, or am I not seeing a really large difference in performance? (I would have expected more.)
}

if reorder_predicates {
    candidates_with_exprs.sort_unstable_by(|(_, c1), (_, c2)| {
AFAIK build_row_filter_with_metrics only runs once, on file open?
Which means that for all files/partitions that are opened directly at start it will not do anything.
(E.g. for TPCH / TPCDS this is not helping much, as the number of files is limited, so it will only help if partitions are started one after another.)
clickbench_partitioned consists of 100 files - so it might help more there.
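To illustrate that point about per-file opening, here is a sketch (Tracker and Candidate are made-up stand-ins, not the PR's types) of the effectiveness-based sort being consulted once per opened file: files opened before any stream has finished see the original order, and only later files benefit from the tracked stats.

```rust
use std::collections::HashMap;

/// Stand-in for the physical filter expression and its candidate metadata.
struct Candidate {
    expr: String,
}

#[derive(Default)]
struct Tracker {
    /// Observed effectiveness per expression, updated as file streams complete.
    effectiveness: HashMap<String, f64>,
}

impl Tracker {
    /// Order candidates so the most effective filters (observed on files that
    /// have already completed) are evaluated first for the file being opened.
    fn reorder(&self, candidates: &mut [Candidate]) {
        candidates.sort_by(|a, b| {
            let ea = self.effectiveness.get(&a.expr).copied().unwrap_or(0.0);
            let eb = self.effectiveness.get(&b.expr).copied().unwrap_or(0.0);
            eb.partial_cmp(&ea).unwrap()
        });
    }
}

fn main() {
    let mut tracker = Tracker::default();
    let mut candidates = vec![
        Candidate { expr: "a LIKE '%x%'".into() },
        Candidate { expr: "b = 1".into() },
    ];

    // File 1 opens before any stats exist: order is unchanged (no benefit yet).
    tracker.reorder(&mut candidates);
    assert_eq!(candidates[0].expr, "a LIKE '%x%'");

    // File 1 finishes and reports that `b = 1` filtered far more rows...
    tracker.effectiveness.insert("b = 1".into(), 0.95);
    tracker.effectiveness.insert("a LIKE '%x%'".into(), 0.10);

    // ...so file 2 (opened later) evaluates it first. Files opened concurrently
    // at startup never see this, which is the reviewer's point above.
    tracker.reorder(&mut candidates);
    assert_eq!(candidates[0].expr, "b = 1");
}
```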
Summary
This PR implements cross-file tracking of filter selectivity in ParquetSource to adaptively reorder and demote low-selectivity filters, as discussed in #3463 (comment).
Key changes:
- SelectivityTracker tracks filter effectiveness across files, using an ExprKey wrapper for structural equality
- ParquetOpener queries shared stats to partition filters into row filters (push down) vs post-scan filters (inline application)
- Post-scan filters are applied via apply_post_scan_filters(), then filter columns are removed from the output
- A SelectivityUpdatingStream wrapper updates the tracker when the stream completes
- build_row_filter_with_metrics() returns per-filter metrics for selectivity tracking
Configuration:
- parquet_options.filter_effectiveness_threshold (default: 0.8)
Files added:
- datafusion/datasource-parquet/src/selectivity.rs - Core tracking infrastructure
Files modified:
- opener.rs - Filter partitioning, post-scan application, SelectivityUpdatingStream
- row_filter.rs - FilterMetrics, RowFilterWithMetrics, effectiveness-based reordering
- source.rs - selectivity_tracker field and builder methods
- config.rs - filter_effectiveness_threshold config option
Test plan
- ExprKey hash/eq consistency
- SelectivityStats::effectiveness() edge cases
- SelectivityTracker::partition_filters() threshold logic
🤖 Generated with Claude Code
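As a rough sketch of the tracker described in the summary above (names mirror the summary, but the bodies are illustrative and will differ from the actual code in datafusion/datasource-parquet/src/selectivity.rs), the effectiveness() and partition_filters() pieces could look like this:

```rust
use std::collections::HashMap;

/// Illustrative sketch only; keyed by a stand-in for ExprKey (here the display string).
#[derive(Default)]
struct SelectivityTracker {
    stats: HashMap<String, (u64, u64)>, // (rows evaluated, rows passed)
}

impl SelectivityTracker {
    fn effectiveness(&self, expr: &str) -> f64 {
        match self.stats.get(expr) {
            Some((evaluated, passed)) if *evaluated > 0 => {
                1.0 - *passed as f64 / *evaluated as f64
            }
            // Unknown filters are treated as ineffective until observed.
            _ => 0.0,
        }
    }

    /// Split filters into those worth pushing down as Parquet row filters and
    /// those applied after the scan, based on the configured threshold.
    fn partition_filters(
        &self,
        filters: Vec<String>,
        threshold: f64,
    ) -> (Vec<String>, Vec<String>) {
        filters
            .into_iter()
            .partition(|f| self.effectiveness(f) >= threshold)
    }
}

fn main() {
    let mut tracker = SelectivityTracker::default();
    tracker.stats.insert("a = 1".into(), (1_000, 10));         // 99% effective
    tracker.stats.insert("b LIKE '%x%'".into(), (1_000, 900)); // 10% effective

    let (row_filters, post_scan) = tracker.partition_filters(
        vec!["a = 1".into(), "b LIKE '%x%'".into(), "c > 0".into()],
        0.8,
    );
    assert_eq!(row_filters, vec!["a = 1".to_string()]);
    assert_eq!(post_scan.len(), 2); // the unproven `c > 0` stays post-scan
}
```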