
Optimize Clickbench Query 29 by adding a new Optimizer rule #20180

Open
devanshu0987 wants to merge 13 commits into apache:main from devanshu0987:query_29


@devanshu0987 (Contributor) commented Feb 6, 2026

Which issue does this PR close?

Rationale for this change

  • ClickBench Query 29 executes 90 separate SUM operations:
    • SUM(ResolutionWidth), SUM(ResolutionWidth + 1), SUM(ResolutionWidth + 2), ..., SUM(ResolutionWidth + 89)
  • https://www.redpanda.com/blog/oxla-our-road-to-improving-oxla-results-on-clickbench#what-we-had-to-optimize
    • This blog post from Redpanda describes an optimization they made for this query.
  • Transform aggregations SUM(A + k) into SUM(A) + k * COUNT(*)
  • This turns the 90 SUM aggregations into 1 SUM and 1 COUNT aggregation, whose results are reused by all projections.
    • The rule runs before CommonSubexprEliminate so that the common SUM and COUNT expressions can be reused.
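To make the rewrite concrete, here is a minimal Rust sketch (plain `Vec<Option<i64>>` values standing in for a nullable column; this is not DataFusion code) that checks SUM(A + k) against the rewritten form for the 90 offsets used by Query 29. One caveat it makes explicit: because A + k is NULL wherever A is NULL, the exact identity is SUM(A + k) = SUM(A) + k * COUNT(A); COUNT(*) gives the same result only when A contains no NULLs. SUM(A - k) works the same way with subtraction.

```rust
/// SUM(col): NULLs are skipped; if there are no non-NULL values the result is NULL.
fn sum(values: &[Option<i64>]) -> Option<i64> {
    values.iter().flatten().copied().reduce(|acc, v| acc + v)
}

/// COUNT(col): counts only the non-NULL values.
fn count(values: &[Option<i64>]) -> i64 {
    values.iter().flatten().count() as i64
}

/// Original form: SUM(A + k), where A + k is NULL whenever A is NULL.
fn sum_plus_k_naive(a: &[Option<i64>], k: i64) -> Option<i64> {
    let shifted: Vec<Option<i64>> = a.iter().map(|v| v.map(|x| x + k)).collect();
    sum(&shifted)
}

/// Rewritten form: SUM(A) + k * COUNT(A), which is NULL if SUM(A) is NULL.
fn sum_plus_k_rewritten(a: &[Option<i64>], k: i64) -> Option<i64> {
    sum(a).map(|s| s + k * count(a))
}

fn main() {
    // A nullable column standing in for ResolutionWidth.
    let a: Vec<Option<i64>> = vec![Some(1920), None, Some(1366), Some(1280)];
    // Query 29 uses offsets 0..=89.
    for k in 0..90 {
        assert_eq!(sum_plus_k_naive(&a, k), sum_plus_k_rewritten(&a, k));
    }
    // COUNT(*) also counts the NULL row, so using it here over-counts by k.
    let count_star = a.len() as i64;
    assert_ne!(sum(&a).map(|s| s + 5 * count_star), sum_plus_k_naive(&a, 5));
    println!("SUM(A + k) == SUM(A) + k * COUNT(A) for k in 0..90");
}
```

A real rule would also have to decide how to handle integer overflow in k * COUNT(A), which this sketch ignores.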

What changes are included in this PR?

  • New optimizer rule RewriteAggregateWithConstant

Are these changes tested?

  • Unit tests
  • SLT tests
  • EXPLAIN on Query 29 shows the reuse
    • I would like the core contributors to run the benchmarks to confirm the impact.

Are there any user-facing changes?

No

AI Usage

  • Yes. I have made heavy use of assistance from various models.
  • However, I have gone through multiple rounds of cleanup to remove unintended or unnecessary changes.

github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Feb 6, 2026
devanshu0987 marked this pull request as draft February 6, 2026 04:11
devanshu0987 marked this pull request as ready for review February 6, 2026 07:41
@alamb (Contributor) commented Feb 7, 2026

run benchmark sql_planner

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing query_29 (59dad7f) to aef2965 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=query_29
Results will be posted here when complete

@alamb (Contributor) commented Feb 7, 2026

run benchmarks

@alamb (Contributor) commented Feb 7, 2026

Thanks @devanshu0987 -- I kicked off some benchmarks

@alamb-ghbot

🤖: Benchmark completed


group                                                 main                                   query_29
-----                                                 ----                                   --------
logical_aggregate_with_join                           1.00    645.8±6.88µs        ? ?/sec    1.00    643.6±7.13µs        ? ?/sec
logical_plan_struct_join_agg_sort                     1.01    291.4±6.60µs        ? ?/sec    1.00    289.5±3.76µs        ? ?/sec
logical_select_all_from_1000                          1.03     10.7±0.17ms        ? ?/sec    1.00     10.4±0.07ms        ? ?/sec
logical_select_one_from_700                           1.01    423.6±4.09µs        ? ?/sec    1.00    421.1±6.45µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.01    380.4±3.20µs        ? ?/sec    1.00    377.6±6.47µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.01    367.1±4.34µs        ? ?/sec    1.00    363.8±5.17µs        ? ?/sec
physical_intersection                                 1.00  1617.1±19.12µs        ? ?/sec    1.00  1612.5±33.57µs        ? ?/sec
physical_join_consider_sort                           1.01      2.3±0.03ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
physical_join_distinct                                1.01    358.9±7.61µs        ? ?/sec    1.00    354.9±2.64µs        ? ?/sec
physical_many_self_joins                              1.00     12.7±0.17ms        ? ?/sec    1.00     12.7±0.06ms        ? ?/sec
physical_plan_clickbench_all                          1.00    199.2±1.58ms        ? ?/sec    1.03    205.1±2.29ms        ? ?/sec
physical_plan_clickbench_q1                           1.01      2.1±0.06ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_clickbench_q10                          1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q11                          1.00      4.1±0.03ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      4.2±0.02ms        ? ?/sec    1.00      4.2±0.06ms        ? ?/sec
physical_plan_clickbench_q13                          1.00      3.7±0.03ms        ? ?/sec    1.00      3.7±0.04ms        ? ?/sec
physical_plan_clickbench_q14                          1.00      4.1±0.02ms        ? ?/sec    1.01      4.1±0.05ms        ? ?/sec
physical_plan_clickbench_q15                          1.00      3.8±0.02ms        ? ?/sec    1.00      3.8±0.10ms        ? ?/sec
physical_plan_clickbench_q16                          1.00      3.6±0.02ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q17                          1.00      3.8±0.04ms        ? ?/sec    1.00      3.7±0.07ms        ? ?/sec
physical_plan_clickbench_q18                          1.00      2.6±0.02ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q19                          1.00      4.1±0.02ms        ? ?/sec    1.00      4.1±0.10ms        ? ?/sec
physical_plan_clickbench_q2                           1.00      2.8±0.04ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q20                          1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.03ms        ? ?/sec
physical_plan_clickbench_q21                          1.00      2.8±0.01ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec
physical_plan_clickbench_q22                          1.00      3.9±0.03ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q23                          1.01      4.2±0.07ms        ? ?/sec    1.00      4.2±0.05ms        ? ?/sec
physical_plan_clickbench_q24                          1.00      4.8±0.03ms        ? ?/sec    1.01      4.8±0.05ms        ? ?/sec
physical_plan_clickbench_q25                          1.01      3.5±0.03ms        ? ?/sec    1.00      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q26                          1.00      2.9±0.02ms        ? ?/sec    1.00      2.9±0.05ms        ? ?/sec
physical_plan_clickbench_q27                          1.00      3.5±0.08ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_clickbench_q28                          1.00      4.4±0.07ms        ? ?/sec    1.00      4.4±0.04ms        ? ?/sec
physical_plan_clickbench_q29                          1.01      4.8±0.19ms        ? ?/sec    1.00      4.7±0.07ms        ? ?/sec
physical_plan_clickbench_q3                           1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q30                          1.00     15.5±0.42ms        ? ?/sec    1.39     21.5±0.50ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      4.4±0.05ms        ? ?/sec    1.00      4.4±0.05ms        ? ?/sec
physical_plan_clickbench_q32                          1.00      4.4±0.09ms        ? ?/sec    1.00      4.4±0.07ms        ? ?/sec
physical_plan_clickbench_q33                          1.00      3.6±0.04ms        ? ?/sec    1.00      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q34                          1.00      3.2±0.08ms        ? ?/sec    1.00      3.2±0.03ms        ? ?/sec
physical_plan_clickbench_q35                          1.00      3.3±0.06ms        ? ?/sec    1.01      3.4±0.02ms        ? ?/sec
physical_plan_clickbench_q36                          1.00      4.2±0.10ms        ? ?/sec    1.00      4.2±0.03ms        ? ?/sec
physical_plan_clickbench_q37                          1.00      4.6±0.07ms        ? ?/sec    1.00      4.6±0.03ms        ? ?/sec
physical_plan_clickbench_q38                          1.01      4.6±0.11ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_plan_clickbench_q39                          1.01      4.1±0.10ms        ? ?/sec    1.00      4.0±0.03ms        ? ?/sec
physical_plan_clickbench_q4                           1.01      2.2±0.02ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
physical_plan_clickbench_q40                          1.01      4.9±0.17ms        ? ?/sec    1.00      4.8±0.03ms        ? ?/sec
physical_plan_clickbench_q41                          1.00      4.2±0.03ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
physical_plan_clickbench_q42                          1.00      4.2±0.03ms        ? ?/sec    1.00      4.2±0.03ms        ? ?/sec
physical_plan_clickbench_q43                          1.01      4.5±0.03ms        ? ?/sec    1.00      4.5±0.03ms        ? ?/sec
physical_plan_clickbench_q44                          1.00      2.3±0.02ms        ? ?/sec    1.00      2.3±0.05ms        ? ?/sec
physical_plan_clickbench_q45                          1.00      2.3±0.02ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q46                          1.01      3.2±0.02ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_clickbench_q47                          1.00      4.7±0.03ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_plan_clickbench_q48                          1.00      5.1±0.04ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
physical_plan_clickbench_q49                          1.00      5.4±0.04ms        ? ?/sec    1.01      5.4±0.03ms        ? ?/sec
physical_plan_clickbench_q5                           1.00      2.5±0.03ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q50                          1.00      4.1±0.03ms        ? ?/sec    1.00      4.1±0.03ms        ? ?/sec
physical_plan_clickbench_q51                          1.00      3.5±0.03ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_clickbench_q6                           1.01      2.5±0.05ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q7                           1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_clickbench_q8                           1.02      3.4±0.06ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_clickbench_q9                           1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
physical_plan_struct_join_agg_sort                    1.00      2.7±0.04ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_tpcds_all                               1.00  1892.6±10.34ms        ? ?/sec    1.00  1897.3±11.17ms        ? ?/sec
physical_plan_tpch_all                                1.00    125.4±0.71ms        ? ?/sec    1.00    125.3±0.76ms        ? ?/sec
physical_plan_tpch_q1                                 1.01      3.0±0.05ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
physical_plan_tpch_q10                                1.00      7.2±0.04ms        ? ?/sec    1.01      7.2±0.16ms        ? ?/sec
physical_plan_tpch_q11                                1.00      8.4±0.03ms        ? ?/sec    1.02      8.6±0.10ms        ? ?/sec
physical_plan_tpch_q12                                1.00      3.1±0.02ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_tpch_q13                                1.00      3.0±0.03ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q14                                1.00      3.0±0.04ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q16                                1.00      5.2±0.05ms        ? ?/sec    1.00      5.2±0.04ms        ? ?/sec
physical_plan_tpch_q17                                1.00      5.6±0.05ms        ? ?/sec    1.00      5.6±0.06ms        ? ?/sec
physical_plan_tpch_q18                                1.00      5.9±0.07ms        ? ?/sec    1.00      5.9±0.03ms        ? ?/sec
physical_plan_tpch_q19                                1.00      5.0±0.06ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
physical_plan_tpch_q2                                 1.00     12.2±0.10ms        ? ?/sec    1.00     12.2±0.09ms        ? ?/sec
physical_plan_tpch_q20                                1.00      8.0±0.10ms        ? ?/sec    1.00      8.0±0.05ms        ? ?/sec
physical_plan_tpch_q21                                1.00     10.0±0.19ms        ? ?/sec    1.00     10.0±0.06ms        ? ?/sec
physical_plan_tpch_q22                                1.00      6.4±0.10ms        ? ?/sec    1.00      6.4±0.03ms        ? ?/sec
physical_plan_tpch_q3                                 1.01      5.6±0.06ms        ? ?/sec    1.00      5.6±0.03ms        ? ?/sec
physical_plan_tpch_q4                                 1.00      3.0±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q5                                 1.00      5.9±0.03ms        ? ?/sec    1.00      5.9±0.04ms        ? ?/sec
physical_plan_tpch_q6                                 1.00  1585.1±13.50µs        ? ?/sec    1.01  1598.2±24.06µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      7.1±0.05ms        ? ?/sec    1.00      7.1±0.04ms        ? ?/sec
physical_plan_tpch_q8                                 1.00      9.2±0.05ms        ? ?/sec    1.01      9.2±0.04ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      6.6±0.03ms        ? ?/sec    1.00      6.6±0.22ms        ? ?/sec
physical_select_aggregates_from_200                   1.01     17.4±0.09ms        ? ?/sec    1.00     17.2±0.31ms        ? ?/sec
physical_select_all_from_1000                         1.02     23.8±0.13ms        ? ?/sec    1.00     23.4±0.08ms        ? ?/sec
physical_select_one_from_700                          1.01   1359.0±9.79µs        ? ?/sec    1.00  1349.0±12.65µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.00     10.9±0.09ms        ? ?/sec    1.00     10.9±0.12ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.00     29.4±0.22ms        ? ?/sec    1.00     29.5±0.24ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    194.6±1.33ms        ? ?/sec    1.00    195.2±2.52ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00   1059.2±7.25ms        ? ?/sec    1.00   1063.0±6.21ms        ? ?/sec
physical_theta_join_consider_sort                     1.01      2.7±0.02ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_unnest_to_join                               1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00  1567.9±10.75µs        ? ?/sec    1.00  1569.4±18.95µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.01      2.9±0.03ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00  1088.0±22.59µs        ? ?/sec    1.01  1094.3±14.38µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00   1258.6±8.38µs        ? ?/sec    1.00  1258.0±11.79µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00  1319.2±14.85µs        ? ?/sec    1.00  1317.9±11.44µs        ? ?/sec
with_param_values_many_columns                        1.00    580.4±6.20µs        ? ?/sec    1.00    579.2±5.56µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing query_29 (59dad7f) to aef2965 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed


Comparing HEAD and query_29
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃    query_29 ┃    Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0 │  2254.27 ms │  2343.55 ms │ no change │
│ QQuery 1 │   876.02 ms │   915.26 ms │ no change │
│ QQuery 2 │  1817.12 ms │  1746.38 ms │ no change │
│ QQuery 3 │  1072.20 ms │  1063.61 ms │ no change │
│ QQuery 4 │  2221.79 ms │  2179.35 ms │ no change │
│ QQuery 5 │ 27920.46 ms │ 28292.51 ms │ no change │
│ QQuery 6 │  3848.13 ms │  4005.75 ms │ no change │
│ QQuery 7 │  2673.85 ms │  2778.38 ms │ no change │
└──────────┴─────────────┴─────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)       │ 42683.84ms │
│ Total Time (query_29)   │ 43324.79ms │
│ Average Time (HEAD)     │  5335.48ms │
│ Average Time (query_29) │  5415.60ms │
│ Queries Faster          │          0 │
│ Queries Slower          │          0 │
│ Queries with No Change  │          8 │
│ Queries with Failure    │          0 │
└─────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃    query_29 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.56 ms │     2.63 ms │     no change │
│ QQuery 1  │    50.07 ms │    49.71 ms │     no change │
│ QQuery 2  │   132.14 ms │   132.96 ms │     no change │
│ QQuery 3  │   143.96 ms │   157.85 ms │  1.10x slower │
│ QQuery 4  │   945.91 ms │  1020.43 ms │  1.08x slower │
│ QQuery 5  │  1242.50 ms │  1295.05 ms │     no change │
│ QQuery 6  │     6.01 ms │     6.75 ms │  1.12x slower │
│ QQuery 7  │    53.09 ms │    57.08 ms │  1.08x slower │
│ QQuery 8  │  1330.59 ms │  1384.87 ms │     no change │
│ QQuery 9  │  1687.62 ms │  1745.12 ms │     no change │
│ QQuery 10 │   340.57 ms │   349.00 ms │     no change │
│ QQuery 11 │   388.59 ms │   402.39 ms │     no change │
│ QQuery 12 │  1140.66 ms │  1237.61 ms │  1.08x slower │
│ QQuery 13 │  1832.41 ms │  1988.08 ms │  1.08x slower │
│ QQuery 14 │  1184.64 ms │  1250.19 ms │  1.06x slower │
│ QQuery 15 │  1121.50 ms │  1208.25 ms │  1.08x slower │
│ QQuery 16 │  2386.75 ms │  2712.96 ms │  1.14x slower │
│ QQuery 17 │  2379.26 ms │  2528.66 ms │  1.06x slower │
│ QQuery 18 │  6784.83 ms │  5216.15 ms │ +1.30x faster │
│ QQuery 19 │   120.16 ms │   119.03 ms │     no change │
│ QQuery 20 │  1986.45 ms │  1882.84 ms │ +1.06x faster │
│ QQuery 21 │  2254.82 ms │  2187.07 ms │     no change │
│ QQuery 22 │  9581.37 ms │  3756.64 ms │ +2.55x faster │
│ QQuery 23 │ 29241.52 ms │ 12210.32 ms │ +2.39x faster │
│ QQuery 24 │   211.73 ms │   214.71 ms │     no change │
│ QQuery 25 │   466.39 ms │   471.70 ms │     no change │
│ QQuery 26 │   214.00 ms │   216.25 ms │     no change │
│ QQuery 27 │  2674.61 ms │  2663.18 ms │     no change │
│ QQuery 28 │ 23643.36 ms │ 23510.20 ms │     no change │
│ QQuery 29 │   962.70 ms │   127.32 ms │ +7.56x faster │
│ QQuery 30 │  1201.79 ms │  1274.10 ms │  1.06x slower │
│ QQuery 31 │  1464.11 ms │  1322.86 ms │ +1.11x faster │
│ QQuery 32 │  6155.65 ms │  4443.68 ms │ +1.39x faster │
│ QQuery 33 │  6216.62 ms │  5543.97 ms │ +1.12x faster │
│ QQuery 34 │  6258.25 ms │  6056.19 ms │     no change │
│ QQuery 35 │  1815.04 ms │  1865.88 ms │     no change │
│ QQuery 36 │   186.59 ms │   195.90 ms │     no change │
│ QQuery 37 │    72.14 ms │    73.77 ms │     no change │
│ QQuery 38 │   118.22 ms │   114.96 ms │     no change │
│ QQuery 39 │   341.79 ms │   350.45 ms │     no change │
│ QQuery 40 │    40.02 ms │    40.80 ms │     no change │
│ QQuery 41 │    35.73 ms │    35.38 ms │     no change │
│ QQuery 42 │    29.63 ms │    30.96 ms │     no change │
└───────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary       ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)       │ 118446.35ms │
│ Total Time (query_29)   │  91453.91ms │
│ Average Time (HEAD)     │   2754.57ms │
│ Average Time (query_29) │   2126.84ms │
│ Queries Faster          │           8 │
│ Queries Slower          │          11 │
│ Queries with No Change  │          24 │
│ Queries with Failure    │           0 │
└─────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃  query_29 ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 101.81 ms │ 102.65 ms │    no change │
│ QQuery 2  │  32.60 ms │  33.32 ms │    no change │
│ QQuery 3  │  35.34 ms │  35.03 ms │    no change │
│ QQuery 4  │  30.86 ms │  30.91 ms │    no change │
│ QQuery 5  │  90.84 ms │  89.52 ms │    no change │
│ QQuery 6  │  20.76 ms │  20.90 ms │    no change │
│ QQuery 7  │ 160.01 ms │ 154.53 ms │    no change │
│ QQuery 8  │  39.93 ms │  41.71 ms │    no change │
│ QQuery 9  │ 102.19 ms │ 107.31 ms │ 1.05x slower │
│ QQuery 10 │  65.44 ms │  68.16 ms │    no change │
│ QQuery 11 │  18.29 ms │  18.91 ms │    no change │
│ QQuery 12 │  51.91 ms │  51.84 ms │    no change │
│ QQuery 13 │  49.86 ms │  48.99 ms │    no change │
│ QQuery 14 │  15.26 ms │  15.20 ms │    no change │
│ QQuery 15 │  30.53 ms │  30.53 ms │    no change │
│ QQuery 16 │  29.01 ms │  28.39 ms │    no change │
│ QQuery 17 │ 139.32 ms │ 143.69 ms │    no change │
│ QQuery 18 │ 284.97 ms │ 281.62 ms │    no change │
│ QQuery 19 │  40.55 ms │  39.74 ms │    no change │
│ QQuery 20 │  56.51 ms │  54.71 ms │    no change │
│ QQuery 21 │ 183.84 ms │ 192.68 ms │    no change │
│ QQuery 22 │  22.54 ms │  24.29 ms │ 1.08x slower │
└───────────┴───────────┴───────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary       ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)       │ 1602.34ms │
│ Total Time (query_29)   │ 1614.62ms │
│ Average Time (HEAD)     │   72.83ms │
│ Average Time (query_29) │   73.39ms │
│ Queries Faster          │         0 │
│ Queries Slower          │         2 │
│ Queries with No Change  │        20 │
│ Queries with Failure    │         0 │
└─────────────────────────┴───────────┘

@UBarney (Contributor) commented Feb 11, 2026

I think we can implement simplify for sum

fn simplify(&self) -> Option<AggregateFunctionSimplification> {
    None
}

just like in #18837; it should be simpler that way.

@devanshu0987 (Contributor Author)


Hi, is this a blocking PR review comment?
I am not able to understand how it would be simpler?

I am thinking that we still have the same logical flow

  • Detect the pattern of SUM(col + literal)
  • Modify the plan appropriately.
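The two steps above (detect, then modify) can be sketched over a toy expression tree. The `Expr` enum, `match_sum_col_plus_lit` and `rewrite` below are hypothetical illustrations, not the PR's RewriteAggregateWithConstant code or DataFusion's `Expr`; they only accept a bare column on the left and an integer literal on the right, and treat SUM(col - k) as SUM(col + (-k)).

```rust
/// Hypothetical, heavily simplified logical expression (not DataFusion's Expr).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
    Count(Box<Expr>),
}

/// Step 1: detect SUM(col + lit) or SUM(col - lit).
/// Returns the column and the signed constant k on a match.
fn match_sum_col_plus_lit(expr: &Expr) -> Option<(Expr, i64)> {
    let Expr::Sum(arg) = expr else {
        return None;
    };
    match arg.as_ref() {
        Expr::Add(l, r) => match (l.as_ref(), r.as_ref()) {
            (Expr::Column(_), Expr::Literal(k)) => Some((l.as_ref().clone(), *k)),
            _ => None,
        },
        // SUM(col - k) becomes SUM(col + (-k)); i64::MIN cannot be negated, so skip it.
        Expr::Sub(l, r) => match (l.as_ref(), r.as_ref()) {
            (Expr::Column(_), Expr::Literal(k)) => {
                k.checked_neg().map(|neg| (l.as_ref().clone(), neg))
            }
            _ => None,
        },
        _ => None,
    }
}

/// Step 2: replace a matched SUM with SUM(col) + k * COUNT(col); leave anything else as-is.
fn rewrite(expr: &Expr) -> Expr {
    match match_sum_col_plus_lit(expr) {
        Some((col, k)) => Expr::Add(
            Box::new(Expr::Sum(Box::new(col.clone()))),
            Box::new(Expr::Mul(
                Box::new(Expr::Literal(k)),
                Box::new(Expr::Count(Box::new(col))),
            )),
        ),
        None => expr.clone(),
    }
}

fn main() {
    let width = Expr::Column("ResolutionWidth".to_string());
    let q = Expr::Sum(Box::new(Expr::Add(Box::new(width), Box::new(Expr::Literal(2)))));
    println!("{:?}", rewrite(&q));
}
```

Using COUNT(col) rather than COUNT(*) keeps the rewrite exact for nullable columns.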

@neilconway (Contributor) left a comment

Can we add a test for queries with a HAVING clause?

I wonder if it would make sense to support this optimization for other aggregates? e.g., AVG(a+k) => AVG(a) + k.
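As a quick sanity check of the AVG idea, the hypothetical Rust sketch below models AVG over a nullable column with `Option<f64>`. Rows where a is NULL are excluded from both AVG(a + k) and AVG(a), so AVG(a + k) = AVG(a) + k holds without any COUNT term. With floating-point inputs the two sides can differ by rounding, so the comparison uses a relative tolerance.

```rust
/// AVG(col): NULLs are skipped; if there are no non-NULL values the result is NULL.
fn avg(values: &[Option<f64>]) -> Option<f64> {
    let non_null: Vec<f64> = values.iter().flatten().copied().collect();
    if non_null.is_empty() {
        None
    } else {
        Some(non_null.iter().sum::<f64>() / non_null.len() as f64)
    }
}

/// Compare AVG(a + k) with AVG(a) + k, allowing for floating-point rounding.
fn avg_shift_matches(a: &[Option<f64>], k: f64) -> bool {
    let shifted: Vec<Option<f64>> = a.iter().map(|v| v.map(|x| x + k)).collect();
    match (avg(&shifted), avg(a).map(|m| m + k)) {
        (Some(lhs), Some(rhs)) => (lhs - rhs).abs() <= 1e-9 * lhs.abs().max(1.0),
        (None, None) => true, // both NULL: no non-NULL input rows
        _ => false,
    }
}

fn main() {
    let a: Vec<Option<f64>> = vec![Some(1.0), None, Some(2.0), Some(4.5)];
    for k in [0.0, 1.0, 3.0, 89.0] {
        println!("k = {k}: {}", avg_shift_matches(&a, k));
    }
}
```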

FROM test_table;
----
logical_plan
01)Projection: sum(test_table.a) AS sum_a, sum(test_table.a + Int64(1)) AS sum_a_plus_1, sum(test_table.b) AS sum_b, sum(test_table.b + Int64(2)) AS sum_b_plus_2, sum(test_table.c + Int64(3)) AS sum_c_plus_3

It looks to me like the rewrite is not actually being applied here? (Contra the comment above: "Test 7: Mixed sum types (rewrites a and b, not c)").

| ScalarValue::UInt16(_)
| ScalarValue::UInt32(_)
| ScalarValue::UInt64(_)
| ScalarValue::Float32(_)

float16?

if let Expr::BinaryExpr(BinaryExpr { left, op, right }) = arg
    && matches!(op, Operator::Plus | Operator::Minus)
{
    // Check if right side is a literal constant

Remove duplicate comment


/// Check if a scalar value is a numeric constant
/// (guards against non-arithmetic types like strings, booleans, dates, etc.)
fn is_numeric_constant(value: &ScalarValue) -> bool {

How should NULL values be handled by this function? I suppose we want to return false for them?
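One way to answer both the Float16 and the NULL questions is sketched below. `ScalarValue` here is a hypothetical, trimmed-down stand-in for DataFusion's enum (the real one has many more variants, and its Float16 variant is assumed here to be backed by `half::f16` rather than `f32`). Typed NULLs such as `Int64(None)` conservatively return false, so the rule never fires on a NULL constant.

```rust
/// Hypothetical, trimmed-down stand-in for DataFusion's ScalarValue.
/// `None` inside a variant models a typed SQL NULL, e.g. Int64(None).
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Null,
    Boolean(Option<bool>),
    Int64(Option<i64>),
    UInt64(Option<u64>),
    Float16(Option<f32>), // assumption: the real variant stores half::f16
    Float32(Option<f32>),
    Float64(Option<f64>),
    Utf8(Option<String>),
}

/// True only for non-NULL numeric literals. Typed NULLs return false, so the
/// rewrite is conservatively skipped for expressions such as SUM(a + NULL).
fn is_numeric_constant(value: &ScalarValue) -> bool {
    match value {
        ScalarValue::Int64(v) => v.is_some(),
        ScalarValue::UInt64(v) => v.is_some(),
        ScalarValue::Float16(v) | ScalarValue::Float32(v) => v.is_some(),
        ScalarValue::Float64(v) => v.is_some(),
        // Non-arithmetic types and the untyped NULL are never rewritten.
        ScalarValue::Null | ScalarValue::Boolean(_) | ScalarValue::Utf8(_) => false,
    }
}

fn main() {
    for v in [
        ScalarValue::Int64(Some(1)),
        ScalarValue::Int64(None),
        ScalarValue::Float16(Some(0.5)),
        ScalarValue::Null,
        ScalarValue::Utf8(Some("1".to_string())),
    ] {
        println!("{v:?} -> {}", is_numeric_constant(&v));
    }
}
```

Matching every variant explicitly (no `_` arm) means adding a new numeric type to the enum forces a decision here at compile time.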

@alamb (Contributor) commented Feb 12, 2026

My real concern with this approach is that this optimization is so ClickBench-specific. I realize that other systems are doing it too (e.g. the aforementioned ClickHouse and DuckDB PRs), but I struggle to find any actual real-world use case.

On the other hand, the performance results are pretty compelling, so let's see if we can make it work.

> --------------------
> Benchmark clickbench_partitioned.json
> --------------------
> ┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
> ┃ Query     ┃        HEAD ┃    query_29 ┃        Change ┃
> ┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
> │ QQuery 29 │   962.70 ms │   127.32 ms │ +7.56x faster │
> └───────────┴─────────────┴─────────────┴───────────────┘

@alamb (Contributor) left a comment

Thanks for this @devanshu0987

I think we should proceed with the high-level idea of rewriting the expressions.

However, I really like @neilconway's and @UBarney's suggestions to use the rewrite API instead of a new optimizer rule.

After providing this feedback, however, it occurs to me that the "don't use function names" and "try not to add new optimizer passes" guidance isn't written down anywhere, so I'll try to make a PR to do so later.


match inner {
    Expr::AggregateFunction(agg_fn) => {
        // Rule only applies to SUM

Checking for SUM this way is not ideal: if someone has added a user-defined function named sum with different behavior or semantics than the built-in SUM aggregate, this rule will still trigger.

> Hi, is this a blocking PR review comment? I am not able to understand how it would be simpler?
>
> I am thinking that we still have the same logical flow
>
>   • Detect the pattern of SUM(col + literal)
>   • Modify the plan appropriately.

I think it would be better for several reasons:

  1. A new optimizer pass adds non-trivial overhead (I think it ends up copying each plan node), so planning time goes up with each new rule we add to base DataFusion.
  2. It would fix the above issue (matching on the function name).
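A toy model of the difference between matching on the function name and using a per-function simplification hook follows. `AggregateImpl`, `BuiltinSum` and `UserDefinedSum` are hypothetical and much simpler than DataFusion's `AggregateUDFImpl`; the point is only that a user-defined aggregate that is also named "sum" keeps the default no-op hook and is therefore not rewritten, whereas a name check would treat both the same.

```rust
/// Hypothetical, simplified expression type (not DataFusion's Expr).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    /// An aggregate call: (display name, argument).
    Agg(String, Box<Expr>),
}

/// Hypothetical aggregate-function trait with an optional simplification hook.
/// Loosely modelled on the idea behind AggregateUDFImpl::simplify, not its real signature.
trait AggregateImpl {
    fn name(&self) -> &str;
    /// Default: no simplification.
    fn simplify(&self, _arg: &Expr) -> Option<Expr> {
        None
    }
}

/// The built-in SUM opts in: SUM(col + k) => SUM(col) + k * COUNT(col).
struct BuiltinSum;
impl AggregateImpl for BuiltinSum {
    fn name(&self) -> &str {
        "sum"
    }
    fn simplify(&self, arg: &Expr) -> Option<Expr> {
        let Expr::Add(l, r) = arg else { return None };
        match (l.as_ref(), r.as_ref()) {
            (Expr::Column(_), Expr::Literal(k)) => {
                let sum = Expr::Agg("sum".to_string(), l.clone());
                let count = Expr::Agg("count".to_string(), l.clone());
                let scaled = Expr::Mul(Box::new(Expr::Literal(*k)), Box::new(count));
                Some(Expr::Add(Box::new(sum), Box::new(scaled)))
            }
            _ => None,
        }
    }
}

/// A user-defined aggregate that is also named "sum" but has different semantics.
/// It keeps the default hook, so it is never rewritten.
struct UserDefinedSum;
impl AggregateImpl for UserDefinedSum {
    fn name(&self) -> &str {
        "sum"
    }
}

fn main() {
    let arg = Expr::Add(Box::new(Expr::Column("a".to_string())), Box::new(Expr::Literal(1)));
    let funcs: Vec<Box<dyn AggregateImpl>> = vec![Box::new(BuiltinSum), Box::new(UserDefinedSum)];
    for f in &funcs {
        // A check like `f.name() == "sum"` would match both; the hook only rewrites BuiltinSum.
        println!("{} -> {:?}", f.name(), f.simplify(&arg));
    }
}
```

Because the knowledge lives in the function's own implementation, the optimizer crate does not need to know which concrete aggregate it is looking at.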


Here is a PR with some suggested documentation improvements:

@UBarney (Contributor) commented Feb 12, 2026

>> I think we can implement simplify for sum
>>
>> fn simplify(&self) -> Option<AggregateFunctionSimplification> {
>>     None
>> }
>>
>> just like in #18837; it should be simpler that way.
>
> Hi, is this a blocking PR review comment? I am not able to understand how it would be simpler?
>
> I am thinking that we still have the same logical flow
>
>   • Detect the pattern of SUM(col + literal)
>   • Modify the plan appropriately.

The solution I proposed avoids the need to locate the LogicalAgg in the query plan and manually check whether the aggr_func is sum. Even more importantly, it eliminates the dependency on datafusion-functions-aggregate.

However, there is an issue I overlooked: the expression after simplification becomes sum(col) + lit * count(col), which is no longer a single AggregateFunction. This causes an error in create_aggregate_expr_with_name_and_maybe_filter during the conversion to PhysicalExpr, as that function expects a pure aggregate expression.

I am still exploring the best way to handle this. One possible direction is to introduce a new rule or modify map_logical_node_to_physical to automatically inject a Projection when Aggregate.aggr_expr contains a non-pure aggregate expression. If we can find a clean way to support this, it would enable us to handle many other aggregate rewrites via simplify, such as the one mentioned in #19637.
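The "inject a Projection" direction can be sketched as follows: each distinct aggregate call inside the (possibly non-pure) aggregate expressions is computed once by the Aggregate node, and the surrounding arithmetic moves into a Projection that references the aggregate outputs. The `Expr` type and the `__agg_{i}` output names are hypothetical, and this is not how DataFusion's map_logical_node_to_physical works today. Applied to the 90 rewritten expressions of Query 29, it leaves only two pure aggregates.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified expression type (not DataFusion's Expr).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    /// A pure aggregate call: (function name, argument).
    Agg(String, Box<Expr>),
}

/// Replace each aggregate call inside `expr` with a reference to an Aggregate
/// output column, registering every distinct call exactly once.
fn extract(expr: &Expr, aggs: &mut Vec<Expr>, seen: &mut HashMap<Expr, usize>) -> Expr {
    match expr {
        Expr::Agg(..) => {
            let i = *seen.entry(expr.clone()).or_insert_with(|| {
                aggs.push(expr.clone());
                aggs.len() - 1
            });
            Expr::Column(format!("__agg_{i}"))
        }
        Expr::Add(l, r) => Expr::Add(Box::new(extract(l, aggs, seen)), Box::new(extract(r, aggs, seen))),
        Expr::Mul(l, r) => Expr::Mul(Box::new(extract(l, aggs, seen)), Box::new(extract(r, aggs, seen))),
        Expr::Column(_) | Expr::Literal(_) => expr.clone(),
    }
}

/// Split aggregate expressions into (pure aggregates for the Aggregate node,
/// expressions for a Projection placed on top of it).
fn split_aggregate(aggr_exprs: &[Expr]) -> (Vec<Expr>, Vec<Expr>) {
    let mut aggs = Vec::new();
    let mut seen = HashMap::new();
    let projection = aggr_exprs
        .iter()
        .map(|e| extract(e, &mut aggs, &mut seen))
        .collect();
    (aggs, projection)
}

fn main() {
    let col = || Box::new(Expr::Column("ResolutionWidth".to_string()));
    // SUM(c), then SUM(c) + k * COUNT(c) for k = 1..=89, as produced by the rewrite.
    let exprs: Vec<Expr> = (0..90i64)
        .map(|k| {
            let sum = Expr::Agg("sum".to_string(), col());
            if k == 0 {
                sum
            } else {
                let count = Expr::Agg("count".to_string(), col());
                Expr::Add(Box::new(sum), Box::new(Expr::Mul(Box::new(Expr::Literal(k)), Box::new(count))))
            }
        })
        .collect();
    let (aggs, projection) = split_aggregate(&exprs);
    // prints "2 aggregates feed 90 projected expressions"
    println!("{} aggregates feed {} projected expressions", aggs.len(), projection.len());
}
```

Deduplicating by structural equality is what lets the 90 projected expressions share the same two aggregate outputs.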

Regarding the architecture, my current view is that the datafusion-optimizer crate should not depend on datafusion-functions-aggregate. It seems more appropriate for function-specific optimization rules to be implemented within functions-aggregate rather than in the datafusion-optimizer.

@devanshu0987 (Contributor Author) commented Feb 13, 2026

Hi @UBarney @alamb

Thanks for sharing your thought process on why implementing simplify is better than adding an optimizer rule. It helps me reorient my understanding.

> However, there is an issue I overlooked

  • It seems @UBarney tried to do it and hit a snag?
  • I will attempt the same over the weekend to reach that state and share anything new I find.
  • I will ping again for guidance on that.

I will take care of the other comments from @neilconway once I have converted this to use simplify.

Thanks.



Development

Successfully merging this pull request may close these issues.

Make Clickbench Q29 5x faster for datafusion by extracting SUM(..) clauses

6 participants