Summary
The concurrent_qps test introduced in #860 produces misleading, non-reproducible numbers for single-process / in-process engines (embedded CLIs and dataframe-server systems). For these systems it spawns N independent full-machine processes that collectively oversubscribe RAM, so on memory-constrained instances the metric measures swap-thrashing throughput, not query throughput — and whether it completes at all depends on the runner environment rather than the engine.
This surfaced in #943 (DataFusion), but the root cause is in lib/benchmark-common.sh (bench_concurrent_qps) and affects a whole class of systems.
Evidence
bench_concurrent_qps launches BENCH_CONCURRENT_CONNECTIONS (default 10) workers for BENCH_CONCURRENT_DURATION (default 600s). For BENCH_DURABLE=no engines like datafusion-cli, each worker's ./query spawns a fresh process that assumes it owns the whole machine. On a 32 GB instance such as c6a.4xlarge, that is 10 × ~7 GB RSS ≈ 70 GB of demand against 32 GB of RAM.
Checked-in DataFusion (single) results, concurrent_qps vs. machine RAM:
| Machine | vCPU / RAM | concurrent_qps | err |
|---|---|---|---|
| c6a.xlarge | 4 / 8 GB | 0.025 | 0 |
| c6a.2xlarge | 8 / 16 GB | 0.042 | 0 |
| c6a.4xlarge | 16 / 32 GB | 0.11 | 0 |
| c8g.4xlarge | 16 / 32 GB | 0.092 | 0 |
| c6a.metal | 192 / 384 GB | 4.017 | 0 |
| c8g.metal-48xl | 192 / 384 GB | 5.388 | 0 |
| c7a.metal-48xl | 192 / 384 GB | 5.888 | 0 |
(datafusion-partitioned shows the same shape: 0.025–0.097 on the small boxes, 4.9–7.0 on the metals.)
The QPS tracks RAM, not core count — the metric on the sub-RAM machines is reporting how fast the box can page, not how fast the engine runs queries. err=0 everywhere, so nothing crashed in the automated runs; the processes just crawled through swap.
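One way to make this visible is to normalise the checked-in numbers by core count. On these instances vCPU and RAM scale together (2 GB per vCPU), so if cores were the limit, qps per vCPU would stay roughly flat. The sketch below uses only the figures from the table above; the function name is illustrative and not part of the benchmark harness.

```python
# (machine, vCPU, RAM in GB, concurrent_qps), copied from the table above.
RESULTS = [
    ("c6a.xlarge", 4, 8, 0.025),
    ("c6a.2xlarge", 8, 16, 0.042),
    ("c6a.4xlarge", 16, 32, 0.11),
    ("c8g.4xlarge", 16, 32, 0.092),
    ("c6a.metal", 192, 384, 4.017),
    ("c8g.metal-48xl", 192, 384, 5.388),
    ("c7a.metal-48xl", 192, 384, 5.888),
]


def qps_per_vcpu(results):
    """Return {machine: concurrent_qps / vCPU} for each row."""
    return {name: qps / vcpu for name, vcpu, _ram_gb, qps in results}


if __name__ == "__main__":
    per_core = qps_per_vcpu(RESULTS)
    for name, vcpu, ram_gb, _qps in RESULTS:
        print(f"{name:16s} {vcpu:4d} vCPU {ram_gb:4d} GB  "
              f"{per_core[name]:.4f} qps/vCPU")
```

The per-vCPU rate comes out at about 0.005 to 0.007 on the machines with 32 GB or less and about 0.021 to 0.031 on the 384 GB machines, a jump of roughly 3x to 6x that extra cores alone would not explain.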
Why the same instance type gives different outcomes
cloud-init.sh.in adds a 16 GB swapfile and installs earlyoom on launch. In the automated fleet run a c6a.4xlarge therefore thrashes through swap and finishes at ~0.11 qps. A contributor running ./benchmark.sh on a bare c6a.4xlarge (no swap) gets the kernel OOM-killing datafusion-cli outright (see #943). So a published number depends on the presence of swap in the runner environment — a reproducibility problem independent of any one engine.
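Since the outcome hinges on whether swap exists, a runner could at least record that fact next to each result. The following is a minimal sketch, assuming a Linux host where /proc/meminfo is available; the function names and the metadata keys are hypothetical and do not correspond to anything in the current harness.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Memory fields are reported in kB by the kernel; count fields such as
    HugePages_Total carry no unit and are kept as plain integers.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_configured(meminfo):
    """True when the host reports a non-zero SwapTotal."""
    return meminfo.get("SwapTotal", 0) > 0


def describe_runner(meminfo):
    """Hypothetical metadata that could be stored alongside a result."""
    return {
        "mem_total_kb": meminfo.get("MemTotal"),
        "swap_total_kb": meminfo.get("SwapTotal", 0),
        "swap_configured": swap_configured(meminfo),
    }


def read_runner_info(path="/proc/meminfo"):
    """Read the host's meminfo file and summarise it (Linux only)."""
    with open(path) as f:
        return describe_runner(parse_meminfo(f.read()))
```

Calling read_runner_info() on a fleet instance provisioned by cloud-init.sh.in and on a bare instance of the same type would then show the difference in swap_configured, which is the variable the published number currently depends on.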
Scope
The same dynamic applies to every BENCH_DURABLE=no single-process / in-process system where N independent processes oversubscribe RAM — e.g. duckdb (and dataframe/memory variants), sqlite, hyper, chdb, pandas, polars-dataframe, daft, sirius. Any of these on a sub-RAM machine will show the same artifact.
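The oversubscription condition described here can be checked before the concurrent test runs. The sketch below is one possible shape for such a guard, not the logic of bench_concurrent_qps: it assumes the caller supplies a per-worker RSS estimate (for example the ~7 GB figure quoted earlier) and the amount of physical RAM available, and it returns None, which serialises to JSON null, when the workers would not fit. All names are hypothetical.

```python
import json

GIB = 1024 ** 3


def fits_in_ram(workers, per_worker_rss_bytes, available_bytes):
    """True when the combined estimated RSS of all workers fits in RAM.

    available_bytes should count physical memory only, so that a result
    never depends on swap being present.
    """
    return workers * per_worker_rss_bytes <= available_bytes


def guarded_concurrent_qps(measure, workers, per_worker_rss_bytes,
                           available_bytes):
    """Run measure() only if the workers fit in RAM, else return None.

    measure is any zero-argument callable returning the measured QPS;
    it is never invoked when the estimate exceeds available memory.
    """
    if not fits_in_ram(workers, per_worker_rss_bytes, available_bytes):
        return None
    return measure()


if __name__ == "__main__":
    # 10 workers at ~7 GiB each on a 32 GiB machine: skipped, recorded as null.
    qps = guarded_concurrent_qps(lambda: 0.11, 10, 7 * GIB, 32 * GIB)
    print(json.dumps({"concurrent_qps": qps}))
```

How the real harness would obtain the per-worker estimate (a fixed per-system value, or a measurement from the single-connection run) is left open here.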
Options
1. Skip concurrent_qps for the whole single-process / in-process class. #943 does this for DataFusion only (BENCH_CONCURRENT_DURATION=0); make it class-wide so the dashboard isn't uneven (one embedded engine reporting null while peers report swap artifacts).
2. Add a "fit in RAM" guard analogous to the partial-load >= 5 GB check: if the concurrent test's working set exceeds available RAM, record null instead of a thrashing number.
3. Pin the environment: document/encode whether swap is part of the canonical setup so the same instance type reproduces the same result. (Weakest on its own: a number that only completes because of swap shouldn't be published regardless.)
Recommendation: option 1 or 2 for the single-process class (subsumes #943 into a consistent, class-wide fix), plus nulling the existing swap-artifact concurrent_qps values already checked in for these systems on the RAM-constrained machines.
Related