Fix #933: bound compaction memory so wide-row tables don't OOM (#942)

Merged: erikdarlingdata merged 1 commit into dev from feature/933-compaction-memory-tuning on May 7, 2026

@erikdarlingdata
Owner

Summary

  • #935 ("Lite: fix compaction OOM by setting DuckDB temp_directory (#933)") added temp_directory so DuckDB could spill during compaction, but on wider workloads the working set still blew past the 4 GB cap before spilling caught up (the reporter saw an OOM at 3.7 GiB while compacting 15 query_snapshots files).
  • Three knobs combined to feed that: memory_limit = 4GB was too high (DuckDB held off spilling), threads defaulted to N cores (multiplying the per-thread row-group buffers), and ROW_GROUP_SIZE 122880 buffered up to 122,880 wide-VARCHAR rows per group.
  • This PR drops memory_limit to 1GB, caps threads at 2, and shrinks ROW_GROUP_SIZE to 8192. Memory now plateaus instead of growing with row count.
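
As a rough illustration of the three knobs, the sketch below builds the DuckDB statements that one bounded-memory merge might issue. It is not the Lite code (which is .NET); the function name, the file paths, and the `SELECT *` merge query are assumptions made for the example. Only the setting names and the new values come from this PR.

```python
def compaction_statements(inputs, output, temp_dir,
                          memory_limit="1GB", threads=2, row_group_size=8192):
    """Build DuckDB SQL for one merge using the tuned settings (hypothetical helper)."""

    def quote(value):
        # Single-quote a SQL string literal, doubling any embedded quotes.
        return "'" + value.replace("'", "''") + "'"

    file_list = ", ".join(quote(path) for path in inputs)
    return [
        # Spill location, added earlier in #935.
        f"SET temp_directory = {quote(temp_dir)};",
        # Lower cap so DuckDB starts spilling sooner.
        f"SET memory_limit = {quote(memory_limit)};",
        # Fewer threads means fewer per-thread row-group buffers.
        f"SET threads = {threads};",
        # Smaller row groups bound how many wide rows are buffered per group.
        f"COPY (SELECT * FROM read_parquet([{file_list}])) "
        f"TO {quote(output)} (FORMAT PARQUET, ROW_GROUP_SIZE {row_group_size});",
    ]


if __name__ == "__main__":
    for statement in compaction_statements(
        ["a.parquet", "b.parquet"], "merged.parquet", "/tmp/duckdb_spill"
    ):
        print(statement)
```

Passing the old values (`memory_limit="4GB"`, `row_group_size=122880`) produces the pre-fix configuration, apart from the thread count, which was previously left at DuckDB's default.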

Fixes #933

Repro tool

tools/CompactionRepro — standalone .NET console app that splits a real monthly parquet file into N per-cycle-shaped chunks and runs the same pair-merge logic with the tuning knobs exposed on the command line. Useful for validating future changes to compaction.
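
The PR does not spell out the pair-merge logic, so the following is only one plausible reading: files are merged two at a time, round by round, until a single file remains. The function name and the output naming scheme are hypothetical, and the real tool may instead fold files into a running accumulator.

```python
def pair_merge_plan(files):
    """Plan rounds of pairwise merges until one file remains (illustrative only)."""
    rounds = []
    current = list(files)
    generation = 0
    while len(current) > 1:
        generation += 1
        merges = []
        next_files = []
        # Pair up neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(current) - 1, 2):
            out = f"merged_g{generation}_{i // 2}.parquet"  # hypothetical name
            merges.append((current[i], current[i + 1], out))
            next_files.append(out)
        # An unpaired trailing file carries over to the next round unchanged.
        if len(current) % 2 == 1:
            next_files.append(current[-1])
        rounds.append(merges)
        current = next_files
    final = current[0] if current else None
    return rounds, final


if __name__ == "__main__":
    chunks = [f"chunk_{n:02d}.parquet" for n in range(15)]
    plan, final = pair_merge_plan(chunks)
    for number, merges in enumerate(plan, start=1):
        print(f"round {number}: {len(merges)} merges")
    print("final:", final)
```

Under this reading, each merge step only ever reads two inputs at a time, so peak memory depends on how DuckDB buffers those two files rather than on the total number of chunks.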

Validation

On a real local archive (202604_query_stats.parquet, 1.7M rows, ~70 MB):

| Setting | Peak Working Set | Wall Time | Output |
| --- | --- | --- | --- |
| OLD (4GB / default threads / 122880) | 1236 MB | 12.0 s | 68.3 MB |
| NEW (1GB / 2 threads / 8192) | 166 MB | 15.7 s | 77.6 MB |

87% lower peak memory, 31% longer wall time, and 14% larger output (smaller row groups → smaller compression dictionaries; an acceptable trade for not crashing).
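
The percentages follow directly from the measured figures; a small check, using only the numbers reported in this PR (the variable and function names are just for illustration):

```python
# Measurements reported for 202604_query_stats.parquet.
old = {"peak_mb": 1236, "wall_s": 12.0, "out_mb": 68.3}
new = {"peak_mb": 166, "wall_s": 15.7, "out_mb": 77.6}


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


peak_change = pct_change(old["peak_mb"], new["peak_mb"])  # about -86.6
wall_change = pct_change(old["wall_s"], new["wall_s"])    # about +30.8
size_change = pct_change(old["out_mb"], new["out_mb"])    # about +13.6

print(f"peak working set: {peak_change:+.0f}%")
print(f"wall time:        {wall_change:+.0f}%")
print(f"output size:      {size_change:+.0f}%")
```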

Test plan

  • Lite builds clean (0 errors)
  • Repro tool reproduces under OLD settings, succeeds under NEW settings on real archive data
  • Reporter validates against their query_snapshots workload in next nightly

🤖 Generated with Claude Code

#935 added temp_directory so DuckDB could spill, but on wider workloads
the working set still blew past the 4 GB cap before spill caught up
(reporter saw OOM at 3.7 GiB compacting 15 query_snapshots files).
Three knobs combined to feed that:

- memory_limit = 4 GB was too high — DuckDB held off spilling until late
- threads defaulted to N cores, multiplying per-thread row-group buffers
- ROW_GROUP_SIZE 122880 buffered up to 122k wide-VARCHAR rows per group

Drop memory_limit to 1 GB, cap threads to 2, and shrink ROW_GROUP_SIZE
to 8192. On 1.7 M rows of real query_stats data this drops peak working
set from 1236 MB → 166 MB (87% reduction) at a 31% wall-time cost.
Memory now plateaus instead of growing with row count, which is the
load-bearing change for issue #933.

Adds tools/CompactionRepro — a standalone reproducer that splits a real
monthly parquet file into N per-cycle-shaped chunks and runs the same
pair-merge logic with the tuning knobs exposed on the command line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata merged commit 46dd1e5 into dev on May 7, 2026
2 checks passed