Add --gzip-large-csvs flag to compress large result CSVs#117

Merged
arght merged 3 commits into master from feat/outputs-gzip-large-csvs on May 25, 2026

Conversation

@erikfilias
Contributor

Summary

  • New opt-in CLI flag --gzip-large-csvs (plus --gzip-threshold-mb,
    default 5 MB) for openTEPES_Main.
  • Post-write pass in openTEPES_run rewrites oT_Result_*.csv files
    at or above the threshold as .csv.gz. Pandas infers gzip from the
    .gz extension and reads the files natively, so pandas-based
    downstream tooling is unchanged.
  • Sentinel JSON gains gzip_threshold_mb, gzip_files,
    gzip_mb_saved for batch monitoring.

Behaviour

  • Default off — bit-for-bit identical to current output.
  • Threshold is the uncompressed size on disk, checked after each
    writer has finished. The pass only touches files already written,
    so there is no chained-DataFrame churn.

Test plan

  • 9n case, full output, flag unset → identical to pre-change
    baseline (sentinel reports gzip_files: 0).
  • 9n case, --gzip-large-csvs at default 5 MB → 2/87 CSVs
    compressed (~6x shrink); pandas read-back parity passes
    (shape, .equals()).
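
The read-back parity check in the test plan could look roughly like the sketch below. It is not the script used for this PR: `readback_matches` and the file name `oT_Result_Example.csv` are hypothetical, and the two-row frame stands in for a real result table. It relies on pandas' default `compression='infer'`, which selects gzip from the `.gz` suffix for both `read_csv` and `to_csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def readback_matches(plain_csv, gz_csv):
    """Hypothetical helper: compare a plain CSV with its .csv.gz counterpart."""
    baseline = pd.read_csv(plain_csv)
    # compression='infer' is the default, so the .gz suffix selects gzip.
    compressed = pd.read_csv(gz_csv)
    return baseline.shape == compressed.shape and baseline.equals(compressed)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in data; a real check would point at actual oT_Result_* outputs.
    frame = pd.DataFrame({"Period": [2030, 2030], "Value": [1.5, 2.25]})
    plain = Path(tmp) / "oT_Result_Example.csv"
    gz = Path(tmp) / "oT_Result_Example.csv.gz"
    frame.to_csv(plain, index=False)
    frame.to_csv(gz, index=False)  # to_csv also infers gzip from the suffix
    parity_ok = readback_matches(plain, gz)
```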

…hold

  After all *Results() writers finish but before the run-status sentinel
  JSON is written, scan the output directory and rewrite every
  oT_Result_*.csv whose uncompressed size is at least --gzip-threshold-mb
  (default 5 MB) as .csv.gz. Pandas reads .csv.gz transparently with
  compression='infer', so downstream tooling needs no change.

  Opt-in: default behaviour is bit-for-bit identical to before. The
  sentinel JSON gains three new fields (gzip_threshold_mb, gzip_files,
  gzip_mb_saved) reporting what the pass did.
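
  A minimal sketch of such a post-write pass, using only the standard library, is shown here for orientation. It is not the code merged in this PR: the function name `gzip_large_csvs`, the 1 MB = 1024 * 1024 bytes convention, and the rounding of `gzip_mb_saved` are assumptions. Only the file glob, the "at least the threshold" comparison, and the three sentinel field names come from the description above.

```python
import gzip
import shutil
from pathlib import Path


def gzip_large_csvs(out_dir, threshold_mb=5.0):
    """Hypothetical helper: gzip oT_Result_*.csv files at or above threshold_mb."""
    # Assumption: 1 MB is taken as 1024 * 1024 bytes.
    threshold_bytes = threshold_mb * 1024 * 1024
    compressed = []
    bytes_saved = 0
    for csv_path in sorted(Path(out_dir).glob("oT_Result_*.csv")):
        size_before = csv_path.stat().st_size
        if size_before < threshold_bytes:
            continue  # below the threshold: leave the plain CSV in place
        gz_path = csv_path.with_name(csv_path.name + ".gz")
        # Stream the bytes so large files are never held in memory at once.
        with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        csv_path.unlink()  # keep only the compressed copy
        bytes_saved += size_before - gz_path.stat().st_size
        compressed.append(gz_path.name)
    # Values for the three sentinel fields named in the commit message.
    return {
        "gzip_threshold_mb": threshold_mb,
        "gzip_files": len(compressed),
        "gzip_mb_saved": round(bytes_saved / (1024 * 1024), 2),
    }
```

  Calling `gzip_large_csvs("path/to/output")` after the writers finish would return a dict that can be merged into the sentinel JSON before it is written.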

  Verified on the bundled 9n case: 2 of 87 CSVs cleared the 5 MB
  threshold and shrank ~6x (6.46 -> 1.05 MB, 6.98 -> 1.11 MB); pandas
  read-back parity matches shape and frame equality with the plain-CSV
  baseline.
@erikfilias erikfilias requested a review from arght May 25, 2026 14:26
@erikfilias erikfilias self-assigned this May 25, 2026
@erikfilias erikfilias added the enhancement New feature or request label May 25, 2026
  Changes:
  - Drop --gzip-threshold-mb; add --gzip-patterns (comma-separated
    name-prefixes, applied after the leading 'oT_Result_').
  - Default pattern set: Generation, Consumption, Balance, MarketResults,
    Network — exposed as openTEPES.DEFAULT_GZIP_PATTERNS for reuse.
  - Help text now flags the Excel caveat explicitly (.csv.gz cannot be
    opened directly in Excel).
  - Sentinel JSON: gzip_threshold_mb -> gzip_patterns.
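
  The prefix matching described in these changes might be sketched as follows. This is an illustration, not the merged implementation: `parse_patterns`, `should_gzip`, and the example file names are hypothetical, and `DEFAULT_PATTERNS` is a local stand-in for `openTEPES.DEFAULT_GZIP_PATTERNS`, whose actual type is not shown in this PR.

```python
RESULT_PREFIX = "oT_Result_"
# Local stand-in for openTEPES.DEFAULT_GZIP_PATTERNS (the real type is assumed).
DEFAULT_PATTERNS = ("Generation", "Consumption", "Balance", "MarketResults", "Network")


def parse_patterns(raw):
    """Split a comma-separated --gzip-patterns value, dropping empty entries."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def should_gzip(filename, patterns=DEFAULT_PATTERNS):
    """Return True if the name after 'oT_Result_' starts with any pattern."""
    if not (filename.startswith(RESULT_PREFIX) and filename.endswith(".csv")):
        return False
    stem = filename[len(RESULT_PREFIX):]
    # str.startswith accepts a tuple and matches any of its members.
    return stem.startswith(tuple(patterns))
```

  With the default set, a hypothetical `oT_Result_GenerationEnergy_9n.csv` would be selected, while `oT_Result_Investment_9n.csv` would stay as plain CSV.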

  Verified on bundled 9n: under default patterns, 49 of 87 oT_Result_*.csv
  files compress, saving 31 MB (56 MB -> 25 MB total). Read-back parity
  passes on the largest tables (shape + frame equality).
@arght arght merged commit 4e7da4b into master May 25, 2026
10 checks passed
@erikfilias erikfilias deleted the feat/outputs-gzip-large-csvs branch May 25, 2026 15:05

Labels

enhancement New feature or request

2 participants