Skip to content

ci: #796 — linear-time normalize_output + per-test output cap#825

Merged
proggeramlug merged 1 commit into
mainfrom
worktree-fix-796-normalize-output
May 16, 2026
Merged

ci: #796 — linear-time normalize_output + per-test output cap#825
proggeramlug merged 1 commit into
mainfrom
worktree-fix-796-normalize-output

Conversation

@proggeramlug
Copy link
Copy Markdown
Contributor

Summary

Two distinct fixes for the pathological-output DOS that burned 3 hours of CI on #712:

  1. Linear-time normalize_output — replaces the bash while IFS= read; decoded+=\"\$line\"\\n loop (O(n²) on string concat) with a single python3 filter that streams stdin once. <Buffer XX XX …> lines decode via bytes.fromhex(...).decode(\"utf-8\", errors=\"replace\"); every other line passes through. python3 is preinstalled on every runner image we target.
  2. Per-test output cap — new cap_output awk filter caps at MAX_OUTPUT_LINES (default 50000) with a TRUNCATED at N lines (total: M) marker on overflow so the cap is visible, not silent. Routes node/perry output through a tempfile so the cap fires before bash ever holds more than 50k lines (and to make node's actual exit code recoverable — PIPESTATUS doesn't propagate across \$(…)).

Benchmark

Synthetic input, 50k lines:

  • Old bash while read; decoded+= loop: ~1s.
  • New python3 pipeline: sub-second.

At the 5.7M-line pathological point that triggered #712, the old code timed out at 3 hours; the new path completes in seconds and surfaces the TRUNCATED marker at 50k lines.

Test plan

  • bash -n run_parity_tests.sh — syntax check passes.
  • cap_output unit-test with input shorter than / equal to / longer than MAX_OUTPUT_LINES.
  • python3 buffer-decode unit-test on mixed <Buffer XX …> + plain lines.
  • Full normalize_output round-trip on Buffer + true/false + float-precision + plain lines matches pre-fix behavior.

Robustness criterion from the issue: "the job is one bug away from another 3-hour timeout" — both legs of that (O(n²) walk + uncapped bash variable) are now closed.

Closes #796.

`test_parity_timers_promises` (root cause #712, now closed) emitted 5.7M
identical lines pre-fix and burned ~3 hours on the runner before being
killed. Two distinct issues:

1. The buffer-decode pass in `normalize_output` was a bash `while IFS=
   read` loop with `decoded+="$line"\n` per iteration. That's O(n²) on
   the input length — every concat copies the whole accumulated string.
   At 5.7M lines, the inner copy walks ~16T bytes total.
2. Bash command substitution captured the full pathological output into
   a single variable before normalization even started, blowing up
   memory before the loop fired.

Fixes:

- Replace the bash decode loop with a single `python3` filter that
  walks stdin once, decoding `<Buffer XX XX …>` lines via
  `bytes.fromhex(...).decode("utf-8", errors="replace")` and passing
  every other line through. Linear time, no growing bash string.
  python3 is preinstalled on every CI runner image we target.
- New `cap_output` awk filter caps output at `MAX_OUTPUT_LINES`
  (default 50000) and emits a `TRUNCATED at N lines (total: M)`
  marker on overflow so the cap is visible, not silent. Override
  via `MAX_OUTPUT_LINES=...` env when investigating a specific test.
- Route node/perry output through a tempfile + `cap_output` instead of
  capturing directly into a bash variable, so the cap fires before the
  bash side ever holds more than 50k lines. The tempfile detour is
  also what makes the actual node/perry exit code recoverable —
  PIPESTATUS doesn't propagate across `$(...)`.

Local benchmark (5k → 50k synthetic lines): old bash loop took 1s at
50k; new python3 pipeline stays sub-second. At the 5.7M-line
pathological point, the old code was 3 hours; the new path completes
in seconds and emits the TRUNCATED marker at 50k.
@proggeramlug proggeramlug merged commit 28d4c49 into main May 16, 2026
9 checks passed
@proggeramlug proggeramlug deleted the worktree-fix-796-normalize-output branch May 16, 2026 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: stop truncating gap-suite output, add per-test summary

1 participant