feat(check): opt-in session-trace archive for multi check #150
Open
RobbieMcKinstry wants to merge 1 commit into
Add `--trace-archive <PATH>` (off by default; also `MULTI_CHECKS_TRACE_ARCHIVE`
/ `[checks].trace_archive`) to capture every check agent session and bundle
them into a single in-memory `.tar.gz` for debugging failed runs. Captures all
executions, not just failures.
Capture is via `Agent::on_event`, not cersei's `.memory()` session
persistence: the executor cancels its agent the instant a verdict lands and
wraps the run in a drop-on-timeout, and cersei only flushes its transcript on
the agentic loop's normal exit — so a memory-backed transcript would miss
exactly the cancelled/timed-out/errored runs this feature exists to debug.
`emit()` fires the event handler for every event before those early returns,
so an event-sourced trace survives them.
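The ordering argument above can be shown with a toy model. This is a sketch, not cersei's API: `Event`, `run_agent`, and the `memory_store` vector are hypothetical stand-ins for the agent's event type, the agentic loop with its `emit()` call, and the `Memory` backend. Because the handler runs before the loop checks whether to stop, a run cancelled on its verdict still leaves a complete event trace, while the end-of-loop flush never happens.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an agent event; cersei's real event type differs.
#[derive(Debug, Clone)]
enum Event {
    TextDelta(String),
    ToolCall(String),
    Verdict(String),
}

/// Toy agentic loop: the handler (as in `emit()`) fires for every event *before*
/// the early-return check, while the memory-style transcript is only flushed
/// when the loop exits normally.
fn run_agent(
    events: &[Event],
    cancel_on_verdict: bool,
    on_event: &dyn Fn(&Event),
    memory_store: &mut Vec<Event>,
) {
    let mut transcript = Vec::new();
    for ev in events {
        on_event(ev); // handler sees the event first
        transcript.push(ev.clone());
        if cancel_on_verdict && matches!(ev, Event::Verdict(_)) {
            return; // early return: the transcript is never stored
        }
    }
    memory_store.extend(transcript); // only reached on normal exit
}

fn main() {
    // Event-sourced trace shared with the handler closure.
    let trace: Arc<Mutex<Vec<Event>>> = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&trace);
    let handler = move |ev: &Event| sink.lock().unwrap().push(ev.clone());

    let mut memory = Vec::new();
    let events = vec![
        Event::TextDelta("checking".into()),
        Event::ToolCall("read_file".into()),
        Event::Verdict("pass".into()),
        Event::TextDelta("never reached".into()),
    ];
    run_agent(&events, true, &handler, &mut memory);

    println!("memory-backed transcript: {} events", memory.len());
    println!("event-sourced trace: {} events", trace.lock().unwrap().len());
}
```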
- executor: a coalescing `TraceRecorder` (text/thinking deltas joined into
per-turn blocks, tool results bounded) serializes each execution into a
self-contained NDJSON doc — header, records, outcome footer — returned in
`AgentOutcome::trace_jsonl`.
- execution actor: pushes each attempt (retries included) into a shared
`TraceCollector` before signalling completion downstream, so retry traces
are captured and collection is race-free with run finalization.
- run: builds the `.tar.gz` with pure-Rust `tar` + `flate2` (no temp dir, no
subprocess), laid out as `{NN}-{requirement}/{check}.attempt-{k}.jsonl` —
one directory per requirement, one file per check execution.
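The coalescing behaviour described for `TraceRecorder` can be sketched with the standard library alone. The record kinds, the JSON field names (`type`, `text`, `name`, `output`, `outcome`), and the `[truncated N bytes]` marker are assumptions for illustration, not the schema this PR actually emits; a real implementation would more likely serialize with `serde_json` than with a hand-rolled escaper.

```rust
/// Hypothetical record kinds; the real recorder's schema is not shown in the PR.
enum Record {
    Text(String),
    Thinking(String),
    ToolResult { name: String, output: String },
}

/// Coalesces streaming deltas into per-turn blocks and bounds tool output.
struct TraceRecorder {
    records: Vec<Record>,
    max_tool_bytes: usize,
}

/// Minimal JSON string escaper so the sketch needs no external crates.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Cuts `s` to at most `max` bytes on a UTF-8 boundary and notes what was dropped.
fn truncate(s: &str, max: usize) -> String {
    if s.len() <= max {
        return s.to_string();
    }
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}[truncated {} bytes]", &s[..end], s.len() - end)
}

impl TraceRecorder {
    fn new(max_tool_bytes: usize) -> Self {
        TraceRecorder { records: Vec::new(), max_tool_bytes }
    }

    /// Extends the current text block, or opens a new one after any other record.
    fn text_delta(&mut self, delta: &str) {
        if let Some(Record::Text(buf)) = self.records.last_mut() {
            buf.push_str(delta);
            return;
        }
        self.records.push(Record::Text(delta.to_string()));
    }

    /// Same coalescing rule for thinking deltas.
    fn thinking_delta(&mut self, delta: &str) {
        if let Some(Record::Thinking(buf)) = self.records.last_mut() {
            buf.push_str(delta);
            return;
        }
        self.records.push(Record::Thinking(delta.to_string()));
    }

    /// Tool results are stored bounded; they also end the current text block.
    fn tool_result(&mut self, name: &str, output: &str) {
        let output = truncate(output, self.max_tool_bytes);
        self.records.push(Record::ToolResult { name: name.to_string(), output });
    }

    /// One self-contained NDJSON doc: header line, one line per record, outcome footer.
    fn to_ndjson(&self, check: &str, outcome: &str) -> String {
        let mut out = format!("{{\"type\":\"header\",\"check\":{}}}\n", json_str(check));
        for rec in &self.records {
            let line = match rec {
                Record::Text(t) => format!("{{\"type\":\"text\",\"text\":{}}}", json_str(t)),
                Record::Thinking(t) => format!("{{\"type\":\"thinking\",\"text\":{}}}", json_str(t)),
                Record::ToolResult { name, output } => format!(
                    "{{\"type\":\"tool_result\",\"name\":{},\"output\":{}}}",
                    json_str(name),
                    json_str(output)
                ),
            };
            out.push_str(&line);
            out.push('\n');
        }
        out.push_str(&format!("{{\"type\":\"outcome\",\"outcome\":{}}}\n", json_str(outcome)));
        out
    }
}

fn main() {
    let mut rec = TraceRecorder::new(16);
    rec.thinking_delta("Need to ");
    rec.thinking_delta("read the file.");
    rec.text_delta("Looks ");
    rec.text_delta("fine.");
    rec.tool_result("read_file", "a very long tool output that gets cut");
    print!("{}", rec.to_ndjson("no-unsafe", "pass"));
}
```

Keeping the header and footer inside each document means every `.jsonl` file in the archive can be read on its own, without cross-referencing other attempts.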
Trace archiving is best-effort: a failure logs but never fails an otherwise
successful check run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

What
Adds opt-in session-trace capture to `multi check`, bundled into a single in-memory `.tar.gz` for debugging failed (and all other) agent runs.

- `--trace-archive <PATH>`: off by default; also settable as `MULTI_CHECKS_TRACE_ARCHIVE` or `[checks].trace_archive`. When set, every check execution's agent session is captured, not just failures.

Why
`on_event`, not cersei's `.memory()` persistence

The obvious approach, attaching a `Memory` backend so cersei writes its transcript, would miss exactly the runs worth debugging. cersei flushes `memory.store()` only on the agentic loop's normal exit, but `CerseiExecutor` cancels its agent the instant a verdict lands and wraps the run in a drop-on-timeout. As a result, successful (cancelled), timed-out, and errored runs all skip the store. Capture instead goes through `Agent::on_event`, which `emit()` fires synchronously for every event before those early returns, so the trace survives cancellation, timeout, and errors.

How it fits together
- executor (`executor/trace.rs`): a coalescing `TraceRecorder` (text/thinking deltas joined into per-turn blocks; tool results bounded) serializes each execution into a self-contained NDJSON doc (`header` line, records, `outcome` footer) returned in `AgentOutcome::trace_jsonl`.
- execution actor: pushes each attempt (retries included) into a shared `TraceCollector` before signalling completion downstream, so retry traces are captured and collection is race-free with run finalization.
- run (`trace_archive.rs`): builds the `.tar.gz` with pure-Rust `tar` + `flate2` (miniz_oxide backend), with no temp dir and no subprocess.

Trace archiving is best-effort: a failure logs but never fails an otherwise successful check run.
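The execution-actor handoff described above can be modelled with threads and a channel to show why collection is race-free. This is a hypothetical sketch: `TraceEntry`, `execution_actor`, and `run` are illustrative names, and `std::thread` plus `std::sync::mpsc` stand in for whatever actor runtime the crate actually uses. Each actor pushes every attempt's trace before it sends its completion message, and the run drains the collector only after one completion per check has arrived, so the drain cannot miss a trace.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// One captured execution (hypothetical shape).
#[derive(Debug, Clone)]
struct TraceEntry {
    check: String,
    attempt: u32,
    jsonl: String,
}

/// Shared, cloneable sink for traces from every actor.
#[derive(Clone, Default)]
struct TraceCollector(Arc<Mutex<Vec<TraceEntry>>>);

impl TraceCollector {
    fn push(&self, entry: TraceEntry) {
        self.0.lock().unwrap().push(entry);
    }
    fn drain(&self) -> Vec<TraceEntry> {
        std::mem::take(&mut *self.0.lock().unwrap())
    }
}

/// Records every attempt (retries included), then signals completion.
fn execution_actor(check: String, attempts: u32, collector: TraceCollector, done: mpsc::Sender<String>) {
    for attempt in 1..=attempts {
        let jsonl = format!("{{\"attempt\":{attempt}}}\n");
        collector.push(TraceEntry { check: check.clone(), attempt, jsonl });
    }
    // Sent only after all pushes, so the receiver observes every trace.
    done.send(check).unwrap();
}

/// Waits for one completion per check, then drains the collector.
fn run(checks: &[(&str, u32)]) -> Vec<TraceEntry> {
    let collector = TraceCollector::default();
    let (tx, rx) = mpsc::channel();
    for &(name, attempts) in checks {
        let (c, tx) = (collector.clone(), tx.clone());
        let name = name.to_string();
        thread::spawn(move || execution_actor(name, attempts, c, tx));
    }
    drop(tx);
    for _ in 0..checks.len() {
        rx.recv().unwrap();
    }
    let mut traces = collector.drain();
    // Deterministic order for archiving: by check name, then attempt number.
    traces.sort_by(|a, b| (&a.check, a.attempt).cmp(&(&b.check, b.attempt)));
    traces
}

fn main() {
    for t in run(&[("no-unsafe", 2), ("has-tests", 1)]) {
        print!("{} attempt {}: {}", t.check, t.attempt, t.jsonl);
    }
}
```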
Archive layout
One directory per requirement, one file per check execution, numbered by retry.
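The `{NN}-{requirement}/{check}.attempt-{k}.jsonl` layout from the commit message can be sketched as a small path builder. The slug rules, the 1-based two-digit `NN`, and slugging check names as well as requirement titles are all assumptions; the PR does not show how titles are sanitized or exactly how same-title collisions are disambiguated. In this sketch the numeric prefix alone keeps two requirements with the same title in separate directories.

```rust
/// Hypothetical slug: lowercase ASCII alphanumerics, other runs collapsed to one '-'.
fn slug(title: &str) -> String {
    let mut out = String::new();
    let mut last_was_dash = false;
    for c in title.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
            last_was_dash = false;
        } else if !last_was_dash && !out.is_empty() {
            out.push('-');
            last_was_dash = true;
        }
    }
    // Drop a trailing separator left by trailing punctuation or spaces.
    while out.ends_with('-') {
        out.pop();
    }
    out
}

/// Archive path for one check execution (assumed 1-based requirement index).
fn archive_path(req_index: usize, requirement: &str, check: &str, attempt: u32) -> String {
    format!("{:02}-{}/{}.attempt-{}.jsonl", req_index, slug(requirement), slug(check), attempt)
}

fn main() {
    // Two requirements share the title "Docs"; the index prefix keeps them apart.
    let plan = [("No unsafe code", "grep-unsafe", 2), ("Docs", "readme", 1), ("Docs", "changelog", 1)];
    for (i, (req, check, attempts)) in plan.iter().enumerate() {
        for k in 1..=*attempts {
            println!("{}", archive_path(i + 1, req, check, k));
        }
    }
}
```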
Testing
- `cargo make clippy`: clean (incl. `--all-targets`).
- `cargo make test`: 106 tests pass, including 4 new: `TraceRecorder` coalescing + NDJSON framing + tool-result truncation, and a tar.gz round-trip asserting exact archive paths (incl. same-title collision disambiguation).
- `--trace-archive` appears in `multi check --help`.
- … (`CHECKS.md`): a live agent run producing a real archive.

Stacked on #149.
🤖 Generated with Claude Code