feat(check): opt-in session-trace archive for multi check#150

Open
RobbieMcKinstry wants to merge 1 commit into trunk from check-session-trace-archive

RobbieMcKinstry commented Jul 1, 2026

What

Adds opt-in session-trace capture to multi check. The captured sessions are bundled into a single .tar.gz, built in memory, for debugging failed (and all other) agent runs.

--trace-archive <PATH>: off by default; can also be set via the MULTI_CHECKS_TRACE_ARCHIVE environment variable or the [checks].trace_archive config key. When set, every check execution's agent session is captured, not just failures.
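The PR does not say which source wins when more than one of these is set. A minimal sketch, assuming the common convention of CLI flag over environment variable over config file (and treating an empty variable as unset); `resolve_trace_archive` is a hypothetical name, not the PR's code:

```rust
use std::path::PathBuf;

/// Hypothetical helper (not the PR's code): decides where the trace archive goes.
/// ASSUMPTION: precedence is CLI flag > environment variable > config file.
fn resolve_trace_archive(
    cli_flag: Option<PathBuf>,
    env_var: Option<String>,
    config_value: Option<PathBuf>,
) -> Option<PathBuf> {
    cli_flag
        // ASSUMPTION: an empty environment variable counts as unset.
        .or_else(|| env_var.filter(|v| !v.is_empty()).map(PathBuf::from))
        .or(config_value)
}

fn main() {
    // In real use, env_var would come from
    // std::env::var("MULTI_CHECKS_TRACE_ARCHIVE").ok().

    // Off by default: nothing set anywhere means no archive is written.
    assert_eq!(resolve_trace_archive(None, None, None), None);

    // The environment variable beats the config key...
    let from_env = resolve_trace_archive(
        None,
        Some("env.tar.gz".into()),
        Some("cfg.tar.gz".into()),
    );
    assert_eq!(from_env, Some(PathBuf::from("env.tar.gz")));

    // ...and the CLI flag beats both.
    let from_flag = resolve_trace_archive(
        Some("cli.tar.gz".into()),
        Some("env.tar.gz".into()),
        None,
    );
    assert_eq!(from_flag, Some(PathBuf::from("cli.tar.gz")));
}
```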

Why on_event, not cersei's .memory() persistence

The obvious approach, attaching a Memory backend so cersei writes its own transcript, would miss exactly the runs worth debugging. cersei calls memory.store() only when the agentic loop exits normally, but CerseiExecutor cancels its agent as soon as a verdict lands and wraps the run in a drop-on-timeout, so successful (cancelled), timed-out, and errored runs all skip the store. Capture instead goes through Agent::on_event: emit() invokes that handler synchronously for every event, before any of those early returns, so the trace survives cancellation, timeout, and error.
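To make the ordering argument concrete, here is a self-contained simulation. The `Agent` type below is a hypothetical stand-in, not cersei's API: it only mirrors the two properties described above, namely that `emit()` runs the `on_event` handler synchronously and that the transcript store is written only on a normal loop exit.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for an agent loop; NOT cersei's real API.
struct Agent {
    on_event: Option<Box<dyn Fn(&str)>>,
    memory_store: Vec<String>, // written only when the loop exits normally
}

impl Agent {
    fn emit(&self, event: &str) {
        // The handler runs synchronously, before any early return below.
        if let Some(handler) = &self.on_event {
            handler(event);
        }
    }

    /// Runs up to `turns` turns but bails out early (like cancel-on-verdict
    /// or a timeout drop) once `cancel_after` turns have been emitted.
    fn run(&mut self, turns: usize, cancel_after: usize) -> Result<(), &'static str> {
        let mut transcript = Vec::new();
        for turn in 0..turns {
            let event = format!("turn-{turn}");
            self.emit(&event);
            transcript.push(event);
            if turn + 1 == cancel_after {
                return Err("cancelled"); // early return: the store is never flushed
            }
        }
        self.memory_store = transcript; // flush happens on normal exit only
        Ok(())
    }
}

fn main() {
    let trace = Rc::new(RefCell::new(Vec::<String>::new()));
    let sink = Rc::clone(&trace);
    let mut agent = Agent {
        on_event: Some(Box::new(move |e: &str| sink.borrow_mut().push(e.to_string()))),
        memory_store: Vec::new(),
    };

    // The verdict lands after 2 of 5 turns, so the run is cancelled.
    assert_eq!(agent.run(5, 2), Err("cancelled"));
    assert!(agent.memory_store.is_empty()); // memory-backed transcript is lost
    assert_eq!(*trace.borrow(), vec!["turn-0", "turn-1"]); // event trace survives
}
```

With the verdict landing after two of five turns, the memory-backed transcript stays empty while the event-sourced trace holds both emitted events.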

How it fits together

  • Executor (executor/trace.rs): a coalescing TraceRecorder (text/thinking deltas joined into per-turn blocks; tool results truncated to a bounded size) serializes each execution into a self-contained NDJSON doc (header line, records, outcome footer), returned in AgentOutcome::trace_jsonl.
  • Execution actor: pushes each attempt (retries included) into a shared TraceCollector before signalling completion downstream, so retry traces are captured and every trace is already in the collector by the time run finalization observes the completion.
  • run (trace_archive.rs): builds the .tar.gz with pure-Rust tar + flate2 (miniz_oxide backend) — no temp dir, no subprocess.
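A rough sketch of the recorder's coalescing and NDJSON framing, using only the standard library. The event variants, JSON field names, and the 32-character truncation bound are all assumptions for illustration. Only text deltas are modelled, and a pending block is flushed whenever a non-delta event arrives, as a stand-in for the real per-turn boundary; the PR's actual TraceRecorder schema is not shown here.

```rust
/// ASSUMPTION: illustrative bound on tool-result length, in chars.
const MAX_TOOL_RESULT: usize = 32;

/// Hypothetical event shapes; the real recorder also handles thinking deltas.
enum Event<'a> {
    TextDelta(&'a str),
    ToolResult(&'a str),
}

#[derive(Default)]
struct TraceRecorder {
    records: Vec<String>, // finished NDJSON lines
    pending_text: String, // text deltas not yet flushed into a block
}

/// Minimal JSON string escaping so the sketch needs no external crates.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl TraceRecorder {
    fn on_event(&mut self, event: Event<'_>) {
        match event {
            // Deltas are joined rather than emitted as one record per token.
            Event::TextDelta(d) => self.pending_text.push_str(d),
            Event::ToolResult(r) => {
                self.flush_text();
                let bounded: String = r.chars().take(MAX_TOOL_RESULT).collect();
                let truncated = r.chars().count() > MAX_TOOL_RESULT;
                self.records.push(format!(
                    "{{\"type\":\"tool_result\",\"content\":{},\"truncated\":{}}}",
                    json_str(&bounded),
                    truncated
                ));
            }
        }
    }

    fn flush_text(&mut self) {
        if !self.pending_text.is_empty() {
            let text = std::mem::take(&mut self.pending_text);
            self.records
                .push(format!("{{\"type\":\"text\",\"text\":{}}}", json_str(&text)));
        }
    }

    /// Header line, records, outcome footer: one self-contained NDJSON doc.
    fn finish(mut self, check: &str, outcome: &str) -> String {
        self.flush_text();
        let mut doc = format!("{{\"type\":\"header\",\"check\":{}}}\n", json_str(check));
        for r in &self.records {
            doc.push_str(r);
            doc.push('\n');
        }
        doc.push_str(&format!("{{\"type\":\"outcome\",\"outcome\":{}}}\n", json_str(outcome)));
        doc
    }
}

fn main() {
    let mut rec = TraceRecorder::default();
    for d in ["The file ", "looks ", "fine."] {
        rec.on_event(Event::TextDelta(d));
    }
    rec.on_event(Event::ToolResult(&"x".repeat(100)));
    print!("{}", rec.finish("check-title-slug", "pass"));
}
```

Three text deltas become one text record, and the 100-character tool result is cut to the bound and flagged as truncated.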

Trace archiving is best-effort: a failure is logged but never fails an otherwise-successful check run.
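A minimal sketch of that best-effort contract, with a hypothetical `write_archive` standing in for the real `.tar.gz` builder: the archive error is reported on stderr and swallowed, so it cannot change the run's `Result`.

```rust
use std::io;
use std::path::Path;

/// Hypothetical writer; stands in for the real archive builder.
fn write_archive(path: &Path, bytes: &[u8]) -> io::Result<()> {
    std::fs::write(path, bytes)
}

/// Best-effort wrapper: an archive failure is logged, never propagated.
fn archive_best_effort(path: &Path, bytes: &[u8]) {
    if let Err(err) = write_archive(path, bytes) {
        eprintln!("warning: failed to write trace archive {}: {}", path.display(), err);
    }
}

/// The run's own result depends only on the checks, not on archiving.
fn finish_run(checks_passed: bool, archive: Option<&Path>) -> Result<(), String> {
    if let Some(path) = archive {
        archive_best_effort(path, b"placeholder");
    }
    if checks_passed {
        Ok(())
    } else {
        Err("checks failed".into())
    }
}

fn main() {
    // Writing into a directory that does not exist fails...
    let bad = Path::new("/nonexistent-dir/traces.tar.gz");
    // ...but an otherwise-successful run still succeeds.
    assert_eq!(finish_run(true, Some(bad)), Ok(()));
}
```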

Archive layout

01-requirement-slug/
  check-title-slug.attempt-1.jsonl
  check-title-slug.attempt-2.jsonl   # a retry
02-another-requirement/
  other-check.attempt-1.jsonl

One directory per requirement and one file per check execution; files are numbered by attempt, so attempt-2 and later are retries.
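The layout and the push-before-signal ordering from the execution-actor bullet can be sketched together with std threads. `TraceEntry`, `archive_path`, and the two-digit zero-padded index are assumptions inferred from the `{NN}-{requirement}/{check}.attempt-{k}.jsonl` pattern; slugification and the same-title collision rule are not described in the PR and are left out.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical record pushed by the execution actor; names are assumptions.
#[allow(dead_code)] // `jsonl` would become the archive entry's contents
struct TraceEntry {
    requirement_index: usize, // 1-based position of the requirement
    requirement_slug: String,
    check_slug: String,
    attempt: u32, // 1 = first run, 2+ = retries
    jsonl: String,
}

/// `{NN}-{requirement}/{check}.attempt-{k}.jsonl`
fn archive_path(e: &TraceEntry) -> String {
    format!(
        "{:02}-{}/{}.attempt-{}.jsonl",
        e.requirement_index, e.requirement_slug, e.check_slug, e.attempt
    )
}

fn main() {
    let collector: Arc<Mutex<Vec<TraceEntry>>> = Arc::default();
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let jobs = vec![
        (1, "requirement-slug", "check-title-slug", 1),
        (1, "requirement-slug", "check-title-slug", 2),
        (2, "another-requirement", "other-check", 1),
    ];
    let n = jobs.len();
    for (idx, req, check, attempt) in jobs {
        let collector = Arc::clone(&collector);
        let done_tx = done_tx.clone();
        thread::spawn(move || {
            let entry = TraceEntry {
                requirement_index: idx,
                requirement_slug: req.to_string(),
                check_slug: check.to_string(),
                attempt,
                jsonl: String::from("{\"type\":\"header\"}\n"),
            };
            // Push the trace FIRST, then signal completion downstream.
            collector.lock().unwrap().push(entry);
            done_tx.send(()).unwrap();
        });
    }

    // Finalization: after every completion arrives, every trace is present.
    for _ in 0..n {
        done_rx.recv().unwrap();
    }
    let mut paths: Vec<String> = collector.lock().unwrap().iter().map(archive_path).collect();
    paths.sort();
    for p in &paths {
        println!("{p}");
    }
}
```

Because each worker pushes its entry before sending on the channel, the finalizer can drain the collector after `n` completions without missing a trace; sorted, the output is the three paths of the example layout.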

Testing

  • cargo make clippy — clean (incl. --all-targets).
  • cargo make test — 106 tests pass, including 4 new: TraceRecorder coalescing + NDJSON framing + tool-result truncation, and a tar.gz round-trip asserting exact archive paths (incl. same-title collision disambiguation).
  • Verified --trace-archive appears in multi check --help.
  • Not exercised end-to-end here: a live agent run producing a real archive (this needs an Anthropic API key and a CHECKS.md).

Stacked on #149.

🤖 Generated with Claude Code

Base automatically changed from fix-presenter-tui-corruption to trunk July 1, 2026 22:18
Add `--trace-archive <PATH>` (off by default; also `MULTI_CHECKS_TRACE_ARCHIVE`
/ `[checks].trace_archive`) to capture every check agent session and bundle
them into a single in-memory `.tar.gz` for debugging failed runs. Captures all
executions, not just failures.

Capture is via `Agent::on_event`, not cersei's `.memory()` session
persistence: the executor cancels its agent the instant a verdict lands and
wraps the run in a drop-on-timeout, and cersei only flushes its transcript on
the agentic loop's normal exit — so a memory-backed transcript would miss
exactly the cancelled/timed-out/errored runs this feature exists to debug.
`emit()` fires the event handler for every event before those early returns,
so an event-sourced trace survives them.

- executor: a coalescing `TraceRecorder` (text/thinking deltas joined into
  per-turn blocks, tool results bounded) serializes each execution into a
  self-contained NDJSON doc — header, records, outcome footer — returned in
  `AgentOutcome::trace_jsonl`.
- execution actor: pushes each attempt (retries included) into a shared
  `TraceCollector` before signalling completion downstream, so retry traces
  are captured and collection is race-free with run finalization.
- run: builds the `.tar.gz` with pure-Rust `tar` + `flate2` (no temp dir, no
  subprocess), laid out as `{NN}-{requirement}/{check}.attempt-{k}.jsonl` —
  one directory per requirement, one file per check execution.

Trace archiving is best-effort: a failure is logged but never fails an
otherwise successful check run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>