feat(check): opt-in session-trace archive for `multi check` by RobbieMcKinstry · Pull Request #150 · wack/multitool

RobbieMcKinstry · 2026-07-01T22:07:24Z

What

Adds opt-in session-trace capture to multi check. Traces are bundled into a single .tar.gz, built in memory, for debugging agent runs (all runs, not only failed ones).

--trace-archive <PATH> — off by default; also settable as MULTI_CHECKS_TRACE_ARCHIVE or [checks].trace_archive. When set, every check execution's agent session is captured (not just failures).

Why `on_event`, not cersei's `.memory()` persistence

The obvious approach — attach a Memory backend so cersei writes its transcript — would miss exactly the runs worth debugging. cersei flushes memory.store() only on the agentic loop's normal exit, but CerseiExecutor cancels its agent the instant a verdict lands and wraps the run in a drop-on-timeout — so successful (cancelled), timed-out, and errored runs all skip the store. Capture is instead via Agent::on_event, which emit() fires synchronously for every event before those early returns, so the trace survives cancellation/timeout/error.
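
The ordering argument above can be illustrated with a toy loop. This is a minimal sketch, not cersei's API: `ToyAgent`, `Event`, `Exit`, and `demo_cancelled_run` are hypothetical stand-ins. Only the ordering is taken from the description: the handler fires for every event, and the transcript store happens only on normal exit.

```rust
/// Hypothetical event type; cersei's real event enum is not shown in this PR.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Text(String),
    Verdict(String),
}

/// How the toy loop ended.
#[derive(Debug, PartialEq)]
enum Exit {
    Normal,
    Cancelled,
}

/// Toy agent: `stored` models a transcript that is flushed at normal exit only.
struct ToyAgent {
    script: Vec<Event>,
    stored: Option<Vec<Event>>,
}

impl ToyAgent {
    /// Runs the script, calling `on_event` synchronously for every event.
    /// `cancel_on_verdict` models the executor cancelling once a verdict lands.
    fn run(&mut self, cancel_on_verdict: bool, mut on_event: impl FnMut(&Event)) -> Exit {
        let mut transcript = Vec::new();
        for ev in &self.script {
            on_event(ev); // fires before any early return below
            transcript.push(ev.clone());
            if cancel_on_verdict && matches!(ev, Event::Verdict(_)) {
                return Exit::Cancelled; // early return: the store below is skipped
            }
        }
        self.stored = Some(transcript); // reached only on normal exit
        Exit::Normal
    }
}

/// Runs a two-event script with cancellation and returns
/// (events seen by the handler, whether the transcript was stored).
fn demo_cancelled_run() -> (usize, bool) {
    let mut agent = ToyAgent {
        script: vec![Event::Text("checking".into()), Event::Verdict("pass".into())],
        stored: None,
    };
    let mut trace = Vec::new();
    let exit = agent.run(true, |ev| trace.push(ev.clone()));
    assert_eq!(exit, Exit::Cancelled);
    (trace.len(), agent.stored.is_some())
}
```

In the cancelled run the handler has seen both events while the exit-time store never ran, which is the gap the event-sourced trace is meant to close.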

How it fits together

- Executor (executor/trace.rs): a coalescing TraceRecorder (text/thinking deltas joined into per-turn blocks; tool results bounded) serializes each execution into a self-contained NDJSON doc (header line, records, outcome footer), returned in AgentOutcome::trace_jsonl.
- Execution actor: pushes each attempt (retries included) into a shared TraceCollector before signalling completion downstream, so retry traces are captured and collection is race-free with run finalization.
- run (trace_archive.rs): builds the .tar.gz with pure-Rust tar + flate2 (miniz_oxide backend), with no temp dir and no subprocess.
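
The recorder's coalescing and framing could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in executor/trace.rs: the `Event` variants, the JSON field names, the `MAX_TOOL_RESULT_CHARS` cap, and the hand-rolled `json_escape` are all hypothetical, and only text deltas are modelled (thinking deltas would coalesce the same way).

```rust
/// Hypothetical event shapes; the real cersei events are not shown in this PR.
enum Event {
    TextDelta(String),
    ToolResult(String),
    TurnEnd,
}

/// Assumed cap on stored tool output, in characters (the real bound is not stated).
const MAX_TOOL_RESULT_CHARS: usize = 16;

/// Minimal JSON string escaping, enough for this sketch.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

#[derive(Default)]
struct TraceRecorder {
    pending_text: String, // deltas accumulated for the current turn
    records: Vec<String>, // one serialized JSON object per record
}

impl TraceRecorder {
    /// Emits the accumulated deltas as a single text record, if any.
    fn flush_text(&mut self) {
        if !self.pending_text.is_empty() {
            let text = std::mem::take(&mut self.pending_text);
            self.records
                .push(format!(r#"{{"type":"text","text":"{}"}}"#, json_escape(&text)));
        }
    }

    fn record(&mut self, ev: Event) {
        match ev {
            Event::TextDelta(d) => self.pending_text.push_str(&d),
            Event::ToolResult(r) => {
                self.flush_text();
                let truncated = r.chars().count() > MAX_TOOL_RESULT_CHARS;
                let kept: String = r.chars().take(MAX_TOOL_RESULT_CHARS).collect();
                self.records.push(format!(
                    r#"{{"type":"tool_result","content":"{}","truncated":{}}}"#,
                    json_escape(&kept),
                    truncated
                ));
            }
            Event::TurnEnd => self.flush_text(),
        }
    }

    /// Frames the records as NDJSON: header line, records, outcome footer.
    fn finish(mut self, check: &str, outcome: &str) -> String {
        self.flush_text();
        let mut lines = vec![format!(r#"{{"type":"header","check":"{}"}}"#, json_escape(check))];
        lines.append(&mut self.records);
        lines.push(format!(r#"{{"type":"outcome","outcome":"{}"}}"#, json_escape(outcome)));
        lines.join("\n") + "\n"
    }
}

/// Two deltas coalesce into one record; a 40-character tool result is cut to the cap.
fn demo_trace() -> String {
    let mut rec = TraceRecorder::default();
    rec.record(Event::TextDelta("Hel".into()));
    rec.record(Event::TextDelta("lo".into()));
    rec.record(Event::TurnEnd);
    rec.record(Event::ToolResult("x".repeat(40)));
    rec.finish("my-check", "pass")
}
```

Because every line is a complete JSON object, a trace cut short by cancellation is still readable up to the last event that was recorded.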

Trace archiving is best-effort: a failure logs but never fails an otherwise-successful check run.
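
A best-effort step of this kind can be sketched as a wrapper that downgrades an archiving error to a log line. The helper name `with_best_effort_archive` and the use of `eprintln!` are assumptions for illustration; the project's actual logging call is not shown here.

```rust
use std::fmt::Display;

/// Runs a fallible archiving step and logs any error instead of propagating it,
/// so the check run's own result is returned unchanged.
fn with_best_effort_archive<T, E: Display>(
    check_result: T,
    archive: impl FnOnce() -> Result<(), E>,
) -> T {
    if let Err(err) = archive() {
        // Assumed logging; a real implementation would likely use the project's logger.
        eprintln!("warning: could not write trace archive: {err}");
    }
    check_result
}
```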

Archive layout

01-requirement-slug/
  check-title-slug.attempt-1.jsonl
  check-title-slug.attempt-2.jsonl   # a retry
02-another-requirement/
  other-check.attempt-1.jsonl

One directory per requirement, one file per check execution, numbered by retry.
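
As an illustration of the naming scheme, the entry path for one attempt could be derived as below. This is a sketch: `slugify` and `archive_path` are hypothetical helpers, the slug rules are an assumed approximation, and the same-title collision disambiguation mentioned under Testing is not modelled because its scheme is not described here.

```rust
/// Lowercases ASCII letters and collapses runs of other characters into one '-'.
/// The project's real slug rules are not shown; this is an assumed approximation.
fn slugify(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('-') {
            out.push('-');
        }
    }
    out.trim_end_matches('-').to_string()
}

/// Builds `{NN}-{requirement}/{check}.attempt-{k}.jsonl`.
/// `req_index` and `attempt` are 1-based, matching the layout listed above.
fn archive_path(req_index: usize, requirement: &str, check: &str, attempt: usize) -> String {
    format!(
        "{:02}-{}/{}.attempt-{}.jsonl",
        req_index,
        slugify(requirement),
        slugify(check),
        attempt
    )
}
```

For example, `archive_path(1, "Requirement Slug", "Check Title: Slug!", 2)` yields `01-requirement-slug/check-title-slug.attempt-2.jsonl`.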

Testing

- cargo make clippy: clean (incl. --all-targets).
- cargo make test: 106 tests pass, including 4 new: TraceRecorder coalescing + NDJSON framing + tool-result truncation, and a tar.gz round-trip asserting exact archive paths (incl. same-title collision disambiguation).
- Verified --trace-archive appears in multi check --help.
- Not exercised end-to-end here (needs an Anthropic key + a CHECKS.md): a live agent run producing a real archive.

Stacked on #149.

🤖 Generated with Claude Code

RobbieMcKinstry · 2026-07-01T22:07:38Z

Add `--trace-archive <PATH>` (off by default; also `MULTI_CHECKS_TRACE_ARCHIVE` / `[checks].trace_archive`) to capture every check agent session and bundle them into a single in-memory `.tar.gz` for debugging failed runs. Captures all executions, not just failures.

Capture is via `Agent::on_event`, not cersei's `.memory()` session persistence: the executor cancels its agent the instant a verdict lands and wraps the run in a drop-on-timeout, and cersei only flushes its transcript on the agentic loop's normal exit, so a memory-backed transcript would miss exactly the cancelled/timed-out/errored runs this feature exists to debug. `emit()` fires the event handler for every event before those early returns, so an event-sourced trace survives them.

- executor: a coalescing `TraceRecorder` (text/thinking deltas joined into per-turn blocks, tool results bounded) serializes each execution into a self-contained NDJSON doc (header, records, outcome footer), returned in `AgentOutcome::trace_jsonl`.
- execution actor: pushes each attempt (retries included) into a shared `TraceCollector` before signalling completion downstream, so retry traces are captured and collection is race-free with run finalization.
- run: builds the `.tar.gz` with pure-Rust `tar` + `flate2` (no temp dir, no subprocess), laid out as `{NN}-{requirement}/{check}.attempt-{k}.jsonl`: one directory per requirement, one file per check execution.

Trace archiving is best-effort: a failure logs but never fails an otherwise successful check run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

RobbieMcKinstry mentioned this pull request Jul 1, 2026

fix(check): stop the presenter and CLAUDE.md loading from corrupting output #149

Merged

Base automatically changed from fix-presenter-tui-corruption to trunk July 1, 2026 22:18

RobbieMcKinstry force-pushed the check-session-trace-archive branch from a5cec40 to 6ef5963 Compare July 1, 2026 23:33

This was referenced Jul 1, 2026

fix(check): stop multi check from hanging on exit after a run completes #151

Open

fix(check): stop check agents from getting lost outside their sandbox #152

Open

feat(check): retire the thinking-disabled workaround — effort buys extended thinking #153

Open

RobbieMcKinstry wants to merge 1 commit into trunk from check-session-trace-archive
