test(scenarios): reproducible test scenarios harness #574
Merged
- 5 canonical multi-agent scenarios as seed.sql + inputs.jsonl + expected.json
- In-process harness with vi.useFakeTimers + path normalization
- pnpm scenarios / scenarios:filter / scenarios:explain / scenarios:record
- 2 harness self-tests (fails-closed on missing expected, clear diff on mismatch)
- Separate scenarios CI job on Node 20
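The two self-test behaviours listed above (fail closed when expected.json is missing, print a clear diff on mismatch) could look roughly like the sketch below. This is a minimal illustration, not the harness's actual code: `flatten`, `diffState`, and `checkScenario` are hypothetical names, and the "path = value" diff format is an assumption.

```typescript
// Hypothetical helpers for illustration only; the real harness may differ.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Flatten a JSON value into "path -> serialized leaf" entries so that
// mismatches can be reported field by field.
function flatten(value: Json, path: string, out: Map<string, string>): void {
  if (Array.isArray(value)) {
    if (value.length === 0) out.set(path, "[]");
    value.forEach((item, i) => flatten(item, `${path}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    if (keys.length === 0) out.set(path, "{}");
    for (const key of keys) flatten(value[key] as Json, `${path}.${key}`, out);
  } else {
    out.set(path, JSON.stringify(value));
  }
}

// Produce "-"/"+" lines for every path whose value differs.
function diffState(expected: Json, actual: Json): string[] {
  const exp = new Map<string, string>();
  const act = new Map<string, string>();
  flatten(expected, "$", exp);
  flatten(actual, "$", act);
  const all = Array.from(exp.keys()).concat(Array.from(act.keys()));
  const paths = Array.from(new Set(all)).sort();
  const lines: string[] = [];
  for (const p of paths) {
    const e = exp.get(p);
    const a = act.get(p);
    if (e === a) continue;
    if (e !== undefined) lines.push(`- ${p} = ${e}`);
    if (a !== undefined) lines.push(`+ ${p} = ${a}`);
  }
  return lines;
}

// Fail closed: a missing expected file is an error, never a silent pass.
function checkScenario(slug: string, expectedText: string | undefined, actual: Json): void {
  if (expectedText === undefined) {
    throw new Error(`${slug}: expected.json is missing; refusing to pass`);
  }
  const lines = diffState(JSON.parse(expectedText) as Json, actual);
  if (lines.length > 0) {
    throw new Error(`${slug}: state mismatch\n${lines.join("\n")}`);
  }
}
```

Treating a missing expected file as a hard failure means a new scenario cannot pass until its expected state has been recorded explicitly.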
Summary
Adds tests/scenarios/, a reproducible, in-process scenarios harness for canonical multi-agent situations, with deterministic clocks and diffable plaintext expected-state files.
- 5 canonical multi-agent scenarios, each as seed.sql + inputs.jsonl + expected.json:
  - 01-claim-before-edit: Codex pre_tool_use auto-claims target
  - 02-cross-runtime-handoff: Codex relays out, Claude accepts, claim flips
  - 03-stale-claim-sweep: TTL expiry triggers release_expired_quota
  - 04-plan-claim-adoption: Queen sub-task adopted by Codex
  - 05-path-mismatch-reclaim: pre_tool_use re-claims after mismatch
- In-process harness with vi.useFakeTimers() + vi.setSystemTime(BASE_TS + at_ms) + path normalization
- pnpm scenarios, scenarios:filter, scenarios:explain, scenarios:record
- Separate scenarios CI job on Node 20

Closes
The "⏳ Reproducible test fixture set under tests/scenarios/" item under README §v0.x "Multi-runtime confidence".

OpenSpec
openspec/changes/scenarios-harness-2026-05-16/CHANGE.md

Test plan
- pnpm scenarios: 7/7 pass (5 scenarios + 2 self-tests, 1.95s)
- pnpm scenarios:filter 03-stale-claim-sweep: 1 pass, 6 skip
- pnpm scenarios:explain 02-cross-runtime-handoff: prints timeline
- pnpm scenarios:record <slug>: confirmed regenerates expected.json
- pnpm build: clean

Notes
- .mts for harness internals (tsx CJS-fallback workaround for @colony/compress ESM-only exports)
- Added a task envelope kind beyond the design's lifecycle|mcp|tick triplet to drive TaskThread directly
- README item flipped ⏳ → ✅: revert if you prefer to land that separately

Merge order
Independent of #573 (bridge replay). Safe to merge second.
🤖 Generated with Claude Code
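
For readers unfamiliar with the deterministic-clock approach named in the Summary, here is a minimal sketch of the timestamp arithmetic behind vi.setSystemTime(BASE_TS + at_ms), plus one plausible reading of "path normalization". Everything beyond those two names is an assumption: the `Envelope` field names, the `BASE_TS` value, the `<REPO>` placeholder, and the helper names `parseInputs`, `toTimeline`, and `normalizePaths` are hypothetical and not taken from the harness.

```typescript
// Illustrative assumptions only; not the harness's real schema or values.
const BASE_TS = Date.UTC(2026, 0, 1); // assumed fixed epoch shared by all scenarios

type EnvelopeKind = "lifecycle" | "mcp" | "tick" | "task";

interface Envelope {
  kind: EnvelopeKind;
  at_ms: number; // offset from BASE_TS, in milliseconds
  payload?: unknown;
}

// Parse an inputs.jsonl-style string: one JSON object per non-empty line.
function parseInputs(jsonl: string): Envelope[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Envelope);
}

// Resolve each envelope to an absolute timestamp and order by it.
// Array.prototype.sort is stable, so equal offsets keep their file order.
// In a Vitest run, each step would call vi.setSystemTime(at) before dispatch.
function toTimeline(envelopes: Envelope[]): { at: number; envelope: Envelope }[] {
  return envelopes
    .map((envelope) => ({ at: BASE_TS + envelope.at_ms, envelope }))
    .sort((a, b) => a.at - b.at);
}

// Replace the machine-specific repository root with a stable placeholder
// so recorded state compares equal across machines.
function normalizePaths(text: string, repoRoot: string): string {
  return text.split(repoRoot).join("<REPO>");
}
```

Because every timestamp derives from a constant base plus a per-envelope offset, a recorded expected.json would not depend on when or where the scenario ran.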