Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-15
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
## Why

`claude-supervisor.sh`'s asking/blocked classifier had two structural
weaknesses that produced classifier cost and precision problems:

1. The "is this a real interactive cursor?" gate (`last_line_is_prompt`)
accepted any line ending in `:` or `?` as a waiting cursor. Worker
tails routinely end with `Reading file:` mid-work, so an old
`Continue?` or `Should I` 40 lines back in the 80-line capture
was enough to call sonnet/medium and paste an answer codex never
asked for. Pure false-positive cost.
2. `is_busy` grep'd the whole 80-line window for `Working (` or
`esc to interrupt`. A pane that finished `Working (12s)` 40 lines
ago and is now sitting at a fresh `[Y/n] ` cursor read as "busy"
and never reached the ask path — real asks slipped past the
supervisor entirely.

Additionally, several real codex-fleet blockers (merge conflict,
uncommitted-changes, fatal git, ssh permission-denied, MCP server
missing, bad GH credentials, BLOCKED: prefix) were missing from
`BLOCKED_PATTERNS`, so panes parked on these states classified as
`quiet` and the supervisor stayed silent.

## What Changes

- Extract the classifier (BUSY/ASK/BLOCKED patterns + `is_busy`,
`is_asking`, `is_blocked`, `last_line_is_prompt`, `classify_tail`,
`tail_hash`) into a pure-bash library at
`scripts/codex-fleet/lib/claude-supervisor-classifier.sh` so the
daemon and a replay harness share one implementation.
- Tighten `last_line_is_prompt`: drop the bare `[?:][[:space:]]*$`
rule. Bare-`?` is only admitted when the line carries a known
question lead-word. `:$` no longer counts.
- Tighten `is_busy`: anchor BUSY_PATTERNS to the LAST non-empty line
only. codex rewrites the `Working (…)` footer in place; if the
worker is busy it's at the bottom. Stale `Working (` in scrollback
no longer masks a fresh interactive prompt.
- Tighten `is_asking`: scope ASK_PATTERN matching to the recent N
non-empty lines (default 8 via `CLAUDE_SUPERVISOR_RECENT_LINES`)
AND require `last_line_is_prompt` to pass.
- Extend `BLOCKED_PATTERNS` with the codex-fleet-specific stuck
states listed under "Why".
- Add a fixture-driven replay harness at
`scripts/codex-fleet/test/test-claude-supervisor-classifier.sh`
with 24 pane-capture fixtures covering the false-positive,
missed-block, and previously-correct cases. Filename prefix
encodes the expected classification.

## Impact

- Cost: fewer ASK false positives → fewer sonnet/medium calls per
tick. Sonnet stays the workhorse for the remaining real asks.
Opus calls are gated on the (now more accurate) BLOCKED set;
strike guard caps per-pane spend.
- Behavior: panes the supervisor used to ignore (real ask under
stale `Working (`, merge-conflict, MCP-missing, bad GH creds)
now classify correctly.
- Risk: the harness pins the precision/recall trade-off — any
future loosening of the gates regresses the harness.
- Surfaces touched: `claude-supervisor.sh` replaces its inline
classifier with `source`; new lib; new test + fixtures. No
daemon-state files, no plan-watcher changes, no cap-swap-daemon
changes.
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
## ADDED Requirements

### Requirement: classifier lives in a pure-bash library

The system SHALL keep the classifier (BUSY/ASK/BLOCKED pattern
arrays, `is_busy`, `is_asking`, `is_blocked`, `last_line_is_prompt`,
`classify_tail`, and `tail_hash`) at
`scripts/codex-fleet/lib/claude-supervisor-classifier.sh`. The lib
SHALL be safe to `source` with no side effects (no tmux calls, no
claude calls, no file writes at source time). `claude-supervisor.sh`
SHALL source this lib rather than inlining the classifier.

#### Scenario: library is sourceable with no side effects
- **WHEN** `scripts/codex-fleet/lib/claude-supervisor-classifier.sh`
is sourced from a bash script
- **THEN** no tmux, claude, or filesystem-mutating command runs
- **AND** `classify_tail`, `is_busy`, `is_asking`, `is_blocked`,
`last_line_is_prompt`, and `tail_hash` are defined

### Requirement: classify_tail returns one of four labels

`classify_tail "<tail>"` SHALL echo exactly one of
`busy`, `asking`, `blocked`, `quiet`. `asking` SHALL outrank `blocked`
when both conditions match — a pane that mentioned a stale blocker
but is now showing an interactive menu wants an answer to the menu.

#### Scenario: asking outranks blocked
- **WHEN** the recent tail contains both a BLOCKED_PATTERN (e.g.,
stale-claim) and an ASK_PATTERN (e.g., a numbered menu with a
`(recommended)` option) AND `last_line_is_prompt` accepts the
bottom line
- **THEN** `classify_tail` echoes `asking`

### Requirement: busy is anchored to the last non-empty line

`is_busy` SHALL match BUSY_PATTERNS only against the last non-empty
line of the ANSI-stripped tail. A stale `Working (` or
`esc to interrupt` earlier in scrollback SHALL NOT mask a fresh
interactive cursor at the bottom.

#### Scenario: stale Working in scrollback does not mask a fresh ask
- **WHEN** the tail contains `Working (` nine lines from the bottom
AND the bottom line is a bare prompt sigil (`❯`) under a numbered
menu with `(recommended)` option
- **THEN** `classify_tail` echoes `asking`, not `busy`

### Requirement: last_line_is_prompt rejects bare-colon endings

`last_line_is_prompt` SHALL NOT accept a bare trailing `:` as a
waiting cursor. A bare trailing `?` SHALL be accepted only when the
same line carries a known question lead-word (Continue, Approve,
Proceed, Confirm, Apply, Should I, Do you want, Would you like,
Which option/approach/one, Choose, Select, Pick, Need clarification,
Need more …, Please clarify/confirm/choose/specify).

#### Scenario: narrative status line ending in ":" does not trigger asking
- **WHEN** the bottom line of the tail is `Reading file: …:` AND an
older `Continue?` appears earlier in scrollback
- **THEN** `classify_tail` echoes `quiet`

### Requirement: BLOCKED_PATTERNS cover codex-fleet stuck states

`BLOCKED_PATTERNS` SHALL match each of: git merge conflict
(`CONFLICT (content`), `error: uncommitted changes`, `fatal: ` git
errors, `Permission denied (publickey)`, `gh: command not found`,
`Bad credentials`, `MCP server <name> (not found|missing|unavailable)`,
`429 Too Many Requests`, and the canonical `BLOCKED:` prefix —
in addition to the prior set (PLAN_SUBTASK_NOT_FOUND, stale-claim,
told-not-to-rescue, less-than-5%-limit, etc.).

#### Scenario: missing MCP server is classified as blocked
- **WHEN** the tail contains `Error: MCP server colony not found in
the registered servers`
- **THEN** `classify_tail` echoes `blocked`

### Requirement: replay harness pins classifier behavior

The system SHALL ship a replay harness at
`scripts/codex-fleet/test/test-claude-supervisor-classifier.sh`
that discovers every `*.txt` fixture under
`scripts/codex-fleet/test/fixtures/claude-supervisor-classifier/`,
parses the expected label from the filename prefix
(`<label>__*.txt`), and asserts `classify_tail` returns that label.
The harness SHALL exit non-zero if any fixture mismatches.

#### Scenario: harness passes against the current classifier
- **WHEN** the harness is run with the lib at its current head
- **THEN** every fixture's `actual` label equals its `expected` label
- **AND** the harness exits 0
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
## Definition of Done

This change is complete only when **all** of the following are true:

- Every checkbox below is checked.
- The agent branch reaches `MERGED` state on `origin` and the PR URL + state are recorded in the completion handoff.
- If any step blocks, append a `BLOCKED:` line under section 4 and stop.

## Handoff

- Handoff: change=`agent-claude-claude-supervisor-classifier-audit-2026-05-15-22-25`; branch=`agent/claude/claude-supervisor-classifier-audit-2026-05-15-22-25`; scope=`tighten the asking/blocked classifier inside claude-supervisor.sh, extract it to a pure lib, add fixture-driven replay harness`; evidence=`scripts/codex-fleet/test/test-claude-supervisor-classifier.sh`.

## 1. Specification

- [x] 1.1 Finalize proposal scope and acceptance criteria.
- [x] 1.2 Define normative requirements in `specs/claude-supervisor-classifier-audit/spec.md`.

## 2. Implementation

- [x] 2.1 Extract classifier into `scripts/codex-fleet/lib/claude-supervisor-classifier.sh`.
- [x] 2.2 Tighten `last_line_is_prompt` (drop bare `:$`; gate `?$` on a question lead-word).
- [x] 2.3 Anchor `is_busy` to the last non-empty line.
- [x] 2.4 Scope `is_asking` to recent N lines AND require `last_line_is_prompt`.
- [x] 2.5 Extend `BLOCKED_PATTERNS` with codex-fleet stuck states.
- [x] 2.6 Patch `claude-supervisor.sh` to `source` the new lib.
- [x] 2.7 Add replay harness + 24 fixtures under `scripts/codex-fleet/test/`.

## 3. Verification

- [x] 3.1 `bash -n` clean on lib, harness, daemon.
- [x] 3.2 `bash scripts/codex-fleet/test/test-claude-supervisor-classifier.sh` — 24 pass, 0 fail.
- [x] 3.3 `claude-supervisor.sh --once --dry-run` runs without tmux (no-op tick, rc=0).

## 4. Cleanup (mandatory; run before claiming completion)

- [ ] 4.1 Run the cleanup pipeline: `gx branch finish --branch agent/claude/claude-supervisor-classifier-audit-2026-05-15-22-25 --base main --via-pr --wait-for-merge --cleanup`.
- [ ] 4.2 Record the PR URL and final merge state (`MERGED`) in the completion handoff.
- [ ] 4.3 Confirm the sandbox worktree is gone.
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Plan Workspace: agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25

This folder stores durable planning artifacts before implementation changes.

## Shared files
- `summary.md`
- `checkpoints.md`
- `phases.md`
- `open-questions.md`
- `coordinator-prompt.md`
- `kickoff-prompts.md`

## Role folders
- `planner/`
- `architect/`
- `critic/`
- `executor/`
- `writer/`
- `verifier/`

When Codex or Claude hits an unresolved question that should survive chat, add it to `open-questions.md` as an unchecked `- [ ]` item.

Each role folder contains OpenSpec-style artifacts:
- `.openspec.yaml`
- `prompt.md` (copy/paste role prompt)
- `proposal.md`
- `tasks.md` (Spec / Tests / Implementation / Checkpoints checklists)
- `specs/<role>/spec.md`
Planner also gets `plan.md`; executor also gets `checkpoints.md`.
Planner plans should follow `openspec/plan/PLANS.md`.
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
schema: 1
plan: agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25
role: architect
status: draft
artifacts:
prompt: prompt.md
proposal: proposal.md
tasks: tasks.md
spec: specs/architect/spec.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# architect

Role workspace for `architect`.

Default artifacts:
- `.openspec.yaml`
- `prompt.md`
- `proposal.md`
- `tasks.md`
- `specs/<role>/spec.md`

Use this folder for role notes, artifacts, and status updates.
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# architect Prompt

You are the `architect` role for OpenSpec plan `agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25`.

## Objective

Complete only this role's assigned checklist and leave compact evidence for the coordinator.

## Source of truth

- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/summary.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/checkpoints.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/open-questions.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/architect/tasks.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/architect/proposal.md`

## Before edits

1. Confirm branch/worktree with `git status --short --branch`.
2. Claim every touched file before editing:
- Prefer Colony `task_claim_file` when an active task exists.
- Otherwise run `gx locks claim --branch <agent-branch> <file...>`.
3. Stay inside assigned files/modules; coordinate before touching shared paths.

## Working rules

- Update `architect/tasks.md` as each item completes.
- Record durable unresolved questions in `open-questions.md`.
- Keep handoffs short: files changed, behavior touched, verification, risks.
- Do not revert another agent's edits.

## Cleanup

Only the owner/finalizer lane runs `gx branch finish --branch <agent-branch> --base dev --via-pr --wait-for-merge --cleanup`. If blocked, append `BLOCKED:` with branch, task, blocker, next, evidence.
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Proposal: architect (agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25)

## Why

Summarize why this role's work is required for plan `agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25`.

## What Changes

- [ ] List the planned role-specific changes

## Impact

- Scope:
- Risks:
- Dependencies:
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Capability Spec: architect

## ADDED Requirements

### Requirement: architect responsibilities for `agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25`
This role MUST define and deliver its scoped outputs with evidence.

#### Scenario: Role executes assigned scope
- **WHEN** the role begins execution for `agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25`
- **THEN** it follows `tasks.md` and records evidence for completion
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# architect tasks

## 1. Spec

- [ ] 1.1 Define ownership boundaries, interfaces, and artifact responsibilities for `agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25`
- [ ] 1.2 Validate architecture constraints and non-functional requirements coverage

## 2. Tests

- [ ] 2.1 Define architectural verification checkpoints (integration boundaries, failure modes, compatibility)
- [ ] 2.2 Validate that acceptance criteria map to concrete architecture decisions

## 3. Implementation

- [ ] 3.1 Review plan for strongest antithesis/tradeoff tensions
- [ ] 3.2 Propose synthesis path and guardrails for implementation teams
- [ ] 3.3 Record architecture sign-off notes for downstream execution

## 4. Checkpoints

- [ ] [A1] READY - Architecture review checkpoint

## 5. Collaboration

- [ ] 5.1 Owner recorded this lane before edits.
- [ ] 5.2 Record joined agents / handoffs, or mark `N/A` when solo.
- [ ] 5.3 Record unresolved plan questions in `../open-questions.md`, or mark `N/A` when none.

## 6. Cleanup

- [ ] 6.1 If this lane owns finalization, run `gx branch finish --branch <agent-branch> --base dev --via-pr --wait-for-merge --cleanup`.
- [ ] 6.2 Record PR URL + final `MERGED` state in the handoff.
- [ ] 6.3 Confirm sandbox cleanup (`git worktree list`, `git branch -a`) or append `BLOCKED:` and stop.
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Plan Checkpoints: agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25

Chronological checkpoint log for all roles.

Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Master Coordinator Prompt

You are the coordinator for plan `agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25`.

## Objective

Drive this plan from draft to execution-ready status with strict checkpoint discipline and no scope drift.

## Source-of-truth artifacts

- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/summary.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/checkpoints.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/open-questions.md`
- `openspec/plan/agent-claude-masterplan-claude-supervisor-classifier-audit-2026-05-15-22-25/planner/plan.md`
- role `prompt.md` files for copy/paste helper startup
- role `tasks.md` files for planner/architect/critic/executor/writer/verifier

## Coordinator responsibilities

1. Keep checkpoints current in each role `tasks.md` and root `checkpoints.md`.
2. Route unresolved questions and branching decisions into `open-questions.md`.
3. Ensure each role has explicit acceptance criteria and verification evidence.
4. Prevent implementation from starting before planning gates are complete.
5. Keep handoffs concise: files changed, behavior touched, verification output, risks.

## Wave-splitting decision (optional)

Create wave prompts in `kickoff-prompts.md` only when at least one applies:

- 3+ independent implementation lanes can run in parallel.
- Runtime cutover/rollback sequencing needs explicit lane ownership.
- Risk is high enough that bounded execution packets reduce coordination mistakes.

If wave splitting is not needed, keep execution under a single owner with normal role checkpoints.

## Exit criteria

- All role checkpoints required for planning are done.
- Execution lanes (if any) have clear ownership boundaries.
- `open-questions.md` captures unresolved decisions that still need answers.
- Verification plan and rollback expectations are explicit and testable.
Loading