run_skill: channel_won branch unconditionally SIGKILLs Claude CLI, producing systematic exit_code=-9 + success=true ambiguity

## Observed symptom

- `resolve-review` session for PR #787 returned `exit_code: -9` (SIGKILL) but with a complete, successful result dict.
- Session ID: `46cbdc5b-1631-41db-a356-72c090c6f198`
- `summary.json` recorded simultaneously: `exit_code=-9`, `subtype="success"`, `cli_subtype="success"`, `termination_reason="completed"`, `success=true`, `write_call_count=2`, `write_path_warnings=[]`.
- The result content showed fixes applied (commit `2a6097c` on PR #799), tests passing, threads resolved (2/2), inline replies posted (8/0 failed) — session ran to completion.
- **Orchestrator-facing ambiguity:** pipeline-mode `run_skill` formatter emits `run_skill: OK [success]` immediately followed by `exit_code: -9` with no annotation, creating a cognitive contradiction for the LLM orchestrator. The orchestrator has to parse result content to decide what happened because the exit code says otherwise.

## Scope — systemic, not one-off

5 of the last 500 sessions in `~/.local/share/autoskillit/logs/sessions.jsonl` show the same `(exit_code=-9, success=true)` pattern across five different skills:

| step_name | skill |
|---|---|
| `compose_pr` | `/compose-pr` |
| `fix` | `/resolve-failures` |
| `verify` | `/dry-walkthrough` |
| `implement` | `/implement-worktree-no-merge` |
| `resolve_review` | `/resolve-review` |

All 5 share identical `peak_rss_kb=2176`, `peak_oom_score=666` (which is a **separate** tracer-wrong-PID issue — flagged below for its own issue).

## Root cause

`src/autoskillit/execution/process.py:289-297` — the `channel_won` branch in `run_managed_async`:

```python
else:
    proc_log.debug(
        "kill_decision",
        reason="channel_won",
        channel_a=signals.channel_a_confirmed,
        channel_b=signals.channel_b_status,
    )
    # Channel A or B won; process still alive — kill immediately.
    await async_kill_process_tree(proc.pid)
```

When Channel A (stdout heartbeat, 0.5 s poll for `type=result`) fires before `_watch_process` has observed natural exit (which is ~always, because the Claude CLI takes longer than 500 ms to flush, close fds, and return from `main()`), this branch runs `async_kill_process_tree` immediately. That sends `SIGTERM` → 2 s `psutil.wait_procs` → `SIGKILL`, so `proc.returncode == -9` for every monitored successful session.

**Git blame:** the unconditional kill was inherited from pre-anyio code in commit `f35c87690` (#72, "Anyio concurrency refactor", 2026-03-01). The pre-anyio code had separate `elif channel_a_confirmed:` and `elif channel_b_status == "completion":` branches, both ending in the kill. The refactor collapsed them into a single `else:` preserving semantics. **No commit message explained why immediate kill is required** — it was treated as mechanical preservation, not a deliberate design decision.

## Why the kill exists at all (important constraint)

The Claude CLI does **not** self-exit promptly after writing its final `type=result` output without `CLAUDE_CODE_EXIT_AFTER_STOP_DELAY`. `src/autoskillit/config/defaults.yaml:46` sets `exit_after_stop_delay_ms: 120000` (2 minutes). That 2-minute timer is the CLI's self-exit mechanism. Waiting 120 s per `run_skill` call is prohibitive, so AutoSkillit force-kills the CLI once channels confirm completion. **Any fix must preserve this constraint.**

## Adjudication is already correct — this is NOT a `_compute_success` bug

`src/autoskillit/execution/session.py:579-682` `_compute_success` explicitly handles `COMPLETED + nonzero returncode`. The authoritative comment at `session.py:647-649`:

> "The process was killed by our own `async_kill_process_tree` (signal -15 or -9), so a non-zero returncode is expected and trustworthy when the session envelope says 'success'."

And `session.py:799-801` in `_compute_retry`:

> "Infrastructure killed the process. SIGTERM/SIGKILL produce nonzero returncode by design — do not gate on returncode here."

So `success=True` + `exit_code=-9` is documented, intended behavior. **The ambiguity is NOT in adjudication** — it is in (a) surface presentation in the run_skill formatter and (b) the absence of a natural-exit grace window in the kill path.

## Session 46cbdc5b timeline (UTC)

```
10:18:30.104  start_ts; subprocess spawned
10:18:32.047  queue-operation enqueue (prompt with %%ORDER_UP::8221c391%%)
...           714 s of work; 97 assistant messages; 43 tool-use round-trips
10:29:30.864  Bash tool_use
10:29:33.477  TodoWrite tool_use
10:29:35.508  assistant message stop_reason=end_turn (thinking block only)
10:29:39.333  assistant text message — "resolve-review complete / Status: PASS"
              + %%ORDER_UP::8221c391%% token
10:29:40.009  seq=143 final proc_trace snapshot
  └─ Channel A heartbeat detects %%ORDER_UP:: in stdout within next 0.5 s
  └─ trigger.set(); await trigger.wait() returns
  └─ tg.cancel_scope.cancel() (process.py:249)
  └─ tracing_handle.stop()
  └─ resolve_termination() → COMPLETED + CHANNEL_A
  └─ process.py:289 else-branch → async_kill_process_tree(pid)
  └─ SIGTERM → 2 s wait → SIGKILL → 1 s kernel reap
  └─ stdout_file.close(); stderr_file.close()
  └─ read_temp_output() drains 714 s of PTY output
  └─ SubprocessResult(returncode=-9, termination=COMPLETED,
                      channel_confirmation=CHANNEL_A)
  └─ _build_skill_result → _compute_outcome → _compute_success → True
  └─ SkillResult(success=True, exit_code=-9)
10:30:25.041  end_ts (44.7 s after last snapshot)
              flush_session_log writes summary.json + appends sessions.jsonl
```

## Affected components (file:line)

- `src/autoskillit/execution/process.py:289-297` — unconditional kill; **the primary fix site**
- `src/autoskillit/execution/process.py:269-274` — existing `signals.process_exited` branch; demonstrates correct "no-kill-needed" shape
- `src/autoskillit/execution/_process_kill.py:16-58` — `kill_process_tree` SIGTERM→2s→SIGKILL→1s
- `src/autoskillit/execution/_process_race.py:236-250` — existing `completion_drain_timeout` pattern (precedent for bounded waits)
- `src/autoskillit/execution/_process_monitor.py:25-64` — `_heartbeat` / Channel A 0.5 s stdout poll
- `src/autoskillit/execution/session.py:579-682` — `_compute_success` (**already correct; do NOT change logic**)
- `src/autoskillit/execution/session.py:647-649` — documented `-15/-9` contract
- `src/autoskillit/execution/headless.py:539-876` — `_build_skill_result` / `_compute_outcome`
- `src/autoskillit/execution/headless.py:1062-1063` — `end_ts` computation from `time.monotonic()`
- `src/autoskillit/execution/headless.py:1121, 1136` — `flush_session_log` writes `termination_reason`
- `src/autoskillit/execution/commands.py:231` — injects `CLAUDE_CODE_EXIT_AFTER_STOP_DELAY` from config
- `src/autoskillit/config/defaults.yaml:46` — `exit_after_stop_delay_ms: 120000`
- `src/autoskillit/core/_type_results.py:130-157` — `SkillResult.exit_code` always serialized
- `src/autoskillit/hooks/_fmt_execution.py:30,52,90,104` — formatter unconditionally surfaces `exit_code` to orchestrator LLM
- `tests/execution/test_headless.py:1401,1420,1654` — COMPLETED+nonzero tests exist but only use `-15`, not `-9`
- `tests/execution/test_session_adjudication.py:118-140` — CHANNEL_A/B COMPLETED tests only use `-15`

## Test gaps

1. **No test asserts the `(success=True, exit_code=-9)` combination.** Every COMPLETED+nonzero returncode test uses `-15` (SIGTERM); `-9` is logically equivalent but unguarded against future drift (e.g. a misguided "treat -9 as OOM-killed, retry" refactor).
2. **No test exercises the `channel_won` branch at `process.py:289`.** Existing process tests use synthetic subprocesses that exit naturally; none simulate the race where Channel A wins before `_watch_process` observes exit.
3. **No test asserts formatter behavior for `run_skill` success + `exit_code=-9`** — the ambiguity is effectively "by spec" at the formatter layer.
4. **`CLAUDE_CODE_EXIT_AFTER_STOP_DELAY=120000` interaction with the kill path is untested** — whether the CLI's internal 2-minute timer could deliver a delayed SIGKILL racing with AutoSkillit's monitoring is not verified.

## Related prior commits (for context, NOT recurrence)

- `f35c87690` (#72, 2026-03-01) — Anyio concurrency refactor; introduced current unconditional kill shape
- `d15db94ef` — "fix(process): add symmetric drain window to eliminate asymmetric drain race" (drain-window precedent)
- `fa8d98032` — "fix(session): strengthen Channel A predicate to eliminate drain-race false negative"
- `5e46614c` (#568) — "Session Adjudication False-Positive — Recovery/Channel-B Contract Breach" (most similar rectify; tightened CHANNEL_B bypass invariant)
- `ae7b95a5` (#358) — "eliminate SkillResult dual-source subtype contradiction" (closest precedent for "same thing reported two different ways" bug)
- `225d904d` (#735) — "Stdout Idle Watchdog + Bounded Suppression"
- `a7ad04a4` (#750) — "Channel B Blind Spot" (expanded ChannelBStatus state space)
- `974920ed` (#793) — "Per-Step idle_output_timeout Override"

No prior `/autoskillit:investigate` found this specific root cause. **This is NOT a recurring bug** — standard remediation path is appropriate.

## Fix recommendations (three layers, any subset can be adopted)

### Layer 1 — Architectural fix (primary; HIGH value)

Insert a bounded natural-exit grace window in `process.py:289-297` before the kill:

```python
else:  # channel_won
    proc_log.debug("kill_decision", reason="channel_won", ...)
    with anyio.move_on_after(natural_exit_grace):  # new config, default 3.0 s
        await process_exited_event.wait()
    if proc.returncode is not None:
        proc_log.debug("natural_exit_after_channel_won", returncode=proc.returncode)
        # fall through; no kill; returncode will be 0 for clean sessions
    else:
        await async_kill_process_tree(proc.pid)
```

Requirements:

- `_watch_process` must expose an `anyio.Event` set when `proc.wait()` returns (it already observes this internally — just needs to wire an event into the accumulator).
- Add `natural_exit_grace_seconds: 3.0` to `defaults.yaml`.
- **Must pair with** reduced `exit_after_stop_delay_ms` (e.g. 2000 ms), since the current 120000 ms default would block the CLI from self-exiting within any reasonable grace window.
- Validate against recording/replay fixtures in `tests/execution/recording/` — the paired change may break existing scenario recordings.

**Benefit:** most sessions record `exit_code=0` (true natural exit), eliminating the ambiguity at its source.
**Cost:** adds up to 3 s to every `run_skill` where Channel A won but the process would have exited naturally anyway.
**Risk:** if the Claude CLI genuinely hangs post-finalization for a path we haven't mapped, this extends session wall-clock by 3 s before the kill. Acceptable.

### Layer 2 — Surface fix (LOW effort, LOW value; optional)

In `src/autoskillit/hooks/_fmt_execution.py:30,52` annotate or suppress `exit_code` when `success=True`:

```
exit_code: -9 (infrastructure kill after completion signal — expected)
```

Or suppress entirely when `success=True`, since the orchestrator routes on `success`, not `exit_code`, per `tools_recipe.py:150-151`. Minimal cost; removes the cognitive contradiction immediately without any execution-layer changes. **Downside:** hides diagnostic info when the combination IS pathological.

### Layer 3 — Test gap (LOW effort, MEDIUM value)

Add three regression tests:

1. `test_compute_success_completed_returncode_minus_9_is_success` in `tests/execution/test_session_adjudication.py` — unit-level guard for the `-9` path.
2. `test_build_skill_result_channel_a_win_preserves_success_with_minus_9` in `tests/execution/test_headless.py` — end-to-end assertion.
3. `test_run_managed_async_channel_won_grace_window_permits_natural_exit` in `tests/execution/test_process_run.py` — if Layer 1 adopted, assert grace-window behavior.

## What NOT to do

- **Do NOT remove the kill entirely.** 120 s `CLAUDE_CODE_EXIT_AFTER_STOP_DELAY` default means sessions would hang post-completion.
- **Do NOT change `_compute_success`** to unconditionally accept COMPLETED+success-subtype regardless of content. Existing content check guards against false positives (`session.py:650-663`); PR #568 history shows the cost of loosening this.
- **Do NOT transform `exit_code` in `SkillResult.to_json()`.** Wide blast radius for a formatter-layer cosmetic issue.
- **Do NOT conflate with the tracer-wrong-PID issue** (separate defect below).

## Separate defect flagged (should be its own issue)

The Linux process monitor tracks the **wrong PID**. Session 46cbdc5b `proc_trace` shows `vm_rss_kb=2176`, `threads=1`, `state=sleeping`, `wchan=do_sys_poll`, `cpu_percent=0.0` across all 144 snapshots — consistent with a Python/shell wrapper waiting on a PTY, NOT the Node.js Claude CLI (which would be hundreds of MB RSS, multi-threaded, active CPU). All 5 sessions with the `(exit_code=-9, success=true)` pattern share identical `peak_rss_kb=2176` and `peak_oom_score=666`, confirming systematic tracer attachment to the wrong PID. Consequences:

- `anomaly_count` is effectively meaningless — the detector has zero visibility into actual workload.
- `peak_oom_score=666` alongside `peak_rss_kb=2176` is physically implausible (OOM killer would not target a 2 MB process).
- Token/resource telemetry for the Claude CLI process is unavailable for diagnostics.

This is orthogonal to the `exit_code=-9` issue and should be filed as a separate issue. Suggested title: *"Linux process monitor tracks launcher wrapper PID instead of Claude CLI child PID"*. (Related but distinct from #771 which covers test-code `/dev/shm` pollution.)

## Recommended minimal action (priority order)

1. **Layer 3 regression tests first** (catches regressions when anyone touches this code): ~30 min
2. **Layer 2 formatter annotation** (immediate orchestrator-side win): ~15 min
3. **Layer 1 architectural fix:** plan via `/autoskillit:make-plan`. Combined change to `defaults.yaml` + `_process_race.py` + `_process_monitor.py` + `process.py` + validation against recording/replay fixtures.


step_name	skill
`compose_pr`	`/compose-pr`
`fix`	`/resolve-failures`
`verify`	`/dry-walkthrough`
`implement`	`/implement-worktree-no-merge`
`resolve_review`	`/resolve-review`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run_skill: channel_won branch unconditionally SIGKILLs Claude CLI, producing systematic exit_code=-9 + success=true ambiguity #804

Observed symptom

Scope — systemic, not one-off

Root cause

Why the kill exists at all (important constraint)

Adjudication is already correct — this is NOT a `_compute_success` bug

Session 46cbdc5b timeline (UTC)

Affected components (file:line)

Test gaps

Related prior commits (for context, NOT recurrence)

Fix recommendations (three layers, any subset can be adopted)

Layer 1 — Architectural fix (primary; HIGH value)

Layer 2 — Surface fix (LOW effort, LOW value; optional)

Layer 3 — Test gap (LOW effort, MEDIUM value)

What NOT to do

Separate defect flagged (should be its own issue)

Recommended minimal action (priority order)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

run_skill: channel_won branch unconditionally SIGKILLs Claude CLI, producing systematic exit_code=-9 + success=true ambiguity #804

Description

Observed symptom

Scope — systemic, not one-off

Root cause

Why the kill exists at all (important constraint)

Adjudication is already correct — this is NOT a _compute_success bug

Session 46cbdc5b timeline (UTC)

Affected components (file:line)

Test gaps

Related prior commits (for context, NOT recurrence)

Fix recommendations (three layers, any subset can be adopted)

Layer 1 — Architectural fix (primary; HIGH value)

Layer 2 — Surface fix (LOW effort, LOW value; optional)

Layer 3 — Test gap (LOW effort, MEDIUM value)

What NOT to do

Separate defect flagged (should be its own issue)

Recommended minimal action (priority order)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Adjudication is already correct — this is NOT a `_compute_success` bug