Skip to content

run_skill: channel_won branch unconditionally SIGKILLs Claude CLI, producing systematic exit_code=-9 + success=true ambiguity #804

@Trecek

Description

@Trecek

Observed symptom

  • resolve-review session for PR Research environment isolation — containerized execution with micromamba #787 returned exit_code: -9 (SIGKILL) but with a complete, successful result dict.
  • Session ID: 46cbdc5b-1631-41db-a356-72c090c6f198
  • summary.json recorded simultaneously: exit_code=-9, subtype="success", cli_subtype="success", termination_reason="completed", success=true, write_call_count=2, write_path_warnings=[].
  • The result content showed fixes applied (commit 2a6097c on PR Implementation Plan: Research Environment Isolation — Containerized Execution with Micromamba #799), tests passing, threads resolved (2/2), inline replies posted (8/0 failed) — session ran to completion.
  • Orchestrator-facing ambiguity: pipeline-mode run_skill formatter emits run_skill: OK [success] immediately followed by exit_code: -9 with no annotation, creating a cognitive contradiction for the LLM orchestrator. The orchestrator has to parse result content to decide what happened because the exit code says otherwise.

Scope — systemic, not one-off

5 of the last 500 sessions in ~/.local/share/autoskillit/logs/sessions.jsonl show the same (exit_code=-9, success=true) pattern across five different skills:

step_name skill
compose_pr /compose-pr
fix /resolve-failures
verify /dry-walkthrough
implement /implement-worktree-no-merge
resolve_review /resolve-review

All 5 share identical peak_rss_kb=2176, peak_oom_score=666 (which is a separate tracer-wrong-PID issue — flagged below for its own issue).

Root cause

src/autoskillit/execution/process.py:289-297 — the channel_won branch in run_managed_async:

else:
    proc_log.debug(
        "kill_decision",
        reason="channel_won",
        channel_a=signals.channel_a_confirmed,
        channel_b=signals.channel_b_status,
    )
    # Channel A or B won; process still alive — kill immediately.
    await async_kill_process_tree(proc.pid)

When Channel A (stdout heartbeat, 0.5 s poll for type=result) fires before _watch_process has observed natural exit (which is ~always, because the Claude CLI takes longer than 500 ms to flush, close fds, and return from main()), this branch runs async_kill_process_tree immediately. That sends SIGTERM → 2 s psutil.wait_procsSIGKILL, so proc.returncode == -9 for every monitored successful session.

Git blame: the unconditional kill was inherited from pre-anyio code in commit f35c87690 (#72, "Anyio concurrency refactor", 2026-03-01). The pre-anyio code had separate elif channel_a_confirmed: and elif channel_b_status == "completion": branches, both ending in the kill. The refactor collapsed them into a single else: preserving semantics. No commit message explained why immediate kill is required — it was treated as mechanical preservation, not a deliberate design decision.

Why the kill exists at all (important constraint)

The Claude CLI does not self-exit promptly after writing its final type=result output without CLAUDE_CODE_EXIT_AFTER_STOP_DELAY. src/autoskillit/config/defaults.yaml:46 sets exit_after_stop_delay_ms: 120000 (2 minutes). That 2-minute timer is the CLI's self-exit mechanism. Waiting 120 s per run_skill call is prohibitive, so AutoSkillit force-kills the CLI once channels confirm completion. Any fix must preserve this constraint.

Adjudication is already correct — this is NOT a _compute_success bug

src/autoskillit/execution/session.py:579-682 _compute_success explicitly handles COMPLETED + nonzero returncode. The authoritative comment at session.py:647-649:

"The process was killed by our own async_kill_process_tree (signal -15 or -9), so a non-zero returncode is expected and trustworthy when the session envelope says 'success'."

And session.py:799-801 in _compute_retry:

"Infrastructure killed the process. SIGTERM/SIGKILL produce nonzero returncode by design — do not gate on returncode here."

So success=True + exit_code=-9 is documented, intended behavior. The ambiguity is NOT in adjudication — it is in (a) surface presentation in the run_skill formatter and (b) the absence of a natural-exit grace window in the kill path.

Session 46cbdc5b timeline (UTC)

10:18:30.104  start_ts; subprocess spawned
10:18:32.047  queue-operation enqueue (prompt with %%ORDER_UP::8221c391%%)
...           714 s of work; 97 assistant messages; 43 tool-use round-trips
10:29:30.864  Bash tool_use
10:29:33.477  TodoWrite tool_use
10:29:35.508  assistant message stop_reason=end_turn (thinking block only)
10:29:39.333  assistant text message — "resolve-review complete / Status: PASS"
              + %%ORDER_UP::8221c391%% token
10:29:40.009  seq=143 final proc_trace snapshot
  └─ Channel A heartbeat detects %%ORDER_UP:: in stdout within next 0.5 s
  └─ trigger.set(); await trigger.wait() returns
  └─ tg.cancel_scope.cancel() (process.py:249)
  └─ tracing_handle.stop()
  └─ resolve_termination() → COMPLETED + CHANNEL_A
  └─ process.py:289 else-branch → async_kill_process_tree(pid)
  └─ SIGTERM → 2 s wait → SIGKILL → 1 s kernel reap
  └─ stdout_file.close(); stderr_file.close()
  └─ read_temp_output() drains 714 s of PTY output
  └─ SubprocessResult(returncode=-9, termination=COMPLETED,
                      channel_confirmation=CHANNEL_A)
  └─ _build_skill_result → _compute_outcome → _compute_success → True
  └─ SkillResult(success=True, exit_code=-9)
10:30:25.041  end_ts (44.7 s after last snapshot)
              flush_session_log writes summary.json + appends sessions.jsonl

Affected components (file:line)

  • src/autoskillit/execution/process.py:289-297 — unconditional kill; the primary fix site
  • src/autoskillit/execution/process.py:269-274 — existing signals.process_exited branch; demonstrates correct "no-kill-needed" shape
  • src/autoskillit/execution/_process_kill.py:16-58kill_process_tree SIGTERM→2s→SIGKILL→1s
  • src/autoskillit/execution/_process_race.py:236-250 — existing completion_drain_timeout pattern (precedent for bounded waits)
  • src/autoskillit/execution/_process_monitor.py:25-64_heartbeat / Channel A 0.5 s stdout poll
  • src/autoskillit/execution/session.py:579-682_compute_success (already correct; do NOT change logic)
  • src/autoskillit/execution/session.py:647-649 — documented -15/-9 contract
  • src/autoskillit/execution/headless.py:539-876_build_skill_result / _compute_outcome
  • src/autoskillit/execution/headless.py:1062-1063end_ts computation from time.monotonic()
  • src/autoskillit/execution/headless.py:1121, 1136flush_session_log writes termination_reason
  • src/autoskillit/execution/commands.py:231 — injects CLAUDE_CODE_EXIT_AFTER_STOP_DELAY from config
  • src/autoskillit/config/defaults.yaml:46exit_after_stop_delay_ms: 120000
  • src/autoskillit/core/_type_results.py:130-157SkillResult.exit_code always serialized
  • src/autoskillit/hooks/_fmt_execution.py:30,52,90,104 — formatter unconditionally surfaces exit_code to orchestrator LLM
  • tests/execution/test_headless.py:1401,1420,1654 — COMPLETED+nonzero tests exist but only use -15, not -9
  • tests/execution/test_session_adjudication.py:118-140 — CHANNEL_A/B COMPLETED tests only use -15

Test gaps

  1. No test asserts the (success=True, exit_code=-9) combination. Every COMPLETED+nonzero returncode test uses -15 (SIGTERM); -9 is logically equivalent but unguarded against future drift (e.g. a misguided "treat -9 as OOM-killed, retry" refactor).
  2. No test exercises the channel_won branch at process.py:289. Existing process tests use synthetic subprocesses that exit naturally; none simulate the race where Channel A wins before _watch_process observes exit.
  3. No test asserts formatter behavior for run_skill success + exit_code=-9 — the ambiguity is effectively "by spec" at the formatter layer.
  4. CLAUDE_CODE_EXIT_AFTER_STOP_DELAY=120000 interaction with the kill path is untested — whether the CLI's internal 2-minute timer could deliver a delayed SIGKILL racing with AutoSkillit's monitoring is not verified.

Related prior commits (for context, NOT recurrence)

No prior /autoskillit:investigate found this specific root cause. This is NOT a recurring bug — standard remediation path is appropriate.

Fix recommendations (three layers, any subset can be adopted)

Layer 1 — Architectural fix (primary; HIGH value)

Insert a bounded natural-exit grace window in process.py:289-297 before the kill:

else:  # channel_won
    proc_log.debug("kill_decision", reason="channel_won", ...)
    with anyio.move_on_after(natural_exit_grace):  # new config, default 3.0 s
        await process_exited_event.wait()
    if proc.returncode is not None:
        proc_log.debug("natural_exit_after_channel_won", returncode=proc.returncode)
        # fall through; no kill; returncode will be 0 for clean sessions
    else:
        await async_kill_process_tree(proc.pid)

Requirements:

  • _watch_process must expose an anyio.Event set when proc.wait() returns (it already observes this internally — just needs to wire an event into the accumulator).
  • Add natural_exit_grace_seconds: 3.0 to defaults.yaml.
  • Must pair with reduced exit_after_stop_delay_ms (e.g. 2000 ms), since the current 120000 ms default would block the CLI from self-exiting within any reasonable grace window.
  • Validate against recording/replay fixtures in tests/execution/recording/ — the paired change may break existing scenario recordings.

Benefit: most sessions record exit_code=0 (true natural exit), eliminating the ambiguity at its source.
Cost: adds up to 3 s to every run_skill where Channel A won but the process would have exited naturally anyway.
Risk: if the Claude CLI genuinely hangs post-finalization for a path we haven't mapped, this extends session wall-clock by 3 s before the kill. Acceptable.

Layer 2 — Surface fix (LOW effort, LOW value; optional)

In src/autoskillit/hooks/_fmt_execution.py:30,52 annotate or suppress exit_code when success=True:

exit_code: -9 (infrastructure kill after completion signal — expected)

Or suppress entirely when success=True, since the orchestrator routes on success, not exit_code, per tools_recipe.py:150-151. Minimal cost; removes the cognitive contradiction immediately without any execution-layer changes. Downside: hides diagnostic info when the combination IS pathological.

Layer 3 — Test gap (LOW effort, MEDIUM value)

Add three regression tests:

  1. test_compute_success_completed_returncode_minus_9_is_success in tests/execution/test_session_adjudication.py — unit-level guard for the -9 path.
  2. test_build_skill_result_channel_a_win_preserves_success_with_minus_9 in tests/execution/test_headless.py — end-to-end assertion.
  3. test_run_managed_async_channel_won_grace_window_permits_natural_exit in tests/execution/test_process_run.py — if Layer 1 adopted, assert grace-window behavior.

What NOT to do

  • Do NOT remove the kill entirely. 120 s CLAUDE_CODE_EXIT_AFTER_STOP_DELAY default means sessions would hang post-completion.
  • Do NOT change _compute_success to unconditionally accept COMPLETED+success-subtype regardless of content. Existing content check guards against false positives (session.py:650-663); PR Rectify: Session Adjudication False-Positive — Recovery/Channel-B Contract Breach #568 history shows the cost of loosening this.
  • Do NOT transform exit_code in SkillResult.to_json(). Wide blast radius for a formatter-layer cosmetic issue.
  • Do NOT conflate with the tracer-wrong-PID issue (separate defect below).

Separate defect flagged (should be its own issue)

The Linux process monitor tracks the wrong PID. Session 46cbdc5b proc_trace shows vm_rss_kb=2176, threads=1, state=sleeping, wchan=do_sys_poll, cpu_percent=0.0 across all 144 snapshots — consistent with a Python/shell wrapper waiting on a PTY, NOT the Node.js Claude CLI (which would be hundreds of MB RSS, multi-threaded, active CPU). All 5 sessions with the (exit_code=-9, success=true) pattern share identical peak_rss_kb=2176 and peak_oom_score=666, confirming systematic tracer attachment to the wrong PID. Consequences:

  • anomaly_count is effectively meaningless — the detector has zero visibility into actual workload.
  • peak_oom_score=666 alongside peak_rss_kb=2176 is physically implausible (OOM killer would not target a 2 MB process).
  • Token/resource telemetry for the Claude CLI process is unavailable for diagnostics.

This is orthogonal to the exit_code=-9 issue and should be filed as a separate issue. Suggested title: "Linux process monitor tracks launcher wrapper PID instead of Claude CLI child PID". (Related but distinct from #771 which covers test-code /dev/shm pollution.)

Recommended minimal action (priority order)

  1. Layer 3 regression tests first (catches regressions when anyone touches this code): ~30 min
  2. Layer 2 formatter annotation (immediate orchestrator-side win): ~15 min
  3. Layer 1 architectural fix: plan via /autoskillit:make-plan. Combined change to defaults.yaml + _process_race.py + _process_monitor.py + process.py + validation against recording/replay fixtures.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugExisting behavior is brokenrecipe:remediationRoute: investigate/decompose before implementationstagedImplementation staged and waiting for promotion to main

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions