You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Orchestrator-facing ambiguity: pipeline-mode run_skill formatter emits run_skill: OK [success] immediately followed by exit_code: -9 with no annotation, creating a cognitive contradiction for the LLM orchestrator. The orchestrator has to parse result content to decide what happened because the exit code says otherwise.
Scope — systemic, not one-off
5 of the last 500 sessions in ~/.local/share/autoskillit/logs/sessions.jsonl show the same (exit_code=-9, success=true) pattern across five different skills:
step_name
skill
compose_pr
/compose-pr
fix
/resolve-failures
verify
/dry-walkthrough
implement
/implement-worktree-no-merge
resolve_review
/resolve-review
All 5 share identical peak_rss_kb=2176, peak_oom_score=666 (which is a separate tracer-wrong-PID issue — flagged below for its own issue).
Root cause
src/autoskillit/execution/process.py:289-297 — the channel_won branch in run_managed_async:
else:
proc_log.debug(
"kill_decision",
reason="channel_won",
channel_a=signals.channel_a_confirmed,
channel_b=signals.channel_b_status,
)
# Channel A or B won; process still alive — kill immediately.awaitasync_kill_process_tree(proc.pid)
When Channel A (stdout heartbeat, 0.5 s poll for type=result) fires before _watch_process has observed natural exit (which is ~always, because the Claude CLI takes longer than 500 ms to flush, close fds, and return from main()), this branch runs async_kill_process_tree immediately. That sends SIGTERM → 2 s psutil.wait_procs → SIGKILL, so proc.returncode == -9 for every monitored successful session.
Git blame: the unconditional kill was inherited from pre-anyio code in commit f35c87690 (#72, "Anyio concurrency refactor", 2026-03-01). The pre-anyio code had separate elif channel_a_confirmed: and elif channel_b_status == "completion": branches, both ending in the kill. The refactor collapsed them into a single else: preserving semantics. No commit message explained why immediate kill is required — it was treated as mechanical preservation, not a deliberate design decision.
Why the kill exists at all (important constraint)
The Claude CLI does not self-exit promptly after writing its final type=result output without CLAUDE_CODE_EXIT_AFTER_STOP_DELAY. src/autoskillit/config/defaults.yaml:46 sets exit_after_stop_delay_ms: 120000 (2 minutes). That 2-minute timer is the CLI's self-exit mechanism. Waiting 120 s per run_skill call is prohibitive, so AutoSkillit force-kills the CLI once channels confirm completion. Any fix must preserve this constraint.
Adjudication is already correct — this is NOT a _compute_success bug
src/autoskillit/execution/session.py:579-682_compute_success explicitly handles COMPLETED + nonzero returncode. The authoritative comment at session.py:647-649:
"The process was killed by our own async_kill_process_tree (signal -15 or -9), so a non-zero returncode is expected and trustworthy when the session envelope says 'success'."
And session.py:799-801 in _compute_retry:
"Infrastructure killed the process. SIGTERM/SIGKILL produce nonzero returncode by design — do not gate on returncode here."
So success=True + exit_code=-9 is documented, intended behavior. The ambiguity is NOT in adjudication — it is in (a) surface presentation in the run_skill formatter and (b) the absence of a natural-exit grace window in the kill path.
Session 46cbdc5b timeline (UTC)
10:18:30.104 start_ts; subprocess spawned
10:18:32.047 queue-operation enqueue (prompt with %%ORDER_UP::8221c391%%)
... 714 s of work; 97 assistant messages; 43 tool-use round-trips
10:29:30.864 Bash tool_use
10:29:33.477 TodoWrite tool_use
10:29:35.508 assistant message stop_reason=end_turn (thinking block only)
10:29:39.333 assistant text message — "resolve-review complete / Status: PASS"
+ %%ORDER_UP::8221c391%% token
10:29:40.009 seq=143 final proc_trace snapshot
└─ Channel A heartbeat detects %%ORDER_UP:: in stdout within next 0.5 s
└─ trigger.set(); await trigger.wait() returns
└─ tg.cancel_scope.cancel() (process.py:249)
└─ tracing_handle.stop()
└─ resolve_termination() → COMPLETED + CHANNEL_A
└─ process.py:289 else-branch → async_kill_process_tree(pid)
└─ SIGTERM → 2 s wait → SIGKILL → 1 s kernel reap
└─ stdout_file.close(); stderr_file.close()
└─ read_temp_output() drains 714 s of PTY output
└─ SubprocessResult(returncode=-9, termination=COMPLETED,
channel_confirmation=CHANNEL_A)
└─ _build_skill_result → _compute_outcome → _compute_success → True
└─ SkillResult(success=True, exit_code=-9)
10:30:25.041 end_ts (44.7 s after last snapshot)
flush_session_log writes summary.json + appends sessions.jsonl
Affected components (file:line)
src/autoskillit/execution/process.py:289-297 — unconditional kill; the primary fix site
src/autoskillit/hooks/_fmt_execution.py:30,52,90,104 — formatter unconditionally surfaces exit_code to orchestrator LLM
tests/execution/test_headless.py:1401,1420,1654 — COMPLETED+nonzero tests exist but only use -15, not -9
tests/execution/test_session_adjudication.py:118-140 — CHANNEL_A/B COMPLETED tests only use -15
Test gaps
No test asserts the (success=True, exit_code=-9) combination. Every COMPLETED+nonzero returncode test uses -15 (SIGTERM); -9 is logically equivalent but unguarded against future drift (e.g. a misguided "treat -9 as OOM-killed, retry" refactor).
No test exercises the channel_won branch at process.py:289. Existing process tests use synthetic subprocesses that exit naturally; none simulate the race where Channel A wins before _watch_process observes exit.
No test asserts formatter behavior for run_skill success + exit_code=-9 — the ambiguity is effectively "by spec" at the formatter layer.
CLAUDE_CODE_EXIT_AFTER_STOP_DELAY=120000 interaction with the kill path is untested — whether the CLI's internal 2-minute timer could deliver a delayed SIGKILL racing with AutoSkillit's monitoring is not verified.
Related prior commits (for context, NOT recurrence)
No prior /autoskillit:investigate found this specific root cause. This is NOT a recurring bug — standard remediation path is appropriate.
Fix recommendations (three layers, any subset can be adopted)
Layer 1 — Architectural fix (primary; HIGH value)
Insert a bounded natural-exit grace window in process.py:289-297 before the kill:
else: # channel_wonproc_log.debug("kill_decision", reason="channel_won", ...)
withanyio.move_on_after(natural_exit_grace): # new config, default 3.0 sawaitprocess_exited_event.wait()
ifproc.returncodeisnotNone:
proc_log.debug("natural_exit_after_channel_won", returncode=proc.returncode)
# fall through; no kill; returncode will be 0 for clean sessionselse:
awaitasync_kill_process_tree(proc.pid)
Requirements:
_watch_process must expose an anyio.Event set when proc.wait() returns (it already observes this internally — just needs to wire an event into the accumulator).
Add natural_exit_grace_seconds: 3.0 to defaults.yaml.
Must pair with reduced exit_after_stop_delay_ms (e.g. 2000 ms), since the current 120000 ms default would block the CLI from self-exiting within any reasonable grace window.
Validate against recording/replay fixtures in tests/execution/recording/ — the paired change may break existing scenario recordings.
Benefit: most sessions record exit_code=0 (true natural exit), eliminating the ambiguity at its source. Cost: adds up to 3 s to every run_skill where Channel A won but the process would have exited naturally anyway. Risk: if the Claude CLI genuinely hangs post-finalization for a path we haven't mapped, this extends session wall-clock by 3 s before the kill. Acceptable.
In src/autoskillit/hooks/_fmt_execution.py:30,52 annotate or suppress exit_code when success=True:
exit_code: -9 (infrastructure kill after completion signal — expected)
Or suppress entirely when success=True, since the orchestrator routes on success, not exit_code, per tools_recipe.py:150-151. Minimal cost; removes the cognitive contradiction immediately without any execution-layer changes. Downside: hides diagnostic info when the combination IS pathological.
Layer 3 — Test gap (LOW effort, MEDIUM value)
Add three regression tests:
test_compute_success_completed_returncode_minus_9_is_success in tests/execution/test_session_adjudication.py — unit-level guard for the -9 path.
test_build_skill_result_channel_a_win_preserves_success_with_minus_9 in tests/execution/test_headless.py — end-to-end assertion.
test_run_managed_async_channel_won_grace_window_permits_natural_exit in tests/execution/test_process_run.py — if Layer 1 adopted, assert grace-window behavior.
What NOT to do
Do NOT remove the kill entirely. 120 s CLAUDE_CODE_EXIT_AFTER_STOP_DELAY default means sessions would hang post-completion.
Do NOT transform exit_code in SkillResult.to_json(). Wide blast radius for a formatter-layer cosmetic issue.
Do NOT conflate with the tracer-wrong-PID issue (separate defect below).
Separate defect flagged (should be its own issue)
The Linux process monitor tracks the wrong PID. Session 46cbdc5b proc_trace shows vm_rss_kb=2176, threads=1, state=sleeping, wchan=do_sys_poll, cpu_percent=0.0 across all 144 snapshots — consistent with a Python/shell wrapper waiting on a PTY, NOT the Node.js Claude CLI (which would be hundreds of MB RSS, multi-threaded, active CPU). All 5 sessions with the (exit_code=-9, success=true) pattern share identical peak_rss_kb=2176 and peak_oom_score=666, confirming systematic tracer attachment to the wrong PID. Consequences:
anomaly_count is effectively meaningless — the detector has zero visibility into actual workload.
peak_oom_score=666 alongside peak_rss_kb=2176 is physically implausible (OOM killer would not target a 2 MB process).
Token/resource telemetry for the Claude CLI process is unavailable for diagnostics.
This is orthogonal to the exit_code=-9 issue and should be filed as a separate issue. Suggested title: "Linux process monitor tracks launcher wrapper PID instead of Claude CLI child PID". (Related but distinct from #771 which covers test-code /dev/shm pollution.)
Recommended minimal action (priority order)
Layer 3 regression tests first (catches regressions when anyone touches this code): ~30 min
Layer 2 formatter annotation (immediate orchestrator-side win): ~15 min
Layer 1 architectural fix: plan via /autoskillit:make-plan. Combined change to defaults.yaml + _process_race.py + _process_monitor.py + process.py + validation against recording/replay fixtures.
Observed symptom
resolve-reviewsession for PR Research environment isolation — containerized execution with micromamba #787 returnedexit_code: -9(SIGKILL) but with a complete, successful result dict.46cbdc5b-1631-41db-a356-72c090c6f198summary.jsonrecorded simultaneously:exit_code=-9,subtype="success",cli_subtype="success",termination_reason="completed",success=true,write_call_count=2,write_path_warnings=[].2a6097con PR Implementation Plan: Research Environment Isolation — Containerized Execution with Micromamba #799), tests passing, threads resolved (2/2), inline replies posted (8/0 failed) — session ran to completion.run_skillformatter emitsrun_skill: OK [success]immediately followed byexit_code: -9with no annotation, creating a cognitive contradiction for the LLM orchestrator. The orchestrator has to parse result content to decide what happened because the exit code says otherwise.Scope — systemic, not one-off
5 of the last 500 sessions in
~/.local/share/autoskillit/logs/sessions.jsonlshow the same(exit_code=-9, success=true)pattern across five different skills:compose_pr/compose-prfix/resolve-failuresverify/dry-walkthroughimplement/implement-worktree-no-mergeresolve_review/resolve-reviewAll 5 share identical
peak_rss_kb=2176,peak_oom_score=666(which is a separate tracer-wrong-PID issue — flagged below for its own issue).Root cause
src/autoskillit/execution/process.py:289-297— thechannel_wonbranch inrun_managed_async:When Channel A (stdout heartbeat, 0.5 s poll for
type=result) fires before_watch_processhas observed natural exit (which is ~always, because the Claude CLI takes longer than 500 ms to flush, close fds, and return frommain()), this branch runsasync_kill_process_treeimmediately. That sendsSIGTERM→ 2 spsutil.wait_procs→SIGKILL, soproc.returncode == -9for every monitored successful session.Git blame: the unconditional kill was inherited from pre-anyio code in commit
f35c87690(#72, "Anyio concurrency refactor", 2026-03-01). The pre-anyio code had separateelif channel_a_confirmed:andelif channel_b_status == "completion":branches, both ending in the kill. The refactor collapsed them into a singleelse:preserving semantics. No commit message explained why immediate kill is required — it was treated as mechanical preservation, not a deliberate design decision.Why the kill exists at all (important constraint)
The Claude CLI does not self-exit promptly after writing its final
type=resultoutput withoutCLAUDE_CODE_EXIT_AFTER_STOP_DELAY.src/autoskillit/config/defaults.yaml:46setsexit_after_stop_delay_ms: 120000(2 minutes). That 2-minute timer is the CLI's self-exit mechanism. Waiting 120 s perrun_skillcall is prohibitive, so AutoSkillit force-kills the CLI once channels confirm completion. Any fix must preserve this constraint.Adjudication is already correct — this is NOT a
_compute_successbugsrc/autoskillit/execution/session.py:579-682_compute_successexplicitly handlesCOMPLETED + nonzero returncode. The authoritative comment atsession.py:647-649:And
session.py:799-801in_compute_retry:So
success=True+exit_code=-9is documented, intended behavior. The ambiguity is NOT in adjudication — it is in (a) surface presentation in the run_skill formatter and (b) the absence of a natural-exit grace window in the kill path.Session 46cbdc5b timeline (UTC)
Affected components (file:line)
src/autoskillit/execution/process.py:289-297— unconditional kill; the primary fix sitesrc/autoskillit/execution/process.py:269-274— existingsignals.process_exitedbranch; demonstrates correct "no-kill-needed" shapesrc/autoskillit/execution/_process_kill.py:16-58—kill_process_treeSIGTERM→2s→SIGKILL→1ssrc/autoskillit/execution/_process_race.py:236-250— existingcompletion_drain_timeoutpattern (precedent for bounded waits)src/autoskillit/execution/_process_monitor.py:25-64—_heartbeat/ Channel A 0.5 s stdout pollsrc/autoskillit/execution/session.py:579-682—_compute_success(already correct; do NOT change logic)src/autoskillit/execution/session.py:647-649— documented-15/-9contractsrc/autoskillit/execution/headless.py:539-876—_build_skill_result/_compute_outcomesrc/autoskillit/execution/headless.py:1062-1063—end_tscomputation fromtime.monotonic()src/autoskillit/execution/headless.py:1121, 1136—flush_session_logwritestermination_reasonsrc/autoskillit/execution/commands.py:231— injectsCLAUDE_CODE_EXIT_AFTER_STOP_DELAYfrom configsrc/autoskillit/config/defaults.yaml:46—exit_after_stop_delay_ms: 120000src/autoskillit/core/_type_results.py:130-157—SkillResult.exit_codealways serializedsrc/autoskillit/hooks/_fmt_execution.py:30,52,90,104— formatter unconditionally surfacesexit_codeto orchestrator LLMtests/execution/test_headless.py:1401,1420,1654— COMPLETED+nonzero tests exist but only use-15, not-9tests/execution/test_session_adjudication.py:118-140— CHANNEL_A/B COMPLETED tests only use-15Test gaps
(success=True, exit_code=-9)combination. Every COMPLETED+nonzero returncode test uses-15(SIGTERM);-9is logically equivalent but unguarded against future drift (e.g. a misguided "treat -9 as OOM-killed, retry" refactor).channel_wonbranch atprocess.py:289. Existing process tests use synthetic subprocesses that exit naturally; none simulate the race where Channel A wins before_watch_processobserves exit.run_skillsuccess +exit_code=-9— the ambiguity is effectively "by spec" at the formatter layer.CLAUDE_CODE_EXIT_AFTER_STOP_DELAY=120000interaction with the kill path is untested — whether the CLI's internal 2-minute timer could deliver a delayed SIGKILL racing with AutoSkillit's monitoring is not verified.Related prior commits (for context, NOT recurrence)
f35c87690(Anyio concurrency refactor #72, 2026-03-01) — Anyio concurrency refactor; introduced current unconditional kill shaped15db94ef— "fix(process): add symmetric drain window to eliminate asymmetric drain race" (drain-window precedent)fa8d98032— "fix(session): strengthen Channel A predicate to eliminate drain-race false negative"5e46614c(Rectify: Session Adjudication False-Positive — Recovery/Channel-B Contract Breach #568) — "Session Adjudication False-Positive — Recovery/Channel-B Contract Breach" (most similar rectify; tightened CHANNEL_B bypass invariant)ae7b95a5(fix: eliminate SkillResult dual-source subtype contradiction (Issue #346) #358) — "eliminate SkillResult dual-source subtype contradiction" (closest precedent for "same thing reported two different ways" bug)225d904d(Rectify: Stdout Idle Watchdog + Bounded Suppression for Stall Detection #735) — "Stdout Idle Watchdog + Bounded Suppression"a7ad04a4(Rectify: Channel B Blind Spot + Pre-open_kitchen AskUserQuestion Deviation #750) — "Channel B Blind Spot" (expanded ChannelBStatus state space)974920ed(Add idle_output_timeout as Per-Step Recipe Override #793) — "Per-Step idle_output_timeout Override"No prior
/autoskillit:investigatefound this specific root cause. This is NOT a recurring bug — standard remediation path is appropriate.Fix recommendations (three layers, any subset can be adopted)
Layer 1 — Architectural fix (primary; HIGH value)
Insert a bounded natural-exit grace window in
process.py:289-297before the kill:Requirements:
_watch_processmust expose ananyio.Eventset whenproc.wait()returns (it already observes this internally — just needs to wire an event into the accumulator).natural_exit_grace_seconds: 3.0todefaults.yaml.exit_after_stop_delay_ms(e.g. 2000 ms), since the current 120000 ms default would block the CLI from self-exiting within any reasonable grace window.tests/execution/recording/— the paired change may break existing scenario recordings.Benefit: most sessions record
exit_code=0(true natural exit), eliminating the ambiguity at its source.Cost: adds up to 3 s to every
run_skillwhere Channel A won but the process would have exited naturally anyway.Risk: if the Claude CLI genuinely hangs post-finalization for a path we haven't mapped, this extends session wall-clock by 3 s before the kill. Acceptable.
Layer 2 — Surface fix (LOW effort, LOW value; optional)
In
src/autoskillit/hooks/_fmt_execution.py:30,52annotate or suppressexit_codewhensuccess=True:Or suppress entirely when
success=True, since the orchestrator routes onsuccess, notexit_code, pertools_recipe.py:150-151. Minimal cost; removes the cognitive contradiction immediately without any execution-layer changes. Downside: hides diagnostic info when the combination IS pathological.Layer 3 — Test gap (LOW effort, MEDIUM value)
Add three regression tests:
test_compute_success_completed_returncode_minus_9_is_successintests/execution/test_session_adjudication.py— unit-level guard for the-9path.test_build_skill_result_channel_a_win_preserves_success_with_minus_9intests/execution/test_headless.py— end-to-end assertion.test_run_managed_async_channel_won_grace_window_permits_natural_exitintests/execution/test_process_run.py— if Layer 1 adopted, assert grace-window behavior.What NOT to do
CLAUDE_CODE_EXIT_AFTER_STOP_DELAYdefault means sessions would hang post-completion._compute_successto unconditionally accept COMPLETED+success-subtype regardless of content. Existing content check guards against false positives (session.py:650-663); PR Rectify: Session Adjudication False-Positive — Recovery/Channel-B Contract Breach #568 history shows the cost of loosening this.exit_codeinSkillResult.to_json(). Wide blast radius for a formatter-layer cosmetic issue.Separate defect flagged (should be its own issue)
The Linux process monitor tracks the wrong PID. Session 46cbdc5b
proc_traceshowsvm_rss_kb=2176,threads=1,state=sleeping,wchan=do_sys_poll,cpu_percent=0.0across all 144 snapshots — consistent with a Python/shell wrapper waiting on a PTY, NOT the Node.js Claude CLI (which would be hundreds of MB RSS, multi-threaded, active CPU). All 5 sessions with the(exit_code=-9, success=true)pattern share identicalpeak_rss_kb=2176andpeak_oom_score=666, confirming systematic tracer attachment to the wrong PID. Consequences:anomaly_countis effectively meaningless — the detector has zero visibility into actual workload.peak_oom_score=666alongsidepeak_rss_kb=2176is physically implausible (OOM killer would not target a 2 MB process).This is orthogonal to the
exit_code=-9issue and should be filed as a separate issue. Suggested title: "Linux process monitor tracks launcher wrapper PID instead of Claude CLI child PID". (Related but distinct from #771 which covers test-code/dev/shmpollution.)Recommended minimal action (priority order)
/autoskillit:make-plan. Combined change todefaults.yaml+_process_race.py+_process_monitor.py+process.py+ validation against recording/replay fixtures.