fix: gr spawn down graceful shutdown + per-agent teardown#567
fix: gr spawn down graceful shutdown + per-agent teardown#567laynepenney merged 2 commits intomainfrom
Conversation
laynepenney
left a comment
There was a problem hiding this comment.
Opus review: grip#567 (spawn down graceful shutdown)
Clean, well-structured implementation. Three-phase approach (send /exit, poll pane_dead, force-kill) is the right pattern.
Strengths:
is_pane_dead()correctly handles both#{pane_dead}check and "window already gone" fallback- 500ms poll interval is reasonable; configurable
--timeout(default 10s) gives operators control - Clear user-facing messaging (green check for graceful, red X for force-kill with timeout)
- Per-agent teardown via
--agentflag properly wired through dispatch - Spawn state written before shutdown starts (captures intent even if force-kill needed)
Minor notes:
- The
remain-on-exittmux setting affects behavior: if not set, panes disappear on exit andis_pane_deadreturns true via the fallback. Correct behavior, worth a comment in code.
Premium boundary: core OSS (workspace orchestration).
LGTM from Opus.
laynepenney
left a comment
There was a problem hiding this comment.
Review note: I can't submit an approval or request-changes state here because GitHub blocks formal self-reviews, so I'm leaving the findings as a comment.
-
src/cli/commands/spawn.rs:657-669treats anytmux list-panesfailure aspane dead = true. That means a broken tmux socket, wrong session target, or any other tmux error is reported as a graceful exit. The command can print success even when it never observed the pane state. This needs to distinguish 'window already gone' from 'tmux failed' and surface the latter as an error. -
src/cli/commands/spawn.rs:585-647ignores the status from everytmux send-keys,kill-window, andkill-sessioncall. If teardown fails,gr spawn downstill returnsOk(())and prints that the agent exited or was force-killed. That is exactly the path where we need reliable error reporting.
Separately, the implementation is currently 'send /exit, wait, then kill the tmux window' rather than a true TERM/KILL escalation. If that is the intended model, the CLI/help text and PR framing should say that explicitly; otherwise the implementation should actually signal the child process before falling back to hard kill.
laynepenney
left a comment
There was a problem hiding this comment.
Review note: GitHub blocks formal self-approve/request-changes on this PR author, so I am leaving actionable review findings here.
Findings:
-
src/cli/commands/spawn.rs:657-669treats anytmux list-panesfailure aspane dead = true. That means a broken tmux socket, wrong session target, or any other tmux error is reported as a graceful exit. The command can print success even when it never observed the pane state. This should distinguish "window already gone" from generic tmux failure and surface the latter as an error. -
src/cli/commands/spawn.rs:585-647ignores the status from everytmux send-keys,kill-window, andkill-sessioncall. If teardown fails,gr spawn downstill returnsOk(())and prints that the agent exited or was force-killed. This is exactly the path where we need reliable error reporting.
Separately, the implementation is currently send /exit, wait, then kill the tmux window rather than a true TERM/KILL escalation. If that is the intended model, the CLI/help text and PR framing should say that explicitly; otherwise the implementation should actually signal the child process before falling back to hard kill.
- is_pane_dead() -> pane_exit_state(): returns Option<bool> tri-state to distinguish "exited" / "still running" / "tmux error". No longer treats arbitrary tmux failures as graceful exit. - Log warnings from send-keys, kill-window, kill-session failures instead of silently discarding return statuses. - Report "state unknown" for agents where tmux queries failed, so the operator sees the ambiguity rather than a false "exited gracefully". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
gr spawn downsends/exitto each agent's tmux pane (asking the agent process to clean up and terminate), then polls#{pane_dead}every 500ms up to a configurable timeout. Agents that don't exit in time get their tmux window force-killed viakill-window. Per-agent status is reported (graceful exit, force-killed, or state unknown if tmux queries failed).gr spawn down --agent <name>tears down a single agent instead of the whole team.gr spawn down --timeout <secs>(default: 10s) controls how long to wait before force-killing.Shutdown model
This is
/exit+ tmux lifecycle, not OS-level signal escalation (SIGTERM/SIGKILL). The/exitcommand asks the agent CLI to shut down cooperatively. If the agent process does not exit within the timeout,kill-windowdestroys the tmux pane (which sends SIGHUP to the child). This is appropriate because agent processes (Claude Code, Codex CLI) respond to/exitbut may not handle bare SIGTERM gracefully.Error handling (review feedback)
pane_exit_state()returnsOption<bool>tri-state:Some(true)= exited,Some(false)= running,None= tmux query failed. Tmux errors are logged as warnings, not silently treated as graceful exit.send-keys,kill-window, andkill-sessionfailures are logged with warning output instead of being silently ignored.Premium boundary: core OSS (CLI workspace orchestration).
Test plan
gr spawn upa team, thengr spawn down-- verify all agents get/exitand exit gracefullygr spawn down --agent apollo-- verify only that agent is torn down, session stays alivegr spawn down --timeout 2with a hung agent -- verify it force-kills after 2s with status messagecargo test-- all tests passcargo clippyandcargo fmt --checkclean🤖 Generated with Claude Code