fix: gr spawn down graceful shutdown + per-agent teardown by laynepenney · Pull Request #567 · synapt-dev/grip

laynepenney · 2026-04-14T12:52:58Z

Summary

Graceful shutdown: gr spawn down sends /exit to each agent's tmux pane (asking the agent process to clean up and terminate), then polls #{pane_dead} every 500ms up to a configurable timeout. Agents that don't exit in time get their tmux window force-killed via kill-window. Per-agent status is reported (graceful exit, force-killed, or state unknown if tmux queries failed).
Per-agent teardown: gr spawn down --agent <name> tears down a single agent instead of the whole team.
Configurable timeout: gr spawn down --timeout <secs> (default: 10s) controls how long to wait before force-killing.

Shutdown model

This is /exit + tmux lifecycle, not OS-level signal escalation (SIGTERM/SIGKILL). The /exit command asks the agent CLI to shut down cooperatively. If the agent process does not exit within the timeout, kill-window destroys the tmux pane (which sends SIGHUP to the child). This is appropriate because agent processes (Claude Code, Codex CLI) respond to /exit but may not handle bare SIGTERM gracefully.

Error handling (review feedback)

pane_exit_state() returns Option<bool> tri-state: Some(true) = exited, Some(false) = running, None = tmux query failed. Tmux errors are logged as warnings, not silently treated as graceful exit.
send-keys, kill-window, and kill-session failures are logged with warning output instead of being silently ignored.

Premium boundary: core OSS (CLI workspace orchestration).

Test plan

gr spawn up a team, then gr spawn down -- verify all agents get /exit and exit gracefully
gr spawn down --agent apollo -- verify only that agent is torn down, session stays alive
gr spawn down --timeout 2 with a hung agent -- verify it force-kills after 2s with status message
cargo test -- all tests pass
cargo clippy and cargo fmt --check clean

🤖 Generated with Claude Code

laynepenney

Opus review: grip#567 (spawn down graceful shutdown)

Clean, well-structured implementation. Three-phase approach (send /exit, poll pane_dead, force-kill) is the right pattern.

Strengths:

is_pane_dead() correctly handles both #{pane_dead} check and "window already gone" fallback
500ms poll interval is reasonable; configurable --timeout (default 10s) gives operators control
Clear user-facing messaging (green check for graceful, red X for force-kill with timeout)
Per-agent teardown via --agent flag properly wired through dispatch
Spawn state written before shutdown starts (captures intent even if force-kill needed)

Minor notes:

The remain-on-exit tmux setting affects behavior: if not set, panes disappear on exit and is_pane_dead returns true via the fallback. Correct behavior, worth a comment in code.

Premium boundary: core OSS (workspace orchestration).

LGTM from Opus.

laynepenney

Review note: I can't submit an approval or request-changes state here because GitHub blocks formal self-reviews, so I'm leaving the findings as a comment.

src/cli/commands/spawn.rs:657-669 treats any tmux list-panes failure as pane dead = true. That means a broken tmux socket, wrong session target, or any other tmux error is reported as a graceful exit. The command can print success even when it never observed the pane state. This needs to distinguish 'window already gone' from 'tmux failed' and surface the latter as an error.
src/cli/commands/spawn.rs:585-647 ignores the status from every tmux send-keys, kill-window, and kill-session call. If teardown fails, gr spawn down still returns Ok(()) and prints that the agent exited or was force-killed. That is exactly the path where we need reliable error reporting.

Separately, the implementation is currently 'send /exit, wait, then kill the tmux window' rather than a true TERM/KILL escalation. If that is the intended model, the CLI/help text and PR framing should say that explicitly; otherwise the implementation should actually signal the child process before falling back to hard kill.

laynepenney

Review note: GitHub blocks formal self-approve/request-changes on this PR author, so I am leaving actionable review findings here.

Findings:

src/cli/commands/spawn.rs:657-669 treats any tmux list-panes failure as pane dead = true. That means a broken tmux socket, wrong session target, or any other tmux error is reported as a graceful exit. The command can print success even when it never observed the pane state. This should distinguish "window already gone" from generic tmux failure and surface the latter as an error.
src/cli/commands/spawn.rs:585-647 ignores the status from every tmux send-keys, kill-window, and kill-session call. If teardown fails, gr spawn down still returns Ok(()) and prints that the agent exited or was force-killed. This is exactly the path where we need reliable error reporting.

Separately, the implementation is currently send /exit, wait, then kill the tmux window rather than a true TERM/KILL escalation. If that is the intended model, the CLI/help text and PR framing should say that explicitly; otherwise the implementation should actually signal the child process before falling back to hard kill.

- is_pane_dead() -> pane_exit_state(): returns Option<bool> tri-state to distinguish "exited" / "still running" / "tmux error". No longer treats arbitrary tmux failures as graceful exit. - Log warnings from send-keys, kill-window, kill-session failures instead of silently discarding return statuses. - Report "state unknown" for agents where tmux queries failed, so the operator sees the ambiguity rather than a false "exited gracefully". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: gr spawn down waits for graceful exit before killing

299ff32

laynepenney commented Apr 14, 2026

View reviewed changes

laynepenney merged commit 9530863 into main Apr 14, 2026
9 of 10 checks passed

laynepenney deleted the fix/spawn-down-graceful branch April 14, 2026 15:15

laynepenney mentioned this pull request Apr 14, 2026

feat: gr spawn down graceful shutdown #568

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: gr spawn down graceful shutdown + per-agent teardown#567

fix: gr spawn down graceful shutdown + per-agent teardown#567
laynepenney merged 2 commits intomainfrom
fix/spawn-down-graceful

laynepenney commented Apr 14, 2026 •

edited

Loading

Uh oh!

laynepenney left a comment

Uh oh!

laynepenney left a comment

Uh oh!

laynepenney left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

laynepenney commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Shutdown model

Error handling (review feedback)

Test plan

Uh oh!

laynepenney left a comment

Choose a reason for hiding this comment

Uh oh!

laynepenney left a comment

Choose a reason for hiding this comment

Uh oh!

laynepenney left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

laynepenney commented Apr 14, 2026 •

edited

Loading