Skip to content

refactor(relay): consolidate GOAWAY shutdown to one owner per session#30

Merged
floatdrop merged 1 commit into
draft-18from
refactor/consolidate-goaway-shutdown
Jun 28, 2026
Merged

refactor(relay): consolidate GOAWAY shutdown to one owner per session#30
floatdrop merged 1 commit into
draft-18from
refactor/consolidate-goaway-shutdown

Conversation

@floatdrop

Copy link
Copy Markdown
Owner

Problem

Relay shutdown spread the GOAWAY / drain / force-close lifecycle across three GOAWAY senders and two drain implementations, kept correct only by SendGoaway idempotency:

  1. Stop's broadcast loop — for every session in its snapshot.
  2. serveSession's per-session stopWatch goroutine — "GOAWAY → wait grace → close", a full duplicate of Stop's per-session logic.
  3. addSession's race cover — sent a GOAWAY (but did not close) when it observed stopCh already closed.

The stopWatch goroutine existed only to cover the race where a session finishes SETUP and registers via addSession after Stop snapshotted the live-session set (so the snapshot misses it). Its own comment admitted it was "BOTH the failsafe for the race ... AND the normal per-session drain path."

Change

Close the race at its source so there is exactly one owner per session, with no reliance on idempotency:

  • beginShutdown sets a shuttingDown flag and snapshots the session set in one step under sessionsMu.
  • addSession reads shuttingDown under the same lock when it registers.

Because both happen under sessionsMu, every session lands in exactly one bucket:

When the session registers Owner
Before beginShutdown's critical section In Stop's snapshot → drained by Stop (steps 3–5)
After beginShutdown's critical section Sees shuttingDown=true → drained by addSession via the new drainStraggler

Never both. The per-session stopWatch goroutine is deleted; serveSession no longer watches for Stop. drainStraggler runs under r.handlers, so Stop's handlers.Wait() still joins it.

Result: 3 GOAWAY senders → 2 disjoint ones; 2 drain implementations → 1 bulk (Stop) + a single-session mirror for the rare straggler.

Testing

  • New TestRelay_addSessionDrainsStraggler (white-box) drives the straggler path deterministically: beginShutdown with an empty set, then addSession of a live session, and asserts the peer receives GOAWAY and the session is force-closed at the grace boundary. This is the path stopWatch used to cover.
  • All existing Stop / GOAWAY / cleanup tests pass unchanged under -race (TestRelay_StopBroadcastsGoaway, StopReturnsEarlyOnCleanDrain, StopForceClosesOnTimeout, InboundGoaway*, StopIsIdempotent, SessionCleanup_*).
  • go test -race ./pkg/relay/... ./pkg/moqt/session/..., full go test ./..., golangci-lint run ./pkg/relay/ — 0 issues.

No behavior change for the common shutdown path.

🤖 Generated with Claude Code

Shutdown previously sent GOAWAY from three places (Stop's broadcast,
serveSession's per-session stopWatch goroutine, and addSession's race
cover) and implemented "GOAWAY → wait grace → force-close" twice (Stop
and stopWatch), relying on SendGoaway idempotency to make the overlap
harmless. The stopWatch goroutine existed only to cover the race where a
session registers after Stop snapshots the live-session set.

Close that race at its source instead. beginShutdown sets a shuttingDown
flag and snapshots the session set in one step under sessionsMu;
addSession reads the flag under the same lock when it registers. This
partitions every session into exactly one owner: in Stop's snapshot
(drained by Stop) or registered afterward (drained by addSession via the
new drainStraggler helper) — never both, with no dependence on
idempotency.

The per-session stopWatch goroutine is deleted; serveSession no longer
watches for Stop. Net: three GOAWAY senders → two disjoint ones, two
drain implementations → one bulk (Stop) plus a single-session mirror for
stragglers.

Adds TestRelay_addSessionDrainsStraggler to pin the straggler path that
stopWatch used to cover. No behavior change for the common path; existing
Stop/GOAWAY tests pass unchanged under -race.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@floatdrop floatdrop merged commit 5bc5a20 into draft-18 Jun 28, 2026
9 checks passed
@floatdrop floatdrop deleted the refactor/consolidate-goaway-shutdown branch June 28, 2026 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant