refactor(relay): consolidate GOAWAY shutdown to one owner per session#30
Merged
Merged
Conversation
Shutdown previously sent GOAWAY from three places (Stop's broadcast, serveSession's per-session stopWatch goroutine, and addSession's race cover) and implemented "GOAWAY → wait grace → force-close" twice (Stop and stopWatch), relying on SendGoaway idempotency to make the overlap harmless. The stopWatch goroutine existed only to cover the race where a session registers after Stop snapshots the live-session set. Close that race at its source instead. beginShutdown sets a shuttingDown flag and snapshots the session set in one step under sessionsMu; addSession reads the flag under the same lock when it registers. This partitions every session into exactly one owner: in Stop's snapshot (drained by Stop) or registered afterward (drained by addSession via the new drainStraggler helper) — never both, with no dependence on idempotency. The per-session stopWatch goroutine is deleted; serveSession no longer watches for Stop. Net: three GOAWAY senders → two disjoint ones, two drain implementations → one bulk (Stop) plus a single-session mirror for stragglers. Adds TestRelay_addSessionDrainsStraggler to pin the straggler path that stopWatch used to cover. No behavior change for the common path; existing Stop/GOAWAY tests pass unchanged under -race. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Relay shutdown spread the GOAWAY / drain / force-close lifecycle across three GOAWAY senders and two drain implementations, kept correct only by
SendGoawayidempotency:Stop's broadcast loop — for every session in its snapshot.serveSession's per-sessionstopWatchgoroutine — "GOAWAY → wait grace → close", a full duplicate of Stop's per-session logic.addSession's race cover — sent a GOAWAY (but did not close) when it observedstopChalready closed.The
stopWatchgoroutine existed only to cover the race where a session finishes SETUP and registers viaaddSessionafterStopsnapshotted the live-session set (so the snapshot misses it). Its own comment admitted it was "BOTH the failsafe for the race ... AND the normal per-session drain path."Change
Close the race at its source so there is exactly one owner per session, with no reliance on idempotency:
beginShutdownsets ashuttingDownflag and snapshots the session set in one step undersessionsMu.addSessionreadsshuttingDownunder the same lock when it registers.Because both happen under
sessionsMu, every session lands in exactly one bucket:beginShutdown's critical sectionbeginShutdown's critical sectionshuttingDown=true→ drained byaddSessionvia the newdrainStragglerNever both. The per-session
stopWatchgoroutine is deleted;serveSessionno longer watches for Stop.drainStragglerruns underr.handlers, soStop'shandlers.Wait()still joins it.Result: 3 GOAWAY senders → 2 disjoint ones; 2 drain implementations → 1 bulk (Stop) + a single-session mirror for the rare straggler.
Testing
TestRelay_addSessionDrainsStraggler(white-box) drives the straggler path deterministically:beginShutdownwith an empty set, thenaddSessionof a live session, and asserts the peer receives GOAWAY and the session is force-closed at the grace boundary. This is the pathstopWatchused to cover.Stop/ GOAWAY / cleanup tests pass unchanged under-race(TestRelay_StopBroadcastsGoaway,StopReturnsEarlyOnCleanDrain,StopForceClosesOnTimeout,InboundGoaway*,StopIsIdempotent,SessionCleanup_*).go test -race ./pkg/relay/... ./pkg/moqt/session/..., fullgo test ./...,golangci-lint run ./pkg/relay/— 0 issues.No behavior change for the common shutdown path.
🤖 Generated with Claude Code