fix: reconnect-path hardening (C1′ barrier, catch-up reinsert, reconnect window) — v0.3.1 #3
Merged
…n (C1') A snapshot's snap-end or a catch-up's uptodate carries the current seq, which can include changes whose deltas are still buffered in the egress coalescer for that socket. The client's single cursor then claims a seq it never applied; a drop before the tick loses the write permanently (reconnect resumes past it). Found by adversarial review of the SSR plan (ADR-0011) but pre-existing: any multi-collection reconnect could hit it. ADR-0002 C1 generalized: a socket's pending coalesced deltas always precede any cursor boundary on that socket. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A catch-up emits the latest CDC op per changed key, so a key deleted-and-reinserted while the client was away arrives as op=insert for a key the client still holds. TanStack's sync write throws DuplicateKeySyncError on insert-over-existing (unless deep-equal), aborting the whole catch-up transaction and wedging the client on stale state. Map a held-key insert to update, the same update-upsert contract move-in already relies on (ADR-0002 C4). Pre-existing on reconnect; load-bearing for SSR hydration catch-up (ADR-0011 D4). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Grill-session finding (Q9), pre-existing: a connect() triggered on demand -- a mutation fired within reconnectDelayMs of a drop -- established the fresh socket with the reconnecting flag still false: no resubscribeAll, every subscription silently dead on the new socket, and the late timer connect() early-returned, wedging the flag. The flag now sets when the reconnect is SCHEDULED, so whichever connect() establishes -- timer- or demand-driven -- runs the resubscribe path. Pinned with a fake-socket test driving the exact interleaving. Adapted for main (0.3.x): the forced-reconnect/suppressAdvance variant of this race lives in the unmerged SSR work; only the plain reconnect path is on this branch, so this carries just the close-handler hoist. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three pre-existing reconnect-path fixes, surfaced by the SSR review but independent of it (released @tanstack/db, no vendored tarballs): cursor barrier C1' (flush pending deltas before any cursor-advancing emission), held-key catch-up reinsert applied as an upsert (no DuplicateKeySyncError wedge), and the reconnecting flag set at scheduling so a demand-driven connect in the reconnect window still resubscribes. See CHANGELOG. CHANGELOG entries are self-contained (no ADR-0011 reference, which is unmerged SSR design); code/test comments keep it as provenance. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What & why
Three pre-existing, user-reachable bugs in the reconnect/catch-up path,
surfaced during the SSR review (PR #2 / the grill session) but independent of
SSR: they run against released @tanstack/db (>=0.6), carry no vendored tarballs,
and have tests that don't depend on the PR-1564 vendored build.
Today's non-SSR users are exposed, so these shouldn't wait for upstream to ship.
Cherry-picked from feat/ssr (d61e259, 0a378a7, 71e6bde); the third was
adapted because main has no scheduleReconnect/forced-reconnect machinery
(see the adaptation note below).
Fixes
Cursor barrier (C1′) in src/server/sync-do.ts. A snapshot's snap-end or a
catch-up's uptodate carries the current seq, which may include changes
whose deltas are still buffered in the coalescer for that socket. The client's
single cursor then claims a seq it never applied; a drop before the tick loses
the write permanently (reconnect resumes past it). The server now flushes the
socket's pending deltas before any cursor-advancing emission. Reachable on any
multi-collection reconnect. Generalizes ADR-0002 C1 from committed to all
cursor boundaries.
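The ordering rule can be illustrated with a minimal sketch. It is not the code in src/server/sync-do.ts: the SocketEgress class, its method names, and the message shapes are hypothetical, chosen only to show that a per-socket flush runs before any snap-end or uptodate is sent.

```typescript
// Hypothetical message shapes; the real wire format may differ.
type Delta = { kind: "delta"; seq: number; key: string };
type Boundary = { kind: "snap-end" | "uptodate"; seq: number };
type Message = Delta | Boundary;

// Hypothetical per-socket egress with a coalescing buffer.
class SocketEgress {
  private pending: Delta[] = [];
  private send: (msg: Message) => void;

  constructor(send: (msg: Message) => void) {
    this.send = send;
  }

  // Buffer a delta until the next coalescer tick.
  enqueue(delta: Delta): void {
    this.pending.push(delta);
  }

  // Runs on every tick, and also before any cursor boundary.
  flush(): void {
    const batch = this.pending;
    this.pending = [];
    for (const delta of batch) this.send(delta);
  }

  // The barrier: buffered deltas always precede a cursor-advancing message,
  // so the client never acknowledges a seq it has not applied.
  emitBoundary(boundary: Boundary): void {
    this.flush();
    this.send(boundary);
  }
}

const wire: Message[] = [];
const egress = new SocketEgress((msg) => wire.push(msg));
egress.enqueue({ kind: "delta", seq: 7, key: "todo:1" });
egress.emitBoundary({ kind: "uptodate", seq: 7 });
```

In this sketch the delta for seq 7 reaches the wire before the uptodate that claims seq 7, so a drop right after the boundary cannot skip the write.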
Held-key catch-up reinsert in src/client/do-collection.ts. A catch-up
emits the latest CDC op per key, so a key deleted-and-reinserted while the
client was away arrives as insert for a key the client still holds.
TanStack throws DuplicateKeySyncError on insert-over-existing, aborting the
whole catch-up transaction and wedging the client on stale state. The adapter
now applies a held-key insert as the upsert it semantically is (the move-in
update-upsert contract, ADR-0002 C4).
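As a rough illustration of the mapping, the sketch below rewrites a catch-up batch before it is applied. It deliberately avoids the @tanstack/db API: the Change type, the normalizeCatchUp function, and the held-key set are hypothetical stand-ins, not the adapter's actual code.

```typescript
// Hypothetical change shape; the adapter's real types may differ.
type Op = "insert" | "update" | "delete";
type Change<T> = { op: Op; key: string; value?: T };

// Rewrite an insert for a key the client already holds into an update,
// so the sync write never sees insert-over-existing.
function normalizeCatchUp<T>(
  changes: Change<T>[],
  heldKeys: ReadonlySet<string>,
): Change<T>[] {
  return changes.map((change): Change<T> =>
    change.op === "insert" && heldKeys.has(change.key)
      ? { ...change, op: "update" }
      : change,
  );
}

// "todo:1" was deleted and reinserted while the client was away.
const held = new Set(["todo:1"]);
const normalized = normalizeCatchUp(
  [
    { op: "insert", key: "todo:1", value: { title: "reinserted" } },
    { op: "insert", key: "todo:2", value: { title: "new" } },
    { op: "delete", key: "todo:3" },
  ],
  held,
);
```

Only the held key changes op; the insert for the unseen key and the delete pass through untouched.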
Reconnect window in src/client/transport.ts. The reconnecting flag was
set inside the reconnect timer, so a mutation fired within reconnectDelayMs
of a drop established the fresh socket before the timer ran, with no
resubscribe, leaving every subscription silently dead on the new socket (and
the late timer wedged the flag). The flag is now set when the reconnect is
scheduled, so whichever connect wins, timer- or demand-driven, resubscribes
from the cursor.
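The interleaving can be reproduced with a small model. This is a sketch under assumptions, not src/client/transport.ts: TransportSketch, its fields, and the injected schedule callback are hypothetical, and a counter stands in for the resubscribe call.

```typescript
// Hypothetical, simplified transport; no real sockets or timers.
class TransportSketch {
  connected = false;
  reconnecting = false;
  resubscribeCount = 0; // stand-in for the resubscribe path

  // Idempotent and callable on demand (for example by a mutation).
  connect(): void {
    if (this.connected) return;
    this.connected = true;
    if (this.reconnecting) {
      this.reconnecting = false;
      this.resubscribeCount += 1;
    }
  }

  // Close handler. The flag is set here, when the reconnect is scheduled,
  // rather than inside the timer callback.
  onClose(schedule: (fn: () => void) => void): void {
    this.connected = false;
    this.reconnecting = true;
    schedule(() => this.connect());
  }
}

const timers: Array<() => void> = [];
const transport = new TransportSketch();
transport.connect();                         // initial connect, nothing to resubscribe
transport.onClose((fn) => timers.push(fn));  // drop: timer queued, not yet fired
transport.connect();                         // demand-driven connect inside the window
timers.forEach((fn) => fn());                // late timer: connect() early-returns
```

Because the flag is already set when the demand-driven connect runs, that connect takes the resubscribe branch, and the late timer finds an established socket with the flag cleared.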
Adaptation note (fix 3)
main has no scheduleReconnect() / forceReconnect() / suppressAdvance (those
are SSR-only). The fix here is the equivalent one-line hoist in the close
handler's inline reconnect timer. The suppressAdvance half of the original
race (a frozen cursor on the forced-reconnect path) does not exist on main, so
this commit carries only the close-handler hoist, and the test's WHY drops that
sentence. Reviewed adversarially with codex --model gpt-5.5: hoisting is
correct given connect() is idempotent and demand-callable; no new race.
ADR-0011 references
ADR-0011 (SSR design) is unmerged. CHANGELOG entries are self-contained — no
ADR-0011 reference. Code/test comments keep it as provenance (the fixes
genuinely originated there); those resolve once feat/ssr lands.
Verification
npm run typecheck: clean.
npm run test: 143 passed (36 files), including the three new tests
(cursor-barrier, reinsert-catchup, reconnect-window), each of which
fails without its fix.
Release
Includes chore(release): v0.3.1 (CHANGELOG [0.3.1] + package.json bump),
mirroring the 0.3.0 release commit. The v0.3.1 git tag and npm publish are
intentionally left for @grrowl to run (authed publish).
🤖 Generated with Claude Code