
fix: reconnect-path hardening (C1′ barrier, catch-up reinsert, reconnect window) — v0.3.1#3

Merged
grrowl merged 4 commits into main from fix/reconnect-hardening
Jun 11, 2026

grrowl (Owner) commented Jun 11, 2026

What & why

Three pre-existing, user-reachable bugs in the reconnect/catch-up path,
surfaced during the SSR review (PR #2 / the grill session) but independent of
SSR: the fixes run against released @tanstack/db (>=0.6), carry no vendored
tarballs, and have tests that don't depend on the PR-1564 vendored build.
Today's non-SSR users are exposed, so these shouldn't wait for upstream to ship.

Cherry-picked from feat/ssr (d61e259, 0a378a7, 71e6bde); the third was
adapted because main has no scheduleReconnect/forced-reconnect machinery —
see below.

Fixes

  1. Cursor barrier (C1′), in src/server/sync-do.ts. A snapshot's snap-end
    or a catch-up's uptodate carries the current seq, which may include changes
    whose deltas are still buffered in the coalescer for that socket. The client's
    single cursor then claims a seq it never applied; a drop before the tick loses
    the write permanently (reconnect resumes past it). The server now flushes the
    socket's pending deltas before any cursor-advancing emission. Reachable on any
    multi-collection reconnect. Generalizes ADR-0002 C1 from committed to all
    cursor boundaries.

  2. Held-key catch-up reinsert, in src/client/do-collection.ts. A catch-up
    emits the latest CDC op per key, so a key deleted-and-reinserted while the
    client was away arrives as insert for a key the client still holds.
    TanStack throws DuplicateKeySyncError on insert-over-existing, aborting the
    whole catch-up transaction and wedging the client on stale state. The adapter
    now applies a held-key insert as the upsert it semantically is (the move-in
    update-upsert contract, ADR-0002 C4).

  3. Reconnect window, in src/client/transport.ts. The reconnecting flag was
    set inside the reconnect timer, so a mutation fired within reconnectDelayMs
    of a drop established the fresh socket before the timer ran — with no
    resubscribe, leaving every subscription silently dead on the new socket (and
    the late timer wedged the flag). The flag is now set when the reconnect is
    scheduled, so whichever connect wins — timer- or demand-driven — resubscribes
    from the cursor.
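The ordering rule behind fix 1 (the cursor barrier) can be sketched as follows. This is a minimal illustration, not the code in src/server/sync-do.ts: the `SocketEgress` class, the frame shapes, and the method names are all hypothetical. It only shows the invariant the PR describes, namely that a socket's buffered deltas go out before any frame that advances the client's cursor.

```typescript
// Hypothetical frame shapes; the real wire format is not shown in this PR.
type Delta = { kind: "delta"; collection: string; seq: number };
type Boundary = { kind: "snap-end" | "uptodate"; collection: string; seq: number };
type Frame = Delta | Boundary;

// Hypothetical per-socket egress with a coalescing buffer.
class SocketEgress {
  private pending: Delta[] = [];
  readonly sent: Frame[] = [];

  // Coalescer path: deltas wait here until the next tick.
  enqueueDelta(delta: Delta): void {
    this.pending.push(delta);
  }

  // Tick path: drain whatever is buffered, in arrival order.
  flushPending(): void {
    for (const delta of this.pending) this.sent.push(delta);
    this.pending = [];
  }

  // Barrier: a cursor-advancing frame is sent only after this socket's
  // buffered deltas, so the client never claims a seq it has not applied.
  emitBoundary(boundary: Boundary): void {
    this.flushPending();
    this.sent.push(boundary);
  }
}

// A delta for one collection is still buffered when another collection's
// catch-up finishes at the same seq.
const egress = new SocketEgress();
egress.enqueueDelta({ kind: "delta", collection: "todos", seq: 7 });
egress.emitBoundary({ kind: "uptodate", collection: "lists", seq: 7 });
const order = egress.sent.map((frame) => frame.kind);
```

In this sketch the delta always precedes the boundary frame on the wire, so a drop right after the boundary cannot leave the cursor ahead of the applied state.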

Adaptation note (fix 3)

main has no scheduleReconnect()/forceReconnect()/suppressAdvance (those
are SSR-only). The fix here is the equivalent one-line hoist in the close
handler's inline reconnect timer. The suppressAdvance half of the original
race (a frozen cursor on the forced-reconnect path) does not exist on main, so
this commit carries only the close-handler hoist, and the test's WHY drops that
sentence. Reviewed adversarially with codex --model gpt-5.5: hoisting is
correct given connect() is idempotent and demand-callable; no new race.
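The interleaving behind fix 3 can be sketched with a toy transport. Everything here is an assumption made for illustration: the `Transport` class, its fields, and the manual `runTimer()` helper are hypothetical, and the socket open is modelled as synchronous. Only the placement of the flag (set when the reconnect is scheduled, not inside the timer callback) mirrors the fix described above.

```typescript
// Hypothetical transport; not the code in src/client/transport.ts.
class Transport {
  connected = false;
  reconnecting = false;
  resubscribeCount = 0;
  // Stands in for setTimeout(..., reconnectDelayMs) so the example is deterministic.
  private timer: (() => void) | null = null;

  // Idempotent and callable on demand (for example by a mutation).
  connect(): void {
    if (this.connected) return;
    this.connected = true;
    if (this.reconnecting) {
      this.reconnecting = false;
      this.resubscribeCount += 1; // resubscribe every collection from the cursor
    }
  }

  // Close handler: the flag is set here, at scheduling time.
  onClose(): void {
    this.connected = false;
    this.reconnecting = true;
    this.timer = () => this.connect();
  }

  // Test helper: fire the pending reconnect timer, if any.
  runTimer(): void {
    const fire = this.timer;
    this.timer = null;
    if (fire) fire();
  }
}

const transport = new Transport();
transport.connect();  // initial connect: nothing to resubscribe
transport.onClose();  // socket drops, reconnect is scheduled
transport.connect();  // a mutation arrives inside the reconnect window
transport.runTimer(); // the late timer finds the socket already up
```

Because `connect()` returns early once connected, the late timer is a no-op, and the demand-driven connect has already run the resubscribe path and cleared the flag.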

ADR-0011 references

ADR-0011 (SSR design) is unmerged. CHANGELOG entries are self-contained (no
ADR-0011 reference). Code/test comments keep it as provenance (the fixes
genuinely originated there); those resolve once feat/ssr lands.

Verification

  • npm run typecheck — clean.
  • npm run test: 143 passed (36 files), including the three new tests
    (cursor-barrier, reinsert-catchup, reconnect-window), each of which
    fails without its fix.

Release

Includes chore(release): v0.3.1 (CHANGELOG [0.3.1] + package.json bump),
mirroring the 0.3.0 release commit. The v0.3.1 git tag and npm publish are
intentionally left for @grrowl to run (authed publish).

🤖 Generated with Claude Code

grrowl and others added 4 commits June 11, 2026 15:59
…n (C1')

A snapshot's snap-end or a catch-up's uptodate carries the current seq,
which can include changes whose deltas are still buffered in the egress
coalescer for that socket. The client's single cursor then claims a seq
it never applied; a drop before the tick loses the write permanently
(reconnect resumes past it). Found by adversarial review of the SSR plan
(ADR-0011) but pre-existing: any multi-collection reconnect could hit it.

ADR-0002 C1 generalized: a socket's pending coalesced deltas always
precede any cursor boundary on that socket.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A catch-up emits the latest CDC op per changed key, so a key deleted-
and-reinserted while the client was away arrives as op=insert for a key
the client still holds. TanStack's sync write throws
DuplicateKeySyncError on insert-over-existing (unless deep-equal),
aborting the whole catch-up transaction and wedging the client on stale
state. Map a held-key insert to update — the same update-upsert contract
move-in already relies on (ADR-0002 C4). Pre-existing on reconnect;
load-bearing for SSR hydration catch-up (ADR-0011 D4).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
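The held-key mapping described in the commit message above can be sketched as a pure function. The `Change` type, the `mapHeldKeyInserts` name, and the `holdsKey` predicate are hypothetical and are not taken from src/client/do-collection.ts or from the TanStack API; the sketch only shows an insert for a key the client already holds being rewritten to an update before it reaches the sync write.

```typescript
// Hypothetical change shape for a catch-up batch (latest op per key).
type Op = "insert" | "update" | "delete";
type Change<T> = { op: Op; key: string; value?: T };

// Rewrites inserts for keys the client still holds into updates, so the
// downstream sync write sees an upsert instead of an insert-over-existing.
function mapHeldKeyInserts<T>(
  changes: Change<T>[],
  holdsKey: (key: string) => boolean,
): Change<T>[] {
  return changes.map((change) =>
    change.op === "insert" && holdsKey(change.key)
      ? { ...change, op: "update" as const }
      : change,
  );
}

// Key "a" was deleted and reinserted while the client was away; key "b" is new.
type Todo = { title: string };
const clientRows: Record<string, Todo> = { a: { title: "old" } };
const catchUp: Change<Todo>[] = [
  { op: "insert", key: "a", value: { title: "reinserted" } },
  { op: "insert", key: "b", value: { title: "new" } },
];
const applied = mapHeldKeyInserts(catchUp, (key) => key in clientRows);
const appliedOps = applied.map((change) => change.op);
```

Here the held key "a" is applied as an update while "b" stays an insert, so the batch no longer contains an insert-over-existing that would abort the whole catch-up transaction.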
Grill-session finding (Q9), pre-existing: a connect() triggered on
demand -- a mutation fired within reconnectDelayMs of a drop --
established the fresh socket with the reconnecting flag still false: no
resubscribeAll, every subscription silently dead on the new socket, and
the late timer connect() early-returned, wedging the flag. The flag now
sets when the reconnect is SCHEDULED, so whichever connect() establishes
-- timer- or demand-driven -- runs the resubscribe path. Pinned with a
fake-socket test driving the exact interleaving.

Adapted for main (0.3.x): the forced-reconnect/suppressAdvance variant
of this race lives in the unmerged SSR work; only the plain reconnect
path is on this branch, so this carries just the close-handler hoist.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Three pre-existing reconnect-path fixes, surfaced by the SSR review but
independent of it (released @tanstack/db, no vendored tarballs): cursor
barrier C1' (flush pending deltas before any cursor-advancing emission),
held-key catch-up reinsert applied as an upsert (no DuplicateKeySyncError
wedge), and the reconnecting flag set at scheduling so a demand-driven
connect in the reconnect window still resubscribes. See CHANGELOG.

CHANGELOG entries are self-contained (no ADR-0011 reference, which is
unmerged SSR design); code/test comments keep it as provenance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
grrowl merged commit f616989 into main on Jun 11, 2026
1 check passed