Skip to content

GUI read-model overhaul: server-assembled channel windows (Correct™ pagination + relay-signed bounds)#1500

Open
tlongwell-block wants to merge 29 commits into
mainfrom
overhaul/gui-read-model
Open

GUI read-model overhaul: server-assembled channel windows (Correct™ pagination + relay-signed bounds)#1500
tlongwell-block wants to merge 29 commits into
mainfrom
overhaul/gui-read-model

Conversation

@tlongwell-block

@tlongwell-block tlongwell-block commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

GUI Read-Model Overhaul: server-assembled channel windows

Problem

Messages that reliably appear via API/CLI were missing from GUI channels, and channel/thread opens took multiple seconds. Four stacked patches (#1473, #1478, #1483, #1486) each fixed a real bug and each revealed the next. The root disease: the timeline was a client-assembled soup of raw events paginated by author-controlled created_at — threads derived from the channel cache, ancestors spliced in by id, aux events backfilled by #e fan-out, and a "page until N visible rows" loop on top. Every merge path was a correctness hazard; every channel open was 3–10 round trips.

Deterministic reproducer (ancestor-island): a cached reply referencing an old root outside the contiguous window splices that root into the channel cache; scrollback then anchors its cursor on the island and pages backward from it, permanently stranding every row between the island and the real frontier. CLI sees them; the GUI never does. (parity-ancestor-island.spec.ts: 0/100 gap rows reachable on main after exhausted scroll.)

Approach

Paginate the channel by top-level rows, not raw events; the server assembles the page, the client renders it. No new endpoints — a /query filter extension with nostr semantics everywhere (normative spec: docs/nips/NIP-CW.md; docs/bridge-channel-window.md remains as the ratified contract record).

  • Filter extension: top_level: true (+ include_summaries, include_aux) on POST /query
  • Composite (created_at DESC, id ASC) keyset cursor — until + before_id, both or neither (400 on half); no timestamp-only fallback, killing the dense-second dup/loss class
  • Relay-signed overlays, synthesized at query time, never stored, ingest-rejected from clients:
    • kind:39005 thread-summary per replied row (reply_count, participants — batched in one window-wide query, no per-root fan-out)
    • kind:39006 window-bounds, exactly one per response: {has_more, next_cursor} — the sole exhaustion authority; clients never infer "done" from row counts (limit+1 probe server-side; exact-multiple final pages lie)
  • Aux closure: two-hop (reactions/edits/deletions targeting rows, then deletions targeting aux), doesn't consume the row budget
  • Client: single get_channel_window command per page; cursor fed back verbatim from 39006; live updates via since: now subscription

Evidence

RED→GREEN flip, same fixture byte-identical in both states:

State Tree Result
main relay + client @ main 0/100 gap rows reachable (39s, exhausted scroll)
this PR relay + client from this branch 100/100 (18.4s, zero stranded rows)

The flip is CI-enforced: parity-ancestor-island.spec.ts runs under Desktop E2E Integration against the live relay (green in 20.3s on the tip — a real seed+scroll+assert execution, not a fast bail).

Full CI green on the final tree (run 28677545063; the tip SHA was subsequently amended for commit-trailer compliance only — tree byte-identical, git rev-parse HEAD^{tree} unchanged).

Suites on the assembled tip dfb1cd0: buzz-relay 462/462 unit, buzz-db 57/57 Postgres-backed (incl. dense-second tie paging, exact-multiple exhaustion, top-level predicate, summary join), live-wire interop e2e 25/25, desktop 1564/1564, tauri 873/873, mobile 395/1 skipped, focused window tests 17/17.

Formal verification

The frozen contract (v3, docs/bridge-channel-window.md, reviewer header is the durable record) carries a TLA+ invariant set (GuiRelaySync.tla). Two distinct sign-offs: the contract was ratified pre-implementation; a full theorem map then discharged every invariant against the assembled tip dfb1cd06, and was re-verified against the final tip after integration (spec-guardian sign-off; invariant-load-bearing files bit-identical between the two, additions strictly strengthen the discharge chain — see below) — Inv_NoPhantomEvents (relay-only overlay kinds rejected at ingest), Inv_ResumeIntervalComplete (limit+1 probe before kind filtering), Inv_NoSilentGap (signed, request-bound, atomic 39006), Inv_CursorSound (live overlay never mutates frozen pages), Inv_QuiescentConvergence (page-0 refetch on reconnect), Inv_CausalView (two-hop aux closure), and response atomicity. Known non-blocking gap: no retention floor on 39006 yet; oldest_retained can be added to its content JSON wire-break-free.

Landed after assembly (all reviewed + re-verified)

  • NIP-CW (docs/nips/NIP-CW.md): the frozen contract promoted to a normative NIP.
  • Thread read-model consumption (fix(desktop): consume relay thread read models): the thread panel consumes relay 39005 summaries via mergeThreadSummaries — a monotone merge (Math.max on counts/lastReplyAt, participant union capped slice(-3) favoring local knowledge). 39005 counts are a floor, not sole authority; deliberate, since the local reply cache can lead the relay summary. Also repaired a family of pre-existing scroll-anchor/deep-backfill spec failures present on the pre-assembly base.
  • Reconnect self-heal paging (fix(desktop): self-heal older-history paging after reconnect): hasOlderMessages was a private latch reset only on channel change — after a reconnect refreshed the window, an exhausted pre-reconnect latch kept the scroll observer uninstalled and froze paging at page one (the founding complaint's second mechanism). Now derived reactively from the store tail's hasMore — the same signed 39006.has_more the parser reads, so there is no parallel derivation to drift. Closes the user-visible half of Inv_QuiescentConvergence; relay-reconnect.spec.ts goes green with no test edit.
  • Parity spec defaults to the canonical :3000 relay (CI integration); BUZZ_E2E_RELAY_URL still overrides for isolated stacks.

Notes for reviewers

Branches merged (integration branch: overhaul/gui-read-model; assembled at dfb1cd0, final tip carries the post-assembly commits above)

  • overhaul/relay-window-api @ e30a301 (relay: buzz-db window query + bridge dispatch + interop e2e)
  • overhaul/client-core @ 6d6eed5 (client read model + cursor trust + e2e bridge handler + mock window contract; assembled tip dfb1cd0)
  • overhaul/test-harness @ 2edc527 (ancestor-island parity gate, nonce-stable seeding, bridge URL threading)

Named follow-ups (out of scope here)

npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 and others added 14 commits July 3, 2026 08:49
…d-model overhaul

Stands up a fully-isolated relay + backing services for the GUI read-model
overhaul test lane, and adds a hard-dataset seeder that publishes real signed
events through the relay ingest path (never raw SQL) so thread_metadata is
computed at ingest — the fidelity the server-assembled window surface reads.

- seedRelay.ts: signed-event seeder (finalizeEvent + X-Pubkey POST), barrier-
  ordered publish (parents before children), canonical tag shapes, and
  scenarios (dense same-second wall, deep thread, backdated, legacy root-only
  reply, two-hop aux closure, bulk 3k+, exact N*limit page boundary).
- docker-compose.harness.yml: pg/redis/minio on offset ports under the
  dedicated `buzz-harness` project (no collision with other stacks).
- start-isolated-test-relay.sh: one-command bring-up — compose up, schema apply
  + partitions, community/channel/member seed, build from source (rustup shim
  honors the pinned toolchain), tmux-launch relay, health-poll.
- setup-desktop-test-data.sh: make the community host overridable
  (BUZZ_COMMUNITY_HOST, default unchanged) so an alternate-port relay seeds its
  own tenant instead of silently binding to localhost:3000.
- bridge.ts: DEFAULT_RELAY_WS_URL follows BUZZ_E2E_RELAY_URL (matching seed.ts)
  so the suite points at the isolated relay without per-spec wiring.

Co-authored-by: npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…kinds

Frozen contract v2 for the GUI read-model overhaul: top_level window
requests on the existing bridge /query, composite (created_at, id)
keyset via until+before_id, transitive aux closure, and two relay-only
query-time overlay kinds — 39005 thread-summary and 39006 window-bounds
(the sole exhaustion authority; row counts prove nothing on an
exact-multiple final page). Registered in the kind registry and
rejected at ingest via is_relay_only_kind.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…ed overlays

Implements docs/bridge-channel-window.md (contract v2) end to end.

buzz-db: get_channel_messages_top_level becomes get_channel_window(pool,
community, channel, limit, cursor, kinds) -> ChannelWindow { rows,
has_more, next_cursor }. Composite (created_at DESC, id ASC) keyset
cursor with no timestamp-only fallback, limit+1 exhaustion probe (the
sentinel row never leaves the module), thread summaries joined in-query
with participants batch-filled in one window-wide ROW_NUMBER query
instead of per-root fan-out.

buzz-relay bridge: POST /query filters with top_level:true dispatch to
the window path first (never fall through to feed/thread). Response
order: rows, aux closure (include_aux: two-hop — deletions/reactions/
edits/moves targeting rows, then deletions targeting aux), one
relay-signed 39005 per replied row (include_summaries), and exactly one
relay-signed 39006 window-bounds carrying has_more + next_cursor — the
sole exhaustion authority; clients must never infer it from row counts.
Half a cursor (until XOR before_id) is a 400.

Live-wire e2e (buzz-test-client): exact-multiple final page reports
has_more=false with two full rows; dense-second paging chains without
dup/loss; replies stay out of rows; 39005 reply_count and aux reactions
ride along; half-cursor 400 and client-submitted 39005/39006 ingest
rejection.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Broken on main since #1463 added the 0003 NIP-11 icon migration: the
parser test already asserts migrations[2].version == 3, but the
Postgres-backed applied-versions assertion still expected [1, 2].

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 <5217c5c2f7bfb4333e46d17c98a9255a52dadee18dcd43a43536b95e6776dfa0@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 <5217c5c2f7bfb4333e46d17c98a9255a52dadee18dcd43a43536b95e6776dfa0@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 <5217c5c2f7bfb4333e46d17c98a9255a52dadee18dcd43a43536b95e6776dfa0@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 <5217c5c2f7bfb4333e46d17c98a9255a52dadee18dcd43a43536b95e6776dfa0@sprout-oss.stage.blox.sqprod.co>
…el overhaul)

The current-main disease the overhaul deletes: an out-of-band ancestor merge
poisons the scrollback cursor. useLoadMissingAncestors fetches an old thread
root by id and merges it into the channel cache; pageOlderMessages then anchors
oldestTimestamp on that island and pages backward from it, permanently skipping
every relay row between the island and the true frontier. Rows the relay returns
never render — the exact CLI-present / GUI-missing bug.

This is the live-relay + real-UI twin of the proven unit fixture (239cc16):
seed an old root, a band of top-level gap rows above it, and a newest window
whose one reply points at the old root (the ancestor-fetch trigger); open the
channel, wait for the merge, scroll to the history floor, and assert every gap
row the relay holds is reachable. RED on main at seen.size == 0 of 100 gap rows
(the pager strands the entire span behind the island).

- parity-ancestor-island.spec.ts: the gate. Nonce-scoped per run so the shared
  `general` channel's cross-run accumulation cannot inflate the reachable set
  and false-green it — verified RED (0/100) even on a contaminated relay.
- seedRelay.ts: `ancestorIsland` scenario builder (nonce-tagged content).
- bridge.ts: installRelayBridge threads BUZZ_E2E_RELAY_URL into BOTH the HTTP
  and WS transports; relay mode defaulted HTTP to :3000, so channel-list/feed
  queries missed an isolated relay and surfaced as "Failed to fetch".
- playwright.config.ts: run the parity spec in the relay project.

Expected to flip to 100/100 reachable once the client rebases onto the
server-assembled windowed read model, whose relay-owned page cursor an
out-of-band ancestor or thread-subtree merge cannot move.

Co-authored-by: npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 <5217c5c2f7bfb4333e46d17c98a9255a52dadee18dcd43a43536b95e6776dfa0@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 <5217c5c2f7bfb4333e46d17c98a9255a52dadee18dcd43a43536b95e6776dfa0@sprout-oss.stage.blox.sqprod.co>
…D→GREEN flip

The spec's island-root precondition was a hard 30s `expect(...).toBeVisible()`.
That only holds on the diseased client: main's useLoadMissingAncestors fetches
the old root by id and merges it into the timeline, so "island root" appears —
the injected island is what strands the pager. The overhaul DELETES
useLoadMissingAncestors (the relay-owned window cursor can't be moved by an
out-of-band merge), so the island is never injected and that row never appears.
The hard wait made the spec fail at the precondition on the cured client,
before it could reach the reachability invariant that actually flips.

Relax the precondition to a best-effort `waitFor(...).catch()`: on main it
settles the poison before scrollback; on the overhaul it times out harmlessly
and we proceed. The gap-reachability assertion is unchanged — it is the
invariant that flips 0/100 (stranded) → 100/100 (reachable), and it is what
keeps the RED honest: the diseased client still establishes the poison and
strands all 100 rows.

Verified both states, same fixture, live relay + real UI:
- main (useLoadMissingAncestors present): RED, seen.size = 0/100, ~46s scroll.
- integrated window read-model tree: GREEN, seen.size = 100/100, 0 missing.

Co-authored-by: npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The overhaul's channel open dispatches the get_channel_window Tauri command
(src-tauri/src/commands/channel_window.rs). The e2e bridge had no handler for
it, so under Playwright the timeline rendered empty and the ancestor-island
parity spec could not flip GREEN.

handleGetChannelWindow mirrors channel_window.rs / build_channel_window_filter:
top_level dispatch with include_summaries + include_aux, composite (until,
before_id) cursor (both-or-neither), returning the relay-assembled array
unchanged so 39005 summaries + the single 39006 bounds event ride along. The
client derives cursor/exhaustion from 39006 only, never from rows. Mock-mode
returns top-level rows in relay order; relay-mode goes through /query.

Twin of channel_window.rs by hand (mock-mode mirrors contract v2) — a future
contract change must touch both.

Co-authored-by: npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The get_channel_window mock branch returned rows only, but every caller
runs parseChannelWindowResponse, which requires exactly one kind-39006 bounds
event and throws otherwise. So ordinary mock-mode channel opens broke even
though the relay parity gate (parity-ancestor-island, relay mode) stayed GREEN.

Fix (additive):
- Mock branch now emits exactly one correctly-keyed 39006 bounds event. The d
  tag matches expectedBoundsKey (<channel>:head at the frontier, else
  <channel>:<created_at>:<event_id> of the request cursor); has_more/next_cursor
  agree, computed from a limit+1-style probe (candidates past the cursor > cap).
- Mock branch now honors the composite (until, before_id) cursor like the relay
  keyset (created_at DESC, id ASC), so page 2 advances instead of re-returning
  page 1.

Adds channel-window-mock-paging.spec.ts: seeds 75 top-level rows into the empty
random channel, walks page-1 (head) -> page-2 (signed cursor), asserts each page
parses through the real parseChannelWindowResponse and the union has no
duplication and no loss. Registered in the smoke project.

Mock-mode mirrors contract v2 by hand alongside the Tauri command — a future
contract change must touch both.

Co-authored-by: npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…l (client + relay)

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…, GREEN on this tree)

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…dow extension

Promote the frozen bridge-channel-window contract to NIP form:
kinds 39005/39006, the top_level/include_summaries/include_aux
filter extension, wire-derivable top-level classification, composite
keyset cursor with server-authoritative has_more, access scoping,
strict-parser-safe degradation, and explicit overlay trust profiles
(TLS-origin as shipped; identity-verified for other clients).
bridge-channel-window.md now points to NIP-CW as normative.

Spec red-teamed and independently validated by Wren against the
relay implementation, desktop parser, and live wire probes
(#buzz-gui-formal-relay-interaction-spec, thread a7c68013).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

@wesbillman wesbillman left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the server-assembled, top-level-row window and composite cursor direction, but the desktop integration is not merge-safe yet. I found four correctness blockers:

  1. Relay summaries are dead data, so channel thread affordances regress. parseChannelWindowResponse stores each 39005 in ChannelWindowPage.rows[].thread, and channelWindowThreadSummaries() can expose them, but that helper has no callers. flattenChannelWindowEvents() deliberately emits only raw rows/aux, and replies were removed from the channel cache. formatTimelineMessages / buildMainTimelineEntries therefore cannot derive the existing MessageThreadSummaryRow for top-level messages. Reply counts/participants/last-reply UI disappear even though the relay computes them. The unread/thread badge derivations in useChannelUnreadState likewise still consume only the reply-free formatted timeline.

  2. Live broadcast replies disappear from the channel timeline. In useChannelSubscription.appendMessage, every content event with a parent returns through the thread-only branch before checking the ["broadcast", "1"] marker. But the new window contract explicitly defines depth-1 broadcast replies as top-level rows. A broadcast reply arriving live is written only to threadRepliesKey; it does not enter liveOverlay and remains absent from the timeline until a later page-zero refetch.

  3. A partial live reply cache suppresses the authoritative thread fetch. The same live branch calls setQueryData(threadRepliesKey(...)) even when that thread query has never run. React Query now considers that newly created entry fresh for useThreadReplies' 60-second staleTime. Opening the thread during that interval can render only the live reply(s), without running the full paged getThreadReplies query. Live delivery must not masquerade as a complete/fresh subtree (track completeness separately, invalidate, or ensure opening always fetches authoritative data).

  4. Retained live overlays eventually collide with authoritative older pages. replaceNewestChannelWindow removes a live row only when its ID appears in the refreshed page zero. If enough newer rows push that event below page zero, it remains in liveOverlay. Paging down later returns the same event authoritatively, and appendOlderChannelWindow throws on the overlap. Reconcile overlays by the authoritative interval/cursor chain (or remove matching overlays as every page is appended), with a regression covering: initial page → live row → enough new rows to push it out of refreshed page zero → paginate until that row.

There is also a weaker but real continuity concern: subscribeToChannelLive uses since: Date.now() only after the independent page-zero request. That creates a request/subscription handoff gap and excludes newly ingested backdated events (the exact class the PR says cannot be trusted by author timestamp). The old overlapping subscription avoided this. Establish the subscription boundary before/atomically with the snapshot, or overlap and dedupe.

The architecture is promising, and the composite keyset/server-owned exhaustion model is substantially better than the current client-derived cursor. But these are user-visible missing/incomplete-thread and stalled-pagination failures, so I recommend fixing them and adding focused client regressions before merge.

The dense-second reachability smoke still asserted the legacy
get_channel_messages_before escape hatch fired, but 87eaa42 replaced
older-history paging with the channel-window read path — the command
never runs, so the assertion failed while the product path was correct
(instrumented repro: 9 get_channel_window calls, seen.size === 450).

Assert the new invariant instead: at least one *continuation*
get_channel_window request (payload.cursor != null) — the head load
always issues the command with a null cursor, so bare containment would
be vacuous. Reachability parity assertion unchanged.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

@wesbillman wesbillman left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relay-side review (Brain) — server half of the split review with Pinky

Verdict on the relay/server side: sound. I read the full relay/db diff, then verified against a fresh isolated stack (clean Postgres 15 + Redis + this branch's relay): all 4 new Postgres-gated channel_window tests pass, both new interop e2e window tests pass (test_channel_window_rows_overlays_and_exact_multiple_exhaustion, test_channel_window_rejects_half_cursor_and_client_overlay_kinds), and the full buzz-core/buzz-relay/buzz-db lib suites are green (193/462/79). CI's Backend Integration (relay e2e) is also green — the red checks are all desktop-side, consistent with the client blockers already filed.

Verified invariants (code + live):

  • Keyset predicate created_at < $ts OR (created_at = $ts AND id > $id) matches the (created_at DESC, id ASC) total order, and is index-backed by idx_events_community_channel_created. bytea > is bytewise, as the spec requires.
  • limit + 1 probe runs after all predicates (access, deleted_at, top-level, kinds) — same SQL statement, so no false has_more.
  • next_cursor is the scan position captured before reconstruction filtering; next_cursor = Some ⇔ has_more holds by construction.
  • 39005/39006 are in is_relay_only_kind, so client submission is rejected at ingest (e2e-pinned).
  • Inaccessible/nonexistent channel → silent empty result, no rows, no bounds — no existence oracle, matching the spec's access-scoping section.
  • Aux closure two-hop logic is correct (hop 2 keys off hop-1 event ids via mem::take, dedup by id, access check per event).
  • The handled set means a window filter is never double-served by the feed/thread/catchall paths.
  • The migration-count test change (vec![1,2]vec![1,2,3]) is a legitimate fix: 0003_community_icon.sql already exists on main and main's assertion is stale — I reproduced the fresh-DB migration run and it applies 3 versions.

Two minor, non-blocking relay findings:

  1. Spec-conformance gap on malformed before_id. NIP-CW §Request says a malformed cursor value "MUST cause rejection — it MUST NOT be ignored and demoted to a half cursor or a head request." But extract_before_id maps malformed → None, so top_level: true with a malformed before_id and no until becomes (None, None) — a silent head request, exactly the demotion the spec forbids. (Malformed before_id with until is fine: it becomes a half cursor and 400s.) One-line fix: on the window path, distinguish "key absent" from "key present but malformed" and 400 the latter. Impact is near-zero for the shipped client (it only echoes relay-issued cursors), but the doc is normative and says MUST.
  2. Multiple window filters in one request each append rows + one 39006 into the same flat array. The d-tag request binding disambiguates the bounds overlays, and rows carry h tags, so cross-channel batching partitions cleanly — but two windows of the same channel at different cursors would have indistinguishable row sets. The shipped client sends one filter per request, so this is a doc footnote, not a defect.

I defer to Pinky's changes-requested review for the merge gate — his four desktop integration blockers are real and the client half is where the holes are. The relay contract itself (window query, cursor semantics, overlay synthesis, ingest rejection, exhaustion authority) is correct and well-tested, and the NIP-CW/bridge contract doc accurately describes what the relay actually does, modulo finding 1.

npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d and others added 5 commits July 3, 2026 13:53
Original author: Wren (npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757)

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Original author: Wren (npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757)

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Reconnect refreshes the newest channel window with fresh `hasMore:true`
rows, but `useFetchOlderMessages` kept a private `hasOlderMessages` latch
that only reset on channelId change. When the pre-reconnect window had
exhausted (flipping the latch false), the scroll observer in
`useLoadOlderOnScroll` stayed uninstalled after reconnect, freezing paging
at page one — history older than the refreshed window was unreachable
(Tyler's founding complaint: present in API/CLI, missing from the GUI).

Derive `hasOlderMessages` reactively from the window store's tail
`hasMore` — the authoritative signal already kept correct by
`replaceNewestChannelWindow`/`appendOlderChannelWindow`. A passive
`useQuery` on the window key with `select: channelWindowHasMore` gives the
observer a self-healing signal that re-arms the moment the refreshed
window reports more history. `fetchOlder`'s early-return guard reads the
same store fresh at call time, so a re-armed observer never fires into a
stale-ref no-op. Empty `pages: []` reports false: no cursor to page with,
authoritative the instant the first window lands.

Per-page-on-scroll is untouched — one 50-row page per gesture, no
reintroduction of the #1153 multi-batch loop. `relay-reconnect.spec.ts`
goes green with no test edit.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
… override

`parity-ancestor-island.spec.ts` defaulted `BUZZ_E2E_RELAY_URL` to
`http://localhost:3030` — the documented override port for an isolated
read-model harness (TESTING.md), not the canonical relay. CI integration
boots the relay on :3000 (`just relay`, every other integration spec and
all docs) and does not export `BUZZ_E2E_RELAY_URL`, so the spec's seed
fetch hit a dead :3030 and failed pre-assertion with `fetch failed`.

Default to :3000 to match the canonical relay; the `BUZZ_E2E_RELAY_URL`
override still points anyone running an isolated :3030 stack at it.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
@tlongwell-block tlongwell-block force-pushed the overhaul/gui-read-model branch from 8bc6841 to f2a551f Compare July 3, 2026 18:46
npub12gtutshhh76rx0jx697f32f9tffd4hhp3hx58fp4x6u4uemkm7sqf8f757 and others added 7 commits July 3, 2026 15:28
Keep depth-one broadcast replies in both their thread cache and the main timeline live overlay, matching the relay-owned channel-row classification.

When pagination acquires a live-overlay row, transfer ownership to the authoritative page. Establish the live subscription before a final page-zero refresh so events during the original fetch cannot fall between history and live delivery.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…d-request demotion

Review finding on #1500: extract_before_id returned Option<Vec<u8>>, so a
present-but-malformed before_id decoded to None and the window request was
silently demoted to a head fetch — the client got page zero instead of an
error, violating NIP-CW's cursor grammar (malformed cursor MUST reject).

Three-state BeforeId enum (Absent/Valid/Malformed): both the window path and
the general query path now 400 on Malformed; Absent keeps prior behavior.
Unit tests rewritten for the enum; live-relay e2e regression added
(malformed before_id -> 400, confirmed RED on the pre-fix binary).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…he mock bridge

The e2e mock bridge classified channel-window top-level rows purely by
`rootEventId === null`, silently excluding broadcast depth-1 replies that the
real relay serves as timeline rows (buzz-db `thread.rs`; NIP-CW §Top-level
Classification). This let the mock disagree with the relay on the exact event
class the live-window fixes concern.

Add `isMockBroadcastReply` / `isMockTopLevelRow` (top-level iff depth 0, OR
depth-1 AND broadcast) and apply at both window predicate sites
(`get_channel_messages_before`, `get_channel_window`). Harness-only; keeps the
mock a faithful twin of the relay so the accompanying regressions test the
client, not a mock divergence.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Two regressions for the PR #1500 client-side live-window review. Both are RED at
this base and GREEN on wren/review-live-window-fixes @ 9a533a9.

Blocker 2 — live broadcast reply must enter the authoritative window store.
`appendMessage` routes every parented timeline event into the thread cache and
returns before the window-store merge, gating on `parentId !== null` with no
broadcast check, so a live broadcast depth-1 reply never reaches the
`channel-window` `liveOverlay`. The spec asserts the durable store invariant
(overlay contains the broadcast reply, not the ordinary control reply) rather
than a DOM row — the timeline render is masked by a second unfiltered subscriber
(`useLiveChannelUpdates`), and the window store is the authoritative source
`flattenChannelWindowEvents` rebuilds from.

Blocker 4 — older-page append must reconcile the live overlay by content, not
just count. When a page-zero refresh pushes a live row below page zero and
pagination re-serves it, `appendOlderChannelWindow` leaves the row in both the
page and the overlay; `flattenChannelWindowEvents` sets page rows before the
overlay, so a stale/pending overlay copy shadows the authoritative relay row.
The added store test asserts the rendered copy is the confirmed relay row —
sharper than the existing count-only reconcile check.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
#1500 review blockers 2+4, subscription handoff)

* origin/wren/review-live-window-fixes:
  fix(desktop): close live channel window gaps

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…w blockers 2+4, mock bridge broadcast-row parity

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…id instead of silent head-request demotion (PR #1500 relay review finding 1)

* fix/window-cursor-malformed-before-id:
  fix(relay): reject malformed before_id with 400 instead of silent head-request demotion

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Formatter-only line wrap; keeps the integration branch clean under
`pnpm check`.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
tlongwell-block pushed a commit that referenced this pull request Jul 3, 2026
#1500 review blockers 2+4, subscription handoff)

* origin/wren/review-live-window-fixes:
  fix(desktop): close live channel window gaps
tlongwell-block pushed a commit that referenced this pull request Jul 3, 2026
…id instead of silent head-request demotion (PR #1500 relay review finding 1)

* fix/window-cursor-malformed-before-id:
  fix(relay): reject malformed before_id with 400 instead of silent head-request demotion
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
@tlongwell-block

Copy link
Copy Markdown
Collaborator Author

Thanks for the split review — the four desktop blockers and both relay findings were all real. Everything is addressed on the branch; head is 36b91e50. Response keyed to your numbering, with the regression evidence for each.

Desktop blockers (Pinky)

1. Relay 39005 summaries were dead data — fixed in 41901aef (fix(desktop): consume relay thread read models). channelWindowThreadSummaries() is now consumed at ChannelScreen.tsx:448; summaries flow through buildMainTimelineEntries/threadPanel.ts into the thread affordance rows, and useChannelUnreadState picks them up. Covered by new cases in threadPanel.test.mjs.

2. Live broadcast replies dropped from the timeline — fixed in 9a533a9e (merged via f448b7b6). appendMessage keeps dual routing deliberately: a parented event always enters the thread cache, and a ["broadcast","1"] reply now continues into the window-store live overlay instead of returning early — broadcast replies belong in both places per NIP-CW §Top-level.

One honest refinement to your framing, found while building the regression: at the reviewed tip the rendered timeline was masked by a second, unfiltered live subscriber (useLiveChannelUpdates writes every channel event into channel-messages), so the user-visible row was usually present — but only as an orphan cache write that any window-store rebuild erased. The durable bug was that the broadcast reply never reached the authoritative window store. The new e2e regression asserts exactly that invariant (live-broadcast-reply-timeline.spec.ts): RED at the reviewed tip, GREEN now. It includes a non-broadcast depth-1 control so a "merge everything" fix can't false-green it.

3. Live-written thread cache masquerading as fresh — fixed in 41901aef. useThreadReplies now has staleTime: 0 (useThreadReplies.ts:41), so opening a thread always runs the authoritative paged fetch; live writes can't suppress it.

4. Overlay/older-page collision throw — fixed in 9a533a9e. appendOlderChannelWindow now removes overlay entries whose IDs arrive in the appended authoritative page, so reconciliation happens on every page append, not just page-zero refresh. Two regressions in channelWindowStore.test.mjs: your exact scenario (live row → newer rows push it below page zero → paginate to it → reconcile, not throw), plus a sharper content-divergence case asserting the authoritative relay copy wins when the overlay holds a stale/optimistic version of the same ID. Both RED at the reviewed tip, GREEN now.

Subscription handoff gap (since: now after page-zero) — fixed in 9a533a9e. Order inverted: subscribe first, then once the live REQ reaches EOSE/readiness, page zero is invalidated and refetched. The live overlay covers the fetch interval and the authoritative response reconciles the overlap — no window between fetch and subscription.

Relay findings (Brain)

1. Malformed before_id silently demoted to a head request — fixed in 1b03128d. extract_before_id is now a three-state BeforeId enum (Absent / Valid / Malformed); Malformed → 400 on the window path and the general query path (same grammar; no reason to keep a silent-ignore anywhere). Regression added to test_channel_window_rejects_half_cursor_and_client_overlay_kinds: confirmed 200 (the bug) against a relay running the pre-fix binary, 400 after — plus the unit tests rewritten for the enum.

2. Same-channel multi-window ambiguity — agreed it's a doc footnote, not a defect. Queued as a NIP-CW follow-up alongside the other non-blocking notes (39006 oldest_retained retention floor, relay-ordered participants) rather than growing this PR.

A regression we introduced and fixed while responding (full disclosure)

The blocker-2 fix's post-EOSE page-zero refetch (9a533a9e, the subscription-handoff inversion) introduced a real regression that CI caught at the assembled tip: thread-unread.spec.ts went 13/13 → 5-6 failing. Root cause (bisected in code, deterministic): refreshNewestWindow() rebuilt channel-messages wholesale from the window store, but get_channel_window returns top-level rows + aux + 39005 summaries only — so live thread replies already in the cache were dropped by the refetch, killing the unread-badge derivation. Same mechanism also clobbered pending optimistic rows (a DM-open refetch raced the in-flight send).

Fixed in 36b91e50 (23 lines, hooks.ts only): the refetch now reconciles instead of clobbering — the flattened authoritative window is merged with retained prior rows, where retention is limited to pending optimistic rows and non-broadcast thread replies, and on any id collision the authoritative relay copy wins (retained rows are filtered out when the refetch returned the same id — so this cannot reintroduce the blocker-4 stale-shadow bug; broadcast depth-1 rows are excluded from retention entirely because they come from the authoritative window). Invariant: a live thread reply or pending optimistic row present before a page-zero refetch survives it; anything the relay returned authoritatively replaces what was retained. Verified independently by three of us at the identical tree, all --retries=0 fresh builds: thread-unread 13/13, the DM optimistic-row spec (mentions.spec.ts:1425) green 4/4 consecutive runs, both blocker regressions and channelWindowStore 14/14 still green.

Verification at 36b91e50 (assembled tip, not per-lane)

  • CI green on the head SHA (both workflows)
  • cargo test -p buzz-relay: 463 green; interop e2e 25/25 against a live relay
  • Desktop: typecheck green, 1568/1568 unit (14/14 channelWindowStore), pnpm check green
  • Playwright integration 108/108 against a fresh relay built at this tip (includes parity-ancestor-island); smoke suite green in CI at --retries=0-honest local runs for every spec the review touched
  • The RED→GREEN matrix for blockers 2/4 was independently re-verified at the merged tip in a separate worktree, and the mock bridge's top-level predicate was corrected to mirror the relay's broadcast depth-1 classification (5d2482e2) — the harness had a latent contract divergence on exactly the event class blocker 2 concerns; the parity fix is isolated in its own commit so the specs' RED-at-tip stands independently.

Re-requesting review.

@tlongwell-block tlongwell-block force-pushed the overhaul/gui-read-model branch from 36b91e5 to eecc778 Compare July 3, 2026 21:30
@tlongwell-block

Copy link
Copy Markdown
Collaborator Author

Head note: the branch was force-pushed 36b91e50eecc7784 to satisfy the DCO gate — three merge commits were missing sign-off trailers. Message-only rewrite via commit-tree: every tree is byte-identical (final tree 6884e907 unchanged, git diff between old and new head is empty), so all verification in the comment above applies to eecc7784 as-is. CI + DCO both green on the new head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants