Fix erratic queue behavior in party play view #487
Conversation
Claude Review: packages/web/app/components/graphql-queue/QueueContext.tsx
Claude Review: No changes to review. This PR has 0 file changes - the fix commit was reverted by a subsequent commit. The PR description mentions investigating further. Consider closing this PR if the approach isn't viable, or force-push a new implementation when ready.
Claude Review: Issues found:
- Performance Issue - QueueContext.tsx:259-276: The cleanup effect creates new timeouts for ALL pending updates every time the array changes. This means rapidly navigating through 50 items creates and clears 1+2+3+...+50 = 1275 timeouts instead of 50. Each pending update should have its timeout set only once.
- Potential Memory Leak - QueueContext.tsx:264-270: The setTimeout callback captures the uuid by closure, but the dispatch function may be stale if the component re-renders. While dispatch from useReducer is stable, this pattern is fragile if refactored.
- Logic Issue - reducer.ts:166-168: The deduplication check returns early without updating pendingCurrentClimbUpdates for local updates. If a user rapidly clicks the same climb, it will not be tracked as pending, but the server echo will still arrive and may cause issues since it is not in the pending list.
Claude Review:
- packages/web/app/components/graphql-queue/QueueContext.tsx:727 - Missing explicit
- packages/web/app/components/graphql-queue/QueueContext.tsx:294-297 - The cleanup function in useEffect runs on every dependency change, not just unmount. This could cause issues if the component re-renders while timeouts are still active: timeouts get cleared prematurely before the new effect runs.
- packages/web/app/components/queue-control/reducer.ts:183 - Potential memory issue: using
Claude Review: Minor Issues:
No critical bugs or security issues found. The implementation correctly handles the race condition between optimistic updates and server echoes.
Claude Review: Issues Found
Claude Review: Issues Found
When navigating rapidly in party mode, server echoes of optimistic updates could overwrite newer local state, causing the queue to jump erratically. The fix tracks pending optimistic updates and skips server events that match them. This ensures that when a user swipes quickly through the queue, each navigation is preserved and not reverted by delayed server responses.

- Add pendingCurrentClimbUpdatesRef to track UUIDs of pending updates
- Skip CurrentClimbChanged events that match pending updates
- Clear pending updates on FullSync (full state reset)
- Add a timeout to clean up stale pending entries (5 seconds)
- Remove pending entries on error to prevent future updates from being blocked
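A minimal sketch of the ref-based echo suppression this commit describes. The hook name, the Set-based storage, and the helper functions are assumptions for illustration; only pendingCurrentClimbUpdatesRef, CurrentClimbChanged, and the 5-second cleanup come from the commit itself.

```typescript
import { useCallback, useRef } from 'react';

export function usePendingCurrentClimbUpdates() {
  // UUIDs we optimistically applied locally and for which a server echo is still expected.
  const pendingCurrentClimbUpdatesRef = useRef<Set<string>>(new Set());

  const markPending = useCallback((uuid: string) => {
    pendingCurrentClimbUpdatesRef.current.add(uuid);
    // Stale entries are dropped after 5 seconds so a lost echo cannot block future updates.
    setTimeout(() => pendingCurrentClimbUpdatesRef.current.delete(uuid), 5000);
  }, []);

  // Called from the subscription handler for CurrentClimbChanged events.
  const shouldSkipServerEvent = useCallback((uuid: string) => {
    if (pendingCurrentClimbUpdatesRef.current.has(uuid)) {
      pendingCurrentClimbUpdatesRef.current.delete(uuid); // consume the echo once
      return true;
    }
    return false;
  }, []);

  // Full sync resets everything, so pending entries are cleared as well.
  const clearAll = useCallback(() => pendingCurrentClimbUpdatesRef.current.clear(), []);

  return { markPending, shouldSkipServerEvent, clearAll };
}
```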
The previous fix for queue navigation race conditions was causing the party connection to drop on each navigation. Reverting to investigate further.
When navigating rapidly in party mode, server echoes of optimistic updates could overwrite newer local state, causing the queue to jump erratically. This fix uses the reducer to track pending local updates instead of a ref, which is safer and avoids potential issues with modifying state during subscription callbacks.

Changes:
- Add pendingCurrentClimbUpdates array to QueueState
- Add isServerEvent flag to DELTA_UPDATE_CURRENT_CLIMB action
- Reducer tracks local updates as pending and skips server echoes that match
- Pending list is bounded to the last 10 entries to prevent unbounded growth
- Clear pending updates on INITIAL_QUEUE_DATA (full sync)
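A sketch of the reducer-tracked pending list this commit describes. The state and action shapes are simplified assumptions, not the repository's actual QueueState or action types; the flow (local updates become pending, matching server echoes are skipped, full sync clears the list) follows the commit.

```typescript
interface QueueState {
  currentClimbUuid: string | null;
  pendingCurrentClimbUpdates: string[];
}

type QueueAction =
  | { type: 'DELTA_UPDATE_CURRENT_CLIMB'; payload: { uuid: string; isServerEvent: boolean } }
  | { type: 'INITIAL_QUEUE_DATA'; payload: { currentClimbUuid: string | null } };

function queueReducer(state: QueueState, action: QueueAction): QueueState {
  switch (action.type) {
    case 'DELTA_UPDATE_CURRENT_CLIMB': {
      const { uuid, isServerEvent } = action.payload;
      if (isServerEvent) {
        // Server echo of our own optimistic update: consume the pending entry and skip it.
        if (state.pendingCurrentClimbUpdates.includes(uuid)) {
          return {
            ...state,
            pendingCurrentClimbUpdates: state.pendingCurrentClimbUpdates.filter((u) => u !== uuid),
          };
        }
        // Genuine remote update: apply it.
        return { ...state, currentClimbUuid: uuid };
      }
      // Local update: apply it and record it as pending, bounded to the last 10 entries.
      return {
        currentClimbUuid: uuid,
        pendingCurrentClimbUpdates: [...state.pendingCurrentClimbUpdates, uuid].slice(-10),
      };
    }
    case 'INITIAL_QUEUE_DATA':
      // Full sync resets the pending list entirely.
      return { currentClimbUuid: action.payload.currentClimbUuid, pendingCurrentClimbUpdates: [] };
    default:
      return state;
  }
}
```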
This commit addresses critical issues in the pending updates mechanism and adds comprehensive test coverage.

Fixes:
- Add missing 5-second timeout cleanup for stale pending updates
- Add error handling to remove pending updates on failed server requests
- Increase pending array limit from 10 to 50 items for rapid navigation
- Fix logic order: check pending updates before deduplication
- Fix test initialization to include pendingCurrentClimbUpdates

Changes:
- Add CLEANUP_PENDING_UPDATE action type and reducer handler
- Add useEffect in QueueContext for timeout-based cleanup
- Add error handling in setCurrentClimbQueueItem
- Fix order of checks in DELTA_UPDATE_CURRENT_CLIMB reducer

Tests Added:
- 13 new unit tests for server event handling and cleanup
- 6 new integration tests covering rapid navigation and edge cases
- All 79 queue-control tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
CRITICAL FIX: The previous timeout cleanup implementation caused cascading re-render storms leading to component crashes and websocket reconnection loops.

Problem:
- useEffect re-ran on EVERY pendingCurrentClimbUpdates change
- Cleared ALL timeouts and recreated them for ALL items
- Rapid clicks created exponential effect re-runs
- Component crashes → websocket reconnections → multiple connections

Solution:
- Use a useRef Map to track timeouts per UUID
- Only create timeouts for NEW items (not already tracked)
- Only clear timeouts for items removed from pending
- Remove dispatch from the dependency array (it's stable)

Impact:
- Eliminates re-render storm during rapid navigation
- Prevents component crashes and websocket reconnection loops
- Maintains 5-second timeout cleanup for stale pending updates
- All 79 tests still passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
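A sketch of the per-UUID timeout tracking this commit describes. The hook name, the CLEANUP_PENDING_UPDATE action shape, and the prop names are assumptions; the key ideas (a useRef Map of timeout handles, timeouts created only for newly pending items, dispatch omitted from the dependency array) come from the commit.

```typescript
import { useEffect, useRef } from 'react';
import type { Dispatch } from 'react';

type CleanupAction = { type: 'CLEANUP_PENDING_UPDATE'; payload: { uuid: string } };

function usePendingUpdateTimeouts(
  pendingCurrentClimbUpdates: string[],
  dispatch: Dispatch<CleanupAction>,
) {
  // Maps pending UUID -> timeout handle, persisted across renders.
  const timeoutsRef = useRef(new Map<string, ReturnType<typeof setTimeout>>());

  useEffect(() => {
    const timeouts = timeoutsRef.current;

    // Create timeouts only for UUIDs that are not already tracked.
    for (const uuid of pendingCurrentClimbUpdates) {
      if (!timeouts.has(uuid)) {
        timeouts.set(
          uuid,
          setTimeout(() => {
            timeouts.delete(uuid);
            dispatch({ type: 'CLEANUP_PENDING_UPDATE', payload: { uuid } });
          }, 5000),
        );
      }
    }

    // Clear timeouts only for UUIDs that have left the pending list.
    for (const [uuid, handle] of timeouts) {
      if (!pendingCurrentClimbUpdates.includes(uuid)) {
        clearTimeout(handle);
        timeouts.delete(uuid);
      }
    }
    // dispatch from useReducer is stable, so it is intentionally omitted here.
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [pendingCurrentClimbUpdates]);
}
```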
Replace complex timeout-based cleanup with simpler timestamp-based cleanup directly in the reducer. This eliminates ~40 lines of useEffect complexity and prevents potential re-render storms.
Changes:
- Change pendingCurrentClimbUpdates from string[] to Array<{ uuid: string; addedAt: number }>
- Add automatic stale entry filtering (>5 seconds) in DELTA_UPDATE_CURRENT_CLIMB reducer
- Remove entire timeout useEffect and useRef Map from QueueContext.tsx
- Update CLEANUP_PENDING_UPDATE to work with timestamp objects
- Update all test assertions to work with timestamp objects
- Add test for timestamp-based stale cleanup
Benefits:
- Pure reducer logic with no side effects
- Deterministic cleanup on every state transition
- No crash loops from effect dependencies
- Easier to test and maintain
- Same 5-second cleanup behavior, simpler implementation
All 80 tests passing ✓
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
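A sketch of the timestamp-based cleanup described in the commit above: stale entries are filtered inside the reducer on every transition, so no timers or effects are needed. PENDING_TTL_MS, the handler name, and the state shapes are assumptions for illustration.

```typescript
const PENDING_TTL_MS = 5000;

interface PendingUpdate {
  uuid: string;
  addedAt: number;
}

interface QueueState {
  currentClimbUuid: string | null;
  pendingCurrentClimbUpdates: PendingUpdate[];
}

function handleDeltaUpdateCurrentClimb(
  state: QueueState,
  uuid: string,
  isServerEvent: boolean,
  now: number = Date.now(),
): QueueState {
  // Entries older than 5 seconds are dropped on every state transition.
  const pending = state.pendingCurrentClimbUpdates.filter((p) => now - p.addedAt <= PENDING_TTL_MS);

  if (isServerEvent) {
    const echoIndex = pending.findIndex((p) => p.uuid === uuid);
    if (echoIndex !== -1) {
      // Our own echo: consume the pending entry and keep local state as-is.
      return { ...state, pendingCurrentClimbUpdates: pending.filter((_, i) => i !== echoIndex) };
    }
    return { ...state, currentClimbUuid: uuid, pendingCurrentClimbUpdates: pending };
  }

  // Local update: apply it and record it with a timestamp, bounded to the last 50 entries.
  return {
    currentClimbUuid: uuid,
    pendingCurrentClimbUpdates: [...pending, { uuid, addedAt: now }].slice(-50),
  };
}
```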
Replace UUID-based echo detection with client ID matching to properly distinguish between own updates and updates from other users. This fixes the issue where other users' queue changes caused weird jumps in the play view.

**Problem:**
- Previously used UUID matching to detect server echoes
- If User A navigated to climb-2 (pending) and then User B also navigated to climb-2, User A would skip User B's update because climb-2 was in the pending list
- This caused "weird jumps" when other clients changed the queue state

**Solution:**
- Add clientId to the CurrentClimbChanged event payload (schema, backend, subscription)
- Primary echo detection: compare eventClientId === myClientId
- Fall back to the pending list for backward compatibility
- Updates from other clients are now properly applied

**Changes:**
- Schema: Added clientId: ID! to the CurrentClimbChanged type
- Backend: Include ctx.connectionId as clientId when publishing events
- Subscription: Query the clientId field in the CurrentClimbChanged fragment
- Reducer: Use clientId comparison as the primary echo detection method
- QueueContext: Pass eventClientId and myClientId in dispatch
- Tests: Added 4 new tests for clientId-based detection scenarios

**Test Coverage:**
- Own echo skipping (clientId match)
- Other client updates applied (clientId mismatch)
- Same climb from different clients handled correctly
- Fallback to pending list when clientIds unavailable

All 84 tests passing ✓ (+4 new tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
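A sketch of clientId-based echo detection as described in the commit above. The payload shape and helper name are assumptions, not the repository's actual reducer code; the decision order (clientId comparison first, pending-list fallback second) follows the commit.

```typescript
interface CurrentClimbChangedPayload {
  uuid: string | null;
  eventClientId?: string; // clientId published by the backend (ctx.connectionId)
  myClientId?: string;    // this client's own subscription connection id
}

function isOwnEcho(payload: CurrentClimbChangedPayload, pendingUuids: string[]): boolean {
  const { uuid, eventClientId, myClientId } = payload;

  // Primary detection: the server tells us which client originated the event,
  // so updates from other clients are never mistaken for our own echoes.
  if (eventClientId && myClientId) {
    return eventClientId === myClientId;
  }

  // Fallback for events without clientIds: match against the pending list.
  return uuid !== null && pendingUuids.includes(uuid);
}
```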
Implements a comprehensive reliability system for party mode queue synchronization based on industry patterns from Kafka, Spotify playlists, and database replication.

Phase 1 - Sequence Numbers & State Hashes:
- Add a sequence number to all queue events for gap detection
- Add a state hash (FNV-1a) for periodic drift verification
- Client validates sequence continuity and logs gaps
- Periodic hash verification every 60 seconds

Phase 2 - Delta Sync (Event Replay):
- Store the last 100 events per session in Redis (5-min TTL)
- Add an eventsReplay GraphQL query for efficient recovery
- Client attempts delta sync on reconnection for gaps ≤ 100
- Falls back to full sync for larger gaps or errors

Phase 3 - Bug Fix:
- Fix the pendingTimestamps Map in QueueContext (it was recreated on every render)
- Now uses useRef for proper persistence across renders

This builds on the previous echo detection work to provide:
- Gap detection during normal operation
- Efficient recovery from brief disconnections
- State verification to detect any drift
- Graceful degradation to full sync when needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
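A sketch of a 32-bit FNV-1a state hash like the one mentioned in Phase 1. How the queue state is serialized before hashing is an assumption; the real implementation may canonicalize differently, and both sides must hash identical input for drift verification to work.

```typescript
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, with 32-bit overflow semantics
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Client and server periodically hash a canonical serialization of the queue and
// compare the results to detect drift without shipping the full state.
function computeQueueStateHash(queueUuids: string[], currentClimbUuid: string | null): string {
  const canonical = JSON.stringify({ queue: queueUuids, current: currentClimbUuid });
  return fnv1a32(canonical).toString(16);
}
```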
…eview
- Remove ClientQueueEvent type - use unified QueueEvent everywhere
- Remove GraphQL field aliases (addedItem→item, currentItem→item)
- Fix stale closure in sequence gap detection using a ref
- Add batch cleanup action to prevent render thrashing
- Fix updateQueueState calls with missing sequence/stateHash params
- Add auto-resync on hash mismatch detection
- Keep state hash in sync after delta events
- Add sequence column to Postgres (migration 0027)
- Add exponential backoff for session restoration lock
- Reduce cleanup timing from 10s/5s to 5s/2s
- Add proper error logging for event buffer failures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
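A sketch of ref-based sequence gap detection as described in the commit above, avoiding the stale-closure problem by storing the last applied sequence in a ref. The hook name, the resync callbacks, and the 100-event threshold placement are assumptions for illustration.

```typescript
import { useCallback, useRef } from 'react';

export function useSequenceGapDetection(
  requestDeltaSync: (fromSequence: number) => void,
  requestFullSync: () => void,
) {
  // Last sequence number successfully applied; a ref so the subscription callback
  // always reads the latest value instead of a stale closure.
  const lastSequenceRef = useRef<number | null>(null);

  const onEvent = useCallback(
    (sequence: number) => {
      const last = lastSequenceRef.current;
      if (last !== null && sequence > last + 1) {
        const gap = sequence - last - 1;
        // Small gaps can be replayed from the event buffer; large gaps fall back to full sync.
        if (gap <= 100) {
          requestDeltaSync(last + 1);
        } else {
          requestFullSync();
        }
      }
      lastSequenceRef.current = sequence;
    },
    [requestDeltaSync, requestFullSync],
  );

  return onEvent;
}
```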
Force-pushed from 25c93fe to 05bb8ef
Claude Review: Issues Found
GraphQL doesn't allow the same field name with different nullability in a union type. QueueItemAdded.item is non-null while CurrentClimbChanged.item is nullable, causing subscription errors.

Solution:
- Restore field aliases in GraphQL operations (addedItem: item, currentItem: item)
- Add a SubscriptionQueueEvent type for client-side use with aliased field names
- Add a transformToSubscriptionEvent function to convert eventsReplay responses
- Update all client-side handlers to use SubscriptionQueueEvent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
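A sketch of the aliased selection this commit describes. The subscription name, argument, and selected fields are assumptions; only the addedItem/currentItem aliases and the clientId/sequence fields come from the commits above. Aliasing `item` differently per union member avoids the merge conflict between the non-null QueueItemAdded.item and the nullable CurrentClimbChanged.item.

```typescript
import gql from 'graphql-tag';

export const QUEUE_EVENTS_SUBSCRIPTION = gql`
  subscription QueueEvents($sessionId: ID!) {
    queueEvents(sessionId: $sessionId) {
      ... on QueueItemAdded {
        sequence
        addedItem: item {
          uuid
        }
      }
      ... on CurrentClimbChanged {
        sequence
        clientId
        currentItem: item {
          uuid
        }
      }
    }
  }
`;
```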
Claude Review: Issues Found:
Minor:
Claude Review: Issues Found
Summary
Comprehensive fix for party mode queue reliability issues, addressing erratic navigation behavior and implementing robust synchronization.
Problem
When navigating rapidly in party mode, server echoes of optimistic updates could overwrite newer local state, causing the queue to jump erratically. Additionally, reconnection after brief disconnections could result in missed events.
Solution
Echo Detection (Previous Commits)
Sequence-Based Reliability (This Commit)
Delta Sync for Reconnection
- eventsReplay GraphQL query for efficient recovery
Bug Fix
- pendingTimestamps Map in QueueContext (was recreated every render, breaking cleanup logic)
Technical Details
Based on industry patterns from:
Files changed:
- packages/shared-schema/: Updated GraphQL schema and TypeScript types
- packages/backend/src/: Sequence tracking, hash computation, event buffer
- packages/web/: Client-side validation, delta sync on reconnection
Test Plan
🤖 Generated with Claude Code