Fix flaky TestNetworkEvents (checkpoint before triggering events)#292
Merged
Conversation
waitForNew snapshotted the event count after the messages that trigger the awaited event were already sent. Events are published asynchronously (readLoop dispatch and the async response-body goroutine), so the event could be appended before the snapshot, get treated as old, and the wait would block until its deadline. This caused intermittent ~2s/~3s timeouts in the loading_failed and binary_resource_skips_body subtests. Replace the internal snapshot with an explicit checkpoint() taken before the trigger, and have waitForNew wait for events at or after it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR is in the To monitor this PR anyway, reply with |
archandatta
approved these changes
Jun 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TestNetworkEventsintermittently failed in CI on theloading_failedandbinary_resource_skips_bodysubtests, each timing out after a suspiciously round ~2s / ~3s — thewaitForNewdeadlines.Root cause
eventCollector.waitForNewsnapshotted the current event count (skip := len(c.events)) inside the call, i.e. after the test had already sent the messages that trigger the awaited event. Events are published asynchronously relative to the test goroutine —readLoopdispatches CDP messages on its own goroutine, andhandleLoadingFinishedpublishesnetwork_responsefrom anasyncWg.Gogoroutine. So the triggering event could be appended toc.eventsbeforewaitForNewrecorded its checkpoint. The just-published event then fell belowskip, was treated as "old", and the wait blocked until its deadline.Only subtests where a same-typed event had already been emitted earlier in the shared monitor were affected (
request_and_responseuseswaitFor, which scans all events, so it was immune). On fast unloaded machines the publish reliably lost the race against the checkpoint, so it passed locally; under CI load it didn't.This was confirmed by injecting a small delay before the checkpoint, which reproduces the exact two failing subtests at ~2.04s / ~3.04s; with the fix the same delay passes.
Fix
Add
eventCollector.checkpoint(), taken before the triggering messages are sent, and havewaitForNewwait for events at or after that index. Updated all four call sites. No production code changed.Test plan
go test ./lib/cdpmonitor/ -run TestNetworkEvents -count=100 -racepassesgo test ./lib/cdpmonitor/ -run TestNetworkEvents -count=100 -race -cpu=1passesgo test ./lib/cdpmonitor/ -race -count=10passesgo vet ./lib/cdpmonitor/clean🤖 Generated with Claude Code
Note
Low Risk
Test harness changes only; no runtime CDP monitor behavior is modified.
Overview
Fixes intermittent CI timeouts in CDP monitor handler tests by changing how
eventCollector.waitForNewdecides which events count as "new."checkpoint()snapshots the current event slice length; callers take it before sending CDP messages that should produce the awaited event.waitForNewnow takes that index and only matches events at or after it, instead of recording the skip count insidewaitForNewafter triggers may already have published asynchronously (read loop / async response-body work).handlers_test.goupdates four sites: console non-string args, network loading failed / binary image response, and per-sessionnetwork_idleattribution (checkpoint after sess-b's idle, then finish sess-a's request).No production
cdpmonitorcode changes.Reviewed by Cursor Bugbot for commit 6640ed2. Bugbot is set up for automated code reviews on this repo. Configure here.