Fix/129 label events cross path dedup #135
Open
enjoyandlove wants to merge 2 commits into
Summary
Fixes duplicate rows in `label_events` caused by the live webhook and backfill paths recording the same label action under two different identities.
The root cause: the live webhook payload carries no GraphQL `LabeledEvent`/`UnlabeledEvent` node id, so the live path could only key a row on `timestamp`, and it stored the mirror-receive time while backfill stored GitHub's `createdAt`. The two clocks never coincide for the same action, so the old `uq_label_events_natural_key` (which includes `timestamp`) never collapsed the pair: every action seen by both paths produced two rows.
This PR introduces a path-independent identity and reconciles the two paths around it:
- `github_node_id` column on `label_events`: the globally unique GraphQL node id, the only path-independent event identity. Set on authoritative backfill rows; `NULL` on provisional live-webhook rows.
- Live webhook inserts a provisional row (`github_node_id` `NULL`, timestamp = mirror-receive time).
- Backfill inserts the authoritative row (`github_node_id` set, timestamp = GitHub `createdAt`) and then reconciles: it deletes each provisional duplicate that the authoritative row supersedes, pairing 1:1 and nearest-in-time within a 120s delivery-latency window. A provisional row with no authoritative row to claim it survives, so events that backfill hasn't captured yet are never lost.
- Drop `uq_label_events_natural_key`, add a partial `uq_label_events_github_node_id` (`WHERE github_node_id IS NOT NULL`) so backfill↔backfill collapses by true identity while provisional rows are never constrained against each other.
The 120s reconcile window is defined in both `github-fetcher.service.ts` (`LABEL_EVENT_RECONCILE_WINDOW_SECONDS`) and `07_label_events.sql`, and the two must stay in sync.
Related Issues
Fixes #129
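For reviewers, the reconcile rule described in the Summary (pair each authoritative row with at most one provisional row, nearest in time, within the 120s window) can be sketched in TypeScript as below. This is an illustration only, not the code in this PR: the row shape, the `findSupersededProvisionalIds` helper, the grouping key (issue, label, action), the use of an absolute time difference, and the greedy closest-pair-first matching are all assumptions. Only `LABEL_EVENT_RECONCILE_WINDOW_SECONDS` and its value come from the description.

```typescript
// Hypothetical row shape; the real label_events columns are not shown in this PR description.
interface LabelEventRow {
  id: number;
  issueId: number;
  label: string;
  action: "labeled" | "unlabeled";
  timestampMs: number;         // epoch milliseconds
  githubNodeId: string | null; // null = provisional live-webhook row
}

// Name and value taken from the PR description.
const LABEL_EVENT_RECONCILE_WINDOW_SECONDS = 120;

// Returns the ids of provisional rows that an authoritative row supersedes.
// Assumption: two rows describe the same action when issue, label and action match.
function findSupersededProvisionalIds(rows: LabelEventRow[]): number[] {
  const windowMs = LABEL_EVENT_RECONCILE_WINDOW_SECONDS * 1000;
  const authoritative = rows.filter((r) => r.githubNodeId !== null);
  const provisional = rows.filter((r) => r.githubNodeId === null);

  // Collect every (authoritative, provisional) pair that falls inside the window.
  const candidates: { authId: number; provId: number; gapMs: number }[] = [];
  for (const a of authoritative) {
    for (const p of provisional) {
      const sameAction =
        a.issueId === p.issueId && a.label === p.label && a.action === p.action;
      const gapMs = Math.abs(a.timestampMs - p.timestampMs);
      if (sameAction && gapMs <= windowMs) {
        candidates.push({ authId: a.id, provId: p.id, gapMs });
      }
    }
  }

  // Greedy 1:1 matching: the closest pairs claim each other first.
  candidates.sort((x, y) => x.gapMs - y.gapMs);
  const claimedAuth = new Set<number>();
  const claimedProv = new Set<number>();
  for (const c of candidates) {
    if (claimedAuth.has(c.authId) || claimedProv.has(c.provId)) continue;
    claimedAuth.add(c.authId);
    claimedProv.add(c.provId);
  }

  // Unclaimed provisional rows are not returned, so they survive.
  return Array.from(claimedProv).sort((x, y) => x - y);
}
```

Under these assumptions, one authoritative row with two matching provisional rows 30s and 90s away yields only the 30s row for deletion. The 90s row survives because the authoritative row has already claimed its single partner.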
Type of Change
Testing
- Live webhook inserts a provisional row (`github_node_id` `NULL`, mirror-receive timestamp) and backfill inserts the authoritative row (`github_node_id` set, `createdAt` timestamp).
- … `uq_label_events_natural_key` is present and is a no-op on re-deploy.
- … `uq_label_events_github_node_id` index.
Checklist