Fix/129 label events cross path dedup #135

Open
enjoyandlove wants to merge 2 commits into entrius:test from enjoyandlove:fix/129-label-events-cross-path-dedup
@enjoyandlove
Contributor

Summary

Fixes duplicate rows in label_events caused by the live webhook and backfill paths recording the same label action under two different identities.

The root cause: the live webhook payload carries no GraphQL LabeledEvent/UnlabeledEvent node id, so the live path could only key a row on its timestamp, and it stored the mirror-receive time while backfill stored GitHub's createdAt. The two clocks never coincide for the same action, so the old uq_label_events_natural_key index (which includes timestamp) never collapsed the pair: every action seen by both paths produced two rows.
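To illustrate why a timestamp-bearing key cannot collapse the pair, here is a minimal sketch. The row shape, the key layout, the repository name, and the sample timestamps are all assumptions made for illustration; the only detail taken from this PR is that the old natural key includes timestamp.

```typescript
// Hypothetical row shape: every field except `timestamp` is an assumption.
interface LabelEventRow {
  repo: string;
  issueNumber: number;
  label: string;
  action: "labeled" | "unlabeled";
  timestamp: string; // ISO 8601
}

// Sketch of a natural key that, like the old index, includes the timestamp.
function naturalKey(row: LabelEventRow): string {
  return [row.repo, row.issueNumber, row.label, row.action, row.timestamp].join("|");
}

// The same label action as seen by each path (illustrative values).
const liveRow: LabelEventRow = {
  repo: "example/repo",
  issueNumber: 7,
  label: "bug",
  action: "labeled",
  timestamp: "2026-05-26T10:00:03Z", // mirror-receive time
};
const backfillRow: LabelEventRow = {
  repo: "example/repo",
  issueNumber: 7,
  label: "bug",
  action: "labeled",
  timestamp: "2026-05-26T10:00:00Z", // GitHub createdAt
};

// The keys differ only in the timestamp, so a unique index on this key
// treats the two rows as distinct events.
const keysCollide: boolean = naturalKey(liveRow) === naturalKey(backfillRow);
```

Here `keysCollide` is false even though both rows describe one action, which is the duplicate this PR removes.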

This PR introduces a path-independent identity and reconciles the two paths around it:

  • github_node_id column on label_events: the globally unique GraphQL node id, which is the only path-independent event identity. Set on authoritative backfill rows; NULL on provisional live-webhook rows.
  • Live path writes a provisional row (github_node_id NULL, timestamp = mirror-receive time).
  • Backfill writes the authoritative row (github_node_id set, timestamp = GitHub createdAt) and then reconciles: it deletes each provisional duplicate the authoritative row supersedes, pairing 1:1 and nearest-in-time within a 120s delivery-latency window. A provisional row with no authoritative row to claim it survives, so events backfill hasn't captured yet are never lost.
  • Dedup guard repointed onto the stable identity: drop uq_label_events_natural_key, add a partial uq_label_events_github_node_id (WHERE github_node_id IS NOT NULL) so backfill↔backfill collapses by true identity while provisional rows are never constrained against each other.
  • One-time historic dedup in the migration, gated on the presence of the old index so it runs exactly once and is a no-op on every later (re)deploy.
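The reconcile step described above (1:1 pairing, nearest in time, inside the 120s window) can be sketched roughly as follows. This is an in-memory approximation and not the PR's actual implementation, which lives in github-fetcher.service.ts and 07_label_events.sql. The `LabelEvent` shape, the `key` field (standing in for whatever columns identify the same label action), and the greedy oldest-first pairing order are assumptions.

```typescript
// Constant name and value come from the PR description.
const LABEL_EVENT_RECONCILE_WINDOW_SECONDS = 120;

// Hypothetical in-memory shape for illustration only.
interface LabelEvent {
  id: number;
  key: string; // assumed grouping key, e.g. issue + label + action
  githubNodeId: string | null; // null marks a provisional live-webhook row
  timestampMs: number;
}

// Returns the ids of provisional rows superseded by an authoritative row.
// Each authoritative row claims at most one provisional row (1:1), choosing
// the nearest in time within the window. Unclaimed provisional rows survive.
function findSupersededProvisionalIds(
  rows: LabelEvent[],
  windowSeconds: number = LABEL_EVENT_RECONCILE_WINDOW_SECONDS
): number[] {
  const windowMs = windowSeconds * 1000;
  const authoritative = rows
    .filter((r) => r.githubNodeId !== null)
    .sort((a, b) => a.timestampMs - b.timestampMs);
  const claimed: number[] = [];

  for (const auth of authoritative) {
    let best: LabelEvent | null = null;
    let bestDelta = Infinity;
    for (const prov of rows) {
      if (prov.githubNodeId !== null) continue; // not provisional
      if (prov.key !== auth.key) continue; // different label action
      if (claimed.indexOf(prov.id) !== -1) continue; // already paired
      const delta = Math.abs(prov.timestampMs - auth.timestampMs);
      if (delta <= windowMs && delta < bestDelta) {
        best = prov;
        bestDelta = delta;
      }
    }
    if (best !== null) claimed.push(best.id);
  }
  return claimed.sort((a, b) => a - b);
}

// Usage: one authoritative row, one provisional duplicate 3s later, and one
// provisional row 10 minutes later that backfill has not captured yet.
const base = 1700000000000;
const superseded = findSupersededProvisionalIds([
  { id: 1, key: "issue-7|bug|labeled", githubNodeId: null, timestampMs: base + 3000 },
  { id: 2, key: "issue-7|bug|labeled", githubNodeId: "LE_example", timestampMs: base },
  { id: 3, key: "issue-7|bug|labeled", githubNodeId: null, timestampMs: base + 600000 },
]);
```

In this example only row 1 is marked for deletion; row 3 falls outside the window and is kept. A greedy pass like this is a simplification, and a set-based SQL version may break ties differently.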

The 120s reconcile window is defined in both github-fetcher.service.ts (LABEL_EVENT_RECONCILE_WINDOW_SECONDS) and 07_label_events.sql, and the two must stay in sync.

Related Issues

Fixes #129

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • Other (describe below)

Testing

  • Verified the live path inserts a provisional row (github_node_id NULL, mirror-receive timestamp) and backfill inserts the authoritative row (github_node_id set, createdAt timestamp).
  • Confirmed reconciliation deletes the provisional duplicate once the authoritative row lands, with pairing limited to the closest provisional row inside the 120s window.
  • Confirmed a provisional row with no matching authoritative row is preserved (events not yet/never backfilled are not lost).
  • Confirmed genuine repeat actions (add → remove → re-add) spaced beyond the window are not merged.
  • Confirmed the one-time historic dedup runs only when uq_label_events_natural_key is present and is a no-op on re-deploy.
  • Confirmed backfill re-runs / BullMQ retries collapse to a no-op via the partial uq_label_events_github_node_id index.
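The idempotency checked in the last item can be modelled with a small in-memory stand-in for the partial unique index. This is only a sketch of the constraint's semantics: the `LabelEventStore` class and its method names are hypothetical, and the real guard is the uq_label_events_github_node_id index in the database.

```typescript
// Hypothetical stored-row shape for illustration.
interface StoredLabelEvent {
  githubNodeId: string | null;
  timestamp: string;
}

// Models a unique constraint that applies only to rows whose githubNodeId is
// not null. Rows with a null githubNodeId are never checked against each other.
class LabelEventStore {
  private rows: StoredLabelEvent[] = [];
  private seenNodeIds: { [nodeId: string]: boolean } = {};

  // Returns true if the row was stored, false if it was skipped as a duplicate.
  insert(row: StoredLabelEvent): boolean {
    if (row.githubNodeId !== null) {
      if (this.seenNodeIds[row.githubNodeId] === true) return false;
      this.seenNodeIds[row.githubNodeId] = true;
    }
    this.rows.push(row);
    return true;
  }

  count(): number {
    return this.rows.length;
  }
}

const store = new LabelEventStore();
// A backfill run followed by a retry of the same event (illustrative values).
const firstInsert = store.insert({ githubNodeId: "LE_example", timestamp: "2026-05-26T10:00:00Z" });
const retryInsert = store.insert({ githubNodeId: "LE_example", timestamp: "2026-05-26T10:00:00Z" });
// Two provisional rows are both accepted because the constraint ignores nulls.
const provisionalA = store.insert({ githubNodeId: null, timestamp: "2026-05-26T10:05:00Z" });
const provisionalB = store.insert({ githubNodeId: null, timestamp: "2026-05-26T10:05:00Z" });
const totalRows = store.count();
```

The retry is rejected while both provisional rows are stored, leaving three rows in total.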

Checklist

  • I have read the Contributing Guide
  • Code builds without errors
  • New and existing tests pass (if applicable)
  • Documentation updated (if applicable)
  • No unnecessary dependencies added

xiao-xiao-mao (bot) added the bug ("Something isn't working") label on May 26, 2026
Labels

bug Something isn't working
