perf(etl): upsert social writes in place instead of demote-then-insert#331
Merged
Conversation
Reposts/saves/follows/subscriptions replace the demote-then-insert pattern (UPDATE is_current=false + INSERT new row per action) with a single INSERT ... ON CONFLICT (identity) WHERE is_current = true DO UPDATE, keying on a new partial unique index per table (migration 0030). This bounds table growth (one current row per identity rather than a row per toggle) and gives the api aggregate triggers an O(1) is_delete transition to track instead of a count(*) recount. The partial index is on is_current=true only, so historical is_current=false rows are ignored and no dedup/backfill is required against existing data. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
raymondjacobson
added a commit
to AudiusProject/api
that referenced
this pull request
Jun 2, 2026
Packages OpenAudio/go-openaudio#331 (merged), which switches the etl indexer from demote-then-insert to upsert-in-place for reposts/saves/ follows/subscriptions. The delta-based aggregate triggers in this PR require that behavior to track is_delete transitions correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Merged
4 tasks
raymondjacobson
added a commit
to AudiusProject/api
that referenced
this pull request
Jun 2, 2026
…top (#898) ## Summary Replaces the per-write `count(*)` recounts in `handle_repost` / `handle_save` with O(1) deltas (consistency with `handle_follow`), and adds a periodic reconciliation job as the drift backstop for all aggregate counts. ### 1. Delta-based aggregate maintenance (hot path) - `handle_repost.sql` / `handle_save.sql`: every `count(*)` recount → `<col> = <col> + delta`. - Delta is now **transition-aware**: `delta = (new active?1:0) − (UPDATE & old active?1:0)`, where `active = NOT is_delete`. Correct for upsert-in-place toggles; idempotent (delta 0) on no-op re-delivery. - All three triggers (`handle_repost`, `handle_save`, `handle_follow`) move from `AFTER INSERT` → `AFTER INSERT OR UPDATE` so the entity_manager upsert path maintains counts. `handle_follow` gets the same transition-delta for consistency. ### 2. Reconciliation backstop (off hot path) - New `jobs/reconcile_aggregates.go` (`ReconcileAggregatesJob`), modeled on `prune_plays.go`. Ports the three full-recompute queries from discovery's `update_aggregates.py` (user / track / playlist counts + `dominant_genre`). - Scheduled every **10 min** in `indexer.go` (matches discovery's celery cadence). Writes only the count columns — **column-disjoint** from the score-only `AggregatesCalculator`, so they run concurrently without collision. - Faithful 1:1 port of the SQL, with one deliberate improvement: nullable `dominant_genre` / `dominant_genre_count` comparisons use `is distinct from` instead of Python's `!=`. `!=` returns NULL on a genre flipping to/from NULL, so such a row is silently skipped until a count also changes; `is distinct from` converges it. Never writes a different value — strictly catches one edge case Python misses. ### 3. Packaged etl dependency - Bumps `go-openaudio` + `go-openaudio/pkg/etl` to `5ed068b`, which includes the now-merged **OpenAudio/go-openaudio#331** (upsert social writes in place instead of demote-then-insert). - This is a **hard requirement** for section 1: the `AFTER INSERT OR UPDATE` triggers only track `is_delete` transitions correctly when the indexer upserts the single `is_current` row in place. The dependency ordering is satisfied by this bump — #331 is merged and packaged here, so this PR is safe to merge on its own. ## Test plan - [x] `go build ./...` passes against the bumped etl. - [x] Rebased onto current `main`; dropped a stale copy of the per-processor-timing change already merged via #897 (would have double-declared `slowProcessorThreshold`). - [ ] After deploy: confirm repost/save/follow counts increment/decrement on toggle and stay correct on re-delivery. - [ ] Confirm `ReconcileAggregatesJob` logs only when it corrects drift (`corrected > 0`) and otherwise stays quiet. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
raymondjacobson
added a commit
that referenced
this pull request
Jun 2, 2026
social_subscription.go was the one social handler #331 left on demote-then-insert; follow/repost/save were converted to ON CONFLICT upserts. subscriptions is also the only social table with two writers (explicit Subscribe + Follow auto-subscribe), and demote-then-insert is a two-statement write: between the demote and the insert the other writer can land a second is_current row. With no uniqueness constraint, that accumulated the 92 duplicate current rows that later failed the 0030 index build. Convert insertSubscription to the same single-statement ON CONFLICT upsert as the Follow path so the arbiter index fully enforces one-current-row-per- identity for both writers and dupes can't recur. The migration dedupe remains required to clean pre-existing rows so the unique index can build. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 tasks
raymondjacobson
added a commit
that referenced
this pull request
Jun 2, 2026
social_subscription.go was the one social handler #331 left on demote-then-insert; follow/repost/save were converted to ON CONFLICT upserts. subscriptions is also the only social table with two writers (explicit Subscribe + Follow auto-subscribe), and demote-then-insert is a two-statement write: between the demote and the insert the other writer can land a second is_current row. With no uniqueness constraint, that accumulated the 92 duplicate current rows that later failed the 0030 index build. Convert insertSubscription to the same single-statement ON CONFLICT upsert as the Follow path so the arbiter index fully enforces one-current-row-per- identity for both writers and dupes can't recur. The migration dedupe remains required to clean pre-existing rows so the unique index can build. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
raymondjacobson
added a commit
that referenced
this pull request
Jun 2, 2026
…335) social_subscription.go was the one social handler #331 left on demote-then-insert; follow/repost/save were converted to ON CONFLICT upserts. subscriptions is also the only social table with two writers (explicit Subscribe + Follow auto-subscribe), and demote-then-insert is a two-statement write: between the demote and the insert the other writer can land a second is_current row. With no uniqueness constraint, that accumulated the 92 duplicate current rows that later failed the 0030 index build. Convert insertSubscription to the same single-statement ON CONFLICT upsert as the Follow path so the arbiter index fully enforces one-current-row-per- identity for both writers and dupes can't recur. The migration dedupe remains required to clean pre-existing rows so the unique index can build. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the demote-then-insert pattern for social writes (reposts / saves / follows / subscriptions) with an in-place upsert. Each action previously did
UPDATE ... SET is_current=falsethen inserted a brand-new row, accumulating oneis_current=falserow per toggle. Now it's a single:... WHERE is_current = true) as the upsert arbiter. Partial on the current row only, so the "one current row per identity" invariant already holds — no dedup or backfill against existing data, and historicalis_current=falserows are simply not indexed.apiaggregate triggers an O(1)is_deletetransition to track instead of acount(*)recount.Companion PR (deployment order matters)
Pairs with AudiusProject/api#898, which switches
handle_repost/handle_save/handle_followto transition-aware deltas +AFTER INSERT OR UPDATE. This PR must deploy first — those triggers only produce correct counts once writes are in-place upserts.Validation
go build ./...clean.entity_managerRepost/Save/Follow/Subscription test suite passes.is_delete; the partial unique index lets legacyis_current=falserows coexist and blocks a secondis_current=truerow.Test plan
🤖 Generated with Claude Code