
perf(etl): upsert social writes in place instead of demote-then-insert #331

Merged
raymondjacobson merged 1 commit into main from perf/social-write-upserts on Jun 2, 2026
Conversation

@raymondjacobson
Contributor

Summary

Replaces the demote-then-insert pattern for social writes (reposts / saves / follows / subscriptions) with an in-place upsert. Each action previously did UPDATE ... SET is_current=false then inserted a brand-new row, accumulating one is_current=false row per toggle. Now it's a single:

    INSERT INTO <table> (...) VALUES (...)
    ON CONFLICT (<identity>) WHERE is_current = true
    DO UPDATE SET is_delete = EXCLUDED.is_delete, created_at = ..., txhash = ..., blocknumber = ...

  • Migration 0030 adds a partial unique index per table (... WHERE is_current = true) as the upsert arbiter. Partial on the current row only, so the "one current row per identity" invariant already holds — no dedup or backfill against existing data, and historical is_current=false rows are simply not indexed.
  • Bounds table growth (one current row per identity instead of a row per toggle).
  • Gives the downstream api aggregate triggers an O(1) is_delete transition to track instead of a count(*) recount.
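
To make the row-growth difference concrete, here is a minimal in-memory sketch in Go. It is not the indexer's code: the `row` type, its field names, and both helper functions are hypothetical stand-ins, and a plain slice plays the role of the table. It only models the two write patterns described above, assuming the identity is a (user, item) pair.

```go
package main

import "fmt"

// row is a hypothetical, simplified stand-in for a social-write row.
type row struct {
	userID, itemID int
	isCurrent      bool
	isDelete       bool
	txhash         string
}

// sameIdentity reports whether r is the current row for w's identity.
func sameIdentity(r, w row) bool {
	return r.isCurrent && r.userID == w.userID && r.itemID == w.itemID
}

// demoteThenInsert models the old pattern: flip the current row for this
// identity to is_current=false, then append a brand-new current row.
func demoteThenInsert(table []row, w row) []row {
	for i := range table {
		if sameIdentity(table[i], w) {
			table[i].isCurrent = false
		}
	}
	w.isCurrent = true
	return append(table, w)
}

// upsertInPlace models INSERT ... ON CONFLICT ... DO UPDATE: if a current
// row exists for the identity it is overwritten, otherwise one is inserted.
func upsertInPlace(table []row, w row) []row {
	for i := range table {
		if sameIdentity(table[i], w) {
			table[i].isDelete = w.isDelete
			table[i].txhash = w.txhash
			return table
		}
	}
	w.isCurrent = true
	return append(table, w)
}

func main() {
	writes := []row{
		{userID: 1, itemID: 7, isDelete: false, txhash: "0xa"}, // repost
		{userID: 1, itemID: 7, isDelete: true, txhash: "0xb"},  // unrepost
		{userID: 1, itemID: 7, isDelete: false, txhash: "0xc"}, // re-repost
	}
	var demoted, upserted []row
	for _, w := range writes {
		demoted = demoteThenInsert(demoted, w)
		upserted = upsertInPlace(upserted, w)
	}
	fmt.Println(len(demoted), len(upserted)) // prints "3 1"
}
```

In this model, replaying the last write through `upsertInPlace` leaves the single row unchanged, which is the idempotency property the test plan asks about.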

Companion PR (deployment order matters)

Pairs with AudiusProject/api#898, which switches handle_repost/handle_save/handle_follow to transition-aware deltas + AFTER INSERT OR UPDATE. This PR must deploy first — those triggers only produce correct counts once writes are in-place upserts.

Validation

  • go build ./... clean.
  • Full entity_manager Repost/Save/Follow/Subscription test suite passes.
  • Manually verified against Postgres: repost → unrepost → re-repost yields 1 row (not 3) with the latest txhash and correct is_delete; the partial unique index lets legacy is_current=false rows coexist and blocks a second is_current=true row.

Test plan

  • CI green.
  • Migration 0030 applies cleanly against a prod-shaped DB (index build time on large reposts/saves/follows tables is the main thing to watch).
  • Re-delivery of the same chain tx is idempotent (DO UPDATE writes identical content).

🤖 Generated with Claude Code

Reposts/saves/follows/subscriptions replace the demote-then-insert pattern
(UPDATE is_current=false + INSERT new row per action) with a single
INSERT ... ON CONFLICT (identity) WHERE is_current = true DO UPDATE,
keying on a new partial unique index per table (migration 0030).

This bounds table growth (one current row per identity rather than a row
per toggle) and gives the api aggregate triggers an O(1) is_delete
transition to track instead of a count(*) recount. The partial index is
on is_current=true only, so historical is_current=false rows are ignored
and no dedup/backfill is required against existing data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@raymondjacobson raymondjacobson merged commit 24084d2 into main Jun 2, 2026
8 of 9 checks passed
@raymondjacobson raymondjacobson deleted the perf/social-write-upserts branch June 2, 2026 04:12
raymondjacobson added a commit to AudiusProject/api that referenced this pull request Jun 2, 2026
Packages OpenAudio/go-openaudio#331 (merged), which switches the etl
indexer from demote-then-insert to upsert-in-place for reposts/saves/
follows/subscriptions. The delta-based aggregate triggers in this PR
require that behavior to track is_delete transitions correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
raymondjacobson added a commit to AudiusProject/api that referenced this pull request Jun 2, 2026
…top (#898)

## Summary

Replaces the per-write `count(*)` recounts in `handle_repost` /
`handle_save` with O(1) deltas (consistency with `handle_follow`), and
adds a periodic reconciliation job as the drift backstop for all
aggregate counts.

### 1. Delta-based aggregate maintenance (hot path)
- `handle_repost.sql` / `handle_save.sql`: every `count(*)` recount →
`<col> = <col> + delta`.
- Delta is now **transition-aware**: `delta = (new active?1:0) − (UPDATE
& old active?1:0)`, where `active = NOT is_delete`. Correct for
upsert-in-place toggles; idempotent (delta 0) on no-op re-delivery.
- All three triggers (`handle_repost`, `handle_save`, `handle_follow`)
move from `AFTER INSERT` → `AFTER INSERT OR UPDATE` so the
entity_manager upsert path maintains counts. `handle_follow` gets the
same transition-delta for consistency.
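
The transition-aware delta above can be sketched as a small pure function. This is an illustrative Go model of the formula only, not the trigger's SQL; the `op` type and the function names are hypothetical.

```go
package main

import "fmt"

// op is the kind of row event the trigger fires on (hypothetical type).
type op int

const (
	opInsert op = iota
	opUpdate
)

// b2i maps a boolean to 0 or 1.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// transitionDelta implements
//   delta = (new active ? 1 : 0) - (UPDATE && old active ? 1 : 0)
// where active = NOT is_delete. oldIsDelete is ignored for inserts.
func transitionDelta(kind op, oldIsDelete, newIsDelete bool) int {
	return b2i(!newIsDelete) - b2i(kind == opUpdate && !oldIsDelete)
}

func main() {
	fmt.Println(
		transitionDelta(opInsert, false, false), // first repost
		transitionDelta(opUpdate, false, true),  // unrepost
		transitionDelta(opUpdate, true, true),   // no-op re-delivery
	) // prints "1 -1 0"
}
```

Because the old row's state is subtracted on every update, a re-delivered write that changes nothing contributes a delta of 0, which is what makes the count maintenance idempotent.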

### 2. Reconciliation backstop (off hot path)
- New `jobs/reconcile_aggregates.go` (`ReconcileAggregatesJob`), modeled
on `prune_plays.go`. Ports the three full-recompute queries from
discovery's `update_aggregates.py` (user / track / playlist counts +
`dominant_genre`).
- Scheduled every **10 min** in `indexer.go` (matches discovery's celery
cadence). Writes only the count columns — **column-disjoint** from the
score-only `AggregatesCalculator`, so they run concurrently without
collision.
- Faithful 1:1 port of the SQL, with one deliberate improvement:
nullable `dominant_genre` / `dominant_genre_count` comparisons use `is
distinct from` instead of the `!=` in the Python job's SQL. In SQL, `!=`
evaluates to NULL when either side is NULL, so a row whose genre flips
to/from NULL is silently skipped until a count also changes; `is distinct
from` converges it. Never writes a different value; it only catches the
one edge case the Python job misses.
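
The NULL edge case in the last bullet can be illustrated with a small Go model of SQL's three-valued logic. This is a sketch for intuition, not code from the job: a nil `*string` stands in for SQL NULL, and the `tri` type and function names are hypothetical.

```go
package main

import "fmt"

// tri models SQL's three-valued logic: false, true, or NULL (unknown).
type tri int

const (
	sqlFalse tri = iota
	sqlTrue
	sqlNull
)

// notEqual models SQL `a != b`. A nil pointer stands in for NULL, and any
// comparison involving NULL yields NULL rather than true or false.
func notEqual(a, b *string) tri {
	if a == nil || b == nil {
		return sqlNull
	}
	if *a != *b {
		return sqlTrue
	}
	return sqlFalse
}

// isDistinctFrom models SQL `a IS DISTINCT FROM b`, which treats NULL as a
// comparable value and therefore always yields true or false.
func isDistinctFrom(a, b *string) bool {
	if a == nil || b == nil {
		return (a == nil) != (b == nil)
	}
	return *a != *b
}

// selected reports whether a WHERE clause would keep the row: only a
// condition that evaluates to true passes, so NULL filters the row out.
func selected(cond tri) bool { return cond == sqlTrue }

func main() {
	rock := "Rock"
	// Stored genre is NULL, recomputed genre is "Rock".
	fmt.Println(selected(notEqual(nil, &rock)), isDistinctFrom(nil, &rock)) // prints "false true"
}
```

With `!=`, the row whose genre changed from NULL is filtered out of the update; with `is distinct from`, it is picked up.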

### 3. Packaged etl dependency
- Bumps `go-openaudio` + `go-openaudio/pkg/etl` to `5ed068b`, which
includes the now-merged **OpenAudio/go-openaudio#331** (upsert social
writes in place instead of demote-then-insert).
- This is a **hard requirement** for section 1: the `AFTER INSERT OR
UPDATE` triggers only track `is_delete` transitions correctly when the
indexer upserts the single `is_current` row in place. The dependency
ordering is satisfied by this bump — #331 is merged and packaged here,
so this PR is safe to merge on its own.

## Test plan
- [x] `go build ./...` passes against the bumped etl.
- [x] Rebased onto current `main`; dropped a stale copy of the
per-processor-timing change already merged via #897 (would have
double-declared `slowProcessorThreshold`).
- [ ] After deploy: confirm repost/save/follow counts
increment/decrement on toggle and stay correct on re-delivery.
- [ ] Confirm `ReconcileAggregatesJob` logs only when it corrects drift
(`corrected > 0`) and otherwise stays quiet.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
raymondjacobson added a commit that referenced this pull request Jun 2, 2026
…335)

social_subscription.go was the one social handler #331 left on
demote-then-insert; follow/repost/save were converted to ON CONFLICT
upserts. subscriptions is also the only social table with two writers
(explicit Subscribe + Follow auto-subscribe), and demote-then-insert is a
two-statement write: between the demote and the insert the other writer can
land a second is_current row. With no uniqueness constraint, that accumulated
the 92 duplicate current rows that later failed the 0030 index build.

Convert insertSubscription to the same single-statement ON CONFLICT upsert as
the Follow path so the arbiter index fully enforces one-current-row-per-
identity for both writers and dupes can't recur. The migration dedupe remains
required to clean pre-existing rows so the unique index can build.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
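
The two-writer race described in this commit message can be sketched with a deterministic interleaving. The Go model below is an illustration under assumptions: the `sub` type, its fields, and the helper names are hypothetical, a slice stands in for the subscriptions table, and the interleaving is hand-scripted rather than taken from the real indexer's transaction behavior.

```go
package main

import "fmt"

// sub is a hypothetical, simplified subscription row.
type sub struct {
	subscriberID, userID int
	isCurrent            bool
	source               string
}

// demote is statement 1 of demote-then-insert: clear is_current for the identity.
func demote(t []sub, subscriberID, userID int) {
	for i := range t {
		if t[i].subscriberID == subscriberID && t[i].userID == userID {
			t[i].isCurrent = false
		}
	}
}

// insert is statement 2: append a new current row, with no uniqueness check.
func insert(t []sub, s sub) []sub {
	s.isCurrent = true
	return append(t, s)
}

// upsert is the single-statement alternative: overwrite the current row for
// the identity if one exists, otherwise insert it.
func upsert(t []sub, s sub) []sub {
	for i := range t {
		if t[i].isCurrent && t[i].subscriberID == s.subscriberID && t[i].userID == s.userID {
			t[i].source = s.source
			return t
		}
	}
	return insert(t, s)
}

// currentRows counts is_current rows for one identity.
func currentRows(t []sub, subscriberID, userID int) int {
	n := 0
	for _, s := range t {
		if s.isCurrent && s.subscriberID == subscriberID && s.userID == userID {
			n++
		}
	}
	return n
}

func main() {
	a := sub{subscriberID: 1, userID: 2, source: "subscribe"}
	b := sub{subscriberID: 1, userID: 2, source: "follow-auto-subscribe"}

	// Interleaved two-statement writes: both demotes run before either insert.
	var racy []sub
	demote(racy, 1, 2)
	demote(racy, 1, 2)
	racy = insert(racy, a)
	racy = insert(racy, b)

	// Single-statement upserts cannot be split, so the second one updates.
	var safe []sub
	safe = upsert(safe, a)
	safe = upsert(safe, b)

	fmt.Println(currentRows(racy, 1, 2), currentRows(safe, 1, 2)) // prints "2 1"
}
```

In the real database the partial unique index is what rejects the second current row; the sketch only shows why a two-statement write leaves a window that a single statement does not.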