perf(api): delta-based social aggregate counts + reconciliation backstop#898
Merged
Merged
Conversation
Merged
3 tasks
handle_repost/handle_save now maintain aggregate counts with O(1) deltas instead of recounting with count(*) on every social write (consistency with handle_follow). The delta is transition-aware (active = not is_delete) so it is correct for upsert-in-place toggles and idempotent (delta 0) on re-delivery. All three triggers move to AFTER INSERT OR UPDATE to support the entity_manager upsert. Adds ReconcileAggregatesJob, a low-priority drift backstop (every 10m, matching discovery's update_aggregates celery cadence) that recomputes the count columns from source tables. Column-disjoint from the score-only AggregatesCalculator. NOTE: requires the go-openaudio entity_manager social-upsert change to be deployed FIRST. The AFTER UPDATE triggers only behave correctly once the demote-then-insert writes are replaced by in-place upserts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Packages OpenAudio/go-openaudio#331 (merged), which switches the etl indexer from demote-then-insert to upsert-in-place for reposts/saves/ follows/subscriptions. The delta-based aggregate triggers in this PR require that behavior to track is_delete transitions correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
e104dda to
468212b
Compare
aggregate_monthly_plays was seeded in the randomized map-order phase of Seed(), so it raced the plays fixtures. When plays seeded first, the handle_play trigger created the aggregate_monthly_plays row and the explicit fixture insert then collided (aggregate_monthly_plays_pkey). Seed it in the deterministic entity phase, before plays, so the trigger upserts the seeded row instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the per-write
count(*)recounts inhandle_repost/handle_savewith O(1) deltas (consistency withhandle_follow), and adds a periodic reconciliation job as the drift backstop for all aggregate counts.1. Delta-based aggregate maintenance (hot path)
handle_repost.sql/handle_save.sql: everycount(*)recount →<col> = <col> + delta.delta = (new active?1:0) − (UPDATE & old active?1:0), whereactive = NOT is_delete. Correct for upsert-in-place toggles; idempotent (delta 0) on no-op re-delivery.handle_repost,handle_save,handle_follow) move fromAFTER INSERT→AFTER INSERT OR UPDATEso the entity_manager upsert path maintains counts.handle_followgets the same transition-delta for consistency.2. Reconciliation backstop (off hot path)
jobs/reconcile_aggregates.go(ReconcileAggregatesJob), modeled onprune_plays.go. Ports the three full-recompute queries from discovery'supdate_aggregates.py(user / track / playlist counts +dominant_genre).indexer.go(matches discovery's celery cadence). Writes only the count columns — column-disjoint from the score-onlyAggregatesCalculator, so they run concurrently without collision.dominant_genre/dominant_genre_countcomparisons useis distinct frominstead of Python's!=.!=returns NULL on a genre flipping to/from NULL, so such a row is silently skipped until a count also changes;is distinct fromconverges it. Never writes a different value — strictly catches one edge case Python misses.3. Packaged etl dependency
go-openaudio+go-openaudio/pkg/etlto5ed068b, which includes the now-merged perf(etl): upsert social writes in place instead of demote-then-insert OpenAudio/go-openaudio#331 (upsert social writes in place instead of demote-then-insert).AFTER INSERT OR UPDATEtriggers only trackis_deletetransitions correctly when the indexer upserts the singleis_currentrow in place. The dependency ordering is satisfied by this bump — [API-221] Add /v1/dashboard_wallet_users #331 is merged and packaged here, so this PR is safe to merge on its own.Test plan
go build ./...passes against the bumped etl.main; dropped a stale copy of the per-processor-timing change already merged via obs(api): log per-processor timing in IndexChallengesJob #897 (would have double-declaredslowProcessorThreshold).ReconcileAggregatesJoblogs only when it corrects drift (corrected > 0) and otherwise stays quiet.