perf(etl): index route (owner_id, title_slug) for slug-collision lookups#337

Merged
raymondjacobson merged 1 commit into main from fix/slug-collision-route-index
Jun 4, 2026

@raymondjacobson
Contributor

Summary

Slug-collision resolution (GenerateSlugAndCollisionID / GeneratePlaylistSlugAndCollisionID) runs this on every track/playlist create and rename:

SELECT MAX(collision_id) FROM track_routes WHERE owner_id = $1 AND title_slug = $2

The only usable index is the PK (owner_id, slug). No index covers title_slug, so the MAX falls back to scanning every route the owner has.

This migration adds (owner_id, title_slug, collision_id) to track_routes and playlist_routes, turning the MAX into an index-only lookup. No Go change is needed: the collision loop already uses the PK (owner_id, slug) and normally runs once — the cost was entirely in the unindexed aggregate.
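To make the "normally runs once" claim concrete, here is a minimal sketch of a collision loop of the kind described above. It is not the actual GenerateSlugAndCollisionID code: the function name pickSlug, the routeKey type, the in-memory map standing in for the PK lookup, and the "title-N" suffix format are all assumptions made for illustration.

```go
package main

import "fmt"

// routeKey mirrors the primary key (owner_id, slug) named in the summary.
// Hypothetical type, used here only as a map key.
type routeKey struct {
	OwnerID int64
	Slug    string
}

// pickSlug is a hypothetical stand-in for the collision loop.
// maxCollisionID is the result of the MAX(collision_id) query and found is
// false when that query returned NULL (no route with this title_slug yet).
// taken models the PK lookup on (owner_id, slug).
// It returns the chosen slug, its collision id, and the number of PK probes.
func pickSlug(taken map[routeKey]bool, ownerID int64, titleSlug string, maxCollisionID int, found bool) (string, int, int) {
	next := 0
	if found {
		next = maxCollisionID + 1
	}
	probes := 0
	for {
		slug := titleSlug
		if next > 0 {
			// Assumed suffix format; the real format is not shown in this PR.
			slug = fmt.Sprintf("%s-%d", titleSlug, next)
		}
		probes++
		if !taken[routeKey{ownerID, slug}] {
			return slug, next, probes
		}
		// Only reached when another title already slugifies to this value.
		next++
	}
}

func main() {
	taken := map[routeKey]bool{
		{1, "song"}: true, // existing route with title_slug "song", collision_id 0
	}
	slug, id, probes := pickSlug(taken, 1, "song", 0, true)
	fmt.Println(slug, id, probes)
}
```

In this model the loop iterates more than once only when the candidate slug is already taken by a route with a different title_slug, which is why the aggregate, not the loop, dominates the cost.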

Evidence

The slow tail showed up in an analysis of the indexer's "entity manager indexed" log entries over an 8h window:

entity_type   median     max
Playlist      17.6 ms    44,466 ms
Track         9.8 ms     7,817 ms

Production EXPLAIN on a prolific owner (15,771 routes) confirms the cause — a full per-owner scan on every upload:

Aggregate  (cost=37847.77..37847.78 rows=1 width=4)
  ->  Bitmap Heap Scan on track_routes  (cost=626.50..37847.76 rows=1 width=4)
        Recheck Cond: (owner_id = 179934260)
        Filter: ((title_slug)::text = 'mountains-over-egypt-tech-house'::text)
        ->  Bitmap Index Scan on track_routes_pkey  (cost=0.00..626.50 rows=16276 width=0)
              Index Cond: (owner_id = 179934260)

The Bitmap Heap Scan fetches ~16k rows and filters title_slug in memory. With the new index this becomes a single index-only seek to the (owner_id, title_slug) group's max collision_id.
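The difference between the two plans can be modeled in a few lines of Go. This is only an in-memory analogy, not how PostgreSQL is implemented: a sorted slice stands in for the B-tree on (owner_id, title_slug, collision_id), and the names indexEntry, scanMax, and seekMax are hypothetical. Because entries within one (owner_id, title_slug) group are ordered by collision_id, the maximum is simply the last entry of the group, which a binary search can locate without touching the owner's other routes.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry models one entry of a composite index on
// (owner_id, title_slug, collision_id). Hypothetical type for illustration.
type indexEntry struct {
	OwnerID     int64
	TitleSlug   string
	CollisionID int
}

// scanMax models the old plan: visit every route of the owner and filter on
// title_slug. visited counts how many of the owner's rows were examined.
func scanMax(entries []indexEntry, ownerID int64, titleSlug string) (maxID int, ok bool, visited int) {
	for _, e := range entries {
		if e.OwnerID != ownerID {
			continue
		}
		visited++
		if e.TitleSlug == titleSlug && (!ok || e.CollisionID > maxID) {
			maxID, ok = e.CollisionID, true
		}
	}
	return maxID, ok, visited
}

// seekMax models the new plan. sorted must be ordered by
// (OwnerID, TitleSlug, CollisionID). It binary-searches for the first entry
// past the (ownerID, titleSlug) group; the entry just before it, if it
// belongs to the group, holds the maximum collision id.
// ok is false when the group is empty (SQL MAX would return NULL).
func seekMax(sorted []indexEntry, ownerID int64, titleSlug string) (maxID int, ok bool) {
	i := sort.Search(len(sorted), func(i int) bool {
		e := sorted[i]
		if e.OwnerID != ownerID {
			return e.OwnerID > ownerID
		}
		return e.TitleSlug > titleSlug
	})
	if i == 0 {
		return 0, false
	}
	last := sorted[i-1]
	if last.OwnerID != ownerID || last.TitleSlug != titleSlug {
		return 0, false
	}
	return last.CollisionID, true
}

func main() {
	entries := []indexEntry{
		{1, "intro", 0},
		{1, "mountains-over-egypt-tech-house", 0},
		{1, "mountains-over-egypt-tech-house", 1},
		{1, "outro", 0},
		{2, "intro", 0},
	}
	// Keep the slice in index order, as a B-tree would.
	sort.Slice(entries, func(a, b int) bool {
		x, y := entries[a], entries[b]
		if x.OwnerID != y.OwnerID {
			return x.OwnerID < y.OwnerID
		}
		if x.TitleSlug != y.TitleSlug {
			return x.TitleSlug < y.TitleSlug
		}
		return x.CollisionID < y.CollisionID
	})
	m1, _, visited := scanMax(entries, 1, "mountains-over-egypt-tech-house")
	m2, _ := seekMax(entries, 1, "mountains-over-egypt-tech-house")
	fmt.Println(m1, visited, m2)
}
```

The scan's work grows with the number of routes the owner has, while the seek's work grows only logarithmically with the size of the index, which is the effect the migration is after for prolific owners.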

Rollout — avoid the deploy lock

These are plain (non-CONCURRENT) builds to fit the ETL's single-transaction migration runner, since CREATE INDEX CONCURRENTLY cannot run inside a transaction block. A plain build holds a SHARE lock on the table for the duration of the build (track_routes ~2M rows / ~493 MB), which blocks writes but not reads. To avoid blocking route writes on deploy, pre-build them CONCURRENTLY in prod first; IF NOT EXISTS then makes the migration a no-op (same pattern used to recover the 0030 indexes):

CREATE INDEX CONCURRENTLY IF NOT EXISTS track_routes_owner_title_slug_idx
  ON track_routes (owner_id, title_slug, collision_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS playlist_routes_owner_title_slug_idx
  ON playlist_routes (owner_id, title_slug, collision_id);

Test plan

  • go build ./pkg/etl/... passes (verified locally)
  • Migration 0031 applies cleanly on a fresh DB and is a no-op when the indexes are pre-built CONCURRENTLY
  • Post-deploy: EXPLAIN on the same query shows an index-only / index scan on *_owner_title_slug_idx instead of the bitmap heap scan
  • Track/Playlist max processing times drop in the indexer log analysis
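The post-deploy EXPLAIN check can be automated with a small text match. The sketch below is an assumption-laden helper, not part of this PR: planUsesIndex is a hypothetical name, it works on plain-text EXPLAIN output only, and the "after" plan in main is hand-written to resemble PostgreSQL's node naming rather than captured from production.

```go
package main

import (
	"fmt"
	"strings"
)

// planUsesIndex reports whether plain-text EXPLAIN output contains an index
// scan node (Index Scan or Index Only Scan, forward or backward) on indexName
// and no Bitmap Heap Scan node anywhere in the plan.
func planUsesIndex(plan, indexName string) bool {
	uses := false
	for _, line := range strings.Split(plan, "\n") {
		if strings.Contains(line, "Bitmap Heap Scan") {
			return false
		}
		// Match "... Scan using <index> on <table>" so that an index whose
		// name merely starts with indexName is not counted.
		if strings.Contains(line, "Index") && strings.Contains(line, "Scan") &&
			strings.Contains(line, "using "+indexName+" on ") {
			uses = true
		}
	}
	return uses
}

func main() {
	// Plan copied from the Evidence section above.
	before := `Aggregate  (cost=37847.77..37847.78 rows=1 width=4)
  ->  Bitmap Heap Scan on track_routes  (cost=626.50..37847.76 rows=1 width=4)
        ->  Bitmap Index Scan on track_routes_pkey  (cost=0.00..626.50 rows=16276 width=0)`
	// Hand-written illustration of a post-migration plan; not real output.
	after := `Result
  InitPlan 1
    ->  Limit
          ->  Index Only Scan Backward using track_routes_owner_title_slug_idx on track_routes`

	fmt.Println(planUsesIndex(before, "track_routes_owner_title_slug_idx"))
	fmt.Println(planUsesIndex(after, "track_routes_owner_title_slug_idx"))
}
```

A string match like this is deliberately crude; it is enough for a one-off verification but would need EXPLAIN (FORMAT JSON) parsing to be robust.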

🤖 Generated with Claude Code

Slug-collision resolution runs `SELECT MAX(collision_id) FROM track_routes
WHERE owner_id = $1 AND title_slug = $2` on every track/playlist create and
rename. The only matching index is the PK (owner_id, slug), so nothing leads
with title_slug and the MAX falls back to scanning every route the owner has.

For prolific owners this is the slow tail: production EXPLAIN on an owner with
15,771 routes shows a Bitmap Heap Scan filtering title_slug across all ~16k
rows (cost ~37,848) on every upload, which matches the multi-second Track
(7.8s) and Playlist (44s) max processing times seen in the indexer logs.

Add (owner_id, title_slug, collision_id) on both route tables so the MAX
becomes an index-only lookup. The collision loop already uses the PK
(owner_id, slug) and normally runs once, so no code change is needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@raymondjacobson raymondjacobson merged commit 43606e8 into main Jun 4, 2026
5 checks passed
@raymondjacobson raymondjacobson deleted the fix/slug-collision-route-index branch June 4, 2026 00:52