perf(etl): index route (owner_id, title_slug) for slug-collision lookups #337
Merged
Slug-collision resolution runs `SELECT MAX(collision_id) FROM track_routes WHERE owner_id = $1 AND title_slug = $2` on every track/playlist create and rename. The only matching index is the PK (owner_id, slug), so no index covers title_slug and the MAX falls back to scanning every route the owner has.

For prolific owners this is the slow tail: production EXPLAIN on an owner with 15,771 routes shows a Bitmap Heap Scan filtering title_slug across all ~16k rows (cost ~37,848) on every upload, which matches the multi-second Track (7.8s) and Playlist (44s) max processing times seen in the indexer logs.

Add (owner_id, title_slug, collision_id) on both route tables so the MAX becomes an index-only lookup. The collision loop already uses the PK (owner_id, slug) and normally runs once, so no code change is needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
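For reference, a minimal sketch of what the migration described in the commit message above might contain. The index names are assumptions (the PR only refers to them by the pattern `*_owner_title_slug_idx`), and the actual migration file may be worded differently.

```sql
-- Sketch only: index names are assumed, not copied from the migration file.
-- Equality columns (owner_id, title_slug) come first and collision_id is the
-- trailing key, so MAX(collision_id) for one (owner_id, title_slug) pair sits
-- at the end of a single contiguous index range.
CREATE INDEX IF NOT EXISTS track_routes_owner_title_slug_idx
    ON track_routes (owner_id, title_slug, collision_id);

CREATE INDEX IF NOT EXISTS playlist_routes_owner_title_slug_idx
    ON playlist_routes (owner_id, title_slug, collision_id);
```

Putting `collision_id` last is what allows the aggregate to be answered from the index alone rather than by fetching and filtering the owner's heap rows.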
Summary
Slug-collision resolution (`GenerateSlugAndCollisionID` / `GeneratePlaylistSlugAndCollisionID`) runs this query on every track/playlist create and rename (shown for `track_routes`): `SELECT MAX(collision_id) FROM track_routes WHERE owner_id = $1 AND title_slug = $2`.

The only matching index is the PK `(owner_id, slug)`. No index covers `title_slug`, so the `MAX` falls back to scanning every route the owner has.

This migration adds `(owner_id, title_slug, collision_id)` to `track_routes` and `playlist_routes`, turning the `MAX` into an index-only lookup. No Go change is needed: the collision loop already uses the PK `(owner_id, slug)` and normally runs once, so the cost was entirely in the unindexed aggregate.

Evidence
The slow tail showed up in an analysis of the indexer's `entity manager indexed` log lines over an 8h window: max processing time reached 7.8s for Track and 44s for Playlist.

Production `EXPLAIN` on a prolific owner (15,771 routes) confirms the cause, a full per-owner scan on every upload (estimated cost ~37,848).

The Bitmap Heap Scan fetches ~16k rows and filters `title_slug` in memory. With the new index this becomes a single index-only seek to the `(owner_id, title_slug)` group's max `collision_id`.

Rollout: avoid the deploy lock
These are plain (non-`CONCURRENTLY`) builds to fit the ETL's single-transaction migration runner, since `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block. A plain build holds a `SHARE` lock on the table for its duration, which blocks writes while still allowing reads (`track_routes` is ~2M rows / ~493 MB). To avoid blocking route writes on deploy, pre-build the indexes `CONCURRENTLY` in prod first. `IF NOT EXISTS` then makes the migration a no-op, the same pattern used to recover the 0030 indexes.

Test plan
- `go build ./pkg/etl/...` passes (verified locally)
- `EXPLAIN` on the same query shows an index-only / index scan on `*_owner_title_slug_idx` instead of the bitmap heap scan

🤖 Generated with Claude Code
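The prod pre-build and the `EXPLAIN` check above could look roughly like the following, run by hand in a session with no open transaction. This is a sketch under assumptions: the index names are guessed from the `*_owner_title_slug_idx` pattern, and the `owner_id` and `title_slug` values are placeholders (including the assumption that `owner_id` is an integer).

```sql
-- Sketch only: index names are assumed; they must match the migration's names
-- exactly, or IF NOT EXISTS in the migration will not skip the build.
-- Run each statement on its own: CREATE INDEX CONCURRENTLY is rejected
-- inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS track_routes_owner_title_slug_idx
    ON track_routes (owner_id, title_slug, collision_id);

CREATE INDEX CONCURRENTLY IF NOT EXISTS playlist_routes_owner_title_slug_idx
    ON playlist_routes (owner_id, title_slug, collision_id);

-- A failed concurrent build leaves an INVALID index behind, and IF NOT EXISTS
-- would then skip it, so confirm both indexes are valid before deploying.
SELECT c.relname AS index_name, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname LIKE '%owner_title_slug_idx';

-- Re-check the plan for the collision query (placeholder parameter values).
EXPLAIN
SELECT MAX(collision_id)
FROM track_routes
WHERE owner_id = 12345 AND title_slug = 'example-title';
```

If either index reports `indisvalid = false`, dropping it and repeating the concurrent build before the deploy avoids the migration silently skipping a broken index.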