Redis-backed application cache for hot read paths by ptrlrd · Pull Request #388 · ptrlrd/spire-codex

ptrlrd · 2026-06-01T06:07:21Z

Run uploads went from ~9k/week to ~130k/week. Every uncached read is amplifying Mongo load proportionally. PR #385 added a Mongo-backed lazy cache (stats_summary) which handles the worst-case "first user pays 5-10 s" path; this PR puts an even faster, configurable, cross-endpoint layer in front of it and the other hot reads.

What

A fail-safe Redis layer (services/cache.py) plus wiring into the four hottest read paths.

Cache substrate

services/cache.py — lazy redis-py client, get_json / set_json / delete / delete_pattern (SCAN-based). Every operation is fail-safe: Redis being unavailable returns None on get and no-ops on set, so the caller falls back to its existing data source. The cache is always an optimization, never load-bearing.
metrics.py — spire_codex_cache_hits_total / _misses_total / _errors_total, labeled by key namespace (stats, leaderboard, run, entity_scores).
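A minimal sketch of the fail-safe shape described above, not the actual contents of services/cache.py. The class name `FailSafeCache`, the timeout values, and the `count=500` SCAN hint are assumptions for illustration; only the method names get_json / set_json / delete / delete_pattern come from this PR.

```python
import json
import logging
from typing import Any, Optional

log = logging.getLogger(__name__)


class FailSafeCache:
    """Hypothetical wrapper: every Redis failure degrades to a miss or a no-op."""

    def __init__(self, url: Optional[str]) -> None:
        self._url = url      # None or "" disables caching entirely
        self._client = None  # built lazily on first use

    def _get_client(self):
        if not self._url:
            return None
        if self._client is None:
            try:
                import redis  # lazy import: the app still starts without it
                self._client = redis.Redis.from_url(
                    self._url,
                    decode_responses=True,
                    socket_connect_timeout=0.25,  # assumed values
                    socket_timeout=0.25,
                )
            except Exception:
                log.warning("redis client init failed", exc_info=True)
                return None
        return self._client

    def get_json(self, key: str) -> Optional[Any]:
        client = self._get_client()
        if client is None:
            return None
        try:
            raw = client.get(key)
            return None if raw is None else json.loads(raw)
        except Exception:
            log.warning("cache get failed for %s", key, exc_info=True)
            return None  # caller falls back to its existing data source

    def set_json(self, key: str, value: Any, ttl_seconds: int) -> bool:
        client = self._get_client()
        if client is None:
            return False
        try:
            client.set(key, json.dumps(value), ex=ttl_seconds)
            return True
        except Exception:
            log.warning("cache set failed for %s", key, exc_info=True)
            return False

    def delete(self, key: str) -> int:
        client = self._get_client()
        if client is None:
            return 0
        try:
            return client.delete(key)
        except Exception:
            return 0

    def delete_pattern(self, pattern: str) -> int:
        """Delete keys matching a glob, walking the keyspace with SCAN."""
        client = self._get_client()
        if client is None:
            return 0
        try:
            deleted = 0
            for key in client.scan_iter(match=pattern, count=500):
                deleted += client.delete(key)
            return deleted
        except Exception:
            return 0
```

With the URL unset, `FailSafeCache(None).get_json("stats:all")` returns None without importing redis at all, which is the "never load-bearing" property the description relies on.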

Wired in

| Endpoint | Layer 0 | TTL | Why |
| --- | --- | --- | --- |
| `/api/runs/stats` | Redis check before `stats_summary` | 60s | Matches refresher cycle; cluster-wide instead of per-worker |
| `/api/runs/leaderboard` | Redis check before `leaderboard_summary` | 60s | Same reasoning; covers today / paginated combos that aren't in `leaderboard_summary` |
| `/api/runs/shared/{hash}` | Redis check before disk | 15min | Runs are immutable, but I don't want every viewed run squatting in cache forever as the collection grows. 15min absorbs share-link bursts on a hot URL and lets cold runs drop off the LRU |
| `/api/runs/scores/{entity_type}` | Redis check before snapshot read | 5min | Hit constantly by tier-list + detail-page sort columns |
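The "layer 0" wiring in the table amounts to a read-through check with a per-namespace TTL. The sketch below is an assumption about the general shape, not the endpoint code in this PR: `read_through`, `TTL_SECONDS`, and the `InMemoryCache` stand-in are hypothetical names, and the stand-in exists only so the example runs without a Redis server.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs from the table, in seconds, keyed by the metric namespaces.
TTL_SECONDS = {
    "stats": 60,
    "leaderboard": 60,
    "run": 15 * 60,
    "entity_scores": 5 * 60,
}


class InMemoryCache:
    """Stand-in exposing get_json/set_json; not the real Redis-backed module."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}

    def get_json(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, raw = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return json.loads(raw)

    def set_json(self, key: str, value: Any, ttl_seconds: int) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, json.dumps(value))


def read_through(cache, namespace: str, suffix: str, load: Callable[[], Any]) -> Any:
    """Check the cache first; on a miss, load from the existing source and store it."""
    key = f"{namespace}:{suffix}"
    cached = cache.get_json(key)
    if cached is not None:
        return cached
    value = load()  # e.g. the stats_summary read that existed before this PR
    cache.set_json(key, value, ttl_seconds=TTL_SECONDS[namespace])
    return value


calls = {"n": 0}


def load_stats() -> dict:
    calls["n"] += 1
    return {"total_runs": 3}


cache = InMemoryCache()
first = read_through(cache, "stats", "all", load_stats)
second = read_through(cache, "stats", "all", load_stats)
print(first == second, calls["n"])  # True 1
```

The second call is served from the cache, so the loader runs once per TTL window rather than once per request.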

Infra

redis:7-alpine service in docker-compose.yml (dev), docker-compose.beta.yml, docker-compose.prod.yml. allkeys-lru eviction, AOF off (rebuildable from Mongo on miss), maxmemory 512 MB in prod / 128 MB in beta+dev. Healthcheck in prod.
REDIS_URL env var passed into the backend in each compose, with sensible defaults that point at the bundled redis service. When unset, every cache call no-ops -- the existing data paths run unchanged.
redis==5.2.1 added to requirements.txt.

Operational story

Redis down or unreachable → all reads miss, all writes no-op. Every endpoint still works, just at pre-PR latency. No 500s.
Cache cold after restart / deploy → the existing Mongo materialization (stats_summary, leaderboard_summary, entity_stats_snapshot) is still warm, so first requests are served in milliseconds by a Mongo find_one rather than by the live aggregation. Redis warms up naturally on traffic.
Memory pressure → allkeys-lru evicts the least recently used keys; the maxmemory cap protects the host.
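The first two scenarios above can be read as a fixed fallback order: Redis, then the Mongo materialization, then the live aggregation. A small sketch of that order, under the assumption that each layer is a plain callable; `layered_read` and the layer labels are illustrative names, and the try/except stands in for the error handling that the cache module itself performs.

```python
from typing import Any, Callable, Optional, Tuple


def layered_read(
    redis_get: Callable[[], Optional[Any]],
    summary_get: Callable[[], Optional[Any]],
    aggregate: Callable[[], Any],
) -> Tuple[Any, str]:
    """Return (value, layer), where layer names the first source that answered."""
    try:
        value = redis_get()
    except Exception:
        value = None  # Redis down or unreachable: treat it as a miss
    if value is not None:
        return value, "redis"
    value = summary_get()
    if value is not None:
        return value, "mongo_summary"
    return aggregate(), "live_aggregation"


def redis_down() -> None:
    raise ConnectionError("redis unreachable")


stats = {"runs": 3}

# Redis down: the request is still served, from the Mongo materialization.
print(layered_read(redis_down, lambda: stats, lambda: stats)[1])    # mongo_summary
# Redis cold after a deploy: same path, no error.
print(layered_read(lambda: None, lambda: stats, lambda: stats)[1])  # mongo_summary
# Redis warm: served from layer 0.
print(layered_read(lambda: stats, lambda: None, lambda: stats)[1])  # redis
```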

What this unlocks for the next PR

Adding a new cache target is app_cache.get_json(key) / app_cache.set_json(key, val, ttl_seconds=N) -- no additional infrastructure. Obvious next wins:

/api/cards/*, /api/relics/*, /api/potions/* list responses (high QPS, deterministic per (entity, lang) between deploys; could go on a multi-hour TTL with a deploy-time delete_pattern).
/api/auth/me (one Mongo find_one per request becomes one Redis GET; cookie-keyed).
slowapi rate-limit storage so limits become cluster-wide and survive worker restarts (slowapi[redis] + a config line).
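For the list-response idea, a deploy-time invalidation could look roughly like the sketch below. The `list:{entity}:{lang}` key scheme, the `list_key` helper, and the batch size are assumptions, not something this PR defines. `FakeRedis` is a stand-in that mimics the redis-py `scan_iter(match=...)` and `delete(*names)` calls so the example runs without a server; its fnmatch-based matching only approximates Redis glob rules.

```python
import fnmatch
from typing import Dict, Iterator, List, Optional


class FakeRedis:
    """Tiny stand-in exposing the two redis-py calls used by delete_pattern."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def scan_iter(self, match: Optional[str] = None, count: Optional[int] = None) -> Iterator[str]:
        for key in list(self.store):  # snapshot, so deleting mid-scan is safe here
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def delete(self, *names: str) -> int:
        removed = 0
        for name in names:
            if name in self.store:
                del self.store[name]
                removed += 1
        return removed


def list_key(entity: str, lang: str) -> str:
    """Hypothetical key scheme: one entry per (entity, lang) list response."""
    return f"list:{entity}:{lang}"


def delete_pattern(client, pattern: str, batch: int = 500) -> int:
    """Delete matching keys in batches, iterating with SCAN rather than KEYS."""
    deleted = 0
    pending: List[str] = []
    for key in client.scan_iter(match=pattern, count=batch):
        pending.append(key)
        if len(pending) >= batch:
            deleted += client.delete(*pending)
            pending.clear()
    if pending:
        deleted += client.delete(*pending)
    return deleted


r = FakeRedis()
for entity, lang in [("cards", "en"), ("cards", "fr"), ("relics", "en")]:
    r.store[list_key(entity, lang)] = "[]"

# At deploy time, drop every cached cards list and leave other entities alone.
print(delete_pattern(r, "list:cards:*"))  # 2
print(sorted(r.store))                    # ['list:relics:en']
```

SCAN-based iteration keeps the invalidation from blocking Redis the way a single KEYS call over a large keyspace would, which matters once the multi-hour TTL lets these keys accumulate.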

Compose deploy note

When this lands, the prod compose changes (Redis service + REDIS_URL passthrough) require a docker compose -f docker-compose.prod.yml up -d on the box to bring up the new spire-codex-redis container. The backend image alone won't pick up Redis until the compose is re-applied.

130k uploads/week from 9k a week earlier means every uncached query is amplifying Mongo load. This adds a Redis layer (per-namespace, fail-safe, opt-in via REDIS_URL) and wires it into the four hottest read paths.

Cache substrate

- backend/app/services/cache.py: lazy redis-py client, JSON+raw helpers, glob-pattern invalidation via SCAN. Every operation is fail-safe -- Redis being unavailable returns None on get / no-ops on set, so the caller falls back to its existing data source. The cache is always an optimization, never load-bearing.
- backend/app/metrics.py: Prometheus hit/miss/error counters keyed on the key namespace (stats / leaderboard / run / entity_scores).

Wired in

- /api/runs/stats: layer 0 ahead of stats_summary; same write-through shape we already use for the lazy stats materialization. 60s TTL.
- /api/runs/leaderboard: 60s TTL matching the leader-refresher cycle.
- /api/runs/shared/{hash}: 6h TTL; runs are immutable once submitted so this turns share-link scrapes into pure Redis reads.
- /api/runs/scores/{entity_type}: 5m TTL; tier-list / detail-page sort columns hit this constantly between snapshot rebuilds.

Infra

- redis service in dev / beta / prod compose. allkeys-lru eviction, AOF off (rebuildable from Mongo on miss), 512m cap in prod / 128m in beta+dev. Healthcheck on prod.
- REDIS_URL passed into the backend service in each compose. Empty default in dev is fine -- the client init no-ops.

Forward path

Adding a new cache target is `app_cache.get_json(key)` / `app_cache.set_json(key, val, ttl_seconds=N)` -- no further infrastructure work. Next obvious wins: /api/cards/* and /api/relics/* list responses (high QPS, deterministic per (entity, lang) between deploys), /api/auth/me (one find_one per request becomes one Redis GET), and slowapi rate-limit storage so limits become cluster-wide.

Per follow-up on #388: runs are immutable, but caching every run that's ever been viewed for 6h means the runs collection growth pulls Redis memory along with it. 15min absorbs Discord share-link bursts onto a single URL (which is the main reuse pattern), keeps username claims/renames propagating quickly, and lets cold runs naturally drop off the LRU instead of squatting.

ptrlrd added 2 commits May 31, 2026 23:07

ptrlrd mentioned this pull request Jun 1, 2026

Lift per-entity stats caps to cover the full game catalog #393

Open

ptrlrd wants to merge 2 commits into main from feat/redis-cache
