Data contribution: 4-session workload comparison + fallback-percentage stability + usage-pattern spoofing variable #4

@cnighswonger

Description

Four long-running sessions from the same Max 5x account (US), spanning March 13 to April 16. Three without the interceptor, one with. All Opus 4.6.

Session comparison

| Metric | Cache Agent (research) | Code Agent (ML) | Sim Agent (weather) | E3B Agent (nowcast) |
| --- | --- | --- | --- | --- |
| Duration | 11 days | 21 days | 17 days | 6 days |
| API calls | 4,420 | 7,911 | 6,568 | 467 |
| Cache hit rate | 99.4% | 98.2% | 97.9% | 96.6% |
| Cache creation | 9.4M | 53.1M | 27.6M | 2.4M |
| Cold starts (>100K) | 10 | 117 | 73 | 12 |
| Cold start freq | 1/442 | 1/68 | 1/90 | 1/39 |
| Max cache_read/turn | ~460K | | 760K | 283K |
| Interceptor | Yes | No | No | No |
| Haiku spoofing | 0 | Not measured | 0 | 0 |
| Synthetic calls | 0 | 0 | 14 | 1 |
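
The cold-start frequency row is just API calls divided by cold starts. A minimal sketch reproducing it from the per-session counts above (the dict layout is illustrative, not the session JSONL schema):

```python
# Per-session counts copied from the comparison table; the dict shape is
# illustrative, not the actual CC session JSONL schema.
sessions = {
    "Cache Agent": {"api_calls": 4420, "cold_starts": 10},
    "Code Agent":  {"api_calls": 7911, "cold_starts": 117},
    "Sim Agent":   {"api_calls": 6568, "cold_starts": 73},
    "E3B Agent":   {"api_calls": 467,  "cold_starts": 12},
}

for name, s in sessions.items():
    interval = round(s["api_calls"] / s["cold_starts"])
    print(f"{name}: 1 cold start per {interval} calls")
```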

Workload descriptions

  • Cache Agent: Research, community management, blog writing, issue triage. Text-heavy, small tool results per turn. Running with the claude-code-cache-fix interceptor since April 11.
  • Code Agent: Heavy ML inference engine development (kanfei weather system). Large file reads, frequent edits, big diffs. No interceptor.
  • Sim Agent: Mixed coding + running weather simulations. External processes monitored by cron-driven status checks every 5 minutes. Intermittent: heavy coding bursts → lightweight cron checks → heavy coding. No interceptor.
  • E3B Agent: Code-intensive weather inference work. Short session, no interceptor.

Key findings

1. Interceptor impact: 99.4% vs 96-98%

The interceptor session (Cache Agent) shows a consistent 1-3 percentage-point cache hit rate improvement over the three sessions without it. Small in percentage terms but large in token cost: at billions of cache_read tokens, every percentage point represents millions of tokens of unnecessary cache creation.
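
To put a number on "every percentage point": at a hypothetical 2B cache_read tokens (an illustrative order of magnitude, not a figure measured from these sessions), one percentage point of misses is 20M tokens of avoidable cache creation:

```python
# Illustrative order-of-magnitude arithmetic; 2B is a hypothetical total,
# not a measurement from the sessions above.
cache_read_tokens = 2_000_000_000
one_point = cache_read_tokens // 100  # tokens behind one percentage point
print(f"{one_point:,} tokens of avoidable cache creation per point")
```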

2. Workload type drives cold start frequency

| Agent | Workload | Cold start freq |
| --- | --- | --- |
| Cache Agent | Research/writing | 1 per 442 calls |
| Sim Agent | Mixed coding + sim | 1 per 90 calls |
| Code Agent | Heavy coding | 1 per 68 calls |
| E3B Agent | Intensive coding | 1 per 39 calls |

Coding workloads bust the cache roughly 5-11x more often than research workloads, even on the same account and plan.
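
The 5-11x spread is each coding workload's cold-start interval compared against the research baseline:

```python
# Ratios behind the "5-11x" claim: research interval / coding interval.
research = 442  # Cache Agent: 1 cold start per 442 calls
coding = {"Sim Agent": 90, "Code Agent": 68, "E3B Agent": 39}

ratios = {name: research / interval for name, interval in coding.items()}
# Sim ~4.9x, Code ~6.5x, E3B ~11.3x more frequent cold starts
```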

3. fallback-percentage: 0.5, invariant

Across 14,000+ metered calls (claude-code-meter telemetry, April 4-16), anthropic-ratelimit-unified-fallback-percentage was 0.5 on every single call, with zero variance. This resolves your monitoring item: on Max 5x US, the value does not change over time.
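
A minimal sketch of the invariance check, assuming one JSONL record per metered call with response headers stored under a "headers" key (an assumed shape, not claude-code-meter's documented schema):

```python
import json

HEADER = "anthropic-ratelimit-unified-fallback-percentage"

def header_values(lines, header=HEADER):
    """Distinct values a response header takes across JSONL telemetry records."""
    return {json.loads(line)["headers"].get(header) for line in lines}

# A single-element set {"0.5"} means zero variance across every call:
# with open("meter.jsonl") as f:
#     print(header_values(f))
```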

4. Usage pattern as spoofing variable

Sim Agent hit 760K cache_read/turn across a 17-day session with zero Haiku spoofing. That is well above the ~500K threshold @fgrosswig identified in his live capture, where Haiku bursts appeared at 587K.

The difference appears to be the request-intensity pattern. Sim Agent's 5-minute cron checks create natural gaps between coding bursts, so the server never sees sustained high-volume pressure. fgrosswig's spoofing was captured during an 8-hour continuous intensive session.

We're not discounting the session-length/cache-size theory; it's a factor. But usage pattern (sustained burst vs intermittent) may carry the largest weight in triggering model substitution.
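
One way to make "sustained vs intermittent" measurable from request timestamps is the length of the longest run with no gap above a threshold. The 300-second default below mirrors Sim Agent's cron cadence; it is an analysis heuristic, not a known server-side rule:

```python
def longest_sustained_run(timestamps, max_gap=300):
    """Longest span (seconds) of requests with no inter-request gap over max_gap.

    timestamps: sorted request times in epoch seconds. The 300s default
    mirrors Sim Agent's 5-minute cron cadence (a heuristic, not a server rule).
    """
    if not timestamps:
        return 0
    longest, run_start = 0, timestamps[0]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            run_start = cur  # gap breaks the run; start a new one
        longest = max(longest, cur - run_start)
    return longest
```

An 8-hour continuous session would score ~28,800 here, while Sim Agent's burst/cron/burst rhythm would cap out at the length of its longest coding burst.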

5. Cache hit rate improved over time (all sessions)

| Agent | Week 1 | Week 2 | Week 3 |
| --- | --- | --- | --- |
| Cache Agent (interceptor) | 99.2% | 99.5% | 99.9% |
| Sim Agent (no interceptor) | 93.1% | 96.3% | 99.1% |
| Code Agent (no interceptor) | 99.5% | 96.9%* | 98.9% |

*Code Agent Week 2 dip coincides with the ~March 6 TTL change window.

Even without the interceptor, sessions that survive long enough show improving cache hit rates as the prefix stabilizes. The interceptor accelerates this stabilization.
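
The weekly rates can be recomputed from session records using a token-weighted definition (reads over reads plus creations). Both the field names and that definition are assumptions for illustration, not the actual CC session JSONL schema:

```python
from collections import defaultdict

def weekly_hit_rate(calls, session_start):
    """Token-weighted cache hit rate per session week.

    calls: records with assumed fields "ts" (epoch seconds),
    "cache_read_tokens", and "cache_creation_tokens" -- hypothetical
    names, not the actual CC session JSONL schema.
    """
    weeks = defaultdict(lambda: [0, 0])  # week index -> [read, created]
    for c in calls:
        week = int((c["ts"] - session_start) // (7 * 86400))
        weeks[week][0] += c["cache_read_tokens"]
        weeks[week][1] += c["cache_creation_tokens"]
    return {w: read / (read + created)
            for w, (read, created) in sorted(weeks.items())}
```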

Data availability

All telemetry is captured via claude-code-meter (metered window) and CC session JSONLs (full sessions). Happy to share specific subsets if useful for your analysis.
