Four long-running sessions from the same Max 5x account (US), spanning March 13 — April 16. Three without the interceptor, one with. All Opus 4.6.
## Session comparison
| Metric | Cache Agent (research) | Code Agent (ML) | Sim Agent (weather) | E3B Agent (nowcast) |
|---|---|---|---|---|
| Duration | 11 days | 21 days | 17 days | 6 days |
| API calls | 4,420 | 7,911 | 6,568 | 467 |
| Cache hit rate | 99.4% | 98.2% | 97.9% | 96.6% |
| Cache creation (tokens) | 9.4M | 53.1M | 27.6M | 2.4M |
| Cold starts (>100K) | 10 | 117 | 73 | 12 |
| Cold start freq | 1/442 | 1/68 | 1/90 | 1/39 |
| Max cache_read/turn (tokens) | ~460K | — | 760K | 283K |
| Interceptor | Yes | No | No | No |
| Haiku spoofing | 0 | Not measured | 0 | 0 |
| Synthetic calls | 0 | 0 | 14 | 1 |
## Workload descriptions
- Cache Agent: Research, community management, blog writing, issue triage. Text-heavy, small tool results per turn. Running with claude-code-cache-fix interceptor from April 11.
- Code Agent: Heavy ML inference engine development (kanfei weather system). Large file reads, frequent edits, big diffs. No interceptor.
- Sim Agent: Mixed coding + running weather simulations. External processes monitored by cron-driven status checks every 5 minutes. Intermittent: heavy coding bursts → lightweight cron checks → heavy coding. No interceptor.
- E3B Agent: Code-intensive weather inference work. Short session, no interceptor.
## Key findings
### 1. Interceptor impact: 99.4% vs 96-98%
The interceptor session (Cache Agent) versus the three without shows a consistent 1-3 percentage-point cache hit rate improvement. Small in relative terms, large in token cost: at billions of cache_read tokens, every percentage point is millions of tokens of unnecessary cache creation.
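For reference, the per-call and per-session hit rates in the tables above can be computed from the usage block the Messages API returns. A minimal sketch, assuming the standard `cache_read_input_tokens` / `cache_creation_input_tokens` / `input_tokens` usage fields (the aggregation choice of token-weighting rather than averaging per-call rates is ours):

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from cache for one API call."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    total = read + created + uncached
    return read / total if total else 0.0

def session_hit_rate(calls: list[dict]) -> float:
    """Token-weighted hit rate over a whole session: sum first, divide once,
    so long turns weigh more than short ones."""
    read = sum(c.get("cache_read_input_tokens", 0) for c in calls)
    total = read + sum(
        c.get("cache_creation_input_tokens", 0) + c.get("input_tokens", 0)
        for c in calls
    )
    return read / total if total else 0.0
```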
### 2. Workload type drives cold start frequency

| Agent | Workload | Cold start freq |
|---|---|---|
| Cache Agent | Research/writing | 1 per 442 calls |
| Sim Agent | Mixed coding+sim | 1 per 90 calls |
| Code Agent | Heavy coding | 1 per 68 calls |
| E3B Agent | Intensive coding | 1 per 39 calls |
Coding workloads bust cache 5-11x more often than research workloads, even on the same account and plan.
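The cold-start counts can be reproduced by scanning a session's per-call usage records and flagging any call whose cache creation exceeds the 100K-token threshold the tables use. A sketch; the record shape (one usage dict per call) is an assumption, so adapt the field access to your JSONL layout:

```python
COLD_START_TOKENS = 100_000  # threshold used in the tables above

def cold_start_stats(calls: list[dict]) -> tuple[int, float]:
    """Return (cold_start_count, calls_per_cold_start) for a session.

    A 'cold start' here is any call that had to rebuild more than
    COLD_START_TOKENS of cache, i.e. the prefix was largely evicted.
    """
    cold = sum(
        1 for c in calls
        if c.get("cache_creation_input_tokens", 0) > COLD_START_TOKENS
    )
    freq = len(calls) / cold if cold else float("inf")
    return cold, freq
```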
### 3. fallback-percentage: 0.5, invariant
Across 14,000+ metered calls (claude-code-meter telemetry, April 4-16), `anthropic-ratelimit-unified-fallback-percentage` was 0.5 on every single call, with zero variance. This resolves your monitoring item: on Max 5x US, it does not change over time.
### 4. Usage pattern as spoofing variable
Sim Agent hit 760K cache_read/turn across a 17-day session with zero Haiku spoofing. This is above the ~500K threshold @fgrosswig identified in his live capture where Haiku bursts appeared at 587K.
The difference appears to be request intensity pattern. Sim Agent's 5-minute cron checks create natural gaps between coding bursts. The server never sees sustained high-volume pressure. fgrosswig's spoofing was captured during an 8-hour continuous intensive session.
We're not discounting the session-length/cache-size theory — it's a factor. But usage pattern (sustained burst vs intermittent) may carry the largest weight in triggering model substitution.
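To make "sustained burst vs intermittent" measurable rather than impressionistic, one crude proxy is the distribution of gaps between consecutive request timestamps. A sketch under stated assumptions: epoch-second timestamps, and an arbitrary 60-second cutoff (our choice, not from the captures) separating in-burst traffic from idle gaps:

```python
def gap_profile(timestamps: list[float], burst_gap_s: float = 60.0) -> dict:
    """Crude sustained-vs-intermittent classifier for one session.

    Splits inter-request gaps at `burst_gap_s` (an arbitrary cutoff):
    cron-driven traffic like Sim Agent's shows many ~300s idle gaps,
    while a continuous intensive session shows almost none.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    idle = [g for g in gaps if g > burst_gap_s]
    return {
        "calls": len(timestamps),
        "idle_gaps": len(idle),
        "longest_idle_s": max(idle, default=0.0),
    }
```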
### 5. Cache hit rate improved over time (all sessions)

| Agent | Week 1 | Week 2 | Week 3 |
|---|---|---|---|
| Cache Agent (interceptor) | 99.2% | 99.5% | 99.9% |
| Sim Agent (no interceptor) | 93.1% | 96.3% | 99.1% |
| Code Agent (no interceptor) | 99.5% | 96.9%* | 98.9% |
*Code Agent Week 2 dip coincides with the ~March 6 TTL change window.
Even without the interceptor, sessions that survive long enough show improving cache hit rates as the prefix stabilizes. The interceptor accelerates this stabilization.
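The week-over-week numbers fall out of bucketing calls into 7-day windows from the first call and applying the token-weighted hit rate per bucket. A sketch, assuming each record carries an ISO-8601 `timestamp` alongside the usual usage fields (the field name is an assumption):

```python
from datetime import datetime

def weekly_hit_rates(calls: list[dict]) -> list[float]:
    """Token-weighted cache hit rate per 7-day bucket from the first call.

    Assumes each record has an ISO-8601 `timestamp` string plus the
    standard cache/input usage fields; calls are in chronological order.
    """
    if not calls:
        return []
    t0 = datetime.fromisoformat(calls[0]["timestamp"])
    buckets: dict[int, list[int]] = {}  # week index -> [read, total]
    for c in calls:
        week = (datetime.fromisoformat(c["timestamp"]) - t0).days // 7
        read = c.get("cache_read_input_tokens", 0)
        total = read + c.get("cache_creation_input_tokens", 0) \
                     + c.get("input_tokens", 0)
        b = buckets.setdefault(week, [0, 0])
        b[0] += read
        b[1] += total
    return [buckets[w][0] / buckets[w][1] if buckets[w][1] else 0.0
            for w in sorted(buckets)]
```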
## Data availability
All telemetry is captured via claude-code-meter (metered window) and CC session JSONLs (full sessions). Happy to share specific subsets if useful for your analysis.