Data contribution: 4-session workload comparison + fallback-percentage stability + usage-pattern spoofing variable #4

@cnighswonger

Description

Four long-running sessions from the same Max 5x account (US), spanning March 13 to April 16. Three without the interceptor, one with. All Opus 4.6.

Session comparison

| Metric | Cache Agent (research) | Code Agent (ML) | Sim Agent (weather) | E3B Agent (nowcast) |
| --- | --- | --- | --- | --- |
| Duration | 11 days | 21 days | 17 days | 6 days |
| API calls | 4,420 | 7,911 | 6,568 | 467 |
| Cache hit rate | 99.4% | 98.2% | 97.9% | 96.6% |
| Cache creation | 9.4M | 53.1M | 27.6M | 2.4M |
| Cold starts (>100K) | 10 | 117 | 73 | 12 |
| Cold start freq | 1/442 | 1/68 | 1/90 | 1/39 |
| Max cache_read/turn | ~460K | | 760K | 283K |
| Interceptor | Yes | No | No | No |
| Haiku spoofing | 0 | Not measured | 0 | 0 |
| Synthetic calls | 0 | 0 | 14 | 1 |
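
The cold-start frequency row is just API calls divided by cold starts. A minimal sketch reproducing it from the per-session counts above (the dict layout is illustrative, not the session JSONL schema):

```python
# Per-session counts copied from the comparison table; the dict shape is
# illustrative, not the actual CC session JSONL schema.
sessions = {
    "Cache Agent": {"api_calls": 4420, "cold_starts": 10},
    "Code Agent":  {"api_calls": 7911, "cold_starts": 117},
    "Sim Agent":   {"api_calls": 6568, "cold_starts": 73},
    "E3B Agent":   {"api_calls": 467,  "cold_starts": 12},
}

for name, s in sessions.items():
    interval = round(s["api_calls"] / s["cold_starts"])
    print(f"{name}: 1 cold start per {interval} calls")
```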

Workload descriptions

  • Cache Agent: Research, community management, blog writing, issue triage. Text-heavy, small tool results per turn. Running with the claude-code-cache-fix interceptor since April 11.
  • Code Agent: Heavy ML inference engine development (kanfei weather system). Large file reads, frequent edits, big diffs. No interceptor.
  • Sim Agent: Mixed coding + running weather simulations. External processes monitored by cron-driven status checks every 5 minutes. Intermittent: heavy coding bursts → lightweight cron checks → heavy coding. No interceptor.
  • E3B Agent: Code-intensive weather inference work. Short session, no interceptor.

Key findings

1. Interceptor impact: 99.4% vs 96-98%

The interceptor session (Cache Agent) shows a consistent 1-3 percentage-point cache hit rate improvement over the three sessions without it. Small in percentage terms but large in token cost: at billions of cache_read tokens, every percentage point represents millions of tokens of unnecessary cache creation.
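
To put a number on "every percentage point": at a hypothetical 2B cache_read tokens (an illustrative order of magnitude, not a figure measured from these sessions), one percentage point of misses is 20M tokens of avoidable cache creation:

```python
# Illustrative order-of-magnitude arithmetic; 2B is a hypothetical total,
# not a measurement from the sessions above.
cache_read_tokens = 2_000_000_000
one_point = cache_read_tokens // 100  # tokens behind one percentage point
print(f"{one_point:,} tokens of avoidable cache creation per point")
```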

2. Workload type drives cold start frequency

| Agent | Workload | Cold start freq |
| --- | --- | --- |
| Cache Agent | Research/writing | 1 per 442 calls |
| Sim Agent | Mixed coding + sim | 1 per 90 calls |
| Code Agent | Heavy coding | 1 per 68 calls |
| E3B Agent | Intensive coding | 1 per 39 calls |

Coding workloads bust the cache roughly 5-11x more often than research workloads, even on the same account and plan.
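
The 5-11x spread is each coding workload's cold-start interval compared against the research baseline:

```python
# Ratios behind the "5-11x" claim: research interval / coding interval.
research = 442  # Cache Agent: 1 cold start per 442 calls
coding = {"Sim Agent": 90, "Code Agent": 68, "E3B Agent": 39}

ratios = {name: research / interval for name, interval in coding.items()}
# Sim ~4.9x, Code ~6.5x, E3B ~11.3x more frequent cold starts
```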

3. fallback-percentage: 0.5, invariant

Across 14,000+ metered calls (claude-code-meter telemetry, April 4-16), anthropic-ratelimit-unified-fallback-percentage was 0.5 on every single call, with zero variance. This resolves your monitoring item: on Max 5x US, the value does not change over time.
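
A minimal sketch of the invariance check, assuming one JSONL record per metered call with response headers stored under a "headers" key (an assumed shape, not claude-code-meter's documented schema):

```python
import json

HEADER = "anthropic-ratelimit-unified-fallback-percentage"

def header_values(lines, header=HEADER):
    """Distinct values a response header takes across JSONL telemetry records."""
    return {json.loads(line)["headers"].get(header) for line in lines}

# A single-element set {"0.5"} means zero variance across every call:
# with open("meter.jsonl") as f:
#     print(header_values(f))
```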

4. Usage pattern as spoofing variable

Sim Agent hit 760K cache_read/turn across a 17-day session with zero Haiku spoofing. That is well above the ~500K threshold @fgrosswig identified in his live capture, where Haiku bursts appeared at 587K.

The difference appears to be the request-intensity pattern. Sim Agent's 5-minute cron checks create natural gaps between coding bursts, so the server never sees sustained high-volume pressure. fgrosswig's spoofing was captured during an 8-hour continuous intensive session.

We're not discounting the session-length/cache-size theory; it's a factor. But usage pattern (sustained burst vs intermittent) may carry the largest weight in triggering model substitution.
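
One way to make "sustained vs intermittent" measurable from request timestamps is the length of the longest run with no gap above a threshold. The 300-second default below mirrors Sim Agent's cron cadence; it is an analysis heuristic, not a known server-side rule:

```python
def longest_sustained_run(timestamps, max_gap=300):
    """Longest span (seconds) of requests with no inter-request gap over max_gap.

    timestamps: sorted request times in epoch seconds. The 300s default
    mirrors Sim Agent's 5-minute cron cadence (a heuristic, not a server rule).
    """
    if not timestamps:
        return 0
    longest, run_start = 0, timestamps[0]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            run_start = cur  # gap breaks the run; start a new one
        longest = max(longest, cur - run_start)
    return longest
```

An 8-hour continuous session would score ~28,800 here, while Sim Agent's burst/cron/burst rhythm would cap out at the length of its longest coding burst.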

5. Cache hit rate improved over time (all sessions)

| Agent | Week 1 | Week 2 | Week 3 |
| --- | --- | --- | --- |
| Cache Agent (interceptor) | 99.2% | 99.5% | 99.9% |
| Sim Agent (no interceptor) | 93.1% | 96.3% | 99.1% |
| Code Agent (no interceptor) | 99.5% | 96.9%* | 98.9% |

*Code Agent Week 2 dip coincides with the ~March 6 TTL change window.

Even without the interceptor, sessions that survive long enough show improving cache hit rates as the prefix stabilizes. The interceptor accelerates this stabilization.
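
The weekly rates can be recomputed from session records using a token-weighted definition (reads over reads plus creations). Both the field names and that definition are assumptions for illustration, not the actual CC session JSONL schema:

```python
from collections import defaultdict

def weekly_hit_rate(calls, session_start):
    """Token-weighted cache hit rate per session week.

    calls: records with assumed fields "ts" (epoch seconds),
    "cache_read_tokens", and "cache_creation_tokens" -- hypothetical
    names, not the actual CC session JSONL schema.
    """
    weeks = defaultdict(lambda: [0, 0])  # week index -> [read, created]
    for c in calls:
        week = int((c["ts"] - session_start) // (7 * 86400))
        weeks[week][0] += c["cache_read_tokens"]
        weeks[week][1] += c["cache_creation_tokens"]
    return {w: read / (read + created)
            for w, (read, created) in sorted(weeks.items())}
```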

Data availability

All telemetry is captured via claude-code-meter (metered window) and CC session JSONLs (full sessions). Happy to share specific subsets if useful for your analysis.
