quota-composition-breakdown.mjs.md
cache-read-weight-transition.mjs.md
dual-window-simulation.mjs.md
thinking-token-estimation.mjs.md
rates.json.md
tool-use-deep-dive.mjs.md
Independent corroboration of dual-window findings with 179K API calls (Dec 2025 - Apr 2026)
Your rate-limit header analysis in 02_RATELIMIT-HEADERS.md identified the dual sliding window system and measured per-1% utilization costs. You noted a key limitation: "No before-data. Proxy started April 4."
We have before-data. 178,009 API calls across 5,605 session files from a Max 20x ($200/mo) account, spanning Dec 23, 2025 to April 11, 2026. We built four analysis scripts to validate your findings against the full transition period.
## Your benchmarks hold up
Our CacheRead per estimated 1% utilization: 1.62M-1.72M tokens. Your range: 1.5M-2.1M. Dead center.
Our cache_read share of visible tokens: 89.8-95.2% across months. Your measurement: 96-99%. Close match. The slight gap likely comes from our higher cache_write share in months with more cold starts.
## The cache_read weight change is the whole story
We computed daily quota under two formulas: the old (cache_read = 0x) and the new (cache_read = 1x).
| Month | Quota (0x) | Quota (1x) | Multiplier |
|-------|-----------:|-----------:|-----------:|
| Feb   | 219.1M     | 2,153.6M   | 9.8x       |
| Mar   | 394.2M     | 4,838.0M   | 12.3x      |
| Apr   | 150.1M     | 2,200.9M   | 14.7x      |
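As a sketch, the two regimes differ only in the weight applied to `cache_read_input_tokens`. Treating every other category at 1x is a simplifying assumption for illustration (the real weights for cache writes and output are not confirmed here); the field names follow the API usage object:

```javascript
// Toy quota computation; only the cache_read weight differs between
// the old (0x) and new (1x) regimes. All other weights are assumed
// to be 1x purely for illustration.
function dailyQuota(calls, cacheReadWeight) {
  return calls.reduce((sum, u) =>
    sum +
    (u.input_tokens ?? 0) +
    (u.cache_creation_input_tokens ?? 0) +
    cacheReadWeight * (u.cache_read_input_tokens ?? 0) +
    (u.output_tokens ?? 0), 0);
}

// A call whose visible tokens are ~95% cache_read:
const calls = [{
  input_tokens: 2_000,
  cache_creation_input_tokens: 2_500,
  cache_read_input_tokens: 95_000,
  output_tokens: 500,
}];
dailyQuota(calls, 0); // old regime: 5,000 tokens count
dailyQuota(calls, 1); // new regime: 100,000 tokens count (20x)
```

With cache_read at ~95% of visible volume, flipping the weight from 0x to 1x multiplies counted quota by ~20x, which is the same mechanism behind the 10-15x daily multipliers in the table.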
The multiplier is 10-15x on every single day since December 2025. Cache_read has always been 90-95% of visible token volume. The ratio doesn't jump at a transition date. What changed is whether the quota system applies that weight.
Under the old formula, our maximum quota consumption in any 5-hour window across the entire dataset was 21.2M tokens, or 11.8% of the budget. We would never have been rate-limited. Not in any month.
## 5-hour window simulation confirms the transition
We bucketed all calls into 5-hour tumbling windows using your midpoint (100% = ~180M visible tokens):
| Month | Windows | Avg Util | Max Util | >80% | >100% |
|-------|--------:|---------:|---------:|-----:|------:|
| Feb   | 70      | 17.0%    | 57.4%    | 0    | 0     |
| Mar   | 103     | 27.0%    | 123.9%   | 7    | 1     |
| Apr   | 47      | 23.1%    | 156.1%   | 1    | 1     |
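The bucketing itself is simple; a minimal sketch, where `visibleTokens` is assumed to be the sum of all four usage fields and 180M is the midpoint quoted above:

```javascript
// Assign each call to a 5-hour tumbling window, then report each
// window's utilization as visible tokens over the ~180M midpoint.
const WINDOW_MS = 5 * 60 * 60 * 1000;
const BUDGET = 180_000_000; // 100% utilization midpoint

function windowUtilizations(calls) {
  const windows = new Map();
  for (const { ts, visibleTokens } of calls) {
    const key = Math.floor(ts / WINDOW_MS); // tumbling, not sliding
    windows.set(key, (windows.get(key) ?? 0) + visibleTokens);
  }
  return [...windows.values()].map(total => total / BUDGET);
}
```

Note the caveat: a true sliding window needs per-call expiry, so tumbling windows are a coarser approximation that can undercount peaks straddling a boundary.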
February: zero windows exceeded 80%. March: seven crossed 80%, one crossed 100%. That lines up with when we first hit rate limits.
Under 0x weight, every one of those windows was fine. The March window that crossed 100% had a quota(0x) of just 15.3M, or 8.5%.
## The thinking token "blind spot" is not a blind spot
You flagged thinking tokens as a major unknown. We can measure them because our JSONL logs contain the actual content blocks.
Thinking blocks appear in 1.9-36.7% of calls. Average size: 112-664 characters (~28-166 tokens). Quota impact:
| Month | Gap Tokens | Gap * 5x Weight | % of Total Quota |
|-------|-----------:|----------------:|-----------------:|
| Feb   | 0.01M      | 0.03M           | 0.0%             |
| Mar   | 0.06M      | 0.29M           | 0.0%             |
| Apr   | 0.49M      | 2.44M           | 0.1%             |
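The character-to-token conversion behind the ~28-166 token figures is the usual ~4 characters/token heuristic, not an exact tokenizer; a sketch:

```javascript
// Estimate thinking tokens from JSONL content blocks using the rough
// 4-characters-per-token heuristic (an approximation, not a tokenizer).
const CHARS_PER_TOKEN = 4;

function estimateThinkingTokens(contentBlocks) {
  return contentBlocks
    .filter(block => block.type === 'thinking')
    .reduce((sum, block) =>
      sum + Math.round(block.thinking.length / CHARS_PER_TOKEN), 0);
}

// A 112-character thinking block estimates to ~28 tokens, the low end
// of the range above; non-thinking blocks are ignored.
estimateThinkingTokens([
  { type: 'thinking', thinking: 'x'.repeat(112) },
  { type: 'text', text: 'visible answer' },
]); // 28
```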
Thinking tokens contribute 0.0-0.1% of total quota. The real hidden output comes from server-side iterations (a usage field that grew from 7.7% of calls in Feb to 52.4% in Apr, each producing 60x more output tokens). We documented this separately in anthropics/claude-code#45756.
## @fgrosswig's 64x reduction in our data
Our heaviest pre-transition day (March 21): 356.2M visible tokens, quota(0x) = 27.4M, quota(1x) = 359.0M. A 13.1x multiplier. On peak days the multiplier reaches 18.9x to 31.0x.
A 64x multiplier requires cache_read at ~98.4% of visible tokens. You measured 96-99%. The numbers fit. Accounts with higher R:W ratios (less cache_write from cold starts) would see multipliers closer to 64x.
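The arithmetic behind that ~98.4% figure: if cache_read is a fraction f of visible tokens and, as a back-of-envelope assumption, every other category counts at 1x in both regimes, the 0x-to-1x multiplier is 1 / (1 - f).

```javascript
// Multiplier implied by a given cache_read share of visible tokens,
// assuming all other categories weigh 1x under both regimes.
const multiplier = f => 1 / (1 - f);

multiplier(0.95);     // ≈ 20x, typical of this dataset
multiplier(0.984375); // = 64x, the share implied by the 64x report
```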
## Counterfactual
| Month | Active Days | Days > 180M quota(0x) | Days > 180M quota(1x) |
|-------|------------:|----------------------:|----------------------:|
| Feb   | 26          | 0                     | 1                     |
| Mar   | 31          | 0                     | 11                    |
| Apr   | 11          | 0                     | 6                     |
Zero days exceed budget under the old formula. Eighteen exceed it under the new formula.
## Scripts
All four scripts read the same 5,605 session files. They're in our analysis repo if you want to cross-reference:
- `quota-composition-breakdown.mjs` - Token composition under 0x vs 1x weight, ArkNill benchmark comparison
- `cache-read-weight-transition.mjs` - Daily/weekly quota under both regimes, transition detection, counterfactual
- `dual-window-simulation.mjs` - 5h tumbling window simulation, utilization distributions, 7d rolling peaks
- `thinking-token-estimation.mjs` - Thinking block prevalence, output gap analysis, blind spot cross-reference
Dataset: 178,009 non-synthetic API calls, 5,605 session files, Dec 23, 2025 to Apr 11, 2026, Max 20x ($200/mo), two machines (Linux workstation + Windows laptop).