feat(memory): single authoritative token accounting for compaction (fire + boundary) by mattzcarey · Pull Request #1720 · cloudflare/agents

mattzcarey · 2026-06-09T20:38:14Z

Design

The Session derives one authoritative context size in model tokens, and that single number drives both decisions: whether to compact, and where to cut. Because both decisions read the same number, they cannot disagree (the #1593 failure was exactly such a disagreement).

This is how earendil-works/pi does it (core/compaction/compaction.ts), and hermes-agent (agent/context_compressor.py) follows the same rule: model-reported usage is a whole-prompt signal, per-message decisions use a cheap proportional heuristic, and the two scales never mix.

Counting

Session._knownContextTokens() resolves in priority order:

1. The compactAfter() counter, if configured. It is whole-prompt by signature, so () => lastUsage.inputTokens is legal and documented.
2. Usage metadata on assistant messages. SessionMessage gained metadata?: unknown (an AI SDK UIMessage passes straight through), and metadata.usage / metadata.totalUsage is read the way pi reads it: calculateContextTokens returns totalTokens when present, else the sum of the components. The context estimate is the last reported usage plus the heuristic for any newer messages. When messages carry usage, configuration is one line:
   ```
   session.compactAfter(100_000);
   ```
3. Neither. The chars/4 heuristic applies, as before.
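A minimal sketch of that priority order, for readers who want to see the three branches side by side. This is not the package's code: `Msg`, `usageTotal`, `heuristic`, and `resolveContextTokens` are hypothetical names, `usageTotal` stands in for a total already extracted from the message metadata, and returning the heuristic sum in the third branch is an assumption about how "neither" is surfaced.

```typescript
// Hypothetical message shape; usageTotal stands for a total already extracted
// from metadata.usage / metadata.totalUsage.
type Msg = { role: "user" | "assistant" | "tool"; text: string; usageTotal?: number };

// chars/4 heuristic, rounded up.
const heuristic = (m: Msg): number => Math.ceil(m.text.length / 4);
const heuristicSum = (ms: Msg[]): number => ms.reduce((n, m) => n + heuristic(m), 0);

// Priority: whole-prompt counter > assistant usage metadata > plain heuristic.
function resolveContextTokens(
  messages: Msg[],
  wholePromptCounter?: (all: Msg[]) => number
): number {
  // 1. A configured counter is called once with the full history.
  if (wholePromptCounter) return wholePromptCounter(messages);

  // 2. Last reported assistant usage plus the heuristic for any newer messages.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "assistant" && m.usageTotal !== undefined && m.usageTotal > 0) {
      return m.usageTotal + heuristicSum(messages.slice(i + 1));
    }
  }

  // 3. Neither: chars/4 over everything (assumed fallback shape).
  return heuristicSum(messages);
}
```

The counter branch never looks at individual messages, which is the property the rest of the design relies on.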

Compaction

The same total flows to createCompactFunction as CompactContext.contextTokens. The tail walk is always the plain heuristic walk; when an authoritative total is known, the budget is converted into heuristic units (tailTokenBudget * heuristic / contextTokens, where heuristic is the heuristic estimate of the same context; this is equivalent to scaling every per-message estimate up to the model's scale). The relative distribution across messages is unchanged, and the budget is expressed in model tokens.
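A worked sketch of the budget conversion, under stated assumptions. The helper names `budgetInHeuristicUnits` and `findTailCut` and their signatures are hypothetical, and treating a zero total as "unknown" is my choice; only the conversion formula comes from the paragraph above.

```typescript
// Convert a budget in model tokens into heuristic units.
function budgetInHeuristicUnits(
  tailTokenBudget: number,
  heuristicTotal: number,
  contextTokens?: number
): number {
  // A missing, non-finite, or non-positive total is ignored: the raw budget applies.
  if (contextTokens === undefined || !Number.isFinite(contextTokens) || contextTokens <= 0) {
    return tailTokenBudget;
  }
  return (tailTokenBudget * heuristicTotal) / contextTokens;
}

// Walk from the newest message backwards and return the index of the first
// message kept in the tail. minTailMessages is kept even when over budget.
function findTailCut(estimates: number[], budget: number, minTailMessages: number): number {
  let used = 0;
  let cut = estimates.length;
  for (let i = estimates.length - 1; i >= 0; i--) {
    const cost = estimates[i] ?? 0;
    const kept = estimates.length - i; // tail size if message i is kept
    if (used + cost > budget && kept > minTailMessages) break;
    used += cost;
    cut = i;
  }
  return cut;
}

// Ten messages at 100 heuristic tokens each; the model reports 4000 in total.
const estimates = Array.from({ length: 10 }, () => 100);
const calibrated = budgetInHeuristicUnits(1200, 1000, 4000); // 300 heuristic units
const cutIndex = findTailCut(estimates, calibrated, 2); // keeps the last 3 messages
```

With the model at four times the heuristic scale, a 1200-token budget keeps three messages (300 heuristic units). Without the conversion, the same budget would cover the entire 1000-unit history and nothing would be cut.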

That is the whole #1593 fix. Previously the Session's whole-prompt counter was adapted into per-message calls, so every message appeared to exceed the budget and tailTokenBudget silently degraded to minTailMessages.
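To make the failure mode concrete, here is a sketch of a tail walk that accepts a per-message counter, roughly as the pre-fix path is described. `tailCutWithCounter`, `transcript`, and both counters are hypothetical names for illustration, not the removed code.

```typescript
// Tail walk driven by a caller-supplied per-message counter.
function tailCutWithCounter<T>(
  messages: T[],
  count: (m: T) => number,
  budget: number,
  minTailMessages: number
): number {
  let used = 0;
  let cut = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = count(messages[i] as T);
    const kept = messages.length - i;
    if (used + cost > budget && kept > minTailMessages) break;
    used += cost;
    cut = i;
  }
  return cut;
}

const transcript = Array.from({ length: 10 }, (_, i) => `message ${i}`);
const lastUsage = { inputTokens: 50_000 };

// Usage-style counter: ignores its argument and reports the whole prompt.
const usageStyle = (_m: string): number => lastUsage.inputTokens;
// Message-scoped counter for comparison.
const perMessage = (m: string): number => Math.ceil(m.length / 4);

// Every message "costs" 50_000, so only minTailMessages survive.
const degradedCut = tailCutWithCounter(transcript, usageStyle, 20_000, 2);
// A message-scoped counter fits the whole transcript inside the same budget.
const healthyCut = tailCutWithCounter(transcript, perMessage, 20_000, 2);
```

The walk itself is correct in both runs; the only difference is the scale of the number the counter returns, which is why removing the per-message counter closes the hole.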

There are no user counters in the boundary walk at all. pi has no per-message counter either. The per-message counter was the footgun behind #1593, so it is removed rather than documented against. Each remaining lever means one thing:

| Lever | Meaning |
| --- | --- |
| `compactAfter(threshold, { tokenCounter })` | whole-prompt total (fire + calibration) |
| assistant `metadata.usage` / `totalUsage` | same, zero config |
| `protectHead` / `tailTokenBudget` / `minTailMessages` | tunables, all in model tokens |

Breaking changes in the experimental package: CompactOptions.tokenCounter, the CompactTokenCounter type, and CompactContext.tokenCounter are removed. The changeset is minor.

Docs

docs/sessions.md gets a single "Token counting" section (priority order, usage-metadata example, new utils in the helper list), docs/think/index.md points at it, and experimental/session-memory/README.md documents the priority order under Auto-Compaction.

Tests

The tests assert the call contracts directly:

The end-to-end repro for #1593 ("Auto-compaction trigger fires, but compaction doesn't run due to internal token counting") checks that the whole-prompt counter only ever sees the full history, and that the boundary lands at the model-scale budget (toMessageId: "tool-5", not the minTailMessages degradation).
The same createCompactFunction no-ops on the raw heuristic and cuts correctly once the context supplies contextTokens.
compactAfter(1000) with no counter, and usage attached to the appended assistant message, fires and cuts at the calibrated boundary.
calculateContextTokens handles the totalTokens, input+output, prompt+completion and cached shapes. The newest assistant usage wins, usage on non-assistant messages is ignored, and no usage returns null.
A non-finite or negative contextTokens is ignored and the raw heuristic applies.
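The usage-extraction contracts in this list can be sketched as below. The field names are assumptions (camelCase as in the AI SDK, snake_case as in OpenAI-style responses), as are the function names and the choice to try `usage` before `totalUsage`; the PR only states that totalTokens, input+output, prompt+completion, and cached shapes are handled.

```typescript
type UsageLike = Record<string, unknown>;

// Accept only finite, positive numbers; anything else counts as 0.
const pos = (v: unknown): number =>
  typeof v === "number" && Number.isFinite(v) && v > 0 ? v : 0;

// totalTokens when present, else the sum of the components; null when nothing is usable.
function contextTokensFromUsage(u: UsageLike | undefined): number | null {
  if (!u) return null;
  const total = pos(u["totalTokens"]) || pos(u["total_tokens"]);
  if (total > 0) return total;
  const input = pos(u["inputTokens"]) || pos(u["prompt_tokens"]);
  const output = pos(u["outputTokens"]) || pos(u["completion_tokens"]);
  const cached = pos(u["cachedInputTokens"]); // assumed field name
  const sum = input + output + cached;
  return sum > 0 ? sum : null;
}

type Msg = { role: string; metadata?: { usage?: UsageLike; totalUsage?: UsageLike } };

// Newest assistant message with usable usage wins; other roles are skipped.
function latestAssistantUsage(messages: Msg[]): number | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (!m || m.role !== "assistant") continue;
    const t =
      contextTokensFromUsage(m.metadata?.usage) ??
      contextTokensFromUsage(m.metadata?.totalUsage);
    if (t !== null) return t;
  }
  return null;
}
```

Adding the cached count to the sum is what produces the over-count on providers where cached tokens are already part of the input count.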

Full experimental-memory suite: 193 passed (7 files). oxfmt and oxlint are clean, and the memory module has no typecheck errors.

Notes for maintainers

Usage extraction only reads assistant messages and only accepts positive token counts. The component-sum fallback mirrors pi and can over-count caches on providers where cached tokens are a subset of input tokens, which errs toward compacting earlier.
Calibration assumes the model total distributes across messages roughly in proportion to the heuristic. pi and hermes make the same assumption for trailing estimates, and it is strictly better than the previous behavior, which used an identical value for every message.
Users who need a custom tokenizer still have a path: put it on compactAfter() as a whole-prompt counter. Per-message precision in the walk bought little once the budget is correct at the model scale, and it was the only place the two token scales could be confused.
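For the custom-tokenizer path, a whole-prompt counter can be built by summing a tokenizer over the full history. The sketch below is illustrative only: `countText` is a whitespace stand-in for a real tokenizer, the `Msg` shape is hypothetical, and the exact argument the counter receives is an assumption based on the tests stating that it only ever sees the full history.

```typescript
// Hypothetical message shape for illustration.
type Msg = { role: string; text: string };

// Stand-in tokenizer: counts whitespace-separated words. Swap in a real tokenizer here.
const countText = (text: string): number =>
  text.split(/\s+/).filter(Boolean).length;

// Whole-prompt counter: returns one total for the full history it is given.
const wholePromptCounter = (messages: Msg[]): number =>
  messages.reduce((n, m) => n + countText(m.text), 0);

// Assumed wiring, following the lever table:
// session.compactAfter(100_000, { tokenCounter: wholePromptCounter });
```

Because the counter reports a single total, it can only influence the fire decision and the calibration ratio, never an individual message's cost.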

…nction

A usage-style counter (() => usage.inputTokens, ignoring its arguments) previously returned the same huge value for every message in the tail walk, degrading tailTokenBudget to minTailMessages.

Probe the counter with zero messages to detect its scope: message-scoped counters are used per-message as before; whole-prompt counters are called once over the full history and the total calibrates the built-in heuristic, so the budget is honored at the model's scale with O(1) counter calls.

Fixes #1593

changeset-bot · 2026-06-09T20:38:20Z

🦋 Changeset detected

Latest commit: 8e6cf35

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| agents | Minor |


devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

pkg-pr-new · 2026-06-09T20:44:24Z

agents

npm i https://pkg.pr.new/agents@1720

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1720

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1720

create-think

npm i https://pkg.pr.new/create-think@1720

hono-agents

npm i https://pkg.pr.new/hono-agents@1720

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1720

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1720

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1720

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1720

commit: 8e6cf35

mattzcarey · 2026-06-09T21:18:08Z

not convinced by this

Replace the absolute empty-probe threshold (8 tokens) with a two-point probe that classifies by whether the counter responds to its input:

- counter([]) throws or returns <= 0: message-scoped by construction, used per-message as before.
- counter([]) > 0: probe counter(history). If it grew by at least half the heuristic estimate, it is a message-scoped counter with fixed per-call overhead (chat-template priming, baked-in system prompt); use it per-message minus that overhead instead of misclassifying it and discarding its accuracy. If it stayed flat, it is whole-prompt / usage-style; calibrate the heuristic by total/heuristic as before.

Matches the design principle in hermes-agent (context_compressor) and earendil-works/pi (compaction.ts): model-reported totals are only ever a whole-prompt signal distributed proportionally over a per-message heuristic; per-message accuracy is kept when the counter actually has it.

…ountTokens

Zero-config path mirroring earendil-works/pi's compaction design:

- SessionMessage.metadata (AI SDK UIMessage-compatible). When an assistant message carries model-reported usage (metadata.usage / metadata.totalUsage), the Session uses it automatically: the compactAfter fire decision becomes last-usage + heuristic for trailing messages (pi's estimateContextTokens), and compact() flows the same total to the boundary walk via CompactContext.contextTokens, which calibrates the built-in heuristic to the model's scale. No tokenCounter configuration needed.
- calculateContextTokens mirrors pi: totalTokens || sum of components (input/output/cached, camelCase and OpenAI-style names).
- New CompactOptions.countTokens: strictly message-scoped counter used directly by the tail walk, no shape detection; wins over tokenCounter.
- Counter resolution priority: countTokens > tokenCounter (two-point probe) > usage-metadata calibration > raw heuristic.

Changeset bumped to minor.

Strip the probe machinery. The model is: one authoritative context size in model tokens (compactAfter counter > usage metadata > none), computed once in Session._knownContextTokens and used for BOTH the fire decision and the boundary walk (CompactContext.contextTokens), which calibrates the chars/4 heuristic. The boundary walk never calls a whole-prompt counter per-message (the cause of #1593) and never probes anything.

- delete resolveTailCounter, MESSAGE_SCOPED_MIN_GROWTH, countTokens
- CompactContext.tokenCounter removed; contextTokens is the only flow
- CompactOptions.tokenCounter documented strictly message-scoped
- tests rewritten to pin the simple contracts: whole-prompt counters only ever see the full history; explicit counters only ever see one message; contextTokens calibrates; usage metadata is zero-config

mattzcarey · 2026-06-09T22:00:21Z

hmmm still sloppy.

…tgun

A whole-prompt counter passed there silently degraded tailTokenBudget to minTailMessages (the #1593 trap). pi has no per-message user counter at all; the calibrated heuristic covers the walk.

With the counter gone the calibration also simplifies: scaling every per-message estimate by contextTokens/heuristic is identical to dividing the budget by it, so the walk is now always plain findTailCutByTokens against tailBudgetInHeuristicUnits(). CompactTokenCounter and findTailCutByTokensWithCounter are deleted.

Docs updated: docs/sessions.md (single token-counting model, usage metadata example, new utils), docs/think/index.md pointer, experimental/session-memory README.

devin-ai-integration Bot reviewed Jun 9, 2026

mattzcarey marked this pull request as draft June 9, 2026 21:18

mattzcarey marked this pull request as ready for review June 9, 2026 21:32

mattzcarey changed the title ~~fix(memory): calibrate whole-prompt token counters in createCompactFunction~~ feat(memory): usage-metadata token accounting, explicit countTokens, and foolproof token counters Jun 9, 2026

mattzcarey marked this pull request as draft June 9, 2026 21:50

mattzcarey changed the title ~~feat(memory): usage-metadata token accounting, explicit countTokens, and foolproof token counters~~ feat(memory): single authoritative token accounting for compaction (fire + boundary) Jun 9, 2026

mattzcarey wants to merge 5 commits into main from fix/1593-autocompaction-calibration
