feat(memory): single authoritative token accounting for compaction (fire + boundary) #1720
Draft
mattzcarey wants to merge 5 commits into
…nction
A usage-style counter (() => usage.inputTokens, ignoring its arguments) previously returned the same huge value for every message in the tail walk, degrading tailTokenBudget to minTailMessages. Probe the counter with zero messages to detect its scope: message-scoped counters are used per-message as before; whole-prompt counters are called once over the full history, and the total calibrates the built-in heuristic, so the budget is honored at the model's scale with O(1) counter calls.
Fixes #1593
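A minimal sketch of that failure mode, assuming a simplified tail walk. The `Msg` type, `tailStart`, and the chars/4 `heuristic` below are illustrative stand-ins, not the package's code. A counter that ignores its arguments reports the whole prompt for every single message, so the budget check fails on the first message and only the minTailMessages floor keeps anything.

```typescript
// Illustrative message shape (not the package's SessionMessage type).
interface Msg { id: string; text: string }
type Counter = (messages: Msg[]) => number;

// chars/4 heuristic, summed over the messages it is given.
const heuristic: Counter = (ms) => ms.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);

// Hypothetical tail walk: from the newest message backwards, keep messages while the
// running cost fits tailTokenBudget, but always keep at least minTailMessages.
function tailStart(history: Msg[], count: Counter, tailTokenBudget: number, minTailMessages: number): number {
  let used = 0;
  let start = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = count([history[i]]); // called with ONE message
    const kept = history.length - start;
    if (used + cost > tailTokenBudget && kept >= minTailMessages) break;
    used += cost;
    start = i;
  }
  return start;
}

const history: Msg[] = Array.from({ length: 20 }, (_, i) => ({ id: `m${i}`, text: "x".repeat(400) }));
const usage = { inputTokens: 50_000 };
const usageStyle: Counter = () => usage.inputTokens; // ignores its arguments

const keptWithHeuristic = history.length - tailStart(history, heuristic, 1000, 2); // 10 messages
const keptWithUsage = history.length - tailStart(history, usageStyle, 1000, 2); // 2 = minTailMessages
console.log(keptWithHeuristic, keptWithUsage);
```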
🦋 Changeset detected. Latest commit: 8e6cf35. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
agents
@cloudflare/ai-chat
@cloudflare/codemode
create-think
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
not convinced by this
Replace the absolute empty-probe threshold (8 tokens) with a two-point probe that classifies by whether the counter responds to its input:
- counter([]) throws or returns <= 0: message-scoped by construction, used per-message as before.
- counter([]) > 0: probe counter(history). If it grew by at least half the heuristic estimate, it is a message-scoped counter with fixed per-call overhead (chat-template priming, baked-in system prompt); use it per-message minus that overhead instead of misclassifying it and discarding its accuracy. If it stayed flat, it is whole-prompt / usage-style; calibrate the heuristic by total/heuristic as before.
Matches the design principle in hermes-agent (context_compressor) and earendil-works/pi (compaction.ts): model-reported totals are only ever a whole-prompt signal distributed proportionally over a per-message heuristic; per-message accuracy is kept when the counter actually has it.
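The two-point probe can be sketched as follows. This is a hypothetical reconstruction from the commit message (`classifyCounter` and the `Classified` result are illustrative names) using a chars/4 heuristic; a later commit in this PR removes the probe entirely.

```typescript
interface Msg { text: string }
type Counter = (messages: Msg[]) => number;

const heuristic = (ms: Msg[]): number => ms.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);

type Classified =
  | { kind: "message-scoped"; overhead: number } // use per-message, minus fixed overhead
  | { kind: "whole-prompt"; scale: number }; // calibrate heuristic by total / heuristic

function classifyCounter(counter: Counter, history: Msg[]): Classified {
  let empty: number;
  try {
    empty = counter([]); // first point: no messages
  } catch {
    return { kind: "message-scoped", overhead: 0 };
  }
  if (!(empty > 0)) return { kind: "message-scoped", overhead: 0 }; // <= 0 (or NaN)
  const full = counter(history); // second point: the whole history
  const estimate = heuristic(history);
  // Grew by at least half the heuristic estimate: it responds to input, so it is
  // message-scoped and `empty` is its fixed per-call overhead.
  if (full - empty >= 0.5 * estimate) return { kind: "message-scoped", overhead: empty };
  // Stayed flat: whole-prompt / usage-style.
  return { kind: "whole-prompt", scale: estimate > 0 ? full / estimate : 1 };
}

const history: Msg[] = Array.from({ length: 4 }, () => ({ text: "x".repeat(400) })); // heuristic: 400
const perMessage = classifyCounter((ms) => heuristic(ms), history);
const withOverhead = classifyCounter((ms) => 12 + heuristic(ms), history);
const usageStyle = classifyCounter(() => 8000, history);
const throwsOnEmpty = classifyCounter(() => { throw new Error("needs input"); }, history);
```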
…ountTokens
Zero-config path mirroring earendil-works/pi's compaction design:
- SessionMessage.metadata (AI SDK UIMessage-compatible). When an assistant message carries model-reported usage (metadata.usage / metadata.totalUsage), the Session uses it automatically: the compactAfter fire decision becomes last usage + heuristic for trailing messages (pi's estimateContextTokens), and compact() passes the same total to the boundary walk via CompactContext.contextTokens, which calibrates the built-in heuristic to the model's scale. No tokenCounter configuration needed.
- calculateContextTokens mirrors pi: totalTokens || sum of components (input/output/cached, camelCase and OpenAI-style names).
- New CompactOptions.countTokens: strictly message-scoped counter used directly by the tail walk, no shape detection; wins over tokenCounter.
- Counter resolution priority: countTokens > tokenCounter (two-point probe) > usage-metadata calibration > raw heuristic.
Changeset bumped to minor.
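A minimal sketch of the zero-config path, assuming simplified message and usage shapes. `calculateContextTokens` and `estimateContextTokens` reuse the names above, but the bodies are illustrative: the exact field list, and preferring `metadata.usage` over `metadata.totalUsage`, are assumptions. Cache fields are summed only in their Anthropic-style form, where they are reported separately from input tokens.

```typescript
type Usage = Record<string, unknown>;
interface SessionMsg {
  role: "user" | "assistant" | "system";
  text: string;
  metadata?: unknown;
}

// Positive finite numbers only; anything else counts as absent.
const num = (v: unknown): number =>
  typeof v === "number" && Number.isFinite(v) && v > 0 ? v : 0;

// totalTokens when present, else the sum of the components
// (camelCase, OpenAI-style, and Anthropic-style names).
function calculateContextTokens(u: Usage): number {
  const total = num(u.totalTokens) || num(u.total_tokens);
  if (total > 0) return total;
  return (
    num(u.inputTokens) + num(u.outputTokens) +
    num(u.prompt_tokens) + num(u.completion_tokens) +
    num(u.input_tokens) + num(u.output_tokens) +
    num(u.cache_read_input_tokens) + num(u.cache_creation_input_tokens)
  );
}

// Only assistant messages are read (assumption: usage takes precedence over totalUsage).
function usageOf(m: SessionMsg): Usage | null {
  if (m.role !== "assistant" || typeof m.metadata !== "object" || m.metadata === null) return null;
  const meta = m.metadata as { usage?: unknown; totalUsage?: unknown };
  const u = meta.usage ?? meta.totalUsage;
  return typeof u === "object" && u !== null ? (u as Usage) : null;
}

const heuristic = (ms: SessionMsg[]): number =>
  ms.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);

// Newest reported usage plus the chars/4 heuristic for every message after it; null if none.
function estimateContextTokens(history: SessionMsg[]): number | null {
  for (let i = history.length - 1; i >= 0; i--) {
    const u = usageOf(history[i]);
    const reported = u ? calculateContextTokens(u) : 0;
    if (reported > 0) return reported + heuristic(history.slice(i + 1));
  }
  return null;
}

const history: SessionMsg[] = [
  { role: "user", text: "x".repeat(40) },
  { role: "assistant", text: "ok", metadata: { usage: { inputTokens: 900, outputTokens: 100 } } },
  // Ignored as a usage source: not an assistant message.
  { role: "user", text: "y".repeat(400), metadata: { usage: { totalTokens: 99_999 } } },
];
const estimated = estimateContextTokens(history); // 1000 reported + 100 heuristic
```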
Strip the probe machinery. The model is: one authoritative context size in model tokens (compactAfter counter > usage metadata > none), computed once in Session._knownContextTokens and used for BOTH the fire decision and the boundary walk (CompactContext.contextTokens), which calibrates the chars/4 heuristic. The boundary walk never calls a whole-prompt counter per-message (the cause of #1593) and never probes anything.
- delete resolveTailCounter, MESSAGE_SCOPED_MIN_GROWTH, countTokens
- CompactContext.tokenCounter removed; contextTokens is the only flow
- CompactOptions.tokenCounter documented strictly message-scoped
- tests rewritten to pin the simple contracts: whole-prompt counters only ever see the full history; explicit counters only ever see one message; contextTokens calibrates; usage metadata is zero-config
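A minimal sketch of the resolution order and of one number feeding both decisions, assuming simplified shapes. `Options`, `knownContextTokens`, and `planCompaction` are hypothetical names; `usageEstimate` stands in for the usage-metadata estimate; and falling back to the raw heuristic for the fire decision when nothing is known (and comparing with `>=`) are assumptions. The usage lines show that a whole-prompt counter is called once, on the full history.

```typescript
interface Msg { text: string }
type WholePromptCounter = (history: Msg[]) => number;

const heuristic = (ms: Msg[]): number => ms.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);

// Hypothetical options; illustrative only, not the package's API.
interface Options {
  threshold: number; // compactAfter(threshold)
  tokenCounter?: WholePromptCounter; // compactAfter(threshold, { tokenCounter })
  usageEstimate?: (history: Msg[]) => number | null; // last usage + heuristic for newer messages
}

// Priority: compactAfter counter > usage metadata > none (null).
function knownContextTokens(history: Msg[], opts: Options): number | null {
  if (opts.tokenCounter) return opts.tokenCounter(history); // whole-prompt: full history only
  return opts.usageEstimate?.(history) ?? null;
}

function planCompaction(history: Msg[], opts: Options): { fire: boolean; contextTokens: number | null } {
  const contextTokens = knownContextTokens(history, opts); // computed once
  const size = contextTokens ?? heuristic(history); // assumption: raw heuristic when unknown
  // The same contextTokens would then be handed to the boundary walk.
  return { fire: size >= opts.threshold, contextTokens };
}

const history: Msg[] = Array.from({ length: 5 }, () => ({ text: "x".repeat(400) })); // heuristic: 500
const seen: number[] = [];
const counted = planCompaction(history, {
  threshold: 1000,
  tokenCounter: (h) => { seen.push(h.length); return 8000; },
});
const fromUsage = planCompaction(history, { threshold: 1000, usageEstimate: () => 300 });
const fromNothing = planCompaction(history, { threshold: 400 });
```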
hmmm still sloppy.
…tgun
A whole-prompt counter passed there silently degraded tailTokenBudget to minTailMessages (the #1593 trap). pi has no per-message user counter at all; the calibrated heuristic covers the walk. With the counter gone, the calibration also simplifies: scaling every per-message estimate by contextTokens/heuristic is identical to dividing the budget by it, so the walk is now always plain findTailCutByTokens against tailBudgetInHeuristicUnits(). CompactTokenCounter and findTailCutByTokensWithCounter are deleted.
Docs updated: docs/sessions.md (single token-counting model, usage metadata example, new utils), docs/think/index.md pointer, experimental/session-memory README.
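The equivalence can be checked with a small sketch. `estimate`, `tailBudgetInHeuristicUnits`, and `findTailCutByTokens` here are illustrative re-implementations under the names used above, and their signatures are assumptions. Scaling every estimate by k = contextTokens / heuristic and walking against the model-token budget picks the same cut as the plain heuristic walk against budget / k.

```typescript
interface Msg { text: string }

// chars/4 heuristic for one message.
const estimate = (m: Msg): number => Math.ceil(m.text.length / 4);

// Convert a budget in model tokens into heuristic units: budget * heuristic / contextTokens.
// Without a usable contextTokens, the raw heuristic applies and the budget is unchanged.
function tailBudgetInHeuristicUnits(tailTokenBudget: number, history: Msg[], contextTokens?: number): number {
  const heuristicTotal = history.reduce((n, m) => n + estimate(m), 0);
  if (contextTokens === undefined || !(contextTokens > 0) || heuristicTotal === 0) return tailTokenBudget;
  return (tailTokenBudget * heuristicTotal) / contextTokens;
}

// Plain tail walk: index of the first kept message.
function findTailCutByTokens(history: Msg[], budget: number, minTailMessages: number, cost: (m: Msg) => number = estimate): number {
  let used = 0;
  let start = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    const c = cost(history[i]);
    if (used + c > budget && history.length - start >= minTailMessages) break;
    used += c;
    start = i;
  }
  return start;
}

// Oldest message is longest; estimates run 200, 190, ..., 10 (heuristic total 2100).
const history: Msg[] = Array.from({ length: 20 }, (_, i) => ({ text: "x".repeat(40 * (20 - i)) }));
const heuristicTotal = history.reduce((n, m) => n + estimate(m), 0);
const contextTokens = 4 * heuristicTotal; // the model reports 4x the heuristic
const k = contextTokens / heuristicTotal;

const scaledCut = findTailCutByTokens(history, 1000, 1, (m) => estimate(m) * k); // 14
const convertedCut = findTailCutByTokens(history, tailBudgetInHeuristicUnits(1000, history, contextTokens), 1); // 14
const rawCut = findTailCutByTokens(history, tailBudgetInHeuristicUnits(1000, history), 1); // 7
```

Without calibration, the same 1000-token budget keeps 13 messages instead of 6, which overshoots the model-scale budget by roughly the factor k.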
Fixes #1593
Design
The Session derives one authoritative context size in model tokens, and that single number drives both decisions: whether to compact, and where to cut. Because both decisions read the same number, they cannot disagree (the #1593 failure was exactly such a disagreement).
This is how earendil-works/pi does it (core/compaction/compaction.ts), and hermes-agent (agent/context_compressor.py) follows the same rule: model-reported usage is a whole-prompt signal, per-message decisions use a cheap proportional heuristic, and the two scales never mix.
Counting
Session._knownContextTokens() resolves in priority order:
- The compactAfter() counter, if configured. It is whole-prompt by signature, so () => lastUsage.inputTokens is legal and documented.
- SessionMessage gained metadata?: unknown (an AI SDK UIMessage passes straight through), and metadata.usage / metadata.totalUsage is read the way pi reads it: calculateContextTokens returns totalTokens when present, else the sum of the components. The context estimate is the last reported usage plus the heuristic for any newer messages. When messages carry usage, no tokenCounter configuration is needed.
Compaction
The same total flows to createCompactFunction as CompactContext.contextTokens. The tail walk is always the plain heuristic walk; when an authoritative total is known, the budget is converted into heuristic units (tailTokenBudget * heuristic / contextTokens, equivalent to scaling every per-message estimate up to the model's scale). The walk's per-message distribution is unchanged, and the budget now means model tokens.
That is the whole #1593 fix. Previously the Session's whole-prompt counter was adapted into per-message calls, so every message appeared to exceed the budget and tailTokenBudget silently degraded to minTailMessages.
There are no user counters in the boundary walk at all. pi has no per-message counter either; it was the footgun behind #1593, so it is removed rather than documented against. Each remaining lever means one thing:
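- compactAfter(threshold, { tokenCounter }): when to compact; the optional counter is whole-prompt.
- metadata.usage / totalUsage: zero-config source of the authoritative context size.
- protectHead / tailTokenBudget / minTailMessages: where to cut; tailTokenBudget is in model tokens.
Breaking changes in the experimental package: CompactOptions.tokenCounter, the CompactTokenCounter type, and CompactContext.tokenCounter are removed. The changeset is minor.
Docs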
docs/sessions.md gets a single "Token counting" section (priority order, usage-metadata example, new utils in the helper list), docs/think/index.md points at it, and experimental/session-memory/README.md documents the priority order under Auto-Compaction.
Tests
The tests assert the call contracts directly:
- … toMessageId: "tool-5", not the minTailMessages degradation).
- createCompactFunction no-ops on the raw heuristic and cuts correctly once the context supplies contextTokens.
- compactAfter(1000) with no counter, and usage attached to the appended assistant message, fires and cuts at the calibrated boundary.
- calculateContextTokens handles the totalTokens, input+output, prompt+completion and cached shapes. The newest assistant usage wins, usage on non-assistant messages is ignored, and no usage returns null.
- … contextTokens is ignored and the raw heuristic applies.
Full experimental-memory suite: 193 passed (7 files). oxfmt and oxlint are clean, and the memory module has no typecheck errors.
Notes for maintainers
… compactAfter() as a whole-prompt counter. Per-message precision in the walk bought little once the budget is correct at the model scale, and it was the only place the two token scales could be confused.