19 changes: 19 additions & 0 deletions .changeset/uncached-input-tokens.md
@@ -0,0 +1,19 @@
---
'@electric-ax/agents-server-ui': patch
'@electric-ax/agents-runtime': patch
---

Show only uncached input tokens in the per-response token usage label.

The input side previously summed `input + cacheRead + cacheWrite`, so
on warm-cache turns the meta row re-counted the entire conversation on
every step and ballooned into a cumulative number that said nothing
about the work the response actually did. The adapter now surfaces the
uncached side only — fresh prompt tokens plus cache writes, with
prompt-cache reads excluded. (`cacheWrite` is counted because
cache-enabled providers report newly appended prompt tokens there,
with `input` collapsing to ~0.)
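
As a rough illustration of the difference, the sketch below compares the old cache-inclusive sum with the uncached sum. The `Usage` type and the helper names (`sumPresent`, `cacheInclusiveInput`, `uncachedInput`) are hypothetical and only model the three counters named above; the sample values are chosen to match the totals in the adapter test (1350 before, 150 after).

```typescript
// Hypothetical shape: only the three counters named in this changeset.
type Usage = {
  input?: number
  cacheRead?: number
  cacheWrite?: number
}

// Sum only the entries that are real numbers. Returns undefined when no
// entry is present, so a missing side is never reported as 0.
function sumPresent(values: Array<number | undefined>): number | undefined {
  let total = 0
  let saw = false
  for (const v of values) {
    if (typeof v === 'number') {
      total += v
      saw = true
    }
  }
  return saw ? total : undefined
}

// Previous behaviour: cache reads are included in the input side.
function cacheInclusiveInput(u: Usage): number | undefined {
  return sumPresent([u.input, u.cacheRead, u.cacheWrite])
}

// New behaviour: fresh prompt tokens plus cache writes only.
function uncachedInput(u: Usage): number | undefined {
  return sumPresent([u.input, u.cacheWrite])
}

// A warm-cache turn: most of the prompt is served from the cache.
const warmTurn: Usage = { input: 50, cacheRead: 1200, cacheWrite: 100 }
console.log(cacheInclusiveInput(warmTurn)) // 1350
console.log(uncachedInput(warmTurn)) // 150
```

A usage object that carries only `cacheRead` yields `undefined` from `uncachedInput`, which in this sketch leaves the input side missing instead of printing `0`.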

Steps recorded before this change keep their stored cache-inclusive
totals — both step fields are optional and the display just sums
what's persisted, so no migration is needed.
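
The "no migration" point can be sketched as follows, assuming the display simply adds up whichever step fields are present. `sumStepTokens` and the `Tokens` type are hypothetical names for illustration; the real aggregation may live in the query layer and look different.

```typescript
// Both fields are optional, mirroring the persisted step fields.
type StepValue = { input_tokens?: number; output_tokens?: number }
type Tokens = { input?: number; output?: number }

// Hypothetical aggregate: add up what each step persisted and leave a
// side undefined when no step reported it.
function sumStepTokens(steps: StepValue[]): Tokens {
  const tokens: Tokens = {}
  for (const step of steps) {
    if (typeof step.input_tokens === 'number') {
      tokens.input = (tokens.input ?? 0) + step.input_tokens
    }
    if (typeof step.output_tokens === 'number') {
      tokens.output = (tokens.output ?? 0) + step.output_tokens
    }
  }
  return tokens
}

// An older step stored the cache-inclusive total; a newer step stores the
// uncached total; a third step persisted nothing. All three sum as-is.
const mixed: StepValue[] = [
  { input_tokens: 1350, output_tokens: 80 },
  { input_tokens: 150, output_tokens: 80 },
  {},
]
console.log(sumStepTokens(mixed)) // { input: 1500, output: 160 }
```

Under these assumptions, historical sections keep showing their larger stored numbers while new steps contribute only uncached input.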
3 changes: 3 additions & 0 deletions packages/agents-runtime/src/entity-schema.ts
@@ -160,6 +160,9 @@ type StepValue = {
// end-of-message `usage` payload. Populated on `onStepEnd` when the
// adapter has the data — older events without these fields stay
// valid (both optional), so this is a strictly additive change.
+ // `input_tokens` is the *uncached* input side (fresh tokens plus
+ // cache writes; cache reads excluded) — the cache-inclusive total
+ // would re-count the whole conversation on every step.
input_tokens?: number
output_tokens?: number
}
6 changes: 4 additions & 2 deletions packages/agents-runtime/src/entity-timeline.ts
@@ -62,8 +62,10 @@ export type EntityTimelineSection =
done?: true
error?: string
// Summed across all steps of the run that produced this section.
- // Either side may be missing if the provider didn't report it
- // (e.g. older events recorded before tokens were persisted).
+ // `input` is the uncached side only (fresh tokens + cache writes)
+ // — see `StepValue.input_tokens`. Either side may be missing if
+ // the provider didn't report it (e.g. older events recorded
+ // before tokens were persisted).
tokens?: {
input?: number
output?: number
3 changes: 3 additions & 0 deletions packages/agents-runtime/src/outbound-bridge.ts
@@ -104,6 +104,9 @@ export interface OutboundBridge {
onStepStart: (opts?: { modelProvider?: string; modelId?: string }) => void
onStepEnd: (opts?: {
finishReason?: string
+ // Uncached input side only (fresh prompt tokens + cache writes;
+ // prompt-cache *reads* excluded) — the cache-inclusive total would
+ // re-count the whole conversation on every warm-cache step.
tokenInput?: number
tokenOutput?: number
durationMs?: number
31 changes: 16 additions & 15 deletions packages/agents-runtime/src/pi-adapter.ts
@@ -379,19 +379,24 @@ export function createPiAgentAdapter(
// `cacheRead` (prompt-cache hits — typically the
// system prompt + prior history once the cache is
// warm) and `cacheWrite` (tokens added to the cache
- // this turn). What the user wants in the meta row is
- // the total prompt volume the model actually saw, so
- // we sum every side that arrived as a number. Reading
- // only `usage.input` undercounts massively on second+
- // turns where most of the prompt hits the cache and
- // `usage.input` collapses to a handful of tokens.
+ // this turn). The meta row shows the *uncached* input
+ // — `input + cacheWrite` — i.e. the new prompt work
+ // this step did. `cacheRead` is deliberately excluded:
+ // it re-counts the entire conversation on every warm
+ // turn, so including it balloons the label into a
+ // cumulative number that says nothing about this
+ // response. `cacheWrite` IS counted: cache-enabled
+ // providers report newly appended prompt tokens there
+ // (with `input` collapsing to ~0), so excluding it
+ // would surface tiny "3 input" labels instead.
//
// `inputTokens` / `outputTokens` are legacy flat
// aliases (kept as a fallback for non-pi-ai providers
- // that don't split the cache columns). We deliberately
- // do NOT coerce a missing side to `0` — doing so
- // would be indistinguishable from a real zero-token
- // step in the meta row, and the query-layer
+ // that don't split the cache columns); with no cache
+ // split, the whole side counts as uncached. We
+ // deliberately do NOT coerce a missing side to `0` —
+ // doing so would be indistinguishable from a real
+ // zero-token step in the meta row, and the query-layer
// `count(...)` aggregate would mark the side as
// present when it really isn't.
const sumPresentNumbers = (
@@ -408,11 +413,7 @@ export function createPiAgentAdapter(
return saw ? total : undefined
}
const usageInput =
- sumPresentNumbers([
- usage?.input,
- usage?.cacheRead,
- usage?.cacheWrite,
- ]) ??
+ sumPresentNumbers([usage?.input, usage?.cacheWrite]) ??
(typeof usage?.inputTokens === `number`
? usage.inputTokens
: undefined)
13 changes: 8 additions & 5 deletions packages/agents-runtime/test/pi-adapter.test.ts
@@ -934,11 +934,14 @@ describe(`toAgentHistory`, () => {
expect(stepValue?.output_tokens).toBe(567)
})

- it(`sums input + cacheRead + cacheWrite into the input token total`, async () => {
+ it(`sums input + cacheWrite (cache reads excluded) into the input tokens`, async () => {
// Anthropic + other prompt-cache providers split input across
- // three counters; reading only `usage.input` would surface
- // tiny "3 input" labels on cache-warm turns. The adapter sums
- // all three so the meta row reflects the real prompt volume.
+ // three counters. The adapter surfaces the *uncached* side —
+ // fresh tokens plus cache writes. `cacheRead` re-counts the
+ // entire history on every warm turn, so including it would make
+ // the meta row a runaway cumulative number; `cacheWrite` must be
+ // counted because cache-enabled providers report newly appended
+ // prompt tokens there (with `input` collapsing to ~0).
const events = await runOnce(
makeCompletedMessage({
input: 50,
@@ -948,7 +951,7 @@ describe(`toAgentHistory`, () => {
})
)
const stepValue = findStepUpdate(events)
- expect(stepValue?.input_tokens).toBe(1350)
+ expect(stepValue?.input_tokens).toBe(150)
expect(stepValue?.output_tokens).toBe(80)
})

5 changes: 5 additions & 0 deletions packages/agents-server-ui/src/components/TokenUsage.tsx
@@ -9,6 +9,11 @@ import styles from './TokenUsage.module.css'
* jittering as numbers tick up (input grows when a tool result is
* fed back; output grows when the model streams a new step).
*
+ * `input` is the uncached input side only — fresh prompt tokens plus
+ * cache writes, with prompt-cache *reads* excluded. The cache-inclusive
+ * total re-counts the entire history on every step, so it balloons into
+ * a cumulative number that says nothing about the work this response did.
+ *
* Either side may be `undefined` (the provider didn't emit it, or
* the section is historical and was recorded before tokens were
* persisted) — we skip the missing half rather than print `0`.