22 commits
2dd18b1
feat(agents): add /goal slash command with token-budget enforcement
kevin-dp Jun 10, 2026
44f855b
chore: bump changeset to patch for agent-goal-tracking
kevin-dp Jun 10, 2026
ffe88e3
docs(agents-runtime): explain ctx.replyText's five-write sequence
kevin-dp Jun 10, 2026
6c7910e
test(agents): stub getGoal/updateGoalUsage on the horton fakeCtx
kevin-dp Jun 10, 2026
f171562
test(agents): stub goal methods on horton-model-selection fakeCtx
kevin-dp Jun 10, 2026
4a37998
fix(agents): only enforce goal budget when the goal is active
kevin-dp Jun 10, 2026
9a35a93
fix(agents-runtime): coordinate synthetic run keys with the bridge id…
kevin-dp Jun 10, 2026
511f0f7
fix(agents): budget counts uncached input + output, not the display sum
kevin-dp Jun 10, 2026
4a47f7c
refactor(agents-runtime): single write path for goal usage
kevin-dp Jun 10, 2026
7da7102
refactor(agents-runtime): drop dead markGoalBudgetLimited API
kevin-dp Jun 10, 2026
2e4ba0d
feat(agents): register /goal in the static slash-command registry
kevin-dp Jun 10, 2026
e51aa7f
refactor(agents-server-ui): reuse the /goal parser from the client entry
kevin-dp Jun 10, 2026
bb9e9b5
fix(agents-runtime): persist the mark_goal_complete summary
kevin-dp Jun 10, 2026
7df7914
refactor(agents-runtime): ISO string timestamps on the goal entry
kevin-dp Jun 10, 2026
afecc4e
refactor(agents): trim speculative paused/blocked goal statuses
kevin-dp Jun 10, 2026
8ff01b4
refactor(agents): share one token-count formatter
kevin-dp Jun 10, 2026
132f49d
refactor(agents): only register mark_goal_complete when a goal is active
kevin-dp Jun 10, 2026
216ddfd
test(agents-runtime): cover createGoalApi
kevin-dp Jun 10, 2026
655cc46
test(agents): stub getGoal in the new observe-pg-sync tool test
kevin-dp Jun 11, 2026
7c08ed2
fix(agents): intercept /goal in composer-structured payloads
kevin-dp Jun 11, 2026
31ced2e
fix(agents-runtime): stop stale goal snapshots clobbering fresher usage
kevin-dp Jun 11, 2026
0c9d86c
fix(agents-runtime): count cache writes toward the goal token budget
kevin-dp Jun 11, 2026
81 changes: 81 additions & 0 deletions .changeset/agent-goal-tracking.md
@@ -0,0 +1,81 @@
---
'@electric-ax/agents-server-ui': patch
'@electric-ax/agents-runtime': patch
'@electric-ax/agents': patch
---

Add a `/goal` slash command to Horton sessions. It lets the user set an
objective with an optional token budget; the agent works autonomously
toward the goal and stops when it calls `mark_goal_complete` or when
token usage reaches the budget.

```text
/goal set "ship feature X" --tokens 50k   # default 50k tokens
/goal set "explore" --unlimited           # opt out of the cap
/goal show                                # current state
/goal complete                            # mark done manually
/goal clear                               # remove the goal
```

## Behaviour

- **One goal per session**, persisted as a `kind: 'goal'` entry on the
`manifests` collection, so it resumes automatically across desktop
restarts.
- **Mid-run token enforcement**: an `onStepEnd` hook on the outbound
bridge surfaces per-step token counts; Horton accumulates them and
aborts the active `ctx.agent.run()` via an `AbortController` once
`tokensUsed >= tokenBudget`. The cap counts **new input (fresh +
cache-write tokens) + output** per step — prompt-cache reads (which
re-count the whole conversation on every warm step) are excluded, so
the budget tracks new work rather than context size.
- **Live progress**: the goal banner ticks up after each step. The
manifest update is written via `writeEvent` directly (not the
wake-session's staged manifest transaction, which only commits at
end-of-wake — too late for a long-running run).
- **`mark_goal_complete` tool**: registered on Horton's tool list only
while a goal is active. It flips the status to `complete` and surfaces
in the chat as an ordinary agent reply via the new `ctx.replyText`
helper.
- **State-changing `/goal` commands interrupt the active run** —
typing `/goal complete`, `/goal clear`, or `/goal set` while a run
is in flight signals SIGINT alongside sending the message, so the
prior run aborts instead of finishing the old work first. `/goal
show` is read-only and does not interrupt.
- **Budget-limited stop message**: when the cap is hit mid-run, the
agent posts a synthetic reply explaining what happened and
suggesting a larger budget to resume.
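
The mid-run enforcement rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Horton's actual handler: `StepUsage`, its field names, `billableTokens`, and `createBudgetEnforcer` are hypothetical names picked for illustration, and the real `onStepEnd` payload may be shaped differently.

```typescript
// Hypothetical per-step usage shape; the real bridge's field names may differ.
interface StepUsage {
  freshInputTokens: number
  cacheWriteTokens: number
  cacheReadTokens: number
  outputTokens: number
}

// Tokens that count toward the budget: new input (fresh + cache-write) plus
// output. Cache reads are excluded because they re-count existing context.
function billableTokens(step: StepUsage): number {
  return step.freshInputTokens + step.cacheWriteTokens + step.outputTokens
}

// Returns an onStepEnd-style callback that accumulates usage and aborts the
// controller once tokensUsed >= tokenBudget. A null budget means unlimited.
function createBudgetEnforcer(
  tokenBudget: number | null,
  controller: AbortController
) {
  let tokensUsed = 0
  return {
    onStepEnd(step: StepUsage): void {
      tokensUsed += billableTokens(step)
      const overBudget = tokenBudget !== null && tokensUsed >= tokenBudget
      if (overBudget && !controller.signal.aborted) {
        controller.abort(new Error('goal token budget reached'))
      }
    },
    getTokensUsed: (): number => tokensUsed,
  }
}
```

In this sketch the caller would pass `controller.signal` as the optional abort signal to the agent run, so a large warm-cache step does not trip the cap unless it also produces new input or output tokens.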

## Plumbing

- `entity-schema.ts` — new `ManifestGoalEntryValue` (objective,
status, tokenBudget, tokensUsed, createdAt, updatedAt) added to the
manifest discriminated union.
- `goal-api.ts` (new) — `setGoal` / `clearGoal` / `getGoal` /
`markGoalComplete` / `updateGoalUsage`. All goal mutations share a
single ordered write channel (direct `writeEvent` upserts, live for
the UI) plus an in-wake read-your-writes cache, so a mutation firing
mid-run can never snapshot — and replay — a stale `tokensUsed` over
a fresher one. `updateGoalUsage` additionally never decreases the
counter.
- `goal-command.ts` (new) — `/goal` parser (`--tokens N|50k|1.2m|
unlimited`, `--unlimited` flag, subcommand aliases `done`/`status`)
and dispatcher.
- `tools/goal-tools.ts` (new) — `createMarkGoalCompleteTool` exposes
the completion signal to the LLM.
- `outbound-bridge.ts` — new optional `OutboundBridgeHooks.onStepEnd`
callback, threaded through `pi-adapter` and the `AgentConfig` passed
to `useAgent`.
- `context-factory.ts` — `AgentHandle.run` now accepts an optional
`abortSignal` and combines it with the runtime's `runSignal`. New
`ctx.replyText(text)` writes a complete runs + texts + textDeltas
sequence so synthetic replies render in the chat. New goal-related
methods exposed on `HandlerContext`.
- `horton.ts` — `tryHandleSlashCommand` intercepts `/goal *` before
the LLM; `/goal set` enqueues a one-shot kickoff so the agent starts
immediately; `assistantHandler` wires the budget-enforcing
`onStepEnd`, aborts on overflow, and posts the explanation reply.
- `agents-server-ui` — new `GoalBanner` component above the timeline
(objective + budget bar + status badge). `MessageInput` aborts the
active run when a state-changing `/goal` command is submitted.
`EntityTimeline` / `EntityContextDrawer` handle the new `goal`
manifest kind.
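
The `--tokens N|50k|1.2m|unlimited` grammar listed for `goal-command.ts` could be parsed roughly as follows. This is a sketch only: `parseTokenBudget` and `TokenBudget` are hypothetical names, and the rounding and rejection rules are assumptions rather than the behaviour of the real parser.

```typescript
// A positive integer budget, or null for "unlimited" (assumed representation).
type TokenBudget = number | null

// Parses `N`, `50k`, `1.2m`, or `unlimited`.
// Returns undefined when the input matches none of those forms.
function parseTokenBudget(raw: string): TokenBudget | undefined {
  const value = raw.trim().toLowerCase()
  if (value === 'unlimited') return null
  const match = /^(\d+(?:\.\d+)?)([km])?$/.exec(value)
  if (!match) return undefined
  const multiplier =
    match[2] === 'k' ? 1_000 : match[2] === 'm' ? 1_000_000 : 1
  const tokens = Math.round(parseFloat(match[1]!) * multiplier)
  // Treat a zero budget as invalid rather than as "stop immediately".
  return tokens > 0 ? tokens : undefined
}
```

Returning `undefined` for bad input, separately from `null` for unlimited, keeps "no cap" distinguishable from "could not parse" at the call site.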
8 changes: 8 additions & 0 deletions packages/agents-runtime/src/client.ts
@@ -40,6 +40,12 @@ export {
   createSlashCommandTokenRegex,
   SLASH_COMMAND_TRIGGER_REGEX,
 } from './composer-input'
+// The /goal text grammar: pure parsing, shared with the UI so composer
+// behavior (e.g. which subcommands interrupt a running agent) can't
+// drift from the runtime dispatcher.
+export { isGoalCommandText, parseGoalCommand } from './goal-command'
+export { formatTokenCount } from './token-budget'
+export type { GoalCommand } from './goal-command'
 
 export type {
   EntityStreamDB,
@@ -58,8 +64,10 @@ export type {
   AttachmentStatus,
   AttachmentSubject,
   AttachmentSubjectType,
+  GoalStatus,
   Manifest,
   ManifestAttachmentEntry,
+  ManifestGoalEntry,
 } from './entity-schema'
 export type {
   AttachmentCreateInput,
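
Note on the shared grammar: the comment in this file says the UI reuses the parser so that "which subcommands interrupt a running agent" cannot drift from the runtime. A composer-side check might look like the sketch below. It does not use the real `parseGoalCommand` or `GoalCommand` (their shapes are not shown in this diff); `goalSubcommand`, `shouldInterruptRun`, and the alias mapping (`done` to `complete`, `status` to `show`) are assumptions for illustration.

```typescript
type GoalSubcommand = 'set' | 'show' | 'complete' | 'clear'

// Assumed alias table; the changeset only says `done` and `status` are aliases.
const SUBCOMMANDS = new Map<string, GoalSubcommand>([
  ['set', 'set'],
  ['show', 'show'],
  ['status', 'show'],
  ['complete', 'complete'],
  ['done', 'complete'],
  ['clear', 'clear'],
])

// Extracts the canonical subcommand from `/goal <sub> ...`, or null otherwise.
function goalSubcommand(text: string): GoalSubcommand | null {
  const match = /^\/goal\s+(\S+)/.exec(text.trim())
  if (!match) return null
  return SUBCOMMANDS.get(match[1]!.toLowerCase()) ?? null
}

// State-changing subcommands interrupt the active run; `show` is read-only.
function shouldInterruptRun(text: string): boolean {
  const sub = goalSubcommand(text)
  return sub !== null && sub !== 'show'
}
```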
131 changes: 111 additions & 20 deletions packages/agents-runtime/src/context-factory.ts
@@ -2,8 +2,13 @@ import { queryOnce } from '@durable-streams/state/db'
 import { assembleContext } from './context-assembly'
 import { createContextEntriesApi } from './context-entries'
 import { entityStateSchema } from './entity-schema'
+import { createGoalApi } from './goal-api'
 import { formatPointerOrderToken } from './event-pointer'
-import { createOutboundBridge, loadOutboundIdSeed } from './outbound-bridge'
+import {
+  allocateRunKey,
+  createOutboundBridge,
+  loadOutboundIdSeed,
+} from './outbound-bridge'
 import { createPiAgentAdapter } from './pi-adapter'
 import {
   timelineMessages as runtimeTimelineMessages,
@@ -249,6 +254,29 @@ function getTriggerMessageText(
   })
 }
 
+function combineAbortSignals(a: AbortSignal, b: AbortSignal): AbortSignal {
+  // Prefer the platform helper when available (Node 20+, modern browsers).
+  const any = (
+    AbortSignal as unknown as {
+      any?: (sigs: Array<AbortSignal>) => AbortSignal
+    }
+  ).any
+  if (typeof any === `function`) return any.call(AbortSignal, [a, b])
+  const controller = new AbortController()
+  const linkTo = (source: AbortSignal): void => {
+    if (source.aborted) {
+      controller.abort(source.reason)
+      return
+    }
+    source.addEventListener(`abort`, () => controller.abort(source.reason), {
+      once: true,
+    })
+  }
+  linkTo(a)
+  linkTo(b)
+  return controller.signal
+}
+
 function toHandlerWake(wakeEvent: WakeEvent): HandlerWake {
   if (wakeEvent.type === `inbox`) {
     return {
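
Note on `combineAbortSignals`: the example below shows the intended effect, where aborting either source aborts the combined signal and carries the reason through. To stay self-contained it uses a stand-in, `combineSignals`, that implements only the listener-based fallback path; the variable names are illustrative and not taken from the PR.

```typescript
// Stand-in for the fallback branch of combineAbortSignals in this diff.
function combineSignals(a: AbortSignal, b: AbortSignal): AbortSignal {
  const controller = new AbortController()
  for (const source of [a, b]) {
    if (source.aborted) {
      // An already-aborted source aborts the result immediately.
      controller.abort(source.reason)
      break
    }
    source.addEventListener('abort', () => controller.abort(source.reason), {
      once: true,
    })
  }
  return controller.signal
}

const runtime = new AbortController() // plays the role of config.runSignal
const budget = new AbortController() // plays the role of the caller's abortSignal
const combined = combineSignals(runtime.signal, budget.signal)

// Aborting only the budget controller aborts the combined signal and forwards
// the reason; the runtime signal itself is left untouched.
budget.abort(new Error('goal token budget reached'))
```

This is why `AgentHandle.run` can take a caller-owned signal without taking over the runtime's own cancellation: each side keeps its controller, and the run observes whichever fires first.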
@@ -450,23 +478,16 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
   let useContextConfig: UseContextConfig | null = null
   let useContextHash = ``
   let useContextRegistrations = 0
-  // Lazy-loaded run-id counter used by ctx.recordRun(). Initialized
-  // from the runs already present in the entity's StreamDB so keys
-  // remain monotonic across handler invocations.
-  let recordRunCounter: number | null = null
+  // Run-id allocation for ctx.recordRun() / ctx.replyText(). Delegates
+  // to the outbound bridge's shared id-seed cache so synthetic runs
+  // can't collide with `run-N` keys the bridge allocated for events
+  // that haven't round-tripped into the local collection yet. The local
+  // floor keeps sequential allocations monotonic within this handler
+  // even when the collection lags (or has no stable id, as in tests).
+  let localRunFloor = 0
   const nextRunKey = (): string => {
-    if (recordRunCounter == null) {
-      let max = 0
-      const rows = config.db.collections.runs.toArray as Array<{ key: string }>
-      for (const row of rows) {
-        const m = row.key.match(/^run-(\d+)/)
-        if (!m) continue
-        max = Math.max(max, parseInt(m[1]!, 10) + 1)
-      }
-      recordRunCounter = max
-    }
-    const key = `run-${recordRunCounter}`
-    recordRunCounter += 1
+    const key = allocateRunKey(config.db, localRunFloor)
+    localRunFloor = parseInt(key.slice(`run-`.length), 10) + 1
     return key
   }
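
Note on the local floor: `allocateRunKey` is not shown in this hunk, so the sketch below only illustrates why the floor matters when the collection lags. `nextRunKeyFrom` and `createRunKeyAllocator` are hypothetical stand-ins that scan visible keys; they do not model the bridge's shared id-seed cache that the real helper delegates to.

```typescript
// Next `run-N` key that is at least `floor` and above every visible key.
function nextRunKeyFrom(existingKeys: Array<string>, floor: number): string {
  let next = floor
  for (const key of existingKeys) {
    const match = /^run-(\d+)/.exec(key)
    if (!match) continue // ignore keys that are not run ids
    next = Math.max(next, parseInt(match[1]!, 10) + 1)
  }
  return `run-${next}`
}

// Keeps a per-handler floor so back-to-back allocations stay unique even if
// the collection has not yet reflected the previous write.
function createRunKeyAllocator(readKeys: () => Array<string>): () => string {
  let localFloor = 0
  return () => {
    const key = nextRunKeyFrom(readKeys(), localFloor)
    localFloor = parseInt(key.slice('run-'.length), 10) + 1
    return key
  }
}
```

Without the floor, two allocations made before the first write round-trips would both see the same maximum and return the same key.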

@@ -476,6 +497,12 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
     wakeSession: config.wakeSession,
   })
 
+  const goalApi = createGoalApi({
+    db: config.db,
+    wakeSession: config.wakeSession,
+    writeEvent: config.writeEvent,
+  })
+
   const listAttachments: AttachmentsApi[`list`] = (filter) => {
     const attachments = config.db.collections.manifests.toArray
       .filter((entry) => entry.kind === `attachment`)
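
Note on `createGoalApi`: the changeset describes a single write channel plus an in-wake read-your-writes cache, with `updateGoalUsage` never decreasing the counter. `goal-api.ts` is not part of this excerpt, so the following is only a sketch of that idea; `createGoalStore`, `GoalValue`, and the status values are assumed names, and the real API takes `db`, `wakeSession`, and `writeEvent` as shown in the hunk above.

```typescript
// Assumed goal shape for illustration; the real entry also has timestamps.
interface GoalValue {
  objective: string
  status: 'active' | 'complete'
  tokenBudget: number | null
  tokensUsed: number
}

function createGoalStore(
  readCollection: () => GoalValue | undefined, // may lag behind recent writes
  write: (goal: GoalValue) => void // the single ordered write channel
) {
  let cached: GoalValue | undefined // last value written during this wake
  const current = (): GoalValue | undefined => cached ?? readCollection()
  const commit = (goal: GoalValue): GoalValue => {
    cached = goal
    write(goal)
    return goal
  }
  return {
    getGoal: current,
    updateGoalUsage(tokensUsed: number): GoalValue | undefined {
      const goal = current()
      if (!goal) return undefined
      // Monotonic: a late or stale report can never lower the counter.
      return commit({ ...goal, tokensUsed: Math.max(goal.tokensUsed, tokensUsed) })
    },
    markGoalComplete(): GoalValue | undefined {
      const goal = current()
      if (!goal) return undefined
      // Starts from the cached value, so fresh usage is not overwritten.
      return commit({ ...goal, status: 'complete' })
    },
  }
}
```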
@@ -713,7 +740,10 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
   }
 
   const agent: AgentHandle = {
-    async run(input?: string): Promise<AgentRunResult> {
+    async run(
+      input?: string,
+      abortSignal?: AbortSignal
+    ): Promise<AgentRunResult> {
       if (!agentConfig) {
         throw new Error(
           `[agent-runtime] agent.run() called without useAgent().`
@@ -755,6 +785,8 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
         getApiKey: activeAgentConfig.getApiKey,
 
         onPayload: activeAgentConfig.onPayload,
+
+        onStepEnd: activeAgentConfig.onStepEnd,
       })
       const handle = adapterFactory({
         entityUrl: config.entityUrl,
@@ -802,7 +834,11 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
         )
       }
 
-      await handle.run(runInput, config.runSignal)
+      const combinedSignal =
+        config.runSignal && abortSignal
+          ? combineAbortSignals(config.runSignal, abortSignal)
+          : (abortSignal ?? config.runSignal)
+      await handle.run(runInput, combinedSignal)
       runtimeLog.info(logPrefix, `agent.run completed`)
 
       return {
@@ -947,6 +983,11 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
     removeContext: contextApi.removeContext,
     getContext: contextApi.getContext,
     listContext: contextApi.listContext,
+    setGoal: goalApi.setGoal,
+    clearGoal: goalApi.clearGoal,
+    getGoal: goalApi.getGoal,
+    markGoalComplete: goalApi.markGoalComplete,
+    updateGoalUsage: goalApi.updateGoalUsage,
     __debug: {
       useContextRegistrations: () => useContextRegistrations,
     },
@@ -1040,6 +1081,53 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
         },
       }
     },
+    // Renders `text` as an ordinary assistant message in the chat without
+    // calling the LLM. Used for runtime-driven replies like slash-command
+    // responses and budget-limit notices. The five writes synthesize the
+    // same run + text + delta event sequence the outbound bridge would
+    // emit for a real LLM turn; the UI needs all of them to render.
+    replyText(text: string): void {
+      if (typeof text !== `string` || text.length === 0) return
+      const runKey = nextRunKey()
+      const msgKey = `${runKey}:msg`
+      config.writeEvent(
+        entityStateSchema.runs.insert({
+          key: runKey,
+          value: { status: `started` } as never,
+        }) as ChangeEvent
+      )
+      config.writeEvent(
+        entityStateSchema.texts.insert({
+          key: msgKey,
+          value: { status: `streaming`, run_id: runKey } as never,
+        }) as ChangeEvent
+      )
+      config.writeEvent(
+        entityStateSchema.textDeltas.insert({
+          key: `${msgKey}:0`,
+          value: {
+            text_id: msgKey,
+            run_id: runKey,
+            delta: text,
+          } as never,
+        }) as ChangeEvent
+      )
+      config.writeEvent(
+        entityStateSchema.texts.update({
+          key: msgKey,
+          value: { status: `completed`, run_id: runKey } as never,
+        }) as ChangeEvent
+      )
+      config.writeEvent(
+        entityStateSchema.runs.update({
+          key: runKey,
+          value: {
+            status: `completed`,
+            finish_reason: `stop`,
+          } as never,
+        }) as ChangeEvent
+      )
+    },
     sleep(): void {
       sleepRequested = true
     },
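
Note on `replyText`: the five writes are easier to review as data. The sketch below builds the same ordered sequence as plain objects so the ordering and key scheme can be checked in isolation. `SketchEvent` and `buildReplyEvents` are hypothetical; the real code emits `ChangeEvent`s through `entityStateSchema` and `config.writeEvent`.

```typescript
interface SketchEvent {
  collection: 'runs' | 'texts' | 'textDeltas'
  op: 'insert' | 'update'
  key: string
  value: Record<string, unknown>
}

// Builds the run + text + delta sequence for one synthetic assistant reply.
function buildReplyEvents(runKey: string, text: string): Array<SketchEvent> {
  if (text.length === 0) return [] // mirrors replyText's early return
  const msgKey = `${runKey}:msg`
  return [
    // 1. Open the run.
    { collection: 'runs', op: 'insert', key: runKey, value: { status: 'started' } },
    // 2. Open the message inside that run.
    { collection: 'texts', op: 'insert', key: msgKey, value: { status: 'streaming', run_id: runKey } },
    // 3. Deliver the whole text as a single delta.
    { collection: 'textDeltas', op: 'insert', key: `${msgKey}:0`, value: { text_id: msgKey, run_id: runKey, delta: text } },
    // 4. Close the message.
    { collection: 'texts', op: 'update', key: msgKey, value: { status: 'completed', run_id: runKey } },
    // 5. Close the run.
    { collection: 'runs', op: 'update', key: runKey, value: { status: 'completed', finish_reason: 'stop' } },
  ]
}
```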
@@ -1051,5 +1139,8 @@ export function createHandlerContext<TState extends StateProxy = StateProxy>(
     },
   }
 
-  return { ctx, getSleepRequested: () => sleepRequested }
+  return {
+    ctx,
+    getSleepRequested: () => sleepRequested,
+  }
 }