feat(p4): native tools — six runtime tools inside the agent sandbox#82
Merged
Conversation
Agents can now call two native tools by emitting fenced forge:* blocks in their reply. The runtime parses the first block per turn, executes it, feeds the structured result back as a user message, and loops up to maxTurns (capped at 10). Tools live in tools-core/src/runtime/ and stay distinct from the host-side FileWrite : they are sandboxed to /workspace, overwrite by default (in-sandbox iteration), and surface stdout/stderr/exit/timed-out to the LLM. DockerLaunch now bind-mounts a per-run host directory at /workspace so the agent has somewhere writable, and so artifacts survive the container (used later in P5 for extraction). Runtime mode switches automatically based on AGENT.md.maxTurns : single turn keeps the P3 one-shot path, multi-turn enables the tool loop and prepends a TOOLS section to the system prompt explaining the protocol. Tool output is wrapped in [forge:tool] / [/forge:tool] markers on stdout so the host TUI can route it to its action card instead of mixing it with prose. Tests : - agent-side parser (none / tool / invalid / first-block-only) - runtime FileWrite path traversal, sandbox escape, overwrite - runtime Bash stdout/stderr/exit/timeout, cwd respected Tests use a FORGE_WORKSPACE override so they don't try to touch /workspace on the host.
Completes the P4 tool catalog. Agents now have the full six : bash,
write, read, edit, grep, glob. All sandboxed to /workspace, all callable
via fenced forge:* blocks, all validated by Zod and capped on output
size to protect the LLM context.
read — line-based offset/limit, 16 KB clip, fails on missing or
non-regular files
edit — exact substring patch, refuses ambiguous matches unless
replaceAll=true, refuses identical old/new
grep — pure JS regex over a glob filter, skips binary files (NUL
bytes), 200 hits cap, line clipped at 400 chars
glob — hand-rolled matcher for *, **, ? (no dep), 200 results cap,
walk bounded at 5000 nodes
Tool dispatcher in the runtime is now a switch over six branches.
System prompt lists all six with their JSON shape.
Tests added for each tool plus four new parser cases (forge:read /
edit / grep / glob, and a refine-rule violation on edit).
resolveSandboxedPath is now exported so tools that don't write but
still need the sandbox root (grep, glob) reuse it instead of
duplicating the FORGE_WORKSPACE override logic.
Mistral Small (and likely most small models) regularly emits an AGENT.md
where the `description` value embeds a colon — typically when listing
steps ("Step 1: ..., Step 2: ...") or quoting another key (`maxTurns: 8`,
`timeout: 60s`). YAML reads that as a nested mapping and rejects the
whole frontmatter.
Two fixes :
1. The builder system prompt now spells out the rule in both EN and FR :
no colon / no embedded YAML / wrap in double quotes if needed. Comes
with an example so the LLM has a template to follow.
2. The CLI normalizer now scans the frontmatter and wraps any
`description` value containing an unquoted colon in double quotes,
escaping any embedded double quotes in the process. Already-quoted
values are left alone.
Tests cover both : an unquoted "Step 1: ... Step 2: ..." is fixed up
and accepted ; an already-quoted equivalent is left untouched.
Cards in Mission Control are now keyboard-navigable :
- Tab cycle focus forward (lands on the most recent card
the first time)
- Shift+Tab cycle focus backward
- Enter open the focused card in a full-screen detail view
- Esc / q close the detail view
The detail view uses the entire terminal, shows the action's full
content (the AGENT.md body for write actions ; prompt + streamed
output for run actions) with line numbers, and supports scrolling
with arrow keys / PgUp / PgDn / g / G.
Tab/Enter are only captured when there are actions, no permission
dialog is up, the detail view is closed, and the prompt input is
empty — so typing in the prompt always wins. The prompt draft is
now lifted into useChat so App can read it for that guard.
Visual cues : the focused card switches to a brighter "double" border
and gains a leading triangle ; the Mission Control header changes its
hint line depending on whether anything is focused.
When a card is focused but the detail view isn't open, pressing Esc now drops the focus without opening anything. Guarded so it only fires when the prompt is empty and no permission dialog is up — Esc keeps its meaning everywhere else. Header hint updated accordingly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Milestone P4 : agents running in their own Docker container can now call six native tools by emitting fenced `forge:*` blocks. The runtime parses one block per turn, executes the tool, feeds the structured result back as a user message, and loops up to `maxTurns` (capped at 10).
Why a text-structured protocol (not OpenAI tool_calls)
Sandbox
Files
tools-core
runtime
Tests
Test plan
Out of scope