feat(p4): native tools — six runtime tools inside the agent sandbox by garniergeorges · Pull Request #82 · garniergeorges/agent-forge

garniergeorges · 2026-04-27T11:38:18Z

Summary

Milestone P4: agents running in their own Docker container can now call six native tools by emitting fenced `forge:*` blocks. The runtime parses one block per turn (the first one if several are emitted), executes the tool, feeds the structured result back as a user message, and loops up to `maxTurns` (capped at 10).

| Tool | Tag | What it does |
| --- | --- | --- |
| Bash | `forge:bash` | `bash -lc` in /workspace, 30s default timeout (max 120s), 16 KB output cap |
| FileWrite | `forge:write` | Create or overwrite under /workspace, parent dirs auto-created |
| FileRead | `forge:read` | Line-based offset/limit, 16 KB clip, fails on missing or non-regular files |
| FileEdit | `forge:edit` | Exact-substring patch, refuses ambiguous matches unless `replaceAll: true` |
| Grep | `forge:grep` | Pure JS regex across a glob filter, skips binaries, 200 hits / 400 chars per line cap |
| Glob | `forge:glob` | Hand-rolled `**` / `*` / `?` matcher, 200 results / 5000 node walk |
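
For readers who want the loop in code form, here is a minimal sketch of the turn loop described in the summary. It is not the code in `packages/runtime`: the `Message` and `ToolCall` shapes, the function name `runToolLoop`, and the three callbacks are hypothetical stand-ins, and the sketch is synchronous for brevity whereas the real runtime streams from the model.

```typescript
// Hypothetical shapes: the real runtime's message and tool types may differ.
type Message = { role: "system" | "user" | "assistant"; content: string };
type ToolCall = { tag: string; input: unknown };

const MAX_TURNS_CAP = 10; // the PR states that maxTurns is capped at 10

function runToolLoop(
  messages: Message[],
  maxTurns: number,
  generate: (history: Message[]) => string, // stand-in for the LLM call
  parse: (reply: string) => ToolCall | null, // stand-in for the forge:* parser
  execute: (call: ToolCall) => string, // stand-in for the tool dispatcher
): Message[] {
  // Clamp the requested turn count into [1, MAX_TURNS_CAP].
  const turns = Math.min(Math.max(1, maxTurns), MAX_TURNS_CAP);
  const history = [...messages];
  for (let turn = 0; turn < turns; turn++) {
    const reply = generate(history);
    history.push({ role: "assistant", content: reply });
    const call = parse(reply);
    if (call === null) break; // no tool block in the reply: the agent is done
    // The tool result goes back to the model as a user message.
    history.push({ role: "user", content: execute(call) });
  }
  return history;
}
```

With `maxTurns: 1` this degenerates to a single model call plus at most one tool execution; how the real runtime keeps the P3 one-shot path separate is not shown here.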

Why a text-structured protocol (not OpenAI tool_calls)

Local LLMs (MLX, llama.cpp) don't all honor `tool_calls`.
A single protocol across builder and agents is easier to debug and audit — the raw stream is human-readable.
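
To make the protocol concrete, a first-block-wins parser could look roughly like the sketch below. The wire format is an assumption on my side (a fence whose info string is `forge:<tag>` and whose body is JSON); the name `parseFirstForgeBlock` and the `ParseResult` shape are illustrative, and the Zod schema validation that the real parser performs is left out.

```typescript
// Assumed wire format: a fence opened with forge:<tag>, containing a JSON body.
const FENCE = "`".repeat(3);
const KNOWN_TAGS = ["bash", "write", "read", "edit", "grep", "glob"] as const;
type Tag = (typeof KNOWN_TAGS)[number];

type ParseResult =
  | { kind: "none" }
  | { kind: "tool"; tag: Tag; input: unknown }
  | { kind: "invalid"; reason: string };

// No global flag, so exec() always returns the first block in the reply.
const BLOCK_RE = new RegExp(
  FENCE + "forge:([a-z]+)[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE,
);

function parseFirstForgeBlock(reply: string): ParseResult {
  const match = BLOCK_RE.exec(reply);
  if (match === null) return { kind: "none" };
  const tag = match[1] ?? "";
  const body = match[2] ?? "";
  if (!(KNOWN_TAGS as readonly string[]).includes(tag)) {
    return { kind: "invalid", reason: `unknown tag forge:${tag}` };
  }
  try {
    // Per-tool schema validation would run on the parsed value after this.
    return { kind: "tool", tag: tag as Tag, input: JSON.parse(body) };
  } catch {
    return { kind: "invalid", reason: "body is not valid JSON" };
  }
}
```

Because the whole exchange is plain text, a failed parse can be reported back to the model in the same channel, which is part of what makes the raw stream easy to audit.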

Sandbox

Tools live in `packages/tools-core/src/runtime/`, distinct from the host-side FileWrite.
All paths are resolved through `resolveSandboxedPath`: path traversal, null bytes and absolute paths outside the sandbox are refused.
`DockerLaunch` bind-mounts a per-run host dir at `/workspace` (`~/.agent-forge/workspaces//`). The directory is kept after the container exits, which sets up P5 artifact extraction.
All output sizes are bounded so a runaway tool can't blow the LLM context.
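
The three refusals listed above can be illustrated with a small pure-string sketch. This is not the exported `resolveSandboxedPath`: the name `resolveSandboxedPathSketch` and its two-argument signature are hypothetical, it assumes an absolute POSIX sandbox root, and it does not follow symlinks or read the `FORGE_WORKSPACE` override.

```typescript
// Illustrative only: the real resolveSandboxedPath in tools-core may differ.
function resolveSandboxedPathSketch(root: string, requested: string): string {
  if (requested.includes("\0")) throw new Error("null byte in path");
  const base = root.replace(/\/+$/, ""); // drop trailing slashes from the root
  // Absolute inputs are taken as-is; relative ones are joined onto the root.
  const joined = requested.startsWith("/") ? requested : `${base}/${requested}`;
  // Normalize "." and ".." segments by hand.
  const parts: string[] = [];
  for (const segment of joined.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  const resolved = "/" + parts.join("/");
  // Compare against "base/" so that /workspace-evil does not pass as /workspace.
  if (resolved !== base && !resolved.startsWith(base + "/")) {
    throw new Error(`path escapes sandbox: ${requested}`);
  }
  return resolved;
}
```

The check runs on the normalized path rather than on the raw input, so `a/../../etc` is refused even though it does not start with `..`.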

Files

tools-core

`runtime/bash.ts` (new)
`runtime/file-write.ts` (new) — also exports `resolveSandboxedPath` reused by grep/glob
`runtime/file-read.ts` (new)
`runtime/file-edit.ts` (new)
`runtime/grep.ts` (new)
`runtime/glob.ts` (new)
`docker-launch.ts` — adds the `/workspace` bind mount

runtime

`tool-protocol.ts` (new) — six-tag parser table + result renderers
`index.ts` — refactored from one-shot `streamText` to a tool loop with `maxTurns` ; system prompt lists all six tools

Tests

agent-side parser: none / tool / invalid JSON / invalid schema / first-block-wins / each tag round-trip / refine-rule violation
runtime FileWrite / FileRead / FileEdit / Bash / Grep / Glob : path safety, sandbox escape, edge cases
tests use `FORGE_WORKSPACE` to retarget the sandbox at a temp dir

Test plan

`bun install && bun run --cwd packages/runtime build` succeeds
`bun test` passes for tools-core and runtime
`bun run forge` boots, and an agent created with `maxTurns: 5` can chain: write a file → read it → grep it → edit it → bash to verify
artifacts persist in `~/.agent-forge/workspaces//` after the container exits
check-attribution CI is green

Out of scope

Hardened sandbox (read-only root FS, network policy, resource caps) → P5
Persistent agents (`docker exec` instead of one-shot `run --rm`) → P5
Artifact extraction back to host → P5

Agents can now call two native tools by emitting fenced forge:* blocks in their reply. The runtime parses the first block per turn, executes it, feeds the structured result back as a user message, and loops up to maxTurns (capped at 10).

Tools live in tools-core/src/runtime/ and stay distinct from the host-side FileWrite: they are sandboxed to /workspace, overwrite by default (in-sandbox iteration), and surface stdout/stderr/exit/timed-out to the LLM.

DockerLaunch now bind-mounts a per-run host directory at /workspace so the agent has somewhere writable, and so artifacts survive the container (used later in P5 for extraction).

Runtime mode switches automatically based on AGENT.md.maxTurns: single turn keeps the P3 one-shot path, multi-turn enables the tool loop and prepends a TOOLS section to the system prompt explaining the protocol.

Tool output is wrapped in [forge:tool] / [/forge:tool] markers on stdout so the host TUI can route it to its action card instead of mixing it with prose.

Tests:
- agent-side parser (none / tool / invalid / first-block-only)
- runtime FileWrite path traversal, sandbox escape, overwrite
- runtime Bash stdout/stderr/exit/timeout, cwd respected

Tests use a FORGE_WORKSPACE override so they don't try to touch /workspace on the host.
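
As an illustration of the [forge:tool] / [/forge:tool] wrapping mentioned in this commit, the host side could separate tool output from prose along these lines. The `Segment` type and `splitToolOutput` are hypothetical names, and the sketch works on a fully buffered string, whereas the TUI presumably handles a stream.

```typescript
type Segment = { kind: "prose" | "tool"; text: string };

const OPEN_MARKER = "[forge:tool]";
const CLOSE_MARKER = "[/forge:tool]";

// Split buffered stdout into prose and tool segments, in order of appearance.
function splitToolOutput(stdout: string): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0;
  while (cursor < stdout.length) {
    const start = stdout.indexOf(OPEN_MARKER, cursor);
    if (start === -1) break;
    const end = stdout.indexOf(CLOSE_MARKER, start + OPEN_MARKER.length);
    if (end === -1) break; // unterminated marker: the rest is treated as prose
    if (start > cursor) {
      segments.push({ kind: "prose", text: stdout.slice(cursor, start) });
    }
    segments.push({
      kind: "tool",
      text: stdout.slice(start + OPEN_MARKER.length, end),
    });
    cursor = end + CLOSE_MARKER.length;
  }
  if (cursor < stdout.length) {
    segments.push({ kind: "prose", text: stdout.slice(cursor) });
  }
  return segments;
}
```

Each `tool` segment would go to the action card and each `prose` segment to the chat transcript.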

Completes the P4 tool catalog. Agents now have the full six: bash, write, read, edit, grep, glob. All sandboxed to /workspace, all callable via fenced forge:* blocks, all validated by Zod and capped on output size to protect the LLM context.

- read: line-based offset/limit, 16 KB clip, fails on missing or non-regular files
- edit: exact substring patch, refuses ambiguous matches unless replaceAll=true, refuses identical old/new
- grep: pure JS regex over a glob filter, skips binary files (NUL bytes), 200 hits cap, line clipped at 400 chars
- glob: hand-rolled matcher for *, **, ? (no dep), 200 results cap, walk bounded at 5000 nodes

Tool dispatcher in the runtime is now a switch over six branches. System prompt lists all six with their JSON shape.

Tests added for each tool plus four new parser cases (forge:read / edit / grep / glob, and a refine-rule violation on edit).

resolveSandboxedPath is now exported so tools that don't write but still need the sandbox root (grep, glob) reuse it instead of duplicating the FORGE_WORKSPACE override logic.
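
For the glob tool, a dependency-free matcher for *, ** and ? can be built by translating the pattern to a regular expression. The sketch below is one common way to do it and is not taken from `runtime/glob.ts`; `globToRegExp` and `matchPaths` are hypothetical names, the semantics chosen for `**` are an assumption, and the 5000-node walk bound is not modeled.

```typescript
// Translate a glob into an anchored RegExp.
//   *   matches within one path segment
//   **  matches across segments ("**/" also matches zero directories)
//   ?   matches exactly one non-separator character
function globToRegExp(pattern: string): RegExp {
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern.charAt(i);
    if (ch === "*" && pattern.charAt(i + 1) === "*") {
      if (pattern.charAt(i + 2) === "/") {
        source += "(?:.*/)?";
        i += 3;
      } else {
        source += ".*";
        i += 2;
      }
    } else if (ch === "*") {
      source += "[^/]*";
      i += 1;
    } else if (ch === "?") {
      source += "[^/]";
      i += 1;
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

// Filter candidate paths, stopping at the result cap (200 in this PR).
function matchPaths(pattern: string, paths: string[], limit = 200): string[] {
  const re = globToRegExp(pattern);
  const hits: string[] = [];
  for (const p of paths) {
    if (hits.length >= limit) break;
    if (re.test(p)) hits.push(p);
  }
  return hits;
}
```

Capping inside the loop rather than slicing afterwards keeps the work bounded on large trees, which matches the stated goal of protecting the LLM context.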

Mistral Small (and likely most small models) regularly emits an AGENT.md where the `description` value embeds a colon, typically when listing steps ("Step 1: ..., Step 2: ...") or quoting another key (`maxTurns: 8`, `timeout: 60s`). YAML reads that as a nested mapping and rejects the whole frontmatter.

Two fixes:

1. The builder system prompt now spells out the rule in both EN and FR: no colon / no embedded YAML / wrap in double quotes if needed. Comes with an example so the LLM has a template to follow.
2. The CLI normalizer now scans the frontmatter and wraps any `description` value containing an unquoted colon in double quotes, escaping any embedded double quotes in the process. Already-quoted values are left alone.

Tests cover both: an unquoted "Step 1: ... Step 2: ..." is fixed up and accepted; an already-quoted equivalent is left untouched.
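
The normalizer step (fix 2) can be sketched as a line-by-line pass over the frontmatter. This is an approximation, not the CLI's code: `quoteDescription` is a hypothetical name, it only handles single-line `description` values, and escaping backslashes is my addition because YAML double-quoted scalars treat them as escape characters.

```typescript
// Wrap a single-line description value in double quotes when it contains a
// colon and is not already quoted. Other frontmatter lines pass through.
function quoteDescription(frontmatter: string): string {
  return frontmatter
    .split("\n")
    .map((line) => {
      const match = /^description:[ \t]*(.*)$/.exec(line);
      if (match === null) return line;
      const value = (match[1] ?? "").trim();
      const alreadyQuoted =
        value.length >= 2 &&
        ((value.startsWith('"') && value.endsWith('"')) ||
          (value.startsWith("'") && value.endsWith("'")));
      if (alreadyQuoted || !value.includes(":")) return line;
      // Escape backslashes first, then embedded double quotes.
      const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
      return `description: "${escaped}"`;
    })
    .join("\n");
}
```

For example, `description: Step 1: write, Step 2: read` becomes `description: "Step 1: write, Step 2: read"`, while a value that is already wrapped in quotes is returned unchanged.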

Cards in Mission Control are now keyboard-navigable:

- Tab: cycle focus forward (lands on the most recent card the first time)
- Shift+Tab: cycle focus backward
- Enter: open the focused card in a full-screen detail view
- Esc / q: close the detail view

The detail view uses the entire terminal, shows the action's full content (the AGENT.md body for write actions; prompt + streamed output for run actions) with line numbers, and supports scrolling with arrow keys / PgUp / PgDn / g / G.

Tab/Enter are only captured when there are actions, no permission dialog is up, the detail view is closed, and the prompt input is empty, so typing in the prompt always wins. The prompt draft is now lifted into useChat so App can read it for that guard.

Visual cues: the focused card switches to a brighter "double" border and gains a leading triangle; the Mission Control header changes its hint line depending on whether anything is focused.
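
The capture guard and focus cycling described in this commit reduce to two small pure functions. The sketch below is an assumption-heavy illustration: `UiState`, `canCaptureNavKeys` and `nextFocus` are hypothetical names, cards are assumed to be indexed oldest to newest, and Shift+Tab is assumed to land on the most recent card the first time just as Tab does.

```typescript
// Hypothetical view of the state the guard needs; not the app's real types.
type UiState = {
  actionCount: number;
  permissionDialogOpen: boolean;
  detailViewOpen: boolean;
  promptDraft: string;
  focusedIndex: number | null;
};

// Tab/Enter are captured only when all four conditions from the commit hold.
function canCaptureNavKeys(s: UiState): boolean {
  return (
    s.actionCount > 0 &&
    !s.permissionDialogOpen &&
    !s.detailViewOpen &&
    s.promptDraft.length === 0
  );
}

// direction: 1 for Tab, -1 for Shift+Tab. Returns the new focused index.
function nextFocus(s: UiState, direction: 1 | -1): number | null {
  if (!canCaptureNavKeys(s)) return s.focusedIndex; // key goes to the prompt
  if (s.focusedIndex === null) return s.actionCount - 1; // most recent card
  return (s.focusedIndex + direction + s.actionCount) % s.actionCount;
}
```

Keeping the guard in one function makes it easy to reuse for the Esc handling added in the next commit, which applies a similar prompt-empty and no-dialog condition.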

When a card is focused but the detail view isn't open, pressing Esc now drops the focus without opening anything. Guarded so it only fires when the prompt is empty and no permission dialog is up — Esc keeps its meaning everywhere else. Header hint updated accordingly.

garniergeorges added 2 commits April 27, 2026 13:17

garniergeorges changed the title ~~feat(p4): native tools — bash + file-write inside the agent sandbox~~ feat(p4): native tools — six runtime tools inside the agent sandbox Apr 27, 2026

garniergeorges added 3 commits April 27, 2026 14:08

garniergeorges merged commit 1a3111a into dev Apr 27, 2026
1 check passed

garniergeorges deleted the feat/p4-native-tools branch April 27, 2026 12:41

garniergeorges merged 5 commits into dev from feat/p4-native-tools
