feat(p4): native tools — six runtime tools inside the agent sandbox#82

Merged
garniergeorges merged 5 commits into dev from feat/p4-native-tools on Apr 27, 2026
@garniergeorges garniergeorges commented Apr 27, 2026

Summary

Milestone P4: agents running in their own Docker container can now call six native tools by emitting fenced `forge:*` blocks. The runtime parses the first block of each turn, executes the tool, feeds the structured result back as a user message, and loops for up to `maxTurns` turns (capped at 10).

| Tool | Tag | What it does |
| --- | --- | --- |
| Bash | `forge:bash` | `bash -lc` in `/workspace`, 30s default timeout (max 120s), 16 KB output cap |
| FileWrite | `forge:write` | Create or overwrite under `/workspace`, parent dirs auto-created |
| FileRead | `forge:read` | Line-based offset/limit, 16 KB clip, fails on non-regular files |
| FileEdit | `forge:edit` | Exact-substring patch, refuses ambiguous matches unless `replaceAll: true` |
| Grep | `forge:grep` | Pure JS regex across a glob filter, skips binaries, capped at 200 hits and 400 characters per line |
| Glob | `forge:glob` | Hand-rolled `*` / `**` / `?` matcher, capped at 200 results and a 5000-node walk |
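
To make the `forge:edit` row concrete, here is a minimal sketch of an exact-substring patch that refuses ambiguous matches unless `replaceAll` is set. The function name `applyEdit`, the `EditResult` shape, and the empty-string check are assumptions for illustration; the real `runtime/file-edit.ts` may differ.

```typescript
// Hypothetical result shape; the real tool's return type is not shown in this PR.
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; error: string };

function applyEdit(
  content: string,
  oldText: string,
  newText: string,
  replaceAll = false,
): EditResult {
  // Assumption: an empty search string is rejected, since it would match everywhere.
  if (oldText === "") return { ok: false, error: "old text must not be empty" };
  if (oldText === newText) return { ok: false, error: "old and new text are identical" };

  // Splitting on the exact substring counts non-overlapping occurrences.
  const parts = content.split(oldText);
  const count = parts.length - 1;
  if (count === 0) return { ok: false, error: "old text not found" };
  if (count > 1 && !replaceAll) {
    return { ok: false, error: `ambiguous: ${count} matches, set replaceAll to patch all` };
  }
  return { ok: true, content: parts.join(newText), replacements: count };
}
```

Refusing the ambiguous case forces the model to quote enough surrounding context to pin down a single location, which tends to be safer than silently patching the first hit.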

Why a text-structured protocol (not OpenAI tool_calls)

  • Local LLMs (MLX, llama.cpp) don't all honor `tool_calls`.
  • A single protocol across builder and agents is easier to debug and audit — the raw stream is human-readable.
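
As an illustration of how such a text protocol can be read back, the sketch below extracts the first fenced `forge:*` block from a reply and parses its body as JSON. It assumes the block is a triple-backtick fence whose info string is the tag and whose body is a JSON object; the names `parseFirstToolBlock` and `ParseResult` are hypothetical, and schema validation (done with Zod in this PR) is left out.

```typescript
// Built with repeat() so this example does not embed a literal fence.
const FENCE = "`".repeat(3);
const TOOL_TAGS = ["bash", "write", "read", "edit", "grep", "glob"] as const;
type ToolTag = (typeof TOOL_TAGS)[number];

type ParseResult =
  | { kind: "none" }
  | { kind: "tool"; tag: ToolTag; input: unknown }
  | { kind: "invalid"; reason: string };

// Non-global regex: exec() always returns the first block, so the first block wins.
const BLOCK_RE = new RegExp(
  `${FENCE}forge:([a-z-]+)[ \\t]*\\r?\\n([\\s\\S]*?)\\r?\\n[ \\t]*${FENCE}`,
);

function parseFirstToolBlock(reply: string): ParseResult {
  const match = BLOCK_RE.exec(reply);
  if (!match) return { kind: "none" };
  const [, tag, body] = match;
  if (!(TOOL_TAGS as readonly string[]).includes(tag)) {
    return { kind: "invalid", reason: `unknown tag forge:${tag}` };
  }
  try {
    return { kind: "tool", tag: tag as ToolTag, input: JSON.parse(body) };
  } catch {
    return { kind: "invalid", reason: "body is not valid JSON" };
  }
}
```

Because the reply is plain text, the same string the parser sees can be logged verbatim, which is the auditability argument made above.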

Sandbox

  • Tools live in `packages/tools-core/src/runtime/`, distinct from the host-side FileWrite.
  • All paths are resolved through `resolveSandboxedPath`: path traversal, null bytes, and absolute paths outside the sandbox are refused.
  • `DockerLaunch` bind-mounts a per-run host dir at `/workspace` (`~/.agent-forge/workspaces//`). The directory is kept after the container exits, which sets up P5 artifact extraction.
  • All output sizes are bounded so a runaway tool can't blow the LLM context.
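
The path checks listed above could look roughly like the following. This is a sketch under assumptions, not the code in `runtime/file-write.ts`: the second parameter and the error messages are invented for illustration, and symlinks are not resolved, so a symlink inside the sandbox pointing outside it would not be caught here.

```typescript
import * as path from "node:path";

// Sketch only: the real resolveSandboxedPath signature is not shown in this PR.
function resolveSandboxedPath(
  requested: string,
  root: string = process.env.FORGE_WORKSPACE ?? "/workspace",
): string {
  if (requested.includes("\0")) {
    throw new Error("path contains a null byte");
  }
  const absRoot = path.resolve(root);
  // Relative inputs resolve under the root; absolute inputs are kept as given.
  const resolved = path.resolve(absRoot, requested);
  const rel = path.relative(absRoot, resolved);
  // A relative path that climbs out of the root starts with a ".." segment.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`path escapes the sandbox: ${requested}`);
  }
  return resolved;
}
```

Comparing via `path.relative` rather than a string prefix avoids accepting sibling directories such as `/workspace-evil` when the root is `/workspace`.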

Files

tools-core

  • `runtime/bash.ts` (new)
  • `runtime/file-write.ts` (new) — also exports `resolveSandboxedPath` reused by grep/glob
  • `runtime/file-read.ts` (new)
  • `runtime/file-edit.ts` (new)
  • `runtime/grep.ts` (new)
  • `runtime/glob.ts` (new)
  • `docker-launch.ts` — adds the `/workspace` bind mount

runtime

  • `tool-protocol.ts` (new) — six-tag parser table + result renderers
  • `index.ts` — refactored from one-shot `streamText` to a tool loop with `maxTurns` ; system prompt lists all six tools

Tests

  • agent-side parser: none / tool / invalid JSON / invalid schema / first-block-wins / each tag round-trip / refine-rule violation
  • runtime FileWrite / FileRead / FileEdit / Bash / Grep / Glob: path safety, sandbox escape, edge cases
  • tests use `FORGE_WORKSPACE` to retarget the sandbox at a temp dir

Test plan

  • `bun install && bun run --cwd packages/runtime build` succeeds
  • `bun test` passes for tools-core and runtime
  • `bun run forge` boots, and an agent created with `maxTurns: 5` can chain: write a file → read it → grep it → edit it → bash to verify
  • artifacts persist in `~/.agent-forge/workspaces//` after the container exits
  • check-attribution CI is green

Out of scope

  • Hardened sandbox (read-only root FS, network policy, resource caps) → P5
  • Persistent agents (`docker exec` instead of one-shot `run --rm`) → P5
  • Artifact extraction back to host → P5

Agents can now call two native tools by emitting fenced forge:* blocks
in their reply. The runtime parses the first block per turn, executes
it, feeds the structured result back as a user message, and loops up to
maxTurns (capped at 10).
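
The loop described above can be sketched as follows. The model call, the parser, and the tool executor are injected as plain functions so the example stays self-contained; `LoopDeps`, `runToolLoop`, and `MAX_TURNS_CAP` are hypothetical names, and the actual code in `index.ts` is built on `streamText` rather than this interface.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };
type ToolCall = { tag: string; input: unknown };

// Hypothetical seams standing in for the LLM client, the parser, and the tool dispatcher.
interface LoopDeps {
  callModel(messages: Message[]): Promise<string>;
  parseToolCall(reply: string): ToolCall | null;
  executeTool(call: ToolCall): Promise<string>;
}

const MAX_TURNS_CAP = 10;

async function runToolLoop(
  prompt: string,
  maxTurns: number,
  deps: LoopDeps,
): Promise<Message[]> {
  // Clamp the requested turn count to the range 1..10.
  const turns = Math.max(1, Math.min(Math.floor(maxTurns), MAX_TURNS_CAP));
  const messages: Message[] = [{ role: "user", content: prompt }];

  for (let turn = 0; turn < turns; turn++) {
    const reply = await deps.callModel(messages);
    messages.push({ role: "assistant", content: reply });

    const call = deps.parseToolCall(reply);
    if (!call) break; // No tool block: the reply is the final answer.

    // The structured result goes back to the model as a user message.
    const result = await deps.executeTool(call);
    messages.push({ role: "user", content: result });
  }
  return messages;
}
```

Feeding results back as user messages keeps the transcript valid for models that have no dedicated tool role.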

Tools live in tools-core/src/runtime/ and stay distinct from the
host-side FileWrite: they are sandboxed to /workspace, overwrite by
default (in-sandbox iteration), and surface stdout/stderr/exit/timed-out
to the LLM.

DockerLaunch now bind-mounts a per-run host directory at /workspace so
the agent has somewhere writable, and so artifacts survive the
container (used later in P5 for extraction).

Runtime mode switches automatically based on AGENT.md.maxTurns: a single
turn keeps the P3 one-shot path, while multi-turn enables the tool loop
and prepends a TOOLS section to the system prompt explaining the protocol.

Tool output is wrapped in [forge:tool] / [/forge:tool] markers on
stdout so the host TUI can route it to its action card instead of
mixing it with prose.
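
A rough sketch of how the markers could be written and then split apart on the host side. It assumes the whole stdout is available as one string and that markers never nest; a real TUI would more likely process the stream incrementally. `wrapToolOutput`, `splitStream`, and `Segment` are names invented for this example.

```typescript
const OPEN = "[forge:tool]";
const CLOSE = "[/forge:tool]";

type Segment = { kind: "prose" | "tool"; text: string };

// Runtime side: wrap a tool result before printing it to stdout.
function wrapToolOutput(payload: string): string {
  return `${OPEN}\n${payload}\n${CLOSE}\n`;
}

// Host side: separate tool payloads from ordinary prose.
function splitStream(stdout: string): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0;
  while (cursor < stdout.length) {
    const open = stdout.indexOf(OPEN, cursor);
    if (open === -1) break;
    const close = stdout.indexOf(CLOSE, open + OPEN.length);
    if (close === -1) break; // Unterminated marker: the rest is treated as prose.
    if (open > cursor) {
      segments.push({ kind: "prose", text: stdout.slice(cursor, open) });
    }
    segments.push({ kind: "tool", text: stdout.slice(open + OPEN.length, close).trim() });
    cursor = close + CLOSE.length;
  }
  if (cursor < stdout.length) {
    segments.push({ kind: "prose", text: stdout.slice(cursor) });
  }
  return segments;
}
```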

Tests:
- agent-side parser (none / tool / invalid / first-block-only)
- runtime FileWrite path traversal, sandbox escape, overwrite
- runtime Bash stdout/stderr/exit/timeout, cwd respected

Tests use a FORGE_WORKSPACE override so they don't try to touch
/workspace on the host.

Completes the P4 tool catalog. Agents now have the full six: bash,
write, read, edit, grep, glob. All sandboxed to /workspace, all callable
via fenced forge:* blocks, all validated by Zod and capped on output
size to protect the LLM context.
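
One way to cap output size is sketched below. It assumes the 16 KB limit means 16 × 1024 UTF-8 bytes, which this PR does not state, and `clipOutput` is a hypothetical helper name. The cut is moved back to a character boundary so a multi-byte character is never split.

```typescript
// Assumption: "16 KB" is interpreted as 16 * 1024 UTF-8 bytes.
const DEFAULT_CAP_BYTES = 16 * 1024;

function clipOutput(
  text: string,
  capBytes: number = DEFAULT_CAP_BYTES,
): { text: string; truncated: boolean } {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= capBytes) return { text, truncated: false };

  let end = capBytes;
  // bytes[end] is the first dropped byte. If it is a UTF-8 continuation byte
  // (10xxxxxx), the cut falls inside a character, so back up to its lead byte.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;

  return { text: new TextDecoder().decode(bytes.subarray(0, end)), truncated: true };
}
```

Returning a `truncated` flag lets the caller tell the model that output was clipped, so it can narrow the command instead of assuming it saw everything.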

read   — line-based offset/limit, 16 KB clip, fails on missing or
         non-regular files
edit   — exact substring patch, refuses ambiguous matches unless
         replaceAll=true, refuses identical old/new
grep   — pure JS regex over a glob filter, skips binary files (NUL
         bytes), 200 hits cap, line clipped at 400 chars
glob   — hand-rolled matcher for *, **, ?  (no dep), 200 results cap,
         walk bounded at 5000 nodes
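
For readers unfamiliar with the three wildcards, the sketch below translates a glob into a `RegExp`. Note that this is a different technique from the hand-rolled matcher in `runtime/glob.ts`, chosen here only because it is short; the semantics shown (`*` and `?` stay within one path segment, `**` may cross segments) are the conventional ones and are assumed to match the tool. The result caps and the bounded directory walk are not covered.

```typescript
// Glob-to-RegExp translation, used here as a stand-in for a hand-rolled matcher.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        i++; // consume the second "*"
        if (glob[i + 1] === "/") {
          i++; // consume the "/"
          source += "(?:.*/)?"; // "**/" matches zero or more directories
        } else {
          source += ".*"; // bare "**" matches anything, including "/"
        }
      } else {
        source += "[^/]*"; // "*" stays within one path segment
      }
    } else if (ch === "?") {
      source += "[^/]"; // "?" matches exactly one non-separator character
    } else {
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}
```

For example, `globToRegExp("src/**/*.ts")` accepts both `src/index.ts` and `src/a/b/c.ts`, while `globToRegExp("*.ts")` rejects `a/b.ts` because `*` does not cross a `/`.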

Tool dispatcher in the runtime is now a switch over six branches.
System prompt lists all six with their JSON shape.

Tests added for each tool plus four new parser cases (forge:read /
edit / grep / glob, and a refine-rule violation on edit).

resolveSandboxedPath is now exported so tools that don't write but
still need the sandbox root (grep, glob) reuse it instead of
duplicating the FORGE_WORKSPACE override logic.

@garniergeorges garniergeorges changed the title feat(p4): native tools — bash + file-write inside the agent sandbox feat(p4): native tools — six runtime tools inside the agent sandbox Apr 27, 2026

Mistral Small (and likely most small models) regularly emits an AGENT.md
where the `description` value embeds a colon — typically when listing
steps ("Step 1: ..., Step 2: ...") or quoting another key (`maxTurns: 8`,
`timeout: 60s`). YAML reads that as a nested mapping and rejects the
whole frontmatter.

Two fixes:

1. The builder system prompt now spells out the rule in both EN and FR:
   no colon / no embedded YAML / wrap in double quotes if needed. It comes
   with an example so the LLM has a template to follow.

2. The CLI normalizer now scans the frontmatter and wraps any
   `description` value containing an unquoted colon in double quotes,
   escaping any embedded double quotes in the process. Already-quoted
   values are left alone.

Tests cover both: an unquoted "Step 1: ... Step 2: ..." is fixed up
and accepted; an already-quoted equivalent is left untouched.
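
The normalizer step can be sketched as below. It assumes `description` is a single-line, top-level key, and the function name `quoteDescription` is hypothetical. Escaping backslashes in addition to double quotes is an addition of this sketch, made so the result stays a valid YAML double-quoted scalar.

```typescript
// Sketch: wrap a single-line `description` value in double quotes when it
// contains a colon and is not already quoted.
function quoteDescription(frontmatter: string): string {
  return frontmatter
    .split("\n")
    .map((line) => {
      const match = /^description:[ \t]*(.*)$/.exec(line);
      if (!match) return line;

      const value = match[1].trim();
      const alreadyQuoted = /^(".*"|'.*')$/.test(value);
      if (alreadyQuoted || !value.includes(":")) return line;

      // Escape backslashes first, then embedded double quotes.
      const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
      return `description: "${escaped}"`;
    })
    .join("\n");
}
```

With this in place, a line such as `description: Step 1: plan, Step 2: build` becomes `description: "Step 1: plan, Step 2: build"`, which YAML reads as a plain string instead of a nested mapping.
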

Cards in Mission Control are now keyboard-navigable:
  - Tab           cycle focus forward (lands on the most recent card
                  the first time)
  - Shift+Tab     cycle focus backward
  - Enter         open the focused card in a full-screen detail view
  - Esc / q       close the detail view

The detail view uses the entire terminal, shows the action's full
content (the AGENT.md body for write actions; prompt + streamed
output for run actions) with line numbers, and supports scrolling
with arrow keys / PgUp / PgDn / g / G.

Tab/Enter are only captured when there are actions, no permission
dialog is up, the detail view is closed, and the prompt input is
empty — so typing in the prompt always wins. The prompt draft is
now lifted into useChat so App can read it for that guard.
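
The guard and the focus cycling described above can be expressed as two small pure functions. This is a sketch: `NavState`, `shouldCaptureNavKeys`, and `cycleFocus` are hypothetical names, the most recent card is assumed to sit at the last index, and the behavior of Shift+Tab when nothing is focused is an assumption.

```typescript
// Hypothetical snapshot of the UI state the guard needs.
interface NavState {
  actionCount: number;
  permissionDialogOpen: boolean;
  detailViewOpen: boolean;
  promptDraft: string;
}

// Tab/Enter are captured only when all four conditions hold.
function shouldCaptureNavKeys(state: NavState): boolean {
  return (
    state.actionCount > 0 &&
    !state.permissionDialogOpen &&
    !state.detailViewOpen &&
    state.promptDraft.length === 0
  );
}

// direction: 1 for Tab, -1 for Shift+Tab. Returns the new focused index, or null.
function cycleFocus(current: number | null, count: number, direction: 1 | -1): number | null {
  if (count === 0) return null;
  // Assumption: the first press lands on the most recent card in either direction.
  if (current === null) return count - 1;
  return (current + direction + count) % count;
}
```

Keeping the guard as a pure function of state makes the "typing in the prompt always wins" rule easy to unit-test without rendering the TUI.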

Visual cues: the focused card switches to a brighter "double" border
and gains a leading triangle; the Mission Control header changes its
hint line depending on whether anything is focused.

When a card is focused but the detail view isn't open, pressing Esc
now drops the focus without opening anything. It is guarded so it only
fires when the prompt is empty and no permission dialog is up, so Esc
keeps its meaning everywhere else.

Header hint updated accordingly.
@garniergeorges garniergeorges merged commit 1a3111a into dev Apr 27, 2026
1 check passed
@garniergeorges garniergeorges deleted the feat/p4-native-tools branch April 27, 2026 12:41