release: dev → prod (P4 native tools + P6 skill layer) by garniergeorges · Pull Request #84 · garniergeorges/agent-forge

garniergeorges · 2026-04-27T16:02:38Z

Summary

Brings prod up to date with dev. Two milestones to release :

P4 — Six native tools (Bash, FileWrite, FileRead, FileEdit, Grep, Glob) sandboxed under `/workspace`, runtime tool-loop with `maxTurns` (cap 10), `forge:*` text-structured protocol
P6 — Skill layer : SKILL.md format, catalog (built-in + `~/.agent-forge/skills/`), server-side trigger matcher, two-call runner. First built-in : `scaffold-and-run`
Mission Control v2 — Compact mode by default, scrollable viewport, auto-focus on new arrivals, Tab/Enter/Esc keyboard nav, full-screen detail view with Markdown / YAML / JSON / agent-run highlighting
Docs — Root + sub-package READMEs aligned (EN/FR), P6 done badge, native tools and skills sections, keyboard cheatsheet, updated architecture diagram

What's NOT here

P5 (hardened sandbox + persistent agents + artifact extraction) — next milestone, on `feat/p5-hardened-sandbox`
P7-P9 — see roadmap in README.

Test plan

Pull prod after merge, `bun install`, `bun run --cwd packages/runtime build`, `bun test` green
`bun run forge` boots
`/skills` lists `scaffold-and-run · built-in`
"audite un projet TypeScript dans /workspace : crée src/index.ts avec deux TODO add et multiply, exécute le résultat" → 3 cards (skill DONE, write PROPOSED, run PROPOSED)
check-attribution CI passes

Agents can now call two native tools by emitting fenced forge:* blocks in their reply. The runtime parses the first block per turn, executes it, feeds the structured result back as a user message, and loops up to maxTurns (capped at 10). Tools live in tools-core/src/runtime/ and stay distinct from the host-side FileWrite : they are sandboxed to /workspace, overwrite by default (in-sandbox iteration), and surface stdout/stderr/exit/timed-out to the LLM. DockerLaunch now bind-mounts a per-run host directory at /workspace so the agent has somewhere writable, and so artifacts survive the container (used later in P5 for extraction). Runtime mode switches automatically based on AGENT.md.maxTurns : single turn keeps the P3 one-shot path, multi-turn enables the tool loop and prepends a TOOLS section to the system prompt explaining the protocol. Tool output is wrapped in [forge:tool] / [/forge:tool] markers on stdout so the host TUI can route it to its action card instead of mixing it with prose. Tests : - agent-side parser (none / tool / invalid / first-block-only) - runtime FileWrite path traversal, sandbox escape, overwrite - runtime Bash stdout/stderr/exit/timeout, cwd respected Tests use a FORGE_WORKSPACE override so they don't try to touch /workspace on the host.

Completes the P4 tool catalog. Agents now have the full six : bash, write, read, edit, grep, glob. All sandboxed to /workspace, all callable via fenced forge:* blocks, all validated by Zod and capped on output size to protect the LLM context. read — line-based offset/limit, 16 KB clip, fails on missing or non-regular files edit — exact substring patch, refuses ambiguous matches unless replaceAll=true, refuses identical old/new grep — pure JS regex over a glob filter, skips binary files (NUL bytes), 200 hits cap, line clipped at 400 chars glob — hand-rolled matcher for *, **, ? (no dep), 200 results cap, walk bounded at 5000 nodes Tool dispatcher in the runtime is now a switch over six branches. System prompt lists all six with their JSON shape. Tests added for each tool plus four new parser cases (forge:read / edit / grep / glob, and a refine-rule violation on edit). resolveSandboxedPath is now exported so tools that don't write but still need the sandbox root (grep, glob) reuse it instead of duplicating the FORGE_WORKSPACE override logic.

Mistral Small (and likely most small models) regularly emits an AGENT.md where the `description` value embeds a colon — typically when listing steps ("Step 1: ..., Step 2: ...") or quoting another key (`maxTurns: 8`, `timeout: 60s`). YAML reads that as a nested mapping and rejects the whole frontmatter. Two fixes : 1. The builder system prompt now spells out the rule in both EN and FR : no colon / no embedded YAML / wrap in double quotes if needed. Comes with an example so the LLM has a template to follow. 2. The CLI normalizer now scans the frontmatter and wraps any `description` value containing an unquoted colon in double quotes, escaping any embedded double quotes in the process. Already-quoted values are left alone. Tests cover both : an unquoted "Step 1: ... Step 2: ..." is fixed up and accepted ; an already-quoted equivalent is left untouched.

Cards in Mission Control are now keyboard-navigable : - Tab cycle focus forward (lands on the most recent card the first time) - Shift+Tab cycle focus backward - Enter open the focused card in a full-screen detail view - Esc / q close the detail view The detail view uses the entire terminal, shows the action's full content (the AGENT.md body for write actions ; prompt + streamed output for run actions) with line numbers, and supports scrolling with arrow keys / PgUp / PgDn / g / G. Tab/Enter are only captured when there are actions, no permission dialog is up, the detail view is closed, and the prompt input is empty — so typing in the prompt always wins. The prompt draft is now lifted into useChat so App can read it for that guard. Visual cues : the focused card switches to a brighter "double" border and gains a leading triangle ; the Mission Control header changes its hint line depending on whether anything is focused.

When a card is focused but the detail view isn't open, pressing Esc now drops the focus without opening anything. Guarded so it only fires when the prompt is empty and no permission dialog is up — Esc keeps its meaning everywhere else. Header hint updated accordingly.

feat(p4): native tools — six runtime tools inside the agent sandbox

…lder The builder now has access to a catalog of skills : self-contained behaviour modules that orchestrate multiple actions in a single turn to handle recurring intent patterns. First built-in : scaffold-and-run, which fixes the "user describes both creation and execution but the builder stops after writing AGENT.md" pattern by making the LLM emit a forge:write AND a forge:run in the same turn. Architecture : - SKILL.md format with YAML frontmatter (name, description, triggers, actions) and a markdown body containing the instructions. Mirrors AGENT.md to stay familiar. - Catalog loader discovers skills from two sources : built-ins shipped in packages/core/src/builder/skills/, plus user skills under ~/.agent-forge/skills/. User skills override built-ins on name collision. - System prompt only carries the catalog metadata (name + description + triggers) — bodies stay out of the context until the LLM emits a forge:skill block, then the resolved body is injected as a system message for the next turn. - New ParsedAction kind 'skill' with a forge:skill fenced block parser ; tolerant of either `name: <skill>` or a bare line. - SkillAction joins WriteAction/RunAction in the action store. Skills auto-execute (no permission dialog) and surface as their own card in Mission Control with a "loaded into context" hint. CardDetail renders the description plus the full body so the user can see what the skill actually injects. Other : - /skills slash command lists what's available, source-tagged. - useChat resolves the catalog once via useMemo, threads it to streamBuilder and to executeAction's resolveSkill. Tests : - SKILL.md schema (kebab-case name, missing frontmatter, unknown action tag). - Catalog loader discovers the built-in and sorts entries. - System prompt injects the SKILLS section when entries are provided, FR/EN headers, base prompt unchanged when empty. - forge:skill parser (name: prefix, bare line, kebab-case rejection) and executeAction(skill) round-trip via resolver.

Symptom : the user typed a message that matched a skill trigger ("audite un projet typescript") but the builder still skipped forge:skill and went straight to forge:write. The skill catalog was loaded, the system prompt mentioned it, but Mistral would not act on it. Two issues, both fixed here : 1. Position. The SKILLS section was appended AFTER the base prompt's "BE DECISIVE — propose the AGENT.md immediately" rule. Mistral read the strong push to write first, then a soft "you can also use a skill" at the bottom, and ignored the latter. 2. Framing. The original wording said "choose a skill when ...". Too permissive — small models read that as optional. Replaced with a STEP 0 / ÉTAPE 0 framing : an explicit, mandatory pre-flight check that runs BEFORE any other action. If any trigger phrase matches (case-insensitive substring), the LLM MUST emit a forge:skill block as the only action of that turn ; only then does the rest of the protocol apply. The catalog is now placed at the TOP of the system prompt, before the "be decisive" rule, so the order of reading mirrors the order of execution. Tests updated for the new wording. Triggers are now quoted in the catalog rendering ("audite", "teste") so the LLM sees them as literals rather than running prose.

Mistral Small reads the skill catalog and the STEP 0 instruction in the system prompt, but it does not act on it : it sees "audite a typescript project" and goes straight to a forge:write that collapses both the agent definition and the run mission into one giant AGENT.md body. Adding more rules to the prompt didn't move the needle. Plan B, in three pieces : 1. matchSkillForMessage() — case-insensitive substring match against the trigger phrases declared in each SKILL.md. Lives in core, no LLM involvement. 2. runScaffoldAndRun() — a dedicated runner that drives the skill end to end with TWO narrow LLM calls instead of one wide one : - call A : "produce ONLY the AGENT.md content" (generic role, no session-specific steps in the body) - call B : "produce ONLY the prompt to send to the agent" Each call has a tightly scoped system instruction so the model keeps the two artefacts cleanly separated. Output is parsed server-side, AGENT.md name extracted from the frontmatter. 3. useChat.send() — pre-flight before the normal stream : if the matcher finds a skill, dispatch to the runner. The skill card lands in Mission Control as DONE, then a write card and a run card appear as PROPOSED. The user approves them in order via the existing permission dialog. The system prompt no longer carries the STEP 0 / ÉTAPE 0 mandate. Skills are now an internal mechanism the LLM is informed about but never asked to operate. The catalog metadata stays in the prompt as a short tail note so the LLM understands why a skill card might appear in Mission Control. Tests : - matchSkillForMessage : substring match, no-match, multi-skill precedence (first wins), empty trigger ignored. - system prompt : informational note appears when skills are passed, FR/EN variants, base prompt comes first.

The detail screens (Esc-Tab-Enter on a Mission Control card) used to render every action body through highlightPlain — everything came back as undifferentiated grey, which made long AGENT.md / agent run outputs hard to scan. Replaced by per-shape highlighting : - skill detail : Markdown highlighter (headings, lists, inline code, bold, fenced blocks). The skill body is markdown, so this matches. - write detail : if the file has YAML frontmatter (which AGENT.md always does), split frontmatter and body. The frontmatter goes through the YAML highlighter (already existing), the body through the new Markdown one. Falls back to plain YAML for files without frontmatter. - run detail (and the compact run card in Mission Control) : a new highlightAgentRun() that walks the streamed output and recognises : · fenced ```forge:* blocks (open line orange, body via the matching language highlighter — JSON for forge:bash/write/etc.) · [forge:tool] / [/forge:tool] markers wrapping the result of the previous tool call (rendered dim grey so it visually recedes) · regular prose with inline code spans and bold New helpers in syntax.ts : highlightMarkdown(text) highlightAgentRun(text) highlightYamlLine / highlightJsonLine kept exported for the run highlighter to delegate to. The compact card in MissionControl now also uses highlightAgentRun so the streaming output during a long run reads the same as the detail view, just clipped at maxLines.

When several agents run in a session, the panel was stacking 6+ fully-expanded cards and overflowing the terminal. Two changes : 1. Compact mode by default. Each non-focused card now renders as a single line : badge + verb + target. Borders disappear, the terminal stays calm. The focused card expands to its full preview like before. Cards in 'running' status stay expanded too, so a streaming agent run remains visible without having to Tab to it. 2. Bounded viewport. Mission Control now takes a panelHeight prop (computed by App from the terminal rows minus a Welcome floor and a spacer) and slices the action list to fit. Truncated actions show as "↑ N above / ↓ N below" hints in the panel header. Welcome stays glued to the bottom with flexShrink=0, so the panel is what gives way on small terminals. useCardFocus extended : - scrollTop : action-index offset, advanced via PgUp/PgDn ; - auto-focus the last new arrival when nothing is focused (the user immediately sees what the builder just produced) ; - auto-scroll lower bound : focusing an action above scrollTop bumps scrollTop down to keep it visible. App now routes PgUp/PgDn to the Mission Control scroll when there are actions and the prompt is empty (or a card is focused). It keeps falling back to the chat transcript scroll otherwise. Tab, Shift+Tab, Enter, Esc unchanged.

Status badge moves to P6 done. Roadmap table lifts P4 and P6 to ✅, points to P5 (hardened sandbox + persistent agents + artifact extraction) as the next milestone. Root README (EN/FR) gains : - Native tools section : six-tool table (Bash, FileWrite, FileRead, FileEdit, Grep, Glob) with their tags and limits ; a short note explaining the choice of a text-structured forge:* protocol over OpenAI tool_calls. - Skills section : SKILL.md format, two sources (built-in and ~/.agent-forge/skills/), server-side matcher, two-call runner, scaffold-and-run as the first built-in. - Mission Control keyboard cheatsheet : Tab / Enter / Esc / PgUp/PgDn / Ctrl+E. - /skills slash command listed. - Architecture diagram updated : skill catalog + runner on the host side, /workspace mount + tool loop on the container side, persistence of the workspace dir after exit. - Repo structure shows packages/core/src/builder/skills/, runtime/src/tool-protocol.ts, and the runtime/ subdir under tools-core. Sub-package READMEs realigned : - packages/cli : compact / expanded card mode, scrollable viewport, focus + auto-scroll, detail view, full keyboard map, dispatch skill server-side mention. - packages/core : skill catalog / matcher / runner files listed, scaffold-and-run noted as built-in. - packages/runtime : multi-turn tool loop documented, six forge:* tags, [forge:tool] markers on stdout, FORGE_MAX_TOKENS env var. - packages/tools-core : separate "host tools" and "runtime tools" sections ; six runtime tools with their constraints ; test layout listed.

feat(p6): skill layer — orchestration patterns for the builder

garniergeorges added 13 commits April 27, 2026 13:17

Merge pull request #82 from garniergeorges/feat/p4-native-tools

1a3111a

feat(p4): native tools — six runtime tools inside the agent sandbox

Merge pull request #83 from garniergeorges/feat/p6-skills

9d22fca

feat(p6): skill layer — orchestration patterns for the builder

garniergeorges merged commit a63edd5 into prod Apr 27, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: dev → prod (P4 native tools + P6 skill layer)#84

release: dev → prod (P4 native tools + P6 skill layer)#84
garniergeorges merged 13 commits into
prodfrom
dev

garniergeorges commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garniergeorges commented Apr 27, 2026

Summary

What's NOT here

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant