feat: P1 + P2 + P3 — conversational forge with sandboxed agent runs by garniergeorges · Pull Request #80 · garniergeorges/agent-forge

garniergeorges · 2026-04-26T23:17:39Z

Summary

Brings Agent Forge from an empty repo to the P3 milestone: a working forge REPL where you describe an agent in natural language, watch the builder draft and write its AGENT.md, approve it, then ask the builder to run that agent. Each run spins up its own Docker container, streams the output back, and tears the sandbox down.

This single PR closes the first three milestones (P1 + P2 + P3) of the POC roadmap.

What's in

P1 — Hello agent in Docker

poc:p1 script orchestrating a Node runtime inside agent-forge/base:latest
preflight checks (Docker daemon, image present, runtime bundle)
guaranteed cleanup on signal/error/timeout

P2 — Conversational CLI

Ink REPL with bilingual EN/FR (language picker on first run)
splash + welcome + chat with streaming
slash commands: `/help`, `/clear`, `/reset`, `/lang`, `/provider`, `/model`, `/session`, `/sessions`, `/exit`
provider-agnostic via Vercel AI SDK (Mistral cloud, OpenAI, MLX local…)
Mistral Small as default cloud provider

P3 — Builder writes and runs agents

`AGENT.md` schema (Zod) with frontmatter validation + path coercion
`FileWrite` tool, sandboxed under `~/.agent-forge`, with hard `overwrite` flag wired to user approval
`DockerLaunch` tool: `docker run --rm -i` with mounted AGENT.md and runtime bundle, async event stream, force-cleanup
runtime updated to read mounted AGENT.md as system prompt and stream tokens to stdout
builder system prompt extended with `forge:write` and `forge:run` action protocols
two-zone TUI:
- top = Mission Control with action cards (write / run), syntax-highlighted YAML preview, status badges, Mistral pixel-art logo
- bottom = pure conversation, no plumbing
permission dialog (Y / N / D) before any write or run
session persistence to `~/.agent-forge/sessions/<id>/transcript.jsonl`
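As an illustration of the JSONL transcript format mentioned above, here is a minimal sketch that serializes one entry per line and parses a transcript back. The `TranscriptEntry` shape and both helper names are assumptions; the actual schema is not shown in this PR.

```typescript
// Hypothetical entry shape; the real transcript schema is not shown in this PR.
type Role = "user" | "assistant" | "system";

interface TranscriptEntry {
  role: Role;
  text: string;
  at: string; // ISO 8601 timestamp
}

// JSONL: one JSON document per line, so a session can be appended to
// without rewriting the whole file. JSON.stringify escapes embedded
// newlines, so each entry always stays on a single line.
function toJsonlLine(entry: TranscriptEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Parse a whole transcript, skipping blank lines (e.g. after the last newline).
function parseJsonl(content: string): TranscriptEntry[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TranscriptEntry);
}
```

Appending a line per turn keeps writes cheap and leaves a partially written session readable up to the last complete line.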

Out of scope (next milestones)

P4 — six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) usable from inside the sandbox
P5 — hardened sandbox + persistent agents via `docker exec` + artifact extraction
P6 — enriched builder skills
P7 — `TEAM.md` multi-agent coordination
P8 — pixel-art live agent dashboard
P9 — POC validation : Next.js + Laravel + QA demo end-to-end

Test plan

Add the base sandbox image (debian bookworm-slim + Node 22 LTS + git, curl, ripgrep, jq) and a build helper script. Non-root user `agent` owns /workspace so it can write there. Closes #1.

Implement the runtime entry point that reads a prompt from stdin, calls an OpenAI-compatible LLM endpoint via Vercel AI SDK, and writes the assistant text to stdout. Defaults to a local MLX server (mlx_lm.server on 127.0.0.1:8080) so dev runs cost nothing. Switching to OpenAI cloud or any other OpenAI-compatible provider only requires FORGE_BASE_URL + FORGE_API_KEY + FORGE_MODEL. Also includes the bun lockfile generated on first install. Closes #2.
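The env-driven provider switch described above could look roughly like this. It is a sketch, not the runtime's actual code: `resolveProvider`, the `ProviderConfig` shape, the `/v1` suffix on the local URL, and the placeholder key and model id are all assumptions.

```typescript
interface ProviderConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

// Assumed defaults: the PR only states that the runtime targets a local
// mlx_lm.server on 127.0.0.1:8080. The "/v1" suffix and the placeholder
// key and model id are illustrative.
const DEFAULTS: ProviderConfig = {
  baseUrl: "http://127.0.0.1:8080/v1",
  apiKey: "not-needed-locally",
  model: "local-default-model",
};

// Each FORGE_* variable overrides one default; unset or empty values fall back.
function resolveProvider(env: Record<string, string | undefined>): ProviderConfig {
  return {
    baseUrl: env.FORGE_BASE_URL || DEFAULTS.baseUrl,
    apiKey: env.FORGE_API_KEY || DEFAULTS.apiKey,
    model: env.FORGE_MODEL || DEFAULTS.model,
  };
}
```

Called with `process.env`, this yields the local MLX settings unless the three variables are set, which matches the "only requires FORGE_BASE_URL + FORGE_API_KEY + FORGE_MODEL" claim.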

Adds packages/cli/src/poc-p1.ts and a root npm script `poc:p1` that demonstrates the full P1 round-trip: spawn `docker run --rm -i agent-forge/base:latest`, mount the runtime bundle as a read-only volume, pipe a hardcoded prompt to its stdin, capture stdout, and let `--rm` reclaim the container. The container is pointed at the host MLX server via `FORGE_BASE_URL=http://host.docker.internal:8080/v1` so the round-trip costs nothing. Implementation note: this shells out to the docker CLI rather than using the Engine API, because dockerode hangs on the `attach` upgrade under Bun. The CLI is plenty for P1; we will switch to the API later when finer-grained control is needed. Closes #3.
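For readers who want to see the shape of that `docker run` invocation, here is a sketch of the argument list. `buildDockerArgs`, the container-side mount path, and the `node` entry command are assumptions; only `--rm`, `-i`, the read-only mount, the image name, and `FORGE_BASE_URL` come from the commit message.

```typescript
interface RunOptions {
  image: string;         // e.g. "agent-forge/base:latest"
  runtimeBundle: string; // absolute host path to the built runtime bundle
  baseUrl: string;       // forwarded to the container as FORGE_BASE_URL
}

// Assumption: where the bundle is mounted inside the container.
const CONTAINER_BUNDLE = "/runtime/runtime.mjs";

function buildDockerArgs(opts: RunOptions): string[] {
  return [
    "run",
    "--rm", // let Docker remove the container once it exits
    "-i",   // keep stdin open so the prompt can be piped in
    "-v", `${opts.runtimeBundle}:${CONTAINER_BUNDLE}:ro`, // read-only mount
    "-e", `FORGE_BASE_URL=${opts.baseUrl}`,
    opts.image,
    "node", CONTAINER_BUNDLE, // assumed entry command
  ];
}
```

The resulting array can be handed to any process spawner with `docker` as the command; passing arguments as an array avoids shell quoting issues.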

Before launching the container, check three preconditions and surface a clear, actionable message on failure:

1. Docker daemon is reachable (`docker info`)
2. Image agent-forge/base:latest is built locally
3. Runtime bundle packages/runtime/dist/runtime.mjs exists

Each failure prints `✗ <what is wrong>` followed by the exact remediation command and exits 1. The user never has to guess what to fix. LLM-side errors (endpoint unreachable, model load failure, etc.) are already surfaced verbatim by the runtime via stderr, so no extra plumbing is needed here. Closes #4.
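The preflight behaviour described above can be sketched as a list of checks with injected probes. The `Check` shape and `runPreflight` are hypothetical names, and stopping at the first failing check is an assumption about how "prints, then exits 1" plays out.

```typescript
interface Check {
  problem: string;     // shown after the ✗ when the probe fails
  remediation: string; // exact command (or action) that fixes it
  ok: () => boolean;   // injected probe, e.g. one that wraps `docker info`
}

interface PreflightResult {
  exitCode: 0 | 1;
  lines: string[]; // what would be printed to the terminal
}

// Runs checks in order and stops at the first failure.
function runPreflight(checks: Check[]): PreflightResult {
  for (const check of checks) {
    if (!check.ok()) {
      return {
        exitCode: 1,
        lines: [`✗ ${check.problem}`, `  ${check.remediation}`],
      };
    }
  }
  return { exitCode: 0, lines: [] };
}
```

Injecting the probes keeps the reporting logic testable without a Docker daemon; the real probes would shell out to the docker CLI and stat the bundle path.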

Guarantees that no container survives a poc:p1 run, no matter how it ends:

- explicit --name agent-forge-poc-<pid> so we can target it after a signal even if the docker client child has not yet propagated SIGTERM
- SIGINT and SIGTERM handlers force-remove the container and exit 130
- 60s hard timeout (overridable via FORGE_POC_TIMEOUT_MS) kills the container and exits 1 with a clear message
- try/finally wraps the spawn so any exception still triggers cleanup
- FORGE_BASE_URL is now overridable from the host (default unchanged)

Verified by sending SIGINT mid-run (no leftover) and by pointing FORGE_BASE_URL at an nc :9999 sink (timeout fires, no leftover). Closes #5.
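The timeout and try/finally parts of that guarantee can be sketched as follows. `withContainer` and `containerName` are hypothetical helpers, and the `run` and `forceRemove` callbacks are injected; in the real script they would presumably spawn `docker run --name <name>` and call `docker rm -f <name>`. Signal handlers are left out here but would call the same `forceRemove`.

```typescript
// Deterministic name so cleanup can target the container even after a signal.
function containerName(pid: number): string {
  return `agent-forge-poc-${pid}`;
}

async function withContainer<T>(
  name: string,
  timeoutMs: number,
  run: (name: string) => Promise<T>,
  forceRemove: (name: string) => Promise<void>,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`container ${name} timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the run itself or the hard timeout.
    return await Promise.race([run(name), timeout]);
  } finally {
    // Runs on success, exception, and timeout alike.
    if (timer !== undefined) clearTimeout(timer);
    await forceRemove(name);
  }
}
```

Force-removing after a clean exit is redundant when `--rm` is in effect, but it keeps a single cleanup path for every outcome.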

Adds a "Try P1" walkthrough under "Try the mockup" in both the English and French READMEs: prerequisites, default MLX path (free, local on Apple Silicon), alternative OpenAI cloud path, and the preflight troubleshooting messages. The status badge moves from "POC" to "P1 done", and the status sentence now reflects that P1 is runnable end-to-end with P2 next. Closes #6.

Bootstraps the conversational REPL with a permanent two-zone layout:

- Splash on top (logo, tagline, animated preflight checks). Stays visible as a session header. On first run, an inline language picker appears below the checks (English / Français, side-by-side, arrow keys or E/F shortcuts).
- Welcome pinned to the bottom of the terminal (header bar, big question, suggestions, prompt input, footer hints). Empty space in the middle will host the transcript / build progress later.

User language preference is persisted to ~/.agent-forge/config.json. All UI strings are bilingual via a small i18n table accessed through `useT()`. The bin script `forge` (root package.json) runs the TUI directly with bun, no link required for development. Closes #7.

Wires the conversational REPL to a real LLM provider:

- core/builder: provider factory (Vercel AI SDK + OpenAI-compatible endpoint, MLX defaults), streamBuilder({messages,lang}) async generator, and a short bilingual EN/FR system prompt that gives the LLM the Agent Forge builder identity.
- cli/useChat: message history, async streaming with live token append, error capture. Owns the scroll offset so PgUp/PgDn/Ctrl+E can navigate older turns. Auto-jumps to live on every new send.
- cli/ChatViewport: line-based windowing of the full transcript (flatten turns into wrapped lines, slice). Older lines are clipped visually but kept in the LLM context. Indicator at the bottom when scrolled up.
- cli layout: Welcome block stays pinned to the bottom; question + suggestions when empty, scrolling transcript when chatting. Splash above stays put.

Closes #8, #9, #10.
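The "flatten turns into wrapped lines, slice" idea behind ChatViewport can be illustrated with a small pure function. This is a sketch under assumptions: `visibleLines` and `wrap` are hypothetical names, wrapping is a naive hard wrap, and `scrollOffset` is taken to mean "lines scrolled up from the live bottom".

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Naive hard wrap at `width` columns; a real TUI would wrap on word
// boundaries and account for wide characters.
function wrap(text: string, width: number): string[] {
  const out: string[] = [];
  for (const line of text.split("\n")) {
    if (line.length === 0) {
      out.push("");
      continue;
    }
    for (let i = 0; i < line.length; i += width) {
      out.push(line.slice(i, i + width));
    }
  }
  return out;
}

// scrollOffset = number of lines scrolled up from the bottom (0 = live).
function visibleLines(
  turns: Turn[],
  width: number,
  height: number,
  scrollOffset: number,
): { lines: string[]; scrolledUp: boolean } {
  const all: string[] = [];
  for (const turn of turns) all.push(...wrap(turn.text, width));
  // Clamp so the window never scrolls past the first line.
  const maxOffset = Math.max(0, all.length - height);
  const offset = Math.min(Math.max(0, scrollOffset), maxOffset);
  const end = all.length - offset;
  const start = Math.max(0, end - height);
  return { lines: all.slice(start, end), scrolledUp: offset > 0 };
}
```

Only the rendered window is sliced; the `turns` array itself is untouched, which is what lets clipped lines stay in the LLM context.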

Adds a small slash command runtime (/help, /exit, /clear, /lang, /model, /provider):

- Detection: any input starting with `/` is parsed and executed locally instead of being sent to the LLM.
- System messages: command output is appended to the transcript as a third turn role (rendered with " · " in dim grey, distinct from user " ❯ " and assistant " ▸ "). System turns are filtered out when building the history sent to the LLM.
- /lang switches the UI language live (saved to ~/.agent-forge/config.json).
- /model swaps FORGE_MODEL for the rest of the session.
- /provider applies a preset (mlx, openai, anthropic, mistral): base URL + default model, with a hint when an API key is required.
- useChat exposes addSystemMessage and clear; provider config in @agent-forge/core is now read live so hot swaps take effect on the next streamBuilder call.
- Footer updated: Ctrl+C dropped (use /exit); PgUp/PgDn/Ctrl+E and /help shown during chat.

Adds smoke tests with bun test + ink-testing-library:

- commands.test.ts: pure-logic coverage of every command path.
- store.test.ts: config round-trip on disk (with backup/restore of the user's real config).
- app.test.tsx: <App /> renders without crashing and shows the Agent Forge brand.

12 tests, all green. Closes #11, #12.
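A minimal sketch of the detection and history-filtering rules described above. `parseInput` and `historyForLlm` are hypothetical names; the only behaviour taken from the commit message is that `/`-prefixed input runs locally and that system turns never reach the LLM.

```typescript
type Role = "user" | "assistant" | "system";

interface Turn {
  role: Role;
  text: string;
}

type ParsedInput =
  | { kind: "command"; name: string; args: string[] }
  | { kind: "chat"; text: string };

// Anything starting with "/" is treated as a local command, never sent to the LLM.
function parseInput(input: string): ParsedInput {
  const trimmed = input.trim();
  if (!trimmed.startsWith("/")) return { kind: "chat", text: input };
  const [name, ...args] = trimmed.slice(1).split(/\s+/);
  return { kind: "command", name: name.toLowerCase(), args };
}

// System turns (command output) are display-only and are dropped
// when building the history sent to the provider.
function historyForLlm(turns: Turn[]): Turn[] {
  return turns.filter((turn) => turn.role !== "system");
}
```

For example, `/provider mlx` parses to the command `provider` with args `["mlx"]`, while `hello /help` is ordinary chat because the slash is not the first character.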

… bump

Lays the foundation for P3 (builder generates and runs agents):

- core/types/agent-md.ts: Zod schema for AGENT.md frontmatter (name kebab-case, description, model, sandbox{image,timeout}, maxTurns) + parseAgentMd() that splits frontmatter / body and validates. 7 tests cover valid/invalid inputs and the key error messages.
- tools-core/file-write.ts: FileWriteToolSpec ready to plug into Vercel AI SDK. Hard security boundary:
  · path scope strictly inside ~/.agent-forge/ (after symlink resolution),
  · path traversal, null bytes and spaces refused,
  · existing files NEVER overwritten silently.
  7 tests cover both the resolveSafePath helper and the executor.
- Default LLM model bumped from Llama 3.2 3B 4-bit to Mistral Nemo 12B Instruct 4-bit, in the runtime, the builder, and the mlx provider preset (tool-use is more reliable on this size).
- Builder caps responses at 384 tokens by default (FORGE_MAX_TOKENS overridable) so perceived latency stays low on local hardware.

Closes #13.
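To make the FileWrite path rules concrete, here is a simplified sketch of a safe-path resolver. It reuses the name `resolveSafePath` from the commit message, but the signature and implementation are assumptions: it works on POSIX-style strings only and does not resolve symlinks, which the real helper is said to do.

```typescript
// Simplified, string-only sketch. The real helper also resolves symlinks,
// which would require filesystem access and is omitted here.
function resolveSafePath(root: string, relative: string): string {
  if (relative.indexOf("\0") !== -1) throw new Error("null byte in path");
  if (relative.indexOf(" ") !== -1) throw new Error("space in path");
  if (relative.startsWith("/")) throw new Error("absolute path not allowed");

  const parts: string[] = [];
  for (const segment of relative.split("/")) {
    if (segment === "" || segment === ".") continue; // ignore no-op segments
    if (segment === "..") throw new Error("path traversal");
    parts.push(segment);
  }
  if (parts.length === 0) throw new Error("empty path");

  // Strip trailing slashes from the root before joining.
  return root.replace(/\/+$/, "") + "/" + parts.join("/");
}
```

Rejecting `..` outright, rather than normalizing it away, is the stricter choice: a path that needs to climb out of its directory is treated as suspect even if it would land back inside the root.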

Two changes wrapped together because they were validated end to end as one working stack:

- core/builder: default provider switched from local MLX to Mistral cloud (api.mistral.ai, mistral-small-latest). Local MLX still selectable via env or /provider mlx. The text-structured action protocol (forge:write block) is hardened in the system prompt: decisive default behaviour ("immediately propose, no clarification questions"), one concrete EN/FR example with the full frontmatter shape, explicit rules that the filename MUST be AGENT.md and the YAML opener MUST be present.
- cli/ConfirmAction: a system-level permission dialog that suspends the prompt input whenever the builder emits a forge:write block. Modal styling (double orange border, Y/N/D buttons), bilingual, bypasses Ink's clip on tall content by dropping the fixed height while a confirmation is pending. User-approved writes pass an `overwrite: true` flag through to FileWrite: explicit consent overrides the no-overwrite default.

Supporting changes:

- cli/builder-actions: coerce any agents/<name>/<wrong>.md path back to agents/<name>/AGENT.md; normalize a missing leading `---`; validate against the AGENT.md schema before writing.
- tools-core/file-write: new optional `overwrite` flag on the input schema (defaults false). The non-destructive default is preserved at the tool level; only the dialog can opt in.
- cli/usePreflight: send Authorization: Bearer when checking a cloud endpoint so the splash check is green, not red.
- .env.example: document the three default presets (Mistral cloud, OpenAI cloud, local MLX). Real .env stays gitignored.
- tests: 33 green (parser, action coercion, store round-trip, AGENT.md schema, file-write security).

Closes #14, #15.
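The two normalization steps in cli/builder-actions (path coercion and the missing `---` opener) can be sketched as pure functions. `coerceAgentPath` and `normalizeFrontmatter` are hypothetical names and the exact matching rules are assumptions; schema validation is not shown.

```typescript
// agents/<name>/<anything>.md -> agents/<name>/AGENT.md
// Paths that do not match the agents/<name>/<file>.md shape are returned unchanged.
function coerceAgentPath(path: string): string {
  const match = /^agents\/([^/]+)\/[^/]+\.md$/.exec(path);
  return match ? `agents/${match[1]}/AGENT.md` : path;
}

// Prepend the YAML opener when the model forgot it.
// Assumes the closing `---` of the frontmatter is already present.
function normalizeFrontmatter(content: string): string {
  const trimmed = content.replace(/^\s+/, "");
  return trimmed.startsWith("---") ? trimmed : "---\n" + trimmed;
}
```

Both functions are idempotent, so running them on already-correct builder output is harmless.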

Runtime now reads /agent/AGENT.md mounted at run time, uses the body as its system prompt, and streams tokens to stdout via streamText. Build target moved to node/esm so the bundle runs in the alpine container. Adds the DockerLaunch tool : spawns one-shot containers with `docker run --rm -i`, mounts the AGENT.md and the runtime bundle, inherits provider env vars, and yields chunk/exit events through an async generator. Force-cleanup on abort or error. Builder system prompt extended with a RUN PROTOCOL describing the forge:run fenced block, kept consistent with the existing forge:write protocol.
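The "yields chunk/exit events through an async generator, force-cleanup on abort or error" behaviour could be shaped like the sketch below. The `RunningContainer` interface and `launchEvents` are hypothetical; the real DockerLaunch tool presumably wraps a spawned `docker run` process rather than taking these fields as input.

```typescript
type LaunchEvent =
  | { type: "chunk"; data: string }
  | { type: "exit"; code: number };

// Hypothetical abstraction over a spawned `docker run` process.
interface RunningContainer {
  output: AsyncIterable<string>;    // stdout chunks as they arrive
  exitCode: Promise<number>;        // resolves when the process exits
  forceRemove: () => Promise<void>; // e.g. would run `docker rm -f <name>`
}

async function* launchEvents(container: RunningContainer): AsyncGenerator<LaunchEvent> {
  let finished = false;
  try {
    for await (const data of container.output) {
      yield { type: "chunk", data };
    }
    const code = await container.exitCode;
    finished = true;
    yield { type: "exit", code };
  } finally {
    // Reached on errors and when the consumer stops iterating early (abort).
    // After a normal exit, `--rm` has already reclaimed the container.
    if (!finished) await container.forceRemove();
  }
}
```

A consumer simply iterates with `for await`; breaking out of the loop triggers the generator's `finally` block, which is what makes abort handling automatic.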

Splits the TUI into two strict zones:

- top: Mission Control (file writes, container runs, syntax-highlighted cards, status badges, Mistral pixel-art logo bottom-right)
- bottom: conversation only, a natural-language exchange with the builder, no code or logs

Action queue replaces inlined forge:* blocks in the transcript:

- parser extracts write/run blocks, strips them from the assistant text, and emits typed Actions (proposed → approved → running → done/failed/declined)
- permission dialog (Y/N/D) handles both write and run, with overwrite flag passed only on user approval
- run actions stream container output live into the matching card

Adds session persistence to ~/.agent-forge/sessions/<id>/transcript.jsonl with /session and /sessions slash commands. Splits /clear (view only; LLM context kept in a hidden buffer) from /reset (wipes view AND context). Tests updated for the new parser and action shapes.
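The parser and action lifecycle described above might be sketched like this. The block syntax is an assumption (a fenced block whose info string is `forge:write` or `forge:run`, with the raw body kept as an opaque payload), and `extractActions`, `transition`, and the transition table are hypothetical; only the status names come from the commit message.

```typescript
type ActionKind = "write" | "run";
type ActionStatus = "proposed" | "approved" | "running" | "done" | "failed" | "declined";

interface Action {
  kind: ActionKind;
  payload: string; // raw block body, left uninterpreted in this sketch
  status: ActionStatus;
}

// Build the fence marker at runtime so this example contains no literal fence.
const FENCE = "`".repeat(3);
const BLOCK = new RegExp(FENCE + "forge:(write|run)[^\\n]*\\n([\\s\\S]*?)" + FENCE, "g");

// Strips forge:* blocks from the assistant text and returns them as Actions.
function extractActions(assistantText: string): { text: string; actions: Action[] } {
  const actions: Action[] = [];
  const stripped = assistantText.replace(BLOCK, (_match, kind: ActionKind, body: string) => {
    actions.push({ kind, payload: body.trim(), status: "proposed" });
    return "";
  });
  // Collapse the gap left behind by a removed block.
  return { text: stripped.replace(/\n{3,}/g, "\n\n").trim(), actions };
}

// Assumed transition table for the lifecycle named in the commit message.
const NEXT: Record<ActionStatus, ActionStatus[]> = {
  proposed: ["approved", "declined"],
  approved: ["running"],
  running: ["done", "failed"],
  done: [],
  failed: [],
  declined: [],
};

function transition(action: Action, to: ActionStatus): Action {
  if (NEXT[action.status].indexOf(to) === -1) {
    throw new Error(`illegal transition ${action.status} -> ${to}`);
  }
  return { ...action, status: to };
}
```

Keeping the conversation text and the actions in separate return values mirrors the two-zone split: the text goes to the bottom pane, the actions to Mission Control.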

Root README (EN/FR) updated for the P3 milestone : new quick-start `bun run forge`, current screen layout, slash commands incl. /clear and /reset, provider presets (Mistral cloud as default), updated host/container architecture diagram. Sub-package READMEs (cli, core, runtime, tools-core) updated to reflect what is actually shipped. CONTRIBUTING note bumped from "post-P1" to "post-P9 (POC validated)". Demo GIF refreshed to match the current TUI.

garniergeorges added 14 commits April 26, 2026 04:04

feat(docker): build agent-forge/base image with node 22 and dev tooling

4aa854e

garniergeorges merged commit 714e6f0 into main Apr 26, 2026
1 check passed

garniergeorges deleted the feat/p1-hello-agent branch April 26, 2026 23:21

garniergeorges merged 14 commits into main from feat/p1-hello-agent
