feat: P1 + P2 + P3 — conversational forge with sandboxed agent runs #80
Merged
Add the base sandbox image (Debian bookworm-slim with Node 22 LTS, git, curl, ripgrep, and jq) and a build helper script. The non-root user `agent` owns /workspace so it can write there. Closes #1.
Implement the runtime entry point, which reads a prompt from stdin, calls an OpenAI-compatible LLM endpoint via the Vercel AI SDK, and writes the assistant text to stdout. It defaults to a local MLX server (mlx_lm.server on 127.0.0.1:8080) so dev runs cost nothing. Switching to OpenAI cloud or any other OpenAI-compatible provider only requires setting FORGE_BASE_URL, FORGE_API_KEY, and FORGE_MODEL. Also includes the Bun lockfile generated on first install. Closes #2.
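A minimal sketch of how the runtime could resolve its provider settings from the environment, assuming the MLX server exposes its OpenAI-compatible API under `/v1` (as the `FORGE_BASE_URL` used by `poc:p1` suggests). The function name `resolveProviderConfig`, the placeholder API key, and the placeholder default model are hypothetical; the real defaults live in the runtime package.

```typescript
// Hypothetical sketch: resolve provider settings from environment variables,
// falling back to a local mlx_lm.server on 127.0.0.1:8080.
type Env = Record<string, string | undefined>;

interface ProviderConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

// Assumption: the local server ignores the key, so any non-empty placeholder works.
const LOCAL_DEFAULTS: ProviderConfig = {
  baseURL: "http://127.0.0.1:8080/v1",
  apiKey: "not-needed-for-local-mlx",
  model: "placeholder-local-model", // hypothetical: the real default model ID is set in the runtime
};

function resolveProviderConfig(env: Env): ProviderConfig {
  // Blank values are treated as unset so an empty export does not break the call.
  const pick = (key: string, fallback: string): string => {
    const value = env[key]?.trim();
    return value ? value : fallback;
  };
  return {
    baseURL: pick("FORGE_BASE_URL", LOCAL_DEFAULTS.baseURL),
    apiKey: pick("FORGE_API_KEY", LOCAL_DEFAULTS.apiKey),
    model: pick("FORGE_MODEL", LOCAL_DEFAULTS.model),
  };
}
```

The resolved object maps directly onto the options of an OpenAI-compatible client; for example, `createOpenAI` from `@ai-sdk/openai` accepts `baseURL` and `apiKey`. Switching to OpenAI cloud is then a matter of exporting `FORGE_BASE_URL=https://api.openai.com/v1` together with a key and a model name.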
Adds packages/cli/src/poc-p1.ts and a root npm script `poc:p1` that demonstrates the full P1 round trip: spawn `docker run --rm -i agent-forge/base:latest`, mount the runtime bundle as a read-only volume, pipe a hardcoded prompt to its stdin, capture stdout, and let `--rm` reclaim the container. The container is pointed at the host MLX server via `FORGE_BASE_URL=http://host.docker.internal:8080/v1`, so the round trip costs nothing. Implementation note: the script shells out to the docker CLI rather than using the Engine API, because dockerode hangs on the `attach` upgrade under Bun. The CLI is enough for P1; we will switch to the API later when finer-grained control is needed. Closes #3.
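The docker invocation above can be sketched as a pure argument builder, which keeps it easy to test. The in-container mount point `/runtime/runtime.mjs`, the `node` entry command, and the function name `buildPocArgs` are assumptions for illustration; only the image name, `--rm -i`, the read-only mount, and the `FORGE_BASE_URL` value come from this PR.

```typescript
// Hypothetical sketch: build the argv for the P1 round trip.
interface PocRunOptions {
  runtimeBundle: string; // absolute host path to packages/runtime/dist/runtime.mjs
  baseURL?: string;
}

function buildPocArgs(opts: PocRunOptions): string[] {
  const baseURL = opts.baseURL ?? "http://host.docker.internal:8080/v1";
  return [
    "run",
    "--rm", // Docker removes the container once it exits
    "-i", // keep stdin open so the prompt can be piped in
    "-v",
    `${opts.runtimeBundle}:/runtime/runtime.mjs:ro`, // read-only bind mount (assumed target path)
    "-e",
    `FORGE_BASE_URL=${baseURL}`,
    "agent-forge/base:latest", // options must come before the image name
    "node",
    "/runtime/runtime.mjs",
  ];
}
```

With Node's `child_process`, `spawnSync("docker", buildPocArgs({ runtimeBundle }), { input: prompt, encoding: "utf8" })` would pipe the prompt in and return the captured stdout. On a Linux Docker Engine (no Docker Desktop), `host.docker.internal` usually also needs `--add-host=host.docker.internal:host-gateway`.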
Before launching the container, check three preconditions and surface a clear, actionable message on failure:
1. The Docker daemon is reachable (`docker info`).
2. The image agent-forge/base:latest is built locally.
3. The runtime bundle packages/runtime/dist/runtime.mjs exists.
Each failure prints `✗ <what is wrong>` followed by the exact remediation command, then exits 1, so the user never has to guess what to fix. LLM-side errors (endpoint unreachable, model load failure, etc.) are already surfaced verbatim by the runtime on stderr, so no extra plumbing is needed here. Closes #4.
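The fail-fast pattern can be sketched as an ordered list of checks, each carrying its own remediation. The probes are left abstract (booleans), and the interface names and `fix` strings are hypothetical; the real script runs `docker info` and inspects the local image and bundle.

```typescript
// Hypothetical sketch: evaluate preflight checks in order, stop at the first failure.
interface Check {
  ok: boolean; // result of the probe (e.g. whether `docker info` exited 0)
  problem: string; // what is wrong, shown after "✗"
  fix: string; // exact command the user should run
}

interface PreflightResult {
  exitCode: 0 | 1;
  lines: string[];
}

function evaluatePreflight(checks: Check[]): PreflightResult {
  for (const check of checks) {
    if (!check.ok) {
      return { exitCode: 1, lines: [`✗ ${check.problem}`, `  ${check.fix}`] };
    }
  }
  return { exitCode: 0, lines: [] };
}
```

Stopping at the first failure is a choice of this sketch: later checks (such as the image lookup) are meaningless while the daemon is unreachable. The CLI would write `lines` to stderr and call `process.exit(result.exitCode)` when the code is 1.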
Guarantees that no container survives a poc:p1 run, no matter how it
ends:
- explicit --name agent-forge-poc-<pid> so we can target it after a
signal even if the docker client child has not yet propagated SIGTERM
- SIGINT and SIGTERM handlers force-remove the container and exit 130
- 60s hard timeout (overridable via FORGE_POC_TIMEOUT_MS) kills the
container and exits 1 with a clear message
- try/finally wraps the spawn so any exception still triggers cleanup
- FORGE_BASE_URL is now overridable from the host (default unchanged)
Verified by sending SIGINT mid-run (no leftover container) and by pointing
FORGE_BASE_URL at an `nc` listener on port 9999 that never answers (the timeout fires, no leftover).
Closes #5.
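These guarantees boil down to one idempotent cleanup shared by the signal handlers, the timeout, and the `finally` block, plus a validated timeout value. The helper names and the injected `remove` callback (standing in for `docker rm -f <name>`) are hypothetical.

```typescript
// Hypothetical sketch: one idempotent cleanup shared by every exit path.
type Remover = (containerName: string) => void;

function containerNameFor(pid: number): string {
  return `agent-forge-poc-${pid}`;
}

function resolveTimeoutMs(raw: string | undefined, fallback = 60_000): number {
  // Accept only positive integers; anything else falls back to the 60s default.
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function makeCleanup(name: string, remove: Remover): () => void {
  let done = false;
  return () => {
    if (done) return; // a SIGINT handler and a finally block must not remove twice
    done = true;
    remove(name); // e.g. spawnSync("docker", ["rm", "-f", name])
  };
}
```

Each exit path then calls the same function: a `process.on("SIGINT", ...)` handler that cleans up and exits 130, a `setTimeout` armed with `resolveTimeoutMs(process.env.FORGE_POC_TIMEOUT_MS)` that cleans up and exits 1, and a `finally` around the spawn.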
Adds a "Try P1" walkthrough under "Try the mockup" in both the English and French READMEs: prerequisites, the default MLX path (free, local on Apple Silicon), the alternative OpenAI cloud path, and the preflight troubleshooting messages. The status badge moves from "POC" to "P1 done", and the status sentence now states that P1 runs end to end, with P2 next. Closes #6.
Bootstraps the conversational REPL with a permanent two-zone layout:
- Splash on top (logo, tagline, animated preflight checks). Stays
visible as a session header. On first run, an inline language picker
appears below the checks (English / Français, side-by-side, arrow
keys or E/F shortcuts).
- Welcome pinned to the bottom of the terminal (header bar, big
question, suggestions, prompt input, footer hints). Empty space in
the middle will host the transcript / build progress later.
The user's language preference is persisted to ~/.agent-forge/config.json. All
UI strings are bilingual via a small i18n table accessed through `useT()`.
The `forge` bin script (root package.json) runs the TUI directly with
Bun; no linking is required for development.
Closes #7.
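A bilingual string table of this kind can be sketched as a typed record keyed by language, where the English table defines the key set and the compiler forces the French table to match it. The keys and strings are hypothetical, and `useT()` in the project is a React hook, whereas this sketch uses a plain `translate` function.

```typescript
// Hypothetical sketch: a typed EN/FR string table with a lookup function.
type Lang = "en" | "fr";

const en = {
  "welcome.question": "What agent do you want to build?",
  "preflight.docker": "Docker daemon",
} as const;

type Key = keyof typeof en;

// Record<Key, string> makes the compiler reject a French table with missing keys.
const strings: Record<Lang, Record<Key, string>> = {
  en,
  fr: {
    "welcome.question": "Quel agent voulez-vous construire ?",
    "preflight.docker": "Démon Docker",
  },
};

function translate(lang: Lang, key: Key): string {
  return strings[lang][key];
}

// A hook like useT() could return (key) => translate(currentLang, key).
```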
Wires the conversational REPL to a real LLM provider:
- core/builder: provider factory (Vercel AI SDK + OpenAI-compatible
endpoint, MLX defaults), streamBuilder({messages,lang}) async
generator, and a short bilingual EN/FR system prompt that gives the
LLM the Agent Forge builder identity.
- cli/useChat: message history, async streaming with live token
append, error capture. Owns the scroll offset so PgUp/PgDn/Ctrl+E
can navigate older turns. Auto-jumps to live on every new send.
- cli/ChatViewport: line-based windowing of the full transcript
(flatten turns into wrapped lines, slice). Older lines are clipped
visually but kept in the LLM context. An indicator appears at the
bottom when scrolled up.
- cli layout: the Welcome block stays pinned to the bottom, showing the
question and suggestions when empty and the scrolling transcript while
chatting. The splash above stays put.
Closes #8, #9, #10.
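The line-based windowing can be sketched as two steps: flatten every turn into wrapped lines, then slice a fixed-height window whose bottom edge moves up by the scroll offset. The naive hard wrap by character count and the function names are assumptions; a real terminal renderer would also need word-aware wrapping and wide-character handling.

```typescript
// Hypothetical sketch of line-based transcript windowing.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Hard-wrap each logical line to `width` columns (naive: counts UTF-16 code units).
function wrap(text: string, width: number): string[] {
  const out: string[] = [];
  for (const line of text.split("\n")) {
    if (line.length === 0) {
      out.push("");
      continue;
    }
    for (let i = 0; i < line.length; i += width) out.push(line.slice(i, i + width));
  }
  return out;
}

// offset = 0 means "live" (pinned to the newest line); larger offsets scroll up.
function visibleLines(turns: Turn[], width: number, height: number, offset: number): string[] {
  const all = turns.flatMap((t) => wrap(t.text, width));
  const maxOffset = Math.max(0, all.length - height);
  const clamped = Math.min(Math.max(0, offset), maxOffset);
  const end = all.length - clamped;
  return all.slice(Math.max(0, end - height), end);
}
```

Page keys would then move `offset` by the viewport height, a new send would reset it to 0 (the auto-jump to live), and any non-zero offset is when the "scrolled up" indicator would be shown. Clipped lines only leave the window; the turns themselves remain available for the LLM context.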
Adds a small slash command runtime (/help, /exit, /clear, /lang,
/model, /provider):
- Detection: any input starting with `/` is parsed and executed
locally instead of being sent to the LLM.
- System messages: command output is appended to the transcript as
a third turn role (rendered with " · " in dim grey, distinct from
user " ❯ " and assistant " ▸ "). System turns are filtered out
when building the history sent to the LLM.
- /lang switches the UI language live (saved to ~/.agent-forge/config.json).
- /model swaps FORGE_MODEL for the rest of the session.
- /provider applies a preset (mlx, openai, anthropic, mistral):
base URL + default model, with a hint when an API key is required.
- useChat exposes addSystemMessage and clear; the provider config in
@agent-forge/core is now read live, so hot swaps take effect on the
next streamBuilder call.
- Footer updated: Ctrl+C dropped (use /exit); PgUp/PgDn/Ctrl+E and
/help shown during chat.
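Command detection and history filtering can be sketched as two small pure functions. The `ParsedCommand` shape and the function names are hypothetical; only the leading-`/` rule and the exclusion of system turns from the LLM history come from this PR.

```typescript
// Hypothetical sketch: slash-command detection and LLM history filtering.
// "system" here is the transcript's third role (command output),
// not the builder's system prompt, which is added separately.
type Role = "user" | "assistant" | "system";

interface ChatTurn {
  role: Role;
  content: string;
}

interface ParsedCommand {
  name: string; // e.g. "lang" for "/lang fr"
  args: string[];
}

// Returns null for ordinary prompts so they are sent to the LLM unchanged.
function parseSlashCommand(input: string): ParsedCommand | null {
  const trimmed = input.trim();
  if (!trimmed.startsWith("/")) return null;
  const [name = "", ...args] = trimmed.slice(1).split(/\s+/);
  return { name: name.toLowerCase(), args };
}

// System turns are display-only and never reach the model.
function historyForLlm(turns: ChatTurn[]): ChatTurn[] {
  return turns.filter((t) => t.role !== "system");
}
```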
Adds smoke tests with `bun test` and ink-testing-library:
- commands.test.ts: pure-logic coverage of every command path.
- store.test.ts: config round-trip on disk (with backup/restore
of the user's real config).
- app.test.tsx: <App /> renders without crashing and shows the
Agent Forge brand.
12 tests, all green.
Closes #11, #12.
… bump
Lays the foundation for P3 (the builder generates and runs agents):
- core/types/agent-md.ts: Zod schema for AGENT.md frontmatter
(name kebab-case, description, model, sandbox{image,timeout},
maxTurns) plus parseAgentMd(), which splits the frontmatter from the
body and validates it. 7 tests cover valid/invalid inputs and the key
error messages.
- tools-core/file-write.ts: FileWriteToolSpec, ready to plug into the
Vercel AI SDK. Hard security boundary:
· path scope strictly inside ~/.agent-forge/ (after symlink
resolution),
· path traversal, null bytes and spaces refused,
· existing files NEVER overwritten silently.
7 tests cover both the resolveSafePath helper and the executor.
- Default LLM model bumped from Llama 3.2 3B 4-bit to Mistral Nemo
12B Instruct 4-bit, in the runtime, the builder, and the mlx
provider preset (tool use is more reliable at this size).
- The builder caps responses at 384 tokens by default
(overridable via FORGE_MAX_TOKENS) so perceived latency stays low on
local hardware.
Closes #13.
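The path-scoping rules can be sketched with `node:path`. This version is purely lexical: unlike the real `resolveSafePath`, it does not resolve symlinks (which would need something like `fs.realpathSync`), and its signature and error messages are hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical, lexical-only sketch of the FileWrite path guard.
// The real tool also resolves symlinks before comparing against the root.
function resolveSafePath(root: string, requested: string): string {
  if (requested.includes("\0")) throw new Error("null byte in path");
  // The PR refuses spaces; this sketch rejects any whitespace.
  if (/\s/.test(requested)) throw new Error("whitespace in path");

  const base = path.resolve(root);
  const target = path.resolve(base, requested); // an absolute input replaces `base` here
  const rel = path.relative(base, target);
  // Reject the root itself, anything that climbs out with "..", and other drives.
  const escapes =
    rel === "" || rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
  if (escapes) throw new Error(`path escapes ${base}`);
  return target;
}
```

In the tool, `root` would be `path.join(os.homedir(), ".agent-forge")`. The no-overwrite rule is a separate concern; one way to enforce it atomically is writing with the `"wx"` flag, which fails with `EEXIST` if the file already exists.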
Two changes wrapped together because they were validated end to end as one
working stack:
- core/builder: default provider switched from local MLX to Mistral
cloud (api.mistral.ai, mistral-small-latest). Local MLX is still
selectable via env or /provider mlx. The text-structured action
protocol (forge:write block) is hardened in the system prompt:
decisive default behaviour ("immediately propose, no clarification
questions"), one concrete EN/FR example with the full frontmatter
shape, and explicit rules that the filename MUST be AGENT.md and the
YAML opener MUST be present.
- cli/ConfirmAction: a system-level permission dialog that suspends
the prompt input whenever the builder emits a forge:write block.
Modal styling (double orange border, Y/N/D buttons), bilingual. It
bypasses Ink's clipping of tall content by dropping the fixed height
while a confirmation is pending. User-approved writes pass an
`overwrite: true` flag through to FileWrite: explicit consent
overrides the no-overwrite default.
Supporting changes:
- cli/builder-actions: coerce any agents/<name>/<wrong>.md path back
to agents/<name>/AGENT.md; normalize a missing leading `---`;
validate against the AGENT.md schema before writing.
- tools-core/file-write: new optional `overwrite` flag on the input
schema (defaults to false). The safe no-overwrite default is preserved
at the tool level; only the dialog can opt in.
- cli/usePreflight: send an `Authorization: Bearer` header when checking
a cloud endpoint so the splash check passes (green) instead of failing (red).
- .env.example: document the three default presets (Mistral cloud,
OpenAI cloud, local MLX). The real .env stays gitignored.
- tests: 33 green (parser, action coercion, store round-trip,
AGENT.md schema, file-write security).
Closes #14, #15.
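The two coercions in cli/builder-actions can be sketched as string transforms applied before schema validation. The regex, the function names, and the assumption that the closing `---` is still present when the opener is missing are illustrative choices, not taken from the code.

```typescript
// Hypothetical sketch of the builder-action coercions applied before validation.

// agents/<name>/<anything>.md  ->  agents/<name>/AGENT.md; other paths pass through.
function coerceAgentPath(p: string): string {
  const match = /^agents\/([^/]+)\/[^/]+\.md$/i.exec(p);
  return match ? `agents/${match[1]}/AGENT.md` : p;
}

// If the model dropped the opening "---" but kept the closing one, restore it.
function normalizeFrontmatterOpener(content: string): string {
  const trimmed = content.replace(/^\s+/, "");
  if (trimmed.startsWith("---")) return trimmed;
  return /^---\s*$/m.test(trimmed) ? `---\n${trimmed}` : trimmed;
}
```

Content without any `---` line is left untouched here, so the schema validation step can still reject it with a clear error instead of the coercion guessing where the frontmatter ends.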
The runtime now reads /agent/AGENT.md, mounted at run time, uses its body as the system prompt, and streams tokens to stdout via streamText. The build target moved to node/esm so the bundle runs in the alpine container. Adds the DockerLaunch tool: it spawns one-shot containers with `docker run --rm -i`, mounts the AGENT.md and the runtime bundle, inherits provider env vars, and yields chunk/exit events through an async generator, with force-cleanup on abort or error. The builder system prompt gains a RUN PROTOCOL describing the forge:run fenced block, kept consistent with the existing forge:write protocol.
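The chunk/exit event stream can be sketched as an async generator over a child process. The event shape and the name `streamProcess` are assumptions, and the sketch accepts any command so it can be exercised without Docker. Note that killing the docker client alone does not necessarily stop a `docker run -i` container, which is why a force cleanup (for example `docker rm -f <name>`) is still needed on abort.

```typescript
import { spawn } from "node:child_process";

// Hypothetical event shape mirroring the chunk/exit events described above.
type LaunchEvent =
  | { type: "chunk"; text: string }
  | { type: "exit"; code: number | null };

async function* streamProcess(
  command: string,
  args: string[],
  input: string,
): AsyncGenerator<LaunchEvent> {
  const child = spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
  const closed = new Promise<number | null>((resolve, reject) => {
    child.once("error", reject);
    child.once("close", (code) => resolve(code));
  });
  closed.catch(() => {}); // avoid an unhandled rejection before it is awaited below
  child.stdin.on("error", () => {}); // ignore EPIPE if the process exits before reading
  child.stdin.end(input);
  child.stdout.setEncoding("utf8"); // decode multi-byte characters split across chunks

  try {
    for await (const chunk of child.stdout) {
      yield { type: "chunk", text: chunk as string };
    }
    yield { type: "exit", code: await closed };
  } finally {
    // Runs on completion, on error, and when the consumer stops iterating early.
    if (child.exitCode === null && child.signalCode === null) child.kill("SIGKILL");
  }
}
```

For DockerLaunch, `command` would be `docker` and `args` the `run --rm -i` invocation with the AGENT.md mounted at /agent/AGENT.md. Passing `-e FORGE_API_KEY` without a value makes Docker copy that variable from the host environment, which is one way to inherit provider settings without putting secrets on the command line.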
Splits the TUI into two strict zones:
- top: Mission Control (file writes, container runs, syntax-highlighted
cards, status badges, Mistral pixel-art logo bottom-right)
- bottom: conversation only (natural-language exchange with the
builder, no code or logs)
An action queue replaces the forge:* blocks previously inlined in the transcript:
- parser extracts write/run blocks, strips them from the assistant
text, and emits typed Actions (proposed → approved → running →
done/failed/declined)
- permission dialog (Y/N/D) handles both write and run, with the
overwrite flag passed only on user approval
- run actions stream container output live into the matching card
Adds session persistence to ~/.agent-forge/sessions/<id>/transcript.jsonl
with /session and /sessions slash commands.
Splits /clear (clears the view only; the LLM context is kept in a hidden buffer) from
/reset (wipes both the view AND the context).
Tests updated for the new parser and action shapes.
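The action lifecycle (proposed → approved → running → done/failed/declined) can be sketched as a transition table that rejects illegal moves. The `Action` shape, the function name, and the assumptions that declining is only possible from `proposed` and that writes also pass through `running` are illustrative, not taken from the code.

```typescript
// Hypothetical sketch of the action lifecycle as a checked state machine.
type ActionStatus = "proposed" | "approved" | "running" | "done" | "failed" | "declined";

const TRANSITIONS: Record<ActionStatus, readonly ActionStatus[]> = {
  proposed: ["approved", "declined"], // the user answers the permission dialog
  approved: ["running"],
  running: ["done", "failed"],
  done: [], // terminal states
  failed: [],
  declined: [],
};

interface Action {
  id: string;
  kind: "write" | "run";
  status: ActionStatus;
}

function transition(action: Action, next: ActionStatus): Action {
  if (!TRANSITIONS[action.status].includes(next)) {
    throw new Error(`illegal transition ${action.status} -> ${next} for ${action.id}`);
  }
  return { ...action, status: next }; // immutable update, convenient for React state
}
```

Making the table explicit means a run card can never show output for an action the user has not approved, because `running` is only reachable from `approved`.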
The root README (EN/FR) is updated for the P3 milestone: new quick-start `bun run forge`, the current screen layout, slash commands including /clear and /reset, provider presets (Mistral cloud as default), and an updated host/container architecture diagram. Sub-package READMEs (cli, core, runtime, tools-core) are updated to reflect what actually ships. The CONTRIBUTING note is bumped from "post-P1" to "post-P9 (POC validated)". The demo GIF is refreshed to match the current TUI.
Summary
Brings Agent Forge from an empty repo to P3: a working `forge` REPL where you describe an agent in natural language, watch the builder draft and write its `AGENT.md`, approve it, then ask the builder to run that agent. The agent runs in its own Docker container, its output is streamed back, and the sandbox is torn down afterwards. This single PR closes the first three milestones (P1 + P2 + P3) of the POC roadmap.
What's in
P1 — Hello agent in Docker
`poc:p1` script orchestrating a Node runtime inside `agent-forge/base:latest`
P2 — Conversational CLI
P3 — Builder writes and runs agents
Out of scope (next milestones)
Test plan