
feat: P1 + P2 + P3 — conversational forge with sandboxed agent runs#80

Merged
garniergeorges merged 14 commits into main from feat/p1-hello-agent on Apr 26, 2026

Conversation

@garniergeorges

Owner

Summary

Brings Agent Forge from an empty repo to P3: a working forge REPL where you describe an agent in natural language, watch the builder draft its AGENT.md, approve the write, then ask the builder to run that agent. The run spins up its own Docker container, streams the output back, and tears the sandbox down.

This single PR closes the first three milestones (P1 + P2 + P3) of the POC roadmap.

What's in

P1 — Hello agent in Docker

  • poc:p1 script orchestrating a Node runtime inside agent-forge/base:latest
  • preflight checks (Docker daemon, image present, runtime bundle)
  • guaranteed cleanup on signal/error/timeout

P2 — Conversational CLI

  • Ink REPL with bilingual EN/FR (language picker on first run)
  • splash + welcome + chat with streaming
  • slash commands : `/help`, `/clear`, `/reset`, `/lang`, `/provider`, `/model`, `/session`, `/sessions`, `/exit`
  • provider-agnostic via Vercel AI SDK (Mistral cloud, OpenAI, MLX local…)
  • Mistral Small as default cloud provider

P3 — Builder writes and runs agents

  • `AGENT.md` schema (Zod) with frontmatter validation + path coercion
  • `FileWrite` tool, sandboxed under `~/.agent-forge`, with hard `overwrite` flag wired to user approval
  • `DockerLaunch` tool : `docker run --rm -i` with mounted AGENT.md and runtime bundle, async event stream, force-cleanup
  • runtime updated to read mounted AGENT.md as system prompt and stream tokens to stdout
  • builder system prompt extended with `forge:write` and `forge:run` action protocols
  • two-zone TUI :
    • top = Mission Control with action cards (write / run), syntax-highlighted YAML preview, status badges, Mistral pixel-art logo
    • bottom = pure conversation, no plumbing
  • permission dialog (Y / N / D) before any write or run
  • session persistence to `~/.agent-forge/sessions/<id>/transcript.jsonl`
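
To make the `AGENT.md` frontmatter validation concrete, here is a minimal sketch of the split-and-validate step. It is an assumption-laden stand-in, not the shipped code: the real schema uses Zod, while this version is dependency-free, only reads flat `key: value` lines, and only enforces the kebab-case `name` rule. `parseAgentMdSketch` and `KEBAB_CASE` are hypothetical names.

```typescript
// Hypothetical, dependency-free stand-in for the Zod-based AGENT.md parser.
interface AgentMdSketch {
  frontmatter: Record<string, string>;
  body: string;
}

// Lowercase alphanumeric words joined by single hyphens.
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function parseAgentMdSketch(source: string): AgentMdSketch {
  const lines = source.split("\n");
  if ((lines[0] ?? "").trim() !== "---") {
    throw new Error("AGENT.md must start with a '---' frontmatter opener");
  }
  // Find the closing '---' that ends the frontmatter.
  let closeIndex = -1;
  for (let i = 1; i < lines.length; i++) {
    if ((lines[i] ?? "").trim() === "---") {
      closeIndex = i;
      break;
    }
  }
  if (closeIndex === -1) throw new Error("frontmatter is not closed with '---'");

  // Assumption: only flat `key: value` lines are read; nested keys are skipped.
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, closeIndex)) {
    const match = /^([A-Za-z][\w-]*):\s*(.*)$/.exec(line);
    if (!match) continue;
    const [, key = "", value = ""] = match;
    frontmatter[key] = value.trim();
  }

  const agentName = frontmatter["name"] ?? "";
  if (!KEBAB_CASE.test(agentName)) {
    throw new Error(`name must be kebab-case, got "${agentName}"`);
  }
  return { frontmatter, body: lines.slice(closeIndex + 1).join("\n").trim() };
}
```

The body returned here is the part the runtime later uses as the system prompt.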

Out of scope (next milestones)

  • P4 — six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) usable from inside the sandbox
  • P5 — hardened sandbox + persistent agents via `docker exec` + artifact extraction
  • P6 — enriched builder skills
  • P7 — `TEAM.md` multi-agent coordination
  • P8 — pixel-art live agent dashboard
  • P9 — POC validation : Next.js + Laravel + QA demo end-to-end

Test plan

  • `bash scripts/docker/build-base.sh` succeeds (one-time)
  • `bun install && bun run --cwd packages/runtime build` succeeds
  • `.env` configured with a valid `FORGE_API_KEY` (Mistral free tier OK)
  • `bun run forge` boots, language picker appears, preflight checks pass
  • describe an agent → builder drafts AGENT.md → permission dialog → `Y` writes the file under `~/.agent-forge/agents/<name>/AGENT.md`
  • ask the builder to run that agent → permission dialog → `Y` launches the container, output streams in Mission Control, badge flips to `[DONE]`, container removed
  • `/clear` empties the view but the builder still references prior context
  • `/reset` empties everything ; the builder no longer remembers
  • `/sessions` lists past sessions, JSONL transcripts present on disk
  • check-attribution CI passes (no Claude markers in commits or files)

Add the base sandbox image (debian bookworm-slim + Node 22 LTS + git,
curl, ripgrep, jq) and a build helper script. Non-root user `agent`
owns /workspace so it can write there.

Closes #1.

Implement the runtime entry point that reads a prompt from stdin, calls
an OpenAI-compatible LLM endpoint via Vercel AI SDK, and writes the assistant
text to stdout.

Defaults to a local MLX server (mlx_lm.server on 127.0.0.1:8080) so dev
runs cost nothing. Switching to OpenAI cloud or any other OpenAI-compatible
provider only requires FORGE_BASE_URL + FORGE_API_KEY + FORGE_MODEL.
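
A minimal sketch of how the three variables could override the local MLX defaults. Assumptions: the helper name `resolveProviderConfig`, the `/v1` suffix on the local URL, and the placeholder key and model id are all hypothetical; only the variable names and the 127.0.0.1:8080 address come from this commit.

```typescript
interface ProviderConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

type Env = Record<string, string | undefined>;

// Defaults for a local mlx_lm.server; key and model id are placeholders.
const MLX_DEFAULTS: ProviderConfig = {
  baseURL: "http://127.0.0.1:8080/v1", // assumed OpenAI-compatible path
  apiKey: "local",                     // placeholder, a local server needs no key
  model: "local-default-model",        // hypothetical model id
};

// Treat unset and empty variables the same way: fall back to the default.
function pick(value: string | undefined, fallback: string): string {
  return value !== undefined && value !== "" ? value : fallback;
}

function resolveProviderConfig(env: Env): ProviderConfig {
  return {
    baseURL: pick(env["FORGE_BASE_URL"], MLX_DEFAULTS.baseURL),
    apiKey: pick(env["FORGE_API_KEY"], MLX_DEFAULTS.apiKey),
    model: pick(env["FORGE_MODEL"], MLX_DEFAULTS.model),
  };
}
```

In the runtime this would be called with `process.env`, and the result handed to the OpenAI-compatible provider factory.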

Also includes the bun lockfile generated on first install.

Closes #2.

Adds packages/cli/src/poc-p1.ts and a root npm script `poc:p1` that
demonstrates the full P1 round-trip : spawn `docker run --rm -i
agent-forge/base:latest`, mount the runtime bundle as a read-only
volume, pipe a hardcoded prompt to its stdin, capture stdout, and let
`--rm` reclaim the container.

The container is pointed at the host MLX server via
`FORGE_BASE_URL=http://host.docker.internal:8080/v1` so the round-trip
costs nothing.
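
The round-trip above boils down to one `docker run` invocation. The sketch below only builds the argument list, so it can be inspected without Docker. Assumptions: `buildDockerRunArgs`, the in-container mount path `/runtime/runtime.mjs`, and the `node` entry command are hypothetical; the flags themselves (`--rm`, `-i`, `--name`, `-v …:ro`, `-e`) are standard docker CLI flags.

```typescript
interface LaunchOptions {
  image: string;
  containerName: string;
  runtimeBundleHostPath: string;   // absolute path to runtime.mjs on the host
  env: Record<string, string>;     // variables forwarded into the container
}

// Hypothetical mount point for the runtime bundle inside the container.
const RUNTIME_MOUNT = "/runtime/runtime.mjs";

function buildDockerRunArgs(options: LaunchOptions): string[] {
  const args = [
    "run",
    "--rm", // let Docker reclaim the container on exit
    "-i",   // keep stdin open so the prompt can be piped in
    "--name", options.containerName,
    "-v", `${options.runtimeBundleHostPath}:${RUNTIME_MOUNT}:ro`, // read-only volume
  ];
  for (const key of Object.keys(options.env)) {
    args.push("-e", `${key}=${options.env[key]}`);
  }
  // Image first, then the command to run inside it.
  args.push(options.image, "node", RUNTIME_MOUNT);
  return args;
}
```

The resulting array would be passed to a process spawner together with the `docker` binary, with the prompt written to the child's stdin and stdout captured.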

Implementation note: shells out to the docker CLI rather than using the
Engine API (dockerode hangs on `attach` upgrade under Bun). The CLI is
plenty for P1 — we will switch to the API later when finer-grained
control is needed.

Closes #3.

Before launching the container, check three preconditions and surface a
clear actionable message on failure :

  1. Docker daemon is reachable (`docker info`)
  2. Image agent-forge/base:latest is built locally
  3. Runtime bundle packages/runtime/dist/runtime.mjs exists

Each failure prints `✗ <what is wrong>` followed by the exact remediation
command and exits 1. The user never has to guess what to fix.
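
The three checks and their messages can be sketched as an ordered list that stops at the first failure. Assumptions: `PreflightCheck`, `buildChecks` and `firstFailure` are hypothetical names, the probes are injected as plain functions (the real ones would shell out to the docker CLI and the filesystem), and the daemon remediation text is a placeholder. The two build commands are the ones listed in the test plan.

```typescript
interface PreflightCheck {
  problem: string;        // what is wrong, printed after the ✗
  remediation: string;    // exact command or step that fixes it
  passes: () => boolean;  // injected probe, e.g. wrapping `docker info`
}

interface Probes {
  daemon: () => boolean;
  image: () => boolean;
  bundle: () => boolean;
}

function buildChecks(probes: Probes): PreflightCheck[] {
  return [
    { problem: "Docker daemon is not reachable",
      remediation: "start Docker, then retry", // placeholder wording
      passes: probes.daemon },
    { problem: "image agent-forge/base:latest is not built",
      remediation: "bash scripts/docker/build-base.sh",
      passes: probes.image },
    { problem: "runtime bundle packages/runtime/dist/runtime.mjs is missing",
      remediation: "bun run --cwd packages/runtime build",
      passes: probes.bundle },
  ];
}

// Returns the message for the first failing check, or null when all pass.
function firstFailure(checks: PreflightCheck[]): string | null {
  for (const check of checks) {
    if (!check.passes()) return `✗ ${check.problem}\n  ${check.remediation}`;
  }
  return null;
}
```

A caller would print a non-null result to stderr and exit 1 before any container is started.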

LLM-side errors (endpoint unreachable, model load failure, etc.) are
already surfaced verbatim by the runtime via stderr — no extra plumbing
needed here.

Closes #4.

Guarantees that no container survives a poc:p1 run, no matter how it
ends :

  - explicit --name agent-forge-poc-<pid> so we can target it after a
    signal even if the docker client child has not yet propagated SIGTERM
  - SIGINT and SIGTERM handlers force-remove the container and exit 130
  - 60s hard timeout (overridable via FORGE_POC_TIMEOUT_MS) kills the
    container and exits 1 with a clear message
  - try/finally wraps the spawn so any exception still triggers cleanup
  - FORGE_BASE_URL is now overridable from the host (default unchanged)
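
A rough sketch of the cleanup wiring described in the list above, with every side effect injected so it can be exercised without Docker. Assumptions: `installCleanup`, `resolveTimeoutMs` and the `CleanupDeps` shape are hypothetical; `docker rm -f <name>` is the standard force-remove command, and the name pattern, exit code 130 and 60s default come from the list.

```typescript
type Signal = "SIGINT" | "SIGTERM";

interface CleanupDeps {
  pid: number;
  onSignal: (signal: Signal, handler: () => void) => void; // e.g. process.on
  docker: (args: string[]) => void;                         // runs `docker <args>`
  exit: (code: number) => void;                             // e.g. process.exit
}

function installCleanup(deps: CleanupDeps): { containerName: string; cleanup: () => void } {
  // Explicit name so the container can be targeted after a signal.
  const containerName = `agent-forge-poc-${deps.pid}`;
  let alreadyCleaned = false;

  // Idempotent: signal handlers, the timeout and `finally` may all call it.
  const cleanup = (): void => {
    if (alreadyCleaned) return;
    alreadyCleaned = true;
    deps.docker(["rm", "-f", containerName]);
  };

  const signals: Signal[] = ["SIGINT", "SIGTERM"];
  for (const signal of signals) {
    deps.onSignal(signal, () => {
      cleanup();
      deps.exit(130);
    });
  }
  return { containerName, cleanup };
}

// 60s default, overridable via FORGE_POC_TIMEOUT_MS; bad values fall back.
function resolveTimeoutMs(raw: string | undefined): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 60000;
}
```

The spawn itself would sit inside `try { … } finally { cleanup(); }`, and a timer armed with `resolveTimeoutMs(...)` would call `cleanup()` and exit 1 when it fires.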

Verified by sending SIGINT mid-run (no leftover) and by pointing
FORGE_BASE_URL at an `nc` sink listening on :9999 (timeout fires, no leftover).

Closes #5.

Adds a "Try P1" walkthrough under "Try the mockup" in both English and French
READMEs : prerequisites, default MLX path (free, local on Apple Silicon),
alternative OpenAI cloud path, and the preflight troubleshooting messages.

Status badge moves from "POC" to "P1 done", and the status sentence now
reflects that P1 is runnable end-to-end with P2 next.

Closes #6.

Bootstraps the conversational REPL with a permanent two-zone layout :

  - Splash on top (logo, tagline, animated preflight checks). Stays
    visible as a session header. On first run, an inline language picker
    appears below the checks (English / Français, side-by-side, arrow
    keys or E/F shortcuts).
  - Welcome pinned to the bottom of the terminal (header bar, big
    question, suggestions, prompt input, footer hints). Empty space in
    the middle will host the transcript / build progress later.

User language preference is persisted to ~/.agent-forge/config.json. All
UI strings are bilingual via a small i18n table indexed by `useT()`.

The bin script `forge` (root package.json) runs the TUI directly with
bun, no link required for development.

Closes #7.

Wires the conversational REPL to a real LLM provider :

  - core/builder : provider factory (Vercel AI SDK + OpenAI-compatible
    endpoint, MLX defaults), streamBuilder({messages,lang}) async
    generator, and a short bilingual EN/FR system prompt that gives the
    LLM the Agent Forge builder identity.
  - cli/useChat : message history, async streaming with live token
    append, error capture. Owns the scroll offset so PgUp/PgDn/Ctrl+E
    can navigate older turns. Auto-jumps to live on every new send.
  - cli/ChatViewport : line-based windowing of the full transcript
    (flatten turns into wrapped lines, slice). Older lines are clipped
    visually but kept in the LLM context. Indicator at the bottom when
    scrolled up.
  - cli layout : Welcome block stays pinned to the bottom ; question +
    suggestions when empty, scrolling transcript when chatting. Splash
    above stays put.
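
The line-based windowing in cli/ChatViewport can be sketched as "flatten, then slice from the bottom". This is an illustration under assumptions: `wrapLine` and `visibleLines` are hypothetical names, wrapping is a hard cut on character count (the real component may wrap on words), and the scroll offset counts lines up from the live end.

```typescript
// Hard-wrap one piece of text into chunks of at most `width` characters.
function wrapLine(text: string, width: number): string[] {
  if (width <= 0) throw new Error("width must be positive");
  if (text.length === 0) return [""];
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += width) {
    chunks.push(text.slice(i, i + width));
  }
  return chunks;
}

// Flatten all turns into wrapped lines, then return the window of `height`
// lines that ends `scrollOffset` lines above the newest line.
function visibleLines(turns: string[], width: number, height: number, scrollOffset: number): string[] {
  const allLines: string[] = [];
  for (const turn of turns) {
    for (const rawLine of turn.split("\n")) {
      for (const wrapped of wrapLine(rawLine, width)) allLines.push(wrapped);
    }
  }
  // Clamp so scrolling can never run past the oldest line.
  const maxOffset = Math.max(0, allLines.length - height);
  const offset = Math.min(Math.max(0, scrollOffset), maxOffset);
  const end = allLines.length - offset;
  return allLines.slice(Math.max(0, end - height), end);
}
```

Only the rendering is windowed here; the full `turns` array is untouched, which is what keeps clipped lines available to the LLM context. An offset of 0 corresponds to the "live" position that every new send jumps back to.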

Closes #8, #9, #10.

Adds a small slash command runtime (/help, /exit, /clear, /lang,
/model, /provider) :

  - Detection : any input starting with `/` is parsed and executed
    locally instead of being sent to the LLM.
  - System messages : command output is appended to the transcript as
    a third turn role (rendered with " · " in dim grey, distinct from
    user " ❯ " and assistant " ▸ "). System turns are filtered out
    when building the history sent to the LLM.
  - /lang switches the UI language live (saved to ~/.agent-forge/config.json).
  - /model swaps FORGE_MODEL for the rest of the session.
  - /provider applies a preset (mlx, openai, anthropic, mistral) :
    base URL + default model, with a hint when an API key is required.
  - useChat exposes addSystemMessage and clear ; provider config in
    @agent-forge/core is now read live so hot swaps take effect on the
    next streamBuilder call.
  - Footer updated : Ctrl+C dropped (use /exit) ; PgUp/PgDn/Ctrl+E and
    /help shown during chat.
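
Two of the rules above lend themselves to a short sketch: detecting a slash command locally, and dropping system turns from the history sent to the LLM. Assumptions: `parseSlashCommand`, `historyForLlm` and the `Turn` shape are hypothetical, and trimming leading whitespace before the `/` check is a guess at the intended behaviour.

```typescript
type Role = "user" | "assistant" | "system";

interface Turn {
  role: Role;
  text: string;
}

interface SlashCommand {
  name: string;   // lowercased command name without the leading slash
  args: string[]; // remaining whitespace-separated tokens
}

// Returns null for ordinary chat input, which should go to the LLM instead.
function parseSlashCommand(input: string): SlashCommand | null {
  const trimmed = input.trim();
  if (!trimmed.startsWith("/")) return null;
  const [name = "", ...args] = trimmed.slice(1).split(/\s+/);
  return { name: name.toLowerCase(), args };
}

// System turns carry command output for the user only; the model never sees them.
function historyForLlm(transcript: Turn[]): Turn[] {
  return transcript.filter((turn) => turn.role !== "system");
}
```

For example, `/provider mistral` parses to the command `provider` with one argument, while its printed result is stored as a `system` turn and filtered out on the next request.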

Adds smoke tests with bun test + ink-testing-library :

  - commands.test.ts : pure-logic coverage of every command path.
  - store.test.ts    : config round-trip on disk (with backup/restore
    of the user's real config).
  - app.test.tsx     : <App /> renders without crashing and shows the
    Agent Forge brand.

12 tests, all green.

Closes #11, #12.
… bump

Lays the foundation for P3 (builder generates and runs agents) :

  - core/types/agent-md.ts : Zod schema for AGENT.md frontmatter
    (name kebab-case, description, model, sandbox{image,timeout},
    maxTurns) + parseAgentMd() that splits frontmatter / body and
    validates. 7 tests cover valid/invalid inputs and the key error
    messages.
  - tools-core/file-write.ts : FileWriteToolSpec ready to plug into
    Vercel AI SDK. Hard security boundary :
      · path scope strictly inside ~/.agent-forge/ (after symlink
        resolution),
      · path traversal, null bytes and spaces refused,
      · existing files NEVER overwritten silently.
    7 tests cover both the resolveSafePath helper and the executor.
  - Default LLM model bumped from Llama 3.2 3B 4-bit to Mistral Nemo
    12B Instruct 4-bit, in the runtime, the builder, and the mlx
    provider preset (tool-use is more reliable on this size).
  - Builder caps responses at 384 tokens by default
    (FORGE_MAX_TOKENS overridable) so perceived latency stays low on
    local hardware.
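
The FileWrite path rules can be illustrated with a purely lexical check. This is a simplified sketch, not the shipped `resolveSafePath`: it assumes POSIX-style separators, takes the sandbox root as an argument, and leaves out the symlink resolution that the real helper performs on disk. The name `resolveSafePathSketch` is hypothetical.

```typescript
// Lexical-only sketch: rejects unsafe relative paths and joins the rest onto
// the sandbox root. Symlink resolution is deliberately omitted here.
function resolveSafePathSketch(root: string, relativePath: string): string {
  if (relativePath.indexOf("\0") !== -1) {
    throw new Error("null bytes are not allowed in paths");
  }
  if (/\s/.test(relativePath)) {
    throw new Error("whitespace is not allowed in paths");
  }
  if (relativePath.startsWith("/")) {
    throw new Error("absolute paths are not allowed");
  }

  const segments: string[] = [];
  for (const segment of relativePath.split("/")) {
    if (segment === "" || segment === ".") continue; // collapse no-op segments
    if (segment === "..") throw new Error("path traversal is not allowed");
    segments.push(segment);
  }
  if (segments.length === 0) throw new Error("path is empty");

  // Strip trailing slashes from the root before joining.
  return `${root.replace(/\/+$/, "")}/${segments.join("/")}`;
}
```

Rejecting any `..` segment outright is stricter than normalising and re-checking the prefix, which keeps the sketch short; a real implementation would also compare the resolved path against the resolved root after following symlinks.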

Closes #13.

Two changes wrapped together because they were validated end to end as one
working stack :

  - core/builder : default provider switched from local MLX to Mistral
    cloud (api.mistral.ai, mistral-small-latest). Local MLX still
    selectable via env or /provider mlx. The text-structured action
    protocol (forge:write block) is hardened in the system prompt :
    decisive default behaviour ("immediately propose, no clarification
    questions"), one concrete EN/FR example with the full frontmatter
    shape, explicit rules that the filename MUST be AGENT.md and the
    YAML opener MUST be present.
  - cli/ConfirmAction : a system-level permission dialog that suspends
    the prompt input whenever the builder emits a forge:write block.
    Modal styling (double orange border, Y/N/D buttons), bilingual,
    bypasses Ink's clip on tall content by dropping the fixed height
    while a confirmation is pending. User-approved writes pass an
    `overwrite: true` flag through to FileWrite — explicit consent
    overrides the no-overwrite default.

Supporting changes :

  - cli/builder-actions : coerce any agents/<name>/<wrong>.md path back
    to agents/<name>/AGENT.md ; normalize a missing leading `---` ;
    validate against the AGENT.md schema before writing.
  - tools-core/file-write : new optional `overwrite` flag on the input
    schema (defaults false). The non-destructive default is preserved at
    the tool level ; only the dialog can opt in.
  - cli/usePreflight : send Authorization: Bearer when checking a cloud
    endpoint so the splash check is green, not red.
  - .env.example : document the three default presets (Mistral cloud,
    OpenAI cloud, local MLX). Real .env stays gitignored.
  - tests : 33 green (parser, action coercion, store round-trip,
    AGENT.md schema, file-write security).
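
The two normalisation steps in cli/builder-actions can be sketched as small pure functions. Assumptions: `coerceAgentPath` and `ensureFrontmatterOpener` are hypothetical names, and paths that do not match `agents/<name>/<file>.md` are passed through unchanged, which may differ from the real behaviour.

```typescript
// Rewrite agents/<name>/<anything>.md to agents/<name>/AGENT.md.
function coerceAgentPath(path: string): string {
  const match = /^agents\/([^/]+)\/[^/]+\.md$/.exec(path);
  if (!match) return path; // assumption: other paths are left untouched
  const [, agentName = ""] = match;
  return `agents/${agentName}/AGENT.md`;
}

// Prepend the YAML opener when the model forgot it.
function ensureFrontmatterOpener(content: string): string {
  const withoutLeadingSpace = content.replace(/^\s+/, "");
  return withoutLeadingSpace.startsWith("---") ? content : `---\n${content}`;
}
```

Schema validation would run after both steps, so a file that is still malformed is rejected before anything is written.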

Closes #14, #15.

Runtime now reads /agent/AGENT.md mounted at run time, uses the body as
its system prompt, and streams tokens to stdout via streamText. Build
target moved to node/esm so the bundle runs in the Debian-based sandbox container.

Adds the DockerLaunch tool : spawns one-shot containers with
`docker run --rm -i`, mounts the AGENT.md and the runtime bundle,
inherits provider env vars, and yields chunk/exit events through an
async generator. Force-cleanup on abort or error.

Builder system prompt extended with a RUN PROTOCOL describing the
forge:run fenced block, kept consistent with the existing forge:write
protocol.

Splits the TUI into two strict zones :
  - top  : Mission Control (file writes, container runs, syntax-highlighted
           cards, status badges, Mistral pixel-art logo bottom-right)
  - bottom : conversation only — natural-language exchange with the
             builder, no code or logs

Action queue replaces inlined forge:* blocks in the transcript :
  - parser extracts write/run blocks, strips them from the assistant
    text, and emits typed Actions (proposed → approved → running →
    done/failed/declined)
  - permission dialog (Y/N/D) handles both write and run, with overwrite
    flag passed only on user approval
  - run actions stream container output live into the matching card
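
A minimal sketch of the extraction step: pull `forge:write` / `forge:run` fenced blocks out of the assistant text and turn them into typed actions. Assumptions: the blocks are taken to be triple-backtick fences whose info string starts with `forge:write` or `forge:run`, the payload is kept as raw text, and `extractActions` is a hypothetical name. The status names are the ones listed above.

```typescript
// Built with repeat() so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

type ActionKind = "write" | "run";
type ActionStatus = "proposed" | "approved" | "running" | "done" | "failed" | "declined";

interface Action {
  kind: ActionKind;
  payload: string;      // raw block body, parsed further elsewhere
  status: ActionStatus; // every extracted action starts as "proposed"
}

function extractActions(assistantText: string): { text: string; actions: Action[] } {
  // Matches: fence + "forge:write" or "forge:run", rest of line, body, fence.
  const pattern = new RegExp(`${FENCE}forge:(write|run)[^\\n]*\\n([\\s\\S]*?)${FENCE}`, "g");
  const actions: Action[] = [];

  const stripped = assistantText.replace(pattern, (_match: string, kind: string, payload: string) => {
    actions.push({
      kind: kind === "run" ? "run" : "write",
      payload: payload.trim(),
      status: "proposed",
    });
    return ""; // remove the block from the conversational text
  });

  // Collapse the blank lines left behind by removed blocks.
  return { text: stripped.replace(/\n{3,}/g, "\n\n").trim(), actions };
}
```

The returned `text` is what the conversation zone would render, while each action becomes a card in Mission Control and moves through the remaining statuses after the permission dialog.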

Adds session persistence to ~/.agent-forge/sessions/<id>/transcript.jsonl
with /session and /sessions slash commands.
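
As an illustration of the transcript format, the sketch below writes one JSON object per line and reads the file back. Assumptions: the entry fields (`role`, `text`, `at`) and the helper names are hypothetical; only the `sessions/<id>/transcript.jsonl` layout comes from this commit.

```typescript
interface TranscriptEntry {
  role: "user" | "assistant" | "system";
  text: string;
  at: string; // ISO timestamp, hypothetical field
}

function transcriptPath(forgeRoot: string, sessionId: string): string {
  return `${forgeRoot}/sessions/${sessionId}/transcript.jsonl`;
}

// JSON.stringify escapes newlines, so each entry stays on exactly one line.
function toJsonlLine(entry: TranscriptEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Blank lines (for example a trailing newline) are skipped.
function parseJsonl(contents: string): TranscriptEntry[] {
  return contents
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TranscriptEntry);
}
```

Appending one line per turn means a crash mid-session loses at most the turn being written, which is the usual reason to prefer JSONL over a single JSON array.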

Splits /clear (view only — LLM context kept in a hidden buffer) from
/reset (wipes view AND context).
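
The difference between the two commands can be sketched as two buffers that normally grow together. `ChatState`, `appendTurn`, `clearView` and `resetAll` are hypothetical names used only for this illustration.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

interface ChatState {
  view: ChatTurn[];    // what the conversation zone renders
  context: ChatTurn[]; // hidden buffer sent to the LLM
}

function appendTurn(state: ChatState, turn: ChatTurn): ChatState {
  return { view: [...state.view, turn], context: [...state.context, turn] };
}

// /clear : empty the screen, keep what the builder remembers.
function clearView(state: ChatState): ChatState {
  return { view: [], context: state.context };
}

// /reset : empty the screen and the builder's memory.
function resetAll(): ChatState {
  return { view: [], context: [] };
}
```

After `/clear`, the next request still sends `context`, so the builder can refer to earlier turns the user no longer sees.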

Tests updated for the new parser and action shapes.

Root README (EN/FR) updated for the P3 milestone : new quick-start
`bun run forge`, current screen layout, slash commands incl. /clear and
/reset, provider presets (Mistral cloud as default), updated
host/container architecture diagram.

Sub-package READMEs (cli, core, runtime, tools-core) updated to reflect
what is actually shipped.

CONTRIBUTING note bumped from "post-P1" to "post-P9 (POC validated)".

Demo GIF refreshed to match the current TUI.

@garniergeorges merged commit 714e6f0 into main on Apr 26, 2026
1 check passed
@garniergeorges deleted the feat/p1-hello-agent branch on April 26, 2026, 23:21