garniergeorges · garniergeorges · Apr 26, 2026 · Apr 26, 2026 · Apr 26, 2026 · Apr 26, 2026
diff --git a/.env.example b/.env.example
@@ -0,0 +1,21 @@
+# Copy this file to .env.local and fill in your values.
+# bun loads .env.local automatically when you run `bun run forge`.
+
+# === LLM provider for the builder ===
+# Default : Mistral Small via the Mistral cloud API.
+# Other examples below — uncomment one set.
+
+# --- Mistral cloud ---
+FORGE_BASE_URL=https://api.mistral.ai/v1
+FORGE_MODEL=mistral-small-latest
+FORGE_API_KEY=
+
+# --- OpenAI cloud ---
+# FORGE_BASE_URL=https://api.openai.com/v1
+# FORGE_MODEL=gpt-4o-mini
+# FORGE_API_KEY=sk-...
+
+# --- Local MLX server (mlx_lm.server on :8080) ---
+# FORGE_BASE_URL=http://127.0.0.1:8080/v1
+# FORGE_MODEL=mlx-community/Qwen2.5-7B-Instruct-4bit
+# FORGE_API_KEY=not-needed
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,6 +1,6 @@
 # Contributing to Agent Forge
 
-Thanks for your interest in this project. It is currently in **POC phase** — code contributions will open after the P1 milestone.
+Thanks for your interest in this project. It is currently in **POC phase** — code contributions will open after the P9 milestone (POC validated end-to-end). In the meantime, feedback and ideas via [issues](https://github.com/garniergeorges/agent-forge/issues) are very welcome.
 
 ## Project setup
 

diff --git a/README.fr.md b/README.fr.md
diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
   **Forge, run, and orchestrate sandboxed LLM agents.**
 
   [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENSE)
-  ![Status: POC](https://img.shields.io/badge/status-POC-orange)
+  ![Status: P3 done](https://img.shields.io/badge/status-P3%20done-green)
   ![Stack: TypeScript + Bun](https://img.shields.io/badge/stack-TypeScript_+_Bun-3178c6)
 
   🇬🇧 English version · [🇫🇷 Version française](./README.fr.md)
@@ -16,43 +16,151 @@
 
 ---
 
-> 🚧 **Status — Design phase.** Architecture is complete, the interactive mockup is runnable. **No production code yet.** First runnable milestone (P1 — *Hello agent in Docker*) is the next deliverable. Star the repo to follow along.
+> 🚧 **Status — POC, milestone P3 reached.** You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the `AGENT.md`, approve it, then ask the builder to run that agent — it spins up its own Docker container, streams the output, and tears the sandbox down. Next milestone : P4 — native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob).
 
 ## What is Agent Forge ?
 
-A conversational CLI that lets you **describe** a software project in natural language and watch a team of specialized LLM agents **build it** — each agent isolated in a Docker container, coordinating via [`claude-presence`](https://github.com/garniergeorges/claude-presence), with a pixel-art visualization in your terminal.
+A conversational CLI where you **describe** the software you want and a **builder LLM** designs, writes and launches the agents that produce it — each agent isolated in its own Docker container, with a pixel-art TUI built on [Ink](https://github.com/vadimdemedes/ink).
+
+The builder is the only conversational surface. Sub-agents are spawned on demand in disposable sandboxes ; long-running agents and multi-agent teams come later (P5 and P7).
 
 <div align="center">
   <img src="./assets/agent-forge.gif" alt="Agent Forge demo" width="80%">
 </div>
 
-## Status
+## Status — what works today
+
+| Milestone | Scope | State |
+|---|---|---|
+| **P1** | Hello agent in Docker (host script ↔ container ↔ LLM round-trip) | ✅ done |
+| **P2** | Conversational CLI (REPL Ink, EN/FR, slash commands, provider switch) | ✅ done |
+| **P3** | Builder writes `AGENT.md`, asks for permission, launches the agent in a fresh container, streams its output | ✅ done |
+| P4 | Six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) usable from inside the sandbox | next |
+| P5 | Hardened sandbox + artifact extraction back to host | |
+| P6 | Skills enriched (project scaffolding, audits, fixes) | |
+| P7 | `TEAM.md` — coordinated multi-agent runs | |
+| P8 | Pixel-art dashboard (live agent activity) | |
+| P9 | ★ POC validated : Next.js + Laravel + QA demo end-to-end | |
+
+## Quick start
+
+```bash
+# 1. Build the base Docker image (one-time, ~600 MB, ~1 min)
+bash scripts/docker/build-base.sh
+
+# 2. Install JS deps and build the runtime bundle
+bun install
+bun run --cwd packages/runtime build
+
+# 3. Configure your LLM provider (cloud — recommended)
+cp .env.example .env
+# edit .env and set FORGE_API_KEY=…
 
-🚧 **Phase POC.** Active design phase. **No production code yet.**
+# 4. Launch the builder REPL
+bun run forge
+```
 
-A complete interactive mockup exists (`demo-sprites/`), and the architecture is fully scaffolded. The first runnable milestone (P1 — *Hello agent in Docker*) comes next.
+On the first run the CLI asks you to pick a language (EN / FR), then drops you into the conversational prompt.
 
-## Try the mockup
+### What the screen looks like
+
+```
+ ▌▌ MISSION CONTROL ▐▐    1 action
+
+ ╭──────────────────────────────────────────────────────────────╮
+ │  [DONE]  write  agents/haiku-writer/AGENT.md                 │
+ │                                                              │
+ │      1   ---                                                 │
+ │      2   name: haiku-writer                                  │
+ │      3   description: Écrit un haïku en 5-7-5.               │
+ │      4   sandbox:                                            │
+ │      5     image: agent-forge/base:latest                    │
+ │      6     timeout: 60s                                      │
+ │      7   maxTurns: 1                                         │
+ │      8   ---                                                 │
+ │      …                                                       │
+ │     ✓ written /Users/you/.agent-forge/agents/haiku-writer/…  │
+ ╰──────────────────────────────────────────────────────────────╯
+
+                                                            ▀▀▀
+                                                            ▀▀▀▀
+                                                            ▄ ▄ ▄
+
+ ▌▌ AGENT FORGE ▐▐  v0.0.0  home · new session       session : new · model: mistral-small-latest
+ ─────────────────────────────────────────────────────────────────
+   ❯ create an agent that writes haikus
+   ▸ Done. The agent is forged. Want me to run it ?
+
+ ❯ describe what you want to build…
+ [⏎] send  [PgUp/PgDn] scroll  [Ctrl+E] live  [/help] commands
+```
+
+The TUI is split in two strict zones :
+
+- **Top zone (Mission Control)** — every concrete action the builder takes. File writes, container launches, agent output. Syntax-highlighted, status-coloured (orange = pending, green = done, red = failed).
+- **Bottom zone (Conversation)** — only the natural-language exchange between you and the builder. No code, no logs, no internals.
+
+## Provider configuration
+
+Agent Forge talks to any **OpenAI-compatible** chat endpoint via the [Vercel AI SDK](https://sdk.vercel.ai). Pick what fits.
+
+### Mistral cloud (default — recommended)
+
+Get a key at <https://console.mistral.ai>. The free tier is enough for the POC.
+
+```dotenv
+FORGE_BASE_URL=https://api.mistral.ai/v1
+FORGE_API_KEY=…
+FORGE_MODEL=mistral-small-latest
+```
+
+### OpenAI cloud
+
+```dotenv
+FORGE_BASE_URL=https://api.openai.com/v1
+FORGE_API_KEY=sk-…
+FORGE_MODEL=gpt-4o-mini
+```
+
+### Local MLX server (Apple Silicon, free, no key)
 
 ```bash
-node demo-sprites/forge-mockup-v3.mjs
+python3 -m venv ~/.agent-forge/mlx-venv
+~/.agent-forge/mlx-venv/bin/pip install mlx-lm
+~/.agent-forge/mlx-venv/bin/hf download mlx-community/Qwen2.5-7B-Instruct-4bit
+~/.agent-forge/mlx-venv/bin/mlx_lm.server \
+  --model mlx-community/Qwen2.5-7B-Instruct-4bit --port 8080
 ```
 
-Walks through the 7 screens of the product : splash, welcome, chat, mission control, focus, hangar, completion. **No real LLM calls** — scripted demo for UX validation.
+```dotenv
+FORGE_BASE_URL=http://host.docker.internal:8080/v1
+FORGE_MODEL=mlx-community/Qwen2.5-7B-Instruct-4bit
+```
 
-Press `SPACE` to advance, `B` to go back, `R` to restart.
+You can also switch on the fly inside the REPL : `/provider mistral`, `/model mistral-large-latest`, `/provider mlx`.
 
-## Concept
+## A typical session
 
-Agent Forge unifies five primitives :
+1. **Describe** — `> create an agent that writes haikus on a given topic`
+2. **Approve** — the builder drafts an `AGENT.md`, Mission Control shows it, a permission dialog asks `[Y] approve  [N] decline  [D] preview`. Press `Y`.
+3. **Run** — `> run haiku-writer on Docker`. Same dialog, same `Y`.
+4. **Watch** — Mission Control streams the container output live, the badge flips to `[DONE]`, the container is removed (`docker run --rm`).
 
-1. **Conversational CLI** — a builder LLM you dialogue with
-2. **Skills** — modular instructions invocable on demand
-3. **Tools** — native or MCP capabilities your agent can call
-4. **MCP** — extensibility via Model Context Protocol
-5. **Multi-agent teams** — coordinated agents in a shared Docker sandbox
+Every session is persisted to `~/.agent-forge/sessions/<id>/transcript.jsonl`. Use `/sessions` to list, `/session` to show the current id.
 
-Every agent runs in an isolated Docker container with strict resource limits, network policy, and read-only root filesystem. Inter-agent coordination uses [`claude-presence`](https://github.com/garniergeorges/claude-presence) MCP (broadcast + advisory locks).
+## Useful slash commands
+
+```
+/help                show all commands
+/clear               clear the view (LLM context kept)
+/reset               clear view AND LLM context
+/lang en|fr          switch UI language
+/provider <name>     mlx | openai | anthropic | mistral
+/model <name>        switch model on the active provider
+/session             show the current session id
+/sessions            list persisted sessions
+/exit                quit
+```
 
 ## Architecture
 
@@ -61,85 +169,57 @@ Every agent runs in an isolated Docker container with strict resource limits, ne
 │  HOST                                                       │
 │                                                             │
 │  forge CLI (= the builder LLM)                              │
-│    ├─ skills internes                                       │
-│    ├─ tools (Docker, Files)                                 │
-│    └─ orchestrates                                          │
+│    ├─ Ink TUI (Mission Control + conversation)              │
+│    ├─ AGENT.md parser (Zod-validated frontmatter)           │
+│    ├─ FileWrite tool (sandboxed under ~/.agent-forge)       │
+│    └─ DockerLaunch tool (spawns one-shot containers)        │
 └────────────────────┬────────────────────────────────────────┘
-                     │ docker run
+                     │ docker run --rm -i
                      ▼
 ┌─────────────────────────────────────────────────────────────┐
-│  CONTAINER (per team)                                       │
-│  agent-forge/fullstack:latest                               │
-│                                                             │
-│  ┌──────────┐  ┌──────────┐  ┌──────────┐                   │
-│  │ backend  │  │ frontend │  │ qa       │                   │
-│  │ Process  │  │ Process  │  │ Process  │                   │
-│  └────┬─────┘  └────┬─────┘  └────┬─────┘                   │
-│       └─── claude-presence MCP ───┘                         │
+│  CONTAINER (one per agent run, disposable)                  │
+│  agent-forge/base:latest                                    │
 │                                                             │
-│  /workspace/  shared filesystem                             │
+│  Node runtime ── reads /agent/AGENT.md as system prompt     │
+│              └─ pipes the user prompt through stdin         │
+│              └─ streams the LLM answer to stdout            │
 └─────────────────────────────────────────────────────────────┘
 ```
 
+Long-running agents (`docker exec`) and multi-agent teams (one container, many processes coordinating via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) land in P5 and P7.
+
 ## Tech stack
 
-- **TypeScript** + **Bun** runtime
+- **TypeScript** + **Bun** runtime + **Bun workspaces**
 - **Ink** (React for terminals) for the TUI
-- `@anthropic-ai/sdk` — LLM provider
-- `@modelcontextprotocol/sdk` — MCP integration
-- `dockerode` — Docker control
-- `zod` — schema validation
+- **Vercel AI SDK** (`ai`, `@ai-sdk/openai`) — provider-agnostic LLM calls
+- `zod` — `AGENT.md` frontmatter validation
+- `docker` CLI via `child_process.spawn` (Bun + dockerode hangs on attach)
+- `biome` for lint/format
 - Apache 2.0 license
 
 ## Repository structure
 
 ```
 agent-forge/
 ├── packages/
-│   ├── core/             # builder LLM, Docker, tool interface, types
-│   ├── cli/              # the `forge` binary
-│   ├── runtime/          # runs inside the container
-│   └── tools-core/       # native tools (Bash, Read, Edit, ...)
-├── docker/               # Dockerfiles (base, fullstack)
-├── examples/             # sample teams and agents
-├── docs/                 # architecture docs
-├── scripts/              # build/CI helpers
-├── demo-sprites/         # interactive mockup (already runnable)
+│   ├── core/             # builder LLM, AGENT.md schema, provider config
+│   ├── cli/              # the `forge` binary (Ink REPL + Mission Control)
+│   ├── runtime/          # bundle that runs inside each agent container
+│   └── tools-core/       # FileWrite, DockerLaunch, …
+├── docker/               # Dockerfiles
+├── scripts/              # build helpers (docker, hooks)
+├── demo-sprites/         # interactive mockup (UX reference)
 └── assets/               # README images
 ```
 
-## Roadmap (POC)
-
-```
-P1  Hello agent in Docker
-P2  Conversational CLI (minimal)
-P3  Builder launches the agent it just designed
-P4  Native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob)
-P5  Hardened sandbox + artifact extraction
-P6  Builder skills enriched
-P7  TEAM.md (multi-agent coordination)
-P8  Pixel-art TUI dashboard
-P9  ★ POC validated : Next.js + Laravel + QA demo works end-to-end
-```
-
-After POC :
-
-```
-V1  WebSocket API server
-V2  Auth + state persistence
-V3  Python SDK on PyPI
-V4  Multi-tenant (if needed)
-V5  MCP server adapter
-V6  Release 1.0
-```
-
 ## Genesis
 
 This project's architecture was informed by a public technical analysis of an existing reference coding-agent. The analysis (~6 400 lines, 13 documents) extracted patterns worth keeping and pitfalls to avoid. **No code was copied** — only architectural patterns inspired the design.
 
 ## Contributing
 
-Project is in active design phase. Feedback and ideas welcome via [issues](https://github.com/garniergeorges/agent-forge/issues). Code contributions will open after the P1 milestone lands.
+Project is in active POC phase. Feedback and ideas welcome via [issues](https://github.com/garniergeorges/agent-forge/issues). Code contributions will open after the P9 milestone (POC validated).
 
 ## License
 

diff --git a/assets/agent-forge.gif b/assets/agent-forge.gif