diff --git a/README.fr.md b/README.fr.md index 167913f..64c8456 100644 --- a/README.fr.md +++ b/README.fr.md @@ -7,7 +7,7 @@ **Forgez, lancez et orchestrez des agents LLM en sandbox.** [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENSE) - ![Status: P3 done](https://img.shields.io/badge/status-P3%20done-green) + ![Status: P6 done](https://img.shields.io/badge/status-P6%20done-green) ![Stack: TypeScript + Bun](https://img.shields.io/badge/stack-TypeScript_+_Bun-3178c6) 🇫🇷 Version française · [🇬🇧 English version](./README.md) @@ -16,7 +16,7 @@ --- -> 🚧 **Statut — POC, jalon P3 atteint.** Vous pouvez désormais lancer `bun run forge`, décrire un agent en français ou en anglais, regarder le builder rédiger l'`AGENT.md`, l'approuver, puis demander au builder d'exécuter cet agent — il monte son propre container Docker, streame la sortie, puis détruit la sandbox. Prochain jalon : P4 — tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob). +> 🚧 **Statut — POC, jalons P1 → P4 et P6 atteints.** Vous pouvez désormais lancer `bun run forge`, décrire un agent en français ou en anglais, regarder le builder rédiger l'`AGENT.md`, l'approuver, puis demander au builder d'exécuter cet agent — il monte son propre container Docker avec **six tools natifs** (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) sandboxés sous `/workspace`, streame la sortie, puis détruit la sandbox. Les patterns d'orchestration récurrents sont gérés par des **skills** : déposez un `SKILL.md` dans `~/.agent-forge/skills/` (ou utilisez la skill built-in `scaffold-and-run`) et la CLI l'active automatiquement quand un trigger apparaît dans votre message. Prochain jalon : P5 — sandbox durci + extraction d'artefacts. ## Qu'est-ce qu'Agent Forge ? @@ -35,9 +35,9 @@ Le builder est la seule surface conversationnelle. 
Les sous-agents sont créés | **P1** | Hello agent dans Docker (script host ↔ container ↔ round-trip LLM) | ✅ fait | | **P2** | CLI conversationnelle (REPL Ink, EN/FR, slash commands, switch provider) | ✅ fait | | **P3** | Le builder écrit l'`AGENT.md`, demande la permission, lance l'agent dans un container neuf, streame la sortie | ✅ fait | -| P4 | Six tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) utilisables depuis la sandbox | suivant | -| P5 | Sandbox durci + extraction d'artefacts vers le host | | -| P6 | Skills enrichis (scaffolding projet, audits, fixes) | | +| **P4** | Six tools natifs sandboxés sous `/workspace` : Bash, FileWrite, FileRead, FileEdit, Grep, Glob ; tool-loop runtime avec `maxTurns` | ✅ fait | +| **P6** | Couche skills : format `SKILL.md`, catalogue (built-in + `~/.agent-forge/skills/`), matching des triggers côté serveur, runner à 2 appels (un pour AGENT.md, un pour le run prompt) | ✅ fait | +| P5 | Sandbox durci + agents persistants (`docker exec`) + extraction d'artefacts vers le host | suivant | | P7 | `TEAM.md` — exécutions multi-agents coordonnées | | | P8 | Dashboard pixel art (activité agents en direct) | | | P9 | ★ POC validé : démo Next.js + Laravel + QA de bout en bout | | @@ -148,6 +148,36 @@ Vous pouvez aussi switcher à la volée depuis le REPL : `/provider mistral`, `/ Chaque session est persistée dans `~/.agent-forge/sessions//transcript.jsonl`. `/sessions` liste les sessions, `/session` affiche l'id courante. +## Tools natifs (dans la sandbox de l'agent) + +Les agents lancés par le builder tournent dans un container jetable avec `/workspace` monté en écriture. 
Six tools natifs sont exposés et appelés via des blocs encadrés `forge:*` que l'agent émet dans sa réponse : + +| Tag | Tool | Ce que ça fait | +|---|---|---| +| `forge:bash` | Bash | `bash -lc ` dans `/workspace`, timeout 30 s par défaut (max 120 s), sortie clippée à 16 Ko | +| `forge:write` | FileWrite | Crée ou écrase un fichier sous `/workspace`, dossiers parents auto-créés | +| `forge:read` | FileRead | Offset/limit en lignes, clip à 16 Ko, refuse les non-fichiers | +| `forge:edit` | FileEdit | Patch par sous-chaîne exacte ; refuse les matchs ambigus sauf `replaceAll: true` | +| `forge:grep` | Grep | Regex JS pure sur un filtre glob optionnel, ignore les binaires, 200 hits max | +| `forge:glob` | Glob | Matcher fait main pour `*` / `**` / `?`, 200 résultats max | + +Le runtime parse un bloc par tour, exécute, réinjecte le résultat structuré comme message système, et boucle jusqu'à `maxTurns` (cap dur à 10). Tous les tools sont sandboxés : path traversal, octets nuls et chemins absolus hors `/workspace` sont refusés. + +Pourquoi un protocole texte plutôt que les `tool_calls` natifs OpenAI ? Les LLM locaux (MLX, llama.cpp) ne respectent pas tous le tool-use natif, et un protocole unique entre builder et agents simplifie le débogage — le flux brut reste lisible. + +## Skills (patterns d'orchestration récurrents) + +Un seul message utilisateur peut mélanger deux intentions que le LLM tend à confondre — « ce que l'agent EST » et « ce que l'agent doit FAIRE MAINTENANT ». Les **skills** les séparent. + +Une skill est un fichier `SKILL.md` avec un frontmatter YAML (name, description, **triggers**, actions) et un corps markdown d'instructions. 
La CLI charge les skills depuis deux sources : + +- built-in : livrées sous `packages/core/src/builder/skills/` +- utilisateur : posez un fichier dans `~/.agent-forge/skills/.md` (ou `/SKILL.md` pour grouper des assets) et il prend le pas sur le built-in en cas de collision de nom + +Quand vous envoyez un message, la CLI le scanne côté serveur contre les phrases triggers de chaque skill (insensible à la casse, sous-chaîne). Si un trigger matche, le **runner** prend la main : deux appels LLM ciblés, un pour l'AGENT.md (rôle générique uniquement), un pour le run prompt (la tâche concrète), puis les deux blocs apparaissent en cards PROPOSED dans Mission Control. Vous approuvez dans l'ordre. Le LLM n'a jamais à prendre la méta-décision. + +La skill `scaffold-and-run` est livrée par défaut : elle se déclenche sur des mots comme `audite`, `teste`, `lance puis`, `audit`, `test it`, `then run`, `create and run`. Tapez `/skills` dans le REPL pour lister celles qui sont disponibles. + ## Slash commands utiles ``` @@ -159,9 +189,18 @@ Chaque session est persistée dans `~/.agent-forge/sessions//transcript.json /model change de modèle sur le provider actif /session affiche l'id de la session courante /sessions liste les sessions persistées +/skills liste les skills disponibles (built-in + user) /exit quitte ``` +## Raccourcis Mission Control + +- `Tab` / `Shift+Tab` — cycle le focus entre les cards d'action +- `Enter` — ouvre la card focus en plein écran +- `Esc` — retire le focus (ou ferme la vue détail) +- `↑↓ / PgUp / PgDn / g / G` — scroll dans la vue détail +- `Ctrl+E` — retour live dans le transcript + ## Architecture ``` @@ -170,23 +209,32 @@ Chaque session est persistée dans `~/.agent-forge/sessions//transcript.json │ │ │ forge CLI (= le builder LLM) │ │ ├─ TUI Ink (Mission Control + conversation) │ -│ ├─ Parser AGENT.md (frontmatter validé par Zod) │ -│ ├─ Tool FileWrite (sandboxé sous ~/.agent-forge) │ +│ ├─ Catalogue skills : built-in + ~/.agent-forge/skills/ │ +│ ├─ 
Matcher de triggers + skill runner côté serveur │ +│ ├─ Parsers AGENT.md / SKILL.md (validés par Zod) │ +│ ├─ Tool FileWrite (host, sandboxé sous ~/.agent-forge) │ │ └─ Tool DockerLaunch (lance des containers one-shot) │ └────────────────────┬────────────────────────────────────────┘ │ docker run --rm -i + │ -v /AGENT.md:/agent/AGENT.md:ro + │ -v :/runtime:ro + │ -v :/workspace ▼ ┌─────────────────────────────────────────────────────────────┐ │ CONTAINER (un par run d'agent, jetable) │ │ agent-forge/base:latest │ │ │ │ Runtime Node ── lit /agent/AGENT.md comme system prompt │ -│ └─ reçoit le prompt utilisateur via stdin │ -│ └─ streame la réponse du LLM sur stdout │ +│ ├─ reçoit le prompt utilisateur via stdin │ +│ ├─ streame la réponse du LLM sur stdout │ +│ └─ tool loop : forge:bash / write / read / │ +│ edit / grep / glob, capé à maxTurns │ +│ │ +│ /workspace ── espace en écriture, conservé après l'exit │ └─────────────────────────────────────────────────────────────┘ ``` -Les agents persistants (`docker exec`) et les teams multi-agents (un container, plusieurs process coordonnés via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) arrivent en P5 et P7. +Les agents persistants (`docker exec` au lieu de `docker run --rm`) et les teams multi-agents (un container, plusieurs process coordonnés via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) arrivent en P5 et P7. 
## Stack technique @@ -203,14 +251,20 @@ Les agents persistants (`docker exec`) et les teams multi-agents (un container, ``` agent-forge/ ├── packages/ -│ ├── core/ # builder LLM, schéma AGENT.md, config provider -│ ├── cli/ # le binaire `forge` (REPL Ink + Mission Control) -│ ├── runtime/ # bundle exécuté dans chaque container d'agent -│ └── tools-core/ # FileWrite, DockerLaunch, … -├── docker/ # Dockerfiles -├── scripts/ # helpers de build (docker, hooks) -├── demo-sprites/ # mockup interactif (référence UX) -└── assets/ # images du README +│ ├── core/ # builder LLM, schémas, couche skills +│ │ └── src/builder/skills/ # fichiers SKILL.md built-in +│ ├── cli/ # le binaire `forge` (REPL Ink + Mission Control) +│ ├── runtime/ # bundle exécuté dans chaque container d'agent +│ │ └── src/tool-protocol.ts # parser forge:* + render des résultats +│ └── tools-core/ +│ ├── file-write.ts # FileWrite host (~/.agent-forge) +│ ├── docker-launch.ts # lanceur de containers one-shot +│ └── runtime/ # tools in-container : bash, file-write, +│ # file-read, file-edit, grep, glob +├── docker/ # Dockerfiles +├── scripts/ # helpers de build (docker, hooks) +├── demo-sprites/ # mockup interactif (référence UX) +└── assets/ # images du README ``` ## Genèse diff --git a/README.md b/README.md index c0f7711..ab7c624 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ **Forge, run, and orchestrate sandboxed LLM agents.** [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENSE) - ![Status: P3 done](https://img.shields.io/badge/status-P3%20done-green) + ![Status: P6 done](https://img.shields.io/badge/status-P6%20done-green) ![Stack: TypeScript + Bun](https://img.shields.io/badge/stack-TypeScript_+_Bun-3178c6) 🇬🇧 English version · [🇫🇷 Version française](./README.fr.md) @@ -16,7 +16,7 @@ --- -> 🚧 **Status — POC, milestone P3 reached.** You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the `AGENT.md`, approve 
it, then ask the builder to run that agent — it spins up its own Docker container, streams the output, and tears the sandbox down. Next milestone : P4 — native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob). +> 🚧 **Status — POC, milestones P1 → P4 and P6 reached.** You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the `AGENT.md`, approve it, then ask the builder to run that agent — it spins up its own Docker container with **six native tools** (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) sandboxed under `/workspace`, streams the output, and tears the sandbox down. Recurring orchestration patterns are handled by **skills** : drop a `SKILL.md` in `~/.agent-forge/skills/` (or use the built-in `scaffold-and-run`) and the CLI auto-dispatches when a trigger phrase appears in your message. Next milestone : P5 — hardened sandbox + artifact extraction. ## What is Agent Forge ? @@ -35,9 +35,9 @@ The builder is the only conversational surface. 
Sub-agents are spawned on demand | **P1** | Hello agent in Docker (host script ↔ container ↔ LLM round-trip) | ✅ done | | **P2** | Conversational CLI (REPL Ink, EN/FR, slash commands, provider switch) | ✅ done | | **P3** | Builder writes `AGENT.md`, asks for permission, launches the agent in a fresh container, streams its output | ✅ done | -| P4 | Six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) usable from inside the sandbox | next | -| P5 | Hardened sandbox + artifact extraction back to host | | -| P6 | Skills enriched (project scaffolding, audits, fixes) | | +| **P4** | Six native tools sandboxed under `/workspace` : Bash, FileWrite, FileRead, FileEdit, Grep, Glob ; runtime tool-loop with `maxTurns` | ✅ done | +| **P6** | Skill layer : `SKILL.md` format, catalog (built-in + `~/.agent-forge/skills/`), server-side trigger matching, two-call runner (one for AGENT.md, one for the run prompt) | ✅ done | +| P5 | Hardened sandbox + persistent agents (`docker exec`) + artifact extraction back to host | next | | P7 | `TEAM.md` — coordinated multi-agent runs | | | P8 | Pixel-art dashboard (live agent activity) | | | P9 | ★ POC validated : Next.js + Laravel + QA demo end-to-end | | @@ -148,6 +148,36 @@ You can also switch on the fly inside the REPL : `/provider mistral`, `/model mi Every session is persisted to `~/.agent-forge/sessions//transcript.jsonl`. Use `/sessions` to list, `/session` to show the current id. +## Native tools (inside the agent sandbox) + +Agents launched by the builder run inside a disposable container with `/workspace` mounted as their writable root. 
Six native tools are exposed and called via fenced `forge:*` blocks the agent emits in its reply : + +| Tag | Tool | What it does | +|---|---|---| +| `forge:bash` | Bash | `bash -lc ` inside `/workspace`, 30 s default timeout (max 120 s), output clipped at 16 KB | +| `forge:write` | FileWrite | Create or overwrite a file under `/workspace`, parent dirs auto-created | +| `forge:read` | FileRead | Line-based offset/limit, 16 KB clip, fails on non-regular files | +| `forge:edit` | FileEdit | Exact-substring patch ; refuses ambiguous matches unless `replaceAll: true` | +| `forge:grep` | Grep | Pure JS regex over an optional glob filter, skips binaries, 200 hits cap | +| `forge:glob` | Glob | Hand-rolled `*` / `**` / `?` matcher, 200 results cap | + +The runtime parses one block per turn, executes it, feeds the structured result back as a system message, and loops up to `maxTurns` (capped at 10). All tools are sandboxed : path traversal, null bytes and absolute paths outside `/workspace` are refused. + +Why a text-structured protocol instead of OpenAI `tool_calls` ? Local LLMs (MLX, llama.cpp) don't all honour native tool-use, and a single protocol across builder and agents is easier to debug — the raw stream stays human-readable. + +## Skills (recurring orchestration patterns) + +A single user message can mix two intents the LLM tends to collapse — "what the agent IS" and "what the agent should do RIGHT NOW". **Skills** keep them apart. + +A skill is a `SKILL.md` file with a YAML frontmatter (name, description, **triggers**, actions) and a markdown body of instructions. The CLI loads skills from two sources : + +- built-in : shipped under `packages/core/src/builder/skills/` +- user : drop a file into `~/.agent-forge/skills/.md` (or `/SKILL.md` for grouped assets) and it overrides the built-in on name collision + +When you send a message, the CLI scans it server-side against every skill's trigger phrases (case-insensitive substring). 
If one matches, the skill **runner** takes over the turn : two narrow LLM calls, one for the AGENT.md (generic role only), one for the run prompt (the concrete task), then both blocks land as PROPOSED cards in Mission Control. You approve them in order. The LLM never has to make the meta-decision. + +Built-in `scaffold-and-run` ships today : it triggers on words like `audite`, `teste`, `lance puis`, `audit`, `test it`, `then run`, `create and run`. Type `/skills` in the REPL to list what's available. + ## Useful slash commands ``` @@ -159,9 +189,18 @@ Every session is persisted to `~/.agent-forge/sessions//transcript.jsonl`. U /model switch model on the active provider /session show the current session id /sessions list persisted sessions +/skills list available skills (built-in + user) /exit quit ``` +## Mission Control keyboard + +- `Tab` / `Shift+Tab` — cycle focus through action cards +- `Enter` — open the focused card in a full-screen detail view +- `Esc` — drop the focus (or close the detail view) +- `↑↓ / PgUp / PgDn / g / G` — scroll inside the detail view +- `Ctrl+E` — return the chat transcript to live mode + ## Architecture ``` @@ -170,23 +209,32 @@ Every session is persisted to `~/.agent-forge/sessions//transcript.jsonl`. 
U │ │ │ forge CLI (= the builder LLM) │ │ ├─ Ink TUI (Mission Control + conversation) │ -│ ├─ AGENT.md parser (Zod-validated frontmatter) │ -│ ├─ FileWrite tool (sandboxed under ~/.agent-forge) │ +│ ├─ Skill catalog : built-in + ~/.agent-forge/skills/ │ +│ ├─ Server-side trigger matcher + skill runner │ +│ ├─ AGENT.md / SKILL.md parsers (Zod-validated) │ +│ ├─ FileWrite tool (host, sandboxed under ~/.agent-forge) │ │ └─ DockerLaunch tool (spawns one-shot containers) │ └────────────────────┬────────────────────────────────────────┘ │ docker run --rm -i + │ -v /AGENT.md:/agent/AGENT.md:ro + │ -v :/runtime:ro + │ -v :/workspace ▼ ┌─────────────────────────────────────────────────────────────┐ │ CONTAINER (one per agent run, disposable) │ │ agent-forge/base:latest │ │ │ │ Node runtime ── reads /agent/AGENT.md as system prompt │ -│ └─ pipes the user prompt through stdin │ -│ └─ streams the LLM answer to stdout │ +│ ├─ pipes the user prompt through stdin │ +│ ├─ streams the LLM answer to stdout │ +│ └─ tool loop : forge:bash / write / read / │ +│ edit / grep / glob, capped at maxTurns │ +│ │ +│ /workspace ── writable scratchpad, kept on host after exit │ └─────────────────────────────────────────────────────────────┘ ``` -Long-running agents (`docker exec`) and multi-agent teams (one container, many processes coordinating via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) land in P5 and P7. +Persistent agents (`docker exec` instead of `docker run --rm`) and multi-agent teams (one container, many processes coordinating via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) land in P5 and P7. 
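Reviewer sketch of the tool loop this README hunk describes: at most one `forge:*` block is parsed per turn, the tool result is fed back as a system message, and the loop stops at `maxTurns` (hard cap of 10 per the README). The names `runToolLoop`, `parseFirstBlock`, `callModel` and `executeTool` are hypothetical, and the fence-matching regex is an assumption; the actual parser lives in `packages/runtime/src/tool-protocol.ts` and may differ.

```typescript
// Hypothetical sketch of the one-block-per-turn tool loop; not the real runtime code.
type Message = { role: 'system' | 'user' | 'assistant'; content: string }
type ToolCall = { tool: string; body: string }

const HARD_CAP = 10 // the README states that maxTurns is capped at 10
const FENCE = '`'.repeat(3) // built at runtime to keep this sketch fence-safe

// Opening fence + tag, body, closing fence. Non-global: only the first block counts.
const BLOCK = new RegExp(
  FENCE + 'forge:(bash|write|read|edit|grep|glob)[ \\t]*\\n([\\s\\S]*?)\\n' + FENCE,
)

function parseFirstBlock(reply: string): ToolCall | null {
  const m = BLOCK.exec(reply)
  if (!m) return null
  return { tool: m[1] ?? '', body: m[2] ?? '' }
}

async function runToolLoop(
  messages: Message[],
  callModel: (history: Message[]) => Promise<string>, // assumed LLM adapter
  executeTool: (call: ToolCall) => Promise<string>, // assumed tool dispatcher
  maxTurns: number = HARD_CAP,
): Promise<Message[]> {
  const turns = Math.min(Math.max(1, maxTurns), HARD_CAP)
  const history = [...messages]
  for (let turn = 0; turn < turns; turn++) {
    const reply = await callModel(history)
    history.push({ role: 'assistant', content: reply })
    const call = parseFirstBlock(reply)
    if (call === null) break // plain answer: the agent is done
    const result = await executeTool(call)
    history.push({ role: 'system', content: result })
  }
  return history
}
```

Clamping `maxTurns` inside the loop, rather than trusting the caller, keeps the cap enforceable even if an `AGENT.md` asks for more turns.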
## Tech stack @@ -203,14 +251,20 @@ Long-running agents (`docker exec`) and multi-agent teams (one container, many p ``` agent-forge/ ├── packages/ -│ ├── core/ # builder LLM, AGENT.md schema, provider config -│ ├── cli/ # the `forge` binary (Ink REPL + Mission Control) -│ ├── runtime/ # bundle that runs inside each agent container -│ └── tools-core/ # FileWrite, DockerLaunch, … -├── docker/ # Dockerfiles -├── scripts/ # build helpers (docker, hooks) -├── demo-sprites/ # interactive mockup (UX reference) -└── assets/ # README images +│ ├── core/ # builder LLM, schemas, skill layer +│ │ └── src/builder/skills/ # built-in SKILL.md files +│ ├── cli/ # the `forge` binary (Ink REPL + Mission Control) +│ ├── runtime/ # bundle that runs inside each agent container +│ │ └── src/tool-protocol.ts # forge:* parser + result renderers +│ └── tools-core/ +│ ├── file-write.ts # host-side FileWrite (~/.agent-forge) +│ ├── docker-launch.ts # one-shot container launcher +│ └── runtime/ # in-container tools : bash, file-write, +│ # file-read, file-edit, grep, glob +├── docker/ # Dockerfiles +├── scripts/ # build helpers (docker, hooks) +├── demo-sprites/ # interactive mockup (UX reference) +└── assets/ # README images ``` ## Genesis diff --git a/packages/cli/README.md b/packages/cli/README.md index 50d0d5f..86bac92 100644 --- a/packages/cli/README.md +++ b/packages/cli/README.md @@ -4,20 +4,25 @@ Binaire `forge` — CLI conversationnelle. ## Ce que ça fait -Héberge le **builder LLM** dans un REPL Ink. L'utilisateur décrit ce qu'il veut, le builder génère des fichiers `AGENT.md` (P3) puis `TEAM.md` (P7) et lance les containers Docker correspondants. +Héberge le **builder LLM** dans un REPL Ink. L'utilisateur décrit ce qu'il veut, le builder génère des fichiers `AGENT.md` puis lance les containers Docker correspondants. Quand le message déclenche une **skill**, la CLI prend la main et orchestre directement (deux appels LLM ciblés au lieu d'un wide). 
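Reviewer sketch of the skill dispatch described in these README changes: user skills override built-ins on name collision, and a message matches a skill when any trigger phrase appears as a case-insensitive substring. The `Skill` shape, `mergeCatalog` and `matchSkill` are hypothetical names, and first-match-wins ordering is an assumption; the real catalog comes from `loadSkillCatalog()` and may behave differently.

```typescript
// Hypothetical sketch; field names beyond `name` and `source` are assumptions.
type Skill = { name: string; source: 'builtin' | 'user'; triggers: string[] }

// User skills replace built-ins that share the same name.
function mergeCatalog(builtin: Skill[], user: Skill[]): Skill[] {
  const byName = new Map<string, Skill>()
  for (const s of builtin) byName.set(s.name, s)
  for (const s of user) byName.set(s.name, s) // user wins on collision
  return [...byName.values()]
}

// Case-insensitive substring scan; returns the first skill with a matching trigger.
function matchSkill(message: string, skills: Skill[]): Skill | null {
  const haystack = message.toLowerCase()
  for (const skill of skills) {
    for (const trigger of skill.triggers) {
      const needle = trigger.trim().toLowerCase()
      if (needle.length > 0 && haystack.includes(needle)) return skill
    }
  }
  return null
}
```

One consequence of plain substring matching is that a short trigger such as `audit` also fires on words that merely contain it (for example `auditorium`), so longer trigger phrases are less likely to dispatch by accident.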
## État -**Phase POC, P3 livré.** Couvre : +**Phase POC, P1 → P4 et P6 livrés.** Couvre : - REPL Ink bilingue EN/FR (sélecteur de langue au premier lancement) - Splash + preflight checks (Docker dispo, image base, runtime bundle) -- Mission Control (zone haute) — affiche les actions du builder (write, run) avec coloration syntaxique YAML +- Mission Control (zone haute) — affiche les actions du builder (write, run, skill) avec : + - mode compact 1-ligne par défaut, expand sur la card focus + - viewport scrollable avec indicateurs `↑ N above / ↓ N below` + - auto-focus de la nouvelle card arrivée, running cards toujours expandées + - vue détail plein écran (Enter), highlight Markdown/YAML/JSON/agent-run - Conversation (zone basse) — uniquement le langage naturel, transcripts persistés en JSONL - Permission dialog (Y / N / D) avant toute écriture ou lancement -- Slash commands : `/help`, `/clear`, `/reset`, `/lang`, `/provider`, `/model`, `/session`, `/sessions`, `/exit` +- Slash commands : `/help`, `/clear`, `/reset`, `/lang`, `/provider`, `/model`, `/session`, `/sessions`, `/skills`, `/exit` - Provider-agnostic via Vercel AI SDK (Mistral, OpenAI, MLX local…) - Sessions persistées dans `~/.agent-forge/sessions//transcript.jsonl` +- **Couche skills** : matching des triggers côté serveur, dispatch automatique vers le runner `scaffold-and-run` quand un trigger matche ## Lancement @@ -36,37 +41,44 @@ bun run forge # depuis la racine du monorepo /model change de modèle sur le provider actif /session affiche l'id de la session courante /sessions liste les sessions persistées +/skills liste les skills disponibles (built-in + user) /exit quitte ``` ## Raccourcis clavier ``` -[⏎] envoyer -[PgUp/PgDn] scroll dans le transcript -[Ctrl+E] retour au live -[Y/N/D] approuver / refuser / aperçu (dialog de permission) +[⏎] envoyer un message +[PgUp/PgDn] scroll Mission Control (si focus actif ou input vide) + sinon scroll dans le transcript +[Ctrl+E] retour live dans le transcript 
+[Tab/Shift+Tab] cycle focus entre les cards Mission Control +[Enter] sur focus ouvre la card en détail plein écran +[Esc] retire le focus, ou ferme la vue détail +[Y/N/D] approuve / refuse / aperçu (dialog de permission) ``` ## Structure ``` src/ -├── index.tsx entrée Ink -├── App.tsx layout deux zones (Mission Control xor Splash, puis Welcome) +├── index.tsx entrée Ink +├── App.tsx layout + routage clavier global ├── components/ -│ ├── MissionControl.tsx zone haute, cards d'actions +│ ├── MissionControl.tsx zone haute, cards compactes / expandées + viewport +│ ├── CardDetail.tsx vue plein écran d'une card focus │ ├── ProviderLogo.tsx logo pixel art du provider actif │ ├── Welcome.tsx zone basse (header + transcript + prompt + footer) │ ├── ChatViewport.tsx transcript scrollable │ ├── ConfirmAction.tsx dialog de permission Y/N/D │ ├── Splash.tsx écran de boot -│ └── syntax.ts highlighter YAML / plain +│ └── syntax.ts highlighters YAML / Markdown / JSON / agent-run ├── hooks/ -│ ├── useChat.ts state machine (messages, actions, streaming) +│ ├── useChat.ts state machine (messages, actions, streaming, dispatch skills) +│ ├── useCardFocus.ts focus + scrollTop + auto-focus + auto-scroll │ └── useChatContext.tsx React context wrapper -├── actions/ types Action (write, run) -├── builder-actions.ts parser des blocs forge:write / forge:run +├── actions/ types Action (write, run, skill) +├── builder-actions.ts parser des blocs forge:write / forge:run / forge:skill ├── commands.ts slash commands ├── config/ .env, presets providers, langue ├── i18n/ EN/FR strings @@ -76,4 +88,4 @@ src/ ## Suite -P4 — exposer six tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) au runtime, pour que les agents puissent agir sur leur propre `/workspace`. +P5 — sandbox durci, agents persistants via `docker exec`, extraction d'artefacts du `/workspace` vers le host. 
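Reviewer sketch of the sandbox rule stated in the README changes above (path traversal, null bytes and absolute paths outside `/workspace` are refused). `resolveInWorkspace` and `PathCheck` are hypothetical names, and this is only one plausible way to implement the check; the in-container tools under `packages/tools-core/runtime/` may do it differently.

```typescript
import { posix } from 'node:path'

// Hypothetical sketch of a /workspace path guard; not the project's actual code.
const WORKSPACE = '/workspace'

type PathCheck = { ok: true; path: string } | { ok: false; error: string }

function resolveInWorkspace(requested: string, root: string = WORKSPACE): PathCheck {
  // Null bytes can truncate paths in lower layers, so refuse them outright.
  if (requested.includes('\0')) return { ok: false, error: 'null byte in path' }
  // Relative paths resolve against the root; absolute paths are kept as-is.
  // Either way, `..` segments are collapsed before the containment check.
  const resolved = posix.resolve(root, requested)
  // Compare against `${root}/` so that `/workspace-evil` is not accepted.
  if (resolved !== root && !resolved.startsWith(`${root}/`)) {
    return { ok: false, error: `path escapes ${root}` }
  }
  return { ok: true, path: resolved }
}
```

This sketch is purely lexical: it does not follow symlinks, so a hardened version would probably also compare real paths on disk before touching the file.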
diff --git a/packages/cli/src/actions/types.ts b/packages/cli/src/actions/types.ts index 237ee06..c8b2dee 100644 --- a/packages/cli/src/actions/types.ts +++ b/packages/cli/src/actions/types.ts @@ -34,7 +34,23 @@ export type RunAction = { error?: string } -export type Action = WriteAction | RunAction +// Skill actions don't go through the permission dialog : loading a +// skill is read-only and instant. We still surface them as cards so the +// user sees, in Mission Control, that the builder is operating on a +// recognised pattern instead of free-styling. +export type SkillAction = { + id: string + kind: 'skill' + status: ActionStatus + skill: string + description: string // copied from the catalog at load time + body?: string // populated when status becomes 'done' + createdAt: string + finishedAt?: string + error?: string +} + +export type Action = WriteAction | RunAction | SkillAction let counter = 0 export function nextActionId(): string { diff --git a/packages/cli/src/builder-actions.ts b/packages/cli/src/builder-actions.ts index 889ce4c..829d432 100644 --- a/packages/cli/src/builder-actions.ts +++ b/packages/cli/src/builder-actions.ts @@ -1,7 +1,7 @@ // Parser + executor for the text-structured action protocol the builder // emits (see packages/core/src/builder/system-prompt.ts). // -// Two block types are recognized : +// Three block types are recognized : // // ```forge:write // path: @@ -15,6 +15,10 @@ // // ``` // +// ```forge:skill +// name: +// ``` +// // The closing fence is optional (small models sometimes forget the trailing // ```). When present, content stops there ; otherwise it extends to the // end of the message. 
@@ -22,10 +26,10 @@ import { parseAgentMd } from '@agent-forge/core/types' import { executeFileWrite } from '@agent-forge/tools-core' -const FENCE_OPEN = /```forge:(write|run)\s*\n/g +const FENCE_OPEN = /```forge:(write|run|skill)\s*\n/g // Pattern used to strip whole forge:* blocks (open + body + optional close) // from the assistant text so the chat transcript stays prose-only. -const FENCE_BLOCK = /```forge:(?:write|run)\s*\n[\s\S]*?(?:\n```|$)/g +const FENCE_BLOCK = /```forge:(?:write|run|skill)\s*\n[\s\S]*?(?:\n```|$)/g /** Remove every forge:write / forge:run block from a builder reply. * Used to keep the chat transcript free of action code — actions live in @@ -48,7 +52,13 @@ export type ParsedRunAction = { raw: string } -export type ParsedAction = ParsedWriteAction | ParsedRunAction +export type ParsedSkillAction = { + kind: 'skill' + skill: string + raw: string +} + +export type ParsedAction = ParsedWriteAction | ParsedRunAction | ParsedSkillAction export type ActionParseResult = | { ok: true; action: ParsedAction } @@ -108,19 +118,42 @@ function parseRun(inner: string, raw: string): ActionParseResult { return { ok: true, action: { kind: 'run', agent, prompt, raw } } } +function parseSkill(inner: string, raw: string): ActionParseResult { + // forge:skill expects a single key=value pair, optionally followed by a + // closing fence. Accept both `name: scaffold-and-run` (the documented + // form) and a bare line containing the name only — small models slip. + const firstLine = (inner.split('\n')[0] ?? '').trim() + const candidate = firstLine.startsWith('name:') + ? 
firstLine.slice('name:'.length).trim() + : firstLine + if (candidate.length === 0) { + return { ok: false, error: 'forge:skill block missing skill name', raw } + } + if (!/^[a-z][a-z0-9-]*$/.test(candidate)) { + return { + ok: false, + error: `forge:skill name must be kebab-case (got "${candidate}")`, + raw, + } + } + return { ok: true, action: { kind: 'skill', skill: candidate, raw } } +} + export function findActionBlocks(text: string): ActionParseResult[] { const out: ActionParseResult[] = [] const matches = [...text.matchAll(FENCE_OPEN)] for (let i = 0; i < matches.length; i++) { const m = matches[i] if (!m) continue - const kind = m[1] as 'write' | 'run' + const kind = m[1] as 'write' | 'run' | 'skill' const start = (m.index ?? 0) + m[0].length const closingIdx = text.indexOf('\n```', start) const end = closingIdx >= 0 ? closingIdx : text.length const inner = text.slice(start, end).replace(/\s+$/, '') const raw = text.slice(m.index ?? 0, end + (closingIdx >= 0 ? 4 : 0)) - out.push(kind === 'write' ? parseWrite(inner, raw) : parseRun(inner, raw)) + if (kind === 'write') out.push(parseWrite(inner, raw)) + else if (kind === 'run') out.push(parseRun(inner, raw)) + else out.push(parseSkill(inner, raw)) } return out } @@ -142,7 +175,18 @@ export type RunActionExecution = { result: { ok: false; error: string } | { ok: true } } -export type ActionExecution = WriteActionExecution | RunActionExecution +export type SkillActionExecution = { + kind: 'skill' + skill: string + // Skills are read-only : loading one cannot fail at exec time besides + // "skill not found in the catalog". The catalog is enforced upstream. 
+ result: { ok: true; body: string } | { ok: false; error: string } +} + +export type ActionExecution = + | WriteActionExecution + | RunActionExecution + | SkillActionExecution function quoteUnsafeDescription(content: string): string { // Small models commonly write a `description` value containing a colon @@ -211,19 +255,47 @@ function looksLikeAgent(path: string): boolean { return path.startsWith('agents/') } +export type ExecuteActionOptions = { + overwrite?: boolean + // Resolver injected by useChat. Returns the skill body when the LLM + // asked to load a skill. We don't import the catalog here directly to + // keep this module testable without filesystem dependencies. + resolveSkill?: (name: string) => string | null +} + /** * Synchronously prepare and (for write) execute a parsed action. * For run actions, only validates pre-conditions ; the actual launch is * driven by useChat via launchAgent() so output can be streamed. + * For skill actions, looks up the skill body via the resolver. 
*/ export function executeAction( action: ParsedAction, - options: { overwrite?: boolean } = {}, + options: ExecuteActionOptions = {}, ): ActionExecution { if (action.kind === 'run') { return { kind: 'run', agent: action.agent, result: { ok: true } } } + if (action.kind === 'skill') { + if (!options.resolveSkill) { + return { + kind: 'skill', + skill: action.skill, + result: { ok: false, error: 'no skill resolver configured' }, + } + } + const body = options.resolveSkill(action.skill) + if (body === null) { + return { + kind: 'skill', + skill: action.skill, + result: { ok: false, error: `skill not found : ${action.skill}` }, + } + } + return { kind: 'skill', skill: action.skill, result: { ok: true, body } } + } + const path = normalizeWritePath(action.path) let content = action.content diff --git a/packages/cli/src/commands.ts b/packages/cli/src/commands.ts index fb03232..ef214ad 100644 --- a/packages/cli/src/commands.ts +++ b/packages/cli/src/commands.ts @@ -5,6 +5,7 @@ import { getCurrentBaseURL, getCurrentModelName, + loadSkillCatalog, setProviderOverride, } from '@agent-forge/core/builder' import { @@ -55,6 +56,11 @@ function helpLines(lang: Lang): string[] { ` /sessions ${ lang === 'fr' ? 'liste les sessions persistées' : 'list persisted sessions' }`, + ` /skills ${ + lang === 'fr' + ? 'liste les skills disponibles' + : 'list available skills' + }`, ] } @@ -181,6 +187,23 @@ export function runCommand( return { lines } } + case '/skills': { + const catalog = loadSkillCatalog() + if (catalog.skills.length === 0) { + return { lines: [lang === 'fr' ? '(aucune skill)' : '(no skills)'] } + } + const lines = [ + lang === 'fr' + ? `${catalog.skills.length.toString()} skill(s) :` + : `${catalog.skills.length.toString()} skill(s) :`, + ] + for (const s of catalog.skills) { + const tag = s.source === 'builtin' ? 
'·built-in·' : '·user·' + lines.push(` ${s.name} ${tag} ${s.description}`) + } + return { lines } + } + default: return { lines: [t('cmdUnknown', lang)] } } diff --git a/packages/cli/src/components/App.tsx b/packages/cli/src/components/App.tsx index c85a8c9..01c4ef8 100644 --- a/packages/cli/src/components/App.tsx +++ b/packages/cli/src/components/App.tsx @@ -8,10 +8,14 @@ // │ Welcome │ header + transcript + (confirm dialog OR prompt) + footer // └──────────────┘ ← terminal bottom (FIXED) // -// PgUp / PgDn / Ctrl+E scroll the chat transcript inside Welcome. -// Tab / Shift+Tab cycle focus through Mission Control cards (only when -// the prompt input is empty so it doesn't fight TextInput). Enter on a -// focused card opens a full-screen CardDetail view ; Esc closes it. +// Scroll responsibilities : +// - Welcome's chat transcript : PgUp/PgDn/Ctrl+E when no card is focused +// AND no Mission Control scroll is needed. +// - Mission Control panel : PgUp/PgDn when focus is inside the panel +// (or, more simply, when there are more actions than fit and the +// prompt is empty). +// - Tab/Shift+Tab cycle the focused card. Enter opens the detail +// view. Esc unfocuses. The detail view is a full-screen modal. import { Box, useInput, useStdin } from 'ink' import React from 'react' @@ -24,6 +28,12 @@ import { ProviderLogo } from './ProviderLogo.tsx' import { Splash } from './Splash.tsx' import { Welcome } from './Welcome.tsx' +// Keep Welcome's bottom block (header + transcript + prompt + footer) +// at this minimum height ; everything above goes to Mission Control. +const WELCOME_MIN_HEIGHT = 12 +// Reserve a few rows above Welcome for the spacer + provider logo. 
+const SPACER_HEIGHT = 4 + export function App(): React.JSX.Element { const { lang } = useLanguage() const { isRawModeSupported } = useStdin() @@ -36,9 +46,14 @@ export function App(): React.JSX.Element { const hasActions = state.actions.length > 0 const promptIsEmpty = promptDraft.length === 0 - // Tab/Enter is only meaningful when there are actions, the prompt is - // empty (so TextInput doesn't lose its keystrokes), and no permission - // dialog is showing. + // Mission Control gets whatever is left after Welcome and the + // spacer/logo claim their slots. Floor at 6 so the panel never + // collapses below "header + 1 card line + truncation hints". + const panelHeight = Math.max( + 6, + rows - WELCOME_MIN_HEIGHT - SPACER_HEIGHT, + ) + const cardKeysActive = isRawModeSupported && lang !== null && @@ -49,15 +64,26 @@ export function App(): React.JSX.Element { useInput( (input, key) => { - if (key.pageUp) scrollUp() - else if (key.pageDown) scrollDown() - else if (key.ctrl && input === 'e') scrollToBottom() - else if (cardKeysActive && key.tab && key.shift) focus.cycleBack() + // PgUp/PgDn : when a card is focused OR there's nothing in the + // prompt and we have actions, scroll Mission Control. Otherwise + // scroll the chat transcript (legacy behaviour). + if (key.pageUp) { + if (cardKeysActive || focus.focusedId !== null) focus.scrollUp() + else scrollUp() + return + } + if (key.pageDown) { + if (cardKeysActive || focus.focusedId !== null) focus.scrollDown() + else scrollDown() + return + } + if (key.ctrl && input === 'e') { + scrollToBottom() + return + } + if (cardKeysActive && key.tab && key.shift) focus.cycleBack() else if (cardKeysActive && key.tab) focus.cycle() else if (cardKeysActive && key.return) focus.open() - // Esc clears the card focus (only when something is focused and - // the prompt is empty, so we never swallow an Esc the user meant - // for cancelling input). 
else if ( key.escape && promptIsEmpty && @@ -82,7 +108,12 @@ export function App(): React.JSX.Element { {hasActions ? ( - + ) : ( )} diff --git a/packages/cli/src/components/CardDetail.tsx b/packages/cli/src/components/CardDetail.tsx index 986480f..c149c0d 100644 --- a/packages/cli/src/components/CardDetail.tsx +++ b/packages/cli/src/components/CardDetail.tsx @@ -8,12 +8,13 @@ import { Box, Text, useInput } from 'ink' import React, { useState } from 'react' -import type { Action, ActionStatus, RunAction, WriteAction } from '../actions/types.ts' +import type { Action, ActionStatus } from '../actions/types.ts' import { C } from '../theme/colors.ts' import { type HighlightedLine, type Segment, - highlightPlain, + highlightAgentRun, + highlightMarkdown, highlightYamlText, } from './syntax.ts' @@ -35,23 +36,64 @@ const STATUS_COLOR: Record = { declined: C.grey, } +function sectionHeader(label: string): HighlightedLine { + return [{ text: `── ${label} ──`, color: C.grey, dim: true }] +} + function buildLines(action: Action): HighlightedLine[] { if (action.kind === 'write') { + // AGENT.md = YAML frontmatter + Markdown body. Splitting them and + // highlighting each with its own grammar gives much better + // contrast than a single YAML pass over the whole file. + const frontmatterMatch = action.content.match( + /^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/, + ) + if (frontmatterMatch) { + const fmRaw = frontmatterMatch[1] ?? '' + const bodyRaw = frontmatterMatch[2] ?? 
'' + const out: HighlightedLine[] = [] + out.push([{ text: '---', color: C.grey, dim: true }]) + out.push(...highlightYamlText(fmRaw)) + out.push([{ text: '---', color: C.grey, dim: true }]) + if (bodyRaw.length > 0) { + out.push([{ text: ' ' }]) + out.push(...highlightMarkdown(bodyRaw)) + } + return out + } return highlightYamlText(action.content) } - // run : prompt then output + if (action.kind === 'skill') { + const out: HighlightedLine[] = [] + out.push(sectionHeader('description')) + out.push(...highlightMarkdown(action.description)) + out.push([{ text: ' ' }]) + out.push(sectionHeader('instructions injected into context')) + if (action.body && action.body.length > 0) { + out.push(...highlightMarkdown(action.body)) + } else { + out.push([{ text: '(skill body not loaded yet)', color: C.grey, dim: true }]) + } + if (action.status === 'failed' && action.error) { + out.push([{ text: ' ' }]) + out.push([{ text: `✗ ${action.error}`, color: C.red }]) + } + return out + } + // run : prompt (markdown-ish prose) then output (mixed forge:* + + // [forge:tool] streams). 
const out: HighlightedLine[] = [] - out.push([{ text: '── prompt ──', color: C.grey, dim: true }]) - out.push(...highlightPlain(action.prompt)) - out.push([{ text: '' }]) - out.push([{ text: '── output ──', color: C.grey, dim: true }]) + out.push(sectionHeader('prompt')) + out.push(...highlightMarkdown(action.prompt)) + out.push([{ text: ' ' }]) + out.push(sectionHeader('output')) if (action.output.length > 0) { - out.push(...highlightPlain(action.output)) + out.push(...highlightAgentRun(action.output)) } else { out.push([{ text: '(empty)', color: C.grey, dim: true }]) } if (action.status === 'failed' && action.error) { - out.push([{ text: '' }]) + out.push([{ text: ' ' }]) out.push([{ text: `✗ ${action.error}`, color: C.red }]) } return out @@ -59,6 +101,7 @@ function buildLines(action: Action): HighlightedLine[] { function headerFor(action: Action): string { if (action.kind === 'write') return `write ${action.path}` + if (action.kind === 'skill') return `skill ${action.skill}` return `run ${action.agent}` } diff --git a/packages/cli/src/components/MissionControl.tsx b/packages/cli/src/components/MissionControl.tsx index 999b350..e991dd3 100644 --- a/packages/cli/src/components/MissionControl.tsx +++ b/packages/cli/src/components/MissionControl.tsx @@ -1,20 +1,31 @@ // MissionControl — fills the top zone whenever there is at least one -// builder action (write or run). Replaces the splash screen for the rest -// of the session. +// builder action. Two display modes per card : // -// Each action gets a card with : -// - a status badge (proposed / running / done / failed) -// - the target (file path or agent name) -// - a syntax-highlighted preview of the content (YAML for AGENT.md, -// plain for prompts) or the streaming agent output +// - compact (default for unfocused cards) : 1 terminal line, badge + +// verb + target, kept together with a thin border. 
+// - expanded (focused card, or any card whose status is 'running' so +// a streaming output stays visible) : the full preview panel as +// before. +// +// The panel itself is bounded : it accepts a panelHeight prop and +// renders only the slice of cards starting at scrollTop that fits +// within that height. Truncation is signalled by "↑ N above / +// ↓ N below" hints in the header. import { Box, Text } from 'ink' import React from 'react' -import type { Action, ActionStatus, RunAction, WriteAction } from '../actions/types.ts' +import type { + Action, + ActionStatus, + RunAction, + SkillAction, + WriteAction, +} from '../actions/types.ts' import { C } from '../theme/colors.ts' import { type HighlightedLine, type Segment, + highlightAgentRun, highlightPlain, highlightYamlText, } from './syntax.ts' @@ -125,6 +136,39 @@ function FocusMarker({ focused }: { focused: boolean }): React.JSX.Element { ) } +// ── Compact row : single line for unfocused cards ───────────────── + +function verbFor(action: Action): string { + if (action.kind === 'write') return 'write' + if (action.kind === 'run') return 'run' + return 'skill' +} + +function targetFor(action: Action): string { + if (action.kind === 'write') return action.path + if (action.kind === 'run') return action.agent + return action.skill +} + +function CompactRow({ + action, + focused, +}: { + action: Action + focused: boolean +}): React.JSX.Element { + return ( + + + + {` ${verbFor(action).padEnd(5, ' ')} `} + {targetFor(action)} + + ) +} + +// ── Expanded cards ──────────────────────────────────────────────── + function WriteCard({ action, focused, @@ -166,7 +210,8 @@ function RunCard({ focused: boolean }): React.JSX.Element { const promptLines = highlightPlain(action.prompt) - const outputLines = action.output.length > 0 ? highlightPlain(action.output) : [] + const outputLines = + action.output.length > 0 ? 
highlightAgentRun(action.output) : [] return ( @@ -196,22 +241,128 @@ function RunCard({ ) } +function SkillCard({ + action, + focused, +}: { + action: SkillAction + focused: boolean +}): React.JSX.Element { + return ( + + + + + {' skill '} + {action.skill} + + + {action.description} + + {action.status === 'done' ? ( + + {' ✓ skill loaded into context'} + + ) : null} + {action.status === 'failed' && action.error ? ( + + {` ✗ ${action.error}`} + + ) : null} + + ) +} + +// ── Layout : how many lines does a card need ? ──────────────────── + +const COMPACT_HEIGHT = 1 + +function expandedHeight(action: Action): number { + // Empirical estimate ; we don't try to be exact, we want a stable + // upper bound so the panel can budget rows. + if (action.kind === 'write') { + // CardFrame border 2, header 1, marginTop 1, body up to 14, hint 1+1 = ~20 + return 20 + } + if (action.kind === 'run') { + // border 2, header 1, prompt label 1, prompt up to 6, output label 1, output up to 14, error 1 = ~26 + return 26 + } + // skill : border 2, header 1, description ~1, loaded hint 1 = ~7 + return 7 +} + +function heightOf( + action: Action, + focused: boolean, +): number { + if (focused) return expandedHeight(action) + // Running cards stay expanded so a streaming agent run stays visible. 
+ if (action.status === 'running') return expandedHeight(action) + return COMPACT_HEIGHT + 1 /* paddingY around row */ +} + +// ── Slicing : start at scrollTop, fit within panelHeight ────────── + +type Slice = { + visible: Action[] + hiddenAbove: number + hiddenBelow: number +} + +function sliceForViewport({ + actions, + focusedId, + scrollTop, + panelHeight, +}: { + actions: Action[] + focusedId: string | null + scrollTop: number + panelHeight: number +}): Slice { + const start = Math.min(Math.max(0, scrollTop), Math.max(0, actions.length - 1)) + const visible: Action[] = [] + let used = 0 + for (let i = start; i < actions.length; i += 1) { + const a = actions[i] as Action + const h = heightOf(a, a.id === focusedId) + if (used + h > panelHeight && visible.length > 0) break + visible.push(a) + used += h + if (used >= panelHeight) break + } + return { + visible, + hiddenAbove: start, + hiddenBelow: Math.max(0, actions.length - start - visible.length), + } +} + export function MissionControl({ actions, focusedId, + scrollTop, + panelHeight, }: { actions: Action[] focusedId: string | null + scrollTop: number + panelHeight: number }): React.JSX.Element { const cols = process.stdout.columns ?? 80 + // Reserve 2 rows for the header + truncation hints, the rest is body. + const bodyHeight = Math.max(3, panelHeight - 2) + const slice = sliceForViewport({ + actions, + focusedId, + scrollTop, + panelHeight: bodyHeight, + }) + return ( - - + + {' ▌▌ MISSION CONTROL ▐▐ '} @@ -228,14 +379,27 @@ export function MissionControl({ )} - {actions.map((a) => { + + {slice.hiddenAbove > 0 ? ( + + {` ↑ ${slice.hiddenAbove.toString()} action${slice.hiddenAbove === 1 ? '' : 's'} above`} + + ) : null} + + {slice.visible.map((a) => { const focused = a.id === focusedId - return a.kind === 'write' ? ( - - ) : ( - - ) + const expand = focused || a.status === 'running' + if (!expand) return + if (a.kind === 'write') return + if (a.kind === 'run') return + return })} + + {slice.hiddenBelow > 0 ? 
( + + {` ↓ ${slice.hiddenBelow.toString()} action${slice.hiddenBelow === 1 ? '' : 's'} below`} + + ) : null} ) } diff --git a/packages/cli/src/components/syntax.ts b/packages/cli/src/components/syntax.ts index d846f3d..bd3a911 100644 --- a/packages/cli/src/components/syntax.ts +++ b/packages/cli/src/components/syntax.ts @@ -1,33 +1,38 @@ -// Tiny, line-oriented syntax helpers for the MissionControl preview. -// Returns segments {text, color, dim?} that components can render with Ink. -// We deliberately avoid a real parser : agents emit small YAML / plain text -// blocks, a handful of regexes is enough. +// Tiny, line-oriented syntax helpers for Mission Control and the +// CardDetail view. Goals : +// - keep it dependency-free (regex only) ; +// - cover the four shapes Agent Forge actually shows : YAML, plain +// text, Markdown, and JSON-ish ; +// - recognise fenced blocks inside Markdown so a forge:bash inside +// an agent run reads as bash, not as prose. +// +// Each highlighter returns a list of HighlightedLine ; a +// HighlightedLine is a list of Segment ({text, color, dim?, bold?}) +// that components render with Ink. 
import { C } from '../theme/colors.ts' -export type Segment = { text: string; color?: string; dim?: boolean; bold?: boolean } +export type Segment = { + text: string + color?: string + dim?: boolean + bold?: boolean +} export type HighlightedLine = Segment[] +// ── YAML ───────────────────────────────────────────────────────── + const YAML_KEY_RE = /^(\s*)([A-Za-z_][\w-]*)(\s*:)(\s*)(.*)$/ const YAML_LIST_RE = /^(\s*)(-)(\s+)(.*)$/ const YAML_SEPARATOR_RE = /^---\s*$/ const YAML_COMMENT_RE = /^(\s*)(#.*)$/ function valueSegment(value: string): Segment { - // Numbers - if (/^-?\d+(\.\d+)?$/.test(value)) { - return { text: value, color: C.greyLight } - } - // Quoted string - if (/^["'].*["']$/.test(value)) { - return { text: value, color: C.greyLight } - } - // Booleans / null - if (/^(true|false|null|yes|no)$/i.test(value)) { + if (/^-?\d+(\.\d+)?$/.test(value)) return { text: value, color: C.greyLight } + if (/^["'].*["']$/.test(value)) return { text: value, color: C.greyLight } + if (/^(true|false|null|yes|no)$/i.test(value)) return { text: value, color: C.orangeBright } - } - // Bare value return { text: value, color: C.white } } @@ -61,12 +66,9 @@ export function highlightYamlLine(line: string): HighlightedLine { { text: colon ?? '', color: C.grey }, { text: space ?? '' }, ] - if (value && value.length > 0) { - segs.push(valueSegment(value)) - } + if (value && value.length > 0) segs.push(valueSegment(value)) return segs } - // Markdown header inside body if (/^#\s/.test(line)) { return [{ text: line, color: C.orangeBright, bold: true }] } @@ -77,8 +79,253 @@ export function highlightYamlText(text: string): HighlightedLine[] { return text.split('\n').map(highlightYamlLine) } +// ── Plain ──────────────────────────────────────────────────────── + export function highlightPlain(text: string): HighlightedLine[] { return text .split('\n') .map((l) => [{ text: l.length > 0 ? 
l : ' ', color: C.greyLight }]) } + +// ── JSON ───────────────────────────────────────────────────────── +// +// Tokeniser-light : single line at a time. We don't try to follow +// multi-line strings ; agents rarely emit them. The aim is colour, +// not validation. + +const JSON_TOKEN_RE = /"(?:[^"\\]|\\.)*"|true|false|null|-?\d+(?:\.\d+)?/g + +function highlightJsonLine(line: string): HighlightedLine { + if (line.length === 0) return [{ text: ' ' }] + const segs: HighlightedLine = [] + let last = 0 + for (const m of line.matchAll(JSON_TOKEN_RE)) { + const idx = m.index ?? 0 + if (idx > last) segs.push({ text: line.slice(last, idx), color: C.grey }) + const tok = m[0] + if (tok.startsWith('"')) { + // Heuristic : a quoted string immediately followed by ':' is a key, + // colour as orange ; otherwise a value (greyLight). + const after = line.slice(idx + tok.length).trimStart() + if (after.startsWith(':')) { + segs.push({ text: tok, color: C.orange, bold: true }) + } else { + segs.push({ text: tok, color: C.greyLight }) + } + } else if (tok === 'true' || tok === 'false' || tok === 'null') { + segs.push({ text: tok, color: C.orangeBright }) + } else { + segs.push({ text: tok, color: C.white }) + } + last = idx + tok.length + } + if (last < line.length) segs.push({ text: line.slice(last), color: C.grey }) + return segs +} + +// ── Markdown (with fenced blocks) ──────────────────────────────── +// +// Recognises : +// - ATX headings (#, ##, ...) +// - Unordered list bullets (-, *, +) +// - Ordered list bullets (1. 2. ...) +// - Inline code spans (`...`) +// - Bold (**...**) ; emphasis (*...*) is not styled and renders +// as plain prose +// - Fenced code blocks ```lang ... ``` : the content is forwarded +// to the matching highlighter (yaml/json/plain), and the fences +// themselves render dim grey +// +// Special-case our own fence prefix `forge:*` : the body is JSON-ish, +// route it to the JSON highlighter. 
+ +const HEADING_RE = /^(#{1,6})\s+(.*)$/ +const ULIST_RE = /^(\s*)([-*+])(\s+)(.*)$/ +const OLIST_RE = /^(\s*)(\d+\.)(\s+)(.*)$/ +const FENCE_OPEN_RE = /^```(\S*)\s*$/ +const FENCE_CLOSE_RE = /^```\s*$/ +const INLINE_CODE_RE = /`[^`]+`/g +const BOLD_RE = /\*\*[^*]+\*\*/g + +function languageHighlighter(lang: string): (line: string) => HighlightedLine { + const l = lang.toLowerCase() + if (l === 'yaml' || l === 'yml') return highlightYamlLine + if (l === 'json' || l.startsWith('forge:')) return highlightJsonLine + if (l === 'bash' || l === 'sh' || l === 'shell') { + return (line) => [{ text: line.length > 0 ? line : ' ', color: C.greyLight }] + } + // Default for unknown / TypeScript / etc. : neutral grey-light. + return (line) => [{ text: line.length > 0 ? line : ' ', color: C.greyLight }] +} + +// Apply inline code spans and bold to a Markdown prose line. Returns +// a list of segments. Order doesn't matter because the matched +// regions don't overlap in practice (we don't try to nest them). +function highlightInlineMarkdown(line: string): HighlightedLine { + type Mark = { start: number; end: number; seg: Segment } + const marks: Mark[] = [] + for (const m of line.matchAll(INLINE_CODE_RE)) { + if (m.index === undefined) continue + marks.push({ + start: m.index, + end: m.index + m[0].length, + seg: { text: m[0], color: C.orangeBright }, + }) + } + for (const m of line.matchAll(BOLD_RE)) { + if (m.index === undefined) continue + // Skip if overlaps an existing inline-code mark. + const overlap = marks.some( + (e) => + !(e.end <= (m.index ?? 0) || e.start >= (m.index ?? 
0) + m[0].length), + ) + if (overlap) continue + marks.push({ + start: m.index, + end: m.index + m[0].length, + seg: { text: m[0], color: C.white, bold: true }, + }) + } + if (marks.length === 0) return [{ text: line, color: C.greyLight }] + marks.sort((a, b) => a.start - b.start) + const segs: HighlightedLine = [] + let cur = 0 + for (const mark of marks) { + if (mark.start > cur) { + segs.push({ text: line.slice(cur, mark.start), color: C.greyLight }) + } + segs.push(mark.seg) + cur = mark.end + } + if (cur < line.length) segs.push({ text: line.slice(cur), color: C.greyLight }) + return segs +} + +export function highlightMarkdown(text: string): HighlightedLine[] { + const out: HighlightedLine[] = [] + const lines = text.split('\n') + let inFence = false + let fenceLang = '' + let fenceLine: ((line: string) => HighlightedLine) | null = null + for (const raw of lines) { + if (inFence) { + if (FENCE_CLOSE_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inFence = false + fenceLang = '' + fenceLine = null + continue + } + out.push((fenceLine ?? highlightYamlLine)(raw)) + continue + } + const fenceOpen = raw.match(FENCE_OPEN_RE) + if (fenceOpen) { + inFence = true + fenceLang = fenceOpen[1] ?? '' + fenceLine = languageHighlighter(fenceLang) + out.push([{ text: raw, color: C.grey, dim: true }]) + continue + } + if (raw.length === 0) { + out.push([{ text: ' ' }]) + continue + } + const heading = raw.match(HEADING_RE) + if (heading) { + out.push([ + { text: heading[1] ?? '', color: C.orange, bold: true }, + { text: ' ' }, + { text: heading[2] ?? '', color: C.orangeBright, bold: true }, + ]) + continue + } + const ulist = raw.match(ULIST_RE) + if (ulist) { + out.push([ + { text: ulist[1] ?? '' }, + { text: ulist[2] ?? '', color: C.orange, bold: true }, + { text: ulist[3] ?? '' }, + ...highlightInlineMarkdown(ulist[4] ?? ''), + ]) + continue + } + const olist = raw.match(OLIST_RE) + if (olist) { + out.push([ + { text: olist[1] ?? 
'' }, + { text: olist[2] ?? '', color: C.orange, bold: true }, + { text: olist[3] ?? '' }, + ...highlightInlineMarkdown(olist[4] ?? ''), + ]) + continue + } + out.push(highlightInlineMarkdown(raw)) + } + return out +} + +// ── Mixed run output ───────────────────────────────────────────── +// +// What an agent produces during a multi-turn run is a mix of : +// - prose +// - fenced ```forge:bash / forge:write / forge:read / ... blocks +// - injected [forge:tool] / [/forge:tool] markers framing the +// result of the previous tool call (raw stdout, often shell-y) +// +// We treat the markers like another fence type : everything between +// [forge:tool] and [/forge:tool] is rendered with a dim, distinct +// colour so the user can tell tool output from the agent's narration. + +const TOOL_OPEN_RE = /^\[forge:tool\]\s*$/ +const TOOL_CLOSE_RE = /^\[\/forge:tool\]\s*$/ + +export function highlightAgentRun(text: string): HighlightedLine[] { + const out: HighlightedLine[] = [] + const lines = text.split('\n') + let inFence = false + let fenceLine: ((line: string) => HighlightedLine) | null = null + let inTool = false + + for (const raw of lines) { + if (inFence) { + if (FENCE_CLOSE_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inFence = false + fenceLine = null + continue + } + out.push((fenceLine ?? highlightYamlLine)(raw)) + continue + } + if (inTool) { + if (TOOL_CLOSE_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inTool = false + continue + } + // Tool output is opaque shell-ish content. Render as plain + // grey (not greyLight, which prose uses) so it stays readable + // but visually quieter than the agent's prose / blocks. + out.push([{ text: raw.length > 0 ? raw : ' ', color: C.grey }]) + continue + } + if (TOOL_OPEN_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inTool = true + continue + } + const fenceOpen = raw.match(FENCE_OPEN_RE) + if (fenceOpen) { + inFence = true + fenceLine = languageHighlighter(fenceOpen[1] ?? 
'') + out.push([{ text: raw, color: C.orange, bold: true }]) + continue + } + if (raw.length === 0) { + out.push([{ text: ' ' }]) + continue + } + out.push(highlightInlineMarkdown(raw)) + } + return out +} diff --git a/packages/cli/src/hooks/useCardFocus.ts b/packages/cli/src/hooks/useCardFocus.ts index a40bd4a..152c6e1 100644 --- a/packages/cli/src/hooks/useCardFocus.ts +++ b/packages/cli/src/hooks/useCardFocus.ts @@ -1,47 +1,93 @@ -// Mission Control card focus + detail view state. +// Mission Control card focus + scroll + detail view state. // -// Kept separate from useChat so the chat hook stays focused on -// conversation/action state. Exposes : -// - focusedId : id of the action currently highlighted (or null) -// - detailOpen : whether the full-screen detail panel is mounted -// - cycle / cycleBack / open / close : the actions wired to Tab keys +// Focus : +// - Tab from "no focus" → focus the LAST action (most recent). +// - Tab again → walk forward (wraps). +// - Shift+Tab → walk backward (wraps). +// - Esc clears focus (keep card content visible, just unhighlight). +// - When the focused action disappears, drop focus. // -// Behaviour : -// - Tab from "no focus" → focus the LAST action (most recent on top -// of Mission Control reads as bottom of the list, so we land on -// what the user just saw). -// - Tab again → walk forward; wraps around. -// - Shift+Tab → walk backward; wraps around. -// - When the focused action disappears (cleared, etc.), focus resets. - -import { useCallback, useEffect, useState } from 'react' +// Auto-focus : +// - When a new action arrives and nothing is focused, auto-focus +// the new one so the user immediately sees what the builder did. +// - We track the last seen action ids in a ref to detect "new". +// +// Scroll : +// - scrollTop is an action-INDEX offset. The Mission Control panel +// slices `actions.slice(scrollTop, …)` to fit panelHeight. 
+// - cycle / cycleBack adjust scrollTop when the focused index moves +// out of the visible window. The visible window size depends on +// the panel layout, which we don't know here ; we use a +// conservative heuristic : keep the focused index >= scrollTop. +// - scrollUp / scrollDown / scrollHome / scrollEnd let App expose +// PgUp / PgDn / Home / End to the user when no card is focused. + +import { useCallback, useEffect, useRef, useState } from 'react' import type { Action } from '../actions/types.ts' export type CardFocusApi = { focusedId: string | null detailOpen: boolean + scrollTop: number cycle: () => void cycleBack: () => void open: () => void close: () => void clearFocus: () => void + scrollUp: () => void + scrollDown: () => void + scrollHome: () => void + scrollEnd: () => void } export function useCardFocus(actions: Action[]): CardFocusApi { const [focusedId, setFocusedId] = useState<string | null>(null) const [detailOpen, setDetailOpen] = useState(false) + const [scrollTop, setScrollTop] = useState(0) + + // Remember the previous action ids so we can detect new arrivals + // without firing on every render (initial mount included). + const prevIdsRef = useRef<Set<string>>(new Set()) - // If the focused action disappears (e.g. /clear), drop focus and the - // detail panel together so we never display a stale card. + // Auto-focus the most recent action when one shows up and nothing + // is focused yet. Also trims focus / scroll when actions vanish. useEffect(() => { - if (focusedId === null) return - const stillThere = actions.some((a) => a.id === focusedId) - if (!stillThere) { + const currentIds = new Set(actions.map((a) => a.id)) + // Find ids that weren't there last render (new arrivals). 
+ const newIds: string[] = [] + for (const a of actions) { + if (!prevIdsRef.current.has(a.id)) newIds.push(a.id) + } + prevIdsRef.current = currentIds + + if (focusedId !== null && !currentIds.has(focusedId)) { setFocusedId(null) setDetailOpen(false) } + + // Auto-focus the latest new arrival, but only if nothing is + // currently focused (don't steal focus mid-cycle). + if (newIds.length > 0 && focusedId === null) { + const last = newIds[newIds.length - 1] + if (last !== undefined) setFocusedId(last) + } + + // Keep scrollTop within bounds. + setScrollTop((st) => Math.max(0, Math.min(st, Math.max(0, actions.length - 1)))) }, [actions, focusedId]) + // Scroll-to-focus : whenever focusedId changes, make sure scrollTop + // is at most the focused index (so the focused card is at or below + // the panel's first visible slot). The panel itself caps scrollTop + // upward when the focused card would fall below the bottom edge — + // we don't know panelHeight here, so we keep a lower bound only. + useEffect(() => { + if (focusedId === null) return + const idx = actions.findIndex((a) => a.id === focusedId) + if (idx === -1) return + setScrollTop((st) => (idx < st ? idx : st)) + }, [focusedId, actions]) + const cycle = useCallback(() => { if (actions.length === 0) return setFocusedId((current) => { @@ -58,9 +104,7 @@ export function useCardFocus(actions: Action[]): CardFocusApi { const cycleBack = useCallback(() => { if (actions.length === 0) return setFocusedId((current) => { - if (current === null) { - return actions[0]?.id ?? null - } + if (current === null) return actions[0]?.id ?? null const idx = actions.findIndex((a) => a.id === current) if (idx === -1) return actions[0]?.id ?? 
null const prev = (idx - 1 + actions.length) % actions.length @@ -81,5 +125,36 @@ export function useCardFocus(actions: Action[]): CardFocusApi { setDetailOpen(false) }, []) - return { focusedId, detailOpen, cycle, cycleBack, open, close, clearFocus } + const scrollUp = useCallback(() => { + setScrollTop((st) => Math.max(0, st - 1)) + }, []) + + const scrollDown = useCallback(() => { + setScrollTop((st) => + Math.min(Math.max(0, actions.length - 1), st + 1), + ) + }, [actions.length]) + + const scrollHome = useCallback(() => { + setScrollTop(0) + }, []) + + const scrollEnd = useCallback(() => { + setScrollTop(Math.max(0, actions.length - 1)) + }, [actions.length]) + + return { + focusedId, + detailOpen, + scrollTop, + cycle, + cycleBack, + open, + close, + clearFocus, + scrollUp, + scrollDown, + scrollHome, + scrollEnd, + } } diff --git a/packages/cli/src/hooks/useChat.ts b/packages/cli/src/hooks/useChat.ts index 13dce72..83b42c9 100644 --- a/packages/cli/src/hooks/useChat.ts +++ b/packages/cli/src/hooks/useChat.ts @@ -10,12 +10,19 @@ // Builder code blocks (```forge:*) are extracted into actions and STRIPPED // from the assistant's textual reply before that reply lands in `messages`. 
-import { type ChatMessage, streamBuilder } from '@agent-forge/core/builder' +import { + type ChatMessage, + loadSkillCatalog, + matchSkillForMessage, + runScaffoldAndRun, + streamBuilder, +} from '@agent-forge/core/builder' import { launchAgent } from '@agent-forge/tools-core' -import { useCallback, useRef, useState } from 'react' +import { useCallback, useMemo, useRef, useState } from 'react' import { type Action, type RunAction, + type SkillAction, type WriteAction, nextActionId, } from '../actions/types.ts' @@ -63,7 +70,10 @@ function nowIso(): string { return new Date().toISOString() } -function actionFromParsed(parsed: ParsedAction): Action { +function actionFromParsed( + parsed: ParsedAction, + skillDescriptionFor: (name: string) => string, +): Action { if (parsed.kind === 'write') { return { id: nextActionId(), @@ -74,14 +84,25 @@ function actionFromParsed(parsed: ParsedAction): Action { createdAt: nowIso(), } } + if (parsed.kind === 'run') { + return { + id: nextActionId(), + kind: 'run', + status: 'proposed', + agent: parsed.agent, + prompt: parsed.prompt, + createdAt: nowIso(), + output: '', + } + } + // skill : auto-running, the executor resolves the body synchronously. 
return { id: nextActionId(), - kind: 'run', - status: 'proposed', - agent: parsed.agent, - prompt: parsed.prompt, + kind: 'skill', + status: 'running', + skill: parsed.skill, + description: skillDescriptionFor(parsed.skill), createdAt: nowIso(), - output: '', } } @@ -94,10 +115,17 @@ function parsedFromAction(action: Action): ParsedAction { raw: '', } } + if (action.kind === 'run') { + return { + kind: 'run', + agent: action.agent, + prompt: action.prompt, + raw: '', + } + } return { - kind: 'run', - agent: action.agent, - prompt: action.prompt, + kind: 'skill', + skill: action.skill, raw: '', } } @@ -127,6 +155,28 @@ export function useChat(lang: Lang): { }) const [busy, setBusy] = useState(false) const [scrollOffset, setScrollOffset] = useState(0) + // Skill catalog : loaded once at hook init, kept in a memo so callbacks + // get a stable reference. Built-ins ship with the package ; users can + // drop SKILL.md into ~/.agent-forge/skills/ to extend. + const skillCatalog = useMemo(() => loadSkillCatalog(), []) + const skillEntries = useMemo( + () => + skillCatalog.skills.map((s) => ({ + name: s.name, + description: s.description, + triggers: s.triggers, + })), + [skillCatalog], + ) + const resolveSkillBody = useCallback( + (name: string): string | null => skillCatalog.byName.get(name)?.body ?? null, + [skillCatalog], + ) + const skillDescriptionFor = useCallback( + (name: string): string => + skillCatalog.byName.get(name)?.description ?? '(unknown skill)', + [skillCatalog], + ) // Lifted out of Welcome so App can know when the input is empty (and // thus capture Tab for Mission Control focus without stealing keys // from the prompt). @@ -287,6 +337,95 @@ export function useChat(lang: Lang): { })) setBusy(true) + // Server-side skill matching : if a trigger phrase appears in the + // user message, dispatch to the dedicated runner instead of the + // generic streaming flow. 
The runner makes two narrow LLM calls + // (one per artefact) so small models keep the AGENT.md and the + // run prompt cleanly separated. + const matched = matchSkillForMessage(prompt, skillCatalog.skills) + if (matched && matched.name === 'scaffold-and-run') { + const skillCard: SkillAction = { + id: nextActionId(), + kind: 'skill', + status: 'running', + skill: matched.name, + description: matched.description, + createdAt: nowIso(), + } + setState((prev) => ({ + ...prev, + streaming: null, + actions: [...prev.actions, skillCard], + })) + try { + const result = await runScaffoldAndRun({ + userMessage: prompt, + lang, + }) + if (!result) { + updateAction(skillCard.id, { + status: 'failed', + error: 'skill runner produced no usable output', + finishedAt: nowIso(), + }) + setBusy(false) + return + } + // Mark the skill as done and surface a write + run pair as + // proposed cards. The user approves them in order via the + // permission dialog. + updateAction(skillCard.id, { + status: 'done', + body: matched.body, + finishedAt: nowIso(), + }) + const writeCard: WriteAction = { + id: nextActionId(), + kind: 'write', + status: 'proposed', + path: `agents/${result.agentName}/AGENT.md`, + content: result.agentMdContent, + createdAt: nowIso(), + } + const runCard: RunAction = { + id: nextActionId(), + kind: 'run', + status: 'proposed', + agent: result.agentName, + prompt: result.runPrompt, + createdAt: nowIso(), + output: '', + } + // Final assistant turn : one short prose sentence so the user + // sees in the conversation that the skill fired. + const proseTurn: ChatTurn = { + id: nextId(), + role: 'assistant', + content: + lang === 'fr' + ? 
`Je charge la skill ${matched.name} : un AGENT.md à approuver, puis l'exécution.`
+                : `Loading skill ${matched.name} : one AGENT.md to approve, then the run.`,
+          }
+          persist(proseTurn)
+          setState((prev) => ({
+            ...prev,
+            messages: [...prev.messages, proseTurn],
+            actions: [...prev.actions, writeCard, runCard],
+          }))
+        } catch (err) {
+          const msg = err instanceof Error ? err.message : String(err)
+          updateAction(skillCard.id, {
+            status: 'failed',
+            error: msg,
+            finishedAt: nowIso(),
+          })
+          setState((prev) => ({ ...prev, error: msg }))
+        } finally {
+          setBusy(false)
+        }
+        return
+      }
+
       try {
         const history: ChatMessage[] = [
           ...hiddenHistoryRef.current
@@ -305,7 +444,11 @@ export function useChat(lang: Lang): {
         ]
 
         let acc = ''
-        for await (const chunk of streamBuilder({ messages: history, lang })) {
+        for await (const chunk of streamBuilder({
+          messages: history,
+          lang,
+          skills: skillEntries,
+        })) {
           acc += chunk
           setState((prev) =>
             prev.streaming
@@ -318,6 +461,10 @@ export function useChat(lang: Lang): {
         const blocks = findActionBlocks(acc)
         const parseErrors: ChatTurn[] = []
         const newActions: Action[] = []
+        // Skill bodies executed inline get appended to the assistant turn
+        // as a system message so the next builder turn sees the full
+        // instructions.
+        const skillSystemTurns: ChatTurn[] = []
         for (const block of blocks) {
           if (!block.ok) {
             parseErrors.push({
@@ -325,8 +472,42 @@ export function useChat(lang: Lang): {
               role: 'system',
               content: `✗ action skipped : ${block.error}`,
             })
+            continue
+          }
+          const action = actionFromParsed(block.action, skillDescriptionFor)
+          if (action.kind === 'skill') {
+            // Resolve synchronously and finalise the card state in the
+            // same render — skills are local, free, never partial.
+            const exec = executeAction(block.action, {
+              resolveSkill: resolveSkillBody,
+            })
+            if (exec.kind === 'skill' && exec.result.ok) {
+              const finalised: SkillAction = {
+                ...action,
+                status: 'done',
+                body: exec.result.body,
+                finishedAt: nowIso(),
+              }
+              newActions.push(finalised)
+              skillSystemTurns.push({
+                id: nextId(),
+                role: 'system',
+                content: `[skill:${action.skill}] ${exec.result.body}`,
+              })
+            } else {
+              const err =
+                exec.kind === 'skill' && !exec.result.ok
+                  ? exec.result.error
+                  : 'unknown error'
+              newActions.push({
+                ...action,
+                status: 'failed',
+                error: err,
+                finishedAt: nowIso(),
+              })
+            }
           } else {
-            newActions.push(actionFromParsed(block.action))
+            newActions.push(action)
           }
         }
         const proseOnly = stripActionBlocks(acc)
@@ -336,12 +517,14 @@ export function useChat(lang: Lang): {
         }
         persist(finalAssistant)
         for (const e of parseErrors) persist(e)
+        for (const s of skillSystemTurns) persist(s)
         setState((prev) => ({
           ...prev,
           messages: [
             ...prev.messages,
             ...(proseOnly.length > 0 ?
[finalAssistant] : []),
             ...parseErrors,
+            ...skillSystemTurns,
           ],
           streaming: null,
           error: null,
@@ -358,7 +541,7 @@ export function useChat(lang: Lang): {
         setBusy(false)
       }
     },
-    [state.messages, lang],
+    [state.messages, lang, skillCatalog, skillEntries, updateAction],
   )
 
   return {
diff --git a/packages/cli/tests/builder-actions.test.ts b/packages/cli/tests/builder-actions.test.ts
index 18259cf..384ab59 100644
--- a/packages/cli/tests/builder-actions.test.ts
+++ b/packages/cli/tests/builder-actions.test.ts
@@ -136,6 +136,64 @@ prompt
   })
 })
 
+describe('findActionBlocks (skill)', () => {
+  test('parses a forge:skill block with name: prefix', () => {
+    const md = `OK je charge une skill :
+
+\`\`\`forge:skill
+name: scaffold-and-run
+\`\`\``
+    const blocks = findActionBlocks(md)
+    expect(blocks.length).toBe(1)
+    expect(blocks[0]?.ok).toBe(true)
+    if (blocks[0]?.ok && blocks[0].action.kind === 'skill') {
+      expect(blocks[0].action.skill).toBe('scaffold-and-run')
+    }
+  })
+
+  test('parses a forge:skill block with bare name', () => {
+    const md = `\`\`\`forge:skill
+scaffold-and-run
+\`\`\``
+    const blocks = findActionBlocks(md)
+    expect(blocks[0]?.ok).toBe(true)
+    if (blocks[0]?.ok && blocks[0].action.kind === 'skill') {
+      expect(blocks[0].action.skill).toBe('scaffold-and-run')
+    }
+  })
+
+  test('rejects skill with non-kebab-case name', () => {
+    const md = `\`\`\`forge:skill
+name: ScaffoldAndRun
+\`\`\``
+    const blocks = findActionBlocks(md)
+    expect(blocks[0]?.ok).toBe(false)
+  })
+
+  test('executeAction(skill) resolves the body via the resolver', () => {
+    const exec = executeAction(
+      { kind: 'skill', skill: 'scaffold-and-run', raw: '' },
+      { resolveSkill: (name) => (name === 'scaffold-and-run' ?
'BODY' : null) },
+    )
+    expect(exec.kind).toBe('skill')
+    if (exec.kind === 'skill') {
+      expect(exec.result.ok).toBe(true)
+      if (exec.result.ok) expect(exec.result.body).toBe('BODY')
+    }
+  })
+
+  test('executeAction(skill) errors when resolver returns null', () => {
+    const exec = executeAction(
+      { kind: 'skill', skill: 'unknown', raw: '' },
+      { resolveSkill: () => null },
+    )
+    expect(exec.kind).toBe('skill')
+    if (exec.kind === 'skill') {
+      expect(exec.result.ok).toBe(false)
+    }
+  })
+})
+
 describe('executeAction (path coercion + agent validation)', () => {
   const validFrontmatter = `---
 name: ${TEST_AGENT}
diff --git a/packages/core/README.md b/packages/core/README.md
index 3668fc8..4562ada 100644
--- a/packages/core/README.md
+++ b/packages/core/README.md
@@ -2,22 +2,27 @@
 
 Primitives de base d'Agent Forge.
 
-## Contenu (état P3)
+## Contenu (état P6)
 
 - **`builder/`** — l'agent LLM conversationnel qui conçoit les autres agents
   - `provider.ts` — résout `FORGE_BASE_URL` / `FORGE_API_KEY` / `FORGE_MODEL`, supporte les overrides à chaud (`/provider`, `/model`)
-  - `system-prompt.ts` — prompt système bilingue EN/FR avec ACTION PROTOCOL et RUN PROTOCOL (fenced blocks `forge:write` et `forge:run`)
-  - `stream.ts` — `streamBuilder({ messages, lang })` via Vercel AI SDK
+  - `system-prompt.ts` — prompt système bilingue EN/FR avec ACTION PROTOCOL et RUN PROTOCOL (fenced blocks `forge:write` et `forge:run`), plus la liste informationnelle des skills disponibles
+  - `stream.ts` — `streamBuilder({ messages, lang, skills })` via Vercel AI SDK
+  - **`skill-catalog.ts`** — discovery des `SKILL.md` (built-in dans `skills/`, utilisateur dans `~/.agent-forge/skills/`)
+  - **`skill-matcher.ts`** — match côté serveur des triggers (sous-chaîne insensible à la casse)
+  - **`skill-runner.ts`** — orchestration de `scaffold-and-run` (deux appels `generateText` ciblés, un pour AGENT.md, un pour le run prompt)
+  - **`skills/scaffold-and-run.md`** — première skill built-in
 - 
**`types/agent-md.ts`** — `parseAgentMd(text)` : sépare frontmatter / body, valide via Zod (name kebab-case, description non vide, sandbox.image, sandbox.timeout, maxTurns)
+- **`types/skill-md.ts`** — `parseSkillMd(text)` : même pattern pour les skills (name, description, triggers, actions)
 
 ## À venir
 
 - **`docker/`** — abstraction sandbox (P5 : agents persistants via `docker exec`, pas seulement `run --rm`)
-- **`tools/`** — interface `Tool` partagée (P4)
+- **`tools/`** — interface `Tool` partagée
 
 ## Dependencies
 
 - `ai`, `@ai-sdk/openai` — Vercel AI SDK pour les appels LLM provider-agnostic
-- `zod` — validation du frontmatter `AGENT.md`
-- `@modelcontextprotocol/sdk` — intégration MCP (P6+)
+- `zod` — validation du frontmatter `AGENT.md` et `SKILL.md`
+- `@modelcontextprotocol/sdk` — intégration MCP (P7+)
 - `yaml` — parsing du frontmatter
diff --git a/packages/core/src/builder/index.ts b/packages/core/src/builder/index.ts
index 696cea8..915d875 100644
--- a/packages/core/src/builder/index.ts
+++ b/packages/core/src/builder/index.ts
@@ -8,4 +8,18 @@ export {
   type ProviderConfig,
 } from './provider.ts'
 export { type ChatMessage, type ChatRole, streamBuilder } from './stream.ts'
-export { type BuilderLang, getBuilderSystemPrompt } from './system-prompt.ts'
+export {
+  type BuilderLang,
+  type SkillCatalogEntry,
+  getBuilderSystemPrompt,
+} from './system-prompt.ts'
+export {
+  loadSkillCatalog,
+  type SkillCatalog,
+  type SkillEntry,
+} from './skill-catalog.ts'
+export { matchSkillForMessage } from './skill-matcher.ts'
+export {
+  runScaffoldAndRun,
+  type ScaffoldAndRunResult,
+} from './skill-runner.ts'
diff --git a/packages/core/src/builder/skill-catalog.ts b/packages/core/src/builder/skill-catalog.ts
new file mode 100644
index 0000000..db226d9
--- /dev/null
+++ b/packages/core/src/builder/skill-catalog.ts
@@ -0,0 +1,114 @@
+// Skill catalog — discovers SKILL.md files from two sources :
+//
+// 1.
Built-in : packages/core/src/builder/skills/*.md, shipped with
+//    the package. Resolved relative to import.meta.url so it works
+//    both in dev (TS through Bun) and in a built bundle (the .md
+//    files are copied next to the runtime source).
+//
+// 2. User : ~/.agent-forge/skills/.md or /SKILL.md.
+//    Read at startup ; future revisions can add a /skills reload
+//    slash command.
+//
+// Loading is lazy in the body sense : the catalog only carries the
+// metadata (name + description + triggers). The body is kept on the
+// SkillEntry too, but the LLM does NOT see it until it explicitly
+// emits a forge:skill block — the CLI then injects the body into
+// the conversation. This avoids paying tokens for skills the user
+// never triggers.
+
+import { existsSync, readFileSync, readdirSync, statSync } from 'node:fs'
+import { homedir } from 'node:os'
+import { dirname, join, resolve } from 'node:path'
+import { fileURLToPath } from 'node:url'
+import {
+  type ParsedSkillMd,
+  SkillMdError,
+  parseSkillMd,
+} from '../types/skill-md.ts'
+
+export type SkillEntry = {
+  name: string
+  description: string
+  triggers: string[]
+  actions: ParsedSkillMd['meta']['actions']
+  body: string
+  source: 'builtin' | 'user'
+  filePath: string
+}
+
+const BUILTIN_DIR = resolve(dirname(fileURLToPath(import.meta.url)), 'skills')
+const USER_DIR = join(homedir(), '.agent-forge', 'skills')
+
+function readSkillFile(filePath: string, source: SkillEntry['source']): SkillEntry | null {
+  let raw: string
+  try {
+    raw = readFileSync(filePath, 'utf8')
+  } catch {
+    return null
+  }
+  let parsed: ParsedSkillMd
+  try {
+    parsed = parseSkillMd(raw)
+  } catch (err) {
+    if (err instanceof SkillMdError) {
+      console.error(`✗ skill ${filePath} : ${err.message}`)
+      return null
+    }
+    throw err
+  }
+  return {
+    name: parsed.meta.name,
+    description: parsed.meta.description,
+    triggers: parsed.meta.triggers,
+    actions: parsed.meta.actions,
+    body: parsed.body,
+    source,
+    filePath,
+  }
+}
+
+function
collectFromDir(dir: string, source: SkillEntry['source']): SkillEntry[] {
+  if (!existsSync(dir)) return []
+  const out: SkillEntry[] = []
+  for (const entry of readdirSync(dir)) {
+    const full = join(dir, entry)
+    let st: ReturnType<typeof statSync>
+    try {
+      st = statSync(full)
+    } catch {
+      continue
+    }
+    if (st.isFile() && entry.endsWith('.md')) {
+      const skill = readSkillFile(full, source)
+      if (skill) out.push(skill)
+    } else if (st.isDirectory()) {
+      // Convention : /SKILL.md so users can group assets next to
+      // their skill (templates, examples, etc.).
+      const inner = join(full, 'SKILL.md')
+      if (existsSync(inner)) {
+        const skill = readSkillFile(inner, source)
+        if (skill) out.push(skill)
+      }
+    }
+  }
+  return out
+}
+
+export type SkillCatalog = {
+  skills: SkillEntry[]
+  byName: Map<string, SkillEntry>
+}
+
+export function loadSkillCatalog(): SkillCatalog {
+  const builtins = collectFromDir(BUILTIN_DIR, 'builtin')
+  const users = collectFromDir(USER_DIR, 'user')
+
+  // User skills take precedence on name collision so users can
+  // override a built-in by writing their own.
+  const merged = new Map()
+  for (const s of builtins) merged.set(s.name, s)
+  for (const s of users) merged.set(s.name, s)
+
+  const skills = Array.from(merged.values()).sort((a, b) => a.name.localeCompare(b.name))
+  return { skills, byName: merged }
+}
diff --git a/packages/core/src/builder/skill-matcher.ts b/packages/core/src/builder/skill-matcher.ts
new file mode 100644
index 0000000..a536240
--- /dev/null
+++ b/packages/core/src/builder/skill-matcher.ts
@@ -0,0 +1,41 @@
+// Server-side skill trigger matching.
+//
+// Small models (Mistral Small, MLX local) don't reliably emit
+// forge:skill even when the system prompt says they MUST. Plan B :
+// the CLI matches triggers itself before calling the LLM.
If a
+// trigger phrase appears as a substring of the user message
+// (case-insensitive), the matched skill is auto-loaded : its body is
+// injected into the conversation as a system message, and a
+// SkillAction (status=done) is added to Mission Control. The LLM
+// then sees the skill instructions as if it had asked for them
+// itself, and the next turn follows the orchestration described in
+// the skill body.
+
+import type { SkillEntry } from './skill-catalog.ts'
+
+/**
+ * Returns the FIRST skill whose triggers match the user message, or
+ * null if none match. Match is case-insensitive substring : we trim
+ * the trigger and lower-case both sides before comparing. We don't
+ * need a fuzzy matcher — skills define their own trigger phrases, so
+ * authors can list as many synonyms as they like.
+ *
+ * The first match wins because skills are sorted alphabetically in
+ * the catalog ; if two skills compete on a message, the first one
+ * lexicographically takes precedence. That's deterministic and easy
+ * to reason about ; we'll revisit if real conflicts appear.
+ */
+export function matchSkillForMessage(
+  message: string,
+  skills: SkillEntry[],
+): SkillEntry | null {
+  const haystack = message.toLowerCase()
+  for (const skill of skills) {
+    for (const trigger of skill.triggers) {
+      const needle = trigger.trim().toLowerCase()
+      if (needle.length === 0) continue
+      if (haystack.includes(needle)) return skill
+    }
+  }
+  return null
+}
diff --git a/packages/core/src/builder/skill-runner.ts b/packages/core/src/builder/skill-runner.ts
new file mode 100644
index 0000000..ccf3bb9
--- /dev/null
+++ b/packages/core/src/builder/skill-runner.ts
@@ -0,0 +1,162 @@
+// Skill runner — deterministic orchestration for skills that small
+// models can't reliably handle through prompt instructions alone.
+//
+// Today this only knows how to drive `scaffold-and-run`.
The shape is
+// generic enough that other skills can plug in : each runner takes
+// the user prompt, calls the LLM with a tightly scoped instruction
+// (one block to produce, nothing else), and returns either the
+// generated content or null on failure. The CLI assembles the
+// resulting actions in Mission Control.
+//
+// The win over a single LLM call : Mistral Small collapses
+// "what the agent is" (AGENT.md) and "what the agent should do this
+// time" (forge:run prompt) into one big system prompt. Splitting the
+// work into two narrow calls forces the model to keep them apart.
+
+import { generateText } from 'ai'
+import { getBuilderModel } from './provider.ts'
+import type { BuilderLang } from './system-prompt.ts'
+
+export type ScaffoldAndRunResult = {
+  agentName: string
+  agentMdContent: string // full AGENT.md (frontmatter + body), no fences
+  runPrompt: string // prompt to feed forge:run
+}
+
+const AGENT_MD_INSTRUCTION_FR = `Tu es un assistant qui produit UNIQUEMENT le contenu d'un fichier AGENT.md, rien d'autre.
+
+Format obligatoire (commence par \`---\`, finis par \`---\` puis le corps) :
+
+---
+name:
+description: "Une phrase courte décrivant le rôle GÉNÉRIQUE de l'agent (pas la mission spécifique de cette session)."
+sandbox:
+  image: agent-forge/base:latest
+  timeout: 120s
+maxTurns: 8
+---
+
+#
+
+Tu es un . Décris en 2 à 4 lignes le rôle GÉNÉRIQUE de l'agent. Mentionne brièvement les outils dont il dispose (forge:bash, forge:write, forge:read, forge:edit, forge:grep, forge:glob, sandboxés sous /workspace). NE liste PAS d'étapes spécifiques à la session courante — ces étapes seront passées séparément en prompt run.
+
+RÈGLES STRICTES :
+- Ne produis QUE le contenu du fichier AGENT.md, sans \`\`\` ni texte avant/après.
+- La valeur de \`description\` ne doit JAMAIS contenir de deux-points non quoté.
+- N'invente pas de section "Étapes" ou "Mission" dans le corps : elles iront dans le prompt run.
+- Réponds en français.`
+
+const AGENT_MD_INSTRUCTION_EN = `You output ONLY the content of an AGENT.md file, nothing else.
+
+Required format (start with \`---\`, end with \`---\` then the body) :
+
+---
+name:
+description: "One short sentence describing the GENERIC role of the agent (not the specific mission of this session)."
+sandbox:
+  image: agent-forge/base:latest
+  timeout: 120s
+maxTurns: 8
+---
+
+#
+
+You are a . Describe the GENERIC role in 2-4 lines. Briefly mention the tools available (forge:bash, forge:write, forge:read, forge:edit, forge:grep, forge:glob, sandboxed under /workspace). Do NOT list session-specific steps — those will be passed separately as the run prompt.
+
+STRICT RULES :
+- Output ONLY the AGENT.md content, no \`\`\` and no prose before/after.
+- The \`description\` value must NEVER contain an unquoted colon.
+- Do not invent a "Steps" or "Mission" section in the body : that goes in the run prompt.
+- Answer in English.`
+
+const RUN_PROMPT_INSTRUCTION_FR = `Tu es un assistant qui produit UNIQUEMENT le prompt à envoyer à un agent, rien d'autre.
+
+Tu vas extraire de la demande utilisateur la MISSION CONCRÈTE à exécuter, et la reformuler comme une INSTRUCTION DIRECTE adressée à l'agent (à la 2ème personne du singulier en français : « tu vas… »). Cette instruction sera passée à l'agent via un bloc forge:run.
+
+RÈGLES STRICTES :
+- Produis UNIQUEMENT le texte du prompt, sans \`\`\`, sans préambule, sans explication.
+- Décris des étapes concrètes et exécutables (pas de méta-discours).
+- Ne ré-explique PAS le rôle de l'agent, il est déjà défini dans son AGENT.md.
+- Si la demande mentionne du code à scaffolder, sois explicite sur le contenu attendu.
+- Termine par : « Réponds en français. »`
+
+const RUN_PROMPT_INSTRUCTION_EN = `You output ONLY the prompt to send to an agent, nothing else.
+
+You extract from the user's message the CONCRETE MISSION to execute, and rephrase it as a DIRECT INSTRUCTION to the agent (second person : "you will…"). This instruction will be passed to the agent through a forge:run block.
+
+STRICT RULES :
+- Output ONLY the prompt text, no \`\`\`, no preamble, no explanation.
+- Describe concrete executable steps (no meta-talk).
+- Do NOT re-explain the role of the agent, it's already defined in its AGENT.md.
+- If the user mentioned code to scaffold, be explicit about the expected content.
+- End with : "Answer in English."`
+
+function buildAgentMdInstruction(lang: BuilderLang): string {
+  return lang === 'fr' ? AGENT_MD_INSTRUCTION_FR : AGENT_MD_INSTRUCTION_EN
+}
+
+function buildRunPromptInstruction(lang: BuilderLang): string {
+  return lang === 'fr' ? RUN_PROMPT_INSTRUCTION_FR : RUN_PROMPT_INSTRUCTION_EN
+}
+
+const NAME_RE = /name\s*:\s*([a-z][a-z0-9-]*)/i
+
+function extractAgentName(agentMd: string): string | null {
+  const m = NAME_RE.exec(agentMd)
+  return m && m[1] ? m[1] : null
+}
+
+function stripFences(text: string): string {
+  // The instruction tells the model NOT to wrap output in fences, but
+  // small models slip — strip a leading and trailing ``` if present.
+  let out = text.trim()
+  if (out.startsWith('```')) {
+    const firstNl = out.indexOf('\n')
+    if (firstNl !== -1) out = out.slice(firstNl + 1)
+  }
+  if (out.endsWith('```')) {
+    out = out.slice(0, -3).trimEnd()
+  }
+  return out.trim()
+}
+
+/**
+ * Drive the scaffold-and-run skill end to end. Two narrow LLM calls,
+ * each producing exactly one artefact. The CLI then surfaces them as
+ * a write action + a run action in Mission Control.
+ *
+ * Returns null if either call fails to produce a recognisable
+ * artefact (e.g. AGENT.md without a `name:` line). The caller falls
+ * back to the normal flow.
+ */
+export async function runScaffoldAndRun(args: {
+  userMessage: string
+  lang: BuilderLang
+}): Promise<ScaffoldAndRunResult | null> {
+  const model = getBuilderModel()
+  const agentMdInstruction = buildAgentMdInstruction(args.lang)
+  const runPromptInstruction = buildRunPromptInstruction(args.lang)
+
+  // Call 1 : produce the AGENT.md.
+  const agentMdResp = await generateText({
+    model,
+    system: agentMdInstruction,
+    prompt: args.userMessage,
+    maxTokens: 600,
+  })
+  const agentMdContent = stripFences(agentMdResp.text)
+  const agentName = extractAgentName(agentMdContent)
+  if (!agentName) return null
+
+  // Call 2 : produce the run prompt.
+  const runResp = await generateText({
+    model,
+    system: runPromptInstruction,
+    prompt: args.userMessage,
+    maxTokens: 400,
+  })
+  const runPrompt = stripFences(runResp.text)
+  if (runPrompt.length === 0) return null
+
+  return { agentName, agentMdContent, runPrompt }
+}
diff --git a/packages/core/src/builder/skills/scaffold-and-run.md b/packages/core/src/builder/skills/scaffold-and-run.md
new file mode 100644
index 0000000..39ecd91
--- /dev/null
+++ b/packages/core/src/builder/skills/scaffold-and-run.md
@@ -0,0 +1,59 @@
+---
+name: scaffold-and-run
+description: When the user describes both an agent AND a concrete task to perform in the same message, propose creation AND execution in one builder turn instead of stopping after the write.
+triggers:
+  - audite
+  - teste
+  - lance puis
+  - crée puis lance
+  - scaffolde et exécute
+  - audit
+  - test it
+  - then run
+  - create and run
+actions:
+  - write
+  - run
+---
+
+# scaffold-and-run
+
+Activate this skill when the user's message describes **both** what an agent should be (its role, its tools, its workspace assumptions) **and** what it should do right now (a concrete task, mission, audit, or scenario to run once).
+
+When activated, you MUST :
+
+1. Emit a fenced ```forge:write``` block creating the AGENT.md, exactly as you would normally do.
+2.
In the **same turn**, immediately after, emit a fenced ```forge:run``` block targeting that same agent. The prompt inside the run block is the concrete task you extracted from the user's message — phrased as an instruction to the agent, NOT as a description of what the agent is.
+3. Do NOT wait for the user to ask for the run separately. The user already gave you the full intent.
+4. Do NOT mix the two blocks into one. They are two independent actions, with two independent permission dialogs. The user will approve them in order.
+
+Both blocks must respect their usual rules :
+- `forge:write` : path `agents//AGENT.md`, full YAML frontmatter, body as system prompt for the agent.
+- `forge:run` : `agent: ` matching the one you just wrote, then `---`, then the prompt.
+
+Example shape (do not copy literally — adapt to the user's actual request) :
+
+```forge:write
+path: agents/code-auditor/AGENT.md
+---
+---
+name: code-auditor
+description: "Audits a TypeScript mini-project in /workspace."
+sandbox:
+  image: agent-forge/base:latest
+  timeout: 60s
+maxTurns: 8
+---
+
+# code-auditor
+
+You are a TypeScript code auditor. Use your tools to scaffold, list, read, edit and verify.
+```
+
+```forge:run
+agent: code-auditor
+---
+Scaffold src/index.ts with two TODO functions, list workspace files, read the code, replace each `return 0` by the correct implementation, then run `node -e "require('./src/index.ts')"` to verify. Answer in French.
+```
+
+Keep prose minimal between the two blocks — one short sentence is enough. The cards in Mission Control are what the user will read.
diff --git a/packages/core/src/builder/stream.ts b/packages/core/src/builder/stream.ts
index 1722f7c..dafec65 100644
--- a/packages/core/src/builder/stream.ts
+++ b/packages/core/src/builder/stream.ts
@@ -7,11 +7,15 @@
 // the CLI parses fenced action blocks the builder emits in plain text. See
 // packages/cli/src/builder-actions.ts.
-import { streamText } from 'ai'
+import { streamText, type CoreMessage } from 'ai'
 import { getBuilderModel } from './provider.ts'
-import { type BuilderLang, getBuilderSystemPrompt } from './system-prompt.ts'
+import {
+  type BuilderLang,
+  type SkillCatalogEntry,
+  getBuilderSystemPrompt,
+} from './system-prompt.ts'
 
-export type ChatRole = 'user' | 'assistant'
+export type ChatRole = 'user' | 'assistant' | 'system'
 
 export type ChatMessage = {
   role: ChatRole
@@ -21,16 +25,21 @@ export type ChatMessage = {
 export type StreamBuilderArgs = {
   messages: ChatMessage[]
   lang: BuilderLang
+  // Catalog metadata advertised to the LLM in the system prompt.
+  // Bodies are NOT included here — they land in the conversation only
+  // after the LLM emits a forge:skill block.
+  skills?: SkillCatalogEntry[]
 }
 
 export async function* streamBuilder({
   messages,
   lang,
+  skills,
 }: StreamBuilderArgs): AsyncGenerator {
   const result = streamText({
     model: getBuilderModel(),
-    system: getBuilderSystemPrompt(lang),
-    messages,
+    system: getBuilderSystemPrompt(lang, { skills }),
+    messages: messages as CoreMessage[],
     // 512 leaves room for a full forge:write block (~300 tokens) plus a
     // short intro sentence. Override via FORGE_MAX_TOKENS if needed.
     maxTokens: Number(process.env.FORGE_MAX_TOKENS ?? '512'),
diff --git a/packages/core/src/builder/system-prompt.ts b/packages/core/src/builder/system-prompt.ts
index 8d3d869..882adc5 100644
--- a/packages/core/src/builder/system-prompt.ts
+++ b/packages/core/src/builder/system-prompt.ts
@@ -142,6 +142,56 @@ ${ACTION_BLOCK_FR}
 
 Réponds toujours en français.`
 
-export function getBuilderSystemPrompt(lang: BuilderLang): string {
-  return lang === 'fr' ? FR : EN
+// Skill catalog metadata as injected into the system prompt. The body
+// of each skill is NOT included here — it would cost too many tokens
+// for skills the user never triggers.
The LLM only sees the entry,
+// recognises a trigger, and emits a `forge:skill` block ; the CLI
+// then injects the body into the conversation as a system message,
+// so the next turn carries the full skill instructions.
+export type SkillCatalogEntry = {
+  name: string
+  description: string
+  triggers: string[]
+}
+
+// Note : skill activation is now handled SERVER-SIDE by the CLI, not
+// by the LLM. Trigger matching, runner dispatch, and write+run
+// orchestration all happen in TypeScript before the LLM is even
+// called for the matched user message. This keeps the small models
+// out of the meta-decision business and makes the orchestration
+// deterministic.
+//
+// We still surface the skill catalog in the system prompt as a short
+// informational note, so the LLM doesn't get confused when a skill
+// card appears in Mission Control — it knows skills exist and that
+// they were dispatched on its behalf.
+
+const SKILLS_PREAMBLE_EN = `Skills available (auto-dispatched by the CLI when the user message matches a trigger ; you do NOT need to invoke them yourself) :
+`
+
+const SKILLS_PREAMBLE_FR = `Skills disponibles (déclenchées automatiquement par la CLI quand le message utilisateur correspond à un trigger ; tu n'as PAS à les invoquer toi-même) :
+`
+
+function renderCatalog(entries: SkillCatalogEntry[]): string {
+  if (entries.length === 0) return ''
+  return entries
+    .map((s) => {
+      const triggers =
+        s.triggers.length > 0
+          ? ` — triggers : ${s.triggers.map((t) => `"${t}"`).join(', ')}`
+          : ''
+      return `- ${s.name} : ${s.description}${triggers}`
+    })
+    .join('\n')
+}
+
+export function getBuilderSystemPrompt(
+  lang: BuilderLang,
+  options: { skills?: SkillCatalogEntry[] } = {},
+): string {
+  const base = lang === 'fr' ? FR : EN
+  const entries = options.skills ?? []
+  if (entries.length === 0) return base
+  const preamble = lang === 'fr' ?
SKILLS_PREAMBLE_FR : SKILLS_PREAMBLE_EN
+  return `${base}\n\n${preamble}${renderCatalog(entries)}`
 }
diff --git a/packages/core/src/types/index.ts b/packages/core/src/types/index.ts
index b74f28b..505de0e 100644
--- a/packages/core/src/types/index.ts
+++ b/packages/core/src/types/index.ts
@@ -6,3 +6,13 @@ export {
   type AgentMd,
   type ParsedAgentMd,
 } from './agent-md.ts'
+
+export {
+  SkillActionTagSchema,
+  SkillMdError,
+  SkillMdSchema,
+  parseSkillMd,
+  type ParsedSkillMd,
+  type SkillActionTag,
+  type SkillMd,
+} from './skill-md.ts'
diff --git a/packages/core/src/types/skill-md.ts b/packages/core/src/types/skill-md.ts
new file mode 100644
index 0000000..76e5083
--- /dev/null
+++ b/packages/core/src/types/skill-md.ts
@@ -0,0 +1,88 @@
+// SKILL.md — describes a high-level builder behaviour the LLM can load
+// on demand to handle a recurring intent pattern.
+//
+// Format : Markdown with YAML frontmatter at the top, body below.
+// Example :
+//
+//   ---
+//   name: scaffold-and-run
+//   description: When the user describes both an agent AND a concrete task in the same message, propose creation AND execution in one turn.
+//   triggers:
+//     - "audite"
+//     - "teste"
+//     - "fais que cet agent"
+//     - "create and run"
+//   actions:
+//     - write
+//     - run
+//   ---
+//
+//   # scaffold-and-run
+//
+//   When activated, you must :
+//   1. Emit a forge:write block creating the AGENT.md
+//   2. In the SAME turn, emit a forge:run block targeting the agent
+//      with a prompt that captures the user's intent
+//
+//   The user will see two PROPOSED cards and approve them in order.
+//
+// Skills are loaded into the conversation lazily : the system prompt
+// only carries the catalog metadata (name + description + triggers).
+// The body lands in the context only after the LLM emits a
+// forge:skill block, which the CLI executes by injecting the body as
+// a system message.
+
+import { parse as parseYaml } from 'yaml'
+import { z } from 'zod'
+
+const FRONTMATTER_RE = /^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/
+
+export const SkillActionTagSchema = z.enum(['write', 'run', 'skill'])
+export type SkillActionTag = z.infer<typeof SkillActionTagSchema>
+
+export const SkillMdSchema = z.object({
+  name: z
+    .string()
+    .min(1)
+    .regex(/^[a-z][a-z0-9-]*$/, 'name must be kebab-case (lowercase, digits, hyphens)'),
+  description: z.string().min(1),
+  triggers: z.array(z.string().min(1)).default([]),
+  actions: z.array(SkillActionTagSchema).default([]),
+})
+
+export type SkillMd = z.infer<typeof SkillMdSchema>
+
+export type ParsedSkillMd = {
+  meta: SkillMd
+  body: string
+}
+
+export class SkillMdError extends Error {
+  constructor(message: string, public readonly cause?: unknown) {
+    super(message)
+    this.name = 'SkillMdError'
+  }
+}
+
+export function parseSkillMd(text: string): ParsedSkillMd {
+  const match = text.match(FRONTMATTER_RE)
+  if (!match) {
+    throw new SkillMdError(
+      'SKILL.md must start with a YAML frontmatter block delimited by ---',
+    )
+  }
+  const [, yamlText, body] = match
+  let parsedYaml: unknown
+  try {
+    parsedYaml = parseYaml(yamlText ?? '')
+  } catch (err) {
+    throw new SkillMdError('Invalid YAML in SKILL.md frontmatter', err)
+  }
+  const result = SkillMdSchema.safeParse(parsedYaml)
+  if (!result.success) {
+    const first = result.error.issues[0]
+    const path = first?.path.join('.') ?? ''
+    throw new SkillMdError(`Invalid SKILL.md : ${path} — ${first?.message ?? 'unknown error'}`)
+  }
+  return { meta: result.data, body: (body ?? '').trim() }
+}
diff --git a/packages/core/tests/skill-catalog.test.ts b/packages/core/tests/skill-catalog.test.ts
new file mode 100644
index 0000000..fa86fd2
--- /dev/null
+++ b/packages/core/tests/skill-catalog.test.ts
@@ -0,0 +1,27 @@
+// Catalog loader tests : the built-in scaffold-and-run skill must be
+// discoverable, parseable, and the resulting entry must carry name +
+// description + triggers + body.
+
+import { describe, expect, test } from 'bun:test'
+import { loadSkillCatalog } from '../src/builder/skill-catalog.ts'
+
+describe('loadSkillCatalog', () => {
+  test('discovers the built-in scaffold-and-run skill', () => {
+    const cat = loadSkillCatalog()
+    const s = cat.byName.get('scaffold-and-run')
+    expect(s).toBeDefined()
+    if (!s) return
+    expect(s.source).toBe('builtin')
+    expect(s.description.length).toBeGreaterThan(0)
+    expect(s.body.length).toBeGreaterThan(0)
+    expect(s.triggers.length).toBeGreaterThan(0)
+    expect(s.actions).toEqual(expect.arrayContaining(['write', 'run']))
+  })
+
+  test('catalog skills are sorted by name', () => {
+    const cat = loadSkillCatalog()
+    const names = cat.skills.map((s) => s.name)
+    const sorted = [...names].sort((a, b) => a.localeCompare(b))
+    expect(names).toEqual(sorted)
+  })
+})
diff --git a/packages/core/tests/skill-matcher.test.ts b/packages/core/tests/skill-matcher.test.ts
new file mode 100644
index 0000000..1950385
--- /dev/null
+++ b/packages/core/tests/skill-matcher.test.ts
@@ -0,0 +1,43 @@
+import { describe, expect, test } from 'bun:test'
+import { matchSkillForMessage } from '../src/builder/skill-matcher.ts'
+import type { SkillEntry } from '../src/builder/skill-catalog.ts'
+
+const fakeSkill = (
+  name: string,
+  triggers: string[],
+): SkillEntry => ({
+  name,
+  description: 'desc',
+  triggers,
+  actions: [],
+  body: 'body',
+  source: 'builtin',
+  filePath: '',
+})
+
+describe('matchSkillForMessage', () => {
+  test('matches a trigger as case-insensitive substring', () => {
+    const skill = fakeSkill('scaffold-and-run', ['audite', 'teste'])
+    const r = matchSkillForMessage('Audite ce projet TypeScript stp', [skill])
+    expect(r?.name).toBe('scaffold-and-run')
+  })
+
+  test('returns null when no trigger matches', () => {
+    const skill = fakeSkill('scaffold-and-run', ['audite'])
+    const r = matchSkillForMessage('crée un agent qui écrit des haïkus', [skill])
+    expect(r).toBeNull()
+  })
+
+  test('first skill in the
list wins on multi-match', () => { + const a = fakeSkill('a-skill', ['shared']) + const b = fakeSkill('b-skill', ['shared']) + const r = matchSkillForMessage('shared keyword present', [a, b]) + expect(r?.name).toBe('a-skill') + }) + + test('empty trigger is ignored', () => { + const skill = fakeSkill('x', ['', ' ']) + const r = matchSkillForMessage('anything goes here', [skill]) + expect(r).toBeNull() + }) +}) diff --git a/packages/core/tests/skill-md.test.ts b/packages/core/tests/skill-md.test.ts new file mode 100644 index 0000000..c383601 --- /dev/null +++ b/packages/core/tests/skill-md.test.ts @@ -0,0 +1,65 @@ +// Schema and parser tests for SKILL.md. + +import { describe, expect, test } from 'bun:test' +import { SkillMdError, parseSkillMd } from '../src/types/skill-md.ts' + +describe('parseSkillMd', () => { + test('parses a minimal valid skill', () => { + const md = `--- +name: scaffold-and-run +description: Create then run in one turn. +--- + +Body goes here.` + const r = parseSkillMd(md) + expect(r.meta.name).toBe('scaffold-and-run') + expect(r.meta.description).toBe('Create then run in one turn.') + expect(r.meta.triggers).toEqual([]) + expect(r.meta.actions).toEqual([]) + expect(r.body).toBe('Body goes here.') + }) + + test('parses triggers and actions arrays', () => { + const md = `--- +name: x +description: y +triggers: + - audite + - test +actions: + - write + - run +--- + +body` + const r = parseSkillMd(md) + expect(r.meta.triggers).toEqual(['audite', 'test']) + expect(r.meta.actions).toEqual(['write', 'run']) + }) + + test('rejects a non kebab-case name', () => { + const md = `--- +name: ScaffoldAndRun +description: invalid +--- + +body` + expect(() => parseSkillMd(md)).toThrow(SkillMdError) + }) + + test('rejects missing frontmatter', () => { + expect(() => parseSkillMd('# no frontmatter')).toThrow(SkillMdError) + }) + + test('rejects an unknown action tag', () => { + const md = `--- +name: x +description: y +actions: + - bogus +--- + +body` + 
expect(() => parseSkillMd(md)).toThrow(SkillMdError) + }) +}) diff --git a/packages/core/tests/system-prompt.test.ts b/packages/core/tests/system-prompt.test.ts new file mode 100644 index 0000000..6268597 --- /dev/null +++ b/packages/core/tests/system-prompt.test.ts @@ -0,0 +1,43 @@ +// System prompt — verify that the skill catalog metadata is appended +// when entries are provided (skills are auto-dispatched by the CLI ; +// the LLM only sees them as an informational note). + +import { describe, expect, test } from 'bun:test' +import { getBuilderSystemPrompt } from '../src/builder/system-prompt.ts' + +describe('getBuilderSystemPrompt', () => { + test('returns the base prompt when no skills are provided', () => { + const en = getBuilderSystemPrompt('en') + expect(en).toContain('Agent Forge builder') + expect(en).not.toContain('Skills available') + }) + + test('appends an informational skill list when entries are passed', () => { + const en = getBuilderSystemPrompt('en', { + skills: [ + { + name: 'scaffold-and-run', + description: 'Create then run.', + triggers: ['audite', 'test'], + }, + ], + }) + expect(en).toContain('Skills available') + expect(en).toContain('auto-dispatched') + expect(en).toContain('scaffold-and-run') + expect(en).toContain('Create then run.') + expect(en).toContain('"audite", "test"') + // The base prompt comes first ; the skill note is a tail. 
+    expect(en.indexOf('Agent Forge builder')).toBeLessThan(
+      en.indexOf('Skills available'),
+    )
+  })
+
+  test('FR variant uses French wording', () => {
+    const fr = getBuilderSystemPrompt('fr', {
+      skills: [{ name: 'x', description: 'y', triggers: [] }],
+    })
+    expect(fr).toContain('Skills disponibles')
+    expect(fr).toContain('automatiquement par la CLI')
+  })
+})
diff --git a/packages/runtime/README.md b/packages/runtime/README.md
index c852792..702f4df 100644
--- a/packages/runtime/README.md
+++ b/packages/runtime/README.md
@@ -2,16 +2,25 @@
 
 Le process qui tourne **à l'intérieur** des containers Docker lancés par Agent Forge.
 
-## Ce que ça fait (état P3)
+## Ce que ça fait (état P4)
 
 1. Lit le fichier `/agent/AGENT.md` monté en lecture seule dans le container
 2. Sépare le frontmatter (validé Zod côté host) du corps Markdown
-3. Utilise le corps comme **system prompt** de l'agent
+3. Utilise le corps comme **system prompt** de l'agent, plus une section TOOLS qui décrit les six tools disponibles
 4. Récupère le prompt utilisateur via stdin
-5. Streame la réponse du LLM (`streamText` du Vercel AI SDK) sur stdout, chunk par chunk
+5. **Tool loop multi-turns** :
+   - streame la réponse du LLM (`streamText` du Vercel AI SDK) sur stdout, chunk par chunk
+   - parse le premier bloc `forge:*` que l'agent émet
+   - exécute le tool correspondant (Bash / FileWrite / FileRead / FileEdit / Grep / Glob)
+   - réinjecte le résultat structuré comme message utilisateur dans la conversation
+   - boucle jusqu'à ce que l'agent réponde sans bloc OU que `maxTurns` soit atteint (cap dur à 10)
 6. Sort avec le code 0 quand le LLM a fini
 
-Le container est lancé avec `docker run --rm -i`, donc il est détruit dès la sortie.
+Le container est lancé avec `docker run --rm -i`, donc il est détruit dès la sortie. Le `/workspace` (bind-mount RW) est conservé sur le host pour inspection / extraction d'artefacts (P5).
+
+## Protocole tool agent-side
+
+Voir `src/tool-protocol.ts` pour le parser et les renderers de résultats. Les six tags reconnus sont `forge:bash`, `forge:write`, `forge:read`, `forge:edit`, `forge:grep`, `forge:glob`. Les résultats sont écrits sur stdout entre marqueurs `[forge:tool]` / `[/forge:tool]` pour que le host TUI puisse les router dans la card Mission Control.
 
 ## Variables d'environnement
 
@@ -21,6 +30,7 @@ Héritées du host par le `DockerLaunch` tool :
 FORGE_BASE_URL endpoint OpenAI-compatible
 FORGE_API_KEY clé (peut être vide pour MLX local)
 FORGE_MODEL nom du modèle
+FORGE_MAX_TOKENS optionnel, default 1024 par tour
 ```
 
 ## Build
@@ -33,7 +43,6 @@ Produit `dist/runtime.mjs`. **Cible Node, pas Bun** — les containers tournent
 
 ## À venir
 
-- **P4** — exposer six tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) à l'agent depuis l'intérieur du container
-- **P5** — agents persistants via `docker exec` (au lieu de `docker run --rm` jetable)
+- **P5** — sandbox durci (read-only root FS, network policy, resource caps), agents persistants via `docker exec` au lieu de `docker run --rm` jetable
 - **P5** — extraction d'artefacts du `/workspace` du container vers le host
-- **P6** — `claude-presence` MCP pour la coordination entre agents d'une même team
+- **P7** — `claude-presence` MCP pour la coordination entre agents d'une même team
diff --git a/packages/tools-core/README.md b/packages/tools-core/README.md
index e55ec9b..acff97f 100644
--- a/packages/tools-core/README.md
+++ b/packages/tools-core/README.md
@@ -2,32 +2,39 @@
 
 Tools natifs partagés entre le builder (côté host) et le runtime (dans le container).
 
-## État P3
+## État P4
 
-Deux tools livrés et utilisés dans le parcours `forge` :
+### Tools host
 
-- **`FileWrite`** — écrit sous `~/.agent-forge/agents//` avec sandbox de chemin (refuse tout `..`, refuse les écrasements sauf `overwrite: true` quand l'utilisateur a confirmé dans le dialog de permission). Schéma Zod sur l'input.
-- **`DockerLaunch`** — `launchAgent({ agent, prompt })` : retourne un handle `{ containerName, events: AsyncGenerator, abort }`. Spawn `docker run --rm -i`, monte `AGENT.md` + le bundle runtime, hérite des env vars provider, force le cleanup en `try/finally`.
+Utilisés par le builder pour préparer / lancer les agents :
 
-## Tools prévus pour P4
+- **`FileWrite`** (`src/file-write.ts`) — écrit sous `~/.agent-forge/agents//` avec sandbox de chemin (refuse tout `..`, refuse les écrasements sauf `overwrite: true` quand l'utilisateur a confirmé dans le dialog de permission). Schéma Zod sur l'input.
+- **`DockerLaunch`** (`src/docker-launch.ts`) — `launchAgent({ agent, prompt })` retourne un handle `{ containerName, events: AsyncGenerator, abort }`. Spawn `docker run --rm -i`, monte `AGENT.md` + le bundle runtime + un `/workspace` RW propre par run, hérite des env vars provider, force le cleanup en `try/finally`.
 
-Depuis l'intérieur du container, accessibles à l'agent :
+### Tools runtime (in-container)
 
-- **`Bash`** — exécution shell, restreinte au `/workspace`
-- **`FileRead`** — lecture avec offset/limit
-- **`FileEdit`** — patch par `old_string` / `new_string`
-- **`FileWrite`** — version "in-container" (différente de la version builder host)
-- **`Grep`** — recherche ripgrep
-- **`Glob`** — pattern matching
+Utilisés par les agents eux-mêmes via le tool-loop du runtime, tous sandboxés sous `/workspace` :
+
+- **`Bash`** (`src/runtime/bash.ts`) — exécution shell (`bash -lc`), timeout 30 s par défaut (max 120 s), output clippé à 16 Ko
+- **`FileWrite`** (`src/runtime/file-write.ts`) — version in-container, écrase par défaut (différente de la version host qui est stricte)
+- **`FileRead`** (`src/runtime/file-read.ts`) — offset/limit en lignes, clip 16 Ko, refuse les non-fichiers
+- **`FileEdit`** (`src/runtime/file-edit.ts`) — patch par sous-chaîne exacte, refuse les matchs ambigus sauf `replaceAll: true`
+- **`Grep`** (`src/runtime/grep.ts`) — regex JS pure sur un filtre glob optionnel, ignore les binaires (octets nuls dans les 4 Ko de tête), 200 hits max, lignes clippées à 400 chars
+- **`Glob`** (`src/runtime/glob.ts`) — matcher fait main pour `*` / `**` / `?`, 200 résultats max, walk borné à 5000 nodes
+
+Tous les tools runtime utilisent `resolveSandboxedPath` pour valider les chemins. La racine sandbox est `/workspace` en production ; pour les tests, `FORGE_WORKSPACE` peut la rediriger vers un dossier temp.
 
 ## Interface tool
 
-```ts
-type Tool = {
-  name: string
-  schema: ZodSchema
-  run(input: Input, ctx: ToolContext): AsyncGenerator
-}
-```
+Pattern Vercel AI SDK : Zod schema + fonction pure `execute*` qui retourne un résultat structuré (`{ ok: true, … }` ou `{ ok: false, error: string }`). Pas d'instances ni d'effets cachés — chaque appel est self-contained, ce qui simplifie les tests.
+
+## Tests
 
-Pattern emprunté à l'analyse OpenClaude (`../../analyse/06-tools-system.md`).
+`tests/` couvre :
+- `file-write.test.ts` — host FileWrite (path safety, sandbox escape, refus d'écrasement)
+- `runtime-bash.test.ts` — stdout / stderr / exit / timeout / cwd
+- `runtime-file-write.test.ts` — sandbox escape, traversal, écrasement, parent-dir
+- `runtime-file-read.test.ts` — offset/limit, fichier manquant, sandbox escape
+- `runtime-file-edit.test.ts` — match unique, ambiguïté, replaceAll, missing oldString
+- `runtime-grep.test.ts` — case sensitivity, glob filter, regex invalide
+- `runtime-glob.test.ts` — `**/*`, `*` mono-segment, `?`, no-match
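
Reviewer note: the tools-core README says every runtime tool validates paths with `resolveSandboxedPath` and returns a structured result (`{ ok: true, … }` or `{ ok: false, error: string }`). That helper's source is not part of this diff, so the following is only a minimal sketch of how such a check could work. The name `resolveInSandbox`, the `SandboxResult` type, and the error message are hypothetical, and the sketch assumes a normalized absolute POSIX root without a trailing slash.

```typescript
// Hypothetical result shape, modeled on the `{ ok: true, … }` /
// `{ ok: false, error }` convention described in the README.
type SandboxResult =
  | { ok: true; path: string }
  | { ok: false; error: string }

// Sketch only: the real `resolveSandboxedPath` may differ. `root` is assumed
// to be a normalized absolute POSIX path such as '/workspace'.
function resolveInSandbox(requested: string, root = '/workspace'): SandboxResult {
  // Relative paths are interpreted against the root; absolute ones are kept.
  const joined = requested.startsWith('/') ? requested : `${root}/${requested}`

  // Normalize lexically: drop empty and '.' segments, let '..' pop one level.
  const stack: string[] = []
  for (const segment of joined.split('/')) {
    if (segment === '' || segment === '.') continue
    if (segment === '..') stack.pop()
    else stack.push(segment)
  }
  const normalized = '/' + stack.join('/')

  // Comparing against `root + '/'` keeps siblings like '/workspace-evil' out.
  const inside = normalized === root || normalized.startsWith(root + '/')
  return inside
    ? { ok: true, path: normalized }
    : { ok: false, error: `path escapes ${root}: ${requested}` }
}

console.log(resolveInSandbox('src/index.ts'))   // accepted, resolved under /workspace
console.log(resolveInSandbox('../etc/passwd'))  // rejected, escapes the root
```

The check is purely lexical: it does not follow symlinks, so a real implementation would probably also need a `realpath`-style check before touching the filesystem.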