feat(p6): skill layer — orchestration patterns for the builder#83

Merged: garniergeorges merged 6 commits into dev from feat/p6-skills on Apr 27, 2026
@garniergeorges

Summary

P6 (early). Adds a skill layer that lets the builder load high-level orchestration patterns on demand instead of stopping mid-intent. Ships one built-in skill, `scaffold-and-run`, which fixes the most visible failure mode today: when the user describes BOTH the agent AND a concrete task in the same message, the builder writes the AGENT.md and stops. With this skill loaded, it emits a `forge:write` and a `forge:run` in the same turn.

How it works

| Step | Where |
| --- | --- |
| Discover SKILL.md files | `packages/core/src/builder/skill-catalog.ts` reads built-ins from the package + user skills from `~/.agent-forge/skills/` |
| Advertise to the LLM | The system prompt gains an `AVAILABLE SKILLS` / `SKILLS DISPONIBLES` section with name + description + triggers (bodies excluded to keep tokens low) |
| LLM activates a skill | Emits a fenced `forge:skill` block with the skill name |
| CLI loads the body | The catalog resolves the body, the action card lands in Mission Control as DONE, and the body is appended as a system message in the same turn |
| Next turn | The builder sees the full skill instructions and follows them |
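The discovery step merges two sources, and the file list in this PR notes that a user skill wins on name collision and that the result is sorted. A minimal sketch of that merge rule follows; the `SkillEntry` shape and the `mergeCatalog` name are assumptions for illustration, not the actual contents of `skill-catalog.ts`.

```typescript
// Hypothetical entry shape; the real catalog entry may carry other fields.
type SkillSource = "built-in" | "user";

interface SkillEntry {
  name: string;
  description: string;
  triggers: string[];
  body: string;
  source: SkillSource;
}

// Merge built-in and user skills: a user skill replaces a built-in with the
// same name, and the result is sorted by name for a stable listing.
function mergeCatalog(builtIns: SkillEntry[], userSkills: SkillEntry[]): SkillEntry[] {
  const byName = new Map<string, SkillEntry>();
  for (const skill of builtIns) byName.set(skill.name, skill);
  for (const skill of userSkills) byName.set(skill.name, skill); // user wins
  return [...byName.values()].sort((a, b) => a.name.localeCompare(b.name));
}
```

Keying the map by name keeps the override rule in one place: whichever source is written last wins.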

Files

core

  • `src/types/skill-md.ts` — Zod schema + parser for SKILL.md
  • `src/builder/skill-catalog.ts` — loader (built-in + user, user wins on name collision, sorted)
  • `src/builder/skills/scaffold-and-run.md` — first built-in
  • `src/builder/system-prompt.ts` — `SkillCatalogEntry` parameter, FR/EN section
  • `src/builder/stream.ts` — passes `skills` through

cli

  • `src/builder-actions.ts` — `forge:skill` block parser, `SkillActionExecution` type, `executeAction(skill, { resolveSkill })`
  • `src/actions/types.ts` — `SkillAction` (auto-running, no permission)
  • `src/components/MissionControl.tsx` — `SkillCard` (description + green tick when loaded)
  • `src/components/CardDetail.tsx` — full body view for skill actions
  • `src/hooks/useChat.ts` — loads the catalog once, threads it to `streamBuilder` and `executeAction`, injects the resolved body as a system message in the same turn
  • `src/commands.ts` — `/skills` lists what's available

Tests

  • SKILL.md schema validation (kebab-case, missing frontmatter, unknown action tag)
  • Catalog loader (built-in present, sorted)
  • System prompt (no SKILLS section when empty, FR/EN headers when present)
  • `forge:skill` parser (`name:` prefix, bare line, kebab-case rejection)
  • `executeAction(skill)` round-trip + missing skill case

Test plan

  • `bun install && bun test` passes
  • `bun run forge` boots, `/skills` lists `scaffold-and-run · built-in`
  • User says: "crée et lance un agent qui audite un projet TypeScript" ("create and run an agent that audits a TypeScript project") → builder emits `forge:skill` then a `forge:write` AND a `forge:run` in the SAME turn
  • Three cards appear in Mission Control: skill (DONE), write (PROPOSED), run (PROPOSED); the user approves write and run in order
  • Tab-cycling reaches the skill card; Enter shows its full body in detail
  • check-attribution CI is green

Out of scope

  • Skill-to-skill chaining (decided out for P6, possibly later)
  • Hot-reload skills on filesystem change (`/skills reload` command)
  • Documenting skill authoring in user-facing docs (P7+)

…lder

The builder now has access to a catalog of skills: self-contained
behaviour modules that orchestrate multiple actions in a single turn
to handle recurring intent patterns. First built-in:
scaffold-and-run, which fixes the "user describes both creation and
execution but the builder stops after writing AGENT.md" pattern by
making the LLM emit a forge:write AND a forge:run in the same turn.

Architecture:
  - SKILL.md format with YAML frontmatter (name, description,
    triggers, actions) and a markdown body containing the
    instructions. Mirrors AGENT.md to stay familiar.
  - Catalog loader discovers skills from two sources: built-ins
    shipped in packages/core/src/builder/skills/, plus user skills
    under ~/.agent-forge/skills/. User skills override built-ins on
    name collision.
  - System prompt only carries the catalog metadata (name +
    description + triggers). Bodies stay out of the context until
    the LLM emits a forge:skill block, then the resolved body is
    injected as a system message for the next turn.
  - New ParsedAction kind 'skill' with a forge:skill fenced block
    parser; tolerant of either `name: <skill>` or a bare line.
  - SkillAction joins WriteAction/RunAction in the action store.
    Skills auto-execute (no permission dialog) and surface as their
    own card in Mission Control with a "loaded into context" hint.
    CardDetail renders the description plus the full body so the
    user can see what the skill actually injects.
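
The forge:skill parser described above accepts either a `name:` prefix or a bare line, and the tests mention kebab-case rejection. A minimal sketch of that behaviour follows; `parseSkillBlock` and the exact kebab-case regular expression are assumptions, not the code in `builder-actions.ts`.

```typescript
// Three backticks, built at runtime so this example can sit inside a fence.
const SKILL_FENCE = "`".repeat(3);

// Assumed definition of kebab-case: lowercase alphanumerics joined by hyphens.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Return the skill name from the first forge:skill block in the LLM output,
// or null when there is no block or the name is not kebab-case.
function parseSkillBlock(llmOutput: string): string | null {
  const pattern = new RegExp(
    SKILL_FENCE + "forge:skill[ \\t]*\\r?\\n([\\s\\S]*?)" + SKILL_FENCE,
  );
  const match = pattern.exec(llmOutput);
  if (!match) return null;

  // First non-empty line of the block body carries the name.
  const firstLine = (match[1] ?? "")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .find((line) => line.length > 0);
  if (!firstLine) return null;

  // Tolerate both "name: scaffold-and-run" and a bare "scaffold-and-run".
  const name = firstLine.replace(/^name:\s*/, "");
  return KEBAB_CASE.test(name) ? name : null;
}
```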

Other:
  - /skills slash command lists what's available, source-tagged.
  - useChat resolves the catalog once via useMemo, threads it to
    streamBuilder and to executeAction's resolveSkill.

Tests:
  - SKILL.md schema (kebab-case name, missing frontmatter,
    unknown action tag).
  - Catalog loader discovers the built-in and sorts entries.
  - System prompt injects the SKILLS section when entries are
    provided, FR/EN headers, base prompt unchanged when empty.
  - forge:skill parser (name: prefix, bare line, kebab-case
    rejection) and executeAction(skill) round-trip via resolver.

Symptom: the user typed a message that matched a skill trigger
("audite un projet typescript", i.e. "audit a TypeScript project")
but the builder still skipped forge:skill and went straight to
forge:write. The skill catalog was loaded, the system prompt
mentioned it, but Mistral would not act on it.

Two issues, both fixed here:

1. Position. The SKILLS section was appended AFTER the base prompt's
   "BE DECISIVE — propose the AGENT.md immediately" rule. Mistral read
   the strong push to write first, then a soft "you can also use a
   skill" at the bottom, and ignored the latter.

2. Framing. The original wording said "choose a skill when ...". Too
   permissive: small models read that as optional. Replaced with a
   STEP 0 / ÉTAPE 0 framing: an explicit, mandatory pre-flight check
   that runs BEFORE any other action. If any trigger phrase matches
   (case-insensitive substring), the LLM MUST emit a forge:skill
   block as the only action of that turn; only then does the rest
   of the protocol apply.

The catalog is now placed at the TOP of the system prompt, before
the "be decisive" rule, so the order of reading mirrors the order
of execution.

Tests updated for the new wording. Triggers are now quoted in the
catalog rendering ("audite", "teste") so the LLM sees them as
literals rather than running prose.
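
As a rough illustration of the placement and quoting described in this commit, the sketch below renders the catalog section before the base prompt and wraps each trigger in double quotes. The function names, the `SkillMeta` shape and the line layout are assumptions; only the two section headers come from the PR description.

```typescript
// Hypothetical metadata shape: only what the prompt needs, no skill body.
interface SkillMeta {
  name: string;
  description: string;
  triggers: string[];
}

// Render the catalog section, quoting each trigger so it reads as a literal.
function renderSkillSection(skills: SkillMeta[], locale: "en" | "fr"): string {
  if (skills.length === 0) return "";
  const header = locale === "fr" ? "SKILLS DISPONIBLES" : "AVAILABLE SKILLS";
  const lines = skills.map((skill) => {
    const triggers = skill.triggers.map((t) => JSON.stringify(t)).join(", ");
    return `- ${skill.name}: ${skill.description} (triggers: ${triggers})`;
  });
  return [header, ...lines].join("\n");
}

// Place the catalog BEFORE the base prompt; leave the base prompt untouched
// when there are no skills to advertise.
function buildSystemPromptSketch(
  basePrompt: string,
  skills: SkillMeta[],
  locale: "en" | "fr",
): string {
  const section = renderSkillSection(skills, locale);
  return section === "" ? basePrompt : `${section}\n\n${basePrompt}`;
}
```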
Mistral Small reads the skill catalog and the STEP 0 instruction in
the system prompt, but it does not act on it: it sees "audite a
typescript project" and goes straight to a forge:write that
collapses both the agent definition and the run mission into one
giant AGENT.md body. Adding more rules to the prompt didn't move
the needle.

Plan B, in three pieces:

1. matchSkillForMessage(): case-insensitive substring match against
   the trigger phrases declared in each SKILL.md. Lives in core, no
   LLM involvement.

2. runScaffoldAndRun(): a dedicated runner that drives the skill
   end to end with TWO narrow LLM calls instead of one wide one:
     - call A: "produce ONLY the AGENT.md content" (generic role,
       no session-specific steps in the body)
     - call B: "produce ONLY the prompt to send to the agent"
   Each call has a tightly scoped system instruction so the model
   keeps the two artefacts cleanly separated. Output is parsed
   server-side, AGENT.md name extracted from the frontmatter.

3. useChat.send(): pre-flight before the normal stream. If the
   matcher finds a skill, dispatch to the runner. The skill card
   lands in Mission Control as DONE, then a write card and a run
   card appear as PROPOSED. The user approves them in order via
   the existing permission dialog.
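
The two-call runner in piece 2 can be sketched as below. This is an illustration under stated assumptions, not the actual runScaffoldAndRun(): the `Complete` callback stands in for whatever LLM client the project uses, the system instructions are paraphrased from this commit message, and the frontmatter handling is deliberately simple.

```typescript
// Hypothetical LLM call: (system instruction, user message) -> completion.
type Complete = (system: string, user: string) => Promise<string>;

interface ScaffoldAndRunResult {
  agentName: string | null; // null when the frontmatter has no name field
  agentMd: string;          // content for the proposed write
  runPrompt: string;        // prompt for the proposed run
}

// Read the `name:` field from a leading YAML frontmatter block, if any.
function extractAgentName(agentMd: string): string | null {
  const lines = agentMd.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return null;
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // end of frontmatter
    const match = /^name:\s*(.+)$/.exec(line);
    if (match) return (match[1] ?? "").trim();
  }
  return null;
}

// Two narrow calls instead of one wide one, so the agent definition and the
// run prompt stay separate artefacts.
async function runScaffoldAndRunSketch(
  userMessage: string,
  complete: Complete,
): Promise<ScaffoldAndRunResult> {
  const agentMd = await complete(
    "Produce ONLY the AGENT.md content. Keep the role generic.",
    userMessage,
  );
  const runPrompt = await complete(
    "Produce ONLY the prompt to send to the agent.",
    userMessage,
  );
  return { agentName: extractAgentName(agentMd), agentMd, runPrompt };
}
```

Passing the LLM call in as a parameter keeps the sketch testable with a fake that returns canned text.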

The system prompt no longer carries the STEP 0 / ÉTAPE 0 mandate.
Skills are now an internal mechanism the LLM is informed about but
never asked to operate. The catalog metadata stays in the prompt as
a short tail note so the LLM understands why a skill card might
appear in Mission Control.

Tests:
- matchSkillForMessage: substring match, no-match, multi-skill
  precedence (first wins), empty trigger ignored.
- system prompt: informational note appears when skills are
  passed, FR/EN variants, base prompt comes first.
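
The matcher behaviour listed in these tests (case-insensitive substring, first skill wins, empty triggers ignored) could look like the sketch below. The `SkillTriggers` shape and the function name suffix are assumptions; the real matchSkillForMessage() may differ in signature and return type.

```typescript
// Hypothetical minimal shape: the matcher only needs names and triggers.
interface SkillTriggers {
  name: string;
  triggers: string[];
}

// Return the first skill with a trigger that occurs in the message
// (case-insensitive substring), or null when nothing matches.
function matchSkillForMessageSketch(
  message: string,
  skills: SkillTriggers[],
): SkillTriggers | null {
  const haystack = message.toLowerCase();
  for (const skill of skills) {
    for (const trigger of skill.triggers) {
      const needle = trigger.trim().toLowerCase();
      // An empty trigger would match every message, so skip it.
      if (needle === "") continue;
      if (haystack.includes(needle)) return skill;
    }
  }
  return null;
}
```

Because the loop returns on the first hit, precedence follows catalog order, which matches the "first wins" case in the test list.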
The detail screens (Esc-Tab-Enter on a Mission Control card) used to
render every action body through highlightPlain, so everything came
back as undifferentiated grey, which made long AGENT.md / agent run
outputs hard to scan.

Replaced by per-shape highlighting:

- skill detail: Markdown highlighter (headings, lists, inline code,
  bold, fenced blocks). The skill body is markdown, so this matches.

- write detail: if the file has YAML frontmatter (which AGENT.md
  always does), split frontmatter and body. The frontmatter goes
  through the YAML highlighter (already existing), the body through
  the new Markdown one. Falls back to plain YAML for files without
  frontmatter.

- run detail (and the compact run card in Mission Control): a new
  highlightAgentRun() that walks the streamed output and recognises:
  · fenced ```forge:* blocks (open line orange, body via the
    matching language highlighter, JSON for forge:bash/write/etc.)
  · [forge:tool] / [/forge:tool] markers wrapping the result of the
    previous tool call (rendered dim grey so it visually recedes)
  · regular prose with inline code spans and bold
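
The frontmatter/body split used by the write detail view can be sketched as follows. `splitFrontmatter` is a hypothetical helper name, and the sketch only separates the two parts; which highlighter each part then goes through is as described in the list above.

```typescript
interface SplitFile {
  frontmatter: string | null; // null when the file has no frontmatter block
  body: string;
}

// Split a file into YAML frontmatter (between two `---` lines at the top)
// and the remaining body. Files without a complete block are returned whole.
function splitFrontmatter(text: string): SplitFile {
  const lines = text.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return { frontmatter: null, body: text };

  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) return { frontmatter: null, body: text };

  return {
    frontmatter: lines.slice(1, close).join("\n"),
    body: lines.slice(close + 1).join("\n"),
  };
}
```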

New helpers in syntax.ts :
  highlightMarkdown(text)
  highlightAgentRun(text)
  highlightYamlLine / highlightJsonLine kept exported for the run
  highlighter to delegate to.

The compact card in MissionControl now also uses highlightAgentRun
so the streaming output during a long run reads the same as the
detail view, just clipped at maxLines.
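
One way to picture the walk that highlightAgentRun() performs is a line classifier that tracks whether it is inside a fenced forge block or between tool markers. The sketch below is an assumption about the shape of that walk, not the code in syntax.ts; it assumes fences and [forge:tool] markers each sit on their own line and it stops at classification, leaving colours to the caller.

```typescript
type RunLineKind =
  | "fence-open"
  | "fence-body"
  | "fence-close"
  | "tool-marker"
  | "tool-result"
  | "prose";

interface ClassifiedLine {
  kind: RunLineKind;
  line: string;
}

// Three backticks, built at runtime so this example can sit inside a fence.
const RUN_FENCE = "`".repeat(3);

// Walk the streamed output line by line and tag each line with its shape.
function classifyAgentRunLines(text: string): ClassifiedLine[] {
  let inFence = false;
  let inTool = false;
  return text.split(/\r?\n/).map((line): ClassifiedLine => {
    const trimmed = line.trim();
    if (inFence) {
      if (trimmed === RUN_FENCE) {
        inFence = false;
        return { kind: "fence-close", line };
      }
      return { kind: "fence-body", line };
    }
    if (trimmed.startsWith(RUN_FENCE + "forge:")) {
      inFence = true;
      return { kind: "fence-open", line };
    }
    if (trimmed === "[forge:tool]") {
      inTool = true;
      return { kind: "tool-marker", line };
    }
    if (trimmed === "[/forge:tool]") {
      inTool = false;
      return { kind: "tool-marker", line };
    }
    return { kind: inTool ? "tool-result" : "prose", line };
  });
}
```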
When several agents run in a session, the panel was stacking 6+
fully-expanded cards and overflowing the terminal. Two changes:

1. Compact mode by default. Each non-focused card now renders as a
   single line: badge + verb + target. Borders disappear, the
   terminal stays calm. The focused card expands to its full preview
   like before. Cards in 'running' status stay expanded too, so a
   streaming agent run remains visible without having to Tab to it.

2. Bounded viewport. Mission Control now takes a panelHeight prop
   (computed by App from the terminal rows minus a Welcome floor
   and a spacer) and slices the action list to fit. Truncated
   actions show as "↑ N above / ↓ N below" hints in the panel
   header. Welcome stays glued to the bottom with flexShrink=0, so
   the panel is what gives way on small terminals.
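
The bounded viewport in change 2 amounts to slicing the action list and counting what falls outside the slice. The sketch below simplifies by treating every action as one row, whereas the real panel mixes compact and expanded cards; `sliceViewport` and its parameters are hypothetical names.

```typescript
interface Viewport<T> {
  visible: T[];
  above: number; // actions hidden before the window ("↑ N above")
  below: number; // actions hidden after the window ("↓ N below")
}

// Slice the action list to at most `maxVisible` entries starting at
// `scrollTop`, clamping the offset so the window never runs past the end.
function sliceViewport<T>(actions: T[], scrollTop: number, maxVisible: number): Viewport<T> {
  const capacity = Math.max(0, Math.floor(maxVisible));
  const maxTop = Math.max(0, actions.length - capacity);
  const top = Math.min(Math.max(0, scrollTop), maxTop);
  const visible = actions.slice(top, top + capacity);
  return {
    visible,
    above: top,
    below: actions.length - top - visible.length,
  };
}
```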

useCardFocus extended:
  - scrollTop: action-index offset, advanced via PgUp/PgDn;
  - auto-focus the last new arrival when nothing is focused (the
    user immediately sees what the builder just produced);
  - auto-scroll lower bound: focusing an action above scrollTop
    bumps scrollTop down to keep it visible.
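
The three useCardFocus rules above reduce to small pure functions, sketched here outside of React. All names are hypothetical, and the page size is assumed to be the number of actions that fit in the panel.

```typescript
// PgUp/PgDn: move the offset by `delta`, clamped to the valid range.
function pageScroll(scrollTop: number, delta: number, total: number, pageSize: number): number {
  const maxTop = Math.max(0, total - pageSize);
  return Math.min(Math.max(0, scrollTop + delta), maxTop);
}

// Auto-focus: when nothing is focused, focus the most recent action.
function autoFocusLatest(focusedIndex: number | null, total: number): number | null {
  if (focusedIndex !== null || total === 0) return focusedIndex;
  return total - 1;
}

// Auto-scroll lower bound: if the focused action sits above the window,
// pull scrollTop back to it so it stays visible.
function scrollForFocus(scrollTop: number, focusedIndex: number | null): number {
  if (focusedIndex === null) return scrollTop;
  return Math.min(scrollTop, focusedIndex);
}
```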

App now routes PgUp/PgDn to the Mission Control scroll when there
are actions and the prompt is empty (or a card is focused). It
keeps falling back to the chat transcript scroll otherwise. Tab,
Shift+Tab, Enter, Esc unchanged.
Status badge moves to P6 done. Roadmap table lifts P4 and P6 to
✅, points to P5 (hardened sandbox + persistent agents +
artifact extraction) as the next milestone.

Root README (EN/FR) gains:
  - Native tools section: six-tool table (Bash, FileWrite,
    FileRead, FileEdit, Grep, Glob) with their tags and limits;
    a short note explaining the choice of a text-structured
    forge:* protocol over OpenAI tool_calls.
  - Skills section: SKILL.md format, two sources (built-in and
    ~/.agent-forge/skills/), server-side matcher, two-call
    runner, scaffold-and-run as the first built-in.
  - Mission Control keyboard cheatsheet: Tab / Enter / Esc /
    PgUp/PgDn / Ctrl+E.
  - /skills slash command listed.
  - Architecture diagram updated: skill catalog + runner on the
    host side, /workspace mount + tool loop on the container
    side, persistence of the workspace dir after exit.
  - Repo structure shows packages/core/src/builder/skills/,
    runtime/src/tool-protocol.ts, and the runtime/ subdir under
    tools-core.

Sub-package READMEs realigned:
  - packages/cli: compact / expanded card mode, scrollable
    viewport, focus + auto-scroll, detail view, full keyboard
    map, mention of server-side skill dispatch.
  - packages/core: skill catalog / matcher / runner files
    listed, scaffold-and-run noted as built-in.
  - packages/runtime: multi-turn tool loop documented, six
    forge:* tags, [forge:tool] markers on stdout, FORGE_MAX_TOKENS
    env var.
  - packages/tools-core: separate "host tools" and "runtime
    tools" sections; six runtime tools with their constraints;
    test layout listed.
@garniergeorges garniergeorges merged commit 9d22fca into dev Apr 27, 2026
1 check passed
@garniergeorges garniergeorges deleted the feat/p6-skills branch April 27, 2026 16:02