Agentic Engineering — Documentation-First Development

Eight skills that give coding agents persistent memory, zero-ambiguity contracts, a git contract for how code lands, and an independent definition of "done" — the harness layer for long-running, loop-driven development.

The Core Problem

AI agents are stateless. Every new session starts from zero, knowing none of your naming conventions, error-handling patterns, or architectural constraints. Without structured documentation, the agent guesses, causing context collapse: code that runs but is architecturally incoherent.

The most effective agentic coding workflow starts before the first prompt — with a written spec the agent builds to, not assumptions it makes on its own. — Blink

What this suite solves:

Agent produces code that contradicts existing architecture → SPEC provides the contract
Agent makes decisions you already resolved → ADR preserves decision history
New session has no idea what happened last time → STATUS provides memory
Agent builds the wrong thing → PRD defines what and why
Agent does tasks in the wrong order or at the wrong scope → IMPL PLAN defines the sequence
Agent claims "done" when it isn't → VERIFICATION makes done a verdict, not a claim
Code lands as untraceable commits on main → GIT WORKFLOW ships each task as an evidence-carrying PR
Crash mid-session loses all progress → STATUS checkpoints at task granularity
Tools each demand their own config → AGENTS.md is the single entry point

How the Skills Flow

ADRs can be triggered at any point along the chain — whenever a decision fork appears in PRD, SPEC, or IMPL PLAN work. VERIFY loops every task: build → independent verdict → only PASS marks ✅. STATUS loops every session forever. Code travels per GIT WORKFLOW: a branch per task, a draft PR carrying the done condition, ready only on PASS, merged only by a human.

The Skills

Each skill is a directory containing a SKILL.md (the shared format used by Claude Code and Codex).

1. agents-md-template — The Door · bootstraps AGENTS.md as the single source of truth every tool points to

Bootstraps a repo with AGENTS.md as the single source of truth every agent reads before doing anything. Produces AGENTS.md + pointer files.

Mechanism	What it does
One door	`CLAUDE.md`, `copilot-instructions.md`, `.cursor/rules/` are one-line pointers to `AGENTS.md` — every tool gets identical instructions, zero config drift
Six-section anatomy	Project identity → documentation chain → coding conventions → architecture constraints → boundaries → verification
Session protocol	Read memory → code by contract → checkpoint per task → independent verification → handoff. The agent's standing orders for every session
Explicit boundaries	A "must NOT" list: no self-grading, no deleting tests to go green, no redefining "done" mid-task, no relitigating locked decisions
Three-layer verification	Producer self-check (necessary, never sufficient) → independent gate (fresh-context PASS required) → human gate (read the diff before merge)
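A minimal sketch of the "one door" idea, assuming each pointer file only needs to tell its tool to read `AGENTS.md`. The pointer wording, the Cursor rule filename `agents.mdc`, and its frontmatter are illustrative assumptions; `.github/copilot-instructions.md` is where GitHub Copilot looks for repository instructions. The skill itself may write different files or text.

```shell
#!/usr/bin/env bash
# Sketch: write one-line pointer files so every tool reads AGENTS.md.
# Pointer text and the Cursor rule filename are illustrative assumptions.
write_pointers() {
  local repo="${1:-.}"
  local line='Read AGENTS.md at the repository root and follow it exactly.'
  mkdir -p "$repo/.github" "$repo/.cursor/rules"
  printf '%s\n' "$line" > "$repo/CLAUDE.md"
  printf '%s\n' "$line" > "$repo/.github/copilot-instructions.md"
  # Cursor project rules are .mdc files; alwaysApply loads the rule every time.
  cat > "$repo/.cursor/rules/agents.mdc" <<EOF
---
description: Project instructions live in AGENTS.md
alwaysApply: true
---
$line
EOF
}
```

Run it once from the repository root (`write_pointers .`); from then on only `AGENTS.md` is edited, so the tools cannot drift apart.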

2. project-kickoff-prd — The Why · phased dialogue that turns a rough idea into a prioritized PRD

Turns a rough idea into a structured PRD through phased dialogue — the agent is a co-creator, not a note-taker. Produces docs/prd.md.

Mechanism	What it does
Co-creator stance	The agent challenges weak assumptions, surfaces forgotten edge cases, and makes recommendations — it never just transcribes
Five-phase dialogue	Problem & vision → features & boundaries → technical direction → extensibility → structured PRD. One phase at a time, confirmation gate between each
Priority triage	Every feature lands in P0 (product fails without it) / P1 (ship a week later) / P2 (defer), with explicit out-of-scope and the reasons why
Convergence pressure	2–4 exchanges per phase, never a 20-question interview; vague scope ("it should handle everything") is rejected with "give me three examples"
ADR candidate flagging	Consequential choices made mid-dialogue are flagged for capture before they silently become architecture

3. technical-specification — The Contract · zero-ambiguity specs with machine-verifiable acceptance criteria

Hardens the PRD into implementation contracts any agent can build from with zero prior context and zero follow-up questions. Produces docs/spec/[domain].md, one per bounded context.

Mechanism	What it does
Zero-ambiguity rule	Every number concrete, every boundary has a violation response, "etc." is banned, "should/may" become "must/must not"
Bounded-context split	One spec per domain; a spec past 500 lines is covering too much and gets split
Six-phase coverage	Data model (with invariants) → API contracts (rate limits, idempotency, retries) → state machines (invalid transitions defined) → error taxonomy → non-functional requirements → acceptance criteria
Machine-verifiable criteria	Every acceptance criterion binds `Verify via` (the exact command) and `Evidence` (the output that proves it); checks that need human judgment are explicitly marked `MANUAL`, and treated as a smell
Immutable once approved	Changes require a new version, a changelog, and the ADR that motivated them — the contract can't drift under the implementer
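The 500-line split rule is easy to check mechanically. A minimal sketch, assuming specs live directly under `docs/spec/` as `*.md` files; the function name and message format are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag specs that exceed the 500-line bounded-context budget.
# Prints each oversized spec and returns 1 if any were found.
check_spec_sizes() {
  local dir="${1:-docs/spec}" limit="${2:-500}" status=0 f n
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue              # no specs yet: nothing to check
    n=$(( $(wc -l < "$f") ))             # arithmetic strips wc's padding
    if [ "$n" -gt "$limit" ]; then
      echo "SPLIT: $f has $n lines (limit $limit)"
      status=1
    fi
  done
  return "$status"
}
```

Calling `check_spec_sizes` with no arguments checks `docs/spec` against 500 lines, which makes it usable as a pre-commit or CI step.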

4. architecture-decision-record — The Law · append-only decision history with trade-offs and review triggers

Captures every decision fork with context, options, and trade-offs — append-only, written for the reader six months from now. Produces docs/adr/ADR-NNN-[slug].md.

Mechanism	What it does
Fork detection	The agent recognizes "X or Y?" moments — including implicit ones ("let's just use X") — and proposes capturing them before they evaporate
Append-only history	Accepted ADRs are never edited; they're superseded by new ones. History stays honest, numbers are never reused
Fair options analysis	Rejected options keep their real advantages on record — no strawmanning; future readers may revisit them when context changes
Reversibility + review triggers	Every decision records how hard it is to undo (Easy → Irreversible) and the concrete conditions for reconsidering it
Anti-relitigation	Locked decisions are quick-referenced in `AGENTS.md` and STATUS — agents don't reopen them without the user
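Because ADR numbers are never reused, the next number is the highest existing one plus one, with superseded ADRs still counted. A minimal sketch, assuming the `ADR-NNN-[slug].md` naming above with exactly three digits; `next_adr_number` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: compute the next ADR number from docs/adr/ADR-NNN-*.md.
# Superseded ADRs stay on disk, so they still count toward the maximum.
next_adr_number() {
  local dir="${1:-docs/adr}" max=0 f n
  for f in "$dir"/ADR-[0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue
    n=${f##*/ADR-}                              # drop directory and "ADR-"
    n=${n%%-*}                                  # keep the three digits
    n=$(printf '%s' "$n" | sed 's/^0*//')       # "008" -> "8" for arithmetic
    n=${n:-0}
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  printf 'ADR-%03d\n' "$((max + 1))"
}
```

With `ADR-001` through `ADR-003` on disk, `next_adr_number docs/adr` prints `ADR-004`.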

5. implementation-plan — The Sequence · dependency-ordered atomic tasks with locked done conditions

Decomposes approved specs into dependency-ordered atomic tasks any agent can execute in a single session. Produces docs/plans/implementation-plan.md.

Mechanism	What it does
Atomic tasks	Self-contained, ≤ 90 minutes, ≤ 5 files, with named file paths and spec references — zero reading between the lines
Dependency order, not importance	The flashy feature waits for its boring infrastructure; every milestone ends in demonstrable state you can run, show, or prove
Locked done conditions	Each task's acceptance criteria bind executable commands and are frozen at task start — the executing agent cannot weaken them mid-run
Done is a verdict	A task completes when the independent verifier returns PASS — never when the producer claims it
Worktree isolation	Parallel tasks get parallel git worktrees, merging through the verification gate; review bandwidth, not git, is the real parallelism ceiling
Risk register + rollback points	Risks acknowledged before they become surprises; milestones double as safe pause-or-pivot points
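The five-file limit can be checked against the task branch before it goes to verification. A minimal sketch, assuming the task branch was cut from `main`; `check_file_budget` is a hypothetical helper that reads changed paths from standard input, so it can be fed by `git diff --name-only`.

```shell
#!/usr/bin/env bash
# Sketch: enforce the atomic-task file budget (default 5) on a task branch.
# Reads changed file paths on stdin, one per line.
# Usage: git diff --name-only main...HEAD | check_file_budget 5
check_file_budget() {
  local limit="${1:-5}" count
  count=$(grep -c . || true)    # non-empty lines; grep -c prints 0 on no match
  if [ "$count" -gt "$limit" ]; then
    echo "TOO BIG: $count files changed (budget $limit); split the task"
    return 1
  fi
  echo "OK: $count files changed (budget $limit)"
}
```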

6. independent-verification — The Gate · fresh-context verifier turns "done" into an evidence-backed verdict

The maker-checker gate: a verifier with fresh context executes the done condition as written and returns a verdict with evidence. Produces docs/verification-log.md.

Mechanism	What it does
Role separation	The producer builds and self-checks; a separate verifier judges. The model that wrote the code never grades it
Fresh-context requirement	The verifier runs as a sub-agent or fresh session and never sees the producer's conversation — inherited reasoning means inherited blind spots
Evidence-based verdicts	Binary PASS/FAIL per criterion with commands, exit codes, and key output — "looks good" is not a verdict
Test ratchet	The suite count only goes up; a deleted or skipped test is an automatic FAIL regardless of everything else
Circuit breaker	Three consecutive FAILs on one task stops the loop and escalates to the user — no infinite fix-verify ping-pong
Human gate	A machine PASS is necessary, not sufficient: a human reads the diff and can explain it before merge
Cost discipline	Dual-opinion where stakes justify it — always for main, data, auth, money; a cheaper model is fine for mechanical checks
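The evidence and ratchet rules can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the plugin's verifier: `verify_criterion` (hypothetical) runs one criterion's `Verify via` command and prints the verdict, exit code, and tail of the output; `ratchet_check` (hypothetical) compares test counts you collect yourself, for example from your test runner's summary line. Treating an unchanged count as passing is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: run one acceptance criterion's command and emit an evidence line
# (verdict, exit code, last 3 lines of output). Format is illustrative.
verify_criterion() {
  local id="$1"; shift
  local out rc verdict
  rc=0
  out=$("$@" 2>&1) || rc=$?
  if [ "$rc" -eq 0 ]; then verdict=PASS; else verdict=FAIL; fi
  printf '%s %s (exit %d): %s\n' "$verdict" "$id" "$rc" \
    "$(printf '%s\n' "$out" | tail -n 3 | tr '\n' ' ')"
  [ "$verdict" = PASS ]
}

# Test ratchet: the test count may never go down between runs.
ratchet_check() {
  local before="$1" after="$2"
  if [ "$after" -lt "$before" ]; then
    echo "FAIL ratchet: test count dropped from $before to $after"
    return 1
  fi
  echo "PASS ratchet: $before -> $after tests"
}
```

A criterion would then be checked as, say, `verify_criterion AC-3 pytest -q tests/test_auth.py`, where the criterion ID and test path are hypothetical; the printed line is the kind of evidence "looks good" can never be.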

7. status-tracker — The Memory · live progress, in-run checkpoints, crash recovery, session handoffs

The bridge between sessions: live progress, in-run checkpoints, and an append-only handoff log. Produces docs/status.md.

Mechanism	What it does
Session handoff log	Append-only, newest first: what was built, how it connects, what was verified (exact counts), caveats, and the literal next action
In-flight checkpoint	Updated at task granularity during the session — a crash costs at most one task, never the session
Crash detection	A checkpoint that isn't `none` at session start means the last session died mid-work; the checkpoint is the recovery point
Four-state lifecycle	⬜ not started → 🟡 in progress → 🔍 built, awaiting verification → ✅ verified done. ✅ requires a verifier verdict, not a producer claim
Numbers discipline	"232/232 pytest", named files, named components — vague entries ("made progress") are banned
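Crash detection reduces to one check at session start. A minimal sketch, assuming (hypothetically) that `docs/status.md` carries a single `Checkpoint: <value>` line; the real template's layout may differ. `set_checkpoint` rewrites that line after each task, so a crash loses at most the task in progress.

```shell
#!/usr/bin/env bash
# Sketch: assumes docs/status.md has one line "Checkpoint: <value>".
read_checkpoint() {
  local file="${1:-docs/status.md}"
  sed -n 's/^Checkpoint:[[:space:]]*//p' "$file" | head -n 1
}

# At session start: anything other than "none" means the last session died.
session_start() {
  local file="${1:-docs/status.md}" cp
  cp=$(read_checkpoint "$file")
  if [ -z "$cp" ] || [ "$cp" = none ]; then
    echo "clean start"
  else
    echo "crash detected: resume from checkpoint '$cp'"
  fi
}

# Rewrite the checkpoint line in place (cat keeps the file's permissions).
set_checkpoint() {
  local file="$1" value="$2" tmp
  tmp=$(mktemp)
  # "|" is the sed delimiter, so value must not contain "|", "&" or "\".
  sed "s|^Checkpoint:.*|Checkpoint: $value|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}
```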

8. git-workflow — The Road · branch per task, evidence-carrying draft PRs, merge only through both gates

Defines how code actually lands: one branch and one PR per task, verification evidence in the PR, merge gated by verdict and human. Produces task branches + draft PRs.

Mechanism	What it does
Protected main	main is read-only for agents — no exceptions, including "trivial" fixes
One task, one branch, one PR	`task/<task-id>-<slug>` named from plan task IDs; traceability runs diff → PR → task → spec
Evidence-carrying PRs	The PR description holds the done condition and the verdict reference — review starts from the contract, not the raw diff
Draft until PASS	PRs are born draft; ready-for-review is earned by the verifier's PASS, never by the producer's confidence
Double-gated merge	Verifier PASS + a human who read the diff; the agent never merges its own PR — in unattended loops, merge is always the human's move
Evidence integrity	No history rewriting once verification starts; branches deleted only after merge
Worktree isolation	Parallel agents get one worktree each — mechanical conflicts solved, human review bandwidth still the ceiling
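Branch naming is mechanical enough to script. A minimal sketch; the slug rules (lowercase, runs of other characters collapsed to `-`) are an assumption, not the skill's stated convention. The commented usage relies on standard `git` and GitHub CLI (`gh`) commands; `pr-body.md` is a hypothetical file holding the done condition and the verdict reference. There is deliberately no merge step.

```shell
#!/usr/bin/env bash
# Sketch: derive "task/<task-id>-<slug>" from a plan task ID and title.
task_branch() {
  local id="$1" title="$2" slug
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  printf 'task/%s-%s\n' "$id" "$slug"
}

# Usage (requires git and gh; shown as comments):
#   branch=$(task_branch M1-T1 "Add user table")
#   git switch -c "$branch" main
#   git push -u origin "$branch"
#   gh pr create --draft --base main --head "$branch" \
#     --title "M1-T1: Add user table" --body-file pr-body.md
#   gh pr ready    # only after the verifier returns PASS
```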

Quick Start

Claude Code — plugin marketplace (recommended):

/plugin marketplace add affectionatec/agentic-engineering
/plugin install agentic-engineering@agentic-engineering

Then, in any project:

/using-agentic-engineering — the entry point: assesses which chain documents exist, reports where the project stands, and routes you to the right skill
/run-loop M2 — the chain-aware loop driver: branch → build → draft PR → dispatch the verifier → checkpoint → next task. Stops on the circuit breaker (3 FAILs) or the task budget, and never merges
A ready-made verifier sub-agent ships with the plugin (@agent-agentic-engineering:verifier) — fresh context by construction, writes nothing but the verification log
The eight skills auto-trigger from their frontmatter descriptions ("write the spec", "where are we", "verify this task") — or invoke one explicitly: /agentic-engineering:independent-verification, or just say "use the project-kickoff-prd skill: I want to build …"

Manual install — personal symlinks · single project · Codex · Cursor / Copilot / any agent

Claude Code — personal, all projects (no plugin system needed):

git clone https://github.com/affectionatec/agentic-engineering.git ~/src/agentic-engineering

mkdir -p ~/.claude/skills
for skill in agents-md-template architecture-decision-record git-workflow \
             implementation-plan independent-verification project-kickoff-prd \
             status-tracker technical-specification; do
  ln -s "$HOME/src/agentic-engineering/skills/$skill" "$HOME/.claude/skills/$skill"
done

Claude Code — single project:

git clone https://github.com/affectionatec/agentic-engineering.git
mkdir -p your-project/.claude/skills
cp -R agentic-engineering/skills/* your-project/.claude/skills/

Codex: same SKILL.md directory format — copy the directories under skills/ into your skills location and invoke with $ or /skills.

Any other agent (Cursor, Copilot, Windsurf, …): the skills are plain markdown playbooks, and the documents they produce (the AGENTS.md chain) are tool-agnostic by design. Even without native skill support, use a SKILL.md as a rules file or system prompt — and every tool reads the same AGENTS.md through its pointer file.

Your First Project

Step	You say	Skill that fires	You get
0	"Set up AGENTS.md for this repo"	agents-md-template	`AGENTS.md` + one-line pointer files for every tool
1	"Let's kick off: I want to build X"	project-kickoff-prd	`docs/prd.md` after a phased dialogue
2	"Write the specs"	technical-specification	`docs/spec/*.md`, one per domain
3	"Should we use X or Y?" (any decision fork, any time)	architecture-decision-record	`docs/adr/ADR-NNN-*.md`
4	"Break this into tasks"	implementation-plan	`docs/plans/implementation-plan.md`
5	"Pick up the next task" (every session)	status-tracker	Briefing from `docs/status.md`, work resumes where it left off
6	"Verify M1-T1" (dispatches the bundled verifier)	independent-verification	PASS/FAIL verdict with evidence in `docs/verification-log.md`
7	"/run-loop M1" (unattended)	git-workflow + verifier agent	A queue of verified draft→ready PRs — merging stays yours
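A rough stand-in for the first thing `/using-agentic-engineering` does: listing which chain documents exist. This sketch only checks the paths from the table above and is not the command's actual logic; `chain_report` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: report which chain documents exist, in chain order.
chain_report() {
  local root="${1:-.}" p
  for p in AGENTS.md docs/prd.md docs/spec docs/adr \
           docs/plans/implementation-plan.md docs/status.md \
           docs/verification-log.md; do
    if [ -e "$root/$p" ]; then echo "present  $p"; else echo "missing  $p"; fi
  done
}
```

The first `missing` line in chain order is a reasonable hint for which skill to run next.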

Where This Sits — The Harness Layer

Loop engineering sits one floor above the harness: automations find the work, worktrees isolate it, sub-agents check it, and the loop feeds itself. This suite is deliberately the floor below — the memory, contract, and verification substrate that any loop consumes. It is product-agnostic: the same documentation chain serves Claude Code (/loop, scheduled tasks), Codex (Automations, /goal), a Ralph-loop bash script, or a human driving sessions by hand. It also ships the smallest possible loop of its own — /run-loop drives the chain task-by-task while leaving every merge to a human.

How the chain maps to the convergent primitives of long-running agents:

Long-running agent primitive	Where this suite provides it
External completion criteria — "done" defined before work starts	SPEC acceptance criteria (with verification commands) + IMPL PLAN locked done conditions
Persistent state outside the context window	STATUS handoff log + In-Flight Checkpoint
Independent evaluator — maker-checker separation	independent-verification + `docs/verification-log.md`
Checkpoint cadence — every N work units, not only at the end	STATUS checkpoint protocol (task granularity)
Project knowledge that survives sessions	AGENTS.md single source of truth
Decision history that can't be silently rewritten	ADR (append-only, supersede-only)
Worktrees — parallel agent isolation	git-workflow: one branch + PR per task, one worktree per agent
Sub-agents — producer vs. checker	the bundled `verifier` agent: fresh context by construction
Automations — run until done	`/run-loop`: chain-aware driver with circuit breaker; merging stays human

Loop Safety — Non-Negotiables When a Loop Drives This Chain

Circuit breaker — three consecutive verification FAILs on one task stops the loop and escalates to a human. No infinite fix-verify ping-pong.
Test ratchet — the suite count only goes up. Deleting or skipping tests to go green is an automatic FAIL.
Locked done conditions — the executing agent can never weaken acceptance criteria mid-run; changes require the user (and an ADR if architectural).
Human gate — a machine PASS is necessary, not sufficient: a human reads the diff and can explain it before merge. A loop that outruns your comprehension is accumulating comprehension debt, not velocity.
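A minimal sketch of the circuit breaker inside a driver loop. `build_task` and `verify_task` are hypothetical hooks you would supply (for example, a producer session and a fresh-context verifier), and `verify_task` must exit 0 only on PASS. On PASS the function stops at "ready for review"; merging is left to a human, as the human gate requires.

```shell
#!/usr/bin/env bash
# Sketch: retry one task until PASS, stopping after max_fails consecutive
# FAILs. build_task/verify_task are hypothetical hooks; nothing is merged.
run_task_with_breaker() {
  local task="$1" max_fails="${2:-3}" fails=0
  while [ "$fails" -lt "$max_fails" ]; do
    build_task "$task" || true      # a broken build is judged by the verifier
    if verify_task "$task"; then
      echo "PASS $task after $fails FAIL(s): ready for human review"
      return 0
    fi
    fails=$((fails + 1))
    echo "FAIL $task ($fails/$max_fails)"
  done
  echo "circuit breaker: $task failed $max_fails times in a row; escalating to a human"
  return 2
}
```

Because the function returns on the first PASS, every FAIL it counts is consecutive, and the distinct exit code 2 lets an outer loop tell "escalate" apart from an ordinary error.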

Inspired By

This suite stands on published patterns and prior art:

Source	What this suite takes from it
Long-running Agents — Addy Osmani	External completion criteria, checkpoint cadence, maker-checker separation, "The agent forgets. The repo doesn't."
Loop Engineering — Addy Osmani	The harness/loop layering, producer-vs-checker sub-agents, comprehension-debt and cognitive-surrender guardrails
agent-skills — Addy Osmani	`SKILL.md` anatomy and the principle that verification is non-negotiable — every skill ends with evidence requirements
Agent Harness Engineering — Addy Osmani	The harness concept this suite implements
The Ralph Loop — Geoffrey Huntley	Progress lives in files and git history, never in the context window
Anthropic Engineering	Context rot, the test ratchet, brain/hands/session separation
Blink — Agentic Coding Best Practices	Start with a written spec the agent builds to, not assumptions it makes
substratia.io	`AGENTS.md` as the single source of truth all tool configs point to
Michael Nygard — Documenting Architecture Decisions	The original ADR format and its append-only discipline
Matt Pocock Skills · Superpowers	Skill-suite structure and developer-workflow references
Karpathy-Inspired Guidelines · Claude Code Best Practice	Behavioral guardrails for LLM-driven coding

One-line summary: AGENTS.md is the door, STATUS.md is memory, SPEC is the contract, ADR is the law, VERIFY is the gate, and GIT WORKFLOW is the road through it. Agent enters → reads memory → works by contract → doesn't break the law → ships every task down the road → and never grades its own homework.

Build the harness. Externalize the state. Separate verification. Then — go read what your agent wrote. — Addy Osmani

License

MIT — use it, fork it, adapt it to your own harness.
