Eight skills that give coding agents persistent memory, zero-ambiguity contracts, a git contract for how code lands, and an independent definition of "done" — the harness layer for long-running, loop-driven development.
AI Agents are stateless. Every new session starts from zero — your naming conventions, error handling patterns, architectural constraints are unknown. Without structured documentation, the Agent guesses, causing context collapse: code that runs but is architecturally incoherent.
The most effective agentic coding workflow starts before the first prompt — with a written spec the agent builds to, not assumptions it makes on its own. — Blink
What this suite solves:
- Agent produces code that contradicts existing architecture → SPEC provides the contract
- Agent makes decisions you already resolved → ADR preserves decision history
- New session has no idea what happened last time → STATUS provides memory
- Agent builds the wrong thing → PRD defines what and why
- Agent does tasks in wrong order or scope → IMPL PLAN defines the sequence
- Agent claims "done" when it isn't → VERIFICATION makes done a verdict, not a claim
- Code lands as untraceable commits on main → GIT WORKFLOW ships each task as an evidence-carrying PR
- Crash mid-session loses all progress → STATUS checkpoints at task granularity
- Tools each demand their own config → AGENTS.md is the single entry point
ADRs can be triggered at any point along the chain — whenever a decision fork appears in PRD, SPEC, or IMPL PLAN work. VERIFY loops every task: build → independent verdict → only PASS marks ✅. STATUS loops every session forever. Code travels per GIT WORKFLOW: a branch per task, a draft PR carrying the done condition, ready only on PASS, merged only by a human.
Each skill is a directory containing a SKILL.md (the shared format used by Claude Code and Codex). Click any skill for its core mechanisms.
1. agents-md-template — The Door · bootstraps AGENTS.md as the single source of truth every tool points to
Bootstraps a repo with AGENTS.md as the single source of truth every agent reads before doing anything. Produces AGENTS.md + pointer files.
| Mechanism | What it does |
|---|---|
| One door | CLAUDE.md, copilot-instructions.md, .cursor/rules/ are one-line pointers to AGENTS.md — every tool gets identical instructions, zero config drift |
| Six-section anatomy | Project identity → documentation chain → coding conventions → architecture constraints → boundaries → verification |
| Session protocol | Read memory → code by contract → checkpoint per task → independent verification → handoff. The agent's standing orders for every session |
| Explicit boundaries | A "must NOT" list: no self-grading, no deleting tests to go green, no redefining "done" mid-task, no relitigating locked decisions |
| Three-layer verification | Producer self-check (necessary, never sufficient) → independent gate (fresh-context PASS required) → human gate (read the diff before merge) |
2. project-kickoff-prd — The Why · phased dialogue that turns a rough idea into a prioritized PRD
Turns a rough idea into a structured PRD through phased dialogue — the agent is a co-creator, not a note-taker. Produces docs/prd.md.
| Mechanism | What it does |
|---|---|
| Co-creator stance | The agent challenges weak assumptions, surfaces forgotten edge cases, and makes recommendations — it never just transcribes |
| Five-phase dialogue | Problem & vision → features & boundaries → technical direction → extensibility → structured PRD. One phase at a time, confirmation gate between each |
| Priority triage | Every feature lands in P0 (product fails without it) / P1 (ship a week later) / P2 (defer), with explicit out-of-scope and the reasons why |
| Convergence pressure | 2–4 exchanges per phase, never a 20-question interview; vague scope ("it should handle everything") is rejected with "give me three examples" |
| ADR candidate flagging | Consequential choices made mid-dialogue are flagged for capture before they silently become architecture |
3. technical-specification — The Contract · zero-ambiguity specs with machine-verifiable acceptance criteria
Hardens the PRD into implementation contracts any agent can build from with zero prior context and zero follow-up questions. Produces docs/spec/[domain].md, one per bounded context.
| Mechanism | What it does |
|---|---|
| Zero-ambiguity rule | Every number concrete, every boundary has a violation response, "etc." is banned, "should/may" become "must/must not" |
| Bounded-context split | One spec per domain; a spec past 500 lines is covering too much and gets split |
| Six-phase coverage | Data model (with invariants) → API contracts (rate limits, idempotency, retries) → state machines (invalid transitions defined) → error taxonomy → non-functional requirements → acceptance criteria |
| Machine-verifiable criteria | Every acceptance criterion binds Verify via (the exact command) + Evidence (what output proves it); human-judgment checks are explicitly MANUAL — and treated as a smell |
| Immutable once approved | Changes require a new version, a changelog, and the ADR that motivated them — the contract can't drift under the implementer |
4. architecture-decision-record — The Law · append-only decision history with trade-offs and review triggers
Captures every decision fork with context, options, and trade-offs — append-only, written for the reader six months from now. Produces docs/adr/ADR-NNN-[slug].md.
| Mechanism | What it does |
|---|---|
| Fork detection | The agent recognizes "X or Y?" moments — including implicit ones ("let's just use X") — and proposes capturing them before they evaporate |
| Append-only history | Accepted ADRs are never edited; they're superseded by new ones. History stays honest, numbers are never reused |
| Fair options analysis | Rejected options keep their real advantages on record — no strawmanning; future readers may revisit them when context changes |
| Reversibility + review triggers | Every decision records how hard it is to undo (Easy → Irreversible) and the concrete conditions for reconsidering it |
| Anti-relitigation | Locked decisions are quick-referenced in AGENTS.md and STATUS — agents don't reopen them without the user |
5. implementation-plan — The Sequence · dependency-ordered atomic tasks with locked done conditions
Decomposes approved specs into dependency-ordered atomic tasks any agent can execute in a single session. Produces docs/plans/implementation-plan.md.
| Mechanism | What it does |
|---|---|
| Atomic tasks | Self-contained, ≤ 90 minutes, ≤ 5 files, with named file paths and spec references — zero reading between the lines |
| Dependency order, not importance | The flashy feature waits for its boring infrastructure; every milestone ends in demonstrable state you can run, show, or prove |
| Locked done conditions | Each task's acceptance criteria bind executable commands and are frozen at task start — the executing agent cannot weaken them mid-run |
| Done is a verdict | A task completes when the independent verifier returns PASS — never when the producer claims it |
| Worktree isolation | Parallel tasks get parallel git worktrees, merging through the verification gate; review bandwidth, not git, is the real parallelism ceiling |
| Risk register + rollback points | Risks acknowledged before they become surprises; milestones double as safe pause-or-pivot points |
6. independent-verification — The Gate · fresh-context verifier turns "done" into an evidence-backed verdict
The maker-checker gate: a verifier with fresh context executes the done condition as written and returns a verdict with evidence. Produces docs/verification-log.md.
| Mechanism | What it does |
|---|---|
| Role separation | The producer builds and self-checks; a separate verifier judges. The model that wrote the code never grades it |
| Fresh-context requirement | The verifier runs as a sub-agent or fresh session and never sees the producer's conversation — inherited reasoning means inherited blind spots |
| Evidence-based verdicts | Binary PASS/FAIL per criterion with commands, exit codes, and key output — "looks good" is not a verdict |
| Test ratchet | The suite count only goes up; a deleted or skipped test is an automatic FAIL regardless of everything else |
| Circuit breaker | Three consecutive FAILs on one task stops the loop and escalates to the user — no infinite fix-verify ping-pong |
| Human gate | A machine PASS is necessary, not sufficient: a human reads the diff and can explain it before merge |
| Cost discipline | Dual-opinion where stakes justify it — always for main, data, auth, money; a cheaper model is fine for mechanical checks |
7. status-tracker — The Memory · live progress, in-run checkpoints, crash recovery, session handoffs
The bridge between sessions: live progress, in-run checkpoints, and an append-only handoff log. Produces docs/status.md.
| Mechanism | What it does |
|---|---|
| Session handoff log | Append-only, newest first: what was built, how it connects, what was verified (exact counts), caveats, and the literal next action |
| In-flight checkpoint | Updated at task granularity during the session — a crash costs at most one task, never the session |
| Crash detection | A checkpoint that isn't none at session start means the last session died mid-work; the checkpoint is the recovery point |
| Four-state lifecycle | ⬜ not started → 🟡 in progress → 🔍 built, awaiting verification → ✅ verified done. ✅ requires a verifier verdict, not a producer claim |
| Numbers discipline | "232/232 pytest", named files, named components — vague entries ("made progress") are banned |
8. git-workflow — The Road · branch per task, evidence-carrying draft PRs, merge only through both gates
Defines how code actually lands: one branch and one PR per task, verification evidence in the PR, merge gated by verdict and human. Produces task branches + draft PRs.
| Mechanism | What it does |
|---|---|
| Protected main | main is read-only for agents — no exceptions, including "trivial" fixes |
| One task, one branch, one PR | task/<task-id>-<slug> named from plan task IDs; traceability runs diff → PR → task → spec |
| Evidence-carrying PRs | The PR description holds the done condition and the verdict reference — review starts from the contract, not the raw diff |
| Draft until PASS | PRs are born draft; ready-for-review is earned by the verifier's PASS, never by the producer's confidence |
| Double-gated merge | Verifier PASS + a human who read the diff; the agent never merges its own PR — in unattended loops, merge is always the human's move |
| Evidence integrity | No history rewriting once verification starts; branches deleted only after merge |
| Worktree isolation | Parallel agents get one worktree each — mechanical conflicts solved, human review bandwidth still the ceiling |
Claude Code — plugin marketplace (recommended):
/plugin marketplace add affectionatec/agentic-engineering
/plugin install agentic-engineering@agentic-engineering
Then, in any project:
/using-agentic-engineering— the entry point: assesses which chain documents exist, reports where the project stands, and routes you to the right skill/run-loop M2— the chain-aware loop driver: branch → build → draft PR → dispatch the verifier → checkpoint → next task. Stops on the circuit breaker (3 FAILs) or the task budget, and never merges- A ready-made
verifiersub-agent ships with the plugin (@agent-agentic-engineering:verifier) — fresh context by construction, writes nothing but the verification log - The eight skills auto-trigger from their frontmatter descriptions ("write the spec", "where are we", "verify this task") — or invoke one explicitly:
/agentic-engineering:independent-verification, or just say "use the project-kickoff-prd skill: I want to build …"
Manual install — personal symlinks · single project · Codex · Cursor / Copilot / any agent
Claude Code — personal, all projects (no plugin system needed):
git clone https://github.com/affectionatec/agentic-engineering.git ~/src/agentic-engineering
mkdir -p ~/.claude/skills
for skill in agents-md-template architecture-decision-record implementation-plan \
independent-verification project-kickoff-prd status-tracker \
technical-specification; do
ln -s "$HOME/src/agentic-engineering/skills/$skill" "$HOME/.claude/skills/$skill"
doneClaude Code — single project:
git clone https://github.com/affectionatec/agentic-engineering.git
mkdir -p your-project/.claude/skills
cp -r agentic-engineering/skills/*/ your-project/.claude/skills/Codex: same SKILL.md directory format — copy the directories under skills/ into your skills location and invoke with $ or /skills.
Any other agent (Cursor, Copilot, Windsurf, …): the skills are plain markdown playbooks, and the documents they produce (the AGENTS.md chain) are tool-agnostic by design. Even without native skill support, use a SKILL.md as a rules file or system prompt — and every tool reads the same AGENTS.md through its pointer file.
| Step | You say | Skill that fires | You get |
|---|---|---|---|
| 0 | "Set up AGENTS.md for this repo" | agents-md-template | AGENTS.md + one-line pointer files for every tool |
| 1 | "Let's kick off: I want to build X" | project-kickoff-prd | docs/prd.md after a phased dialogue |
| 2 | "Write the specs" | technical-specification | docs/spec/*.md, one per domain |
| 3 | "Should we use X or Y?" (any decision fork, any time) | architecture-decision-record | docs/adr/ADR-NNN-*.md |
| 4 | "Break this into tasks" | implementation-plan | docs/plans/implementation-plan.md |
| 5 | "Pick up the next task" (every session) | status-tracker | Briefing from docs/status.md, work resumes where it left off |
| 6 | "Verify M1-T1" (dispatches the bundled verifier) | independent-verification | PASS/FAIL verdict with evidence in docs/verification-log.md |
| 7 | "/run-loop M1" (unattended) | git-workflow + verifier agent | A queue of verified draft→ready PRs — merging stays yours |
Loop engineering sits one floor above the harness: automations find the work, worktrees isolate it, sub-agents check it, and the loop feeds itself. This suite is deliberately the floor below — the memory, contract, and verification substrate that any loop consumes. It is product-agnostic: the same documentation chain serves Claude Code (/loop, scheduled tasks), Codex (Automations, /goal), a Ralph-loop bash script, or a human driving sessions by hand. It also ships the smallest possible loop of its own — /run-loop drives the chain task-by-task while leaving every merge to a human.
How the chain maps to the convergent primitives of long-running agents:
| Long-running agent primitive | Where this suite provides it |
|---|---|
| External completion criteria — "done" defined before work starts | SPEC acceptance criteria (with verification commands) + IMPL PLAN locked done conditions |
| Persistent state outside the context window | STATUS handoff log + In-Flight Checkpoint |
| Independent evaluator — maker-checker separation | independent-verification + docs/verification-log.md |
| Checkpoint cadence — every N work units, not only at the end | STATUS checkpoint protocol (task granularity) |
| Project knowledge that survives sessions | AGENTS.md single source of truth |
| Decision history that can't be silently rewritten | ADR (append-only, supersede-only) |
| Worktrees — parallel agent isolation | git-workflow: one branch + PR per task, one worktree per agent |
| Sub-agents — producer vs. checker | the bundled verifier agent: fresh context by construction |
| Automations — run until done | /run-loop: chain-aware driver with circuit breaker; merging stays human |
- Circuit breaker — three consecutive verification FAILs on one task stops the loop and escalates to a human. No infinite fix-verify ping-pong.
- Test ratchet — the suite count only goes up. Deleting or skipping tests to go green is an automatic FAIL.
- Locked done conditions — the executing agent can never weaken acceptance criteria mid-run; changes require the user (and an ADR if architectural).
- Human gate — a machine PASS is necessary, not sufficient: a human reads the diff and can explain it before merge. A loop that outruns your comprehension is accumulating comprehension debt, not velocity.
This suite stands on published patterns and prior art:
| Source | What this suite takes from it |
|---|---|
| Long-running Agents — Addy Osmani | External completion criteria, checkpoint cadence, maker-checker separation, "The agent forgets. The repo doesn't." |
| Loop Engineering — Addy Osmani | The harness/loop layering, producer-vs-checker sub-agents, comprehension-debt and cognitive-surrender guardrails |
| agent-skills — Addy Osmani | SKILL.md anatomy and the principle that verification is non-negotiable — every skill ends with evidence requirements |
| Agent Harness Engineering — Addy Osmani | The harness concept this suite implements |
| The Ralph Loop — Geoffrey Huntley | Progress lives in files and git history, never in the context window |
| Anthropic Engineering | Context rot, the test ratchet, brain/hands/session separation |
| Blink — Agentic Coding Best Practices | Start with a written spec the agent builds to, not assumptions it makes |
| substratia.io | AGENTS.md as the single source of truth all tool configs point to |
| Michael Nygard — Documenting Architecture Decisions | The original ADR format and its append-only discipline |
| Matt Pocock Skills · Superpowers | Skill-suite structure and developer-workflow references |
| Karpathy-Inspired Guidelines · Claude Code Best Practice | Behavioral guardrails for LLM-driven coding |
One-line summary: AGENTS.md is the door, STATUS.md is memory, SPEC is the contract, ADR is the law, VERIFY is the gate, and GIT WORKFLOW is the road through it. Agent enters → reads memory → works by contract → doesn't break the law → ships every task down the road → and never grades its own homework.
Build the harness. Externalize the state. Separate verification. Then — go read what your agent wrote. — Addy Osmani
MIT — use it, fork it, adapt it to your own harness.