Code consistency is the casualty of LLM agent velocity.
LLMs write code faster than any human can read it — and faster than the agent can stay coherent with itself. SDD keeps the spec small, dense, and durable. It's the part the agent re-reads every turn, so task ten is built against the same constraints as task one.
The mechanics:
- Every row has an address.
§V.<n>/§T.<n>/§B.<n>are stable cites — code comments link to the invariant they uphold, tests reference the bug they guard, commits cite the task they close.SPEC.mdsurvives/clearand team handoff. - Telegraph encoding cuts tokens ~40% vs Claude prose for the same content (measured per-row mean 41%, median 39%, n=30 — see
benchmarks/telegraph/). The savings come from terse grammar — dropped articles/filler, fragments, unpadded pipe tables — plus compact§-refs and a curated low-token symbol set (→ ≥ ≤ ! ? § |); Distinct fromsteno(the bundled human-facing shorthand). - Every test failure feeds back into the spec. A
§Brow, usually a new§Vinvariant. The drift report stays trustworthy because every prior failure tightened the spec. - Main Claude does all the writes. Code edits,
SPEC.mdmutations, status flips, commits. Read-only audits (e.g./sdd:check) may fan out to sub-agents. No orchestrator. Same spec + same task → same plan. - Re-onboarding is one command. Come back to the repo after a week, run
/sdd:check. You get a read-only drift report: which§Vinvariants the code violates, which§Ttasks remain. No digging through old transcripts.
The spec is the only artifact that earns its tokens. Everything else must save more tokens later, save the agent's context, or get cut.
SPEC.md is an LLM-facing artifact. You operate it through Claude — /sdd:spec writes, /sdd:build and /sdd:check read, /sdd:explain decodes a citation back to prose when you want to read along. The loop is human → /sdd:* → Claude → SPEC.md, not hand-editing in your editor.
That framing is load-bearing. Telegraphic fragments over full sentences, pipe tables over bulleted lists, dropped line citations — all optimized for the model that re-parses the spec every command, not the human skimming it. If you want to skim it as a human, /sdd:explain is the front door.
/plugin marketplace add kborovik/spec-driven-dev/plugin install sdd@spec-driven-devThen in any repo:
/sdd:spec # creates SPEC.md if missing ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│/sdd:design │──►│ /sdd:spec │──►│ /sdd:build │──►│ /sdd:check │
│ propose │ │ mutator │ │ plan→exec │ │ read-only │
└────────────┘ └─────▲──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
│ ▼ on failure │ on drift
│ ┌────────────┐ │
│ │ backprop │ │
│ │ §B (+ §V) │ │
│ └─────┬──────┘ │
│ │ │
└────────────────┴────────────────┘
amend SPEC.md
- One spec file.
SPEC.md. Nodocs/tree, no JSON sidecars. - One writer.
/sdd:spec. (/sdd:buildmay flip a.→x; nothing else writes.) - Read-only commands write nothing.
/sdd:check(drift report) and/sdd:explain(decompression).
Six fixed sections, fixed order. Each row is addressable as §<S>.<n>.
# SPEC
## §G GOAL
one line. what code must do.
## §C CONSTRAINTS
- non-negotiable boundary
- tech / language / library locked in
## §I INTERFACES
external surface — what the world sees.
- cmd: `foo bar` → stdout JSON
- api: POST /x → 200 {id}
- file: `config.yaml` schema …
- env: `FOO_KEY` required
## §V INVARIANTS
numbered. testable. each ! MUST hold.
V<n>: every req → auth check before handler
V<n>: token expiry ≤ current_time → reject
V<n>: DB write ! in transaction
## §T TASKS
id|status|task|cites
T<n>|.|scaffold repo|-
T<n>|.|impl §I.api POST /x|V<n>
T<n>|x|add §V.<n> middleware|V<n>,I.api
## §B BUGS
id|date|cause|fix
B<n>|2026-04-20|token `<` not `≤`|V<n>
B<n>|2026-04-21|race on write|V<n>Status markers: . todo · x done.
Cell rules: literal | → \|. Empty cell = -. Backticks OK.
Use when there's a structural choice to weigh — tradeoffs, named alternatives, subsystem shape. The model proposes a shape, you critique, the loop converges only when ## Open Questions is empty. Persists to designs/<slug>.md. /sdd:spec later folds the converged design into §V / §T rows; the draft file stays in the working tree for you to remove or keep.
/sdd:design how should the release pipeline split monorepo plugins?Distinct from /sdd:spec's socratic gate: socratic converges on enough (sharpen vague intent); design converges on exhausted (every structural question has a decision).
The sole mutator. The argument is free-form intent — the socratic gate (the bundled socratic skill) reads what you wrote and picks a mode. You don't pick the mode yourself.
| project state | possible modes | gate behavior |
|---|---|---|
no SPEC.md |
NEW or DISTILL | concrete intent passes ≤ 1 turn; vague intent triggers single-question dialogue to convergence |
SPEC.md exists |
BACKPROP or AMEND or NEW (rare, requires explicit re-init) | mode emerges from convergence triple — symptom + surface + recurrence-class for BACKPROP, §-target + delta for AMEND |
Examples (all free-form — the gate classifies):
/sdd:spec a CLI that ingests JSON over stdin and emits Parquet
/sdd:spec build the spec from this codebase
/sdd:spec V<n>'s `≤` should be `<` for unsigned tokens
/sdd:spec rate-limiter dropped requests under 100rpsPlan → execute → verify loop. EXECUTE serializes on main thread; PLAN reads may delegate to sub-agents.
| arg form | scope |
|---|---|
§T.n |
implement that one task |
--next |
lowest-numbered row with status . |
--all |
every . row in §T order |
| (empty) | same as --next |
Loop per task:
- PLAN — cite every §V / §I the task touches, then proceed to EDIT. The plan is emitted inline for transparency, not a wait-state. Gaps are annotated so you can route them back to
/sdd:specpost-hoc; the build never invents rules. - EDIT — make the change, run tests / build.
- VERIFY — on failure, classify: (a) code bug → fix and re-run; (b) spec wrong or unspecified edge case → invoke
backprop(via/sdd:spec <cause>), let it append§Band usually a new§V, resume against the updated spec. - CLOSE — flip
.→xonly when verification is green.
Ambiguity is a spec defect, not a coding judgement. /sdd:build never silently retries and never edits SPEC.md beyond flipping a status cell — every rule-shaped question routes back to /sdd:spec.
Read-only diagnostic. Diffs SPEC.md against the working tree. Always audits §V + §I + §T together.
| arg | what it does |
|---|---|
| (empty) | memo-driven sweep — re-audits §V rows touched since last clean run; rest HOLD-SINCE-CLEAN |
--full |
force full re-classify — deletes .claude/check-state.json upfront, rebuilds memo |
Output groups violations by severity (VIOLATE / RISK / STALE) and suggests a remedy — usually /sdd:spec <intent> or /sdd:build. It never runs them itself.
The inverse of telegraph. Given any citation, returns plain English with cited context.
/sdd:explain §V.<n> # expand a specific invariant
/sdd:explain §T.<n> # expand a task + every §V/§I it cites
/sdd:explain §B.<n> # expand a bug + the invariant that catches recurrence
/sdd:explain --next # expand the next unfinished taskUseful for code review, onboarding, or when you'd otherwise have to translate every req → auth check before handler in your head.
Operator-triggered compactor for an oversized SPEC.md (advisory fires in /sdd:check when the estimate exceeds ~25k tokens). Six prongs — fold sibling invariants, mark superseded tasks, archive old §T/§B rows to SPEC.archive.md, prune inlined history, rewrite prose to telegraph, extract heavy audit recipes. Single atomic commit, rollback via git revert.
Operator-triggered clarity pass (at most once per major epoch): clusters §V invariants by topic, renumbers them, and sweeps every citation in the same commit. Renumber history persists to .claude/spec-renumber-map.json so old citations still resolve via /sdd:explain.
Each skill dir surfaces directly as a slash command (e.g. skills/spec/ → /sdd:spec). SKILL.md frontmatter (description, allowed-tools, model) is honored on dispatch.
| skill | role |
|---|---|
design |
propose-then-critique → designs/<slug>.md |
spec |
sole mutator |
build |
plan → execute loop |
check |
drift report |
explain |
telegraph → prose decoder |
compact |
token-budget compaction sweep |
reorganize |
§V cluster + renumber + cite sweep |
telegraph |
telegraph encoder (~40% reduction vs prose); auto-fires on writes |
backprop |
bug → spec protocol; fires on non-code-bug verification failures |
socratic |
single-question intent gate; invoked by /sdd:spec |
steno |
human-facing terse-prose register for reviewer-read text |
You don't usually invoke telegraph, backprop, socratic, or steno directly — Claude picks them up from the command flow. backprop, for example, fires automatically when a /sdd:build verification failure smells like under-specification (clear code bugs are just fixed).
/sdd:design how should we shape the parser / renderer split? # optional — only if structural Qs
/sdd:spec build a static-site generator that converts a Markdown directory into a single-page HTML bundle
# review §G/§C/§I/§V in SPEC.md, amend if needed
/sdd:build --next # plan, implement, verify T<n> (scaffold)
/sdd:build --next # T<n> (renderer)
/sdd:check # before opening a PR/sdd:spec build the spec from this codebase # gate routes to DISTILL
/sdd:check # see what already drifts from the distilled spec
/sdd:spec V<n>'s bound is too loose for the rate-limiter # gate routes to AMEND
/sdd:build §T.<n> # tackle a specific task/sdd:spec webhook handler retried POSTs after 5xx, double-charged 11 customers
# gate routes to BACKPROP: appends §B, adds §V "POST handler ! idempotent on retry",
# adds a §T fix task, commits SPEC.md
/sdd:build --next # failing test first, then the fix; commit cites the new §B/§V
/sdd:check # confirm new §V is now upheld/sdd:check
/sdd:explain §V.<n> # if a violation is unclear, decompress ittelegraph writes telegraphic grammar — dropped articles, aux verbs, and filler, fragments, compact pipe tables — with a curated low-token symbol set (→ ≥ ≤ ! ? §). Anything heavier is written as the ASCII word: a multi-token math operator costs 2–4 tokens vs a 1-token word, so a symbol earns its place only where it reads clearer than the word.
Result: every spec write lands ~40% leaner in tokens than the equivalent prose while staying machine- and human-readable. The measured per-row mean is 41% (median 39%, n=30 across §V/§T/§B rows of this repo's own SPEC.md), reproducible via benchmarks/telegraph/telegraph-bench.py — full methodology, per-row results, and caveats in the benchmark write-up. steno (bundled) handles reviewer-facing text and keeps grammar intact so reviewers don't slow down.
Rules:
- Drop articles (a/an/the). Drop filler. Drop aux verbs where a fragment works.
- Short synonyms (
fixoverimplement). - Preserve verbatim: code, paths, identifiers, URLs, numbers, error strings, SQL, regex.
Full symbol table: skills/telegraph/SKILL.md SYMBOLS section.
Example. Prose: "The authentication middleware must verify the token expiry on every request before allowing the handler to execute."
Telegraph: V<n>: every req → auth check before handler
If telegraph encoding slows you down on review, /sdd:explain §V.<n> decompresses on demand.
Backprop is the one non-obvious thing SDD does that vanilla plan-then-execute doesn't. Six steps:
- Capture — record the failing case verbatim (test name, error, stack, repro).
- Trace — find the cause: code bug, spec wrong, or unspecified edge case.
- Append §B — add a row:
id|date|cause|fix. Telegraph-encoded. - Decide on §V — would an invariant have caught the class of this bug? If yes, add or tighten one. Cite it from the new §B row.
- Write the failing test first — in the
/sdd:buildresume: watch it fail, then ship the fix. The test stays as a permanent guard. - Two commits, cross-cited — the spec commit lands first (§B + §V, via
/sdd:spec); the code commit (test + fix) follows via/sdd:buildand cites them. The record survives even when the fix is deferred.
Triggers:
- Test/build fails inside
/sdd:buildverification. /sdd:checkreports aVIOLATEwhose root cause is identified.- User:
/sdd:spec <description>— post-mortems, prod incidents, user reports; gate routes to BACKPROP on bug-class intent.
Why Markdown, not YAML / JSON? Markdown + pipe tables grep cleanly, diff cleanly, render in every PR tool, and don't trip on quoting. JSON specs invite tooling that defeats the point — the spec is for humans and one LLM, not a build system.
Why one file? Sub-1000-line specs fit in context cheaply. Multi-file specs invite cross-file inconsistency and force grep ceremony. When the spec outgrows its budget (~25k tokens — /sdd:check raises an advisory), /sdd:compact folds, trims, and archives old §T/§B rows to SPEC.archive.md instead of splitting.
Does /sdd:build always backprop on failure? Only on failures that aren't clear code bugs. Typos and wrong loop bounds get fixed without a spec change. Anything that smells like under-specification routes through backprop.
Can I skip telegraph encoding and write prose specs? Yes, but every future load of the spec into context pays ~1.7x the tokens for the same content (the measured ~40% cut, inverted). Optional in syntax, expensive in practice.
.claude-plugin/plugin.json plugin manifest (name: sdd)
.claude-plugin/marketplace.json marketplace manifest for direct install
skills/design/ /sdd:design — propose-then-critique design loop → designs/<slug>.md
skills/spec/ /sdd:spec — sole SPEC.md mutator
skills/build/ /sdd:build — plan-execute loop
skills/check/ /sdd:check — read-only drift report
skills/explain/ /sdd:explain — telegraph → prose decoder
skills/compact/ /sdd:compact — token-budget compaction sweep
skills/reorganize/ /sdd:reorganize — §V cluster + renumber + cite-DAG sweep
skills/telegraph/ auto-fire telegraph encoder for SPEC-adjacent writes
skills/backprop/ auto-fire bug → spec protocol on /sdd:build verify-fail
skills/socratic/ intent-sharpening gate invoked by /sdd:spec
skills/steno/ human-facing terse-prose register
scripts/check-mechanical.py deterministic audit core used by /sdd:check
benchmarks/telegraph/ telegraph token-reduction benchmark — bench script + results JSON + README write-up
SPEC-FORMAT.md structural format contract for every SPEC.md
SDD is adapted from JuliusBrussee/cavekit (v4.0.0, MIT-licensed). For the original project, history, and v3.1.0 (full Hunt lifecycle with sub-agents, parallel workers, and design-system enforcement), see the upstream repo.
MIT. See LICENSE.