chore(workflow): refine agents, skills, commands, and ADR structure for v5

nullhack · nullhack · commit 5a5429e16bf9 · 2026-04-18T20:19:08.000-04:00
- Consolidate ADR files into single adr.md (drop adr-NNN-<title>.md pattern)
- Add test-coverage and test-build tasks; fix test-fast/test flags
- Clarify design priority chain with full complexity ladder
- Remove self-selection language from agent instructions
- Add refactor and design-patterns skills to reviewer agent
- Add BASELINED guard to feature-selection skill
- Fix scope skill: drop silent pre-mortem requirement from Session 1
- Align function/class line-count wording to code-lines-only
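
For illustration only, the BASELINED guard could be expressed as the check below. This is a minimal sketch: the skill itself is prose instructions for an agent, so the names `is_baselined` and `eligible_features` and the exact line format matched by the regex are assumptions, not part of this commit. Only the `Status: BASELINED` marker comes from the changed files.

```python
import re

# Assumption: the discovery section carries a line of the form
# "Status: BASELINED" on its own line (optionally indented).
_BASELINED_RE = re.compile(r"^\s*Status:\s*BASELINED\s*$", re.MULTILINE)


def is_baselined(feature_text: str) -> bool:
    """Return True when the text contains a `Status: BASELINED` line."""
    return _BASELINED_RE.search(feature_text) is not None


def eligible_features(features: dict[str, str]) -> list[str]:
    """Return the sorted names of features allowed to move to in-progress/.

    `features` maps a (hypothetical) feature name to its file content.
    """
    return sorted(name for name, text in features.items() if is_baselined(text))
```

A feature whose status is anything other than `BASELINED` is simply left out of the result, which mirrors the rule that it needs Step 1 (scope) first.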
diff --git a/.opencode/agents/product-owner.md b/.opencode/agents/product-owner.md
@@ -33,7 +33,6 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 - You are the **sole owner** of `.feature` files and `docs/features/discovery.md`
 - No other agent may edit these files
 - Software-engineer escalates spec gaps to you; you decide whether to extend criteria
-- **You pick** the next feature from backlog — the software-engineer never self-selects
 - **NEVER move a feature to `in-progress/` unless its discovery section has `Status: BASELINED`** — if not baselined, complete Step 1 (Phase 2 + 3 + 4) first
 
 ## Step 5 — Accept
@@ -60,4 +59,4 @@ When a gap is reported (by software-engineer or reviewer):
 
 - `session-workflow` — session start/end protocol
 - `feature-selection` — when TODO.md is idle: score and select next backlog feature using WSJF
-- `scope` — Step 1: 3-session discovery (Phase 1 + 2), stories (Phase 3), and criteria (Phase 4)
+- `scope` — Step 1: 3-session discovery (Phase 1 + 2), stories (Phase 3), and criteria (Phase 4)
diff --git a/.opencode/agents/reviewer.md b/.opencode/agents/reviewer.md
@@ -58,4 +58,6 @@ You never edit `.feature` files or add Examples yourself.
 ## Available Skills
 
 - `session-workflow` — session start/end protocol
+- `refactor` — Code refactoring heuristics
+- `design-patterns` — Reference for code smell and design patterns
 - `verify` — Step 4: full verification protocol with all tables, gates, and report template
diff --git a/.opencode/agents/software-engineer.md b/.opencode/agents/software-engineer.md
@@ -45,7 +45,6 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 
 - You own all technical decisions: module structure, patterns, internal APIs, test tooling, linting config
 - **PO approves**: new runtime dependencies, changed entry points, scope changes
-- You are **never** the one to pick the next feature — only the PO picks from backlog
 
 ## Spec Gaps
 
@@ -61,4 +60,4 @@ If during implementation you discover behavior not covered by existing acceptanc
 - `design-patterns` — on-demand when smell detected during architecture or refactor
 - `pr-management` — Step 5: PRs with conventional commits
 - `git-release` — Step 5: calver versioning and themed release naming
-- `create-skill` — meta: create new skills when needed
+- `create-skill` — meta: create new skills when needed
diff --git a/.opencode/skills/feature-selection/SKILL.md b/.opencode/skills/feature-selection/SKILL.md
@@ -38,6 +38,10 @@ Read each `.feature` file in `docs/features/backlog/`. Check its discovery secti
 - Non-BASELINED features are not eligible — they need Step 1 (scope) first
 - If no BASELINED features exist: inform the stakeholder; run `@product-owner` with `skill scope` to baseline the most promising backlog item first
 
+**IMPORTANT**
+
+**NEVER move a feature to `in-progress/` unless its discovery section has `Status: BASELINED`**
+
 ### 3. Score Each Candidate
 
 For each BASELINED feature, fill this table:
diff --git a/.opencode/skills/implementation/SKILL.md b/.opencode/skills/implementation/SKILL.md
@@ -15,12 +15,12 @@ Steps 2 (Architecture) and 3 (TDD Loop) combined into a single skill. The softwa
 
 During implementation, correctness priorities are (in order):
 
-1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
+1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
 2. **One @id green** — the specific test under work passes, plus `test-fast` still passes
 3. **Commit** — when a meaningful increment is green
 4. **Quality tooling** — `lint`, `static-check`, full `test` with coverage run at end-of-feature handoff
 
-Design correctness is far more important than lint/pyright/coverage compliance. Never run lint, static-check, or coverage during the TDD loop — those are handoff-only checks.
+Design correctness is far more important than lint/pyright/coverage compliance. Never run lint (ruff check, ruff format), static-check (pyright), or coverage during the TDD loop — those are handoff-only checks.
 
 ---
 
@@ -37,7 +37,7 @@ Design correctness is far more important than lint/pyright/coverage compliance.
 
 1. Read `pyproject.toml` → locate `[tool.setuptools]` → record `packages = ["<name>"]`
 2. Confirm directory exists: `ls <name>/`
-3. All new source files go under `<name>/` — never under a template placeholder.
+3. All new source files go under `<name>/`
 
 ### Move Feature File
 
@@ -118,7 +118,7 @@ Place stubs where responsibility dictates — do not pre-create `ports/` or `ada
 
 ### Write ADR Files (significant decisions only)
 
-For each significant architectural decision, create `docs/architecture/adr-NNN-<title>.md`:
+For each significant architectural decision, create or append to `docs/architecture/adr.md`:
 
 ```markdown
 # ADR-NNN: <title>
@@ -153,25 +153,21 @@ Commit: `feat(<feature-name>): add architecture stubs`
 
 ### Prerequisites
 
+- [ ] Exactly one `.feature` file is `in_progress`; if none is, load `skill feature-selection`
 - [ ] Architecture stubs present in `<package>/` (committed by Step 2)
-- [ ] Read all `docs/architecture/adr-NNN-*.md` files — understand the architectural decisions before writing any test
-- [ ] Test stub files exist in `tests/features/<feature-name>/` — one file per `Rule:` block, all `@id` functions present with `@pytest.mark.skip`; if missing, write them now before entering RED
+- [ ] Read `docs/architecture/adr.md` and understand the architectural decisions before writing any test
+- [ ] Test stub files exist in `tests/features/<feature-name>/<rule_slug>_test.py` — one file per `Rule:` block, all `@id` stub functions present with `@pytest.mark.skip`; if missing, write them now before entering RED
 
 ### Write Test Stubs (if not present)
 
-For each `Rule:` block in the in-progress `.feature` file, create `tests/features/<feature-name>/<rule-slug>_test.py` if it does not already exist. Write one function per `@id` Example, all skipped:
+For each `Rule:` block in the in-progress `.feature` file, create `tests/features/<feature-name>/<rule_slug>_test.py` if it does not already exist. Write one function per `@id` Example, all skipped:
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_<rule_slug>_<8char_hex>() -> None:
+def test_<feature_slug>_<@id>() -> None:
     """
-    Given: ...
-    When: ...
-    Then: ...
+    <@id steps raw text including new lines>
     """
-    # Given
-    # When
-    # Then
 ```
 
 Run `uv run task gen-todo` after writing stubs to sync `@id` rows into `TODO.md`.
@@ -192,17 +188,17 @@ For each pending `@id`:
 ```
 INNER LOOP
 ├── RED
-│   ├── Confirm stub for this @id exists in tests/features/<feature-name>/ with @pytest.mark.skip
+│   ├── Confirm stub for this @id exists in tests/features/<feature-name>/<rule_slug>_test.py with @pytest.mark.skip
 │   ├── Read existing stubs in `<package>/` — base the test on the current data model and signatures
 │   ├── Write test body (Given/When/Then → Arrange/Act/Assert); remove @pytest.mark.skip
-│   ├── Update stub signatures as needed — edit the `.py` file directly
+│   ├── Update <package> stub signatures as needed — edit the `.py` file directly
 │   ├── uv run task test-fast
 │   └── EXIT: this @id FAILS
 │       (if it passes: test is wrong — fix it first)
 │
 ├── GREEN
 │   ├── Write minimum code — YAGNI + KISS only
-│   │   (no DRY, SOLID, OC here — those belong in REFACTOR)
+│   │   (no DRY, SOLID, OC, Docstring, type hint here — those belong in REFACTOR)
 │   ├── uv run task test-fast
 │   └── EXIT: this @id passes AND all prior tests pass
 │       (fix implementation only; do not advance to next @id)
@@ -221,7 +217,7 @@ Commit when a meaningful increment is green
 ```bash
 uv run task lint
 uv run task static-check
-uv run task test          # coverage must be 100%
+uv run task test-coverage          # coverage must be 100%
 timeout 10s uv run task run
 ```
 
@@ -231,7 +227,7 @@ All must pass before Self-Declaration.
 
 ### Self-Declaration (once, after all quality gates pass)
 
-Write into `TODO.md` under a `## Self-Declaration` block:
+Answer the `## Self-Declaration` report honestly:
 
 ```markdown
 ## Self-Declaration
@@ -256,6 +252,7 @@ As a software-engineer I declare:
 * OC-7: ≤20 lines per function, ≤50 per class — AGREE/DISAGREE | longest: file:line
 * OC-8: ≤2 instance variables per class (behavioural classes only; dataclasses, Pydantic models, value objects, and TypedDicts are exempt) — AGREE/DISAGREE | file:line
 * OC-9: no getters/setters — AGREE/DISAGREE | file:line
+* Patterns: I have no good reason to refactor parts of the code using OOP or Design Patterns — AGREE/DISAGREE | file:line
 * Patterns: no creational smell — AGREE/DISAGREE | file:line
 * Patterns: no structural smell — AGREE/DISAGREE | file:line
 * Patterns: no behavioral smell — AGREE/DISAGREE | file:line
@@ -268,7 +265,7 @@ A `DISAGREE` answer is not automatic rejection — state the reason inline and f
 
 Signal completion to the reviewer. Provide:
 - Feature file path
-- Self-Declaration from TODO.md
+- Self-Declaration report
 - Summary of what was implemented
 
 ---
@@ -278,40 +275,35 @@ Signal completion to the reviewer. Provide:
 ### Test File Layout
 
 ```
-tests/features/<feature-name>/<rule-slug>_test.py
+tests/features/<feature-name>/<rule_slug>_test.py
 ```
 
 - `<feature-name>` = the `.feature` file stem
-- `<rule-slug>` = the `Rule:` title slugified
+- `<rule_slug>` = the `Rule:` title slugified
 
 ### Function Naming
 
 ```python
-def test_<rule_slug>_<8char_hex>() -> None:
+def test_<rule_slug>_<@id>() -> None:
 ```
 
 - `rule_slug` = the `Rule:` title with spaces/hyphens replaced by underscores, lowercase
-- `8char_hex` = the `@id` from the `Example:` block
+- `@id` = the `@id` from the `Example:` block
 
 ### Docstring Format (mandatory)
 
 New tests start as skipped stubs. Remove `@pytest.mark.skip` when implementing in the RED phase.
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_wall_bounce_a3f2b1c4() -> None:
+def test_<feature_slug>_<@id>() -> None:
     """
-    Given: A ball moving upward reaches y=0
-    When: The physics engine processes the next frame
-    Then: The ball velocity y-component becomes positive
+    <@id steps raw text including new lines>
     """
-    # Given
-    # When
-    # Then
 ```
 
 **Rules**:
-- Docstring contains `Given:/When:/Then:` on separate indented lines
+- Docstring contains the Gherkin steps as raw text, on separate indented lines
 - No extra metadata in docstring — traceability comes from function name `@id` suffix
 
 ### Markers
@@ -320,6 +312,7 @@ def test_wall_bounce_a3f2b1c4() -> None:
 - `@pytest.mark.deprecated` — auto-skipped by conftest; used for superseded Examples
 
 ```python
+@pytest.mark.deprecated
 def test_wall_bounce_a3f2b1c4() -> None:
     ...
 
@@ -350,11 +343,11 @@ def test_wall_bounce_c4d5e6f7(x: float) -> None:
 **Rules**:
 - `@pytest.mark.slow` is mandatory on every `@given`-decorated test
 - `@example(...)` is optional but encouraged
-- Never use Hypothesis for: I/O, side effects, network calls, database writes
+- Do not use Hypothesis for: I/O, side effects, network calls, database writes
 
 ### Semantic Alignment Rule
 
-The test's Given/When/Then must operate at the **same abstraction level** as the AC's Given/When/Then.
+The test's Given/When/Then must operate at the **same abstraction level** as the AC's Steps.
 
 | AC says | Test must do |
 |---|---|
@@ -369,7 +362,7 @@ If testing through the real entry point is infeasible, escalate to PO to adjust
 - No `isinstance()`, `type()`, or internal attribute (`_x`) checks in assertions
 - One assertion concept per test (multiple `assert` ok if they verify the same thing)
 - No `pytest.mark.xfail` without written justification
-- `pytest.mark.skip` is only valid on stubs (`reason="not yet implemented"`) — remove it when implementing
+- `pytest.mark.skip(reason="not yet implemented")` is only valid on stubs — remove it when implementing
 - Test data embedded directly in the test, not loaded from external files
 
 ### Test Tool Decision
@@ -396,7 +389,7 @@ Extra tests in `tests/unit/` are allowed freely (coverage, edge cases, etc.) —
 
 ## Signature Design
 
-Signatures are written during Step 2 (Architecture) and refined during Step 3 (RED). They live directly in the package `.py` files — never in the `.feature` file.
+<package> signatures are written during Step 2 (Architecture) and refined during Step 3 (RED). They live directly in the package `.py` files — never in the `.feature` file.
 
 Key rules:
 - Bodies are always `...` in the architecture stub
@@ -420,4 +413,4 @@ class EmailAddress:
 class UserRepository(Protocol):
     def save(self, user: "User") -> None: ...
     def find_by_email(self, email: EmailAddress) -> "User | None": ...
-```
+```
diff --git a/.opencode/skills/scope/SKILL.md b/.opencode/skills/scope/SKILL.md
@@ -149,8 +149,7 @@ Commit: `feat(discovery): baseline project discovery`
 
 1. Write the **Session 1 Synthesis** in the `.feature` file: summarize the key entities, their relationships, and the constraints that emerged.
 2. Present the synthesis to the stakeholder. Stakeholder confirms or corrects. PO refines until approved.
-3. Run a **silent pre-mortem** on the confirmed synthesis.
-4. Mark `Template §1: CONFIRMED`. This unlocks Session 2.
+3. Mark `Template §1: CONFIRMED`. This unlocks Session 2.
 
 ### Session 2 — Behavior Groups / Big Picture for This Feature
 
diff --git a/AGENTS.md b/AGENTS.md
@@ -90,8 +90,7 @@ docs/features/
   completed/<feature-name>.feature    ← file moves here at Step 5
 
 docs/architecture/
-  STEP2-ARCH.md                       ← Step 2 reference diagram (canonical)
-  adr-NNN-<title>.md                  ← one per significant architectural decision
+  adr.md                  ← one entry per significant architectural decision
 
 tests/
   features/<feature-name>/
@@ -112,25 +111,14 @@ Tests in `tests/unit/` are software-engineer-authored extras not covered by any
 tests/features/<feature-name>/<rule_slug>_test.py
 ```
 
-### Function Naming
-
-```python
-def test_<rule_slug>_<8char_hex>() -> None:
-```
-
-### Docstring Format (mandatory)
+### Stub Format (mandatory)
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_wall_bounce_a3f2b1c4() -> None:
+def test_<feature_slug>_<@id>() -> None:
     """
-    Given: A ball moving upward reaches y=0
-    When: The physics engine processes the next frame
-    Then: The ball velocity y-component becomes positive
+    <@id steps raw text including new lines>
     """
-    # Given
-    # When
-    # Then
 ```
 
 ### Markers
@@ -155,41 +143,39 @@ uv run task test-fast
 # Run full test suite with coverage
 uv run task test
 
-# Run slow tests only
-uv run task test-slow
+# Run tests with coverage report generation
+uv run task test-build
 
 # Lint and format
 uv run task lint
 
 # Type checking
 uv run task static-check
 
-# Serve documentation
-uv run task doc-serve
+# Build documentation
+uv run task doc-build
 ```
 
 ## Code Quality Standards
 
-- **Principles (in priority order)**: YAGNI > KISS > DRY > SOLID > Object Calisthenics
-- **Linting**: ruff, Google docstring convention, `noqa` forbidden
+- **Principles (in priority order)**: YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
+- **Linting**: ruff format, ruff check, Google docstring convention, `noqa` forbidden
 - **Type checking**: pyright, 0 errors required
 - **Coverage**: 100% (measured against your actual package)
-- **Function length**: ≤ 20 lines
-- **Class length**: ≤ 50 lines
+- **Function length**: ≤ 20 lines (code lines only, excluding docstrings)
+- **Class length**: ≤ 50 lines (code lines only, excluding docstrings)
 - **Max nesting**: 2 levels
 - **Instance variables**: ≤ 2 per class *(exception: dataclasses, Pydantic models, value objects, and TypedDicts are exempt — they may carry as many fields as the domain requires)*
 - **Semantic alignment**: tests must operate at the same abstraction level as the acceptance criteria they cover
-- **Integration tests**: multi-component features require at least one test in `tests/features/` that exercises the public entry point end-to-end
 
 ### Software-Engineer Quality Gate Priority Order
 
 During Step 3 (TDD Loop), correctness priorities are:
 
-1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
+1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
 2. **One test green** — the specific test under work passes, plus `test-fast` still passes
-3. **Reviewer code-design check** — reviewer verifies design + semantic alignment (no lint/pyright/coverage)
-4. **Commit** — only after reviewer APPROVED
-5. **Quality tooling** — `lint`, `static-check`, full `test` with coverage run only at software-engineer handoff (before Step 5)
+3. **Reviewer code-design check** — reviewer verifies design + semantic alignment (no lint/pyright/coverage yet)
+5. **Quality tooling** — `lint`, `static-check`, full `test` with coverage run only at software-engineer handoff (before Step 4)
 
 Design correctness is far more important than lint/pyright/coverage compliance. A well-designed codebase with minor lint issues is better than a lint-clean codebase with poor design.
 
@@ -200,10 +186,6 @@ Design correctness is far more important than lint/pyright/coverage compliance.
 - Both are required. All-green automated checks are necessary but not sufficient for APPROVED.
 - Reviewer defaults to REJECTED unless correctness is proven.
 
-## Deprecation Process
-
-This template does not support deprecation. Criteria changes are handled by adding new Examples with new `@id` tags.
-
 ## Release Management
 
 Version format: `v{major}.{minor}.{YYYYMMDD}`
@@ -212,13 +194,13 @@ Version format: `v{major}.{minor}.{YYYYMMDD}`
 - Same-day second release: increment minor, keep same date
 - Each release gets a unique adjective-animal name
 
-Use `@software-engineer /skill git-release` for the full release process.
+When requested by the stakeholder, use `@software-engineer /skill git-release` for the full release process.
 
 ## Session Management
 
 Every session: load `skill session-workflow`. Read `TODO.md` first, update it at the end.
 
-`TODO.md` is a session bookmark — not a project journal. See `docs/workflow.md` for the full structure including the Cycle State and Self-Declaration blocks used during Step 4.
+`TODO.md` is a session bookmark — not a project journal. See `docs/workflow.md` for the full structure including the Cycle State and Self-Declaration blocks used during Step 3.
 
 ## Setup
 
diff --git a/docs/workflow.md b/docs/workflow.md
diff --git a/pyproject.toml b/pyproject.toml