
Python Project Template

A Python template to kick-start any project with a production-ready workflow, quality tooling, and AI-assisted development.

Workflow Overview

Features flow through 5 steps with a WIP limit of 1 feature at a time. The filesystem enforces WIP:

  • docs/features/backlog/<feature-name>.feature — features waiting to be worked on
  • docs/features/in-progress/<feature-name>.feature — exactly one feature being built right now
  • docs/features/completed/<feature-name>.feature — accepted and shipped features
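
Because WIP is encoded in the directory layout, it can be checked mechanically. The following is a minimal sketch (`assert_wip_limit` is a hypothetical helper, not part of the template):

```python
from pathlib import Path


def assert_wip_limit(features_root: Path, limit: int = 1) -> None:
    """Fail fast if more than `limit` features sit in in-progress/."""
    in_progress = sorted((features_root / "in-progress").glob("*.feature"))
    if len(in_progress) > limit:
        names = ", ".join(p.name for p in in_progress)
        raise RuntimeError(f"WIP limit {limit} exceeded: {names}")
```
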
STEP 1: SCOPE          (product-owner)  → discovery + Gherkin stories + criteria
STEP 2: ARCH           (software-engineer)      → read all features + existing package files, write domain stubs (signatures only, no bodies); ADRs to docs/architecture/
STEP 3: TDD LOOP       (software-engineer)      → RED → GREEN → REFACTOR, one @id at a time
STEP 4: VERIFY         (reviewer)       → run all commands, review code
STEP 5: ACCEPT         (product-owner)  → demo, validate, move folder to completed/

The PO picks the next feature from the backlog; the software engineer never self-selects.

Verification is adversarial. The reviewer's job is to try to break the feature, not to confirm it works. The default hypothesis is "it might be broken despite green checks; prove otherwise."

Roles

  • Product Owner (PO) — AI agent. Interviews the stakeholder, writes discovery docs, Gherkin features, and acceptance criteria. Accepts or rejects deliveries.
  • Stakeholder — Human. Answers PO's questions, provides domain knowledge, approves PO syntheses to confirm discovery is complete.
  • Software Engineer — AI agent. Architecture, test bodies, implementation, git. Never edits .feature files. Escalates spec gaps to PO.
  • Reviewer — AI agent. Adversarial verification. Reports spec gaps to PO.

Agents

  • product-owner — defines scope (4 phases), picks features, accepts deliveries
  • software-engineer — architecture, tests, code, git, releases (Steps 2-3 + release)
  • reviewer — runs commands and reviews code at Step 4, produces APPROVED/REJECTED report
  • setup-project — one-time setup to initialize a new project from this template

Skills

| Skill | Used By | Step |
| --- | --- | --- |
| session-workflow | all agents | every session |
| feature-selection | product-owner | between features (idle state) |
| scope | product-owner | 1 |
| implementation | software-engineer | 2, 3 |
| design-patterns | software-engineer | 2, 3 (on-demand, when a GoF pattern is needed) |
| refactor | software-engineer | 3 (REFACTOR phase + preparatory refactoring) |
| verify | reviewer | 4 |
| code-quality | software-engineer | pre-handoff (redirects to verify) |
| pr-management | software-engineer | 5 |
| git-release | software-engineer | 5 |
| create-skill | software-engineer | meta |
| create-agent | human-user | meta |

Session protocol: Every agent loads skill session-workflow at session start. Load additional skills as needed for the current step.

Step 1 — SCOPE (4 Phases)

Phase 1 — Project Discovery (once per project)

PO creates docs/features/discovery.md using the 3-session template. Skip Phase 1 entirely if discovery.md Status is BASELINED. To add features to an existing project: append new questions to Session 1 and re-fill from there.

  • Session 1 — Individual scope elicitation: 5Ws + Success + Failure + Out-of-scope. Gap-finding per answer using CIT, Laddering, and CI Perspective Change. PO writes synthesis; stakeholder confirms or corrects. PO runs silent pre-mortem on confirmed synthesis. Template §1 must be confirmed before Session 2.
  • Session 2 — Behavior groups / big picture: questions target behavior groups and cross-cutting concerns. Gap-finding per group. Level 2 synthesis when transitioning between groups. Template §2 must be complete before Session 3.
  • Session 3 — Synthesis approval + feature derivation: PO produces full synthesis of all behavior groups; stakeholder approves or corrects (PO refines until approved). Domain analysis: nouns/verbs → subject areas → FDD "Action object" feature names. Create backlog/<name>.feature stubs. Write Status: BASELINED to discovery.md.

Phase 2 — Feature Discovery (per feature)

Each .feature file has its own 3-session discovery template in its description. Sessions are enforced by the template: each section must be filled before proceeding to the next.

  • Session 1 — Individual entity elicitation: populate Entities table from project discovery; generate questions from entity gaps using CIT, Laddering, CI Perspective Change. PO writes synthesis; stakeholder confirms. Silent pre-mortem on confirmed synthesis.
  • Session 2 — Behavior groups / big picture: questions target behavior groups within this feature. Gap-finding per group. Level 2 group transition summaries.
  • Session 3 — Feature synthesis approval + story derivation: PO produces synthesis of feature scope and behavior groups; stakeholder approves or corrects (PO refines until approved). Story candidates become candidate user stories (Rules). Write Status: BASELINED to .feature discovery section.

Decomposition check: after Session 3, does this feature span >2 distinct concerns OR have >8 candidate Examples? YES → split into separate .feature files, re-run Phase 2. NO → proceed.

Phase 3 — Stories (PO alone)

Story candidates from Phase 2 Session 2 → one Rule: block per user story. Each Rule: has the user story header (As a / I want / So that) as its description — no Example: blocks yet. INVEST gate: all 6 letters must pass. Commit: feat(stories): write user stories for <name>

Phase 4 — Criteria (PO alone)

Pre-mortem per Rule (all Rules must be checked before writing Examples). Write Example: blocks — declarative Given/When/Then, MoSCoW triage (Must/Should/Could) per Example. Review checklist (4.3). Commit: feat(criteria): write acceptance criteria for <name>

Criteria are frozen: no changes to Example: blocks after commit. To revise a criterion, add a new Example with a new @id; it supersedes the old one.

Filesystem Structure

docs/features/
  discovery.md                        ← project-level (Status + Questions only)
  backlog/<feature-name>.feature      ← one per feature; discovery + Rules + Examples
  in-progress/<feature-name>.feature  ← file moves here at Step 2
  completed/<feature-name>.feature    ← file moves here at Step 5

docs/architecture/
  STEP2-ARCH.md                       ← Step 2 reference diagram (canonical)
  adr-NNN-<title>.md                  ← one per significant architectural decision

tests/
  features/<feature-name>/
    <rule_slug>_test.py               ← one per Rule: block, software-engineer-written
  unit/
    <anything>_test.py                ← software-engineer-authored extras (no @id traceability)

Tests in tests/unit/ are software-engineer-authored extras not covered by any @id criterion. Any test style is valid — plain assert or Hypothesis @given. Use Hypothesis when the test covers a property that holds across many inputs (mathematical invariants, parsing contracts, value object constraints). Use plain pytest for specific behaviors or single edge cases discovered during refactoring.

  • @pytest.mark.slow is mandatory on every @given-decorated test (Hypothesis is genuinely slow)
  • @example(...) is optional but encouraged when using @given to document known corner cases
  • No @id tags — tests with @id belong in tests/features/, written by software-engineer
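
An illustrative property-based test following these conventions might look like this (the property and test name are examples, not template requirements):

```python
import pytest
from hypothesis import example, given
from hypothesis import strategies as st


@pytest.mark.slow  # mandatory on every @given-decorated test
@given(x=st.integers())
@example(x=0)  # document the known corner case
def test_abs_is_non_negative(x: int) -> None:
    """Property: abs() never returns a negative value."""
    assert abs(x) >= 0
```
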

Test File Layout

tests/features/<feature-name>/<rule_slug>_test.py

Function Naming

def test_<rule_slug>_<8char_hex>() -> None:
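
One way to mint a fresh 8-character hex suffix for a new Example/test pair is sketched below (`new_example_id` is a hypothetical helper, not part of the template):

```python
import secrets


def new_example_id() -> str:
    """Mint an 8-character lowercase hex id for a new Example/test pair."""
    return secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
```
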

Docstring Format (mandatory)

@pytest.mark.skip(reason="not yet implemented")
def test_wall_bounce_a3f2b1c4() -> None:
    """
    Given: A ball moving upward reaches y=0
    When: The physics engine processes the next frame
    Then: The ball velocity y-component becomes positive
    """
    # Given
    # When
    # Then

Markers

  • @pytest.mark.slow — takes > 50ms; applied to Hypothesis tests and any test with I/O, network, or DB
  • @pytest.mark.deprecated — auto-skipped by conftest; used for superseded Examples
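
The auto-skip behavior can be implemented in `conftest.py` roughly as follows (a sketch; the template's actual hook may differ):

```python
import pytest


def pytest_collection_modifyitems(config, items):
    """Automatically skip any collected test carrying the `deprecated` marker."""
    skip = pytest.mark.skip(reason="superseded Example (deprecated)")
    for item in items:
        if item.get_closest_marker("deprecated"):
            item.add_marker(skip)
```
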

Development Commands

# Install dependencies
uv sync --all-extras

# Run the application (for humans)
uv run task run

# Run the application with timeout (for agents — prevents hanging)
timeout 10s uv run task run

# Run tests (fast, no coverage)
uv run task test-fast

# Run full test suite with coverage
uv run task test

# Run slow tests only
uv run task test-slow

# Lint and format
uv run task lint

# Type checking
uv run task static-check

# Serve documentation
uv run task doc-serve

Code Quality Standards

  • Principles (in priority order): YAGNI > KISS > DRY > SOLID > Object Calisthenics
  • Linting: ruff, Google docstring convention, noqa forbidden
  • Type checking: pyright, 0 errors required
  • Coverage: 100% (measured against your actual package)
  • Function length: ≤ 20 lines
  • Class length: ≤ 50 lines
  • Max nesting: 2 levels
  • Instance variables: ≤ 2 per class (exception: dataclasses, Pydantic models, value objects, and TypedDicts are exempt — they may carry as many fields as the domain requires)
  • Semantic alignment: tests must operate at the same abstraction level as the acceptance criteria they cover
  • Integration tests: multi-component features require at least one test in tests/features/ that exercises the public entry point end-to-end
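
The instance-variable rule and its value-object exemption can be illustrated like this (the class names are invented for the example):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector3:
    """Value object: exempt from the 2-instance-variable rule."""
    x: float
    y: float
    z: float


class Ball:
    """Ordinary class: held to at most 2 instance variables."""

    def __init__(self, position: Vector3, velocity: Vector3) -> None:
        self._position = position
        self._velocity = velocity
```
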

Software-Engineer Quality Gate Priority Order

During Step 3 (TDD Loop), correctness priorities are:

  1. Design correctness — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
  2. One test green — the specific test under work passes, plus test-fast still passes
  3. Reviewer code-design check — reviewer verifies design + semantic alignment (no lint/pyright/coverage)
  4. Commit — only after reviewer APPROVED
  5. Quality tooling — lint, static-check, full test with coverage run only at software-engineer handoff (before Step 5)

Design correctness is far more important than lint/pyright/coverage compliance. A well-designed codebase with minor lint issues is better than a lint-clean codebase with poor design.

Verification Philosophy

  • Automated checks (lint, typecheck, coverage) verify syntax-level correctness — the code is well-formed.
  • Human review (semantic alignment, code review, manual testing) verifies semantic-level correctness — the code does what the user needs.
  • Both are required. All-green automated checks are necessary but not sufficient for APPROVED.
  • Reviewer defaults to REJECTED unless correctness is proven.

Deprecation Process

This template does not support deprecation. Criteria changes are handled by adding new Examples with new @id tags.

Release Management

Version format: v{major}.{minor}.{YYYYMMDD}

  • Minor bump for new features; major bump for breaking changes
  • Same-day second release: increment minor, keep same date
  • Each release gets a unique adjective-animal name
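
The bump rules above can be sketched as a small helper (hypothetical; whether a major bump resets minor to 0 is an assumption, not stated by the template):

```python
from datetime import date


def next_version(current: str, breaking: bool, today: date) -> str:
    """Compute the next v{major}.{minor}.{YYYYMMDD} tag from the current one.

    Minor bumps for a feature release, major for breaking changes. A second
    release on the same day still increments minor and keeps the same date.
    """
    major_s, minor_s, _stamp = current.lstrip("v").split(".")
    major, minor = int(major_s), int(minor_s)
    if breaking:
        major, minor = major + 1, 0  # assumption: minor resets on major bump
    else:
        minor += 1
    return f"v{major}.{minor}.{today:%Y%m%d}"
```
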

Use @software-engineer /skill git-release for the full release process.

Session Management

Every session: load skill session-workflow. Read TODO.md first, update it at the end.

TODO.md is a session bookmark — not a project journal. See docs/workflow.md for the full structure including the Cycle State and Self-Declaration blocks used during Step 4.

Setup

To initialize a new project from this template:

@setup-project

The setup agent will ask for your project name, GitHub username, author info, and configure all template placeholders.