Merged
44 changes: 24 additions & 20 deletions AGENTS.md
@@ -17,14 +17,16 @@ When running evals or testing skills, create all workspaces in a temp location:

**Why:** Eval artifacts — branches, commits, local git config — leak into the real repo history and are painful to clean up. The skill source lives in a git repo; eval output does not belong here.

## Per-Skill Evals

Every repo-managed skill must include its own `evals/evals.json` file at `skills/<name>/evals/evals.json`.

- Treat this as a required artifact for every first-party skill in this repo
- Eval entries may include an optional `files` array of skill-relative fixture paths such as `evals/files/example.md`
- When `files` is present, keep the paths relative to `skills/<name>/` and stage those fixtures into the temp eval workspace for both `with_skill` and `without_skill` runs
- Run evals **per skill**, not as one shared repo-level eval file
- Run evals from a temp workspace such as `$env:TEMP/<skill-name>-workspace/`, never from inside this repository
- When creating or modifying a repo-managed skill, run both `with_skill` and `without_skill` comparison executions from that temp workspace before the work is considered complete
- For a brand-new skill, the baseline is `without_skill`; for an existing skill, use either `without_skill` or the previous/original skill version as the baseline, matching the `skill-creator` benchmark flow
- Generate the human-review artifacts too: aggregate the comparison into `benchmark.json` and launch `eval-viewer/generate_review.py` from the installed Anthropic `skill-creator` copy (typically under `~/.agents/skills/skill-creator/` or `~/.claude/skills/skill-creator/`) so the user can inspect `Outputs` and `Benchmark` before sign-off
- Deterministic scaffold/template skills must keep local deterministic validators as well; evals supplement validators, they do not replace them
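The fixture-staging flow described above can be sketched as follows. This is a minimal illustration, not shipped tooling; `stage_eval_workspace` and the workspace naming are assumptions:

```python
import json
import shutil
import tempfile
from pathlib import Path

def stage_eval_workspace(skill_dir: Path) -> Path:
    """Create a temp workspace and copy any evals.json fixture files into it."""
    evals = json.loads((skill_dir / "evals" / "evals.json").read_text(encoding="utf-8"))
    workspace = Path(tempfile.mkdtemp(prefix=f"{evals['skill_name']}-workspace-"))
    for entry in evals["evals"]:
        # "files" is optional; paths are relative to skills/<name>/
        for rel in entry.get("files", []):
            src = skill_dir / rel
            dst = workspace / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return workspace
```

The same staged workspace can then feed both the `with_skill` and `without_skill` runs so they see identical inputs.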
@@ -80,20 +82,22 @@ After changing any repo-managed skill, sync the touched files across the repo co
Every skill follows this layout:

```
skills/<name>/
├── SKILL.md # Required — the skill definition (loaded by Claude)
├── FORMS.md # Optional — structured form fields for parameter collection
├── assets/ # Optional — file templates, fonts, icons used in output
│ └── <variant>/ # Group by variant when a skill supports multiple (e.g. library/, app/)
├── scripts/ # Optional — executable code (Python, Bash, etc.)
├── references/ # Optional — detailed reference docs the agent consults during generation
└── evals/ # Required for repo-managed skills — per-skill eval prompts and expectations
└── files/ # Optional — input fixtures referenced by evals/evals.json files[]
```

- `SKILL.md` is the entry point — it contains the workflow, conventions, and step-by-step instructions
- `assets/` holds file templates, fonts, icons, and other static content used in output (the agent reads and substitutes placeholders)
- `references/` holds detailed specs that `SKILL.md` references but are too long to inline
- `evals/` holds the per-skill `evals.json` definitions used to verify that the skill still works after changes
- `evals/files/` holds optional skill-local fixture inputs referenced by `evals/evals.json` when a benchmark needs attached source material
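A scaffold for this layout might look like the sketch below; `scaffold_skill` is illustrative only, and the placeholder front matter is an assumption, not the repo's template:

```python
from pathlib import Path

def scaffold_skill(root: Path, name: str) -> Path:
    """Create the minimal layout shown above; evals/ is required, the rest optional."""
    skill = root / "skills" / name
    for sub in ("assets", "scripts", "references", "evals/files"):
        (skill / sub).mkdir(parents=True, exist_ok=True)
    # SKILL.md front matter needs at least name and description
    (skill / "SKILL.md").write_text(
        f"---\nname: {name}\ndescription: TODO\n---\n", encoding="utf-8")
    (skill / "evals" / "evals.json").write_text(
        '{"skill_name": "%s", "evals": []}' % name, encoding="utf-8")
    return skill
```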

## Template Files Are Literal

23 changes: 22 additions & 1 deletion CHANGELOG.md
@@ -6,6 +6,26 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),

## [Unreleased]

## [0.3.2] - 2026-03-23

This is a minor release introducing the markdown-illustrator skill for visualization-first document analysis, along with expanded repository branding, comprehensive skill documentation, and foundational eval fixture infrastructure across the skill suite.

### Added

- `markdown-illustrator` skill that reads markdown files and generates a document-wide Visual Brief plus one compiled diffusion-ready prompt, with zero follow-up questions and inferred visual strategy defaults (hero-focused cinematic editorial by default, steerable toward whiteboard, blackboard, isometric, or blueprint styles).
- Hero image assets for repository branding at `/assets/hero.jpg` and for individual skills (`trunk-first-repo/assets/hero.jpg`).
- Optional `files` array support in eval infrastructure (`evals/evals.json`) to stage skill-relative fixture paths into temporary eval workspaces for both `with_skill` and `without_skill` runs.
- Eval fixtures for `markdown-illustrator` with real-world examples (microservices architecture, product launch, transformers explanation).
- Benchmark contract reference documentation in `skill-creator-agnostic` with fixture guidance patterns.

### Changed

- Enhanced README with markdown-illustrator installation snippet and comprehensive "Why markdown-illustrator?" section explaining visual-brief anchoring, inferred defaults, good trigger examples, and reference visual directions for users.
- Extended AGENTS.md with detailed eval fixture file documentation, explaining the optional `files` property and fixture staging workflow for skill evaluation.
- Updated CONTRIBUTING.md with eval fixture guidance and temp-workspace isolation setup instructions.
- Improved validation script (`validate-skill-templates.ps1`) to enforce fixture file path checks and consistency across skills.
- Applied fixture guidance pattern to `skill-creator-agnostic` with benchmark contract examples and reference documentation.

## [0.3.1] - 2026-03-19

This is a patch release introducing three new NuGet-focused skills and runner-agnostic benchmark tooling, with enhanced release automation, comprehensive documentation standardization, and skill refinements.
@@ -82,7 +102,8 @@ This is a minor release that introduces two complementary git workflow skills, e

- Improved scaffold fidelity with hidden `.bot` asset preservation, explicit UTF-8 and BOM handling, and checks aimed at preventing mojibake or incomplete generated output.

[Unreleased]: https://github.com/codebeltnet/agentic/compare/v0.3.2...HEAD
[0.3.2]: https://github.com/codebeltnet/agentic/compare/v0.3.1...v0.3.2
[0.3.1]: https://github.com/codebeltnet/agentic/compare/v0.3.0...v0.3.1
[0.3.0]: https://github.com/codebeltnet/agentic/compare/v0.2.0...v0.3.0
[0.2.0]: https://github.com/codebeltnet/agentic/compare/v0.1.0...v0.2.0
34 changes: 19 additions & 15 deletions CONTRIBUTING.md
@@ -68,18 +68,21 @@ The `description` is the most important field — it's how the AI decides to loa

Evals let you verify the skill works and measure improvement over a baseline. Every repo-managed skill in this repository must include `evals/evals.json`:

```json
{
"skill_name": "your-skill-name",
"evals": [
{
"id": 0,
"prompt": "The user message to test against",
"expected_output": "What a correct response looks like — used for manual or automated grading",
"files": ["evals/files/example.md"]
}
]
}
```

`files` is optional. When present, list one or more fixture files relative to `skills/<name>/`. A common pattern is to store those fixtures under `evals/files/` so benchmark runners can copy or attach the same source inputs for both `with_skill` and `without_skill` runs.
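A runner consuming this schema could resolve the `files` entries like the sketch below; `fixture_paths` is a hypothetical helper, not part of any published runner:

```python
import json
from pathlib import Path

def fixture_paths(skill_dir: Path) -> list:
    """Resolve every optional files[] entry in evals.json against the skill folder."""
    data = json.loads((skill_dir / "evals" / "evals.json").read_text(encoding="utf-8"))
    resolved = []
    for entry in data["evals"]:
        for rel in entry.get("files", []):  # "files" is optional per eval
            path = skill_dir / rel
            if not path.is_file():
                raise FileNotFoundError(f"{data['skill_name']}: missing fixture {rel}")
            resolved.append(path)
    return resolved
```

Failing fast on a missing fixture keeps the `with_skill` and `without_skill` runs from silently diverging on their inputs.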

Aim for 3–5 evals that cover distinct scenarios: happy path, edge cases, and cases where the skill should *not* do something.

@@ -130,9 +133,10 @@ powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\validate-skill-tem
- [ ] `SKILL.md` has valid front matter with `name` and `description`
- [ ] Skill is stack-agnostic (or clearly scoped to a specific tech in the name/description)
- [ ] Examples are generic — no personal emails, usernames, or project-specific identifiers
- [ ] At least one eval in `evals/evals.json`
- [ ] The skill's `evals/evals.json` exists and its `skill_name` matches the folder/frontmatter name
- [ ] Any optional `files` entries in `evals/evals.json` point to real fixture files under the same skill folder
- [ ] Skill changes were benchmarked from a temp workspace with both `with_skill` and `without_skill` runs
- [ ] `benchmark.json` and `eval-viewer/generate_review.py` from the installed Anthropic `skill-creator` copy were used so a human could compare `Outputs` and `Benchmark`
- [ ] `scripts/validate-skill-templates.ps1` passes for the current working tree when changing scaffold or template behavior
- [ ] If CI is enabled for the branch, the GitHub Actions validation job passes too
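The eval-related checklist items above could be automated with a small check like this sketch (`check_skill_evals` is illustrative, not an existing script in this repo):

```python
import json
from pathlib import Path

def check_skill_evals(skill_dir: Path) -> None:
    """Assert the eval-related checklist items for one skill folder."""
    evals_path = skill_dir / "evals" / "evals.json"
    assert evals_path.is_file(), "evals/evals.json is required"
    data = json.loads(evals_path.read_text(encoding="utf-8"))
    assert data["skill_name"] == skill_dir.name, "skill_name must match the skill folder name"
    assert data["evals"], "at least one eval is required"
    for entry in data["evals"]:
        # optional files[] entries must point at real fixtures under the skill folder
        for rel in entry.get("files", []):
            assert (skill_dir / rel).is_file(), f"missing fixture: {rel}"
```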