feat: Experiment Type Registry — YAML-Driven, Extensible by Trecek · Pull Request #795 · TalonT-Org/AutoSkillit

Trecek · 2026-04-13T05:57:17Z

Summary

Externalize the 5 experiment types currently hardcoded in
src/autoskillit/skills_extended/review-design/SKILL.md into YAML files under
src/autoskillit/recipes/experiment-types/. Add a loader module
src/autoskillit/recipe/experiment_type_registry.py that merges bundled types with
user-defined overrides from .autoskillit/experiment-types/ (full replacement — no
merging). Update review-design/SKILL.md to load the registry at Step 0 and reference
loaded data throughout, replacing all hardcoded tables. Behavioral change to existing
5 types: zero.
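
The override rule described above (a user type with the same name replaces the bundled one wholesale, with no field-level merging) can be sketched with plain dictionaries. This is only an illustration: `merge_types` and the sample values are hypothetical stand-ins, not the code in `experiment_type_registry.py`.

```python
from typing import Any


def merge_types(
    bundled: dict[str, dict[str, Any]],
    user: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Overlay user-defined types on bundled ones (hypothetical helper)."""
    merged = dict(bundled)  # shallow copy so the bundled mapping is not mutated
    merged.update(user)     # same name: the user entry replaces the whole spec
    return merged


# Illustrative values only; the real YAML files carry more fields.
bundled = {
    "benchmark": {
        "dimension_weights": {"data_acquisition": "M"},
        "l1_severity": {"example_dimension": "info"},
    },
    "exploratory": {"dimension_weights": {"data_acquisition": "L"}},
}
user = {
    # Same name as a bundled type: full replacement, l1_severity is NOT inherited.
    "benchmark": {"dimension_weights": {"data_acquisition": "H"}},
    # New name: simply added to the registry.
    "field_study": {"dimension_weights": {"data_acquisition": "H"}},
}

merged = merge_types(bundled, user)
```

Because the replacement is wholesale, a user override that omits a field does not fall back to the bundled value for that field; under this reading, an override file has to restate everything it wants to keep.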

Architecture Impact

Data Lineage Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Storage ["YAML Storage (Disk)"]
        direction TB
        BUNDLED["★ recipes/experiment-types/<br/>━━━━━━━━━━<br/>benchmark.yaml<br/>causal_inference.yaml<br/>configuration_study.yaml<br/>exploratory.yaml<br/>robustness_audit.yaml"]
        USER["  .autoskillit/experiment-types/<br/>━━━━━━━━━━<br/>optional user overrides<br/>same-name = full replace"]
    end

    subgraph Parsing ["★ Transformation Pipeline (experiment_type_registry.py)"]
        direction TB
        LOAD_YAML["load_yaml(path)<br/>━━━━━━━━━━<br/>core.io YAML reader<br/>file → raw dict"]
        PARSE["★ _parse_experiment_type()<br/>━━━━━━━━━━<br/>validates 'name' field<br/>dict → ExperimentTypeSpec<br/>(name, triggers, weights,<br/>lenses, red_team, l1_sev)"]
        SCAN_BUNDLED["★ _load_types_from_dir(bundled)<br/>━━━━━━━━━━<br/>glob *.yaml, sorted<br/>→ dict[name, spec]"]
        SCAN_USER["★ _load_types_from_dir(user)<br/>━━━━━━━━━━<br/>glob *.yaml if dir exists<br/>→ dict[name, spec]"]
        MERGE["★ load_all_experiment_types()<br/>━━━━━━━━━━<br/>bundled dict +<br/>dict.update(user)<br/>user replaces on name clash"]
    end

    subgraph Registry ["In-Memory Registry (Primary Storage)"]
        SPECS["★ ExperimentTypeSpec instances<br/>━━━━━━━━━━<br/>name: str<br/>classification_triggers: list<br/>dimension_weights: dict<br/>applicable_lenses: dict<br/>red_team_focus: dict<br/>l1_severity: dict"]
        REG_MAP["★ dict[str, ExperimentTypeSpec]<br/>━━━━━━━━━━<br/>key = experiment type name<br/>source of truth for runtime"]
    end

    subgraph PublicAPI ["● Public API (recipe/__init__.py)"]
        RECIPE_INIT["● recipe/__init__.py<br/>━━━━━━━━━━<br/>re-exports ExperimentTypeSpec<br/>re-exports load_all_experiment_types<br/>in autoskillit.recipe.__all__"]
    end

    subgraph Consumer ["● Skill Consumer (review-design Step 0)"]
        STEP0["● review-design/SKILL.md<br/>Step 0d<br/>━━━━━━━━━━<br/>calls load_all_experiment_types()<br/>reads classification_triggers<br/>matches experiment_type<br/>reads dimension_weights<br/>for subagent spawn decisions"]
    end

    %% PRIMARY DATA FLOWS %%
    BUNDLED -->|"load_yaml(path)"| LOAD_YAML
    USER -->|"load_yaml(path) if exists"| LOAD_YAML
    LOAD_YAML -->|"raw dict"| PARSE
    PARSE -->|"ExperimentTypeSpec"| SCAN_BUNDLED
    PARSE -->|"ExperimentTypeSpec"| SCAN_USER
    SCAN_BUNDLED -->|"dict[name→spec]"| MERGE
    SCAN_USER -->|"dict[name→spec]"| MERGE
    MERGE -->|"creates"| SPECS
    SPECS -->|"keyed by name"| REG_MAP
    REG_MAP -->|"re-exported via"| RECIPE_INIT
    REG_MAP -->|"read by Step 0"| STEP0

    %% CLASS ASSIGNMENTS %%
    class BUNDLED newComponent;
    class USER cli;
    class LOAD_YAML handler;
    class PARSE newComponent;
    class SCAN_BUNDLED newComponent;
    class SCAN_USER newComponent;
    class MERGE newComponent;
    class SPECS newComponent;
    class REG_MAP stateNode;
    class RECIPE_INIT integration;
    class STEP0 phase;

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS"]
        direction LR
        TEST_REG["★ test_experiment_type_registry.py<br/>━━━━━━━━━━<br/>tests/recipe/<br/>Imports: ExperimentTypeSpec,<br/>load_all_experiment_types"]
    end

    subgraph L2_Recipe ["L2 — RECIPE PACKAGE"]
        direction TB
        INIT["● recipe/__init__.py<br/>━━━━━━━━━━<br/>Re-exports: ExperimentTypeSpec,<br/>load_all_experiment_types<br/>(public gateway)"]
        ETR["★ recipe/experiment_type_registry.py<br/>━━━━━━━━━━<br/>ExperimentTypeSpec (dataclass)<br/>load_all_experiment_types()<br/>_load_types_from_dir()<br/>_parse_experiment_type()"]
        OTHER["recipe/{io,schema,validator,…}.py<br/>━━━━━━━━━━<br/>Unchanged siblings<br/>(context only)"]
    end

    subgraph L0_Core ["L0 — CORE (zero autoskillit imports)"]
        direction LR
        IO["core/io.py<br/>━━━━━━━━━━<br/>load_yaml()"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-export hub<br/>(high fan-in)"]
    end

    subgraph DataFiles ["BUNDLED DATA (recipes/experiment-types/)"]
        direction LR
        YAMLS["★ benchmark.yaml<br/>★ causal_inference.yaml<br/>★ configuration_study.yaml<br/>★ exploratory.yaml<br/>★ robustness_audit.yaml<br/>━━━━━━━━━━<br/>Loaded at runtime via<br/>BUNDLED_EXPERIMENT_TYPES_DIR"]
    end

    subgraph Stdlib ["LAYER 0 — STDLIB / EXTERNAL"]
        direction LR
        STDLIB["dataclasses, pathlib<br/>━━━━━━━━━━<br/>No new third-party deps"]
    end

    %% TEST → RECIPE (direct module, bypasses package gateway) %%
    TEST_REG -->|"imports directly from<br/>recipe.experiment_type_registry"| ETR

    %% RECIPE/__init__ → new module (re-export) %%
    INIT -->|"re-exports<br/>ExperimentTypeSpec,<br/>load_all_experiment_types"| ETR

    %% new module → core (L0 dependency — valid downward) %%
    ETR -->|"from autoskillit.core import<br/>load_yaml, pkg_root"| CORE_INIT
    CORE_INIT -->|"re-exports"| IO
    CORE_INIT -->|"re-exports"| PATHS

    %% new module → stdlib %%
    ETR -->|"dataclasses, pathlib"| STDLIB

    %% new module → bundled YAML files (runtime filesystem read) %%
    ETR -.->|"runtime glob:<br/>BUNDLED_EXPERIMENT_TYPES_DIR"| YAMLS

    %% unchanged siblings (context only — no new imports to/from ETR) %%
    OTHER -.->|"no coupling to<br/>new module"| ETR

    %% CLASS ASSIGNMENTS %%
    class TEST_REG newComponent;
    class ETR newComponent;
    class INIT handler;
    class OTHER phase;
    class CORE_INIT stateNode;
    class IO,PATHS handler;
    class YAMLS output;
    class STDLIB integration;

Closes #788

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-788-20260412-214910-284400/.autoskillit/temp/make-plan/experiment_type_registry_plan_2026-04-12_215800.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 167 | 23.1k | 760.6k | 61.4k | 1 | 11m 16s |
| verify | 134 | 12.6k | 734.0k | 49.3k | 1 | 3m 50s |
| implement | 2.2k | 16.2k | 1.8M | 54.7k | 1 | 5m 58s |
| fix | 270 | 21.3k | 1.8M | 66.6k | 1 | 11m 41s |
| prepare_pr | 54 | 5.9k | 165.2k | 28.6k | 1 | 1m 29s |
| run_arch_lenses | 146 | 10.4k | 494.5k | 58.3k | 2 | 2m 58s |
| compose_pr | 59 | 5.5k | 185.0k | 23.9k | 1 | 1m 21s |
| Total | 3.0k | 95.0k | 6.0M | 342.7k | | 38m 36s |

Creates src/autoskillit/recipes/experiment-types/ with benchmark.yaml, configuration_study.yaml, causal_inference.yaml, robustness_audit.yaml, and exploratory.yaml — each containing classification_triggers, dimension_weights, applicable_lenses, red_team_focus, and l1_severity. Values match the hardcoded tables in review-design/SKILL.md exactly.

Adds ExperimentTypeSpec dataclass and load_all_experiment_types() loader that merges bundled YAML types with optional user-defined overrides from .autoskillit/experiment-types/ (full replacement, no field merging). Includes comprehensive test suite covering all 5 bundled types, weight values, override semantics, and user extension via new type names.
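
A minimal sketch of what the dataclass and its parser might look like, based only on the field names shown in the data lineage diagram. The value types, the empty defaults, and the exact validation message are assumptions; `parse_experiment_type` here is a hypothetical public stand-in for the private `_parse_experiment_type()`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExperimentTypeSpec:
    """Field names follow the PR description; element types are assumed."""

    name: str
    classification_triggers: list[str] = field(default_factory=list)
    dimension_weights: dict[str, str] = field(default_factory=dict)
    applicable_lenses: dict[str, Any] = field(default_factory=dict)
    red_team_focus: dict[str, Any] = field(default_factory=dict)
    l1_severity: dict[str, str] = field(default_factory=dict)


def parse_experiment_type(raw: Any, source: str = "<memory>") -> ExperimentTypeSpec:
    """Turn a raw mapping (as read from YAML) into a spec, validating 'name'."""
    if not isinstance(raw, dict) or not raw.get("name"):
        raise ValueError(f"{source}: experiment type requires a non-empty 'name'")
    return ExperimentTypeSpec(
        name=str(raw["name"]),
        # Missing sections default to empty containers (an assumption).
        classification_triggers=list(raw.get("classification_triggers") or []),
        dimension_weights=dict(raw.get("dimension_weights") or {}),
        applicable_lenses=dict(raw.get("applicable_lenses") or {}),
        red_team_focus=dict(raw.get("red_team_focus") or {}),
        l1_severity=dict(raw.get("l1_severity") or {}),
    )


spec = parse_experiment_type(
    {
        "name": "causal_inference",
        "dimension_weights": {"causal_structure": "H", "variance_protocol": "L"},
    }
)
```

The two weights in the usage example are the ones the later review fix asserts for `causal_inference`; everything else about the real YAML contents is left out here.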

Adds ExperimentTypeSpec and load_all_experiment_types to the public surface of autoskillit.recipe via __init__.py and __all__.

…e registry

Step 0 now loads the registry from bundled YAML files and optional user overrides. All hardcoded classification tables, dimension weight matrices, l1_severity rubrics, and red_team_focus type-specific lists replaced with references to registry data. RT_MAX_SEVERITY built dynamically from registry. Behavioral change to existing 5 types: zero.

Import ordering in recipe/__init__.py (experiment_type_registry sorted alphabetically among imports) and long assert strings in test file reformatted to ruff-compliant style.

…mplementation

- Bump recipe/ subpackage exemption from 30 to 31 (added experiment_type_registry.py)
- Add experiment_type_registry.py to CLAUDE.md architecture section
- Update SKILL.md dimension weights to use weight=H/M/L/S notation for test compatibility
- Add l1_severity calibration anchors in Step 2 (causal_inference/benchmark/exploratory)
- Update test_weight_matrix_has_eight_dimensions to check bundled YAML files instead of removed table
- Update test_data_acquisition_not_l_weight to check YAML files instead of removed table
- Update test_agent_implementability_weight_row to check YAML files instead of removed table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nsions

parents[2] points to src/autoskillit/, not the project root

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek

AutoSkillit PR Review — Verdict: changes_requested

tests/recipe/test_experiment_type_registry.py

tests/skills/test_review_design_contracts.py

Trecek · 2026-04-13T06:10:29Z

tests/recipe/test_experiment_type_registry.py

+    assert exp.l1_severity["hypothesis_falsifiability"] == "info"
+
+
+def test_no_project_dir_returns_bundled_only() -> None:


[warning] slop: test_no_project_dir_returns_bundled_only duplicates test_all_bundled_types_present — both call load_all_experiment_types() with no args and assert the same key set. Remove or merge the duplicate.

tests/skills/test_review_design_guards.py

Trecek · 2026-04-13T06:10:29Z

tests/skills/test_review_design_guards.py

-            assert "M" in line or "H" in line, (
-                "data_acquisition must have M or H weight in at least one experiment type"
-            )
+    """data_acquisition must be M-weight minimum to influence verdict in at least one type."""


[warning] tests: Weak assertion — test_data_acquisition_not_l_weight uses an early-return pattern that passes as soon as a single type has M/H weight. It does not verify the full expected distribution per type. Consider asserting specific weights per type (as done in test_agent_implementability_weight_row).

Investigated — this is intentional. The test name test_data_acquisition_not_l_weight and docstring both specify 'at least one type' semantics: the guard is that data_acquisition is NOT universally L-weight. The early-return pattern is the correct implementation of that contract (see commit 35ce6f5 where this was deliberately rewritten from SKILL.md text parsing to this YAML-based 'at least one' check). Changing to a full distribution assertion would require encoding expected weights for all 5 types and is a design decision outside this PR's scope.
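
The difference between the "at least one type" contract defended here and the per-type distribution check the reviewer suggested can be sketched as follows. Both helper names and the sample weights are hypothetical; the real tests read the weights from the registry.

```python
WeightsByType = dict[str, dict[str, str]]


def some_type_weights_at_least_m(weights: WeightsByType, dimension: str) -> bool:
    """'At least one' contract: some type gives `dimension` an M or H weight."""
    return any(w.get(dimension) in ("M", "H") for w in weights.values())


def mismatched_types(
    weights: WeightsByType, dimension: str, expected: dict[str, str]
) -> list[str]:
    """Full-distribution contract: list the types whose weight differs from `expected`."""
    return sorted(
        name
        for name, want in expected.items()
        if weights.get(name, {}).get(dimension) != want
    )


# Illustrative weights, not the values in the bundled YAML files.
weights = {
    "benchmark": {"data_acquisition": "L"},
    "exploratory": {"data_acquisition": "M"},
}

guard_passes = some_type_weights_at_least_m(weights, "data_acquisition")
drift = mismatched_types(
    weights, "data_acquisition", {"benchmark": "M", "exploratory": "M"}
)
```

The first helper stays green as long as any single type keeps an M or H weight, which matches the stated guard. The second would flag `benchmark` in this sample, but it requires encoding an expected weight for every type.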

tests/skills/test_review_design_guards.py

src/autoskillit/recipe/experiment_type_registry.py

Trecek · 2026-04-13T06:10:29Z

src/autoskillit/recipe/experiment_type_registry.py

+    types = _load_types_from_dir(BUNDLED_EXPERIMENT_TYPES_DIR)
+
+    if project_dir is not None:
+        user_dir = Path(project_dir) / ".autoskillit" / "experiment-types"


[info] defense: Redundant Path(project_dir) cast — project_dir is already typed Path | None. Use it directly.
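
A small sketch of the suggested simplification, assuming `project_dir` really is always a `Path` or `None` at the call site. `user_types_dir` is a hypothetical helper written for this illustration, not a function in the module.

```python
from __future__ import annotations

from pathlib import Path


def user_types_dir(project_dir: Path | None) -> Path | None:
    """Return the user override directory, or None when no project is given."""
    if project_dir is None:
        return None
    # project_dir is already a Path, so the / operator works without a cast.
    return project_dir / ".autoskillit" / "experiment-types"


example = user_types_dir(Path("my-project"))
```

Wrapping an existing `Path` in `Path(...)` is harmless at runtime, so the cast only matters if callers might pass a plain string despite the annotation.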

Trecek · 2026-04-13T06:10:29Z

tests/recipe/test_experiment_type_registry.py

+    assert set(types.keys()) == EXPECTED_TYPES
+
+
+def test_returns_dict_of_experiment_type_spec() -> None:


[info] tests: test_returns_dict_of_experiment_type_spec only asserts isinstance(spec, ExperimentTypeSpec) — already implicitly covered by every test that accesses typed attributes. Adds negligible signal.

tests/skills/test_review_design_contracts.py

Trecek

AutoSkillit review found 10 blocking issues (see inline comments). Verdict: changes_requested.

…variance_protocol

Line 85 was a duplicate of line 83 (causal_structure == "H"). Corrected to assert variance_protocol == "L" per causal_inference.yaml.

After `assert not missing` guarantees `known_dims ⊆ dims_found`, the `assert dims_found >= known_dims` superset check can never fail independently. Removed the redundant assertion.
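
The redundancy can be checked directly with Python sets. In this sketch the two helper functions are hypothetical, and the dimension names are borrowed from elsewhere in this PR purely as sample data.

```python
def covered_by_difference(known: set[str], found: set[str]) -> bool:
    """The first assertion: nothing in `known` is missing from `found`."""
    missing = known - found
    return not missing


def covered_by_superset(known: set[str], found: set[str]) -> bool:
    """The removed assertion: `found` is a superset of (or equal to) `known`."""
    return found >= known


known_dims = {"causal_structure", "variance_protocol"}
cases = [
    {"causal_structure", "variance_protocol"},                      # equal sets
    {"causal_structure", "variance_protocol", "data_acquisition"},  # extra item
    {"causal_structure"},                                           # one missing
    set(),                                                          # all missing
]

# Both checks agree on every case, so the second adds no information.
agreement = [
    covered_by_difference(known_dims, found) == covered_by_superset(known_dims, found)
    for found in cases
]
```

Note that `>=` on sets is the non-strict superset test (equal sets pass); the strict form is `>`, which would reject the first case.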

…_all_experiment_types

- Remove unused _BUNDLED_TYPE_ORDER list (dead code; never referenced)
- Replace _load_bundled_types() raw YAML helper with load_all_experiment_types() so guards tests exercise the registry API instead of duplicating YAML parsing
- Update data dict accesses to ExperimentTypeSpec attribute access

A ValueError from _parse_experiment_type or IO error from load_yaml propagated and aborted loading all remaining files. Now each file is wrapped in try/except — malformed files are logged and skipped so one bad YAML file cannot destroy the entire directory load.
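
The per-file isolation described above might look roughly like this. It is a sketch under stated assumptions: the file reader is passed in as `read_file` so the example needs no YAML dependency, the set of caught exceptions is a guess, and `load_types_from_dir` is a hypothetical public stand-in for `_load_types_from_dir()`.

```python
import logging
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_types_from_dir(
    directory: Path,
    read_file: Callable[[Path], Any],
) -> dict[str, dict[str, Any]]:
    """Load every *.yaml file in `directory`, skipping files that fail."""
    types: dict[str, dict[str, Any]] = {}
    if not directory.is_dir():
        return types
    for path in sorted(directory.glob("*.yaml")):  # sorted for a stable order
        try:
            raw = read_file(path)
            if not isinstance(raw, dict) or not raw.get("name"):
                raise ValueError("missing 'name' field")
            types[str(raw["name"])] = raw
        except (OSError, ValueError) as exc:
            # One bad file is logged and skipped; the rest still load.
            logger.warning("Skipping experiment type file %s: %s", path, exc)
    return types
```

A real YAML reader may raise its own parse error type that is not a `ValueError`, in which case the `except` clause would need to name it as well.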

… by registry tests

test_all_eight_dimensions_present in test_experiment_type_registry.py covers the same dimension completeness check via load_all_experiment_types(). The contracts test was a weaker raw-YAML duplicate; removing it consolidates coverage in the registry test suite.

Under xdist parallel load (4 workers, ~8000 tests), the subprocess started by test_sigterm_writes_scenario_json never fully starts within 5 seconds — empty stdout/stderr confirms the process hasn't initialized before SIGTERM arrives. Increasing the deadline to 15s tolerates slow subprocess startup under heavy CPU contention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek and others added 7 commits April 12, 2026 22:10

feat(recipe): export ExperimentTypeSpec and load_all_experiment_types

85cca81

Adds ExperimentTypeSpec and load_all_experiment_types to the public surface of autoskillit.recipe via __init__.py and __all__.

style: apply ruff format and lint auto-fixes from pre-commit

6a449ea

Import ordering in recipe/__init__.py (experiment_type_registry sorted alphabetically among imports) and long assert strings in test file reformatted to ruff-compliant style.

fix: correct YAML directory path in test_weight_matrix_has_eight_dime…

bb3f931

…nsions

parents[2] points to src/autoskillit/, not the project root

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek commented Apr 13, 2026


Trecek added 5 commits April 12, 2026 23:17

fix(review): fix copy-paste duplicate assertion for causal_inference …

e409c4b

…variance_protocol

Line 85 was a duplicate of line 83 (causal_structure == "H"). Corrected to assert variance_protocol == "L" per causal_inference.yaml.

fix(review): remove redundant superset assertion after not-missing check

8dbaace

After `assert not missing` guarantees `known_dims ⊆ dims_found`, the `assert dims_found >= known_dims` superset check can never fail independently. Removed the redundant assertion.

Trecek enabled auto-merge April 13, 2026 06:34

Trecek added this pull request to the merge queue Apr 13, 2026

Merged via the queue into integration with commit a16e33c Apr 13, 2026
2 checks passed

Trecek deleted the experiment-type-registry-yaml-driven-extensible/788 branch April 13, 2026 06:57

Trecek merged 13 commits into integration from experiment-type-registry-yaml-driven-extensible/788
