feat: Experiment Type Registry — YAML-Driven, Extensible#795
Trecek merged 13 commits into integration
Creates src/autoskillit/recipes/experiment-types/ with benchmark.yaml, configuration_study.yaml, causal_inference.yaml, robustness_audit.yaml, and exploratory.yaml — each containing classification_triggers, dimension_weights, applicable_lenses, red_team_focus, and l1_severity. Values match the hardcoded tables in review-design/SKILL.md exactly.
Adds ExperimentTypeSpec dataclass and load_all_experiment_types() loader that merges bundled YAML types with optional user-defined overrides from .autoskillit/experiment-types/ (full replacement, no field merging). Includes comprehensive test suite covering all 5 bundled types, weight values, override semantics, and user extension via new type names.
Adds ExperimentTypeSpec and load_all_experiment_types to the public surface of autoskillit.recipe via __init__.py and __all__.
…e registry Step 0 now loads the registry from bundled YAML files and optional user overrides. All hardcoded classification tables, dimension weight matrices, l1_severity rubrics, and red_team_focus type-specific lists replaced with references to registry data. RT_MAX_SEVERITY built dynamically from registry. Behavioral change to existing 5 types: zero.
Import ordering in recipe/__init__.py (experiment_type_registry sorted alphabetically among imports) and long assert strings in test file reformatted to ruff-compliant style.
…mplementation
- Bump recipe/ subpackage exemption from 30 to 31 (added experiment_type_registry.py)
- Add experiment_type_registry.py to CLAUDE.md architecture section
- Update SKILL.md dimension weights to use weight=H/M/L/S notation for test compatibility
- Add l1_severity calibration anchors in Step 2 (causal_inference/benchmark/exploratory)
- Update test_weight_matrix_has_eight_dimensions to check bundled YAML files instead of removed table
- Update test_data_acquisition_not_l_weight to check YAML files instead of removed table
- Update test_agent_implementability_weight_row to check YAML files instead of removed table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nsions
parents[2] points to src/autoskillit/, not the project root
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek
left a comment
AutoSkillit PR Review — Verdict: changes_requested
    assert exp.l1_severity["hypothesis_falsifiability"] == "info"


def test_no_project_dir_returns_bundled_only() -> None:
[warning] slop: test_no_project_dir_returns_bundled_only duplicates test_all_bundled_types_present — both call load_all_experiment_types() with no args and assert the same key set. Remove or merge the duplicate.
    assert "M" in line or "H" in line, (
        "data_acquisition must have M or H weight in at least one experiment type"
    )
    """data_acquisition must be M-weight minimum to influence verdict in at least one type."""
[warning] tests: Weak assertion — test_data_acquisition_not_l_weight uses an early-return pattern that passes as soon as a single type has M/H weight. It does not verify the full expected distribution per type. Consider asserting specific weights per type (as done in test_agent_implementability_weight_row).
Investigated — this is intentional. The test name test_data_acquisition_not_l_weight and docstring both specify 'at least one type' semantics: the guard is that data_acquisition is NOT universally L-weight. The early-return pattern is the correct implementation of that contract (see commit 35ce6f5 where this was deliberately rewritten from SKILL.md text parsing to this YAML-based 'at least one' check). Changing to a full distribution assertion would require encoding expected weights for all 5 types and is a design decision outside this PR's scope.
    types = _load_types_from_dir(BUNDLED_EXPERIMENT_TYPES_DIR)

    if project_dir is not None:
        user_dir = Path(project_dir) / ".autoskillit" / "experiment-types"
[info] defense: Redundant Path(project_dir) cast — project_dir is already typed Path | None. Use it directly.
    assert set(types.keys()) == EXPECTED_TYPES


def test_returns_dict_of_experiment_type_spec() -> None:
[info] tests: test_returns_dict_of_experiment_type_spec only asserts isinstance(spec, ExperimentTypeSpec) — already implicitly covered by every test that accesses typed attributes. Adds negligible signal.
Trecek
left a comment
AutoSkillit review found 10 blocking issues (see inline comments). Verdict: changes_requested.
…variance_protocol Line 85 was a duplicate of line 83 (causal_structure == "H"). Corrected to assert variance_protocol == "L" per causal_inference.yaml.
After `assert not missing` guarantees `known_dims ⊆ dims_found`, the `assert dims_found >= known_dims` superset check (non-strict, so it tests the same condition) can never fail independently. Removed the redundant assertion.
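The redundancy can be checked mechanically. The sketch below assumes `missing` was computed as `known_dims - dims_found` (the commit message does not show that line) and compares the two conditions over every pair of subsets of a small set of dimension names.

```python
from itertools import combinations


def all_subsets(items: list[str]) -> list[set[str]]:
    """Every subset of `items`, from the empty set up to the full set."""
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


universe = ["causal_structure", "variance_protocol", "data_acquisition"]

# Collect every (known, found) pair where the two checks would disagree.
disagreements = [
    (known, found)
    for known in all_subsets(universe)
    for found in all_subsets(universe)
    # Assumption: `missing` is the set difference known - found.
    if (not (known - found)) != (found >= known)
]
```

`disagreements` comes out empty: an empty set difference and a `>=` superset test are the same condition, so the second assertion adds no coverage once the first has passed.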
…_all_experiment_types
- Remove unused _BUNDLED_TYPE_ORDER list (dead code; never referenced)
- Replace _load_bundled_types() raw YAML helper with load_all_experiment_types() so guards tests exercise the registry API instead of duplicating YAML parsing
- Update data dict accesses to ExperimentTypeSpec attribute access
A ValueError from _parse_experiment_type or IO error from load_yaml propagated and aborted loading all remaining files. Now each file is wrapped in try/except — malformed files are logged and skipped so one bad YAML file cannot destroy the entire directory load.
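A minimal sketch of the skip-and-log pattern this commit describes. It is not the code from experiment_type_registry.py: `load_dir_tolerant` and `toy_parse` are hypothetical names, the exact exception types caught in the real loader are an assumption, and `toy_parse` is a plain-text stand-in for `load_yaml` plus `_parse_experiment_type` so the example runs with the standard library alone.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def load_dir_tolerant(
    directory: Path, parse_file: Callable[[Path], tuple[str, dict]]
) -> dict[str, dict]:
    """Load every *.yaml file in sorted order; log and skip files that fail to parse."""
    loaded: dict[str, dict] = {}
    for path in sorted(directory.glob("*.yaml")):
        try:
            name, spec = parse_file(path)
        except (ValueError, OSError) as exc:  # assumed exception set
            logger.warning("Skipping %s: %s", path.name, exc)
            continue
        loaded[name] = spec
    return loaded


def toy_parse(path: Path) -> tuple[str, dict]:
    """Stand-in parser: reads 'key: value' lines and requires a 'name' key."""
    fields = {}
    for line in path.read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if "name" not in fields:
        raise ValueError("missing required 'name' field")
    return fields["name"], fields


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.yaml").write_text("name: benchmark\n")
    (root / "bad.yaml").write_text("weights: none\n")  # no 'name' field, so it is skipped
    result = load_dir_tolerant(root, toy_parse)
```

The try/except sits inside the loop, so a failure only affects the file that raised it. With a single try/except around the whole loop, the first bad file would still end the directory load.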
… by registry tests test_all_eight_dimensions_present in test_experiment_type_registry.py covers the same dimension completeness check via load_all_experiment_types(). The contracts test was a weaker raw-YAML duplicate; removing it consolidates coverage in the registry test suite.
Under xdist parallel load (4 workers, ~8000 tests), the subprocess started by test_sigterm_writes_scenario_json never fully starts within 5 seconds; empty stdout/stderr confirms the process hasn't initialized before SIGTERM arrives. Increasing the deadline to 15s tolerates slow subprocess startup under heavy CPU contention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Externalize the 5 experiment types currently hardcoded in src/autoskillit/skills_extended/review-design/SKILL.md into YAML files under src/autoskillit/recipes/experiment-types/. Add a loader module, src/autoskillit/recipe/experiment_type_registry.py, that merges bundled types with user-defined overrides from .autoskillit/experiment-types/ (full replacement, no merging). Update review-design/SKILL.md to load the registry at Step 0 and reference loaded data throughout, replacing all hardcoded tables. Behavioral change to existing 5 types: zero.
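The "full replacement, no merging" override rule can be illustrated with a short sketch. `TypeSpecSketch` is a hypothetical, trimmed stand-in for `ExperimentTypeSpec` (only three of its fields), `merge_types` is an illustrative helper rather than the real loader, and all weight values are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeSpecSketch:
    """Hypothetical stand-in for ExperimentTypeSpec with a subset of its fields."""

    name: str
    dimension_weights: dict[str, str] = field(default_factory=dict)
    red_team_focus: dict[str, str] = field(default_factory=dict)


def merge_types(
    bundled: dict[str, TypeSpecSketch], user: dict[str, TypeSpecSketch]
) -> dict[str, TypeSpecSketch]:
    """User entries replace bundled entries with the same name; fields are never merged."""
    merged = dict(bundled)  # copy so the bundled mapping is left untouched
    merged.update(user)
    return merged


# Invented example data, not the contents of the bundled YAML files.
bundled = {
    "benchmark": TypeSpecSketch(
        "benchmark", {"data_acquisition": "M"}, {"focus": "placeholder"}
    ),
    "exploratory": TypeSpecSketch("exploratory", {"data_acquisition": "L"}),
}
user = {
    "benchmark": TypeSpecSketch("benchmark", {"data_acquisition": "H"}),
    "my_custom_type": TypeSpecSketch("my_custom_type"),
}

merged = merge_types(bundled, user)
```

In this sketch the user's `benchmark` omits `red_team_focus`, so the merged entry has an empty one rather than inheriting the bundled value. That is the practical consequence of whole-object replacement: an override file has to restate every field it wants to keep. A new name such as `my_custom_type` is simply added alongside the bundled types.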
Architecture Impact
Data Lineage Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Storage ["YAML Storage (Disk)"]
        direction TB
        BUNDLED["★ recipes/experiment-types/<br/>━━━━━━━━━━<br/>benchmark.yaml<br/>causal_inference.yaml<br/>configuration_study.yaml<br/>exploratory.yaml<br/>robustness_audit.yaml"]
        USER[" .autoskillit/experiment-types/<br/>━━━━━━━━━━<br/>optional user overrides<br/>same-name = full replace"]
    end

    subgraph Parsing ["★ Transformation Pipeline (experiment_type_registry.py)"]
        direction TB
        LOAD_YAML["load_yaml(path)<br/>━━━━━━━━━━<br/>core.io YAML reader<br/>file → raw dict"]
        PARSE["★ _parse_experiment_type()<br/>━━━━━━━━━━<br/>validates 'name' field<br/>dict → ExperimentTypeSpec<br/>(name, triggers, weights,<br/>lenses, red_team, l1_sev)"]
        SCAN_BUNDLED["★ _load_types_from_dir(bundled)<br/>━━━━━━━━━━<br/>glob *.yaml, sorted<br/>→ dict[name, spec]"]
        SCAN_USER["★ _load_types_from_dir(user)<br/>━━━━━━━━━━<br/>glob *.yaml if dir exists<br/>→ dict[name, spec]"]
        MERGE["★ load_all_experiment_types()<br/>━━━━━━━━━━<br/>bundled dict +<br/>dict.update(user)<br/>user replaces on name clash"]
    end

    subgraph Registry ["In-Memory Registry (Primary Storage)"]
        SPECS["★ ExperimentTypeSpec instances<br/>━━━━━━━━━━<br/>name: str<br/>classification_triggers: list<br/>dimension_weights: dict<br/>applicable_lenses: dict<br/>red_team_focus: dict<br/>l1_severity: dict"]
        REG_MAP["★ dict[str, ExperimentTypeSpec]<br/>━━━━━━━━━━<br/>key = experiment type name<br/>source of truth for runtime"]
    end

    subgraph PublicAPI ["● Public API (recipe/__init__.py)"]
        RECIPE_INIT["● recipe/__init__.py<br/>━━━━━━━━━━<br/>re-exports ExperimentTypeSpec<br/>re-exports load_all_experiment_types<br/>in autoskillit.recipe.__all__"]
    end

    subgraph Consumer ["● Skill Consumer (review-design Step 0)"]
        STEP0["● review-design/SKILL.md<br/>Step 0d<br/>━━━━━━━━━━<br/>calls load_all_experiment_types()<br/>reads classification_triggers<br/>matches experiment_type<br/>reads dimension_weights<br/>for subagent spawn decisions"]
    end

    %% PRIMARY DATA FLOWS %%
    BUNDLED -->|"load_yaml(path)"| LOAD_YAML
    USER -->|"load_yaml(path) if exists"| LOAD_YAML
    LOAD_YAML -->|"raw dict"| PARSE
    PARSE -->|"ExperimentTypeSpec"| SCAN_BUNDLED
    PARSE -->|"ExperimentTypeSpec"| SCAN_USER
    SCAN_BUNDLED -->|"dict[name→spec]"| MERGE
    SCAN_USER -->|"dict[name→spec]"| MERGE
    MERGE -->|"creates"| SPECS
    SPECS -->|"keyed by name"| REG_MAP
    REG_MAP -->|"re-exported via"| RECIPE_INIT
    REG_MAP -->|"read by Step 0"| STEP0

    %% CLASS ASSIGNMENTS %%
    class BUNDLED newComponent;
    class USER cli;
    class LOAD_YAML handler;
    class PARSE newComponent;
    class SCAN_BUNDLED newComponent;
    class SCAN_USER newComponent;
    class MERGE newComponent;
    class SPECS newComponent;
    class REG_MAP stateNode;
    class RECIPE_INIT integration;
    class STEP0 phase;
Module Dependency Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS"]
        direction LR
        TEST_REG["★ test_experiment_type_registry.py<br/>━━━━━━━━━━<br/>tests/recipe/<br/>Imports: ExperimentTypeSpec,<br/>load_all_experiment_types"]
    end

    subgraph L2_Recipe ["L2 - RECIPE PACKAGE"]
        direction TB
        INIT["● recipe/__init__.py<br/>━━━━━━━━━━<br/>Re-exports: ExperimentTypeSpec,<br/>load_all_experiment_types<br/>(public gateway)"]
        ETR["★ recipe/experiment_type_registry.py<br/>━━━━━━━━━━<br/>ExperimentTypeSpec (dataclass)<br/>load_all_experiment_types()<br/>_load_types_from_dir()<br/>_parse_experiment_type()"]
        OTHER["recipe/{io,schema,validator,…}.py<br/>━━━━━━━━━━<br/>Unchanged siblings<br/>(context only)"]
    end

    subgraph L0_Core ["L0 - CORE (zero autoskillit imports)"]
        direction LR
        IO["core/io.py<br/>━━━━━━━━━━<br/>load_yaml()"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-export hub<br/>(high fan-in)"]
    end

    subgraph DataFiles ["BUNDLED DATA (recipes/experiment-types/)"]
        direction LR
        YAMLS["★ benchmark.yaml<br/>★ causal_inference.yaml<br/>★ configuration_study.yaml<br/>★ exploratory.yaml<br/>★ robustness_audit.yaml<br/>━━━━━━━━━━<br/>Loaded at runtime via<br/>BUNDLED_EXPERIMENT_TYPES_DIR"]
    end

    subgraph Stdlib ["LAYER 0 - STDLIB / EXTERNAL"]
        direction LR
        STDLIB["dataclasses, pathlib<br/>━━━━━━━━━━<br/>No new third-party deps"]
    end

    %% TEST → RECIPE (direct module, bypasses package gateway) %%
    TEST_REG -->|"imports directly from<br/>recipe.experiment_type_registry"| ETR

    %% RECIPE/__init__ → new module (re-export) %%
    INIT -->|"re-exports<br/>ExperimentTypeSpec,<br/>load_all_experiment_types"| ETR

    %% new module → core (L0 dependency - valid downward) %%
    ETR -->|"from autoskillit.core import<br/>load_yaml, pkg_root"| CORE_INIT
    CORE_INIT -->|"re-exports"| IO
    CORE_INIT -->|"re-exports"| PATHS

    %% new module → stdlib %%
    ETR -->|"dataclasses, pathlib"| STDLIB

    %% new module → bundled YAML files (runtime filesystem read) %%
    ETR -.->|"runtime glob:<br/>BUNDLED_EXPERIMENT_TYPES_DIR"| YAMLS

    %% unchanged siblings (context only - no new imports to/from ETR) %%
    OTHER -.->|"no coupling to<br/>new module"| ETR

    %% CLASS ASSIGNMENTS %%
    class TEST_REG newComponent;
    class ETR newComponent;
    class INIT handler;
    class OTHER phase;
    class CORE_INIT stateNode;
    class IO,PATHS handler;
    class YAMLS output;
    class STDLIB integration;
Closes #788
Implementation Plan
Plan file: /home/talon/projects/autoskillit-runs/impl-788-20260412-214910-284400/.autoskillit/temp/make-plan/experiment_type_registry_plan_2026-04-12_215800.md
🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary