feat: Experiment Type Registry — YAML-Driven, Extensible #795

Merged
Trecek merged 13 commits into integration from experiment-type-registry-yaml-driven-extensible/788 on Apr 13, 2026

@Trecek Trecek commented Apr 13, 2026

Summary

Externalize the 5 experiment types currently hardcoded in
src/autoskillit/skills_extended/review-design/SKILL.md into YAML files under
src/autoskillit/recipes/experiment-types/. Add a loader module
src/autoskillit/recipe/experiment_type_registry.py that merges bundled types with
user-defined overrides from .autoskillit/experiment-types/ (full replacement — no
merging). Update review-design/SKILL.md to load the registry at Step 0 and reference
loaded data throughout, replacing all hardcoded tables. Behavioral change to existing
5 types: zero.
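
The "full replacement, no merging" override rule can be sketched as below. This is a minimal illustration, not the PR's code: `merge_experiment_types` is a hypothetical helper, specs are shown as plain dicts rather than `ExperimentTypeSpec` instances, and all weight values are made up for the example.

```python
def merge_experiment_types(
    bundled: dict[str, dict], user: dict[str, dict]
) -> dict[str, dict]:
    """Overlay user-defined types on bundled ones; a same-name entry is replaced whole."""
    merged = dict(bundled)  # shallow copy so the bundled mapping is not mutated
    merged.update(user)     # user entry wins on a name clash; fields are NOT merged
    return merged


# Illustrative values only, not the real bundled weights.
bundled = {
    "benchmark": {
        "dimension_weights": {"data_acquisition": "M", "variance_protocol": "H"},
        "l1_severity": {"hypothesis_falsifiability": "info"},
    },
    "exploratory": {"dimension_weights": {"data_acquisition": "L"}},
}
user = {
    # Override of a bundled type: only the fields given here survive.
    "benchmark": {"dimension_weights": {"data_acquisition": "H"}},
    # Hypothetical new type name added by the user.
    "ablation": {"dimension_weights": {"data_acquisition": "M"}},
}

merged = merge_experiment_types(bundled, user)
```

Under this rule a user override that omits a field (here `l1_severity`) does not inherit it from the bundled type, so an override file has to be complete on its own.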

Architecture Impact

Data Lineage Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Storage ["YAML Storage (Disk)"]
        direction TB
        BUNDLED["★ recipes/experiment-types/<br/>━━━━━━━━━━<br/>benchmark.yaml<br/>causal_inference.yaml<br/>configuration_study.yaml<br/>exploratory.yaml<br/>robustness_audit.yaml"]
        USER["  .autoskillit/experiment-types/<br/>━━━━━━━━━━<br/>optional user overrides<br/>same-name = full replace"]
    end

    subgraph Parsing ["★ Transformation Pipeline (experiment_type_registry.py)"]
        direction TB
        LOAD_YAML["load_yaml(path)<br/>━━━━━━━━━━<br/>core.io YAML reader<br/>file → raw dict"]
        PARSE["★ _parse_experiment_type()<br/>━━━━━━━━━━<br/>validates 'name' field<br/>dict → ExperimentTypeSpec<br/>(name, triggers, weights,<br/>lenses, red_team, l1_sev)"]
        SCAN_BUNDLED["★ _load_types_from_dir(bundled)<br/>━━━━━━━━━━<br/>glob *.yaml, sorted<br/>→ dict[name, spec]"]
        SCAN_USER["★ _load_types_from_dir(user)<br/>━━━━━━━━━━<br/>glob *.yaml if dir exists<br/>→ dict[name, spec]"]
        MERGE["★ load_all_experiment_types()<br/>━━━━━━━━━━<br/>bundled dict +<br/>dict.update(user)<br/>user replaces on name clash"]
    end

    subgraph Registry ["In-Memory Registry (Primary Storage)"]
        SPECS["★ ExperimentTypeSpec instances<br/>━━━━━━━━━━<br/>name: str<br/>classification_triggers: list<br/>dimension_weights: dict<br/>applicable_lenses: dict<br/>red_team_focus: dict<br/>l1_severity: dict"]
        REG_MAP["★ dict[str, ExperimentTypeSpec]<br/>━━━━━━━━━━<br/>key = experiment type name<br/>source of truth for runtime"]
    end

    subgraph PublicAPI ["● Public API (recipe/__init__.py)"]
        RECIPE_INIT["● recipe/__init__.py<br/>━━━━━━━━━━<br/>re-exports ExperimentTypeSpec<br/>re-exports load_all_experiment_types<br/>in autoskillit.recipe.__all__"]
    end

    subgraph Consumer ["● Skill Consumer (review-design Step 0)"]
        STEP0["● review-design/SKILL.md<br/>Step 0d<br/>━━━━━━━━━━<br/>calls load_all_experiment_types()<br/>reads classification_triggers<br/>matches experiment_type<br/>reads dimension_weights<br/>for subagent spawn decisions"]
    end

    %% PRIMARY DATA FLOWS %%
    BUNDLED -->|"load_yaml(path)"| LOAD_YAML
    USER -->|"load_yaml(path) if exists"| LOAD_YAML
    LOAD_YAML -->|"raw dict"| PARSE
    PARSE -->|"ExperimentTypeSpec"| SCAN_BUNDLED
    PARSE -->|"ExperimentTypeSpec"| SCAN_USER
    SCAN_BUNDLED -->|"dict[name→spec]"| MERGE
    SCAN_USER -->|"dict[name→spec]"| MERGE
    MERGE -->|"creates"| SPECS
    SPECS -->|"keyed by name"| REG_MAP
    REG_MAP -->|"re-exported via"| RECIPE_INIT
    REG_MAP -->|"read by Step 0"| STEP0

    %% CLASS ASSIGNMENTS %%
    class BUNDLED newComponent;
    class USER cli;
    class LOAD_YAML handler;
    class PARSE newComponent;
    class SCAN_BUNDLED newComponent;
    class SCAN_USER newComponent;
    class MERGE newComponent;
    class SPECS newComponent;
    class REG_MAP stateNode;
    class RECIPE_INIT integration;
    class STEP0 phase;
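
The `dict → ExperimentTypeSpec` step in the diagram might look roughly like the sketch below. The field names come from the diagram; the element types, the empty defaults for missing fields, and the exact error message are assumptions, and `parse_experiment_type` is a stand-in for the private `_parse_experiment_type()`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExperimentTypeSpec:
    """Field names follow the diagram; element types are assumed."""

    name: str
    classification_triggers: list[str] = field(default_factory=list)
    dimension_weights: dict[str, str] = field(default_factory=dict)
    applicable_lenses: dict[str, Any] = field(default_factory=dict)
    red_team_focus: dict[str, Any] = field(default_factory=dict)
    l1_severity: dict[str, str] = field(default_factory=dict)


def parse_experiment_type(raw: dict[str, Any]) -> ExperimentTypeSpec:
    """Validate the 'name' field and build a spec from a raw YAML mapping."""
    name = raw.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("experiment type is missing a non-empty 'name' field")
    return ExperimentTypeSpec(
        name=name,
        # Missing or null fields fall back to empty containers (an assumption).
        classification_triggers=list(raw.get("classification_triggers") or []),
        dimension_weights=dict(raw.get("dimension_weights") or {}),
        applicable_lenses=dict(raw.get("applicable_lenses") or {}),
        red_team_focus=dict(raw.get("red_team_focus") or {}),
        l1_severity=dict(raw.get("l1_severity") or {}),
    )
```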

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS"]
        direction LR
        TEST_REG["★ test_experiment_type_registry.py<br/>━━━━━━━━━━<br/>tests/recipe/<br/>Imports: ExperimentTypeSpec,<br/>load_all_experiment_types"]
    end

    subgraph L2_Recipe ["L2 — RECIPE PACKAGE"]
        direction TB
        INIT["● recipe/__init__.py<br/>━━━━━━━━━━<br/>Re-exports: ExperimentTypeSpec,<br/>load_all_experiment_types<br/>(public gateway)"]
        ETR["★ recipe/experiment_type_registry.py<br/>━━━━━━━━━━<br/>ExperimentTypeSpec (dataclass)<br/>load_all_experiment_types()<br/>_load_types_from_dir()<br/>_parse_experiment_type()"]
        OTHER["recipe/{io,schema,validator,…}.py<br/>━━━━━━━━━━<br/>Unchanged siblings<br/>(context only)"]
    end

    subgraph L0_Core ["L0 — CORE (zero autoskillit imports)"]
        direction LR
        IO["core/io.py<br/>━━━━━━━━━━<br/>load_yaml()"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-export hub<br/>(high fan-in)"]
    end

    subgraph DataFiles ["BUNDLED DATA (recipes/experiment-types/)"]
        direction LR
        YAMLS["★ benchmark.yaml<br/>★ causal_inference.yaml<br/>★ configuration_study.yaml<br/>★ exploratory.yaml<br/>★ robustness_audit.yaml<br/>━━━━━━━━━━<br/>Loaded at runtime via<br/>BUNDLED_EXPERIMENT_TYPES_DIR"]
    end

    subgraph Stdlib ["LAYER 0 — STDLIB / EXTERNAL"]
        direction LR
        STDLIB["dataclasses, pathlib<br/>━━━━━━━━━━<br/>No new third-party deps"]
    end

    %% TEST → RECIPE (direct module, bypasses package gateway) %%
    TEST_REG -->|"imports directly from<br/>recipe.experiment_type_registry"| ETR

    %% RECIPE/__init__ → new module (re-export) %%
    INIT -->|"re-exports<br/>ExperimentTypeSpec,<br/>load_all_experiment_types"| ETR

    %% new module → core (L0 dependency — valid downward) %%
    ETR -->|"from autoskillit.core import<br/>load_yaml, pkg_root"| CORE_INIT
    CORE_INIT -->|"re-exports"| IO
    CORE_INIT -->|"re-exports"| PATHS

    %% new module → stdlib %%
    ETR -->|"dataclasses, pathlib"| STDLIB

    %% new module → bundled YAML files (runtime filesystem read) %%
    ETR -.->|"runtime glob:<br/>BUNDLED_EXPERIMENT_TYPES_DIR"| YAMLS

    %% unchanged siblings (context only — no new imports to/from ETR) %%
    OTHER -.->|"no coupling to<br/>new module"| ETR

    %% CLASS ASSIGNMENTS %%
    class TEST_REG newComponent;
    class ETR newComponent;
    class INIT handler;
    class OTHER phase;
    class CORE_INIT stateNode;
    class IO,PATHS handler;
    class YAMLS output;
    class STDLIB integration;

Closes #788

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-788-20260412-214910-284400/.autoskillit/temp/make-plan/experiment_type_registry_plan_2026-04-12_215800.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 167 | 23.1k | 760.6k | 61.4k | 1 | 11m 16s |
| verify | 134 | 12.6k | 734.0k | 49.3k | 1 | 3m 50s |
| implement | 2.2k | 16.2k | 1.8M | 54.7k | 1 | 5m 58s |
| fix | 270 | 21.3k | 1.8M | 66.6k | 1 | 11m 41s |
| prepare_pr | 54 | 5.9k | 165.2k | 28.6k | 1 | 1m 29s |
| run_arch_lenses | 146 | 10.4k | 494.5k | 58.3k | 2 | 2m 58s |
| compose_pr | 59 | 5.5k | 185.0k | 23.9k | 1 | 1m 21s |
| Total | 3.0k | 95.0k | 6.0M | 342.7k | | 38m 36s |

Trecek and others added 7 commits April 12, 2026 22:10
Creates src/autoskillit/recipes/experiment-types/ with benchmark.yaml,
configuration_study.yaml, causal_inference.yaml, robustness_audit.yaml,
and exploratory.yaml — each containing classification_triggers,
dimension_weights, applicable_lenses, red_team_focus, and l1_severity.

Values match the hardcoded tables in review-design/SKILL.md exactly.

Adds ExperimentTypeSpec dataclass and load_all_experiment_types() loader
that merges bundled YAML types with optional user-defined overrides from
.autoskillit/experiment-types/ (full replacement, no field merging).

Includes comprehensive test suite covering all 5 bundled types, weight
values, override semantics, and user extension via new type names.

Adds ExperimentTypeSpec and load_all_experiment_types to the public
surface of autoskillit.recipe via __init__.py and __all__.
…e registry

Step 0 now loads the registry from bundled YAML files and optional user
overrides. All hardcoded classification tables, dimension weight matrices,
l1_severity rubrics, and red_team_focus type-specific lists replaced with
references to registry data. RT_MAX_SEVERITY built dynamically from
registry. Behavioral change to existing 5 types: zero.

Import ordering in recipe/__init__.py (experiment_type_registry sorted
alphabetically among imports) and long assert strings in test file
reformatted to ruff-compliant style.
…mplementation

- Bump recipe/ subpackage exemption from 30 to 31 (added experiment_type_registry.py)
- Add experiment_type_registry.py to CLAUDE.md architecture section
- Update SKILL.md dimension weights to use weight=H/M/L/S notation for test compatibility
- Add l1_severity calibration anchors in Step 2 (causal_inference/benchmark/exploratory)
- Update test_weight_matrix_has_eight_dimensions to check bundled YAML files instead of removed table
- Update test_data_acquisition_not_l_weight to check YAML files instead of removed table
- Update test_agent_implementability_weight_row to check YAML files instead of removed table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nsions

parents[2] points to src/autoskillit/, not the project root

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
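
The `parents[2]` pitfall is easy to reproduce with `pathlib`. The sketch below uses a purely hypothetical file location (the commit does not say which file was affected) to show how `parents[n]` counts upward from the file's own directory.

```python
from pathlib import PurePosixPath

# Hypothetical layout: a module two package levels below src/autoskillit/.
module = PurePosixPath("/repo/src/autoskillit/pkg/sub/module.py")

# parents[0] is the containing directory; each index climbs one level.
levels = [str(p) for p in module.parents]

package_root = module.parents[2]  # lands on src/autoskillit/, not the repo
project_root = module.parents[4]  # two more levels up reaches the project root
```

Counting levels by hand is fragile when files move, which is why a mistake of this kind tends to surface only when a test resolves a path relative to the wrong root.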

@Trecek Trecek left a comment

AutoSkillit PR Review — Verdict: changes_requested

assert exp.l1_severity["hypothesis_falsifiability"] == "info"


def test_no_project_dir_returns_bundled_only() -> None:

[warning] slop: test_no_project_dir_returns_bundled_only duplicates test_all_bundled_types_present — both call load_all_experiment_types() with no args and assert the same key set. Remove or merge the duplicate.

assert "M" in line or "H" in line, (
    "data_acquisition must have M or H weight in at least one experiment type"
)
"""data_acquisition must be M-weight minimum to influence verdict in at least one type."""

[warning] tests: Weak assertion — test_data_acquisition_not_l_weight uses an early-return pattern that passes as soon as a single type has M/H weight. It does not verify the full expected distribution per type. Consider asserting specific weights per type (as done in test_agent_implementability_weight_row).

Investigated — this is intentional. The test name test_data_acquisition_not_l_weight and docstring both specify 'at least one type' semantics: the guard is that data_acquisition is NOT universally L-weight. The early-return pattern is the correct implementation of that contract (see commit 35ce6f5 where this was deliberately rewritten from SKILL.md text parsing to this YAML-based 'at least one' check). Changing to a full distribution assertion would require encoding expected weights for all 5 types and is a design decision outside this PR's scope.
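
For readers comparing the two contracts discussed here, a minimal sketch of the "at least one type" check next to a stricter per-type check follows. The helper names and the weight values are hypothetical; the actual test reads the bundled YAML through the registry.

```python
def is_not_universally_l(weights_by_type: dict[str, dict[str, str]], dim: str) -> bool:
    """'At least one' contract: some type gives the dimension M or H weight."""
    return any(w.get(dim) in {"M", "H"} for w in weights_by_type.values())


def matches_expected(
    weights_by_type: dict[str, dict[str, str]], dim: str, expected: dict[str, str]
) -> bool:
    """Stricter contract: every type must carry exactly the expected weight."""
    return all(weights_by_type[name].get(dim) == w for name, w in expected.items())


# Illustrative weights only, not the values in the bundled YAML files.
weights = {
    "benchmark": {"data_acquisition": "H"},
    "exploratory": {"data_acquisition": "L"},
}

loose_ok = is_not_universally_l(weights, "data_acquisition")
strict_ok = matches_expected(
    weights, "data_acquisition", {"benchmark": "H", "exploratory": "M"}
)
```

The loose check keeps passing when an individual type's weight changes, as long as one type stays at M or H; the strict check would need the expected table updated on every such change.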

types = _load_types_from_dir(BUNDLED_EXPERIMENT_TYPES_DIR)

if project_dir is not None:
    user_dir = Path(project_dir) / ".autoskillit" / "experiment-types"

[info] defense: Redundant Path(project_dir) cast — project_dir is already typed Path | None. Use it directly.
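
A sketch of what the suggestion amounts to, assuming the parameter really is typed `Path | None` as the comment states. `user_types_dir` is a hypothetical helper used only to isolate the expression.

```python
from __future__ import annotations

from pathlib import Path


def user_types_dir(project_dir: Path | None) -> Path | None:
    """Return the user override directory, or None when no project is given."""
    if project_dir is None:
        return None
    # project_dir is already a Path, so no Path(...) wrapper is needed.
    return project_dir / ".autoskillit" / "experiment-types"


example = user_types_dir(Path("/tmp/project"))
```

The cast is harmless at runtime, since `Path(p)` on an existing `Path` yields an equal path; keeping it would only matter if callers were expected to pass plain strings despite the annotation.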

assert set(types.keys()) == EXPECTED_TYPES


def test_returns_dict_of_experiment_type_spec() -> None:

[info] tests: test_returns_dict_of_experiment_type_spec only asserts isinstance(spec, ExperimentTypeSpec) — already implicitly covered by every test that accesses typed attributes. Adds negligible signal.

@Trecek Trecek left a comment

AutoSkillit review found 10 blocking issues (see inline comments). Verdict: changes_requested.

Trecek added 5 commits April 12, 2026 23:17
…variance_protocol

Line 85 was a duplicate of line 83 (causal_structure == "H"). Corrected to
assert variance_protocol == "L" per causal_inference.yaml.

After `assert not missing` guarantees `known_dims ⊆ dims_found`, the
`assert dims_found >= known_dims` is an equivalent (non-strict) superset check
that can never independently fail. Removed the redundant assertion.
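
The redundancy can be checked directly: on Python sets, `>=` is the non-strict superset test (`issuperset`), so it holds exactly when the set difference `known_dims - dims_found` is empty. A small sketch, using a few dimension names from this PR plus a made-up `extra_dim`, and assuming `missing` was computed as that difference:

```python
def checks_agree(known_dims: set[str], dims_found: set[str]) -> bool:
    """True when 'no missing dims' and 'dims_found >= known_dims' give the same answer."""
    missing = known_dims - dims_found
    return (not missing) == (dims_found >= known_dims)


known = {"causal_structure", "variance_protocol", "data_acquisition"}
cases = [
    known | {"extra_dim"},                     # proper superset
    set(known),                                # equal sets: >= still holds
    {"causal_structure", "data_acquisition"},  # one dimension missing
    set(),                                     # nothing found
]

results = [checks_agree(known, found) for found in cases]
```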
…_all_experiment_types

- Remove unused _BUNDLED_TYPE_ORDER list (dead code; never referenced)
- Replace _load_bundled_types() raw YAML helper with load_all_experiment_types()
  so guards tests exercise the registry API instead of duplicating YAML parsing
- Update data dict accesses to ExperimentTypeSpec attribute access

A ValueError from _parse_experiment_type or IO error from load_yaml
propagated and aborted loading all remaining files. Now each file is
wrapped in try/except — malformed files are logged and skipped so one
bad YAML file cannot destroy the entire directory load.
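
The per-file isolation described in this commit could be sketched as follows. This is an illustration under assumptions, not the merged code: the `load` and `parse` callables stand in for `load_yaml` and `_parse_experiment_type`, parsed specs are assumed to be mappings with a `"name"` key, and the exact set of exceptions caught in the real module is not shown in the PR.

```python
from __future__ import annotations

import logging
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_types_from_dir(
    directory: Path,
    load: Callable[[Path], Any],
    parse: Callable[[Any], dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Load every *.yaml file in a directory, skipping files that fail to load or parse."""
    types: dict[str, dict[str, Any]] = {}
    if not directory.is_dir():
        return types  # a missing user directory simply contributes nothing
    for path in sorted(directory.glob("*.yaml")):  # sorted for deterministic order
        try:
            spec = parse(load(path))
        except (ValueError, OSError) as exc:
            # One bad file is logged and skipped instead of aborting the whole load.
            logger.warning("Skipping malformed experiment type %s: %s", path, exc)
            continue
        types[spec["name"]] = spec
    return types
```

Catching per file trades strictness for availability: a typo in one user override no longer hides the bundled types, but the warning log becomes the only signal that a file was ignored.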
… by registry tests

test_all_eight_dimensions_present in test_experiment_type_registry.py covers
the same dimension completeness check via load_all_experiment_types(). The
contracts test was a weaker raw-YAML duplicate; removing it consolidates
coverage in the registry test suite.
@Trecek Trecek enabled auto-merge April 13, 2026 06:34

Under xdist parallel load (4 workers, ~8000 tests), the subprocess
started by test_sigterm_writes_scenario_json never fully starts
within 5 seconds — empty stdout/stderr confirms the process hasn't
initialized before SIGTERM arrives. Increasing the deadline to 15s
tolerates slow subprocess startup under heavy CPU contention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek added this pull request to the merge queue Apr 13, 2026
Merged via the queue into integration with commit a16e33c Apr 13, 2026
2 checks passed
@Trecek Trecek deleted the experiment-type-registry-yaml-driven-extensible/788 branch April 13, 2026 06:57