Skip to content

feat(recipe): extensible criteria values via catalog-driven runtime registry (#998)#999

Merged
mchmarny merged 5 commits into
mainfrom
feat/criteria-registry-998
May 21, 2026
Merged

feat(recipe): extensible criteria values via catalog-driven runtime registry (#998)#999
mchmarny merged 5 commits into
mainfrom
feat/criteria-registry-998

Conversation

@mchmarny
Copy link
Copy Markdown
Member

Summary

Make service / accelerator / intent / os / platform criteria values extensible at runtime via the --data overlay catalog. A --data overlay declaring spec.criteria.service: ncp-internal (or any other proprietary value) becomes a valid CLI / API input — no fork, no rebuild.

Motivation / Context

The criteria validation switches in pkg/recipe/criteria.go were hardcoded enum-style: an unknown value returned ErrCodeInvalidRequest before the overlay catalog was even consulted. That made it impossible to add undisclosed NCPs, proprietary platforms (runai, nvmesh), or new GPU SKUs via --data without forking AICR or contributing the value upstream.

Fixes: #998
Related: N/A

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Documentation update

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • Recipe engine / data (pkg/recipe)
  • Docs/examples (docs/, examples/)
  • Other: pkg/config (new criteriaStrict field + accessor)

Implementation Notes

Three coupled pieces; they have no observable behavior unless landed together, hence the single-PR scope agreed on the issue.

  1. Registry (pkg/recipe/criteria_registry.go) — process-global singleton, sync.RWMutex-guarded, per-(field, value) Origin tag (Embedded / External). Embedded never downgrades on subsequent external Register. SetStrict + AICR_CRITERIA_STRICT env var baseline.

  2. Parser fallback (pkg/recipe/criteria.go) — each ParseCriteria*Type's default: branch consults DefaultRegistry().Has(field, value) before returning the existing error. GetCriteria*Types() stays unchanged (static OSS list); new AllCriteria*Types() returns the static + registry union for CLI --help / autocomplete.

  3. CLI deferred validation + LoadCatalog (pkg/cli/recipe.go) — --criteria-strict flag added; recipe.LoadCatalog(ctx) called right after initDataProvider so the registry is populated before any ParseCriteria*Type lookup (the metadata store is otherwise lazy). cfg.Recipe().IsCriteriaStrict() honors spec.recipe.criteriaStrict in --config. Strict mode composes via OR across all three sources (flag, config, env).

Operational notes:

  • The aicrd API server currently has no --data flag wired, so it uses the embedded catalog only — registry stays OSS-clean by construction. If --data is ever added to the server, the caller MUST invoke recipe.LoadCatalog(ctx) immediately after recipe.SetDataProvider(...) for the same eager-load reason documented for the CLI; the registry design doc calls this out.
  • make qualify's unit-test step now exports AICR_CRITERIA_STRICT=1 so the upstream catalog cannot accidentally start depending on internal-only values during dev.

Testing

make qualify

Result: passes — 22 chainsaw tests (incl. the new cli-criteria-registry), unit tests pass with AICR_CRITERIA_STRICT=1 in the env, total coverage 76.5% (≥ 75% threshold), no vulnerabilities, license headers OK.

Test coverage added:

  • pkg/recipe/criteria_registry_test.go — registry Register / Has / HasEmbedded / Values / Reset / strict-mode / env-var parsing / concurrent access / seedCriteriaRegistry
  • pkg/recipe/criteria_registry_parse_test.go — fallback per dimension, strict rejection of external, AllCriteria*Types union, Criteria.Validate registry path
  • pkg/config/accessors_test.goIsCriteriaStrict tri-state (nil / explicit false / explicit true)
  • tests/chainsaw/cli/criteria-registry/ — end-to-end: rejection without --data, admission with --data (service + platform dimensions), strict rejection via flag, env var, and config-file

Per-package coverage delta on changed packages:

  • pkg/recipe: 90.0% (no regression)
  • pkg/cli, pkg/config: unchanged

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert

Rollout notes: Backward compatible.

  • All existing criteria values continue to validate (the static switch arms remain as the fast path).
  • GetCriteria*Types() returns the same OSS list; only the new AllCriteria*Types() family is registry-aware, so consumers that explicitly want the OSS-only list don't change behavior.
  • Default behavior (no --data, no --criteria-strict) is unchanged.
  • If make qualify's AICR_CRITERIA_STRICT=1 ever exposes accidental drift in the embedded catalog (e.g., an overlay declaring an external-only value), the test would fail — that's the intent, and it's the change's safety net for the OSS catalog.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

…egistry

Fixes #998.

Adds a process-global criteria registry that the overlay loader seeds
from every overlay's spec.criteria with an Origin tag (embedded vs
external). ParseCriteria{Service,Accelerator,Intent,OS,Platform}Type
keep their canonical/aliased switch as the fast path; the default branch
now consults the registry, so a `--data` overlay declaring
`service: ncp-internal` (or any other proprietary value) becomes a valid
CLI / API input at runtime — no fork, no rebuild.

Composes with three strict-mode toggles (OR): the new `--criteria-strict`
flag, `spec.recipe.criteriaStrict: true` in `--config`, and
`AICR_CRITERIA_STRICT=1` env var. Strict mode hides external-origin
registry entries so the upstream OSS catalog cannot accidentally start
depending on internal-only values; `make qualify` exports the env var
on the unit-test step so OSS CI gates this automatically.

Three coupled pieces, single PR per the issue's scope:

  1. Registry — pkg/recipe/criteria_registry.go
     - Singleton DefaultRegistry() with sync.RWMutex
     - Origin (Embedded / External) tracked per (field, value)
     - Embedded never downgrades on subsequent external Register
     - SetStrict + AICR_CRITERIA_STRICT env baseline
     - Reset() for tests

  2. Parser fallback — pkg/recipe/criteria.go
     - Default branch in each Parse*Type consults DefaultRegistry().Has
     - GetCriteria*Types unchanged (static OSS list, stable contract)
     - New AllCriteria*Types returns static + registry union
       (used by CLI --help / autocomplete)

  3. CLI deferred validation — pkg/cli/recipe.go
     - --criteria-strict flag added
     - recipe.LoadCatalog(ctx) called right after initDataProvider so
       the registry is populated *before* any ParseCriteria*Type lookup
       (the metadata store is otherwise lazy)
     - cfg.Recipe().IsCriteriaStrict() honors spec.recipe.criteriaStrict

Tests
-----

Unit:
  - pkg/recipe/criteria_registry_test.go covers Register / Has /
    HasEmbedded / Values / Reset / strict mode / env var parsing /
    concurrent access / seedCriteriaRegistry
  - pkg/recipe/criteria_registry_parse_test.go covers Parse*Type
    fallback per dimension, strict rejection of external,
    AllCriteria*Types union, Criteria.Validate registry path
  - pkg/config/accessors_test.go covers IsCriteriaStrict tri-state
    (nil / explicit false / explicit true)

Integration:
  - tests/chainsaw/cli/criteria-registry/ exercises the full CLI flow:
    rejection without --data, admission with --data, strict-mode
    rejection via --criteria-strict flag, AICR_CRITERIA_STRICT env,
    and --config spec.recipe.criteriaStrict

Docs
----

  - New: docs/integrator/data-extension.md — generic walkthrough of
    --data extension (layout, precedence, criteria registry, strict
    mode, debugging, distribution patterns)
  - Updated: docs/contributor/data.md — full registry design
    (rationale, origin tracking, strict mode, seeding, LoadCatalog,
    API surface table)
  - Updated: docs/integrator/recipe-development.md — points to new
    data-extension.md, notes criteria registry under External Data
  - Updated: docs/user/cli-reference.md — --criteria-strict flag and
    AICR_CRITERIA_STRICT env var documented; recipe flag tables note
    --data extends listed values at runtime
  - Updated: docs/integrator/index.md + docs/index.yml — new doc page

Coverage
--------
pkg/recipe 90.0%, pkg/config and pkg/cli unchanged. make qualify passes
including the new chainsaw test; no regressions in the existing 21
chainsaw tests.
@github-actions
Copy link
Copy Markdown
Contributor

@mchmarny mchmarny self-assigned this May 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Coverage Report ✅

Metric Value
Coverage 76.5%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-76.5%25-green)

Merging this branch changes the coverage (1 decrease, 2 increase)

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/cli 66.69% (-0.05%) 👎
github.com/NVIDIA/aicr/pkg/config 93.45% (+0.06%) 👍
github.com/NVIDIA/aicr/pkg/recipe 90.69% (+1.27%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/cli/recipe.go 84.13% (-1.12%) 126 (+4) 106 (+2) 20 (+2) 👎
github.com/NVIDIA/aicr/pkg/config/accessors.go 88.89% (+1.39%) 27 (+3) 24 (+3) 3 👍
github.com/NVIDIA/aicr/pkg/config/config.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/recipe/criteria.go 91.62% (+0.98%) 334 (+35) 306 (+35) 28 👍
github.com/NVIDIA/aicr/pkg/recipe/criteria_registry.go 100.00% (+100.00%) 80 (+80) 80 (+80) 0 🌟
github.com/NVIDIA/aicr/pkg/recipe/metadata_store.go 85.68% (+2.71%) 370 (+6) 317 (+15) 53 (-9) 👍

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@coderabbitai

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

… slug

Self-review of #999 found two issues:

1. pkg/cli/recipe.go masked LoadCatalog's inner error code by wrapping
   with ErrCodeInternal. The underlying loadMetadataStore returns
   ErrCodeInvalidRequest for malformed YAML, dependency validation
   failures, etc. — those codes should propagate to callers (CLI exit
   code, API status). Switch to errors.PropagateOrWrap, which preserves
   the inner StructuredError's code when present and only falls back to
   ErrCodeInternal for unstructured errors. Matches CLAUDE.md rule
   against re-wrapping errors that already have proper codes.

2. docs/integrator/data-extension.md heading
   "## Adding a criteria value (the registry path)" produces a
   parens-derived GitHub slug exactly of the kind CLAUDE.md's
   "Slug gotchas when promoting" section warns against. Simplify to
   "## Adding a criteria value" (slug "adding-a-criteria-value") and
   update the cross-link from recipe-development.md to match.
coderabbitai[bot]

This comment was marked as resolved.

mchmarny and others added 2 commits May 21, 2026 07:31
Five issues raised by CodeRabbit, all resolved:

1. (Major) pkg/recipe/metadata_store.go — registry mutation was
   non-transactional: seedCriteriaRegistry ran during the WalkDir
   callback, so a malformed file later in the walk would leave
   partially-seeded values in the global registry. Stage entries in
   a per-build slice during the walk and commit them only after the
   walk completes, the base recipe is found, and dependency
   validation passes. Added TestLoadCatalog_DoesNotLeakRegistryOnFailure
   to lock the guarantee in: a fixture with one well-formed overlay
   followed by a malformed one is rejected by LoadCatalog, and the
   well-formed overlay's criteria value is verified absent from the
   registry afterwards.

2. (Major) pkg/recipe/criteria_registry.go — Register panicked on a
   zero-value CriteriaRegistry (var r CriteriaRegistry, or
   &CriteriaRegistry{}) because the inner map was nil. The exported
   type made this reachable for external callers even though
   DefaultRegistry / newCriteriaRegistry initialize correctly. Added
   defensive nil-map init under the existing lock + a regression test
   (TestCriteriaRegistry_RegisterOnZeroValue).

3. (Major) tests/chainsaw/cli/criteria-registry — the four rejection
   cases asserted only that exit code \!= 0, which would have passed
   on any incidental runtime failure unrelated to criteria validation.
   Each rejection step now captures stderr to a file, runs the binary
   inside `if`, and greps for both "invalid service type" and the
   specific rejected value ("ncp-internal") before declaring success.
   chainsaw `check:` is now `($error == null): true` since the script
   itself does the substantive verification.

4. (Minor) docs/contributor/data.md — corrected the description of
   the seed helper's source→origin mapping. The text incorrectly said
   non-embedded sources map to OriginEmbedded; updated to describe the
   strict-mode-safe default (only "embedded" → OriginEmbedded, all
   other sources → OriginExternal). Also took the opportunity to
   tighten the code in seedCriteriaRegistry so the default behavior
   matches: any future DataProvider source category that isn't the
   literal "embedded" sentinel maps to OriginExternal, so strict mode
   fails closed if a new source class is introduced later. Test table
   updated accordingly.

5. (Minor) docs/integrator/data-extension.md — added `text` language
   identifiers to two fenced code blocks containing directory listings
   to satisfy markdownlint MD040.

Verified: `make qualify` passes (22 chainsaw tests, no vuln, headers
OK); registry / parse / config / chainsaw test suites all green;
golangci-lint zero issues.
coderabbitai[bot]

This comment was marked as resolved.

@mchmarny mchmarny enabled auto-merge (squash) May 21, 2026 15:08
@mchmarny mchmarny merged commit c148e96 into main May 21, 2026
33 checks passed
@mchmarny mchmarny deleted the feat/criteria-registry-998 branch May 21, 2026 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: extensible criteria values via catalog-driven runtime registry (unblocks --data overlays)

2 participants