feat(recipe): extensible criteria values via catalog-driven runtime registry (#998)#999
Merged
Conversation
…egistry Fixes #998. Adds a process-global criteria registry that the overlay loader seeds from every overlay's spec.criteria with an Origin tag (embedded vs external). ParseCriteria{Service,Accelerator,Intent,OS,Platform}Type keep their canonical/aliased switch as the fast path; the default branch now consults the registry, so a `--data` overlay declaring `service: ncp-internal` (or any other proprietary value) becomes a valid CLI / API input at runtime — no fork, no rebuild. Composes with three strict-mode toggles (OR): the new `--criteria-strict` flag, `spec.recipe.criteriaStrict: true` in `--config`, and `AICR_CRITERIA_STRICT=1` env var. Strict mode hides external-origin registry entries so the upstream OSS catalog cannot accidentally start depending on internal-only values; `make qualify` exports the env var on the unit-test step so OSS CI gates this automatically. Three coupled pieces, single PR per the issue's scope: 1. Registry — pkg/recipe/criteria_registry.go - Singleton DefaultRegistry() with sync.RWMutex - Origin (Embedded / External) tracked per (field, value) - Embedded never downgrades on subsequent external Register - SetStrict + AICR_CRITERIA_STRICT env baseline - Reset() for tests 2. Parser fallback — pkg/recipe/criteria.go - Default branch in each Parse*Type consults DefaultRegistry().Has - GetCriteria*Types unchanged (static OSS list, stable contract) - New AllCriteria*Types returns static + registry union (used by CLI --help / autocomplete) 3. CLI deferred validation — pkg/cli/recipe.go - --criteria-strict flag added - recipe.LoadCatalog(ctx) called right after initDataProvider so the registry is populated *before* any ParseCriteria*Type lookup (the metadata store is otherwise lazy) - cfg.Recipe().IsCriteriaStrict() honors spec.recipe.criteriaStrict Tests ----- Unit: - pkg/recipe/criteria_registry_test.go covers Register / Has / HasEmbedded / Values / Reset / strict mode / env var parsing / concurrent access / seedCriteriaRegistry - pkg/recipe/criteria_registry_parse_test.go covers Parse*Type fallback per dimension, strict rejection of external, AllCriteria*Types union, Criteria.Validate registry path - pkg/config/accessors_test.go covers IsCriteriaStrict tri-state (nil / explicit false / explicit true) Integration: - tests/chainsaw/cli/criteria-registry/ exercises the full CLI flow: rejection without --data, admission with --data, strict-mode rejection via --criteria-strict flag, AICR_CRITERIA_STRICT env, and --config spec.recipe.criteriaStrict Docs ---- - New: docs/integrator/data-extension.md — generic walkthrough of --data extension (layout, precedence, criteria registry, strict mode, debugging, distribution patterns) - Updated: docs/contributor/data.md — full registry design (rationale, origin tracking, strict mode, seeding, LoadCatalog, API surface table) - Updated: docs/integrator/recipe-development.md — points to new data-extension.md, notes criteria registry under External Data - Updated: docs/user/cli-reference.md — --criteria-strict flag and AICR_CRITERIA_STRICT env var documented; recipe flag tables note --data extends listed values at runtime - Updated: docs/integrator/index.md + docs/index.yml — new doc page Coverage -------- pkg/recipe 90.0%, pkg/config and pkg/cli unchanged. make qualify passes including the new chainsaw test; no regressions in the existing 21 chainsaw tests.
Contributor
|
🌿 Preview your docs: https://nvidia-preview-feat-criteria-registry-998.docs.buildwithfern.com/aicr |
Contributor
Coverage Report ✅
Coverage BadgeMerging this branch changes the coverage (1 decrease, 2 increase)
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
This comment was marked as resolved.
This comment was marked as resolved.
… slug Self-review of #999 found two issues: 1. pkg/cli/recipe.go masked LoadCatalog's inner error code by wrapping with ErrCodeInternal. The underlying loadMetadataStore returns ErrCodeInvalidRequest for malformed YAML, dependency validation failures, etc. — those codes should propagate to callers (CLI exit code, API status). Switch to errors.PropagateOrWrap, which preserves the inner StructuredError's code when present and only falls back to ErrCodeInternal for unstructured errors. Matches CLAUDE.md rule against re-wrapping errors that already have proper codes. 2. docs/integrator/data-extension.md heading "## Adding a criteria value (the registry path)" produces a parens-derived GitHub slug exactly of the kind CLAUDE.md's "Slug gotchas when promoting" section warns against. Simplify to "## Adding a criteria value" (slug "adding-a-criteria-value") and update the cross-link from recipe-development.md to match.
Five issues raised by CodeRabbit, all resolved:
1. (Major) pkg/recipe/metadata_store.go — registry mutation was
non-transactional: seedCriteriaRegistry ran during the WalkDir
callback, so a malformed file later in the walk would leave
partially-seeded values in the global registry. Stage entries in
a per-build slice during the walk and commit them only after the
walk completes, the base recipe is found, and dependency
validation passes. Added TestLoadCatalog_DoesNotLeakRegistryOnFailure
to lock the guarantee in: a fixture with one well-formed overlay
followed by a malformed one is rejected by LoadCatalog, and the
well-formed overlay's criteria value is verified absent from the
registry afterwards.
2. (Major) pkg/recipe/criteria_registry.go — Register panicked on a
zero-value CriteriaRegistry (var r CriteriaRegistry, or
&CriteriaRegistry{}) because the inner map was nil. The exported
type made this reachable for external callers even though
DefaultRegistry / newCriteriaRegistry initialize correctly. Added
defensive nil-map init under the existing lock + a regression test
(TestCriteriaRegistry_RegisterOnZeroValue).
3. (Major) tests/chainsaw/cli/criteria-registry — the four rejection
cases asserted only that exit code \!= 0, which would have passed
on any incidental runtime failure unrelated to criteria validation.
Each rejection step now captures stderr to a file, runs the binary
inside `if`, and greps for both "invalid service type" and the
specific rejected value ("ncp-internal") before declaring success.
chainsaw `check:` is now `($error == null): true` since the script
itself does the substantive verification.
4. (Minor) docs/contributor/data.md — corrected the description of
the seed helper's source→origin mapping. The text incorrectly said
non-embedded sources map to OriginEmbedded; updated to describe the
strict-mode-safe default (only "embedded" → OriginEmbedded, all
other sources → OriginExternal). Also took the opportunity to
tighten the code in seedCriteriaRegistry so the default behavior
matches: any future DataProvider source category that isn't the
literal "embedded" sentinel maps to OriginExternal, so strict mode
fails closed if a new source class is introduced later. Test table
updated accordingly.
5. (Minor) docs/integrator/data-extension.md — added `text` language
identifiers to two fenced code blocks containing directory listings
to satisfy markdownlint MD040.
Verified: `make qualify` passes (22 chainsaw tests, no vuln, headers
OK); registry / parse / config / chainsaw test suites all green;
golangci-lint zero issues.
lalitadithya
approved these changes
May 21, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Make
service/accelerator/intent/os/platformcriteria values extensible at runtime via the--dataoverlay catalog. A--dataoverlay declaringspec.criteria.service: ncp-internal(or any other proprietary value) becomes a valid CLI / API input — no fork, no rebuild.Motivation / Context
The criteria validation switches in
pkg/recipe/criteria.gowere hardcoded enum-style: an unknown value returnedErrCodeInvalidRequestbefore the overlay catalog was even consulted. That made it impossible to add undisclosed NCPs, proprietary platforms (runai,nvmesh), or new GPU SKUs via--datawithout forking AICR or contributing the value upstream.Fixes: #998
Related: N/A
Type of Change
Component(s) Affected
cmd/aicr,pkg/cli)pkg/recipe)docs/,examples/)pkg/config(newcriteriaStrictfield + accessor)Implementation Notes
Three coupled pieces; they have no observable behavior unless landed together, hence the single-PR scope agreed on the issue.
Registry (
pkg/recipe/criteria_registry.go) — process-global singleton,sync.RWMutex-guarded, per-(field, value)Origintag (Embedded/External). Embedded never downgrades on subsequent externalRegister.SetStrict+AICR_CRITERIA_STRICTenv var baseline.Parser fallback (
pkg/recipe/criteria.go) — eachParseCriteria*Type'sdefault:branch consultsDefaultRegistry().Has(field, value)before returning the existing error.GetCriteria*Types()stays unchanged (static OSS list); newAllCriteria*Types()returns the static + registry union for CLI--help/ autocomplete.CLI deferred validation + LoadCatalog (
pkg/cli/recipe.go) —--criteria-strictflag added;recipe.LoadCatalog(ctx)called right afterinitDataProviderso the registry is populated before anyParseCriteria*Typelookup (the metadata store is otherwise lazy).cfg.Recipe().IsCriteriaStrict()honorsspec.recipe.criteriaStrictin--config. Strict mode composes via OR across all three sources (flag, config, env).Operational notes:
--dataflag wired, so it uses the embedded catalog only — registry stays OSS-clean by construction. If--datais ever added to the server, the caller MUST invokerecipe.LoadCatalog(ctx)immediately afterrecipe.SetDataProvider(...)for the same eager-load reason documented for the CLI; the registry design doc calls this out.make qualify's unit-test step now exportsAICR_CRITERIA_STRICT=1so the upstream catalog cannot accidentally start depending on internal-only values during dev.Testing
Result: passes — 22 chainsaw tests (incl. the new
cli-criteria-registry), unit tests pass withAICR_CRITERIA_STRICT=1in the env, total coverage 76.5% (≥ 75% threshold), no vulnerabilities, license headers OK.Test coverage added:
pkg/recipe/criteria_registry_test.go— registry Register / Has / HasEmbedded / Values / Reset / strict-mode / env-var parsing / concurrent access /seedCriteriaRegistrypkg/recipe/criteria_registry_parse_test.go— fallback per dimension, strict rejection of external,AllCriteria*Typesunion,Criteria.Validateregistry pathpkg/config/accessors_test.go—IsCriteriaStricttri-state (nil / explicit false / explicit true)tests/chainsaw/cli/criteria-registry/— end-to-end: rejection without--data, admission with--data(service + platform dimensions), strict rejection via flag, env var, and config-filePer-package coverage delta on changed packages:
pkg/recipe: 90.0% (no regression)pkg/cli,pkg/config: unchangedRisk Assessment
Rollout notes: Backward compatible.
GetCriteria*Types()returns the same OSS list; only the newAllCriteria*Types()family is registry-aware, so consumers that explicitly want the OSS-only list don't change behavior.--data, no--criteria-strict) is unchanged.make qualify'sAICR_CRITERIA_STRICT=1ever exposes accidental drift in the embedded catalog (e.g., an overlay declaring an external-only value), the test would fail — that's the intent, and it's the change's safety net for the OSS catalog.Checklist
make testwith-race)make lint)git commit -S)