
feat: add NVIDIA Run:ai as platform-runai #955

Merged
mchmarny merged 3 commits into NVIDIA:main from resker:feat/platform-runai on May 19, 2026

Conversation

@resker
Contributor

@resker resker commented May 18, 2026

Summary

Add runai as a value for the recipe platform criteria field — peer to the existing dynamo, kubeflow, nim, and slurm values. Enum-plumbing only — no operator components or leaf overlays in this slice.

Motivation / Context

Mirrors the precedent set by #866 (Slinky slurm-operator as platform-slurm). Reserving the canonical runai enum value now lets a follow-up PR ship Run:ai-specific recipes / mixins without re-litigating the enum name. Detailed motivation, alternatives, and success criteria are in the linked issue.

Fixes: #953
Related: #866

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: api/aicr/v1/server.yaml (OpenAPI enum sites)

Implementation Notes

Sites updated (mirrors the slurm precedent surface)

| File | Change |
|---|---|
| pkg/recipe/criteria.go | CriteriaPlatformRunai const, "runai" parser case, sorted slice entry in GetCriteriaPlatformTypes |
| pkg/recipe/criteria_test.go | Table cases for runai and Runai, updated TestGetCriteriaPlatformTypes expected list |
| pkg/recipe/doc.go | Platform field docstring and the CriteriaPlatform* bullet list |
| api/aicr/v1/server.yaml | Three platform enum sites (GET query param ×2, components/schemas) |
| docs/user/api-reference.md | Platform value enumeration |
| docs/user/cli-reference.md | Platform value enumeration |
| docs/contributor/api-server.md | Platform enum row |
| docs/contributor/validations.md | Platform value enumeration |
| docs/README.md | Glossary Criteria row |

Key decisions

| Decision | Rationale |
|---|---|
| Reserve --platform runai enum value | Public CLI surface: locks in the canonical name now so a recipe matrix PR doesn't have to re-litigate it |
| Canonical form runai (no colon, no dash) | Matches existing single-token enum convention (dynamo, kubeflow, nim, slurm). Branding is "NVIDIA Run:ai"; the colon doesn't survive CLI/URL contexts |
| No aliases (e.g. run-ai, run:ai) | No other platform value has aliases today; aliases are easy to add later if demand emerges |
| No operator/recipe content in this PR | Matches the slurm precedent (#866): platform plumbing first, leaf overlays / operator plumbing as separate PRs once a Run:ai distribution model is agreed |
| Alphabetic placement between nim and slurm | Matches the existing sort in GetCriteriaPlatformTypes, OpenAPI enums, and every doc enumeration |
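The URL half of the "colon doesn't survive" rationale can be checked with Go's standard net/url package: query encoding percent-escapes the colon, so run:ai would reach the API as run%3Aai while runai passes through unchanged. The recipeQuery helper below is hypothetical, and the "platform" query-parameter name is an assumption based on the OpenAPI enum sites listed for server.yaml.

```go
package main

import (
	"fmt"
	"net/url"
)

// recipeQuery (hypothetical) builds the query string a client might send to
// GET /v1/recipe when filtering on platform. url.Values.Encode applies
// url.QueryEscape to keys and values, which escapes ':' as %3A.
func recipeQuery(platform string) string {
	v := url.Values{}
	v.Set("platform", platform)
	return v.Encode()
}

func main() {
	fmt.Println(recipeQuery("runai"))
	fmt.Println(recipeQuery("run:ai"))
}
```

A single-token value like runai is identical whether it is typed on the command line, stored in YAML, or placed in a query string, which is the property the canonical-form decision is after.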

Testing

```shell
# Targeted unit tests
go test -race -v -run "TestParseCriteriaPlatformType|TestGetCriteriaPlatformTypes" ./pkg/recipe/...
# PASS: all subtests including runai (lower + uppercase) green

# Full pkg/recipe package
go test -race ./pkg/recipe/...
# ok  github.com/NVIDIA/aicr/pkg/recipe        2.805s
# ok  github.com/NVIDIA/aicr/pkg/recipe/oskind 1.572s

# Project lint gate
make lint
# Running go vet... (clean)
# Running golangci-lint... 0 issues.
# Ensuring license headers... OK
# OK: AGENTS.md is in sync with .claude/CLAUDE.md
# OK: all doc filenames follow kebab-case convention
# OK: all doc files are MDX-safe
# Verifying chart-version pins (ADR-006)... OK
# Completed Go and YAML lints and ensured license headers
```

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rationale: Identical pattern to the recently-merged #866. Pure additive surface — existing recipes, criteria queries, and CLI/API invocations are unaffected. The new enum value is opt-in via --platform runai and has no effect on any existing matcher path.

Rollout notes: N/A — additive change, no migration steps. Reversible by git revert.

Follow-ups (separate PRs, not blocking this one)

  • Run:ai recipe matrix: runai-* leaf overlays per CSP × accelerator × OS, once a distribution model is agreed.
  • Run:ai operator / CRD plumbing (if applicable) — separate slice once it's settled whether AICR ships any Run:ai-side components or only references the upstream Run:ai install model.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@resker resker requested a review from a team as a code owner May 18, 2026 20:49
@resker resker added the enhancement New feature or request label May 18, 2026
@github-actions
Contributor

Welcome to AICR, @resker! Thanks for your first pull request.

Before review, please ensure:

  • All commits are signed off per the DCO
  • CI checks pass (tests, lint, security scan)
  • The PR description explains the why behind your changes

A maintainer will review this soon.


@coderabbitai

coderabbitai Bot commented May 18, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 64ba23fc-47e2-4355-8484-84c7ebb0b899

📥 Commits

Reviewing files that changed from the base of the PR and between 81e74f3 and cef6cdc.

📒 Files selected for processing (2)
  • docs/user/api-reference.md
  • docs/user/cli-reference.md

📝 Walkthrough

Walkthrough

Adds runai as a supported value for the recipe platform criteria. The change adds a new CriteriaPlatformRunai constant, updates parsing and getter functions and unit tests, extends the OpenAPI platform enums (GET /v1/recipe, GET /v1/query, and components.schemas.Criteria.platform), and updates related user and contributor documentation and package docs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/aicr#866: Added slurm as a platform criteria value using the same enum-plumbing pattern across pkg/recipe/criteria.go, the parser, and documentation.

Suggested labels

area/recipes

Suggested reviewers

  • mchmarny
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title accurately summarizes the main change: adding NVIDIA Run:ai as a supported platform enum value (runai). |
| Description check | ✅ Passed | The PR description comprehensively covers the changeset, providing motivation, implementation details, testing results, and risk assessment. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #953 are met: runai constant added, parser updated, tests added, OpenAPI spec updated, all doc sites updated, and tests pass. |
| Out of Scope Changes check | ✅ Passed | All changes are directly within scope of issue #953: enum-plumbing only, no operator components or recipe overlays, consistent with slurm precedent. |


Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.



@yuanchen8911 yuanchen8911 self-requested a review May 18, 2026 22:30
@yuanchen8911
Contributor

A minor issue: pkg/api/doc.go:72 still reads (dynamo, kubeflow, nim, any), missing both slurm (pre-existing drift from #866) and the new runai.
Suggested:

```go
// - platform: Platform/framework (dynamo, kubeflow, nim, runai, slurm, any)
```
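Drift like this tends to happen when an enumeration is copied by hand into prose. One hypothetical guard, not something this PR adds, is a test that renders the expected doc line from the canonical platform list and compares it with the doc text. The sketch below hard-codes the list that GetCriteriaPlatformTypes is described as returning (sorted, without the any wildcard) and appends any last; platformDocLine is an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
)

// platformTypes stands in for the sorted slice from GetCriteriaPlatformTypes;
// it is hard-coded here so the sketch runs on its own.
var platformTypes = []string{"dynamo", "kubeflow", "nim", "runai", "slurm"}

// platformDocLine (hypothetical) renders the godoc bullet from the canonical
// list, appending the "any" wildcard last. It copies the input slice first
// so the caller's slice is never modified by the append.
func platformDocLine(types []string) string {
	values := append(append([]string{}, types...), "any")
	return fmt.Sprintf("// - platform: Platform/framework (%s)", strings.Join(values, ", "))
}

func main() {
	fmt.Println(platformDocLine(platformTypes))
}
```

A drift test could then read pkg/api/doc.go and assert that it contains this rendered line, so adding a platform value without updating the godoc fails CI instead of relying on review.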

resker added 2 commits May 19, 2026 07:47
Add `runai` as a platform value in AICR recipes, peer to
`dynamo`, `kubeflow`, `nim`, and `slurm`.

Mirrors the precedent set by NVIDIA#866 (slurm). Sites updated:

- pkg/recipe/criteria.go: CriteriaPlatformRunai const, parser case,
  sorted slice in GetCriteriaPlatformTypes
- pkg/recipe/criteria_test.go: table-driven cases for runai (lower +
  uppercase) and updated TestGetCriteriaPlatformTypes expectations
- pkg/recipe/doc.go: Platform field docstring + bullet under "Platform
  types for workload frameworks"
- api/aicr/v1/server.yaml: three platform enum sites
- docs/user/{api,cli}-reference.md: platform value enumerations
- docs/contributor/{api-server,validations}.md: platform value
  enumerations
- docs/README.md: glossary Criteria row

Run:ai recipe matrix and any operator/CRD plumbing are deferred to
follow-up PRs.

Per review feedback on NVIDIA#955: pkg/api/doc.go:72 still listed the
platform enum as (dynamo, kubeflow, nim, any) — missing both `slurm`
(pre-existing drift from NVIDIA#866) and the new `runai` added in this PR.

Updated to match the canonical list now used across server.yaml,
pkg/recipe/criteria.go, and the user/contributor docs:

    platform: Platform/framework (dynamo, kubeflow, nim, runai, slurm, any)

No behavioral change; godoc-only.

Reported-by: @yuanchen8911
Signed-off-by: Rob Esker <resker@nvidia.com>
@resker resker force-pushed the feat/platform-runai branch from 166c2e3 to 81e74f3 May 19, 2026 14:57
@resker
Contributor Author

resker commented May 19, 2026

Thanks @yuanchen8911 — fixed in 81e74f3.

This should also close out the slurm omission from #866 in the same spot.

@mchmarny mchmarny enabled auto-merge (squash) May 19, 2026 20:37
@mchmarny mchmarny merged commit 2dfa5e3 into NVIDIA:main May 19, 2026
32 checks passed

Development

Successfully merging this pull request may close these issues.

[Feature]: Add NVIDIA Run:ai as a platform criteria value (platform=runai)

3 participants