Skip to content

feat: Add component manager capability aware task dispatch#554

Merged
jw-nvidia merged 1 commit into
NVIDIA:mainfrom
jw-nvidia:feat/cm-capability-aware-dispatch
May 21, 2026
Merged

feat: Add component manager capability aware task dispatch#554
jw-nvidia merged 1 commit into
NVIDIA:mainfrom
jw-nvidia:feat/cm-capability-aware-dispatch

Conversation

@jw-nvidia
Copy link
Copy Markdown
Contributor

Description

  • Add capability metadata for component managers and derive task capability requirements from operation rules before Executor dispatch.
  • Move capability primitives into the componentmanager package, add registry checks for active manager capabilities, and keep activity-level checks as the operation-boundary guardrail.

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • Flow - Flow service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@jw-nvidia jw-nvidia requested a review from a team as a code owner May 20, 2026 23:10
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Review Change Stack

Summary by CodeRabbit

  • New Features

    • Pre-dispatch capability validation blocks dispatch when required capabilities are missing and surfaces clear unsupported-capability errors.
    • Action configs can declare component types and actions expose ordered execution.
  • Refactor

    • Centralized capability handling with a parsed/validated capability set API and capability-aware registry lookups that return safe descriptor copies.
    • Activity/executor flows now perform capability-gated manager selection.
  • Tests

    • Broadened unit coverage for capability parsing, sets, requirement derivation, registry capability checks, activity/manager preflight validation; removed an obsolete serve test.

Walkthrough

This PR centralizes capability handling: introduces a capability package, migrates descriptors to capability types, adds registry APIs for capability-gated manager selection, derives per-execution capability requirements from rules, and enforces capability checks in activities and pre-dispatch workflow validation.

Changes

Capability Infrastructure and Registry Integration

Layer / File(s) Summary
Capability package: parsing, set operations, and error types
flow/internal/task/componentmanager/capability/capability.go, flow/internal/task/componentmanager/capability/capability_test.go
New capability package with Parse, exported sentinel errors and wrapper error types, and CapabilitySet helpers (NewSet, Contains, Add, Sorted, Normalize) with tests.
Catalog descriptor and error surface migration
flow/internal/task/componentmanager/catalog/catalog.go, flow/internal/task/componentmanager/catalog/errors.go
Switch Descriptor.Capabilities to capability.CapabilitySet; remove local capability error types from catalog errors and add ErrUnknownComponentManagerImplementation.
Component manager error types and capability error sourcing
flow/internal/task/componentmanager/errors.go
Repoint capability error sentinels and aliases to the capability package; add ErrUnsupportedCapability and UnsupportedCapabilityError carrying component/implementation/capability context and available capabilities.
Manager descriptor capability declarations
flow/internal/task/componentmanager/compute/nico/nico.go, flow/internal/task/componentmanager/nvswitch/nico/nico.go, flow/internal/task/componentmanager/nvswitch/nvswitchmanager/nvswitchmanager.go, flow/internal/task/componentmanager/powershelf/nico/nico.go, flow/internal/task/componentmanager/powershelf/psm/psm.go, flow/internal/task/componentmanager/mock/mock.go
All component managers now declare supported capabilities using capability.CapabilitySet and capability.Capability* constants.
Registry capability-aware manager selection and cloning semantics
flow/internal/task/componentmanager/registry.go, flow/internal/task/componentmanager/registry_test.go
Add GetCapableManager and CheckCapability for capability-scoped manager retrieval; return cloned descriptors on GetDescriptor/Descriptors; document normalization occurs at activation boundary.
Catalog and builtin test updates
flow/internal/task/componentmanager/catalog/catalog_test.go, flow/internal/task/componentmanager/builtin/builtin_test.go
Update tests to use capability.CapabilitySet and capability constants/errors; adjust assertions and fixtures accordingly.
Component manager test helper with capability support
flow/internal/task/componentmanager/test_helpers_test.go
Add newRegistryWithCapabilities test helper to construct registries whose descriptors advertise specific capability sets.

Task Dispatch Capability Requirements and Validation

Layer / File(s) Summary
Capability requirements types and validation
flow/internal/task/capabilityrequirements/requirements.go, flow/internal/task/capabilityrequirements/requirements_test.go
New capabilityrequirements package: Requirement (component type + capabilities), CapabilityChecker interface, Requirement.Validate, and Required(rule, componentTypes) builder that derives deterministic per-component capability requirements from rules and action parameters.
Operation rules component type expansion and action ordering
flow/internal/task/operationrules/actions.go, flow/internal/task/operationrules/actions_test.go, flow/internal/task/operationrules/rules.go
Add ActionConfig.ComponentTypes() to parse/validate component_types parameters (required and optional), rework componentTypesFromParameter to return parsed types, and add SequenceStep.OrderedActions() to flatten pre/main/post actions in execution order.
Pre-dispatch capability validation in workflow manager
flow/internal/task/executor/temporalworkflow/manager/manager.go, flow/internal/task/executor/temporalworkflow/manager/pre_dispatch_validation_test.go
Manager.Execute now derives required capabilities for the execution, validates them against the active registry (via GetCapableManager/CheckCapability), and fails fast with UnsupportedCapabilityError when unmet. Helper executionComponentTypes ensures deterministic ordering.

Activity-Level Capability-Aware Manager Selection

Layer / File(s) Summary
Activity methods with capability-aware manager selection
flow/internal/task/executor/temporalworkflow/activity/activity.go, flow/internal/task/executor/temporalworkflow/activity/activity_test.go, flow/internal/task/executor/temporalworkflow/activity/registry_test.go
All capability-dependent activities obtain managers via validAndGetComponentManager(target, requiredCapability) which uses Registry.GetCapableManager; tests assert unsupported-capability behavior and successful manager invocation when supported.

Cleanup

Layer / File(s) Summary
Removed obsolete logging test
flow/cmd/serve_test.go
Removed TestLogComponentManagerRegistry (serve logging test removed).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately and specifically describes the main change: adding capability-aware task dispatch for component managers.
Description check ✅ Passed The pull request description clearly relates to the changeset, detailing capability metadata addition, capability-aware registry checks, and activity-level guardrails.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
flow/internal/task/operationrules/actions_test.go (1)

21-26: ⚡ Quick win

Use testify/require assertions instead of reflect.DeepEqual in this test.

Line 348 should use require.Equal(t, tt.want, got) and reflect can then be removed from imports.

Proposed change
 import (
-	"reflect"
 	"testing"
 	"time"
 
+	"github.com/stretchr/testify/require"
 	"github.com/NVIDIA/infra-controller-rest/flow/pkg/common/devicetypes"
 )
@@
-			if !reflect.DeepEqual(got, tt.want) {
-				t.Fatalf("ComponentTypes() = %v, want %v", got, tt.want)
-			}
+			require.Equal(t, tt.want, got)

As per coding guidelines "**/*_test.go: Use testify (assert/require) for test assertions in Go tests."

Also applies to: 348-350

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/operationrules/actions_test.go` around lines 21 - 26,
Replace the reflect.DeepEqual assertion in the test with a testify require
assertion: change the DeepEqual check (currently comparing got and tt.want
around lines 348-350) to require.Equal(t, tt.want, got), add the
"github.com/stretchr/testify/require" import, and remove the "reflect" import;
ensure the test file (flow/internal/task/operationrules/actions_test.go) uses
require for this assertion so it follows the project testing guideline.
flow/internal/task/operationrules/actions.go (1)

140-141: ⚡ Quick win

Split assignment from condition in ComponentTypes().

Line 140 uses inline assign-and-condition. Please split it into two statements to align with repo Go style.

Proposed change
-		if spec, ok := actionRegistry[ac.Name]; ok &&
-			spec.requiresParam(ParamComponentTypes) {
+		spec, ok := actionRegistry[ac.Name]
+		if ok && spec.requiresParam(ParamComponentTypes) {
 			return nil, fmt.Errorf(
 				"action %s missing required parameter: %s",
 				ac.Name,
 				ParamComponentTypes,
 			)
 		}

As per coding guidelines "**/*.go: Split assign-and-condition into two statements; prefer separate derr := action() and if derr != nil over if derr := action(); derr != nil."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/operationrules/actions.go` around lines 140 - 141, In
ComponentTypes(), avoid the inline assign-and-condition: first perform the
lookup with spec, ok := actionRegistry[ac.Name], then use a separate if
statement like if ok && spec.requiresParam(ParamComponentTypes) { ... }; update
any subsequent logic that relied on the inline form to use the separately
declared spec and ok variables (references: ComponentTypes, actionRegistry,
ac.Name, spec.requiresParam, ParamComponentTypes).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flow/internal/task/executor/temporalworkflow/manager/manager.go`:
- Around line 326-334: In the Execute method, split the inline assignment from
the if condition: call validateRequiredCapabilities with req.Info and
m.conf.ComponentManagerRegistry and assign its result to err on its own line,
then follow with a separate if err != nil { return nil, fmt.Errorf("component
manager capability pre-dispatch validation failed: %w", err) }. This affects the
call to validateRequiredCapabilities and the error handling inside Execute.

---

Nitpick comments:
In `@flow/internal/task/operationrules/actions_test.go`:
- Around line 21-26: Replace the reflect.DeepEqual assertion in the test with a
testify require assertion: change the DeepEqual check (currently comparing got
and tt.want around lines 348-350) to require.Equal(t, tt.want, got), add the
"github.com/stretchr/testify/require" import, and remove the "reflect" import;
ensure the test file (flow/internal/task/operationrules/actions_test.go) uses
require for this assertion so it follows the project testing guideline.

In `@flow/internal/task/operationrules/actions.go`:
- Around line 140-141: In ComponentTypes(), avoid the inline
assign-and-condition: first perform the lookup with spec, ok :=
actionRegistry[ac.Name], then use a separate if statement like if ok &&
spec.requiresParam(ParamComponentTypes) { ... }; update any subsequent logic
that relied on the inline form to use the separately declared spec and ok
variables (references: ComponentTypes, actionRegistry, ac.Name,
spec.requiresParam, ParamComponentTypes).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1a6dc9c-cb8d-4e5f-a863-1a83d4247f79

📥 Commits

Reviewing files that changed from the base of the PR and between 598b4fa and 14d818c.

📒 Files selected for processing (27)
  • flow/cmd/serve_test.go
  • flow/internal/task/capabilityrequirements/requirements.go
  • flow/internal/task/capabilityrequirements/requirements_test.go
  • flow/internal/task/componentmanager/builtin/builtin_test.go
  • flow/internal/task/componentmanager/capability/capability.go
  • flow/internal/task/componentmanager/capability/capability_test.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/nvswitch/nico/nico.go
  • flow/internal/task/componentmanager/nvswitch/nvswitchmanager/nvswitchmanager.go
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/executor/temporalworkflow/activity/activity.go
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
  • flow/internal/task/executor/temporalworkflow/activity/registry_test.go
  • flow/internal/task/executor/temporalworkflow/manager/manager.go
  • flow/internal/task/executor/temporalworkflow/manager/pre_dispatch_validation_test.go
  • flow/internal/task/operationrules/actions.go
  • flow/internal/task/operationrules/actions_test.go
  • flow/internal/task/operationrules/rules.go
💤 Files with no reviewable changes (2)
  • flow/cmd/serve_test.go
  • flow/internal/task/componentmanager/catalog/errors.go

Comment thread flow/internal/task/executor/temporalworkflow/manager/manager.go Outdated
@jw-nvidia jw-nvidia force-pushed the feat/cm-capability-aware-dispatch branch from 14d818c to 31faee6 Compare May 21, 2026 00:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
flow/internal/task/componentmanager/test_helpers_test.go (1)

151-153: ⚡ Quick win

Use require.NoError instead of t.Fatalf in test assertions.

Please keep assertions consistent with the test suite convention in _test.go files.

Proposed diff
 import (
 	"context"
+	"testing"
+
 	"github.com/NVIDIA/infra-controller-rest/flow/internal/task/componentmanager/capability"
-	"testing"
+	"github.com/stretchr/testify/require"
@@
-	if err != nil {
-		t.Fatalf("NewRegistry() error = %v", err)
-	}
+	require.NoError(t, err)

As per coding guidelines **/*_test.go: Use testify (assert/require) for test assertions in Go tests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/componentmanager/test_helpers_test.go` around lines 151 -
153, Replace the t.Fatalf assertion used after calling NewRegistry() with
testify's require.NoError to follow test conventions: change the check in the
test that reads `if err != nil { t.Fatalf("NewRegistry() error = %v", err) }` to
`require.NoError(t, err)` (or `require.NoError(t, err, "NewRegistry()")`) and
add the testify/require import (`github.com/stretchr/testify/require`) to the
test file if it isn't already present.
flow/internal/task/capabilityrequirements/requirements.go (1)

51-54: ⚡ Quick win

Apply repo style for error assignment checks.

Lines 51-54 and 144-148 use inline assign-and-check; split them into two statements per project Go conventions.

Suggested patch
 	for _, capability := range r.Capabilities {
-		if err := checker.CheckCapability(
-			r.ComponentType,
-			capability,
-		); err != nil {
+		derr := checker.CheckCapability(
+			r.ComponentType,
+			capability,
+		)
+		if derr != nil {
 			return fmt.Errorf(
 				"component type %s requires capability %q: %w",
 				devicetypes.ComponentTypeToString(r.ComponentType),
 				capability,
-				err,
+				derr,
 			)
 		}
 	}
@@
 		for _, action := range step.OrderedActions() {
-			if err := builder.collectAction(
-				step.ComponentType,
-				action,
-			); err != nil {
-				return nil, err
+			derr := builder.collectAction(
+				step.ComponentType,
+				action,
+			)
+			if derr != nil {
+				return nil, derr
 			}
 		}
 	}

As per coding guidelines: Split assign-and-condition into two statements; prefer separate derr := action() and if derr != nil over if derr := action(); derr != nil.

Also applies to: 144-148

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/capabilityrequirements/requirements.go` around lines 51 -
54, The inline assign-and-check pattern used with checker.CheckCapability
(currently written as if err := checker.CheckCapability(...); err != nil) must
be split into two statements per repo Go style: first assign the result to a
distinct error variable (e.g., derr := checker.CheckCapability(...)) and then
check it with if derr != nil { ... }; apply this change to the occurrence around
the checker.CheckCapability call and the second occurrence referenced (lines
~144-148) so both places use separate assignment followed by an if that tests
that variable.
flow/internal/task/componentmanager/capability/capability.go (1)

120-127: ⚡ Quick win

Clarify or enforce Add's normalization precondition.

CapabilitySet is documented as normalized, but Add appends raw values unchanged. That allows whitespace-padded or unknown capabilities into the set and can defeat Contains/dedupe semantics until some later Normalize() call. Please either normalize on insert or explicitly document that Add only accepts already-normalized capabilities.

Minimal doc fix
-// Add returns a capability set with capability included once.
+// Add returns a capability set with capability included once.
+// Callers must pass an already-normalized capability; use Parse for raw names.
 func (s CapabilitySet) Add(capability Capability) CapabilitySet {

As per coding guidelines, document when you have intentionally omitted code that the reader might otherwise expect to be present.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/componentmanager/capability/capability.go` around lines
120 - 127, The Add method on CapabilitySet currently appends the provided
Capability raw, risking unnormalized values; update CapabilitySet.Add to
normalize the incoming capability (e.g., call the existing Normalize() or
equivalent normalizer) before calling Contains and before appending so
dedupe/Contains semantics always hold, and ensure you reference
CapabilitySet.Contains and CapabilitySet.Normalize (or the normalizing function)
in the implementation; alternatively, if you prefer not to change behavior, add
a short doc comment on Add stating it requires an already-normalized Capability
and reference CapabilitySet.Normalize so callers know to call Normalize first.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@flow/internal/task/capabilityrequirements/requirements.go`:
- Around line 51-54: The inline assign-and-check pattern used with
checker.CheckCapability (currently written as if err :=
checker.CheckCapability(...); err != nil) must be split into two statements per
repo Go style: first assign the result to a distinct error variable (e.g., derr
:= checker.CheckCapability(...)) and then check it with if derr != nil { ... };
apply this change to the occurrence around the checker.CheckCapability call and
the second occurrence referenced (lines ~144-148) so both places use separate
assignment followed by an if that tests that variable.

In `@flow/internal/task/componentmanager/capability/capability.go`:
- Around line 120-127: The Add method on CapabilitySet currently appends the
provided Capability raw, risking unnormalized values; update CapabilitySet.Add
to normalize the incoming capability (e.g., call the existing Normalize() or
equivalent normalizer) before calling Contains and before appending so
dedupe/Contains semantics always hold, and ensure you reference
CapabilitySet.Contains and CapabilitySet.Normalize (or the normalizing function)
in the implementation; alternatively, if you prefer not to change behavior, add
a short doc comment on Add stating it requires an already-normalized Capability
and reference CapabilitySet.Normalize so callers know to call Normalize first.

In `@flow/internal/task/componentmanager/test_helpers_test.go`:
- Around line 151-153: Replace the t.Fatalf assertion used after calling
NewRegistry() with testify's require.NoError to follow test conventions: change
the check in the test that reads `if err != nil { t.Fatalf("NewRegistry() error
= %v", err) }` to `require.NoError(t, err)` (or `require.NoError(t, err,
"NewRegistry()")`) and add the testify/require import
(`github.com/stretchr/testify/require`) to the test file if it isn't already
present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8906d691-8948-42ac-85a1-a150bed498fd

📥 Commits

Reviewing files that changed from the base of the PR and between 14d818c and 31faee6.

📒 Files selected for processing (28)
  • flow/cmd/serve_test.go
  • flow/internal/task/capabilityrequirements/requirements.go
  • flow/internal/task/capabilityrequirements/requirements_test.go
  • flow/internal/task/componentmanager/builtin/builtin_test.go
  • flow/internal/task/componentmanager/capability/capability.go
  • flow/internal/task/componentmanager/capability/capability_test.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/nvswitch/nico/nico.go
  • flow/internal/task/componentmanager/nvswitch/nvswitchmanager/nvswitchmanager.go
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/executor/temporalworkflow/activity/activity.go
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
  • flow/internal/task/executor/temporalworkflow/activity/registry_test.go
  • flow/internal/task/executor/temporalworkflow/manager/manager.go
  • flow/internal/task/executor/temporalworkflow/manager/pre_dispatch_validation_test.go
  • flow/internal/task/operationrules/actions.go
  • flow/internal/task/operationrules/actions_test.go
  • flow/internal/task/operationrules/loader_actions_test.go
  • flow/internal/task/operationrules/rules.go
💤 Files with no reviewable changes (2)
  • flow/cmd/serve_test.go
  • flow/internal/task/componentmanager/catalog/errors.go

@jw-nvidia jw-nvidia force-pushed the feat/cm-capability-aware-dispatch branch from 31faee6 to b1e3957 Compare May 21, 2026 01:15
@jw-nvidia jw-nvidia requested a review from kunzhao-nv May 21, 2026 01:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
flow/internal/task/executor/temporalworkflow/manager/manager.go (1)

269-271: ⚡ Quick win

Split inline assign-and-condition in capability validation loop.

Use a separate assignment and conditional check to align with repository Go style.

Suggested change
 	for _, r := range requirements {
-		if err := r.Validate(registry); err != nil {
-			return err
+		derr := r.Validate(registry)
+		if derr != nil {
+			return derr
 		}
 	}
As per coding guidelines: "Split assign-and-condition into two statements; prefer separate `derr := action()` and `if derr != nil` over `if derr := action(); derr != nil`."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/executor/temporalworkflow/manager/manager.go` around lines
269 - 271, The capability validation loop currently uses an inline
assign-and-condition ("if err := r.Validate(registry); err != nil"); change this
to split the assignment and conditional: first assign the result to a named
variable (e.g., err := r.Validate(registry)) and then check it with a separate
if (if err != nil { return err }). Update the use inside the manager.go
validation loop where r.Validate(registry) is called so it follows the
repository Go style.
flow/internal/task/capabilityrequirements/requirements.go (1)

218-221: ⚡ Quick win

Split inline assign-and-condition in the inner capability loop.

Use a separate assignment before the if to match repository Go style.

Suggested change
 		for _, capability := range capabilities {
-			if err := b.add(componentType, capability); err != nil {
-				return err
+			derr := b.add(componentType, capability)
+			if derr != nil {
+				return derr
 			}
 		}
As per coding guidelines: "Split assign-and-condition into two statements; prefer separate `derr := action()` and `if derr != nil` over `if derr := action(); derr != nil`."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/capabilityrequirements/requirements.go` around lines 218 -
221, Inside the inner loop over capabilities, avoid the inline
assign-and-condition; replace the `if err := b.add(componentType, capability);
err != nil { return err }` with a two-step assignment then check (e.g., `derr :=
b.add(componentType, capability)` followed by `if derr != nil { return derr }`)
so use the `b.add`, `capability`, `componentType`, and `capabilities`
identifiers to locate the loop and perform the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flow/internal/task/componentmanager/capability/capability.go`:
- Around line 125-136: CapabilitySet.Add currently only normalizes the newly
appended capability via append(s, capability).Sorted(), which leaves the
existing receiver s un-normalized and can miss semantic duplicates; update Add
(method CapabilitySet.Add) to first normalize the existing set (e.g.,
rebuild/parse each element of s or call an existing normalize helper if
present), then parse/normalize the incoming capability with Parse, check
Contains against the normalized set, and finally append and return the combined
set sorted/deduplicated (use Sorted or the normalize helper) so the returned
CapabilitySet is fully normalized.

In `@flow/internal/task/executor/temporalworkflow/manager/manager.go`:
- Around line 256-260: In validateRequiredCapabilities, replace the inline
if-assignment pattern (if err := r.Validate(registry); err != nil { ... }) with
a two-step assignment and conditional using a distinct error variable (e.g.,
derr := r.Validate(registry) followed by if derr != nil { ... }) to match the
repo's Go style; keep the call site as r.Validate(registry) and do not add any
extra nil-registry checks (componentmanager.Registry.GetCapableManager already
returns ErrRegistryNotConfigured when the receiver is nil), so simply handle
derr as the validation error.

---

Nitpick comments:
In `@flow/internal/task/capabilityrequirements/requirements.go`:
- Around line 218-221: Inside the inner loop over capabilities, avoid the inline
assign-and-condition; replace the `if err := b.add(componentType, capability);
err != nil { return err }` with a two-step assignment then check (e.g., `derr :=
b.add(componentType, capability)` followed by `if derr != nil { return derr }`)
so use the `b.add`, `capability`, `componentType`, and `capabilities`
identifiers to locate the loop and perform the change.

In `@flow/internal/task/executor/temporalworkflow/manager/manager.go`:
- Around line 269-271: The capability validation loop currently uses an inline
assign-and-condition ("if err := r.Validate(registry); err != nil"); change this
to split the assignment and conditional: first assign the result to a named
variable (e.g., err := r.Validate(registry)) and then check it with a separate
if (if err != nil { return err }). Update the use inside the manager.go
validation loop where r.Validate(registry) is called so it follows the
repository Go style.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 37e9df28-b9e5-441d-bc51-44aec4a4120e

📥 Commits

Reviewing files that changed from the base of the PR and between 31faee6 and b1e3957.

📒 Files selected for processing (28)
  • flow/cmd/serve_test.go
  • flow/internal/task/capabilityrequirements/requirements.go
  • flow/internal/task/capabilityrequirements/requirements_test.go
  • flow/internal/task/componentmanager/builtin/builtin_test.go
  • flow/internal/task/componentmanager/capability/capability.go
  • flow/internal/task/componentmanager/capability/capability_test.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/nvswitch/nico/nico.go
  • flow/internal/task/componentmanager/nvswitch/nvswitchmanager/nvswitchmanager.go
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/executor/temporalworkflow/activity/activity.go
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
  • flow/internal/task/executor/temporalworkflow/activity/registry_test.go
  • flow/internal/task/executor/temporalworkflow/manager/manager.go
  • flow/internal/task/executor/temporalworkflow/manager/pre_dispatch_validation_test.go
  • flow/internal/task/operationrules/actions.go
  • flow/internal/task/operationrules/actions_test.go
  • flow/internal/task/operationrules/loader_actions_test.go
  • flow/internal/task/operationrules/rules.go
💤 Files with no reviewable changes (2)
  • flow/cmd/serve_test.go
  • flow/internal/task/componentmanager/catalog/errors.go

Comment thread flow/internal/task/componentmanager/capability/capability.go Outdated
Comment on lines +256 to +260
func validateRequiredCapabilities(
info task.ExecutionInfo,
registry *componentmanager.Registry,
) error {
requirements, err := capabilityrequirements.Required(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect Registry capability-check implementation and nil-guard presence.
rg -nP --type=go -C3 'func\s*\(.*\*Registry\)\s*CheckCapability\s*\('

# 2) Inspect current validation helper and call sites in manager.
rg -nP --type=go -C3 'validateRequiredCapabilities\s*\(' flow/internal/task/executor/temporalworkflow/manager/manager.go

# 3) Find direct Manager struct literals that may bypass Config.Build().
rg -nP --type=go -C2 '&Manager\s*{'

Repository: NVIDIA/infra-controller-rest

Length of output: 9953


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect validateRequiredCapabilities implementation and surrounding context.
sed -n '230,360p' flow/internal/task/executor/temporalworkflow/manager/manager.go

# Inspect Registry.CheckCapability implementation.
sed -n '160,240p' flow/internal/task/componentmanager/registry.go

# Locate and inspect componentmanager Registry error constants.
rg -n 'ErrRegistryNotConfigured' flow/internal/task/componentmanager -S
rg -n 'Err.*Registry' flow/internal/task/componentmanager -S

# Find the Requirement.Validate implementation and how it uses the registry/checker.
rg -n --type=go 'type\s+.*Requirement|func\s+\(.*\)\s+Validate\s*\(' flow/internal/task -S

# Inspect capabilityrequirements.Required output type usage around validateRequiredCapabilities call.
rg -n --type=go 'capabilityrequirements\.Required\(' flow/internal/task/executor/temporalworkflow/manager/manager.go

Repository: NVIDIA/infra-controller-rest

Length of output: 11445


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,120p' flow/internal/task/capabilityrequirements/requirements.go
sed -n '120,240p' flow/internal/task/capabilityrequirements/requirements.go

sed -n '80,180p' flow/internal/task/componentmanager/registry.go
sed -n '180,280p' flow/internal/task/componentmanager/registry.go

# Also confirm the error constant exists as referenced in the original suggestion
sed -n '1,120p' flow/internal/task/componentmanager/errors.go

Repository: NVIDIA/infra-controller-rest

Length of output: 17826


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show manager.go with line numbers around validateRequiredCapabilities + loop.
nl -ba flow/internal/task/executor/temporalworkflow/manager/manager.go | sed -n '230,320p'

# Also search for inline assign-and-condition pattern in this file.
rg -n --type=go 'if\s+\w+\s*:=.*;\s*\w+\s*!=\s*nil\s*{' flow/internal/task/executor/temporalworkflow/manager/manager.go

# Search for the specific "derr :=" style mentioned in original scratchpad.
rg -n --type=go '\bderr\s*:=' flow/internal/task/executor/temporalworkflow/manager/manager.go

Repository: NVIDIA/infra-controller-rest

Length of output: 115


🏁 Script executed:

#!/bin/bash
set -euo pipefail

cat -n flow/internal/task/executor/temporalworkflow/manager/manager.go | sed -n '230,320p'

# Look for other inline assign-and-condition forms in this file.
rg -n --type=go 'if\s+\w+\s*:=[^;]+;\s*\w+\s*!=\s*nil\s*{' flow/internal/task/executor/temporalworkflow/manager/manager.go

# Locate exact occurrence around line ~269 if present.
sed -n '250,290p' flow/internal/task/executor/temporalworkflow/manager/manager.go

Repository: NVIDIA/infra-controller-rest

Length of output: 3405


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '240,360p' flow/internal/task/capabilityrequirements/requirements.go

Repository: NVIDIA/infra-controller-rest

Length of output: 226


Refactor the inline assign-and-condition in validateRequiredCapabilities to match the repo’s Go style guidelines.

flow/internal/task/executor/temporalworkflow/manager/manager.go uses if err := r.Validate(registry); err != nil { ... }—split this into two statements (derr := ... then if derr != nil).

The nil-registry panic concern is unnecessary: componentmanager.Registry.GetCapableManager returns ErrRegistryNotConfigured when the registry receiver is nil, so the validation path fails cleanly instead of panicking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/executor/temporalworkflow/manager/manager.go` around lines
256 - 260, In validateRequiredCapabilities, replace the inline if-assignment
pattern (if err := r.Validate(registry); err != nil { ... }) with a two-step
assignment and conditional using a distinct error variable (e.g., derr :=
r.Validate(registry) followed by if derr != nil { ... }) to match the repo's Go
style; keep the call site as r.Validate(registry) and do not add any extra
nil-registry checks (componentmanager.Registry.GetCapableManager already returns
ErrRegistryNotConfigured when the receiver is nil), so simply handle derr as the
validation error.

Comment thread flow/internal/task/componentmanager/capability/capability.go Outdated
Comment thread flow/internal/task/componentmanager/catalog/catalog.go
Comment thread flow/cmd/serve_test.go
- Add capability metadata for component managers and derive task capability
  requirements from operation rules before Executor dispatch.
- Move capability primitives into the componentmanager package, add registry
  checks for active manager capabilities, and keep activity-level checks as
  the operation-boundary guardrail.

Signed-off-by: Jin Wang <jinwan@nvidia.com>
@jw-nvidia jw-nvidia force-pushed the feat/cm-capability-aware-dispatch branch from b1e3957 to dafcec1 Compare May 21, 2026 18:34
@jw-nvidia
Copy link
Copy Markdown
Contributor Author

/ok to test dafcec1

@jw-nvidia jw-nvidia enabled auto-merge (squash) May 21, 2026 18:36
@github-actions
Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-21 18:38:42 UTC | Commit: dafcec1

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
flow/internal/task/executor/temporalworkflow/manager/manager.go (1)

268-272: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inline assign-and-condition pattern violates repository Go style guidelines.

The loop body uses if err := r.Validate(registry); err != nil { ... }, which should be split into separate assignment and conditional statements per repository conventions.

🔧 Suggested refactor
 	for _, r := range requirements {
-		if err := r.Validate(registry); err != nil {
+		verr := r.Validate(registry)
+		if verr != nil {
-			return err
+			return verr
 		}
 	}

As per coding guidelines: "Split assign-and-condition into two statements; prefer separate derr := action() and if derr != nil over if derr := action(); derr != nil"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/executor/temporalworkflow/manager/manager.go` around lines
268 - 272, The loop over requirements uses the inline assign-and-condition form
`if err := r.Validate(registry); err != nil { return err }`; change it to a
two-statement pattern to match repo style: call r.Validate(registry) into a
named variable (e.g., derr or err), then check that variable in a separate if
statement and return it on error. Update the loop in manager.go where
requirements is iterated and r.Validate(registry) is invoked (the same
identifier r and registry should be used) so the behavior is identical but the
assignment and conditional are split.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flow/internal/task/capabilityrequirements/requirements.go`:
- Around line 218-221: In the capability aggregation loop inside
requirements.go, replace the inline assignment+condition pattern "if err :=
b.add(componentType, capability); err != nil { return err }" with two
statements: first call b.add and assign its result to a named variable (e.g.,
err := b.add(componentType, capability)), then check that variable in a separate
if (if err != nil { return err }); locate the loop that iterates over
capabilities and the b.add call to make this change.

---

Duplicate comments:
In `@flow/internal/task/executor/temporalworkflow/manager/manager.go`:
- Around line 268-272: The loop over requirements uses the inline
assign-and-condition form `if err := r.Validate(registry); err != nil { return
err }`; change it to a two-statement pattern to match repo style: call
r.Validate(registry) into a named variable (e.g., derr or err), then check that
variable in a separate if statement and return it on error. Update the loop in
manager.go where requirements is iterated and r.Validate(registry) is invoked
(the same identifier r and registry should be used) so the behavior is identical
but the assignment and conditional are split.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2458329d-8893-46d6-a635-31b23844b47f

📥 Commits

Reviewing files that changed from the base of the PR and between b1e3957 and dafcec1.

📒 Files selected for processing (28)
  • flow/cmd/serve_test.go
  • flow/internal/task/capabilityrequirements/requirements.go
  • flow/internal/task/capabilityrequirements/requirements_test.go
  • flow/internal/task/componentmanager/builtin/builtin_test.go
  • flow/internal/task/componentmanager/capability/capability.go
  • flow/internal/task/componentmanager/capability/capability_test.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/nvswitch/nico/nico.go
  • flow/internal/task/componentmanager/nvswitch/nvswitchmanager/nvswitchmanager.go
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/executor/temporalworkflow/activity/activity.go
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
  • flow/internal/task/executor/temporalworkflow/activity/registry_test.go
  • flow/internal/task/executor/temporalworkflow/manager/manager.go
  • flow/internal/task/executor/temporalworkflow/manager/pre_dispatch_validation_test.go
  • flow/internal/task/operationrules/actions.go
  • flow/internal/task/operationrules/actions_test.go
  • flow/internal/task/operationrules/loader_actions_test.go
  • flow/internal/task/operationrules/rules.go
💤 Files with no reviewable changes (2)
  • flow/cmd/serve_test.go
  • flow/internal/task/componentmanager/catalog/errors.go

Comment on lines +218 to +221
for _, capability := range capabilities {
if err := b.add(componentType, capability); err != nil {
return err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Split assign-and-condition in capability aggregation loop (Line 219).

Please split the inline assignment from the conditional to match repository Go style requirements.

Proposed patch
 		for _, capability := range capabilities {
-			if err := b.add(componentType, capability); err != nil {
-				return err
+			derr := b.add(componentType, capability)
+			if derr != nil {
+				return derr
 			}
 		}
 	}
As per coding guidelines: "Split assign-and-condition into two statements; prefer separate `derr := action()` and `if derr != nil` over `if derr := action(); derr != nil`".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for _, capability := range capabilities {
if err := b.add(componentType, capability); err != nil {
return err
}
for _, capability := range capabilities {
derr := b.add(componentType, capability)
if derr != nil {
return derr
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/capabilityrequirements/requirements.go` around lines 218 -
221, In the capability aggregation loop inside requirements.go, replace the
inline assignment+condition pattern "if err := b.add(componentType, capability);
err != nil { return err }" with two statements: first call b.add and assign its
result to a named variable (e.g., err := b.add(componentType, capability)), then
check that variable in a separate if (if err != nil { return err }); locate the
loop that iterates over capabilities and the b.add call to make this change.

@jw-nvidia jw-nvidia merged commit f813f8d into NVIDIA:main May 21, 2026
52 checks passed
@github-actions
Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-flow 66 4 34 18 2 8
nico-nsm 82 2 28 43 9 0
nico-psm 67 4 35 18 2 8
nico-rest-api 100 6 53 30 3 8
nico-rest-cert-manager 65 4 34 18 1 8
nico-rest-db 66 4 34 18 2 8
nico-rest-site-agent 65 4 34 18 1 8
nico-rest-site-manager 65 4 34 18 1 8
nico-rest-workflow 67 4 35 18 2 8
TOTAL 643 36 321 199 23 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@jw-nvidia jw-nvidia deleted the feat/cm-capability-aware-dispatch branch May 21, 2026 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants