
feat(configurenmxc): optimize nmxc fabric state#1749

Open
narasimhan321 wants to merge 8 commits into NVIDIA:main from narasimhan321:nv/improve-nmxcstate

@narasimhan321
Contributor

Description

  • Add a new substate to nmx-configure that disables the cluster on all switches.
  • Configure nmxc only on the selected primary switch.
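A minimal sketch of the substate order described above. Only the variant names DisableScaleUpFabricState and ConfigureScaleUpFabricManager come from this PR. The `Done` variant, the `step` function, the string switch IDs, and passing the primary switch in as a parameter are hypothetical, since the PR does not show how the primary is selected.

```rust
/// Simplified stand-in for ConfigureNmxClusterState.
/// Only the first two variant names come from the PR; `Done` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConfigureNmxClusterState {
    DisableScaleUpFabricState,
    ConfigureScaleUpFabricManager,
    Done,
}

/// Hypothetical switch identifier.
type SwitchId = &'static str;

/// Returns the switches a substate acts on, plus the next substate.
fn step(
    state: ConfigureNmxClusterState,
    switches: &[SwitchId],
    primary: SwitchId,
) -> (Vec<SwitchId>, ConfigureNmxClusterState) {
    use ConfigureNmxClusterState::*;
    match state {
        // First, disable ScaleUpFabric state on every switch in the rack.
        DisableScaleUpFabricState => (switches.to_vec(), ConfigureScaleUpFabricManager),
        // Then configure nmxc only on the primary switch.
        ConfigureScaleUpFabricManager => (vec![primary], Done),
        Done => (Vec::new(), Done),
    }
}

fn main() {
    let switches = ["sw0", "sw1", "sw2"];
    let mut state = ConfigureNmxClusterState::DisableScaleUpFabricState;
    while state != ConfigureNmxClusterState::Done {
        let (targets, next) = step(state, &switches, "sw1");
        println!("{:?} -> {:?}", state, targets);
        state = next;
    }
}
```

The design point is that the disable step fans out to the whole rack, while configuration is scoped to a single primary switch.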

Type of Change

  • Add - New feature or capability

Related Issues (Optional)

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@copy-pr-bot

copy-pr-bot Bot commented May 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@narasimhan321 narasimhan321 force-pushed the nv/improve-nmxcstate branch from 6483f17 to f663d89 Compare May 18, 2026 17:21
@narasimhan321 narasimhan321 marked this pull request as ready for review May 18, 2026 18:25
@narasimhan321 narasimhan321 requested a review from a team as a code owner May 18, 2026 18:25

@vinodchitrali vinodchitrali left a comment

Add migration for state controller

self.get_firmware_job_status_calls.lock().await.push(cmd);
pop_or_err(&mut self.get_firmware_job_status_responses.lock().await)
}
async fn add_firmware_object(

Are these required?

) -> Result<rms::GetRackFirmwareInventoryResponse, RackManagerError> {
Ok(rms::GetRackFirmwareInventoryResponse::default())
}
async fn add_firmware_object(

Seems like it's because of an RMS proto-related issue.

Contributor Author

Yes, this is due to the new RPCs that come with pinning the latest nvm-rms-client.
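A minimal sketch of why the mock stubs are needed, assuming a simplified, synchronous trait (the real client is async and uses an async mutex, hence `.lock().await` in the excerpts). Every type, trait, and method shape here is hypothetical apart from the names visible in the diff excerpts: once a pinned client adds an RPC to the trait, every implementor, including test mocks, must implement it, and a stub returning `Default::default()` keeps tests that never call that RPC compiling. This `pop_or_err` pops from the end of the queue; the real helper's ordering is not shown in the PR.

```rust
use std::sync::Mutex;

// Hypothetical stand-ins for a generated rms proto type and the client error.
#[derive(Debug, Default, PartialEq)]
struct GetRackFirmwareInventoryResponse {
    switch_ids: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct RackManagerError(String);

// Hypothetical, synchronous version of the client trait.
trait RackManagerApi {
    fn get_firmware_job_status(&self, job_id: &str) -> Result<String, RackManagerError>;
    // Assume this RPC arrived with the newer client pin: every implementor,
    // including test mocks, must now provide it or the crate will not compile.
    fn get_rack_firmware_inventory(&self)
        -> Result<GetRackFirmwareInventoryResponse, RackManagerError>;
}

// Hypothetical helper: return the next canned response, or an error if none is queued.
fn pop_or_err<T>(queue: &mut Vec<Result<T, RackManagerError>>) -> Result<T, RackManagerError> {
    queue
        .pop()
        .unwrap_or_else(|| Err(RackManagerError("no canned response queued".to_string())))
}

#[derive(Default)]
struct MockRackManager {
    get_firmware_job_status_calls: Mutex<Vec<String>>,
    get_firmware_job_status_responses: Mutex<Vec<Result<String, RackManagerError>>>,
}

impl RackManagerApi for MockRackManager {
    fn get_firmware_job_status(&self, job_id: &str) -> Result<String, RackManagerError> {
        // Record the call, then hand back the next scripted response.
        self.get_firmware_job_status_calls.lock().unwrap().push(job_id.to_string());
        pop_or_err(&mut self.get_firmware_job_status_responses.lock().unwrap())
    }

    // Stub that exists only to satisfy the trait; tests that never exercise
    // this RPC receive an empty default response.
    fn get_rack_firmware_inventory(&self)
        -> Result<GetRackFirmwareInventoryResponse, RackManagerError> {
        Ok(GetRackFirmwareInventoryResponse::default())
    }
}

fn main() {
    let mock = MockRackManager::default();
    mock.get_firmware_job_status_responses
        .lock()
        .unwrap()
        .push(Ok("Completed".to_string()));
    println!("{:?}", mock.get_firmware_job_status("job-1"));
    println!("{:?}", mock.get_rack_firmware_inventory());
}
```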

return transition_to_rack_error(id, state, "RMS client not configured", ctx)
.await;
};
let switch_inventory = load_rack_switch_firmware_inventory(

Time to rename load_rack_switch_firmware_inventory to load_rack_switch_inventory.

"Disabling ScaleUpFabric state before selecting ConfigureNmxCluster primary switch"
);
let response = match rms_client
.set_scale_up_fabric_state(rms::SetScaleUpFabricStateRequest {

Is this correct in ConfigureNmxClusterState::DisableScaleUpFabricState?


Does enabled: Some(false) hold the intended logic?

Contributor Author

Yes, this is intentional. SetScaleUpFabricStateRequest.enabled is the desired target state, so Some(false) disables the ScaleUpFabric state. This state runs before primary selection and sends the request to all switches in the rack.
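A minimal sketch of the request semantics described in the reply, assuming `enabled` is a proto3 `optional bool` (prost maps such fields to `Option<bool>`). The `switch_ids` field, `disable_request_for_rack`, and `describe` are hypothetical. The point is that `enabled` carries the desired target state, so the DisableScaleUpFabricState substate sends `Some(false)` for every switch in the rack before a primary is chosen. How the server treats `None` is not covered in the PR.

```rust
// Hypothetical mirror of rms::SetScaleUpFabricStateRequest; only `enabled`
// comes from the PR discussion, the switch list field is assumed.
#[derive(Debug, Clone, PartialEq)]
struct SetScaleUpFabricStateRequest {
    switch_ids: Vec<String>,
    // Desired target state: Some(true) = enable, Some(false) = disable,
    // None = field left unset.
    enabled: Option<bool>,
}

// Build the request sent in DisableScaleUpFabricState: it targets every
// switch in the rack, because no primary switch has been chosen yet.
fn disable_request_for_rack(rack_switches: &[&str]) -> SetScaleUpFabricStateRequest {
    SetScaleUpFabricStateRequest {
        switch_ids: rack_switches.iter().map(|s| s.to_string()).collect(),
        enabled: Some(false),
    }
}

// Hypothetical helper that spells out what the target state means.
fn describe(req: &SetScaleUpFabricStateRequest) -> &'static str {
    match req.enabled {
        Some(true) => "enable ScaleUpFabric state",
        Some(false) => "disable ScaleUpFabric state",
        None => "target state not specified",
    }
}

fn main() {
    let req = disable_request_for_rack(&["sw0", "sw1", "sw2"]);
    println!("{} on {} switches", describe(&req), req.switch_ids.len());
}
```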

Ok(StateHandlerOutcome::transition(RackState::Maintenance {
maintenance_state: RackMaintenanceState::ConfigureNmxCluster {
configure_nmx_cluster:
ConfigureNmxClusterState::ConfigureScaleUpFabricManager,

You need to add a migration. What should happen if the rack is in WaitForFabricStatus?


Labels

None yet

Projects

None yet


2 participants