feat(api): add Flat VPC virtualization type for zero-DPU hosts #1775

Open
chet wants to merge 1 commit into NVIDIA:main from chet:vpc-flat

Conversation

Contributor

@chet chet commented May 18, 2026

Description

This closes #1522.

Adds Flat to VpcVirtualizationType for VPCs hosted on zero-DPU machines. ETV and FNN both presume a Carbide-managed DPU data plane, so using them for zero-DPU hosts meant allocating overlay machinery that nothing consumed. Flat just records the VPC and lets the network operator's switch fabric own reachability.

Also, as I had mentioned to @Matthias247 and @bcavnvidia on the side:

I ended up doing a bit more refactoring when I was in there. While I was working on it, I was like, "you know, it'd be a lot nicer if this wasn't just a bunch of additional matching + conditional branching" -- so I tried to break it out by defining a new approach, VPC capabilities (kind of like machine capabilities and rack capabilities), and using that modeling to simplify some of the decision making.

Per-variant policy lives in a new VpcCapabilities profile in model::vpc::capability: which host fabric interface the type attaches to (Dpu or Nic), which segment types it accepts, whether it supports IPv6 / routing profiles / stretched-L2 SVI, and which other types it peers with. Each variant maps to one profile constant; handlers consult capability methods that just read from the profile. Adding a future VPC type is a six-field profile plus one match arm, no handler edits.
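For readers skimming the description, here is a minimal sketch of what a six-field profile plus one match arm per variant could look like. It is not the code from this PR: the field names `host_fabric_interface`, `segment_types`, and `peers_with`, the trimmed enums, and every value in the constants are assumptions chosen only to show the shape.

```rust
/// Host fabric interface a VPC type attaches to (`Dpu` or `Nic`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostFabricInterface {
    Dpu,
    Nic,
}

/// Assumed subset of segment types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkSegmentType {
    Underlay,
    Tenant,
    HostInband,
}

/// Trimmed copy of the enum (the NVUE variant is omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    EthernetVirtualizer,
    Fnn,
    Flat,
}

/// Six-field policy profile. `host_fabric_interface`, `segment_types`, and
/// `peers_with` are hypothetical field names.
#[derive(Debug, Clone, Copy)]
pub struct VpcCapabilities {
    pub host_fabric_interface: HostFabricInterface,
    pub segment_types: &'static [NetworkSegmentType],
    pub supports_ipv6_prefix: bool,
    pub supports_routing_profiles: bool,
    pub allocates_stretched_l2_svi: bool,
    pub peers_with: &'static [VpcVirtualizationType],
}

// Placeholder values: they show the shape of a profile, not the PR's policy.
const ETV: VpcCapabilities = VpcCapabilities {
    host_fabric_interface: HostFabricInterface::Dpu,
    segment_types: &[NetworkSegmentType::Underlay, NetworkSegmentType::Tenant],
    supports_ipv6_prefix: false,
    supports_routing_profiles: true,
    allocates_stretched_l2_svi: false,
    peers_with: &[VpcVirtualizationType::EthernetVirtualizer],
};

const FNN: VpcCapabilities = VpcCapabilities {
    host_fabric_interface: HostFabricInterface::Dpu,
    segment_types: &[NetworkSegmentType::Underlay, NetworkSegmentType::Tenant],
    supports_ipv6_prefix: true,
    supports_routing_profiles: true,
    allocates_stretched_l2_svi: true,
    peers_with: &[VpcVirtualizationType::Fnn, VpcVirtualizationType::Flat],
};

const FLAT: VpcCapabilities = VpcCapabilities {
    host_fabric_interface: HostFabricInterface::Nic,
    segment_types: &[NetworkSegmentType::HostInband],
    supports_ipv6_prefix: false,
    supports_routing_profiles: false,
    allocates_stretched_l2_svi: false,
    peers_with: &[VpcVirtualizationType::Fnn, VpcVirtualizationType::Flat],
};

impl VpcVirtualizationType {
    /// One match arm per variant; handlers do not match on the type itself.
    pub fn capabilities(self) -> VpcCapabilities {
        match self {
            VpcVirtualizationType::EthernetVirtualizer => ETV,
            VpcVirtualizationType::Fnn => FNN,
            VpcVirtualizationType::Flat => FLAT,
        }
    }

    /// Capability method that only reads from the profile.
    pub fn supports_segment_type(self, segment_type: NetworkSegmentType) -> bool {
        self.capabilities().segment_types.contains(&segment_type)
    }
}
```

Under this shape, adding a variant means writing one more constant and one more arm in `capabilities`; callers such as `supports_segment_type` stay untouched.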

Flat VPCs and HostInband segments are mutually bound -- a Flat VPC can only hold HostInband segments, and HostInband segments can only live in Flat VPCs. Tenants pick FLAT through the same VPC create flow as any other type.
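The mutual binding can be read as "Flat if and only if HostInband". A rough sketch of that rule follows, assuming trimmed stand-in enums and a hypothetical `UnsupportedSegmentType` error variant. The method name `ensure_supports_segment_type` appears in the diff, but this body is a guess at the rule, not the PR's implementation.

```rust
/// Assumed subset of segment types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkSegmentType {
    Underlay,
    Tenant,
    HostInband,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    EthernetVirtualizer,
    Fnn,
    Flat,
}

/// Hypothetical error shape; the real `VpcCapabilityError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum VpcCapabilityError {
    UnsupportedSegmentType {
        vpc_type: VpcVirtualizationType,
        segment_type: NetworkSegmentType,
    },
}

impl VpcVirtualizationType {
    /// Accepts the pairing only when "VPC is Flat" and "segment is
    /// HostInband" are both true or both false.
    pub fn ensure_supports_segment_type(
        self,
        segment_type: NetworkSegmentType,
    ) -> Result<(), VpcCapabilityError> {
        let is_flat = self == VpcVirtualizationType::Flat;
        let is_host_inband = segment_type == NetworkSegmentType::HostInband;
        if is_flat == is_host_inband {
            Ok(())
        } else {
            Err(VpcCapabilityError::UnsupportedSegmentType {
                vpc_type: self,
                segment_type,
            })
        }
    }
}
```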

Docs in a separate PR. Tests added!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner May 18, 2026 18:34
Comment thread crates/api/src/handlers/vpc_peering.rs
.ensure_supports_segment(&new_network_segment)
.map_err(CarbideError::from)?;
virtualization_type.allocates_svi_for(&new_network_segment)
} else {
Contributor

Should we fail if there's no VPC and new_network_segment.segment_type is HostInband? Your new code in api/src/handlers/instance/mod.rs will fail allocations onto HostInband segments if there's no vpc_id for the segment, so maybe it's better to validate that here?

Contributor Author

Oh yeah good call out -- so, I think a HostInband segment should be able to exist without being bound to a VPC.

For example, maybe we want a HostInband segment for playing around with zero DPU provisioning, but maybe we'd never bind it to a VPC, and that should be ok; if we required a VPC, we'd have this weird inter-dependency thing where we'd need to create a VPC just for getting zero DPU hosts provisioned. This is kind of similar to Admin maybe?

All that to say, you're right in calling this out. I think the actual adjustment is to update the comments in api/src/handlers/instance/mod.rs to explain that better, and improve the error handling a bit to return an error specific to the segment not being bound to a VPC at allocation time.

I guess TLDR is it should be fine to have a HostInband segment not within a VPC, BUT, once it comes time to allocate an instance from the host into a VPC, the segment the host is in needs to be bound to a VPC?

Contributor Author

And yeah to close this out (going back through this), I updated the comment, and now return a FailedPrecondition if the segment is VPC-less at the point of instance allocation.
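The behaviour settled on in this thread (a segment may exist without a VPC, but allocation requires one) might look roughly like the sketch below. `NetworkSegment`, `AllocationError`, and `vpc_for_allocation` are hypothetical stand-ins, not types from the PR; only the `FailedPrecondition` outcome comes from the comment above.

```rust
/// Hypothetical stand-in for the status returned by the handler.
#[derive(Debug, PartialEq, Eq)]
pub enum AllocationError {
    FailedPrecondition(String),
}

/// Hypothetical, trimmed-down segment record. A segment may legitimately
/// have no VPC until an instance is allocated onto it.
pub struct NetworkSegment {
    pub name: String,
    pub vpc_id: Option<u64>,
}

/// Returns the VPC the segment is bound to, or a precondition failure that
/// names the unbound segment.
pub fn vpc_for_allocation(segment: &NetworkSegment) -> Result<u64, AllocationError> {
    segment.vpc_id.ok_or_else(|| {
        AllocationError::FailedPrecondition(format!(
            "segment {} is not bound to a VPC; bind it before allocating an instance",
            segment.name
        ))
    })
}
```

Keeping the check at allocation time, rather than at segment creation, is what lets a VPC-less HostInband segment be used for zero-DPU provisioning experiments.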

@chet chet requested a review from Coco-Ben as a code owner May 19, 2026 22:13
@chet chet removed the request for review from Coco-Ben May 20, 2026 16:33
Comment thread crates/agent/src/nvue.rs Outdated
VpcVirtualizationType::EthernetVirtualizer
| VpcVirtualizationType::EthernetVirtualizerWithNvue => Ok(TMPL_ETV_WITH_NVUE),
VpcVirtualizationType::Fnn => Ok(TMPL_FNN),
// Flat VPCs are comprised of machines whose primary interface is not
Contributor

@bcavnvidia bcavnvidia May 21, 2026

Is this statement true? A machine with a primary interface that is not a DPU is not necessarily a zero-dpu machine, right?

Only a machine that has no DPUs or has all DPUs in NIC mode is zero-DPU, no?

I could have a machine with the primary interface being non-DPU-backed, marked as the primary interface in ExpectedMachines, but the rest of the interfaces could be DPU-backed. (?) (EDIT: I think my head drifted into admin-network here, but the overall questions still stand)

Depending on the answers to those, I think we'll have more questions to answer.

Contributor Author

> Is this statement true? A machine with a primary interface that is not a DPU is not necessarily a zero-dpu machine, right?

Right -- I think we've just been overloading the term a bit, so I put the aka "zero DPU" (like legit heavy quotes). I can drop that part of the comment (if that's what you're recommending), OR, are you saying that we may want to be able to use NVUE to configure non-primary DPUs on a machine in a Flat VPC? 🤢

...but yes, like in some BCM-managed environments, we've got some trays with a basic NIC for N/S, AND they also have 2x DPUs that aren't being used for N/S. They aren't zero DPU, but they're also not in the N/S serving path...

...at least not in our present configuration. I guess umm.. we could potentially virtualize and get all insane and start carving out VFs and one VM might use the basic NIC, and then other VMs would use VFs off of the DPUs.

Is that kind of what you're getting at? Probably good to keep in mind, but nothing we're doing here yet right?

Contributor Author

@chet chet May 21, 2026

All said for now, I'll reword the comment, e.g. "primary fabric interface is a plain NIC" or something, and we can figure out mixed-mode stuff separately. I'm guessing it's something that's probably going to somehow become some high priority thing eventually.

segment: &NewNetworkSegment,
) -> Result<(), VpcCapabilityError> {
self.ensure_supports_segment_type(segment.segment_type)?;
if segment.prefixes.iter().any(|p| p.prefix.is_ipv6()) {
Contributor

It feels like there's an assumption here that things will always be v4 or dual-stack but never pure-v6. 🤔

Contributor Author

...yeah. I've been trying to get away from that with a lot of the other v6 work. Like a lot of the flow takes an IpAddress to support both, and in other cases I've got an optional v6-specific flag (and subsequently made the v4 flag generic and support v4 or v6) to allow things to be pure v6, but there's probably still something to be desired in there. There are even the older cases where certain variables were a Vec with the idea it would support up to 2 "items", to allow for v4 or v6 or dual stack, lol. Let me see what I can do to clean this up.

Contributor Author

Okay, I was trying to decide between implicit and explicit, and I think it [probably??] makes sense to add a supports_ipv4_prefix bool w/ matching check per family. Really it makes sense not to even need the check to begin with, but until all virtualization types actually support v6, I guess we need the check.
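A per-family check along those lines could be sketched as follows. This uses `std::net::IpAddr` as a stand-in for the prefix type in the diff, and `PrefixFamilySupport`, `PrefixFamilyError`, and `ensure_supports_prefixes` are hypothetical names; only `supports_ipv4_prefix` and `supports_ipv6_prefix` come from the thread.

```rust
use std::net::IpAddr;

/// Hypothetical slice of the capability profile: one flag per address family.
pub struct PrefixFamilySupport {
    pub supports_ipv4_prefix: bool,
    pub supports_ipv6_prefix: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PrefixFamilyError {
    Ipv4Unsupported,
    Ipv6Unsupported,
}

/// Checks each prefix against the flag for its own family, so v4-only,
/// v6-only, and dual-stack segments all go through the same path.
pub fn ensure_supports_prefixes(
    caps: &PrefixFamilySupport,
    prefixes: &[IpAddr],
) -> Result<(), PrefixFamilyError> {
    for prefix in prefixes {
        match prefix {
            IpAddr::V4(_) if !caps.supports_ipv4_prefix => {
                return Err(PrefixFamilyError::Ipv4Unsupported)
            }
            IpAddr::V6(_) if !caps.supports_ipv6_prefix => {
                return Err(PrefixFamilyError::Ipv6Unsupported)
            }
            _ => {}
        }
    }
    Ok(())
}
```

With symmetric flags, a pure-v6 type is just `supports_ipv4_prefix: false, supports_ipv6_prefix: true`, with no implied v4 baseline.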

Comment thread crates/api-model/src/vpc/capability.rs Outdated
fn supports_routing_profiles(self) -> bool;

/// Whether this type allocates a stretched-L2 SVI on its segments.
fn allocates_stretched_l2_svi(self) -> bool;
Contributor

I think the only thing that worries me is that things could be combined in ways that make no sense because they're mutually exclusive.

If we never expose a way to let users configure things in config, then this is moot.

Contributor Author

Yeah.. this was one where I was trying to express it in a way that was more generic as a capability, but I wasn't entirely happy with it. Let me think about it..

Contributor Author

@chet chet May 21, 2026

So to get around the mutually exclusive combos, I could maybe turn this into an enum ::DpuOverlayL2, ::DpuOverlayL3, ::OperatorManaged (or something), or just toss a validate() function in there to yell, but like you said that'd only be if we expose it to users ever. Thoughts?

..like:

  pub enum DataPlaneKind {
      DpuOverlayL2,    // e.g. ETV
      DpuOverlayL3,    // e.g. FNN
      OperatorManaged, // e.g. Flat
  }

Contributor

That does look nice

Contributor Author

Okay, let me see what I can do with it.
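One way the enum idea could remove the invalid combinations is to derive the flags from the variant instead of storing them as independent booleans. A rough sketch follows; the `data_plane_kind`, `is_dpu_managed`, and `allocates_svi_ip` method names are assumptions, and the variant-to-type mapping simply follows the "e.g." comments in the snippet above.

```rust
/// Trimmed copy of the enum (the NVUE variant is omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    EthernetVirtualizer,
    Fnn,
    Flat,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataPlaneKind {
    DpuOverlayL2,    // e.g. ETV
    DpuOverlayL3,    // e.g. FNN
    OperatorManaged, // e.g. Flat
}

impl VpcVirtualizationType {
    /// Hypothetical mapping from VPC type to its data plane kind.
    pub fn data_plane_kind(self) -> DataPlaneKind {
        match self {
            VpcVirtualizationType::EthernetVirtualizer => DataPlaneKind::DpuOverlayL2,
            VpcVirtualizationType::Fnn => DataPlaneKind::DpuOverlayL3,
            VpcVirtualizationType::Flat => DataPlaneKind::OperatorManaged,
        }
    }
}

impl DataPlaneKind {
    /// True when a Carbide-managed DPU data plane is involved.
    pub fn is_dpu_managed(self) -> bool {
        !matches!(self, DataPlaneKind::OperatorManaged)
    }

    /// Derived from the kind, so "operator managed but allocates an SVI"
    /// cannot be written down at all.
    pub fn allocates_svi_ip(self) -> bool {
        matches!(self, DataPlaneKind::DpuOverlayL3)
    }
}
```

The trade-off is that behaviour is tied to a closed set of kinds; a `validate()` on free-form booleans would stay more flexible but catch bad combinations only at runtime.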

Comment thread crates/api-model/src/vpc/capability.rs Outdated
],
supports_ipv6_prefix: true,
supports_routing_profiles: true,
allocates_stretched_l2_svi: true,
Contributor

@bcavnvidia bcavnvidia May 21, 2026

Shouldn't this be false ?

Contributor Author

Ehhh bad variable naming on my end. Kind of what I was saying above. If it's confusing to you, it's confusing, lol. This is basically a parameter for fronting:

  pub fn get_svi_ip(...) -> eyre::Result<Option<IpNetwork>> {
      if virtualization_type == VpcVirtualizationType::Fnn && is_l2_segment {
          // ... return Some(svi_ip)
      }
      Ok(None)
  }

Can probably just call it allocates_svi_ip.

Contributor

Oh, the thing that gets an admin network VPC an svi ip?

Contributor

The network segments used for tenant /31s don't get an SVI IP. That's what makes it confusing here.

Contributor Author

Yeah I guess even as allocates_svi_ip it reads weird?

Like an SVI is allocated only for stretched-L2 segments via allocates_svi_for(&segment), which combines my poorly named bool with the segment-specific can_stretch, which I think goes to your comment that tenant /31 segments have can_stretch = false, so they don't get one, but allocates_svi_ip as a parameter makes it sound like they do...
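To make the naming problem concrete, here is a minimal sketch of how the type-level flag and the per-segment `can_stretch` might combine. `SegmentInfo` and `SviPolicy` are hypothetical stand-ins, and `can_stretch` is assumed to be a plain `bool` here.

```rust
/// Hypothetical, trimmed-down segment: only the field this check needs.
pub struct SegmentInfo {
    pub can_stretch: bool,
}

/// Hypothetical slice of the capability profile.
pub struct SviPolicy {
    pub allocates_stretched_l2_svi: bool,
}

impl SviPolicy {
    /// An SVI is allocated only when the VPC type allows it AND the segment
    /// itself is a stretched-L2 segment.
    pub fn allocates_svi_for(&self, segment: &SegmentInfo) -> bool {
        self.allocates_stretched_l2_svi && segment.can_stretch
    }
}
```

Read this way, the flag means "may allocate an SVI on stretchable segments", which is why a tenant /31 segment under the same VPC type still gets none.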

Comment thread crates/api-model/src/vpc/capability.rs Outdated
],
supports_ipv6_prefix: false,
supports_routing_profiles: true,
allocates_stretched_l2_svi: false,
Contributor

Shouldn't this be true ?

Contributor Author

See comment above re: allocates_svi_ip. I'll change the name of this.

}

/// Resolution of routing-related state for a VPC at create time. The
/// `internal` flag isn't strictly part of the routing profile, but it
Contributor

@bcavnvidia bcavnvidia May 21, 2026

That's not entirely a true statement. internal is part of the routing-profile, it just needs to default to internal for the non-FNN case because there's no distinction in non-FNN. With FNN, the internal flag in the routing profile should be the only thing deciding int/ext. (EDIT: which it seems is still being maintained, so maybe just a misleading comment)

Contributor Author

Yeah you know, I had considered making it so callers would get an Option<ResolvedVpcRouting> and not try to come up with a default. Would that be more clear vs. what I'm doing here?

Contributor Author

...I'll just fix the comment for now I think.
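The rule described in this thread (the profile's `internal` flag decides for FNN, everything else defaults to internal) could be sketched like this. `RoutingProfile` and `resolve_internal` are hypothetical names, and the sketch assumes a profile is always available to read from.

```rust
/// Trimmed copy of the enum (the NVUE variant is omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    EthernetVirtualizer,
    Fnn,
    Flat,
}

/// Hypothetical, trimmed-down routing profile.
pub struct RoutingProfile {
    pub internal: bool,
}

/// For FNN the profile's `internal` flag is the only input. Non-FNN types
/// have no internal/external distinction, so they resolve to internal.
pub fn resolve_internal(vpc_type: VpcVirtualizationType, profile: &RoutingProfile) -> bool {
    match vpc_type {
        VpcVirtualizationType::Fnn => profile.internal,
        VpcVirtualizationType::EthernetVirtualizer | VpcVirtualizationType::Flat => true,
    }
}
```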

Comment thread crates/api/src/tests/vpc_peering.rs Outdated
/// - Positive: the Flat VPC's HostInband segment prefix appears in the
/// FNN instance's `vpc_peer_prefixes` (peer reachability is exposed
/// to the DPU).
/// - Negative: the Flat VPC's VNI does NOT appear in `vpc_peer_vnis`.
Contributor

If we are saying Flat <-> FNN peering should be allowed, then the Flat VPC's VNI should appear in vpc_peer_vnis. An operator could make things work by using routing profiles that align with how they've configured things for the Flat-VPC, but if that's the path we expect, the peering starts to make less sense.

Contributor Author

Oh yeah 100% -- good call. I totally brain farted on that -- we DO want to make sure the Flat VPC VNIs appear, because Flat VPCs DO have VNIs, because some day we figure pluggable SDN things can leverage them to configure switch ports/VTEPs or whatever else network operators want to do w/ the VNIs configured in Flat VPCs. Fixing!
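The corrected expectation, that a Flat peer contributes both its prefixes and its VNI, can be illustrated with a small sketch. `PeerVpc` and the two helper functions are hypothetical; they are named after the `vpc_peer_vnis` and `vpc_peer_prefixes` fields only to show that nothing is filtered by virtualization type.

```rust
/// Trimmed copy of the enum (the NVUE variant is omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    EthernetVirtualizer,
    Fnn,
    Flat,
}

/// Hypothetical view of a peered VPC as seen from an FNN instance.
pub struct PeerVpc {
    pub virtualization_type: VpcVirtualizationType,
    pub vni: u32,
    pub prefixes: Vec<String>,
}

/// Every peer's VNI is included, Flat peers too.
pub fn vpc_peer_vnis(peers: &[PeerVpc]) -> Vec<u32> {
    peers.iter().map(|peer| peer.vni).collect()
}

/// Every peer's segment prefixes are included, in peer order.
pub fn vpc_peer_prefixes(peers: &[PeerVpc]) -> Vec<String> {
    peers
        .iter()
        .flat_map(|peer| peer.prefixes.iter().cloned())
        .collect()
}
```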

Comment thread crates/api-model/src/vpc/capability.rs Outdated
NetworkSegmentType::Underlay,
],
supports_ipv6_prefix: false,
supports_routing_profiles: true,
Contributor

false

Contributor Author

I think I need to split this out into two -- I sort of conflated "supports accepting a routing profile type" and "supports applying a routing profile" into a single parameter. In this case yeah, the latter is definitely false (like you're saying), but the former would be true (which is what I'm true-ing) for here. Let me fix!

Contributor

> I sort of conflated "supports accepting a routing profile type" and "supports applying a routing profile" into a single parameter.

How could one of those be true and the other false, (or maybe, how would it make sense)? 🤔

Contributor Author

@chet chet May 21, 2026

@bcavnvidia Soooooo... "supports accepting routing profile type" means the API accepts the profile type string, which ETV and FNN both do.

...but the difference is the extent to which "supports applying a routing profile" differs between ETV and FNN.

For ETV, it means that it uses resolve_vpc_routing to look up the FnnRoutingProfileConfig, and then looks at access_tier, internal, and the profile type gets stored on the VPC... but that's.. it?

Like the agent (in the ETV case) doesn't do anything to apply route_target_imports / route_targets_on_exports / leak_* params.

..so I was trying, but failing, at trying to express that, lol.
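The distinction being described could be expressed as two separate flags, sketched below for ETV and FNN only, since those are the two cases spelled out in this thread. `RoutingProfileSupport` and both field names are hypothetical.

```rust
/// Only the two types discussed in the thread; other variants are omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    EthernetVirtualizer,
    Fnn,
}

/// Hypothetical split of the single `supports_routing_profiles` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RoutingProfileSupport {
    /// The API accepts a routing profile type string for this VPC type.
    pub accepts_routing_profile_type: bool,
    /// The agent applies the profile's route-target and leak settings.
    pub applies_routing_profile: bool,
}

pub fn routing_profile_support(vpc_type: VpcVirtualizationType) -> RoutingProfileSupport {
    match vpc_type {
        // ETV stores the profile type on the VPC but the agent applies nothing.
        VpcVirtualizationType::EthernetVirtualizer => RoutingProfileSupport {
            accepts_routing_profile_type: true,
            applies_routing_profile: false,
        },
        VpcVirtualizationType::Fnn => RoutingProfileSupport {
            accepts_routing_profile_type: true,
            applies_routing_profile: true,
        },
    }
}
```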

@chet chet force-pushed the vpc-flat branch 2 times, most recently from 7789213 to 5540a70 Compare May 21, 2026 23:24
This closes NVIDIA#1522.

Adds `Flat` to `VpcVirtualizationType` for VPCs hosted on zero-DPU
machines. ETV and FNN both presume a Carbide-managed DPU data plane,
so using them for zero-DPU hosts meant allocating overlay machinery
that nothing consumed. Flat just records the VPC and lets the network
operator's switch fabric own reachability.

Per-variant policy lives in a new `VpcCapabilities` profile in
`model::vpc::capability`: which host fabric interface the type
attaches to (`Dpu` or `Nic`), which segment types it accepts, whether
it supports IPv6 / routing profiles / stretched-L2 SVI, and which
other types it peers with. Each variant maps to one profile constant;
handlers consult capability methods that just read from the profile.
Adding a future VPC type is a six-field profile plus one match arm,
no handler edits.

Flat VPCs and `HostInband` segments are mutually bound -- a Flat VPC
can only hold HostInband segments, and HostInband segments can only
live in Flat VPCs. Tenants pick `FLAT` through the same VPC create
flow as any other type.

Docs in a separate PR. Tests added!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

Development

Successfully merging this pull request may close these issues.

Add new VpcVirtualizationType::Flat for machines with direct underlay connectivity

3 participants