ipv6 e2e integration by rcgoodfellow · Pull Request #9570 · oxidecomputer/omicron

rcgoodfellow · 2025-12-27T07:15:58Z

This PR pulls in various IPv6 work. The biggest code changes revolve around integrating new Maghemite APIs around IPv6 peers and unnumbered peers. This has meant changing data structures in the rack initialize API.

Before this PR, the rack initialize API was part of the client-side versioned bootstrap API, which made it impossible to change. However, the only client of the rack initialize API is wicketd, which essentially makes the three API endpoints under rack-initialize lockstep. With that in mind, the rack initialize endpoints have been factored out into a lockstep API, and a new version of the client-side versioned bootstrap API has been created that deprecates use of rack-initialize.

Another tricky aspect of changing these data structures is that they are stored in the bootstore. In particular, we are changing the BGP peer member from an Ipv4Addr to an IpAddr. This should be a backwards-compatible change for the bootstore, but testing is required to ensure it is.

Remaining work items:

External API updates for IPv6 BGP peers
External API updates for unnumbered BGP peers
Wicket updates for IPv6 BGP peers
Wicket updates for unnumbered BGP peers

Functional milestones:

Depends on

internet-diglett · 2026-01-16T01:10:07Z

Things look good to me so far. I'm not able to replicate the CI build-and-test failures on my local workstation so those might be transient failures.

Looks like sled-agent is failing here on the deploy task:

sled-agent: Failed to delete all XDE devices

Caused by:
    0: Failure interacting with the OPTE ioctl(2) interface: command ListPorts failed: BadApiVersion { user: 38, kernel: 37 }
    1: command ListPorts failed: BadApiVersion { user: 38, kernel: 37 }
[ Jan  5 21:09:50 Stopping because all processes in service exited. ]
[ Jan  5 21:09:50 Executing stop method (:kill). ]
[ Jan  5 21:09:50 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
[ Jan  5 21:09:50 Method "start" exited with status 0. ]

internet-diglett · 2026-01-16T01:32:29Z

Local deployment is working so it looks like the deploy task will work once we pull the new xde / update the illumos image in ci

schema/crdb/dbinit.sql

common/src/api/internal/shared/rack_init/v2.rs

Several small, related changes to `MaxPathConfig` and `RouterLifetimeConfig`:

* remove `new_unchecked()` (required changing some `into()`s into `try_into()`s, but I think this is quite a bit safer)
* add custom `Deserialize` impls that validate bounds
* add custom `JsonSchema` impls that describe the bounds (for `MaxPathConfig`, the min value of 1 also causes progenitor to generate a `NonZeroU8` in clients, which I didn't know it could do)
* remove a duplicate `MaxPathConfig` definition
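For readers unfamiliar with the pattern, here is a minimal sketch of replacing an unchecked constructor with a fallible conversion. It is not the code from this PR: the type below is a local stand-in that only borrows the name, the error type is hypothetical, and `ASSUMED_MAX_PATHS` is an invented upper bound (the thread only states that the minimum is 1).

```rust
use std::convert::TryFrom;
use std::fmt;
use std::num::NonZeroU8;

/// Hypothetical upper bound, chosen only for illustration.
/// The real limit is not stated in this thread.
pub const ASSUMED_MAX_PATHS: u8 = 32;

/// Stand-in for a bounded "max paths" setting. Storing a `NonZeroU8`
/// makes the lower bound of 1 impossible to violate after construction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaxPathConfig(NonZeroU8);

/// Hypothetical error carrying the rejected value.
#[derive(Debug, PartialEq, Eq)]
pub struct MaxPathConfigError(pub u8);

impl fmt::Display for MaxPathConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "max paths must be between 1 and {}, got {}",
            ASSUMED_MAX_PATHS, self.0
        )
    }
}

impl TryFrom<u8> for MaxPathConfig {
    type Error = MaxPathConfigError;

    /// The only way to build a value: callers must handle the error,
    /// which is what turns `into()` call sites into `try_into()`.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match NonZeroU8::new(value) {
            Some(n) if value <= ASSUMED_MAX_PATHS => Ok(Self(n)),
            _ => Err(MaxPathConfigError(value)),
        }
    }
}

impl MaxPathConfig {
    /// Return the validated value as a plain integer.
    pub fn get(self) -> u8 {
        self.0.get()
    }
}
```

With no unchecked constructor available, a call such as `MaxPathConfig::try_from(0u8)` returns an error instead of producing an invalid value. A custom `Deserialize` impl could presumably route through the same conversion, though that part is not shown here.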

nexus/external-api/src/lib.rs

ahl · 2026-02-11T00:25:41Z

nexus/external-api/src/lib.rs

    async fn networking_bgp_exported(
        rqctx: RequestContext<Self::Context>,
-    ) -> Result<HttpResponseOk<BgpExported>, HttpError>;
+    ) -> Result<HttpResponseOk<Vec<BgpExported>>, HttpError>;


Even in the case of relatively bounded Vec returns, we typically have a paginated response interface. In the past where actual pagination has been impractical, we've faked it up e.g. by always returning a ResultsPage with next_page: None.

I see several instances of that that I think we should address.

Here's an example of us doing this. Several benefits including future-proofing and client enumeration:

omicron/nexus/src/external_api/http_entrypoints.rs

Lines 4969 to 4974 in 0ee7d73

    let instance_lookup =
        nexus.instance_lookup(&opctx, instance_selector)?;
    let ips = nexus
        .instance_list_external_ips(&opctx, &instance_lookup)
        .await?;
    Ok(HttpResponseOk(ResultsPage { items: ips, next_page: None }))

Follow up issue here

Consistent pagination for networking APIs #9850
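To make the suggestion concrete, here is a rough sketch of the "fake pagination" shape using only the standard library. The `ResultsPage` struct below is a local stand-in that models just the two fields visible in the snippet above, and `single_page` is a hypothetical helper, not an existing function in omicron or dropshot.

```rust
/// Local stand-in for a paginated response body. Only the two fields
/// used in the quoted snippet are modeled here.
#[derive(Debug, PartialEq)]
pub struct ResultsPage<T> {
    /// The items on this page.
    pub items: Vec<T>,
    /// Continuation token for the next page, or `None` on the last page.
    pub next_page: Option<String>,
}

/// Hypothetical helper: wrap a complete, bounded result set as a single
/// page with no continuation token. Clients written against a paginated
/// interface see one page and then stop.
pub fn single_page<T>(items: Vec<T>) -> ResultsPage<T> {
    ResultsPage { items, next_page: None }
}
```

The point of the shape is that an endpoint can start returning real continuation tokens later without changing its response type, whereas a bare `Vec` return cannot grow into pagination without an API break.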

ahl · 2026-02-11T00:25:58Z

nexus/external-api/src/lib.rs

+    async fn networking_bgp_imported(
+        rqctx: RequestContext<Self::Context>,
+        query_params: Query<params::BgpRouteSelector>,
+    ) -> Result<HttpResponseOk<Vec<BgpImported>>, HttpError>;


Follow up issue here

Consistent pagination for networking APIs #9850

nexus/external-api/src/lib.rs

nexus/src/app/background/tasks/sync_switch_configuration.rs

iliana · 2026-02-13T21:02:55Z

clients/bootstrap-agent-client/src/lib.rs


 progenitor::generate_api!(
-    spec = "../../openapi/bootstrap-agent/bootstrap-agent-1.0.0-127591.json",
+    spec = "../../openapi/bootstrap-agent/bootstrap-agent-2.0.0-632b71.json",


Is this right? I was under the impression we needed to stay on bootstrap-agent 1.0.0 this release so that we could upgrade through this change from R17.

Please double-check, but we think this is okay:

The only API changes here are to remove calls that now live in bootstrap-agent-lockstep, because those calls were only ever made by lockstep clients (during RSS).

The type changes made to types that are kept in the bootstore (slightly different than the bootstrap API, although there's overlap), under common/src/api/internal/shared/*/v2.rs, only made wire-compatible changes, allowing us to still deserialize the old bootstore. The kinds of changes made are:

adding new fields tagged with #[serde(default)] (e.g., BgpPeerConfig::router_lifetime) - still deserializes thanks to the default tag

making required fields optional (e.g., UplinkAddressConfig::address) - still deserializes and will show up as Some(_)

changing IP types that were ipv4-only to be generic IP (e.g., RackNetworkConfig::infra_ip_{first,last}) - still deserializes because we can parse an IPv4 string as a generic IpAddr
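The third kind of change can be checked with the standard library alone. This is a small sketch, assuming the stored form of the address is its textual representation. Both function names are hypothetical and exist only for this illustration.

```rust
use std::net::{AddrParseError, IpAddr};

/// Parse an address string the way a generic-IP field would see it.
pub fn parse_peer_addr(s: &str) -> Result<IpAddr, AddrParseError> {
    s.parse::<IpAddr>()
}

/// True if a value written by an old IPv4-only field still parses as a
/// generic `IpAddr` and lands in the `V4` variant.
pub fn old_v4_value_still_parses(s: &str) -> bool {
    matches!(parse_peer_addr(s), Ok(IpAddr::V4(_)))
}
```

For example, `old_v4_value_still_parses("192.0.2.1")` is true, and the same parser also accepts new IPv6 values such as `fd00::1`. This only covers the string-parsing step. It says nothing about how the surrounding struct is encoded, so it does not replace the bootstore compatibility testing discussed in this thread.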

This is obviously all very manual and error prone, hence filing #9801, which we basically must do before any more changes need to happen to these types.

What I thought the issue was is that an older bootstrap agent server will 404 any requests from clients that are generated from the 2.0.0 spec, because the server isn't aware of that version yet.

But if that's okay, then that all seems fine.

... I completely forgot about this. I think we only tested bootstore compatibility via mupdate, which wouldn't run into problems here. 🤦

That said, I think we're okay, but please double check this too! The only endpoints left in the bootstrap API are baseboard_get() and components_get(). I don't think there are any callers of components_get(). There's one caller of baseboard_get(): other sled-agent instances to service a "sled add". This would fail mid-online-update, but it's probably okay to note that adding a sled during an update from R17 to R18 needs to wait until after all the OS updates are done?

I think that makes sense, but I will admit to not having double-checked it. I suppose if there is a show-stopper we will catch it in upgrade testing (we should definitely make sure we perform online upgrade of a racklette from 17.1 to 18).

Verified online-update from 17.2 to 18 on a racklette and it worked just fine for me.

Correction: Upgrade to this commit turned out to cause bgp config to be lost (which manifests itself during a cold boot or in the next update). The fix is in #9863.

rcgoodfellow added this to the 18 milestone Dec 29, 2025

rcgoodfellow force-pushed the ry/ipv6-all-the-things branch 5 times, most recently from c75fdb6 to b813569 Compare January 5, 2026 06:07

ipv6 all the things

1bd54d4

rcgoodfellow force-pushed the ry/ipv6-all-the-things branch from b813569 to 1bd54d4 Compare January 5, 2026 18:43

internet-diglett self-requested a review January 16, 2026 00:55

pull in bgp work

409e7b4

rcgoodfellow force-pushed the ry/ipv6-all-the-things branch from 88ec5ed to 4a5a0d4 Compare January 18, 2026 17:48

bgp unnumbered plumbing

a4fb72c

rcgoodfellow force-pushed the ry/ipv6-all-the-things branch from 4a5a0d4 to a4fb72c Compare January 18, 2026 19:15

rcgoodfellow added 9 commits January 19, 2026 06:33

bump maghemite

78c2f6a

bump dendrite

27b97ae

bump softnpu

390c231

more bumps

5a1a1a4

comment out v6 unicast for the moment .....

7e04eff

first swing at bootstrap agent lockstep api

25a161d

various fixes

34955d2

remove rack-init APIs from client-side versioned bootstrap agent api

8f19176

bring rack network config into current versioning scheme

745d1b7

rcgoodfellow force-pushed the ry/ipv6-all-the-things branch 2 times, most recently from dd2621a to d37350d Compare January 21, 2026 21:21

the great type migration continues

4d42838

rcgoodfellow force-pushed the ry/ipv6-all-the-things branch from d37350d to 4d42838 Compare January 21, 2026 23:55

rcgoodfellow added 2 commits January 22, 2026 00:28

remove long dead compat types

6225e55

the final type shuffle?

62b248c

jgallagher reviewed Feb 9, 2026

schema/crdb/dbinit.sql (outdated, resolved)

jgallagher reviewed Feb 9, 2026

common/src/api/internal/shared/rack_init/v2.rs (outdated, resolved)

internet-diglett and others added 7 commits February 10, 2026 01:15

update bgp imported / exported to support ipv6

db61b45

max_paths should be greater than zero

a43414c

use RouterLifetimeConfig in more places instead of u16

2f71173

don't filter ipv6 from mgadm bgp imported / exported requests

fb531a9

bump maghemite

165d363

provide switch location with exported prefixes

c39d68b

ahl reviewed Feb 11, 2026

jgallagher and others added 7 commits February 11, 2026 12:01

Merge remote-tracking branch 'origin/main' into ry/ipv6-all-the-things

d97bf9a

bump maghemite, review feedback

cbd3a63

fix openapi

93248d3

pull in maghemite 639

f689bb6

support v6 infra ip addresses for bootstore sync

4d35436

render IpAddr::UNSPECIFIED as 'link-local'

acd95fb

allow v6 static routing in early networking

8a54196

jgallagher reviewed Feb 13, 2026

nexus/src/app/background/tasks/sync_switch_configuration.rs (outdated, resolved)

check for unspecified addr in Option<IpX> fields

72e1f85

iliana reviewed Feb 13, 2026

internet-diglett added 2 commits February 14, 2026 01:33

Merge branch 'main' into ry/ipv6-all-the-things

d50f4d4

regenerate openapi

9913c6a

internet-diglett enabled auto-merge (squash) February 14, 2026 01:41

internet-diglett disabled auto-merge February 14, 2026 01:47

internet-diglett enabled auto-merge (squash) February 14, 2026 01:48

internet-diglett merged commit 4a456c9 into main Feb 14, 2026
19 checks passed

internet-diglett deleted the ry/ipv6-all-the-things branch February 14, 2026 03:49

11 participants
