Conversation
c75fdb6 to
b813569
Compare
b813569 to
1bd54d4
Compare
|
Things look good to me so far. I'm not able to replicate the CI build-and-test failures on my local workstation so those might be transient failures. Looks like sled-agent is failing here on the deploy task: |
|
Local deployment is working so it looks like the deploy task will work once we pull the new xde / update the illumos image in ci |
88ec5ed to
4a5a0d4
Compare
4a5a0d4 to
a4fb72c
Compare
dd2621a to
d37350d
Compare
d37350d to
4d42838
Compare
Several small, related changes to `MaxPathConfig` and `RouterLifetimeConfig`: * remove `new_unchecked()` (required changing some `into()`s into `try_into()`s, but I think this is quite a bit safer) * add custom `Deserialize` impls that validate bounds * add custom `JsonSchema` impls that describe the bounds (for `MaxPathConfig`, the min value of 1 also causes progenitor to generate a `NonZeroU8` in clients, which I didn't know it could do) * remove a duplicate `MaxPathConfig` definition
| async fn networking_bgp_exported( | ||
| rqctx: RequestContext<Self::Context>, | ||
| ) -> Result<HttpResponseOk<BgpExported>, HttpError>; | ||
| ) -> Result<HttpResponseOk<Vec<BgpExported>>, HttpError>; |
There was a problem hiding this comment.
Even in the case of relatively bounded Vec returns, we typically have a paginated response interface. In the past where actual pagination has been impractical, we've faked it up e.g. by always returning a ResultsPage with next_page: None.
I see several instances of that that I think we should address.
There was a problem hiding this comment.
Here's an example of us doing this. Several benefits including future-proofing and client enumeration:
omicron/nexus/src/external_api/http_entrypoints.rs
Lines 4969 to 4974 in 0ee7d73
There was a problem hiding this comment.
Follow up issue here
| async fn networking_bgp_imported( | ||
| rqctx: RequestContext<Self::Context>, | ||
| query_params: Query<params::BgpRouteSelector>, | ||
| ) -> Result<HttpResponseOk<Vec<BgpImported>>, HttpError>; |
There was a problem hiding this comment.
Follow up issue here
| progenitor::generate_api!( | ||
| spec = "../../openapi/bootstrap-agent/bootstrap-agent-1.0.0-127591.json", | ||
| spec = "../../openapi/bootstrap-agent/bootstrap-agent-2.0.0-632b71.json", |
There was a problem hiding this comment.
Is this right? I was under the impression we needed to stay on bootstrap-agent 1.0.0 this release so that we could upgrade through this change from R17.
There was a problem hiding this comment.
Please double-check, but we think this is okay:
- The only API changes here are to remove calls that now live in
bootstrap-agent-lockstep, because those calls were only ever made by lockstep clients (during RSS). - The type changes made to types that are kept in the bootstore (slightly different than the bootstrap API, although there's overlap), under
common/src/api/internal/shared/*/v2.rs, only made wire-compatible changes, allowing us to still deserialize the old bootstore. The kinds of changes made are:- adding new fields tagged with
#[serde(default)](e.g.,BgpPeerConfig::router_lifetime) - still deserializes thanks to the default tag - making required fields optional (e.g.,
UplinkAddressConfig::address) - still deserializes and will show up asSome(_) - changing IP types that were ipv4-only to be generic IP (e.g.,
RackNetworkConfig::infra_ip_{first,last}) - still deserializes because we can parse an IPv4 string as a generic IpAddr
- adding new fields tagged with
This is obviously all very manual and error prone, hence filing #9801, which we basically must do before any more changes need to happen to these types.
There was a problem hiding this comment.
What I thought the issue was is that older bootstrap agent server will 404 any responses from clients that are generated from the 2.0.0 spec, because the server isn't aware of that version yet.
But if that's okay, then that all seems fine.
There was a problem hiding this comment.
... I completely forgot about this. I think we only tested bootstore compatibility via mupdate, which wouldn't run into problems here. 🤦
That said, I think we're okay, but please double check this too! The only endpoints left in the bootstrap API are baseboard_get() and components_get(). I don't think there are any callers of components_get(). There's one caller of baseboard_get(): other sled-agent instances to service a "sled add". This would fail mid-online-update, but it's probably okay to note that adding a sled during an update from R17 to R18 needs to wait until after all the OS updates are done?
There was a problem hiding this comment.
I think that makes sense, but I will admit to not having double-checked it. I suppose if there is a show-stopper we will catch it in upgrade testing (we should definitely make sure we perform online upgrade of a racklette from 17.1 to 18).
There was a problem hiding this comment.
Verified online-update from 17.2 to 18 on a racklet and it worked just fine for me.
There was a problem hiding this comment.
Correction: Upgrade to this commit turned out to cause bgp config to be lost (which manifests itself during a cold boot or in the next update). The fix is in #9863.
This PR pulls in various IPv6 work. The biggest code changes revolve around integrating new Maghemite APIs around IPv6 peers and unnumbered peers. This has meant changing data structures in the rack initialize API.
Before this PR, the rack initialize API was in the client-side versioned bootstrap API. This makes it impossible to change. However, it was observed that the only client of the rack initialize API is wicketd, essentially making the three API endpoints under
rack-initailizelockstep. With that in mind, the rack initialize endpoints have been factored out as a lockstep API, and a new version of the bootstrap client side API has been created that deprecates use of rack-initialize.Another tricky aspect of changing these data structures is that they are in the boot store. In particular we are changing the BGP peer member from an
Ipv4Addrto anIpAddr. Thisshouldbe a bootstore backwards compatible change, but testing is required to ensure it is.Remaining work items:
Functional milestones:
Depends on