Add sled could not complete #9713

@askfongjojo

This is on rack2, after a mupdate (to get past a failed reconfigurator-driven update) and blueprint archival. Two new sleds, in cubbies 2 and 3, were updated to the same version:

root@oxz_switch0:~# pilot host ls
CUBBY IP                        SERIAL      IMAGE
2     fe80::aa40:25ff:fe04:c96  BRM22250001 ci 5eb1337/4e5b80e 2026-01-21 22:22
3     fe80::aa40:25ff:fe04:412  BRM13250012 ci 5eb1337/4e5b80e 2026-01-21 22:22
7     fe80::aa40:25ff:fe04:6d6  BRM27230045 ci 5eb1337/4e5b80e 2026-01-21 22:22
8     fe80::aa40:25ff:fe04:3d5  BRM44220011 ci 5eb1337/4e5b80e 2026-01-21 22:22
9     fe80::aa40:25ff:fe04:357  BRM44220005 ci 5eb1337/4e5b80e 2026-01-21 22:22
10    fe80::aa40:25ff:fe04:3d4  BRM42220009 ci 5eb1337/4e5b80e 2026-01-21 22:22
11    fe80::aa40:25ff:fe04:191  BRM42220006 ci 5eb1337/4e5b80e 2026-01-21 22:22
12    fe80::aa40:25ff:fe04:393  BRM42220057 ci 5eb1337/4e5b80e 2026-01-21 22:22
13    fe80::aa40:25ff:fe04:1d1  BRM42220018 ci 063828b/10bf4ba 2025-10-09 02:10
14    fe80::aa40:25ff:fe04:195  BRM42220051 ci 5eb1337/4e5b80e 2026-01-21 22:22
16    fe80::aa40:25ff:fe04:352  BRM42220014 ci 5eb1337/4e5b80e 2026-01-21 22:22
17    fe80::aa40:25ff:fe04:192  BRM42220017 ci 5eb1337/4e5b80e 2026-01-21 22:22
21    fe80::aa40:25ff:fe04:353  BRM42220031 ci 5eb1337/4e5b80e 2026-01-21 22:22
23    fe80::aa40:25ff:fe04:395  BRM42220016 ci 5eb1337/4e5b80e 2026-01-21 22:22
25    fe80::aa40:25ff:fe04:354  BRM44220010 ci 5eb1337/4e5b80e 2026-01-21 22:22
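
For a quick check of which cubbies share an image, the listing above can be grouped by its IMAGE column. This is only a rough sketch: it assumes the column layout shown here (CUBBY, IP, SERIAL, then IMAGE as the rest of the line), the function name `cubbies_by_image` is made up for illustration, and the sample is an excerpt of four rows rather than the full table.

```python
from collections import defaultdict

# Excerpt of the `pilot host ls` output above (four rows only).
PILOT_HOST_LS = """\
CUBBY IP                        SERIAL      IMAGE
2     fe80::aa40:25ff:fe04:c96  BRM22250001 ci 5eb1337/4e5b80e 2026-01-21 22:22
3     fe80::aa40:25ff:fe04:412  BRM13250012 ci 5eb1337/4e5b80e 2026-01-21 22:22
7     fe80::aa40:25ff:fe04:6d6  BRM27230045 ci 5eb1337/4e5b80e 2026-01-21 22:22
13    fe80::aa40:25ff:fe04:1d1  BRM42220018 ci 063828b/10bf4ba 2025-10-09 02:10
"""


def cubbies_by_image(text):
    """Map each IMAGE string to the list of cubby numbers running it."""
    groups = defaultdict(list)
    for line in text.splitlines()[1:]:  # skip the header row
        # Split into CUBBY, IP, SERIAL and the remainder of the line (IMAGE).
        fields = line.split(None, 3)
        if len(fields) == 4:
            cubby, _ip, _serial, image = fields
            groups[image.strip()].append(int(cubby))
    return dict(groups)


groups = cubbies_by_image(PILOT_HOST_LS)
for image, cubbies in groups.items():
    print(f"{image}: cubbies {cubbies}")
```

In the full listing, cubby 13 is the only host on the older 063828b/10bf4ba image; cubbies 2 and 3 match the other hosts.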

Component versions before adding sled 2:

root@oxz_switch0:~# omdb nexus update-status
Count of each component type by system version:

                  |18.0.0-0.ci+git5eb13372380 
------------------+---------------------------
RoT bootloader    |15                         
RoT               |15                         
SP                |15                         
Host OS (phase 1) |12                         
Host OS (phase 2) |12                         
Zone              |149

root@oxz_switch0:~# omdb nexus sleds list-uninitialized
RACK_ID                              CUBBY SERIAL      PART        REVISION 
de608e01-b8e4-4d93-b972-a7dbed36dd22 2     BRM22250001 913-0000023 1        
de608e01-b8e4-4d93-b972-a7dbed36dd22 3     BRM13250012 913-0000023 1        
de608e01-b8e4-4d93-b972-a7dbed36dd22 13    BRM42220018 913-0000019 6 

An attempt to add the sled in cubby 2 ended with a timeout error:

root@oxz_switch0:~# omdb nexus sleds add BRM22250001 913-0000023 -w
Error: adding sled

Caused by:
    0: Communication Error: error sending request for url (http://[fd00:1122:3344:104::56]:12232/sleds/add)
    1: error sending request for url (http://[fd00:1122:3344:104::56]:12232/sleds/add)
    2: operation timed out

The sled was still added to the cluster (it is listed in omdb db sleds and no longer appears in list-uninitialized), and its component versions are set:

root@oxz_switch0:~# omdb nexus update-status
Count of each component type by system version:

                  |18.0.0-0.ci+git5eb13372380 
------------------+---------------------------
RoT bootloader    |16                         
RoT               |16                         
SP                |16                         
Host OS (phase 1) |13                         
Host OS (phase 2) |13                         
Zone              |149
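
To make the before/after comparison explicit, here is a small sketch that diffs the two update-status outputs. It assumes the "component |count" row layout shown in this report; `parse_counts` is a hypothetical helper, not part of omdb.

```python
def parse_counts(text):
    """Parse 'component |count' rows from update-status output into a dict."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.partition("|")
        value = value.strip()
        # Skip the version header and the separator row: only rows whose
        # right-hand column is a plain integer are component counts.
        if sep and value.isdigit():
            counts[name.strip()] = int(value)
    return counts


# Output captured before adding the sled in cubby 2.
BEFORE = """\
                  |18.0.0-0.ci+git5eb13372380
------------------+---------------------------
RoT bootloader    |15
RoT               |15
SP                |15
Host OS (phase 1) |12
Host OS (phase 2) |12
Zone              |149
"""

# Output captured after the add attempt timed out.
AFTER = """\
                  |18.0.0-0.ci+git5eb13372380
------------------+---------------------------
RoT bootloader    |16
RoT               |16
SP                |16
Host OS (phase 1) |13
Host OS (phase 2) |13
Zone              |149
"""

before, after = parse_counts(BEFORE), parse_counts(AFTER)
delta = {name: after[name] - before[name] for name in before}
for name, change in delta.items():
    print(f"{name}: {before[name]} -> {after[name]} ({change:+d})")
```

Every hardware component count goes up by one, while the Zone count stays at 149.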

The current blueprint list and blueprint planner status are as follows:

root@oxz_switch0:~# omdb nexus blueprints list 2>/dev/null | tail -5
      2d05edea-be41-47e3-a99f-4604c6d61a9f 06767a6b-f1c3-45ca-bb4f-0317487e7c16 2026-01-22T20:05:23.387Z 
      162afbd6-5a20-446f-abf5-fcc83a5b6b96 2d05edea-be41-47e3-a99f-4604c6d61a9f 2026-01-23T05:58:53.302Z 
      d5d2b2cd-12e8-4854-9faf-391a0db1ae9e 162afbd6-5a20-446f-abf5-fcc83a5b6b96 2026-01-23T05:59:36.847Z 
      0b924bd4-d860-4767-a99d-0703e669a6f0 d5d2b2cd-12e8-4854-9faf-391a0db1ae9e 2026-01-23T05:59:38.121Z 
* yes 94192783-0bf5-4c23-90c1-ba223c040d49 0b924bd4-d860-4767-a99d-0703e669a6f0 2026-01-23T05:59:42.778Z 

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_planner
task: "blueprint_planner"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 1774, triggered by a dependent task completing
    started at 2026-01-23T06:14:27.977Z (23s ago) and ran for 777ms
    plan unchanged from parent 94192783-0bf5-4c23-90c1-ba223c040d49
    note: 249/5000 blueprints in database
planning report:
* zone adds waiting on blockers
* zone adds and updates are blocked:
  - current target release generation (35) is lower than minimum required by blueprint (36)
* zone updates waiting on zone add blockers
* will ensure cockroachdb setting: "22.1"
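
The blocker in the planning report can be pulled out mechanically, which may help when watching for it to clear. The sketch below only parses the message text exactly as it appears above; it assumes that wording stays stable and says nothing about how the planner itself computes the check. `generation_gap` is a hypothetical helper name.

```python
import re

# Blocker line copied from the planning report above.
REPORT_LINE = (
    "- current target release generation (35) is lower than "
    "minimum required by blueprint (36)"
)

# Assumption: the message wording matches the report shown in this issue.
PATTERN = re.compile(
    r"current target release generation \((\d+)\) is lower than "
    r"minimum required by blueprint \((\d+)\)"
)


def generation_gap(report):
    """Return (current, required) if the blocker is present, else None."""
    match = PATTERN.search(report)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


gap = generation_gap(REPORT_LINE)
if gap is not None:
    current, required = gap
    print(
        f"blocked: target release generation {current} "
        f"< required {required} (behind by {required - current})"
    )
```

Here the current target release generation (35) is one behind the minimum the blueprint requires (36), and the report lists this as the reason zone adds and updates are blocked.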

The sled only has a global zone at this point and I've marked it non-provisionable for now.
