After a rack reboot (power outage on Madrid), dpd on sled 16's switch zone comes up but never finishes initialization: the dropshot API (:12224) is never bound, so swadm gets connection-refused and BGP sessions never establish. After svcadm restart, dpd gets further into SDE bring-up and then hits a bf-sde assertion in pipe_mgr_drv_completion_cb ("Unhandled FIFO 36"), and the API still never binds. The peer switch (oxz_switch0) came back from the same reboot healthy.
Symptom 1: initial wedge (post-reboot, before any restart)
- svcs shows svc:/oxide/dendrite:default online, dpd PID alive.
- No listener on :12224 (netstat shows dpd owns only AF_UNIX sockets + the SMF log fd).
- All tokio runtime workers parked
- SMF log /var/svc/log/oxide-dendrite:default.log was 0 bytes
- swadm fails with Connection refused (os error 146) on localhost:12224.
- All BGP sessions on this switch stuck in Connect/Peer ASN: None
/opt/oxide/dendrite/bin/swadm switch-port transceiver monitors qsfp15
Error: failed to get transceiver monitors
Caused by:
0: Communication Error: error sending request for url (http://localhost:12224/ports/qsfp15/transceiver/monitors)
1: error sending request for url (http://localhost:12224/ports/qsfp15/transceiver/monitors)
2: client error (Connect)
3: tcp connect error
4: Connection refused (os error 146)
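To re-check whether the API has bound without going through swadm, a plain TCP connect is enough. The following is a minimal sketch, not part of dendrite; the function name `api_is_listening` and the 2-second timeout are assumptions chosen for illustration, and only the host and port come from the output above.

```python
import socket


def api_is_listening(host: str = "localhost", port: int = 12224,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connect to host:port succeeds.

    Hypothetical helper: a refused connection, a timeout, or any other
    socket error is reported as False.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and timeouts.
        return False


if __name__ == "__main__":
    print("dpd API listening:", api_is_listening())
```

Run inside the switch zone, this should report False for as long as dpd is in the state described here.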
Symptom 2: after svcadm restart svc:/oxide/dendrite:default
dpd now logs and progresses through SDE bring-up, but then:
Unhandled FIFO 36/opType 0 MsgId 0x0 at pipe_mgr_drv_completion_cb:3983.
ASSERTION FAILED: "0" (0) from pipe_mgr_drv_completion_cb:3984
The :12224 API still never binds after the restart.
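To check whether the same assertion shows up in other dpd logs (for example on other sleds or from earlier restarts), the message can be matched with a regular expression. This is a sketch only: the pattern is derived from the single log line quoted above, so the field layout is an assumption, and `find_unhandled_fifo` is a hypothetical helper name.

```python
import re

# Pattern assumed from the one line seen in this report:
# "Unhandled FIFO 36/opType 0 MsgId 0x0 at pipe_mgr_drv_completion_cb:3983."
UNHANDLED_FIFO = re.compile(
    r"Unhandled FIFO (?P<fifo>\d+)/opType (?P<op_type>\d+) "
    r"MsgId (?P<msg_id>0x[0-9a-fA-F]+) at (?P<func>\w+):(?P<line>\d+)"
)


def find_unhandled_fifo(log_text: str) -> list[dict]:
    """Return one dict per 'Unhandled FIFO' message found in log_text."""
    hits = []
    for m in UNHANDLED_FIFO.finditer(log_text):
        hits.append({
            "fifo": int(m.group("fifo")),
            "op_type": int(m.group("op_type")),
            "msg_id": m.group("msg_id"),
            "func": m.group("func"),
            "line": int(m.group("line")),
        })
    return hits


if __name__ == "__main__":
    import sys
    # Usage: python scan.py <path-to-log>
    with open(sys.argv[1], errors="replace") as f:
        for hit in find_unhandled_fifo(f.read()):
            print(hit)
```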
pstack summary:
- bf-sde threads (bf_dma, bf_interrupt, bf_port_fsm, bf_switchd_process_async_*) alive in their normal poll/usleep loops.
- Every tokio-runtime-worker parked in park_condvar.
- dpd holds no TCP listeners; only AF_UNIX sockets + the log fd.
(full pstack + log attached)
pstack_madrid_16_dpd.txt