@iximeow iximeow commented Jan 26, 2026

relevant propolis change since last bump (plus some commits that are docs/tools-only, and disabling doorbell buffers + reverting the disable):

  • nvme: CQ db_buf EventIdx should slightly lag CQ tail

this fixes an issue we ended up finding while investigating oxidecomputer/propolis#1008, where guests could submit disk operations that we were told to process and then never did.

two propolis changes since last bump:

* Add DTrace script to monitor viona activity
* do not advertise Doorbell Buffer support for now

The second is the only one that ends up in a build. While we're still
figuring out what's going on in the Propolis issue (1008), this at least
seems to keep guests from experiencing the issue. The NVMe Doorbell
Buffer feature here was merged into Propolis right after R17 along with
the other NVMe reworking, so there's not a loss of behavior here so
far as released software is concerned.

For development/dogfood/etc. there should be no issue with losing the
Doorbell Buffer feature, just as we are confident that enabling
Doorbell Buffer support as implemented is OK. It's just a
faster way (needing fewer interrupts and exits) to communicate
NVMe queue state.
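
As a rough sketch of why Doorbell Buffers save exits (this is not Propolis code; the function name is hypothetical, and the comparison is the wrapping event-index check commonly used by guest drivers for shadow doorbells): the guest writes its new doorbell value into shared memory and only performs the trapping MMIO doorbell write if the update moved past the EventIdx the device last published.

```rust
/// Hypothetical guest-side helper, assuming 16-bit doorbell values.
///
/// `old` is the previous doorbell value, `new` is the value just written to
/// the shadow doorbell buffer, and `event_idx` is the index the device asked
/// to be notified about. Returns true when the move from `old` to `new`
/// passed `event_idx`, so a real MMIO doorbell write (and VM exit) is still
/// required. Wrapping arithmetic keeps the check correct across u16 rollover.
fn need_mmio_doorbell(event_idx: u16, new: u16, old: u16) -> bool {
    new.wrapping_sub(event_idx).wrapping_sub(1) < new.wrapping_sub(old)
}
```

For example, with `event_idx = 4`, moving the doorbell from 3 to 6 passes the event index and needs the MMIO write, while moving from 5 to 6 does not, so the shadow buffer update alone is enough.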

@hawkw hawkw left a comment

Regrettable but entirely reasonable!


hawkw commented Jan 28, 2026

@iximeow now that you've figured out the root cause of propolis#1008, do we still want to move forwards with this, or do we think we'll get a fix in before R18?


iximeow commented Jan 28, 2026

I plan on having a patch for propolis#1008 up in a bit, and I kinda expect to adjust this PR to include whatever Propolis commit comes out of that. so no need to merge this bump as-is, though it wouldn't be wrong to do so


leftwo commented Jan 28, 2026

> I plan on having a patch for propolis#1008 up in a bit, and I kinda expect to adjust this PR to include whatever Propolis commit comes out of that. so no need to merge this bump as-is, though it wouldn't be wrong to do so

I think we should just wait for the #1008 fix :)
No need to burn CI cycles just for this.

@iximeow iximeow changed the title from "Bump Propolis to disable Doorbell Buffer support (for now)" to "Bump Propolis" Jan 29, 2026
@AlejandroME AlejandroME added this to the 18 milestone Jan 30, 2026

iximeow commented Jan 30, 2026

build-and-test (helios) failed, it's just taking an hour to upload the artifacts of the failure:

warning: 834/2699 tests were not run due to signal

but fundamentally it's the same thing as #9758


hawkw commented Jan 30, 2026

i would quite like to get oxidecomputer/propolis#1028 in, since it looks like #9731 is gonna make the cut for R18 and i should like to avoid scrubbing RO volumes.
