nvme_driver: Set correct drain state for new queues (#2864)#2867
Merged
alandau merged 1 commit intomicrosoft:release/1.7.2511from Mar 2, 2026
Merged
Conversation
In microsoft#2833, saved queues (eager and proto) were given the correct drain state, but new queues were given the `AllDrained` state meaning they could race before all existing queues have drained. This could happen if IO is received during draining on a CPU (thus queue) that has never seen IO before the save. This PR fixes the race by giving new queues the correct drain state - either `AllDrained` if all existing queues have drained already, or `SelfDrained`, like proto queues (no IO pending on them) if not. (cherry picked from commit 831eb3d)
mattkur
approved these changes
Mar 2, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a restore-time race in the NVMe driver’s “drain-after-restore” barrier by ensuring newly created I/O queues (created after restore, not from saved state) receive an appropriate drain state instead of incorrectly starting in AllDrained.
Changes:
- Add
DrainAfterRestoreBuilder::{is_drain_complete, new_for_new_queue}to pick the correct drain state for newly created queues during an active drain. - Thread an explicit
DrainAfterRestoreintoQueuePair::newso new queues don’t implicitly default toAllDrained. - Store a shared
DrainAfterRestoreBuilderin the worker during restore so new queues participate in the drain barrier until completion.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| vm/devices/storage/disk_nvme/nvme_driver/src/queue_pair.rs | Adds builder helpers to determine drain completion and compute the correct drain state for brand-new queues; updates QueuePair::new to accept an explicit drain state. |
| vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs | Persists the drain builder across restore so newly created queues can block on draining; uses new_for_new_queue() when creating new I/O queues. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Clean cherry pick of PR #2864
In #2833, saved queues (eager and proto) were given the correct drain state, but new queues were given the
AllDrainedstate meaning they could race before all existing queues have drained. This could happen if IO is received during draining on a CPU (thus queue) that has never seen IO before the save.This PR fixes the race by giving new queues the correct drain state - either
AllDrainedif all existing queues have drained already, orSelfDrained, like proto queues (no IO pending on them) if not.