nvme_driver: Drain all queues after restore#2833

Merged
alandau merged 2 commits into microsoft:main from alandau:global-drain
Feb 27, 2026
Conversation

@alandau
Contributor

@alandau alandau commented Feb 24, 2026

```
w1 | w1' c1' w2 c2 c1
```

Guest sends a write request w1. Then we do servicing. Then storvsp duplicates the request into w1', which completes at c1'. If we allow the queue to run freely (i.e., accept new guest requests), the guest can then issue another write w2, which completes at c2, and only afterwards does the original pre-save write complete at c1, overwriting the "latest" write w2 as far as the guest is concerned.

That's why we keep the queue stopped until all pre-save requests complete.

But with multiple queues, if we drain one queue and start accepting new guest requests on it while other queues are still draining, we can hit the same race.

Therefore, with this PR, all queues are drained before any new guest requests are accepted, avoiding the race.
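The coordination described above can be sketched as a shared counter of still-draining queues, decremented as each queue finishes its pre-save commands. This is a minimal illustration, not the driver's actual code; the `DrainBarrier` name and method names are assumptions (the real implementation also signals a device interrupt when the count hits zero).

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch of the cross-queue drain barrier.
struct DrainBarrier {
    // Number of queues that still have pre-save commands in flight.
    remaining: AtomicUsize,
}

impl DrainBarrier {
    fn new(draining_queues: usize) -> Arc<Self> {
        Arc::new(Self {
            remaining: AtomicUsize::new(draining_queues),
        })
    }

    /// A queue calls this once all of its own pre-save commands have
    /// completed. Returns true if it was the last draining queue, i.e.
    /// every queue may now accept new guest requests.
    fn self_drained(&self) -> bool {
        self.remaining.fetch_sub(1, Ordering::AcqRel) == 1
    }

    /// Checked by queues that drained early and are waiting for the rest.
    fn all_drained(&self) -> bool {
        self.remaining.load(Ordering::Acquire) == 0
    }
}

fn main() {
    let barrier = DrainBarrier::new(2);
    assert!(!barrier.self_drained()); // first queue done, one still draining
    assert!(!barrier.all_drained());
    assert!(barrier.self_drained()); // last queue done
    assert!(barrier.all_drained()); // new guest I/O may now be accepted
    println!("all queues drained");
}
```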

@alandau alandau marked this pull request as ready for review February 24, 2026 23:28
@alandau alandau requested a review from a team as a code owner February 24, 2026 23:28
Copilot AI review requested due to automatic review settings February 24, 2026 23:28
Contributor

Copilot AI left a comment


Pull request overview

This pull request implements a mechanism to drain all NVMe I/O queues after restore by coordinating completion of in-flight commands across multiple queues before allowing new I/O. The implementation uses a three-state machine (Draining -> SelfDrained -> AllDrained) with a shared atomic counter to synchronize when all queues have completed draining.

Changes:

  • Introduced DrainAfterRestore enum to track queue draining state during restore
  • Modified queue handler to block new I/O commands until all queues finish draining existing commands
  • Added coordination mechanism using shared atomic counter and device interrupt signaling
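The three states named in the summary can be sketched as an enum; the variant names come from the PR, but the gating method below is an illustrative assumption about how a queue handler would use them.

```rust
// Sketch of the three-state drain machine; not the actual driver code.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DrainAfterRestore {
    /// This queue still has pre-save commands in flight.
    Draining,
    /// This queue is empty, but other queues may still be draining.
    SelfDrained,
    /// Every queue has drained; new guest I/O may be accepted.
    AllDrained,
}

impl DrainAfterRestore {
    /// New guest submissions are blocked until the whole device reaches
    /// `AllDrained`, which is what closes the reordering race.
    fn accepts_new_io(self) -> bool {
        matches!(self, DrainAfterRestore::AllDrained)
    }
}

fn main() {
    assert!(!DrainAfterRestore::Draining.accepts_new_io());
    assert!(!DrainAfterRestore::SelfDrained.accepts_new_io());
    assert!(DrainAfterRestore::AllDrained.accepts_new_io());
}
```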

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

Files changed:

  • vm/devices/storage/disk_nvme/nvme_driver/src/queue_pair.rs — Adds the DrainAfterRestore state machine, modifies the queue handler event loop to support draining mode, and updates restore logic to mark empty queues as drained
  • vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs — Integrates DrainAfterRestore into queue restoration, creates a shared drain coordinator for I/O queues, and passes the appropriate drain state to admin and prototype queues

Contributor

@mattkur mattkur left a comment


Nifty. You asked for feedback on the approach. I think it looks good to me. I left a few nit comments (though I imagine the things I commented on are already on your radar). Thanks for this, Alex!

I think regression testing is "good enough" for this fix, but hope you can also find a way to prove if there is (or is not) a real bug here.

@alandau alandau requested a review from Copilot February 25, 2026 23:47
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@alandau alandau changed the title WIP: nvme_driver: Drain all queues after restore nvme_driver: Drain all queues after restore Feb 26, 2026
```rust
Request(Req),
Command(Cmd),
Completion(spec::Completion),
NoOp,
```
Contributor


nit: How about naming this DrainComplete (or something like that)?

@gurasinghMS
Contributor

Feel free to take/ignore the nit comments. Overall logic looks pretty sound to me! (Appreciate the detailed comments btw)

mattkur
mattkur previously approved these changes Feb 26, 2026
Contributor

@mattkur mattkur left a comment


I can only find nits. I think this looks good. Nice work, and a tricky area. Thanks Alex!

```rust
    .iter()
    .filter(|q| !q.queue_data.handler_data.pending_cmds.commands.is_empty())
    .count();
tracing::info!(nonempty_queues, "drain-after-restore initialization");
```
Contributor


nit, please include ?pci_id as a field.

```rust
let old_counter = counter.fetch_sub(1, Ordering::AcqRel);
if old_counter == 1 {
    signal.signal_uncached();
    tracing::info!(
```
Contributor


minor, include pci_id here (unless this is really painful to plumb; can happen in a separate PR later)

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@alandau
Contributor Author

alandau commented Feb 26, 2026

Addressed nits, thanks Matt & Guramrit!

@alandau alandau merged commit c1541ef into microsoft:main Feb 27, 2026
56 checks passed
mattkur pushed a commit to mattkur/openvmm that referenced this pull request Feb 27, 2026
(cherry picked from commit c1541ef)
mattkur added a commit that referenced this pull request Feb 27, 2026
Clean cherry pick of PR #2833

Co-authored-by: Alex Landau <alexlandau@microsoft.com>
@benhillis
Member

Backported to release/1.7.2511 in #2851

@benhillis benhillis added backported_1.7.2511 PR that has been backported to release/1.7.2511 and removed backport_1.7.2511 labels Feb 27, 2026
alandau added a commit that referenced this pull request Feb 28, 2026
In #2833, saved queues (eager and proto) were given the correct drain
state, but new queues were given the `AllDrained` state meaning they
could race before all existing queues have drained. This could happen if
IO is received during draining on a CPU (thus queue) that has never seen
IO before the save.

This PR fixes the race by giving new queues the correct drain state -
either `AllDrained` if all existing queues have drained already, or
`SelfDrained`, like proto queues (no IO pending on them) if not.
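The follow-up fix can be sketched as a small decision function: a queue created mid-drain has no pre-save I/O of its own, so it starts as `SelfDrained` rather than `AllDrained`. The function name and signature below are illustrative assumptions, not the driver's actual API.

```rust
// Hedged sketch of the follow-up fix; not the actual driver code.
#[derive(PartialEq, Eq, Debug)]
enum DrainAfterRestore {
    SelfDrained,
    AllDrained,
}

/// A new queue only starts as AllDrained if the global drain has already
/// finished; otherwise it must wait like the proto queues do.
fn state_for_new_queue(queues_still_draining: usize) -> DrainAfterRestore {
    if queues_still_draining == 0 {
        DrainAfterRestore::AllDrained
    } else {
        DrainAfterRestore::SelfDrained
    }
}

fn main() {
    assert_eq!(state_for_new_queue(0), DrainAfterRestore::AllDrained);
    assert_eq!(state_for_new_queue(3), DrainAfterRestore::SelfDrained);
}
```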
alandau added a commit that referenced this pull request Mar 2, 2026
Clean cherry pick of PR #2864

@alandau alandau deleted the global-drain branch March 2, 2026 23:41