
api: Avoid race-condition in volume-attach timeout handling#593

Open
leust wants to merge 1 commit into stable/2023.2-m3 from timeout-rm-bdm

Conversation

@leust

@leust leust commented Jan 21, 2026

When nova-api calls reserve_block_device_name RPC to nova-compute and the call times out, we try to clean up by deleting the BDM entry. However, during the timeout window a second attachment request for the same volume can come in, create a valid BDM, and progress to talking to Cinder. The original timed-out request then deletes this new valid BDM, leaving the volume in an inconsistent state.

We fix this by checking if the BDM has attachment_id set before deleting it. The attachment_id field is only populated in _check_attach_and_reserve_volume(), which we only call after the reserve_block_device_name RPC succeeds. If attachment_id is set, we know the BDM belongs to a subsequent request that has already progressed past the RPC phase, so we should not delete it.

Change-Id: I7ed649a5cab7f254690f329fac285128d8cd1c92
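
In rough outline, the cleanup path after this change looks like the sketch below (simplified, with argument lists elided; see the review threads for the actual diff):

try:
    # Ask nova-compute to reserve a device name for the attachment.
    self.compute_rpcapi.reserve_block_device_name(...)
except oslo_messaging.MessagingTimeout:
    bdm = objects.BlockDeviceMapping.get_by_volume_and_instance(
        context, volume['id'], instance.uuid)
    if bdm.attachment_id:
        # attachment_id is only set once the RPC has succeeded, so
        # this BDM belongs to a later request; leave it alone.
        LOG.debug("BDM for volume %s has attachment_id set, "
                  "not deleting to avoid race-condition", volume['id'])
    else:
        bdm.destroy()
    raise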

@leust leust marked this pull request as ready for review January 21, 2026 15:22

@joker-at-work joker-at-work left a comment


Looks good, just 2 minor things.

Comment thread nova/tests/unit/compute/test_api.py Outdated
Comment on lines +594 to +596
self.assertRaises(oslo_exceptions.MessagingTimeout,
self.compute_api.attach_volume,
self.context, instance, volume['id'])

Indentation is off by 2

Comment thread nova/compute/api.py Outdated
objects.BlockDeviceMapping.get_by_volume_and_instance(
context, volume['id'], instance.uuid)
if bdm.attachment_id:
LOG.warning(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this needs to be a warning? I think info would be enough; debug would be fine, too, imho.

Could we start the log string on this same line, like the LOG.debug below after the bdm.destroy()?
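
For example (illustrative only, matching the style of the debug message further down):

LOG.info("BDM for volume %s has attachment_id set, "
         "not deleting to avoid race-condition", volume['id'])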


@fwiesel fwiesel left a comment


Correct me if I am wrong, but my reading of the code is that there is no real uniqueness constraint on the block-device-mapping (the uuid is randomly generated), which allows us to create multiple of them for the same instance-volume pair.

If that is the case, it doesn't quite solve the issue.

I'd suggest passing a uuid to _create_volume_bdm so we know exactly which BDM we want to delete, and deleting that one (see the sketch after this comment).

Having a block-device-mapping without an attachment_id is an expected intermediate state, so the second thread can also be in that state at the point in time we handle the exception.

We need to ensure that it is really our block-device-mapping we clean up. The query simply gets the first block-device-mapping by calling get_by_volume_and_instance; by all accounts that could be any of them, such as the one from the second thread.
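
Something along those lines (a hypothetical sketch; the uuid kwarg on _create_volume_bdm does not exist today and would have to be added, and argument lists are elided):

import uuid

bdm_uuid = str(uuid.uuid4())
# Hypothetical extra kwarg so the BDM is created with a uuid we know.
self._create_volume_bdm(..., uuid=bdm_uuid)
try:
    self.compute_rpcapi.reserve_block_device_name(...)
except oslo_messaging.MessagingTimeout:
    # Look up our own BDM by the uuid we generated; a BDM created by a
    # concurrent request has a different uuid and stays untouched.
    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)
    for bdm in bdms:
        if bdm.uuid == bdm_uuid:
            bdm.destroy()
            break
    raise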

Comment thread nova/compute/api.py
Comment on lines +5154 to +5162
bdm = \
objects.BlockDeviceMapping.get_by_volume_and_instance(
context, volume['id'], instance.uuid)
if bdm.attachment_id:
LOG.debug("BDM for volume %s has attachment_id set, "
"not deleting to avoid race-condition",
volume['id'])
else:
bdm.destroy()

@fwiesel fwiesel Apr 16, 2026


Assuming there is just one block-device-mapping per volume-and-instance (which I think is not enforced), isn't there still a race here?

I mean, I get the block-device-mapping, then the other thread saves the new version with attachment_id, and then I delete it because my copy doesn't have the attachment_id.
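
One way to close that window would be an atomic conditional delete at the DB layer instead of check-then-destroy on a possibly stale object copy. A hypothetical sketch (no such helper exists in nova today; import paths vary by release, and the real destroy path also handles soft-delete bookkeeping):

from nova.db.main import api as db_api
from nova.db.main import models

@db_api.pick_context_manager_writer
def bdm_destroy_if_unattached(context, bdm_uuid):
    # The attachment_id check rides in the WHERE clause, so check and
    # delete happen in a single statement; if a concurrent request sets
    # attachment_id first, the row count comes back as 0 and nothing
    # is deleted.
    return context.session.query(models.BlockDeviceMapping).\
        filter_by(uuid=bdm_uuid, attachment_id=None).\
        soft_delete(synchronize_session=False)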
