Race condition in attaching/detaching volumes when compute manager is unreachable

Bug #1180040 reported by Loganathan Parthipan
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Nikola Đipanov
Milestone: none

Bug Description

When a compute manager is offline, or if it cannot pick up messages for some reason, a race condition exists in attaching/detaching volumes.

Try to attach and then detach a volume while the compute manager is down, then bring the compute manager back online. The queued reserve_block_device_name message is now delivered and a block_device_mapping entry is created for this instance/volume regardless of the state of the volume. This results in the following issues.

1. The mountpoint is no longer usable.
2. The os-volume_attachments API will list the volume as attached to the instance.
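
A minimal sketch of the kind of guard that is missing here (the function signature and field names are assumptions for illustration, not the actual Nova code): before creating the block_device_mapping entry in response to a late reserve_block_device_name message, the handler could verify that the volume is still in a state that allows the attach to proceed.

    # Hypothetical guard illustrating the missing check; names are assumptions,
    # not the actual Nova implementation.
    class InvalidVolume(Exception):
        pass


    def reserve_block_device_name(instance_uuid, device, volume):
        """Create a BDM entry only if the volume is still being attached."""
        # 'volume' is assumed to be a dict carrying the current Cinder status,
        # e.g. 'available', 'attaching', 'in-use' or 'deleted'.
        if volume.get('status') != 'attaching':
            # The attach was cancelled (or timed out) while this message sat in
            # the queue, so creating a block_device_mapping row now would leave
            # a stale instance/volume association behind (issues 1 and 2 above).
            raise InvalidVolume('volume %s is no longer being attached'
                                % volume.get('id'))
        # ... existing logic would pick a device name and create the
        # block_device_mapping row for (instance_uuid, volume) here ...
        return device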

Steps to reproduce (This was recreated in Devstack with nova trunk 75af47a.)

1. Spawn an instance. (Mine is a multinode Devstack setup, so I spawn it on a different machine from the API node, but the race condition should be reproducible in a single-node setup too.)
2. Create a volume.
3. Stop the compute manager (n-cpu).
4. Try to attach the volume to the instance; it should fail after a while.
5. Try to detach the volume.
6. List the volumes. The volume should be in the 'available' state. Optionally, you can delete it at this point.
7. Check the database for block_device_mapping entries; there should be no reference to this volume (see the sketch after this list).
8. Start the compute manager on the node where the instance is running.
9. Check the database for block_device_mapping again; it now has a new entry associating this volume with the instance, regardless of the state of the volume.
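
A minimal helper for steps 7 and 9, assuming a Devstack-style MySQL setup; the connection URL, credentials, and placeholder volume UUID are assumptions to adjust for your environment.

    # Sketch only: query the nova database directly for BDM rows that
    # reference the volume (assumes SQLAlchemy and a MySQL driver installed).
    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://root:secret@127.0.0.1/nova')

    def bdm_entries_for_volume(volume_id):
        """Return block_device_mapping rows referencing the given volume."""
        query = text(
            'SELECT id, instance_uuid, device_name, deleted '
            'FROM block_device_mapping WHERE volume_id = :volume_id')
        with engine.connect() as conn:
            return conn.execute(query, {'volume_id': volume_id}).fetchall()

    # Step 7: expect an empty list; step 9: expect a fresh row for the instance.
    print(bdm_entries_for_volume('<volume-uuid>'))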

Tags: volumes
Revision history for this message
Michael Still (mikal) wrote :

Thanks for the detailed bug report!

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Changed in nova:
status: Triaged → In Progress
assignee: nobody → Jason Dillaman (jdillaman)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/33527

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

After looking at this a bit more closely, my first thought was that we need to keep track of volume attach requests and make sure that a detach on a volume that is not yet attached cannot succeed. Then I realized that we already keep track of volume states in Cinder, so why not use Cinder to make sure we handle this as expected.

What the previous attempt proposed was IMHO not ideal: the device name decision should ultimately be up to the virt driver, while that solution moved it (back) into the database layer.

I propose that we add two things to the volume attach code to remove this race (sketched after the list below).

1) Fail the attach if the compute node is not available.
2) Fail the detach if there is a pending attach call that has not completed (we can use the Cinder API for this by reserving the volume before we make the reserve_block_device_name RPC call).
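
A rough sketch of the proposed ordering; the service-check callable and client objects below are stand-ins assumed for illustration, not the actual Nova or Cinder interfaces.

    # Sketch of the proposed attach flow: fail fast if the compute service is
    # down, and reserve the volume in Cinder before issuing the RPC call.
    class VolumeAttachError(Exception):
        pass


    def attach_volume(compute_is_up, cinder, rpcapi, context, instance,
                      volume_id, device=None):
        # 1) Fail the attach in the API if the target compute service is not
        #    available, instead of queueing a message that may be consumed
        #    much later.
        if not compute_is_up(instance['host']):
            raise VolumeAttachError('compute host %s is not available'
                                    % instance['host'])

        # 2) Reserve the volume in Cinder *before* the RPC call.  A detach
        #    request that arrives while the attach is pending then sees the
        #    volume as 'attaching' and can be rejected until the attach
        #    completes or is rolled back.
        cinder.reserve_volume(context, volume_id)
        try:
            return rpcapi.reserve_block_device_name(
                context, instance=instance, device=device,
                volume_id=volume_id)
        except Exception:
            # Roll back the reservation if the compute node never answers.
            cinder.unreserve_volume(context, volume_id)
            raise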

Revision history for this message
Loganathan Parthipan (parthipan) wrote :

When you say 'Fail the attach' in (1) do you mean we fail-fast the API request itself?

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Hi, yes - exactly that.

I posted a patch, and again I have no idea why it didn't get picked up by LP.

https://review.openstack.org/#/c/81256/

Changed in nova:
assignee: Jason Dillaman (jdillaman) → Nikola Đipanov (ndipanov)
milestone: none → icehouse-rc1
tags: added: volumes
Revision history for this message
John Garbutt (johngarbutt) wrote :

Not sure this blocks RC1; it seems like a good fix, but it's not a regression.

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Also, after looking into this some more: this bug may be invalid after the switch to oslo.messaging (the commit that fixes it may actually have been synced over before the switch: https://github.com/openstack/oslo-incubator/commit/30a50c8a6c534f01d518eb3ce4cf0d35877d9a7f). Each message's TTL is now equal to the call timeout, so as long as the call times out, the message won't be delivered after that and we should be OK.

We can still make this a bit less racy by reserving the volumes before making any RPC calls, but in most cases everything should clean up correctly now.
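
A toy illustration of why a per-message TTL equal to the call timeout closes the window; this is a conceptual sketch only, not how oslo.messaging actually implements it (there it is an AMQP-level message expiration).

    import time


    def publish(queue, payload, timeout):
        # Stamp each message with an expiry equal to the caller's timeout.
        queue.append({'payload': payload, 'expires_at': time.time() + timeout})


    def consume(queue):
        # A consumer that comes back online drops anything whose caller has
        # already given up on the call.
        now = time.time()
        return [m['payload'] for m in queue if m['expires_at'] > now]


    queue = []
    publish(queue, 'reserve_block_device_name', timeout=60)
    # If the compute manager comes back within 60 seconds, the message is
    # consumed normally; after that, it has expired and is dropped instead of
    # creating a stale block_device_mapping entry.
    print(consume(queue))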

Revision history for this message
Loganathan Parthipan (parthipan) wrote :

I saw this issue almost 10 months ago. :) Haven't tried reproducing it recently.

Revision history for this message
Tracy Jones (tjones-i) wrote :

based on comments - removing from rc1

Changed in nova:
milestone: icehouse-rc1 → none
Changed in nova:
status: In Progress → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/692940

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/692940
