Unnecessary BDM entry is created if the same volume is attached twice to an instance

Bug #1427060 reported by Ankit Agrawal on 2015-03-02
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: Medium
Assigned to: Rikimaru Honjo

Bug Description

When we try to attach an already attached volume to an instance, Nova raises an InvalidVolume exception but still creates an entry in the block_device_mapping table with the deleted flag set to true. Ideally, when attaching a volume fails, no database entry should be created.

Steps to reproduce:

1. Create an instance named test_vm_1.
2. Create a volume named test_volume_1.
3. Verify that the instance is in the active state and the volume is in the available state.
4. Attach the volume using the command below:
    $ nova volume-attach <instance_id> <volume_id>
5. Confirm that the volume is in 'in-use' status using the command below:
    $ cinder list
6. Execute the volume-attach command again with the same volume_id:
    $ nova volume-attach <instance_id> <volume_id>

After executing step 6, an "Invalid volume" exception is raised and the already attached volume can still be used normally, which is correct. However, if you check the block_device_mapping table with the SQL query below, you will find an additional BDM entry that should never have been created.

select * from block_device_mapping where instance_uuid='ee94830b-5d39-42a7-b8c2-6175bb77563a';
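
The reproduction can also be scripted end to end. A minimal sketch, assuming a legacy python-novaclient session and direct MySQL access to the nova database (the credentials, hostnames and pymysql dependency are illustrative assumptions, not part of this report):

# Reproduce the duplicate-BDM bug, then count leaked rows.
import pymysql
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'password', 'demo',
                          'http://controller:5000/v2.0')

instance_id = 'ee94830b-5d39-42a7-b8c2-6175bb77563a'
volume_id = '<volume_id>'  # left elided, as in the report

# First attach succeeds and the volume goes 'in-use'.
nova.volumes.create_server_volume(instance_id, volume_id)

# A second attach of the same volume raises "Invalid volume"...
try:
    nova.volumes.create_server_volume(instance_id, volume_id)
except Exception as exc:
    print('second attach failed as expected: %s' % exc)

# ...but a soft-deleted BDM row is still left behind.
db = pymysql.connect(host='controller', user='nova',
                     password='password', db='nova')
with db.cursor() as cur:
    cur.execute("select count(*) from block_device_mapping "
                "where instance_uuid = %s and deleted != 0",
                (instance_id,))
    print('leaked deleted BDM rows: %d' % cur.fetchone()[0])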

jichenjc (jichenjc) wrote :

If we take a look at the following code, lines 3023 and 3024 can be moved before the reserve_block_device_name call, because they only perform a status check. I will try to submit a patch and see whether there are any concurrency traps lurking there.

3007     def _attach_volume(self, context, instance, volume_id, device,
3008                        disk_bus, device_type):
3009         """Attach an existing volume to an existing instance.
3010
3011         This method is separated to make it possible for cells version
3012         to override it.
3013         """
3014         # NOTE(vish): This is done on the compute host because we want
3015         #             to avoid a race where two devices are requested at
3016         #             the same time. When db access is removed from
3017         #             compute, the bdm will be created here and we will
3018         #             have to make sure that they are assigned atomically.
3019         volume_bdm = self.compute_rpcapi.reserve_block_device_name(
3020             context, instance, device, volume_id, disk_bus=disk_bus,
3021             device_type=device_type)
3022         try:
3023             volume = self.volume_api.get(context, volume_id)
3024             self.volume_api.check_attach(context, volume, instance=instance)
3025             self.volume_api.reserve_volume(context, volume_id)
3026             self.compute_rpcapi.attach_volume(context, instance=instance,
3027                 volume_id=volume_id, mountpoint=device, bdm=volume_bdm)
3028         except Exception:
3029             with excutils.save_and_reraise_exception():
3030                 volume_bdm.destroy()
3031
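
A minimal sketch of the reordering jichenjc describes, with the status check from lines 3023-3024 hoisted above the BDM reservation so that an already-attached volume fails before any block_device_mapping row is written (names mirror the snippet above; this is an illustration, not the proposed patch itself):

def _attach_volume(self, context, instance, volume_id, device,
                   disk_bus, device_type):
    # Check the volume first: an already-attached volume raises
    # InvalidVolume here, before any BDM row exists.
    volume = self.volume_api.get(context, volume_id)
    self.volume_api.check_attach(context, volume, instance=instance)

    # Only now create the block_device_mapping entry.
    volume_bdm = self.compute_rpcapi.reserve_block_device_name(
        context, instance, device, volume_id, disk_bus=disk_bus,
        device_type=device_type)
    try:
        self.volume_api.reserve_volume(context, volume_id)
        self.compute_rpcapi.attach_volume(context, instance=instance,
            volume_id=volume_id, mountpoint=device, bdm=volume_bdm)
    except Exception:
        with excutils.save_and_reraise_exception():
            volume_bdm.destroy()

Note the window this leaves between check_attach and reserve_volume, in which a concurrent request could still attach the volume; that is the kind of concurrency trap jichenjc mentions, and the patch was later abandoned.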

Changed in nova:
status: New → Confirmed
assignee: nobody → jichenjc (jichenjc)

Fix proposed to branch: master
Review: https://review.openstack.org/163177

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
importance: Undecided → Low
melanie witt (melwitt) on 2015-04-20
tags: added: volumes
removed: ntt

Change abandoned by jichenjc (<email address hidden>) on branch: master
Review: https://review.openstack.org/163177
Reason: it seems an old bug exists and there is no better way for now.

Changed in nova:
assignee: jichenjc (jichenjc) → nobody
status: In Progress → Confirmed
Nikola Đipanov (ndipanov) wrote :

This behavior can have pretty bad consequences if we fail to delete the created BDM.

In normal operation, once the BDM record is created by reserving the device name on the compute host, we attempt to mark the volume as 'attaching' in Cinder. That call fails if the volume is already attached, and as part of the cleanup we delete the BDM record we just created.

If something goes wrong with the reply message, however (a network problem, or the API worker that accepted the attach request going away in the meantime, are some of the failures I can think of), the cleanup code never runs and we are stuck with an invalid BDM that has to be cleaned up manually. This is obviously not ideal, but it is made worse by the fact that when looking for a BDM record we commonly select records only by instance and volume IDs, which are not unique in this case; Nova may then pick the wrong record by chance, causing operations like detach and migrate to fail until the database is cleaned up manually.
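
A sketch of the lookup hazard described above: once a stale row shares its (instance_uuid, volume_id) pair with the real attachment, any query keyed on that pair can return either row (the in-memory list below stands in for the block_device_mapping table; nova's real lookup goes through the BlockDeviceMapping object):

# Two rows: one real attachment and one stale reservation that the
# failed cleanup never destroyed.
rows = [
    # (id, instance_uuid, volume_id, device_name)
    (101, 'ee94830b-...', 'vol-1', '/dev/vdb'),  # real attachment
    (102, 'ee94830b-...', 'vol-1', None),        # stale reservation
]

def get_by_volume_and_instance(volume_id, instance_uuid):
    matches = [r for r in rows
               if r[1] == instance_uuid and r[2] == volume_id]
    # Effectively a "LIMIT 1": which row wins is an accident of
    # ordering, so detach or migrate may act on the stale row.
    return matches[0]

print(get_by_volume_and_instance('vol-1', 'ee94830b-...'))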

Fix proposed to branch: master
Review: https://review.openstack.org/290793

Changed in nova:
assignee: nobody → Nikola Đipanov (ndipanov)
status: Confirmed → In Progress
Matt Riedemann (mriedem) wrote :

Per comment 4, we had some patches from Dan Smith at one point to create a uuid column in the block_device_mappings table and then use it in the objects, so we could avoid issues with bdm_update_or_create in the DB API. Those never landed because they were written for a latent boot-from-volume race in cells v1, and we decided not to fix cells v1, but they could still be useful for situations like this.

Changed in nova:
importance: Low → Medium
Matt Riedemann (mriedem) wrote :

This is Dan's BDM object change to use a uuid:

https://review.openstack.org/#/c/242603/
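
A hedged sketch of what a per-row uuid buys (the names below are assumptions based on the linked review, not the merged schema): cleanup can target the exact row it created, instead of matching on the non-unique (instance_uuid, volume_id) pair.

import uuid

class BlockDeviceMapping(object):
    def __init__(self, instance_uuid, volume_id):
        # A stable identity for this row, independent of the
        # instance/volume pair it may share with duplicate rows.
        self.uuid = str(uuid.uuid4())
        self.instance_uuid = instance_uuid
        self.volume_id = volume_id

    def destroy(self):
        # DELETE FROM block_device_mapping WHERE uuid = :uuid
        # hits exactly one row, even when duplicates exist.
        print('destroying bdm %s' % self.uuid)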

Andrey Volkov (avolkov) wrote :

https://review.openstack.org/#/c/290793 has been updated according to changes in the volume API (a reserve call is added).
The order of creating the BDM and reserving the volume is reversed, on the assumption that a remote HTTP call can fail more often than a DB request.
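
A sketch of the order described in the updated review (reconstructed from the comment above, not from the patch itself): the Cinder reservation, the call judged more likely to fail, happens before the BDM row is written, and the cleanup path undoes whichever side succeeded:

def _attach_volume(self, context, instance, volume_id, device,
                   disk_bus, device_type):
    volume = self.volume_api.get(context, volume_id)
    self.volume_api.check_attach(context, volume, instance=instance)
    # Remote HTTP call first: if Cinder rejects the reservation,
    # no block_device_mapping row has been created yet.
    self.volume_api.reserve_volume(context, volume_id)
    volume_bdm = None
    try:
        volume_bdm = self.compute_rpcapi.reserve_block_device_name(
            context, instance, device, volume_id, disk_bus=disk_bus,
            device_type=device_type)
        self.compute_rpcapi.attach_volume(context, instance=instance,
            volume_id=volume_id, mountpoint=device, bdm=volume_bdm)
    except Exception:
        with excutils.save_and_reraise_exception():
            if volume_bdm is not None:
                volume_bdm.destroy()
            # Release the Cinder reservation taken above.
            self.volume_api.unreserve_volume(context, volume_id)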

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → Lee Yarwood (lyarwood)

Hi Lee Yarwood, any updates on this bug? If not, I would like to work on it.

Lee Yarwood (lyarwood) wrote :

I'm going to revisit the following stack of patches in Pike and hopefully close this out for good:

https://review.openstack.org/#/q/topic:bug/1427060+status:open

Please feel free to review and rebase these if you have time, but I'd like to retain ownership of the bug as a reminder to revisit this when the next cycle opens up.

Change abandoned by Zhang Hua (<email address hidden>) on branch: master
Review: https://review.openstack.org/453451
Reason: sorry, this patch can't stop a new dirty BDM record from being created when 'volume_bdm.destroy()' itself happens to fail - https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3715 - so abandoning it

Changed in nova:
assignee: Lee Yarwood (lyarwood) → Rikimaru Honjo (honjo-rikimaru-c6)