Race conditions attaching/detaching volumes

Bug #2035911 reported by Gorka Eguileor
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Han Guangyu
Milestone: (none)

Bug Description

For Cinder volume attach and detach operations, Nova properly uses os-brick's `guard_connection` context manager to protect against race conditions between one volume's local disconnection plus the Cinder call that unmaps/unexports it, and other volumes' local connection and map/export.

This is the code:

    def detach(self, context, instance, volume_api, virt_driver,
               attachment_id=None, destroy_bdm=False):
        volume = self._get_volume(context, volume_api, self.volume_id)
        # Let OS-Brick handle high level locking that covers the local os-brick
        # detach and the Cinder call to unmap the volume. Not all volume
        # backends or hosts require locking.
        with brick_utils.guard_connection(volume):
            self._do_detach(context, instance, volume_api, virt_driver,
                            attachment_id, destroy_bdm)

    @update_db
    def attach(self, context, instance, volume_api, virt_driver,
               do_driver_attach=False, **kwargs):
        volume = self._get_volume(context, volume_api, self.volume_id)
        volume_api.check_availability_zone(context, volume,
                                           instance=instance)
        # Let OS-Brick handle high level locking that covers the call to
        # Cinder that exports & maps the volume, and for the local os-brick
        # attach. Not all volume backends or hosts require locking.
        with brick_utils.guard_connection(volume):
            self._do_attach(context, instance, volume, volume_api,
                            virt_driver, do_driver_attach)
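
For context, the reason the comments say "Not all volume backends or hosts require locking" is that `guard_connection` only takes a lock when the volume's backend shares targets between volumes. The following is a simplified sketch of that behaviour, not os-brick's actual implementation; it assumes the volume object exposes `shared_targets` and `service_uuid` fields, as Cinder volumes do:

    # Simplified illustration only -- not the real os_brick.utils.guard_connection.
    import contextlib

    from oslo_concurrency import lockutils


    @contextlib.contextmanager
    def guard_connection(volume):
        # Only backends that share targets between volumes need the lock.
        if getattr(volume, 'shared_targets', True):
            # Serialize on the backend service so one volume's local
            # attach/detach plus its Cinder map/unmap call cannot interleave
            # with another volume's connection on the same target.
            with lockutils.lock(volume.service_uuid):
                yield
        else:
            yield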

But there are many other places where Nova attaches or detaches volumes without going through those two methods (`detach` and `attach`).

One example is deleting an instance that has Cinder volumes attached; another is finishing an instance live migration.

Nova needs to always use the `guard_connection` context manager.

There are places where this will be easy to fix, such as the `_remove_volume_connection` method in nova/compute/manager.py; a sketch of the pattern follows.
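
This is a hypothetical sketch only: the real `_remove_volume_connection` has a different signature and extra branches, and it assumes `brick_utils` and `driver_block_device` are imported as in nova/virt/block_device.py. It just shows where the guard needs to sit.

    # Hypothetical sketch -- not the real _remove_volume_connection.
    def _remove_volume_connection(self, context, bdm, instance):
        driver_bdm = driver_block_device.convert_volume(bdm)
        volume = self.volume_api.get(context, bdm.volume_id)
        # Hold the guard across BOTH the local os-brick detach and the
        # Cinder call that unmaps/unexports the volume, mirroring what
        # DriverVolumeBlockDevice.detach() above already does.
        with brick_utils.guard_connection(volume):
            driver_bdm.driver_detach(context, instance, self.volume_api,
                                     self.driver)
            self.volume_api.attachment_delete(context, bdm.attachment_id)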

But there are others that look harder, such as `_shutdown_instance` in the same file: the volumes are locally detached through the `self.driver.destroy` call, and only after all the volumes (potentially from different backends) have been locally removed does it call `self.volume_api.attachment_delete` for each of them.
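
To make the difficulty concrete, here is a heavily simplified sketch of the shape of that path (not the real code; names and arguments are illustrative). Because the local detach of every volume happens inside a single `self.driver.destroy` call, there is no per-volume point where one `guard_connection` can cover both the local detach and the later `attachment_delete`:

    # Heavily simplified sketch of the problematic shape -- not the real
    # _shutdown_instance code in nova/compute/manager.py.
    def _shutdown_instance(self, context, instance, bdms, block_device_info):
        # driver.destroy() tears down the guest and locally disconnects every
        # attached volume (possibly from several backends) in one go, so no
        # per-volume guard_connection can wrap this step from here.
        self.driver.destroy(context, instance, network_info=None,
                            block_device_info=block_device_info)

        # Only afterwards is Cinder told to unmap/unexport each volume.
        # Guarding just this call protects the Cinder side, but the local
        # disconnect above already ran outside any guard, so the race window
        # is only partially closed.
        for bdm in bdms:
            volume = self.volume_api.get(context, bdm.volume_id)
            with brick_utils.guard_connection(volume):
                self.volume_api.attachment_delete(context, bdm.attachment_id)

Closing the gap completely would likely mean either detaching each volume under its own guard before the driver tears down the guest, or pushing the guard down into the driver layer; either option is more invasive than the `_remove_volume_connection` case.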

Sylvain Bauza (sylvain-bauza) wrote:

Tagged as low-hanging-fruit as new contributors could add the context manager easily first, and then ask us how to modify the other methods.

tags: added: volumes
tags: added: low-hanging-fruit
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Changed in nova:
assignee: nobody → Han Guangyu (han-guangyu)