Race conditions attaching/detaching volumes

Bug #2035911 reported by Gorka Eguileor
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Han Guangyu
Milestone: (none)

Bug Description

For Cinder volume attach and detach operations, Nova properly uses os-brick's `guard_connection` context manager to protect against race conditions between one volume's local disconnection plus the Cinder call that unmaps/unexports it, and other volumes' local connection and map/export.

This is the code:

    def detach(self, context, instance, volume_api, virt_driver,
               attachment_id=None, destroy_bdm=False):
        volume = self._get_volume(context, volume_api, self.volume_id)
        # Let OS-Brick handle high level locking that covers the local os-brick
        # detach and the Cinder call to unmap the volume. Not all volume
        # backends or hosts require locking.
        with brick_utils.guard_connection(volume):
            self._do_detach(context, instance, volume_api, virt_driver,
                            attachment_id, destroy_bdm)

    @update_db
    def attach(self, context, instance, volume_api, virt_driver,
               do_driver_attach=False, **kwargs):
        volume = self._get_volume(context, volume_api, self.volume_id)
        volume_api.check_availability_zone(context, volume,
                                           instance=instance)
        # Let OS-Brick handle high level locking that covers the call to
        # Cinder that exports & maps the volume, and for the local os-brick
        # attach. Not all volume backends or hosts require locking.
        with brick_utils.guard_connection(volume):
            self._do_attach(context, instance, volume, volume_api,
                            virt_driver, do_driver_attach)
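
For context, the reason the comments say "Not all volume backends or hosts require locking" is that `guard_connection` only takes a lock when the volume's backend shares targets between volumes. The following is a simplified sketch of that behaviour, not os-brick's actual implementation; it assumes the volume object exposes `shared_targets` and `service_uuid` fields, as Cinder volumes do:

    # Simplified illustration only -- not the real os_brick.utils.guard_connection.
    import contextlib

    from oslo_concurrency import lockutils


    @contextlib.contextmanager
    def guard_connection(volume):
        # Only backends that share targets between volumes need the lock.
        if getattr(volume, 'shared_targets', True):
            # Serialize on the backend service so one volume's local
            # attach/detach plus its Cinder map/unmap call cannot interleave
            # with another volume's connection on the same target.
            with lockutils.lock(volume.service_uuid):
                yield
        else:
            yield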

But there are many other places where Nova attaches or detaches volumes without going through those two methods (`detach` and `attach`).

One example is deleting an instance that has Cinder volumes attached; another is finishing an instance live migration.

Nova needs to always use the `guard_connection` context manager.

There are places where this will be easy to fix, such as the `_remove_volume_connection` method in nova/compute/manager.py; a sketch of the pattern follows.
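
This is a hypothetical sketch only: the real `_remove_volume_connection` has a different signature and extra branches, and it assumes `brick_utils` and `driver_block_device` are imported as in nova/virt/block_device.py. It just shows where the guard needs to sit.

    # Hypothetical sketch -- not the real _remove_volume_connection.
    def _remove_volume_connection(self, context, bdm, instance):
        driver_bdm = driver_block_device.convert_volume(bdm)
        volume = self.volume_api.get(context, bdm.volume_id)
        # Hold the guard across BOTH the local os-brick detach and the
        # Cinder call that unmaps/unexports the volume, mirroring what
        # DriverVolumeBlockDevice.detach() above already does.
        with brick_utils.guard_connection(volume):
            driver_bdm.driver_detach(context, instance, self.volume_api,
                                     self.driver)
            self.volume_api.attachment_delete(context, bdm.attachment_id)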

But there are others that look harder, such as `_shutdown_instance` in the same file: the volumes are locally detached through the `self.driver.destroy` call, and only after all the volumes (potentially from different backends) have been locally removed does it call `self.volume_api.attachment_delete` for each of them.
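
To make the difficulty concrete, here is a heavily simplified sketch of the shape of that path (not the real code; names and arguments are illustrative). Because the local detach of every volume happens inside a single `self.driver.destroy` call, there is no per-volume point where one `guard_connection` can cover both the local detach and the later `attachment_delete`:

    # Heavily simplified sketch of the problematic shape -- not the real
    # _shutdown_instance code in nova/compute/manager.py.
    def _shutdown_instance(self, context, instance, bdms, block_device_info):
        # driver.destroy() tears down the guest and locally disconnects every
        # attached volume (possibly from several backends) in one go, so no
        # per-volume guard_connection can wrap this step from here.
        self.driver.destroy(context, instance, network_info=None,
                            block_device_info=block_device_info)

        # Only afterwards is Cinder told to unmap/unexport each volume.
        # Guarding just this call protects the Cinder side, but the local
        # disconnect above already ran outside any guard, so the race window
        # is only partially closed.
        for bdm in bdms:
            volume = self.volume_api.get(context, bdm.volume_id)
            with brick_utils.guard_connection(volume):
                self.volume_api.attachment_delete(context, bdm.attachment_id)

Closing the gap completely would likely mean either detaching each volume under its own guard before the driver tears down the guest, or pushing the guard down into the driver layer; either option is more invasive than the `_remove_volume_connection` case.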

Sylvain Bauza (sylvain-bauza) wrote:

Tagged as low-hanging-fruit as new contributors could add the context manager easily first, and then ask us how to modify the other methods.

tags: added: volumes
tags: added: low-hanging-fruit
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Changed in nova:
assignee: nobody → Han Guangyu (han-guangyu)