Terminating an instance while attaching a volume leads to both actions failing

Bug #1355348 reported by Andrew Laski
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Andrew Laski
Milestone: 2014.2

Bug Description

This is happening with the xenapi driver, but it's possible that this can happen with others. The sequence of events I'm witnessing is:

An attach_volume request is made and shortly after a terminate_instance request is made.

From the attach_volume request, the block device mapping has been updated and the volume has been connected to the hypervisor, but it has not yet been attached to the instance. The terminate request begins processing before the volume connection is attached to the instance, so when it detaches volumes and their connections it misses the latest one, which is still attaching. This leads to a failure when asking Cinder to clean up the volume, such as:

2014-08-06 20:30:14.324 30737 TRACE nova.compute.manager [instance: <uuid>] ClientException: DELETE on http://127.0.0.1/volumes/<uuid>/export?force=False returned '409' with 'Volume '<uuid>' is currently attached to '127.0.0.1'' (HTTP 409) (Request-ID: req-)

In turn, when attach_volume tries to attach the volume to the instance, it finds that the instance no longer exists because of the terminate request. This leaves the instance undeletable and the volume stuck.
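
The interleaving described above can be replayed with a small, deterministic model. This is only an illustrative sketch: `FakeInstance`, `attach_step_connect`, `attach_step_plug`, `terminate` and `run_interleaving` are hypothetical names, not nova or xenapi code, and the model assumes terminate only cleans up volumes that are already fully attached.

```python
class FakeInstance:
    """Hypothetical stand-in for an instance; not nova's data model."""

    def __init__(self):
        self.exists = True
        self.attached = set()          # volumes plugged into the guest
        self.host_connections = set()  # volumes connected to the hypervisor


def attach_step_connect(instance, volume_id):
    # First half of the attach: connect the volume to the hypervisor.
    instance.host_connections.add(volume_id)


def attach_step_plug(instance, volume_id):
    # Second half of the attach: plug the connected volume into the guest.
    if not instance.exists:
        raise RuntimeError("instance no longer exists")
    instance.attached.add(volume_id)


def terminate(instance):
    # Assumption: cleanup only covers volumes that are fully attached.
    cleaned = set(instance.attached)
    instance.attached.clear()
    instance.exists = False
    # Whatever is still connected to the host was missed by the cleanup.
    return instance.host_connections - cleaned


def run_interleaving():
    instance = FakeInstance()
    attach_step_connect(instance, "vol-1")
    leaked = terminate(instance)  # runs between the two attach steps
    try:
        attach_step_plug(instance, "vol-1")
        attach_error = None
    except RuntimeError as exc:
        attach_error = str(exc)
    return leaked, attach_error
```

Calling `run_interleaving()` in this model returns the volume that terminate missed together with the error raised by the second attach step, which mirrors the two failures in the report.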

Having attach_volume share the instance lock with terminate_instance should resolve this. Virt drivers may also want to try to cope with this internally and not rely on a lock.
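
As a rough illustration of the shared-lock idea, the sketch below serializes the two operations on a per-instance lock using only the Python standard library. It is not the merged nova change: `instance_lock`, the `events` list, the `mid_attach` test hook and `demo` are hypothetical names introduced here, and nova's own locking utilities are deliberately not used.

```python
import threading
from collections import defaultdict

# Hypothetical per-instance lock registry, keyed by instance UUID.
_instance_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def instance_lock(instance_uuid):
    # Guard the registry so two threads never create two locks for one UUID.
    with _registry_guard:
        return _instance_locks[instance_uuid]


def attach_volume(instance_uuid, volume_id, events, mid_attach=None):
    # Both attach steps happen while the instance lock is held.
    with instance_lock(instance_uuid):
        events.append("connect " + volume_id)
        if mid_attach is not None:
            mid_attach()  # test hook: pause between the two attach steps
        events.append("plug " + volume_id)


def terminate_instance(instance_uuid, events):
    # Terminate takes the same lock, so it waits for an in-flight attach.
    with instance_lock(instance_uuid):
        events.append("terminate " + instance_uuid)


def demo():
    events = []
    connected = threading.Event()
    proceed = threading.Event()

    def pause():
        connected.set()          # tell the main thread the attach is mid-way
        proceed.wait(timeout=5)  # hold the lock until told to continue

    t_attach = threading.Thread(
        target=attach_volume, args=("inst-1", "vol-1", events, pause))
    t_term = threading.Thread(
        target=terminate_instance, args=("inst-1", events))

    t_attach.start()
    connected.wait(timeout=5)
    t_term.start()  # requested while the attach is still in progress
    proceed.set()
    t_attach.join()
    t_term.join()
    return events
```

In `demo()`, the terminate request arrives while the attach is paused between its two steps, yet it cannot record anything until the attach releases the lock, so the volume is fully attached before cleanup starts. A lock like this only covers code paths that take it, which is presumably why the report also suggests that virt drivers cope with the situation internally.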

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/113341

Changed in nova:
assignee: nobody → Andrew Laski (alaski)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/113341
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4c4dc3a6d331426e472e2dd1e9b0513da7cb7450
Submitter: Jenkins
Branch: master

commit 4c4dc3a6d331426e472e2dd1e9b0513da7cb7450
Author: Andrew Laski <email address hidden>
Date: Mon Aug 11 14:36:30 2014 -0400

    Lock attach_volume

    There are some issues with instance and volume cleanup when the volume
    is not in a fully attached state so it will be safer to not attempt a
    terminate_instance while there are attachments in progress.

    Change-Id: I4347794e51004a881bf4ef5ee30f65ac28773e51
    Closes-Bug: #1355348

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc1 → 2014.2