OpenStack Compute (nova)

Terminating an instance while attaching a volume leads to both actions failing

Bug #1355348 reported by Andrew Laski on 2014-08-11

This bug affects 1 person

Affects                   Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Andrew Laski  OpenStack Compute (nova) 2014.2 "juno"

Bug Description

This is happening with the xenapi driver, but it may also affect other drivers. The sequence of events I'm witnessing is:

An attach_volume request is made and, shortly afterwards, a terminate_instance request is made for the same instance.

By this point the attach_volume request has updated the block device mapping and connected the volume to the hypervisor, but has not yet attached the volume to the instance. The terminate request begins processing before that attach completes, so when it detaches volumes and their connections it misses the latest one, which is still attaching. This leads to a failure when asking Cinder to clean up the volume, such as:

2014-08-06 20:30:14.324 30737 TRACE nova.compute.manager [instance: <uuid>] ClientException: DELETE on http://127.0.0.1/volumes/<uuid>/export?force=False returned '409' with 'Volume '<uuid>' is currently attached to '127.0.0.1'' (HTTP 409) (Request-ID: req-)

In turn, when attach_volume tries to attach the volume to the instance, it finds that the instance no longer exists because of the terminate request. This leaves the instance undeletable and the volume stuck.

Having attach_volume share the instance lock with terminate_instance should resolve this. Virt drivers may also want to cope with this internally rather than rely on a lock.
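
To illustrate the idea of a shared instance lock, here is a minimal, self-contained sketch. It is not nova's code: ToyCompute, instance_lock, and run_demo are hypothetical names, the state is kept in memory, and a plain threading.Lock stands in for whatever locking helper the compute manager actually uses. The short sleep marks the window between "connected to the hypervisor" and "attached to the instance" described above.

```python
import threading
import time
from collections import defaultdict

# Hypothetical per-instance lock registry (an assumption for this sketch,
# not nova's actual locking mechanism).
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def instance_lock(instance_uuid):
    """Return the single lock shared by all operations on one instance."""
    with _locks_guard:
        return _locks[instance_uuid]


class ToyCompute:
    """Toy stand-in for a compute manager; tracks state in memory only."""

    def __init__(self, instance_uuid):
        self.instances = {instance_uuid}
        self.connected = defaultdict(set)  # instance -> connected volumes
        self.events = []

    def attach_volume(self, instance_uuid, volume_id, connected_evt=None):
        with instance_lock(instance_uuid):
            if instance_uuid not in self.instances:
                self.events.append("attach: instance gone")
                return False
            # Step 1: volume is connected to the hypervisor.
            self.connected[instance_uuid].add(volume_id)
            self.events.append("attach: connected " + volume_id)
            if connected_evt is not None:
                connected_evt.set()
            # Window in which an unlocked terminate could previously run.
            time.sleep(0.05)
            # Step 2: volume is attached to the instance.
            self.events.append("attach: attached " + volume_id)
            return True

    def terminate_instance(self, instance_uuid):
        # Same lock as attach_volume, so this waits for any attach in progress.
        with instance_lock(instance_uuid):
            for volume_id in sorted(self.connected.pop(instance_uuid, set())):
                self.events.append("terminate: detached " + volume_id)
            self.instances.discard(instance_uuid)
            self.events.append("terminate: destroyed " + instance_uuid)


def run_demo():
    compute = ToyCompute("inst-1")
    connected = threading.Event()
    attacher = threading.Thread(
        target=compute.attach_volume, args=("inst-1", "vol-1", connected)
    )
    attacher.start()
    connected.wait()  # terminate arrives while the attach is still in progress
    compute.terminate_instance("inst-1")
    attacher.join()
    return compute.events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Because both methods take the same per-instance lock, the terminate call in this sketch blocks until the attach has finished, and then sees and detaches the new volume instead of missing it. The trade-off is that a terminate request waits behind a slow attach, which is why a driver may still want to handle the partially attached case itself.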

OpenStack Infra (hudson-openstack) wrote on 2014-08-11: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/113341

Changed in nova:
assignee:	nobody → Andrew Laski (alaski)
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-09-16: Fix merged to nova (master)

Reviewed: https://review.openstack.org/113341
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4c4dc3a6d331426e472e2dd1e9b0513da7cb7450
Submitter: Jenkins
Branch: master

commit 4c4dc3a6d331426e472e2dd1e9b0513da7cb7450
Author: Andrew Laski <email address hidden>
Date: Mon Aug 11 14:36:30 2014 -0400

Lock attach_volume

    There are some issues with instance and volume cleanup when the volume
    is not in a fully attached state so it will be safer to not attempt a
    terminate_instance while there are attachments in progress.

Change-Id: I4347794e51004a881bf4ef5ee30f65ac28773e51
Closes-Bug: #1355348

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2014-10-01

Changed in nova:
milestone:	none → juno-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-10-16

Changed in nova:
milestone:	juno-rc1 → 2014.2
