Errored instance can't be deleted if volume deleted first

Bug #1222979 reported by Ed Bak
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Ed Bak
Havana                    Fix Released  Undecided   Unassigned

Bug Description

1. Create a bootable volume: nova volume-create --image-id <image_id> 10
2. Boot a VM from the volume created in step 1: nova boot --flavor 1 --image <image_id> --block-device-mapping vda=<vol_id>:::0 instance1

If the instance fails to spawn in step 2, it ends up in an ERROR state and the volume goes back to available. The hard part is creating a situation in which step 2 fails. One way is to create enough quantum ports to exceed your port quota before attempting to spawn the instance.

3. Delete the volume.
4. Attempt to delete the instance. driver.destroy throws an exception because the volume is not found, but the exception is not ignored, so the instance can never be deleted. Exceptions from _cleanup_volumes are already ignored for this same reason. I think another exception handler needs to be added to also ignore VolumeNotFound from driver.destroy.
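
The handler proposed in step 4 could look roughly like the sketch below. It is not nova code: VolumeNotFound, FakeDriver and shutdown_instance are hypothetical stand-ins defined here only to show the pattern of catching one specific exception around destroy and letting the delete continue.

```python
import logging

LOG = logging.getLogger(__name__)


class VolumeNotFound(Exception):
    """Hypothetical stand-in for nova's VolumeNotFound exception."""


class FakeDriver:
    """Hypothetical driver whose destroy() fails because the volume is gone."""

    def destroy(self, instance):
        raise VolumeNotFound("volume for %s no longer exists" % instance)


def shutdown_instance(driver, instance):
    """Destroy the instance, tolerating a volume that was already deleted."""
    try:
        driver.destroy(instance)
    except VolumeNotFound as exc:
        # The volume is already gone, so there is nothing left to detach.
        LOG.warning("Ignoring missing volume during delete of %s: %s",
                    instance, exc)
    return "deleted"
```

Only VolumeNotFound is swallowed in this sketch; any other exception from destroy would still propagate to the caller.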

I've reproduced this with current code from trunk.

Ed Bak (ed-bak2)
Changed in nova:
assignee: nobody → Ed Bak (ed-bak2)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/45914

Changed in nova:
status: New → In Progress
Phil Day (philip-day) wrote :

Stack Trace:

2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1625, in terminate_instance
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp do_terminate_instance(instance, bdms)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 247, in inner
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp retval = f(*args, **kwargs)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1617, in do_terminate_instance
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp reservations=reservations)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 104, in inner
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp rv = f(*args, **kwargs)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1588, in _delete_instance
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp project_id=project_id)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1561, in _delete_instance
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp self._shutdown_instance(context, instance, bdms)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1502, in _shutdown_instance
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp requested_networks)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1492, in _shutdown_instance
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp block_device_info)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 831, in destroy
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp self._cleanup(instance, network_info, block_device_info, destroy_disks)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 911, in _cleanup
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp disk_dev)
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1007, in v...


Phil Day (philip-day) wrote :

2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/pymodules/python2.7/hpdriver/cinderdriver/libvirtdriver.py", line 75, in disconnect_volume
2013-09-03 06:34:46.173 29204 TRACE nova.openstack.common.rpc.amqp volume_info['provider_location'])

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/45914
Committed: http://github.com/openstack/nova/commit/506a8f58cf4b8cecf90b647c7deba47da2a4dfec
Submitter: Jenkins
Branch: master

commit 506a8f58cf4b8cecf90b647c7deba47da2a4dfec
Author: Ed Bak <email address hidden>
Date: Tue Sep 10 17:32:28 2013 +0000

    libvirt: Allow delete to complete when a volume disconnect fails

    If an instance which was booted from a volume fails to spawn, the
    instance is left in an ERROR state. If the failure is in the networking
    stage Nova will not have attached the volume via the Cinder driver,
    and so the volume remains in an available state. In this state the
    volume may be deleted or assigned to another instance.
    During the subsequent delete an exception may be thrown from
    the cinder driver which prevents the instance from completing the
    deletion process leaving the instance stuck in Error and unable to
    be deleted.

    Volume exceptions should be logged and ignored when the instance
    is being deleted.

    Closes-Bug #1222979

    Change-Id: Icb3796b0ddba25cf344953a649b2e762fab6d782
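
The "logged and ignored" behaviour described in the commit message can be illustrated with a minimal sketch. This is not the merged change: cleanup_volumes and its disconnect argument are hypothetical names, with disconnect standing in for whatever call the volume driver uses to disconnect a volume.

```python
import logging

LOG = logging.getLogger(__name__)


def cleanup_volumes(disconnect, volumes):
    """Try to disconnect every volume; log failures instead of raising.

    `disconnect` is a hypothetical callable standing in for the volume
    driver's disconnect call; `volumes` is a list of volume ids.
    Returns the ids whose disconnect failed.
    """
    failed = []
    for vol_id in volumes:
        try:
            disconnect(vol_id)
        except Exception:
            # Deletion must still finish, so record the failure and move on.
            LOG.exception("Failed to disconnect volume %s; continuing", vol_id)
            failed.append(vol_id)
    return failed
```

Because each volume is handled inside its own try block, one missing volume does not stop the remaining volumes from being disconnected or the instance delete from completing.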

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
importance: Undecided → High
tags: added: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/58707

Changed in nova:
milestone: none → icehouse-1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-1 → 2014.1