Instance backup dir isn't deleted if instance is deleted during confirming resize

Bug #1202484 reported by wangpan
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: wangpan
Milestone: 2013.2

Bug Description

Steps to reproduce with the latest master branch of nova (but I believe Grizzly and Folsom also have this bug; it occurred on our Folsom-based stable production environment):

1. Hack the nova-compute code as follows:
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index e1302bf..5a82f0b 100755
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -2399,7 +2399,8 @@ class ComputeManager(manager.SchedulerDependentManager):
 
         self._notify_about_instance_usage(context, instance,
                                           "resize.confirm.start")
-
+        LOG.debug(_("--------------------sleep 60s now------------------"))
+        time.sleep(60)  # sleep here and wait for the instance to be deleted
         with self._error_out_instance_on_exception(context, instance['uuid'],
                                                    reservations):
             # NOTE(danms): delete stashed migration information
2. Create an instance under a devstack env.
3. Resize it.
4. Confirm the resize operation.
5. Delete the confirming instance when you see the hacked log message 'sleep 60s now'.
6. The instance will then be deleted, but the instance backup dir 'ddb764e6-c07a-4e81-8347-283213b84a98_resize' will not be deleted.

The error log is:
2013-07-18 03:19:14.714 DEBUG nova.compute.manager [req-067c64aa-313f-49fc-b47b-2e83391f72ad admin admin] [instance: ddb764e6-c07a-4e81-8347-283213b84a98] Instance has been destroyed from under us while trying to set it to ERROR from (pid=7552) _set_instance_error_state /opt/stack/nova/nova/compute/manager.py:430
2013-07-18 03:19:14.717 DEBUG nova.openstack.common.rpc.amqp [req-067c64aa-313f-49fc-b47b-2e83391f72ad admin admin] Making synchronous call on conductor ... from (pid=7552) multicall /opt/stack/nova/nova/openstack/common/rpc/amqp.py:518
2013-07-18 03:19:14.718 DEBUG nova.openstack.common.rpc.amqp [req-067c64aa-313f-49fc-b47b-2e83391f72ad admin admin] MSG_ID is c57691619c834285bbbf786f16e01a5a from (pid=7552) multicall /opt/stack/nova/nova/openstack/common/rpc/amqp.py:521
2013-07-18 03:19:14.718 DEBUG nova.openstack.common.rpc.amqp [req-067c64aa-313f-49fc-b47b-2e83391f72ad admin admin] UNIQUE_ID is a1e36d387f0045e5bc9c67bfe4dbdf6a. from (pid=7552) _add_unique_id /opt/stack/nova/nova/openstack/common/rpc/amqp.py:327
2013-07-18 03:19:14.751 ERROR nova.openstack.common.rpc.amqp [req-067c64aa-313f-49fc-b47b-2e83391f72ad admin admin] Exception during message handling
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 426, in _process_data
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp **args)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 99, in wrapped
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 76, in wrapped
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 279, in decorated_function
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 243, in decorated_function
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 2410, in confirm_resize
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp system_metadata=sys_meta)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 414, in _instance_update
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp **kwargs)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/conductor/api.py", line 414, in instance_update
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp updates, 'conductor')
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/conductor/rpcapi.py", line 134, in instance_update
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp version='1.38')
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/proxy.py", line 126, in call
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp result = rpc.call(context, real_topic, msg, timeout)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/__init__.py", line 140, in call
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp return _get_impl().call(CONF, context, topic, msg, timeout)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/impl_kombu.py", line 824, in call
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp rpc_amqp.get_connection_pool(conf, Connection))
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 539, in call
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp rv = list(rv)
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 504, in __iter__
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp raise result
2013-07-18 03:19:14.751 TRACE nova.openstack.common.rpc.amqp InstanceNotFound_Remote: Instance ddb764e6-c07a-4e81-8347-283213b84a98 could not be found.
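
The traceback above boils down to an ordering problem. A minimal, self-contained sketch with made-up names (not nova code): the backup-dir cleanup only happens after the instance-record update succeeds, so a delete that lands in the sleep window makes the update raise InstanceNotFound and the cleanup is never reached:

    class InstanceNotFound(Exception):
        pass

    DB = {"ddb764e6": {"task_state": "resize_confirming"}}

    def instance_update(uuid, **updates):
        # stands in for the conductor call that raised InstanceNotFound_Remote
        if uuid not in DB:
            raise InstanceNotFound(uuid)
        DB[uuid].update(updates)

    def cleanup_resize(uuid):
        print("removing %s_resize dir" % uuid)

    def confirm_resize(uuid):
        # the hacked sleep sits here; a concurrent delete lands in this window
        del DB[uuid]                            # simulate the concurrent delete
        instance_update(uuid, task_state=None)  # raises InstanceNotFound
        cleanup_resize(uuid)                    # never reached: <uuid>_resize leaks

    confirm_resize("ddb764e6")

Running the sketch raises InstanceNotFound before cleanup_resize() is ever called, mirroring the leaked 'ddb764e6-c07a-4e81-8347-283213b84a98_resize' directory.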

wangpan (hzwangpan)
description: updated
Revision history for this message
wangpan (hzwangpan) wrote :

If this bug occurs, the network rules on the source host are not cleaned up either, because the cleanup operations in _cleanup_resize() never run:
            self.unplug_vifs(instance, network_info)
            self.firewall_driver.unfilter_instance(instance, network_info)
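
For context, here is a rough, self-contained sketch of the kind of source-host teardown that gets skipped (condensed, with hypothetical stand-ins for the driver and network objects; not nova's actual _cleanup_resize):

    import os
    import shutil

    # Hypothetical stand-ins for the virt driver pieces; illustrative only.
    def unplug_vifs(instance, network_info):
        print("unplugging VIFs for %s" % instance["uuid"])

    class FakeFirewallDriver(object):
        def unfilter_instance(self, instance, network_info):
            print("removing firewall rules for %s" % instance["uuid"])

    firewall_driver = FakeFirewallDriver()

    def cleanup_resize(instance, network_info, instances_path="/tmp/instances"):
        # Remove the <uuid>_resize backup dir left behind on the source host...
        target = os.path.join(instances_path, instance["uuid"] + "_resize")
        if os.path.exists(target):
            shutil.rmtree(target)
        # ...and tear down the instance's networking on the source host.
        unplug_vifs(instance, network_info)
        firewall_driver.unfilter_instance(instance, network_info)

    cleanup_resize({"uuid": "ddb764e6"}, network_info=None)

When the race hits, none of this runs, so both the backup directory and the network rules are orphaned.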

wangpan (hzwangpan)
Changed in nova:
assignee: nobody → wangpan (hzwangpan)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/37836

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/44639

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/44639
Committed: http://github.com/openstack/nova/commit/8db14390d418561f7c372902a4a383dc9047603c
Submitter: Jenkins
Branch: master

commit 8db14390d418561f7c372902a4a383dc9047603c
Author: Wangpan <email address hidden>
Date: Mon Sep 2 15:24:16 2013 +0800

    Fixes race cond between delete and confirm resize

    If we delete an instance while a confirm-resize operation is in
    progress, and the instance is deleted first at the dest compute
    node, then the *_resize dir and network info of this instance are
    left behind on the source compute node and never cleaned up.
    Please refer to the bug page for the reproduction steps.

    Fixes bug 1202484

    Change-Id: Ib1e3d976373bd7f4d086d4f9716b0bca4b383bab
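
As a minimal illustration of one way to close such a race (an assumption about the general pattern, not the code of the merged change), the source-host cleanup can be made to run even when the instance record has already vanished:

    # Sketch only: hypothetical names, not nova's confirm_resize().
    class InstanceNotFound(Exception):
        pass

    DB = {}  # the record is already gone: the concurrent delete won

    def instance_update(uuid, **updates):
        if uuid not in DB:
            raise InstanceNotFound(uuid)
        DB[uuid].update(updates)

    def cleanup_resize(uuid):
        print("removing %s_resize dir and source-host network rules" % uuid)

    def confirm_resize(uuid):
        try:
            instance_update(uuid, task_state=None)
        except InstanceNotFound:
            pass  # instance deleted under us; keep going
        finally:
            cleanup_resize(uuid)  # runs whether or not the record still exists

    confirm_resize("ddb764e6-c07a-4e81-8347-283213b84a98")

The key design point is that the driver-side teardown no longer depends on the instance-record update succeeding.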

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → havana-rc1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-rc1 → 2013.2