openstack mitaka evacuate fails and does not clean up the evacuated instance XML in libvirt

Bug #1784798 reported by Rock_Zhao
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

Description
===========
For example:
There are three compute nodes: A, B, and C.
Instance "vmtest" currently runs on A.

Steps to reproduce
==================
1. Run "nova host-evacuate A".
2. Assume instance "vmtest" fails to evacuate on B with a libvirt error. The instance XML for vmtest is left behind in /etc/libvirt/qemu (see the sketch below).
3. Run "nova host-evacuate A" again. If vmtest is scheduled onto B again, the rebuild throws an exception: Instance instance-vmtest already exists.
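
A minimal sketch (not part of the original report) of how to confirm the leftover domain on B with the libvirt Python bindings; the domain name instance-000024d4 is taken from the traceback in the Logs section below:

import libvirt

conn = libvirt.open('qemu:///system')  # local hypervisor on node B
try:
    # lookupByName() raises libvirt.libvirtError if no such domain is defined
    dom = conn.lookupByName('instance-000024d4')
    print('stale domain still defined:', dom.name(),
          '/ persistent:', dom.isPersistent())
finally:
    conn.close()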

Expected result
===============
When the first evacuation of vmtest fails on B, the instance XML of vmtest should be removed from B.

Actual result
=============
If vmtest is scheduled onto B again, the rebuild throws an exception: Instance instance-vmtest already exists.

Should the instance XML of vmtest be cleaned up on B, the node where the evacuation failed?
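
Until nova does this itself, a manual workaround sketch (an assumption, not an official fix): undefining the leftover persistent domain removes its XML from /etc/libvirt/qemu, so a retried evacuation can land on B again.

import libvirt

conn = libvirt.open('qemu:///system')
try:
    dom = conn.lookupByName('instance-000024d4')
    if dom.isActive():
        dom.destroy()  # stop the guest first if it is somehow running
    dom.undefine()     # removes /etc/libvirt/qemu/instance-000024d4.xml
finally:
    conn.close()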

Environment
===========
openstack mitaka

Logs
==========
09ed2cfc-2400-4f82-9dbd-0b7e22f4711b] Successfully reverted task state from rebuilding on failure for instance.
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher [req-db38e4ac-b740-463e-b543-6a4e4d9645bd d8aeb1794cbb47259ad126a7310c65ef 9f86737686af478487424ebb31ea2be6 - - -]
 Exception during message handling: Instance instance-000024d4 already exists.
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher incoming.message))
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _dispatch
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 150, in inner
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher return func(*args, **kwargs)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 110, in wrapped
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher payload)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 89, in wrapped
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 359, in decorated_function
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher LOG.warning(msg, e, instance=instance)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 328, in decorated_function
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 409, in decorated_function
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 387, in decorated_function
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 375, in decorated_function
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2809, in rebuild_instance
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher bdms, recreate, on_shared_storage, preserve_ephemeral)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2853, in _do_rebuild_instance_with_claim
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher self._do_rebuild_instance(*args, **kwargs)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2872, in _do_rebuild_instance
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher self._check_instance_exists(context, instance)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1543, in _check_instance_exists
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher raise exception.InstanceExists(name=instance.name)
2018-08-01 12:52:35.288 5665 ERROR oslo_messaging.rpc.dispatcher InstanceExists: Instance instance-000024d4 already exists.
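
The guard that raises here, reconstructed from the traceback frames above (a close paraphrase of nova/compute/manager.py in mitaka, not a verbatim copy): the rebuild aborts as soon as the virt driver still reports a guest with the instance's name.

def _check_instance_exists(self, context, instance):
    """Abort rebuild if the hypervisor already has a guest by this name."""
    if self.driver.instance_exists(instance):
        raise exception.InstanceExists(name=instance.name)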

Rock_Zhao (36514239-3)
description: updated
Rock_Zhao (36514239-3)
description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :

Mitaka and Newton are both end of life, but looking at current master (rocky), this is most likely still a problem:

https://github.com/openstack/nova/blob/d4dbb42593893c1d1ed51a127b7183a314bcac2c/nova/compute/manager.py#L3102

Unless something magically started cleaning up the guest on a failed evacuate.

Can you tell where the evacuate failed? Was it during or after the driver.spawn() call here?

https://github.com/openstack/nova/blob/d4dbb42593893c1d1ed51a127b7183a314bcac2c/nova/compute/manager.py#L2906

Changed in nova:
status: New → Triaged
Revision history for this message
Matt Riedemann (mriedem) wrote :

I guess it's possible you could have hit something like bug 1764883 when saving the instance state after spawn(), and then we wouldn't clean up the guest from the hypervisor on the dest host.

Changed in nova:
importance: Undecided → Low
Revision history for this message
Matt Riedemann (mriedem) wrote :

I'll push up a quick and dirty patch for this to see what it looks like, but it's definitely not as trivial as just destroying the guest, because we also have to roll back things like port binding updates and volume attachments, and determine whether we're on shared storage so that we don't inadvertently destroy disks that are shared with the source host.
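
A rough sketch of that "quick and dirty" idea (hypothetical, not the code in review 588087): inside nova's rebuild path, wrap driver.spawn() so a failed evacuation destroys the half-created guest; "recreate" and "on_shared_storage" are the rebuild arguments visible in the traceback above.

from oslo_utils import excutils

try:
    self.driver.spawn(context, instance, image_meta, injected_files,
                      admin_password, network_info=network_info,
                      block_device_info=block_device_info)
except Exception:
    with excutils.save_and_reraise_exception():
        if recreate:  # evacuate/rebuild onto this (destination) host
            # keep destroy_disks False on shared storage so a failed
            # rebuild cannot delete disks shared with the source host
            self.driver.destroy(context, instance, network_info,
                                block_device_info=block_device_info,
                                destroy_disks=not on_shared_storage)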

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/588087

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Revision history for this message
Rock_Zhao (36514239-3) wrote :

The first time, instance "vmtest" failed to evacuate on B with a libvirt error.

The key error in the log is: libvirtError: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Log
====================
2018-07-31 18:20:01.464 30831 INFO nova.compute.manager [req-0f721cbe-f8d4-43fb-a4c5-944fe13baff4 d8aeb1794cbb47259ad126a7310c65ef 9f86737686af478487424ebb31ea2be6 - - -] [instanc
e: 6470fe19-f58e-4bca-a8a8-0048165f2dc4] Successfully reverted task state from rebuild_spawning on failure for instance.
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher [req-0f721cbe-f8d4-43fb-a4c5-944fe13baff4 d8aeb1794cbb47259ad126a7310c65ef 9f86737686af478487424ebb31ea2be6 - - -
] Exception during message handling: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the rep
ly, the reply timeout expired, or the network connection was broken.
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher incoming.message))
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _dispatch
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 150, in inner
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher return func(*args, **kwargs)
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 110, in wrapped
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher payload)
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher self.force_reraise()
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2018-07-31 18:20:01.469 30831 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 89, in wrapped
2018-07-31 18:20:01.469 30831 ERROR oslo_messagin...

Revision history for this message
Matt Riedemann (mriedem) wrote :

Hmm, I thought you said the guest was spawned on the dest host B but then something after that failed. If the spawn() failed but libvirt is reporting that the guest exists on that host, then something is wrong in the libvirt driver and we need to clean that up.
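
For context, a minimal sketch of why the XML can outlive a failed spawn (my paraphrase of the libvirt driver's define-then-launch flow, not its exact code): the domain is defined persistently before it is launched, so if the launch raises, e.g. the libvirtError above, the persistent definition stays under /etc/libvirt/qemu.

def define_and_launch(conn, xml):
    dom = conn.defineXML(xml)  # writes /etc/libvirt/qemu/<name>.xml
    dom.createWithFlags(0)     # may raise libvirtError, leaving the
                               # persistent definition behind
    return dom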

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/588087

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Confirmed