Evacuated instances should be completed when ComputeHostNotFound

Bug #1952745 reported by Konrad Cempura
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

Scenario 1:
- remove a compute node physically by formatting its disk
- evacuate the VMs from the removed compute
- remove the orphaned resource provider for the removed compute
- add a new compute with the same name as the removed one
- migrate the evacuated VMs to the new compute

Expected result
===============
VMs are working correctly.

Actual result
=============
Definitions of VMs are removed from libvirt.

Scenario 2:
- remove a compute node physically by formatting its disk
- evacuate the VMs from the removed compute
- remove the orphaned resource provider for the removed compute
- add a new compute with the same name as the removed one
- restart nova_compute

Expected result
===============
Evacuations are completed on the first run of nova_compute.

Actual result
=============
Evacuations are only completed after a restart, i.e. on the second run of nova_compute.

Scenario 3:
- remove a compute node physically by formatting its disk
- evacuate the VMs from the removed compute
- add a new compute with the same name as the removed one, but in capital letters
- migrate the evacuated VMs to the new compute

Expected result
===============
VMs are working correctly on the new compute.

Actual result
=============
Definitions of VMs are removed from libvirt.

Environment
===========
1. OpenStack Train
Commit-Id: 4cf72ea6bfc58d33da894f248184c08c36055884
Also occurs on master.

2. Libvirt + KVM
libvirtd (libvirt) 4.5.0

Proposed solution
=================

Evacuations should be completed when ComputeHostNotFound occurs.

Proposed patch
==============

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index eaedc0238f..df56430aa6 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -752,16 +752,16 @@ class ComputeManager(manager.Manager):
                         context, self.host, migration.source_node).uuid
                     compute_nodes[migration.source_node] = cn_uuid
                 except exception.ComputeHostNotFound:
-                    LOG.error("Failed to clean allocation of evacuated "
-                              "instance as the source node %s is not found",
-                              migration.source_node, instance=instance)
-                    continue
-            cn_uuid = compute_nodes[migration.source_node]
+                    LOG.warning("Failed to clean allocation of evacuated "
+                                "instance as the source node %s is not found",
+                                migration.source_node, instance=instance)
+
+            cn_uuid = compute_nodes.get(migration.source_node)

             # If the instance was deleted in the interim, assume its
             # allocations were properly cleaned up (either by its hosting
             # compute service or the API).
-            if (not instance.deleted and
+            if (cn_uuid and not instance.deleted and
                     not self.reportclient.
                         remove_provider_tree_from_instance_allocation(
                             context, instance.uuid, cn_uuid)):

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

What exactly do you mean by "Definitions of VMs are removed from libvirt."?

Which migration are you doing? I guess cold migrate? If so, are you confirming the resize, or do you have a config option value for it?

https://docs.openstack.org/api-ref/compute/?expanded=migrate-server-migrate-action-detail#migrate-server-migrate-action

TBH, I don't really see a problem with what you say: if you recreate a nova-compute service, you need to restart it to remove the evacuated instances, but maybe I am misunderstanding your concerns.

Changed in nova:
status: New → Incomplete
Revision history for this message
Konrad Cempura (kcem) wrote :

The XML definitions of the VMs in libvirt. It looks like the VMs are removed from the evacuated compute, but it is not that compute anymore; it only has the same name.

Yes, cold migration. I migrate the VMs to a new compute with the same name as the compute from which the VMs were evacuated earlier. I confirm the migration and everything looks OK until nova_compute is restarted.

I am sorry if the description is not detailed enough. I'll try to give more details.

Scenario 1:
===========

As described before, plus details on the cold migration:

Cold migrate 34f25e4c-6069-4ce0-ac28-79d28890d50a and a7f4c239-0df1-4b40-99d1-dcd1f71799f1 to vcmp1 and confirm (Ids 6 and 8).

[softiops@vosc1 ~]$ nova migration-list
+----+--------------------------------------+-------------+-----------+----------------+--------------+-------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| Id | UUID | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host | Status | Instance UUID | Old Flavor | New Flavor | Created At | Updated At | Type |
+----+--------------------------------------+-------------+-----------+----------------+--------------+-------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| 8 | 46f06d16-e039-4bf7-8d2d-2df2e1c9ad48 | vcmp0 | vcmp1 | vcmp0 | vcmp1 | 128.10.0.21 | confirmed | 34f25e4c-6069-4ce0-ac28-79d28890d50a | 1 | 1 | 2021-11-30T15:49:09.000000 | 2021-11-30T15:49:26.000000 | migration |
| 6 | 4792706a-d9d9-4acc-a990-8d9bbc712f79 | vcmp0 | vcmp1 | vcmp0 | vcmp1 | 128.10.0.21 | confirmed | a7f4c239-0df1-4b40-99d1-dcd1f71799f1 | 1 | 1 | 2021-11-30T15:48:51.000000 | 2021-11-30T15:49:16.000000 | migration |
| 5 | 6f5acc58-a403-4325-8efb-f48eb10a6689 | vcmp1 | vcmp0 | vcmp1 | vcmp0 | 128.10.0.20 | done | a7f4c239-0df1-4b40-99d1-dcd1f71799f1 | None | None | 2021-11-29T11:50:54.000000 | 2021-11-29T11:51:03.000000 | evacuation |
| 3 | fd462cf4-a2a0-4d94-84c8-47dd5c5971fc | vcmp1 | vcmp0 | vcmp1 | vcmp0 | 128.10.0.20 | done | 34f25e4c-6069-4ce0-ac28-79d28890d50a | None | None | 2021-11-29T11:50:53.000000 | 2021-11-29T11:51:01.000000 | evacuation |
| 1 | 34cdb93e-b0cf-41ec-88b5-a5255392fcb1 | vcmp0 | vcmp1 | vcmp0 | vcmp1 | 128.10.0.21 | confirmed | 34f25e4c-6069-4ce0-ac28-79d28890d50a | 1 | 1 | 2021-11-29T10:38:24.000000 | 2021-11-29T10:38:51.000000 | migration |
+----+--------------------------------------+-------------+-----------+----------------+--------------+-------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+

docker exec -it nova_libvirt virsh list
 Id Name State
------------------------------------------...

Revision history for this message
Konrad Cempura (kcem) wrote :

> "TBH, I don't really see a problem with what you say : if you recreate a nova-compute service, you need to restart it for removing the evacuated instances, but I could maybe misunderstand your concerns."

1. It is possible to cold migrate evacuated instances to a new compute that has the same name as the source compute from which the instances were evacuated earlier, and those instances will be removed from libvirt on the first compute restart (and you definitely don't want that).

2. Evacuations from compute-1 are not completed on the first run of nova_compute on the new compute-1 (which got the same name as the broken compute-1), and this is what makes scenario 1 possible. Nova tries to complete them, but there are errors in the logs, and it only succeeds after a restart.

3. It is possible to cold migrate evacuated instances to a compute whose name differs only in capitalization (e.g. compute-1 vs. COMPUTE-1), and the result will be the same as in point 1, except that the evacuations from compute-1 will never be completed. Instances evacuated from compute-1 will be deleted if they are moved to COMPUTE-1 in the future... but at an unexpected moment: after a COMPUTE-1 restart (it may never happen, but when it does, you will lose some or all of the instances).

Scenarios to point 1, 2, 3:

Scenario 1:
- remove compute-1 physically by formatting its disk
- evacuate instances from the removed compute-1 to compute-2
- remove the compute-1 service from the OpenStack cloud
- remove the orphaned resource provider for compute-1
- configure a new compute with the name compute-1 and add it to the OpenStack cloud
- cold migrate the evacuated instances from compute-2 to compute-1
- confirm the migrations
- restart compute-1
- the instances on compute-1 are GONE from libvirt (they still exist in OpenStack)

Scenario 2:
- remove compute-1 physically by formatting its disk
- evacuate instances from the removed compute-1
- remove the compute-1 service from OpenStack
- remove the orphaned resource provider for compute-1
- add a new compute with the same name: compute-1
- the evacuations are not completed; errors occur in the logs during the first start, and a second restart finishes the evacuations

Scenario 3:
- remove compute-1 physically by formatting its disk
- evacuate instances from the removed compute-1 to compute-2
- remove the compute-1 service from the OpenStack cloud
- remove the orphaned resource provider for compute-1
- configure a new compute with a slightly different name, COMPUTE-1, and add it to the OpenStack cloud
- restart COMPUTE-1, or the nova_compute service on COMPUTE-1, as many times as you like
- cold migrate the evacuated instances from compute-2 to COMPUTE-1
- confirm the migrations
- restart COMPUTE-1
- the instances on COMPUTE-1 are GONE from libvirt (they still exist in OpenStack)
- the evacuations from compute-1 will never be completed (until the instances land on COMPUTE-1 and COMPUTE-1 is restarted)

Konrad Cempura (kcem)
Changed in nova:
status: Incomplete → New
Revision history for this message
Artom Lifshitz (notartom) wrote :

I suspect this is a valid bug.

When nova-compute starts up, it looks for migration records with type 'evacuation', status 'done', and itself as the source host. It then destroys the associated libvirt instances from the hypervisor, and sets the migration record to 'completed' to avoid destroying them again on subsequent startups.
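
To make that concrete, here is a minimal sketch of the cleanup under the assumptions above. The object shape and helper names are hypothetical stand-ins, not nova's actual API; the real logic lives in ComputeManager._destroy_evacuated_instances() in nova/compute/manager.py:

from dataclasses import dataclass

@dataclass
class Migration:
    # Hypothetical stand-in for nova's Migration object.
    instance_uuid: str
    migration_type: str   # e.g. 'evacuation'
    status: str           # 'done' = cleanup pending, 'completed' = finished
    source_compute: str

def destroy_evacuated_instances(host, migrations, local_guests, destroy_guest):
    """Sketch of what nova-compute does at startup on an evacuated source.

    migrations:    Migration records involving this host
    local_guests:  dict mapping instance UUID -> local libvirt guest
    destroy_guest: callable that removes a guest definition from libvirt
    """
    for migration in migrations:
        # Only finished ('done') evacuations *from* this host matter.
        if (migration.migration_type != 'evacuation'
                or migration.status != 'done'
                or migration.source_compute != host):
            continue
        guest = local_guests.get(migration.instance_uuid)
        if guest is not None:
            # Remove the stale local copy left behind by the evacuation.
            destroy_guest(guest)
        # Mark the record completed so later restarts don't destroy the
        # guest again, e.g. after the instance is migrated back here.
        migration.status = 'completed'

The bug hinges on that last step: if the loop bails out before a record is flipped to 'completed', the stale 'done' record survives and a later restart will destroy the guest again.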

This is all well and good if it's the original compute host that comes back after being evacuated, but what if it's a brand new compute host with the same host name?

It'll find the 'done' evacuations, look for the associated libvirt instances to destroy on the hypervisor, not find any because it's a brand new compute, and I suspect at this point it will not set the migration record to 'completed', though code examination doesn't support this [1] unfortunately.

Assuming I'm correct, if previously-evacuated instances are now migrated back to the new (but with the old hostname) compute, and the nova-compute service is restarted, it'll pick up those 'done' migration records and destroy the libvirt instances.

I'm trying to reproduce this with a functional test, but not having any luck (unrelated issues) so far.

[1] https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L837-L873

Changed in nova:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/841170

Revision history for this message
sean mooney (sean-k-mooney) wrote :

To me this is more of a feature request than a bug.

The way the compute node was replaced is not, I believe, currently supported by nova.

We expect that if you use the same hostname you are actually preserving all the data on the server and just replacing the failed component, i.e. you are repairing the failed server, not replacing it.

If you are replacing the server, it should use a different hostname.

So I agree that for the sake of operator UX we could enhance nova to support this type of replacement, but it really feels like an RFE to me. Given the potential for data loss, I am willing to consider this a bug so that the fix can be backported, but normally I would suggest a specless blueprint to add this new feature.

Revision history for this message
Artom Lifshitz (notartom) wrote :

So this is what I see in the func test logs:

2022-05-10 09:48:27,041 WARNING [nova.compute.manager] Compute node compute0 not found in the database. If this is the first time this service is starting on this host, then you can ignore this warning.
2022-05-10 09:48:27,065 INFO [nova.compute.manager] Cleaning up allocations of the instance as it has been evacuated from this host
2022-05-10 09:48:27,065 ERROR [nova.compute.manager] Failed to clean allocation of evacuated instance as the source node compute0 is not found
2022-05-10 09:48:27,065 INFO [nova.compute.manager] Looking for unclaimed instances stuck in BUILDING status for nodes managed by this host
2022-05-10 09:48:27,075 WARNING [nova.compute.manager] No compute node record found for host compute0. If this is the first time this service is starting on this host, then you can ignore this warning.
2022-05-10 09:48:27,084 WARNING [nova.compute.resource_tracker] No compute node record for compute0:compute0
2022-05-10 09:48:27,087 INFO [nova.compute.resource_tracker] Compute node record created for compute0:compute0 with uuid: 934ed5eb-8426-4bc5-a614-9fee6d2068cc

In other words, at the time we do the destroy, there is no compute node record, which causes us to hit the following if+continue:

            if migration.source_node not in hostname_to_cn_uuid:
                LOG.error("Failed to clean allocation of evacuated "
                          "instance as the source node %s is not found",
                          migration.source_node, instance=instance)
                continue

Meaning we `continue` to the next iteration of the outer for loop and never reach this bit:

            migration.status = 'completed'
            migration.save()
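
Putting the two fragments together, the shape of the loop is roughly this (a simplified sketch, not the exact nova code; the names mirror the snippets quoted above):

def cleanup_evacuations(evacuations, hostname_to_cn_uuid):
    # Simplified shape of the init_host()-time cleanup loop.
    for migration in evacuations:
        if migration.source_node not in hostname_to_cn_uuid:
            # On a brand-new compute there is no compute node record yet,
            # so this branch is always taken ...
            continue
        # ... and control never reaches the rest of the loop body:
        # destroying the stale guest, cleaning up allocations, and finally
        migration.status = 'completed'
        migration.save()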

The compute node record is created by the resource tracker from update_available_resource(), which is called from the pre_start_hook() in the compute manager, which is called *after* init_host() at service init time:

        self.manager.init_host()
        self.model_disconnected = False
        ctxt = context.get_admin_context()
        self.service_ref = objects.Service.get_by_host_and_binary(
            ctxt, self.host, self.binary)
        if self.service_ref:
            _update_service_ref(self.service_ref)

        else:
            try:
                self.service_ref = _create_service_ref(self, ctxt)
            except (exception.ServiceTopicExists,
                    exception.ServiceBinaryExists):
                # NOTE(danms): If we race to create a record with a sibling
                # worker, don't fail here.
                self.service_ref = objects.Service.get_by_host_and_binary(
                    ctxt, self.host, self.binary)

        self.manager.pre_start_hook()

So I wonder if we could just move the pre_start_hook call further up?
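
For illustration, the ordering in question looks roughly like this (a sketch of the idea only, not a tested patch; ensure_service_ref is a hypothetical stand-in for the Service record handling shown above):

# Current order, simplified from the service start code above:
def start_today(manager, ensure_service_ref):
    manager.init_host()        # runs the evacuation cleanup, but the
                               # compute node record may not exist yet
    ensure_service_ref()
    manager.pre_start_hook()   # resource tracker creates the compute
                               # node record only here

# Hypothetical reordering:
def start_reordered(manager, ensure_service_ref):
    ensure_service_ref()
    manager.pre_start_hook()   # compute node record exists first
    manager.init_host()        # cleanup can now resolve the source node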

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/841308

Changed in nova:
status: Confirmed → In Progress