Comment 4 for bug 1896463

Wonil Choi (wonil22) wrote :

Reproduction rate: roughly 1/1000 to 1/5000. I reproduced this by evacuating the 20 VMs on a host.
Please review the following sequence.

1. (manager) Evacuation is requested; move_claim() is called and the claim is created on the destination host[1].
2020-09-22 19:16:48.440 8 INFO nova.virt.libvirt.driver [-] [instance: d74302ce-5c79-43f0-a035-98449a6aa62b] Instance spawned successfully.

2. (RT) _update_available_resource() is entered and fetches the instances from the DB[2]. The instance's host has not been changed to the destination yet, so get_by_host_and_node()[2] does not include the evacuated instance (i.e. d74302ce-5c79-43f0-a035-98449a6aa62b).
2020-09-22 19:16:48.765 8 DEBUG oslo_concurrency.lockutils [req-0689f6b0-8e5b-41e3-a56a-d5439474ead4 - - - - -] Lock "compute_resources" acquired by "nova.compute.resource_tracker._update_available_resource" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
...
2020-09-22 19:16:48.960 8 INFO oslo.messaging.notification.compute.instance.rebuild.end [req-eaa15672-5559-4bb0-9a02-aad134ccd60b c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] {"event_type": "compute.instance.rebuild.end", "timestamp": "2020-09-22 10:16:48.960283", "payload": {"state_description": "", "availability_zone": "nova", "terminated_at": "", "ephemeral_gb": 0, "instance_type_id": 280, "deleted_at": "", "reservation_id": "r-vwlai4vk", "memory_mb": 10240, "display_name": "VIDA_VM3", "fixed_ips": [{"version": 4, "vif_mac": "fa:16:3e:87:c4:46", "floating_ips": [], "label": "sriov-network", "meta": {}, "address": "100.90.80.18", "type": "fixed"}, {"version": 4, "vif_mac": "fa:16:3e:86:76:c2", "floating_ips": [], "label": "sriov-network", "meta": {}, "address": "100.90.80.69", "type": "fixed"}, {"version": 4, "vif_mac": "fa:16:3e:bb:88:c6", "floating_ips": [], "label": "sriov-network", "meta": {}, "address": "100.90.80.37", "type": "fixed"}], "hostname": "vida-vm3", "state": "active", "progress": "", "launched_at": "2020-09-22T10:16:48.508530", "metadata": {}, "node": "com11", "ramdisk_id": "", "access_ip_v6": null, "disk_gb": 10, "access_ip_v4": null, "kernel_id": "", "image_name": "rhel7.4", "host": "com11", "user_id": "c424fbb3d41f444bb7a025266fda36da", "image_ref_url": "http://114.128.0.211:9292/images/28a91968-9df2-4b02-8212-78a86a56e353", "cell_name": "", "root_gb": 10, "tenant_id": "6255a6910b9b4d3ba34a93624fe7fb22", "created_at": "2020-09-22 06:51:54+00:00", "instance_id": "d74302ce-5c79-43f0-a035-98449a6aa62b", "instance_type": "VIDA_flavor", "vcpus": 2, "image_meta": {"min_disk": "10", "container_format": "bare", "min_ram": "0", "disk_format": "raw", "base_image_ref": "28a91968-9df2-4b02-8212-78a86a56e353"}, "architecture": null, "os_type": null, "instance_flavor_id": "302"}, "priority": "INFO", "publisher_id": "compute.com10", "message_id": "3e96ce39-fd53-4324-be64-b3a822db5215"}

3. (manager) The instance is saved with the destination host, and the migration status is set to 'done'[3].
(I added temporary logging at the end of rebuild_instance()[4].)
2020-09-22 19:16:49.349 8 DEBUG nova.compute.manager [req-eaa15672-5559-4bb0-9a02-aad134ccd60b c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: d74302ce-5c79-43f0-a035-98449a6aa62b] Set Migration status rebuild_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:3085

4. (RT) The migration list is fetched[5]. The instance (d74302ce) is not included[6][7], since its migration was already marked 'done' in step 3.
2020-09-22 19:16:50.038 8 WARNING nova.compute.resource_tracker [req-0689f6b0-8e5b-41e3-a56a-d5439474ead4 - - - - -] Instance d74302ce-5c79-43f0-a035-98449a6aa62b is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 2, u'MEMORY_MB': 10240, u'DISK_GB': 10}}. Skipping heal of allocation because we do not know what to do.

5. The instance's PCI devices are freed by clean_usage()[8].

[1] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/manager.py#L3197
[2] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L757
[3] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/manager.py#L3250-L3257
[4] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/manager.py#L3257
[5] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L768
[6] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L774
[7] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L1364
[8] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L789
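
The interleaving in steps 1-5 can be sketched with a toy model. This is not nova code: periodic_update(), finish_evacuation(), the dict-based "DB", the host names "src"/"dest", the "in-progress" status and the "vf-0" device are all hypothetical stand-ins, and the status filter is simplified to "anything other than 'done'". It only illustrates how two separate reads with a manager-side save in between can leave the instance invisible to both.

```python
# Toy model of the race; none of these names are real nova APIs.
UUID = "d74302ce-5c79-43f0-a035-98449a6aa62b"


def make_state():
    """State after step 1: claim exists on 'dest', DB still points at 'src'."""
    db = {
        "instances": {UUID: {"host": "src"}},
        "migrations": [
            {"instance_uuid": UUID, "dest": "dest", "status": "in-progress"}
        ],
    }
    pci_claims = {UUID: ["vf-0"]}  # PCI devices claimed on the destination
    return db, pci_claims


def finish_evacuation(db):
    """Step 3: the manager saves the new host and marks the migration done."""
    db["instances"][UUID]["host"] = "dest"
    for mig in db["migrations"]:
        if mig["instance_uuid"] == UUID:
            mig["status"] = "done"


def periodic_update(db, host, pci_claims, between_reads=None):
    """Steps 2, 4 and 5: two separate reads, then cleanup of unknown claims."""
    # Step 2: instances whose host already equals this host.
    tracked = {u for u, inst in db["instances"].items() if inst["host"] == host}

    # The manager may run here, between the two reads.
    if between_reads is not None:
        between_reads()

    # Step 4: migrations to this host that are not finished (simplified filter).
    tracked |= {
        mig["instance_uuid"]
        for mig in db["migrations"]
        if mig["dest"] == host and mig["status"] != "done"
    }

    # Step 5: release PCI claims of instances seen by neither read.
    freed = [u for u in pci_claims if u not in tracked]
    for u in freed:
        del pci_claims[u]
    return freed


# Racy interleaving: step 3 lands between the two reads.
db, pci = make_state()
freed_race = periodic_update(
    db, "dest", pci, between_reads=lambda: finish_evacuation(db)
)

# No interleaving: the in-progress migration keeps the claim alive.
db, pci = make_state()
freed_ok = periodic_update(db, "dest", pci)

print("freed with race:", freed_race)
print("freed without race:", freed_ok)
```

In this model the claim is only lost when the save happens after the instance read and before the migration read, which would be consistent with the low reproduction rate.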