evacuation failed: Port update failed : Unable to correlate PCI slot

Bug #1896463 reported by Wonil Choi on 2020-09-21
This bug affects 1 person
Affects                     Importance  Assigned to
OpenStack Compute (nova)    Low         Balazs Gibizer
  Queens                    Low         Unassigned
  Rocky                     Low         Unassigned
  Stein                     Low         Unassigned
  Train                     Low         Unassigned
  Ussuri                    Low         Balazs Gibizer
  Victoria                  Low         Balazs Gibizer

Bug Description

Description
===========
If the resource tracker's _update_available_resource() is called between _do_rebuild_instance_with_claim() and instance.save() while a VM instance is being evacuated onto the destination host,

nova/compute/manager.py

2931 def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata,-------------------------------------------------------------------
3016 claim_ctxt = rebuild_claim(
3017 context, instance, scheduled_node,
3018 limits=limits, image_meta=image_meta,
3019 migration=migration)
3020 self._do_rebuild_instance_with_claim(
3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref,-----------------------------------------------------------------
3068 instance.apply_migration_context()
3069 # NOTE (ndipanov): This save will now update the host and node
3070 # attributes making sure that next RT pass is consistent since
3071 # it will be based on the instance and not the migration DB
3072 # entry.
3073 instance.host = self.host
3074 instance.node = scheduled_node
3075 instance.save()
3076 instance.drop_migration_context()

then the instance is not yet treated as an instance managed by the destination host, because its host has not been updated in the DB yet:

2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance 22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}. Skipping heal of allocation because we do not know what to do.

As a result, the SRIOV port's PCI device (the VF) is freed by clean_usage() even though the VM is already using it.

 743 def _update_available_resource(self, context, resources):
 744 +-- 45 lines: # initialize the compute node object, creating it--------------------------------------------------------------
 789 self.pci_tracker.clean_usage(instances, migrations, orphans)
 790 dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()
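
The effect of clean_usage() here can be shown with a small, self-contained sketch. This is toy code, not Nova's PciDevTracker API; it only models the assumption that a claimed device is kept when its owner is either tracked on this host or referenced by an in-progress migration:

# Toy illustration of the clean_usage() decision; names and data structures
# are simplified stand-ins, not Nova's real PCI tracker.

def clean_usage(claims, tracked_instance_uuids, in_progress_migration_uuids):
    """Free every claimed PCI device whose owner is unknown to this host."""
    for pci_addr, owner_uuid in list(claims.items()):
        if (owner_uuid not in tracked_instance_uuids
                and owner_uuid not in in_progress_migration_uuids):
            # The owner does not appear to belong to this host -> free the VF.
            del claims[pci_addr]

claims = {"0000:05:12.2": "22f6ca0e-f964-4467-83a3-f2bf12bb05ae"}

# During the race window the evacuated instance shows up in neither list, so
# its VF is freed even though the rebuilt VM is still attached to it:
clean_usage(claims, tracked_instance_uuids=set(), in_progress_migration_uuids=set())
print(claims)  # {}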

After that, when this VM is evacuated to another compute host again, we get the error below.

Steps to reproduce
==================
1. create a VM on com1 with SRIOV VF ports.
2. stop and disable nova-compute service on com1
3. wait 60 sec (nova-compute reporting interval)
4. evacuate the VM to com2
5. wait the VM is active on com2
6. enable and start nova-compute on com1
7. wait 60 sec (nova-compute reporting interval)
8. stop and disable nova-compute service on com2
9. wait 60 sec (nova-compute reporting interval)
10. evacuate the VM to com1
11. wait the VM is active on com1
12. enable and start nova-compute on com2
13. wait 60 sec (nova-compute reporting interval)
14. go to step 2.

Expected result
===============
Evacuation should be done without errors.

Actual result
=============
Evacuation failed with "Port update failed"

Environment
===========
openstack-nova-compute-18.0.1-1 (Rocky) with SRIOV ports. The libvirt driver is used.

Logs & Configs
==============
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [req-38dd0be2-7223-4a59-8073-dd1b072125c5 c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Setting instance vm_state to ERROR: PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Traceback (most recent call last):
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7993, in _error_out_instance_on_exception
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] yield
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3025, in rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] migration, request_spec)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3087, in _do_rebuild_instance_with_claim
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] self._do_rebuild_instance(*args, **kwargs)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3190, in _do_rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] context, instance, self.host, migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2953, in setup_instance_network_on_host
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 3058, in _update_port_binding_for_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] pci_slot)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]
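
For context, the "Unable to correlate PCI slot" error comes from the port-binding update during the next rebuild: the PCI address recorded on the port has to be mapped to a newly claimed device, and once the VF claim has been lost no such mapping exists. A rough, illustrative paraphrase (not the actual code in nova/network/neutronv2/api.py):

# Illustrative paraphrase of the failing correlation step; the real logic is
# in _update_port_binding_for_instance() referenced by the traceback above.

class PortUpdateFailed(Exception):
    pass

def correlate_pci_slot(port_pci_slot, old_to_new_pci_mapping):
    """Map the PCI address recorded on the port to the newly claimed device."""
    new_dev = old_to_new_pci_mapping.get(port_pci_slot)
    if new_dev is None:
        # The slot on the port no longer matches any PCI device tracked for
        # the instance -- this is what the traceback reports.
        raise PortUpdateFailed("Unable to correlate PCI slot %s" % port_pci_slot)
    return new_dev

try:
    # Because the VF was wrongly freed, the old->new mapping has no entry:
    correlate_pci_slot("0000:05:12.2", old_to_new_pci_mapping={})
except PortUpdateFailed as exc:
    print(exc)  # Unable to correlate PCI slot 0000:05:12.2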

Wonil Choi (wonil22) wrote :

Before the instance is saved with its new (destination) host in rebuild_instance(),

2931 def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
2932 +--136 lines: injected_files, new_pass, orig_sys_metadata,-------------------------------------------------------------------
3068 instance.apply_migration_context()
3069 # NOTE (ndipanov): This save will now update the host and node
3070 # attributes making sure that next RT pass is consistent since
3071 # it will be based on the instance and not the migration DB
3072 # entry.
3073 instance.host = self.host
3074 instance.node = scheduled_node
3075 instance.save()
3076 instance.drop_migration_context()
3077
3078 # NOTE (ndipanov): Mark the migration as done only after we
3079 # mark the instance as belonging to this host.
3080 self._set_migration_status(migration, 'done')

the resource tracker (RT) may fetch the instance list with get_by_host_and_node():

 743 def _update_available_resource(self, context, resources):
 744 +-- 12 lines: # initialize the compute node object, creating it--------------------------------------------------------------
 756 # Grab all instances assigned to this node:
 757 instances = objects.InstanceList.get_by_host_and_node(

and after the instance host is saved (as the destination host) and the migration status is set to 'done', the RT fetches the in-progress migrations:

 767 # Grab all in-progress migrations:
 768 migrations = objects.MigrationList.get_in_progress_by_host_and_node(
 769 context, self.host, nodename)
 770

In this situation, the PCI devices are then freed by clean_usage():

 789 self.pci_tracker.clean_usage(instances, migrations, orphans)
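
Laid out end to end, the problematic ordering looks like the toy sketch below; the step comments refer to the nova line numbers quoted above, and the dict is a simplified stand-in for the DB state right after the rebuild claim succeeded on the destination host:

# Simplified stand-in for the DB state; not Nova's real objects.
db = {
    "instance_host": "source-host",   # instance.host not yet saved as destination
    "migration_status": "accepted",   # evacuation migration still in progress
    "vf_claimed": True,               # the VF claimed by the rebuild claim
}

# RT, resource_tracker.py:757 -- instance list read before instance.save():
instance_tracked = (db["instance_host"] == "dest-host")        # False

# manager, manager.py:3073-3075 and 3080 -- save the new host, finish migration:
db["instance_host"] = "dest-host"
db["migration_status"] = "done"

# RT, resource_tracker.py:768 -- a 'done' migration is no longer in progress:
migration_in_progress = (db["migration_status"] != "done")     # False

# RT, resource_tracker.py:789 -- clean_usage() sees neither, frees the VF:
if not instance_tracked and not migration_in_progress:
    db["vf_claimed"] = False

print(db["vf_claimed"])  # False -- the running VM has lost its VF claim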

tags: added: evacuate pci resource-tracker
Balazs Gibizer (balazs-gibizer) wrote :

I was not able to reproduce this issue on master in a multinode devstack.

Could you please attach the debug logs from the two nova-compute services from the reproduction?

How frequently could the problem be reproduced? Once in 10 attempts or 100 attempts?

Nova does consider ongoing migrations even if the instance is not assigned to the given host[1]. During evacuation we do have a Migration object[2]. This object is created at [3]. At that point the destination is only set if the evacuation is forced to a target host.

Later, during rebuild_instance, there is a claim, and that claim updates the migration object[4][5]. Also, _move_claim[6] first updates the migration object and then claims the PCI devices. So I don't see how this can race with the _update_available_resource() run.

I am marking this bug Incomplete as I don't have a reproduction and I don't see how the described race happens. If you have any more input or ideas then please set it back to New.

[1] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/pci/manager.py#L346-L353
[2] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/manager.py#L3135
[3] https://github.com/openstack/nova/blob/b5330a97ae6f874b75fa9588ae122a1d1e827035/nova/compute/api.py#L5206-L5209
[4] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L282
[5] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L393
[6] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L282-L322

Changed in nova:
status: New → Incomplete
Minho Ban (mhban) wrote :

> How frequently could the problem be reproduced? Once in 10 attempts or 100 attempts?

Very rarely, because the racy section is short.

> Also the _move_claim[6] first updates the migration object and then claims the pci devices. So I don't see how this can race with the _update_available_resource() run.

The problem is not in the claim step but in the free step. Currently the clean_usage() path is not protected against the code below:

3075 instance.save()
3076 instance.drop_migration_context()

Wonil Choi (wonil22) wrote :

Reproduction ratio: roughly 1/1000 to 1/5000; I tried to reproduce it by evacuating 20 VMs on a host.
Please review this sequence.

1. (manager) An evacuation is requested; move_claim() is called and the instance is spawned on the destination host[1].
2020-09-22 19:16:48.440 8 INFO nova.virt.libvirt.driver [-] [instance: d74302ce-5c79-43f0-a035-98449a6aa62b] Instance spawned successfully.

2. (RT) _update_available_resource() is entered and the instances are fetched from the DB[2]. The instance's host has not been changed to the destination yet, so get_by_host_and_node()[2] does not include the evacuated instance (i.e. d74302ce-5c79-43f0-a035-98449a6aa62b).
2020-09-22 19:16:48.765 8 DEBUG oslo_concurrency.lockutils [req-0689f6b0-8e5b-41e3-a56a-d5439474ead4 - - - - -] Lock "compute_resources" acquired by "nova.compute.resource_tracker._update_available_resource" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
...
2020-09-22 19:16:48.960 8 INFO oslo.messaging.notification.compute.instance.rebuild.end [req-eaa15672-5559-4bb0-9a02-aad134ccd60b c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] {"event_type": "compute.instance.rebuild.end", "timestamp": "2020-09-22 10:16:48.960283", "payload": {"state_description": "", "availability_zone": "nova", "terminated_at": "", "ephemeral_gb": 0, "instance_type_id": 280, "deleted_at": "", "reservation_id": "r-vwlai4vk", "memory_mb": 10240, "display_name": "VIDA_VM3", "fixed_ips": [{"version": 4, "vif_mac": "fa:16:3e:87:c4:46", "floating_ips": [], "label": "sriov-network", "meta": {}, "address": "100.90.80.18", "type": "fixed"}, {"version": 4, "vif_mac": "fa:16:3e:86:76:c2", "floating_ips": [], "label": "sriov-network", "meta": {}, "address": "100.90.80.69", "type": "fixed"}, {"version": 4, "vif_mac": "fa:16:3e:bb:88:c6", "floating_ips": [], "label": "sriov-network", "meta": {}, "address": "100.90.80.37", "type": "fixed"}], "hostname": "vida-vm3", "state": "active", "progress": "", "launched_at": "2020-09-22T10:16:48.508530", "metadata": {}, "node": "com11", "ramdisk_id": "", "access_ip_v6": null, "disk_gb": 10, "access_ip_v4": null, "kernel_id": "", "image_name": "rhel7.4", "host": "com11", "user_id": "c424fbb3d41f444bb7a025266fda36da", "image_ref_url": "http://114.128.0.211:9292/images/28a91968-9df2-4b02-8212-78a86a56e353", "cell_name": "", "root_gb": 10, "tenant_id": "6255a6910b9b4d3ba34a93624fe7fb22", "created_at": "2020-09-22 06:51:54+00:00", "instance_id": "d74302ce-5c79-43f0-a035-98449a6aa62b", "instance_type": "VIDA_flavor", "vcpus": 2, "image_meta": {"min_disk": "10", "container_format": "bare", "min_ram": "0", "disk_format": "raw", "base_image_ref": "28a91968-9df2-4b02-8212-78a86a56e353"}, "architecture": null, "os_type": null, "instance_flavor_id": "302"}, "priority": "INFO", "publisher_id": "compute.com10", "message_id": "3e96ce39-fd53-4324-be64-b3a822db5215"}

3. (manager) The instance is saved with the destination host, and the migration status is set to 'done'[3].
(I added temporary logs at the end of rebuild_instance()[4].)
2020-09-22 19:16:49.349 8 DEBUG nova.compute.manager [req-eaa15672-5559-4bb0-9a02-aad134ccd60b c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - def...

Read more...

Wonil Choi (wonil22) on 2020-09-23
Changed in nova:
status: Incomplete → New
Wonil Choi (wonil22) wrote :

I can reproduce this easily by adding a 10 sec delay between InstanceList.get_by_host_and_node() and MigrationList.get_in_progress_by_host_and_node() [1].

[1] https://github.com/openstack/nova/blob/5645f75d6b3adac00f6be8e0eae4565c4eb2ab5d/nova/compute/resource_tracker.py#L766
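
In other words, widening the window between the two reads makes the otherwise rare interleaving deterministic. A self-contained toy version of the same trick, scaled down to fractions of a second (all names are illustrative stand-ins, not Nova code):

# A sleep between the two "DB reads" of the periodic task lets the rebuild
# finish in between, exactly as the injected 10 sec delay does in nova.
import threading
import time

db = {"instance_host": "source", "migration_status": "accepted", "vf_claimed": True}

def update_available_resource(delay):
    instances = ["vm"] if db["instance_host"] == "dest" else []       # read #1
    time.sleep(delay)                                                 # injected delay
    migrations = ["vm"] if db["migration_status"] != "done" else []   # read #2
    if "vm" not in instances and "vm" not in migrations:              # clean_usage()
        db["vf_claimed"] = False

def rebuild_instance_finishes():
    time.sleep(0.1)                    # lands inside the widened window
    db["instance_host"] = "dest"       # instance.save() with the new host
    db["migration_status"] = "done"    # _set_migration_status('done')

rt = threading.Thread(target=update_available_resource, args=(0.5,))
mgr = threading.Thread(target=rebuild_instance_finishes)
rt.start(); mgr.start()
rt.join(); mgr.join()
print(db["vf_claimed"])  # False on every run: the race is now reliable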

Balazs Gibizer (balazs-gibizer) wrote :

Thanks for the additional information. I will try the reproduction with the 10 sec delay.

Balazs Gibizer (balazs-gibizer) wrote :

OK, I was able to reproduce the bug. Setting this to Confirmed. I will push a functional test that shows the problem soon.

Changed in nova:
status: New → Confirmed
assignee: nobody → Balazs Gibizer (balazs-gibizer)
importance: Undecided → Low
Wonil Choi (wonil22) wrote :

After the evacuation fails, the instance cannot be evacuated again. I think this is a critical situation for the evacuation scenario even though the reproduction ratio is low.

I've pushed a reproducer; see comment #9.

sean mooney (sean-k-mooney) wrote :

Just adding the previously filed downstream Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1852110

For context, this can happen in Queens, so when we root-cause the issue and fix it, the fix should likely be backported to Queens. There are other, older bugs from Newton that look similar, related to unshelve, so it is possible that the same issue affects multiple move operations.

sean mooney (sean-k-mooney) wrote :

I have tagged this against the older releases and set the same importance. I'm not sure if we will backport that far, so we might set some of the older branch tasks to Won't Fix at some point, but we can do that later.

Fix proposed to branch: master
Review: https://review.opendev.org/754815

Changed in nova:
status: Confirmed → In Progress

@Wonil: could you try the fix I proposed[1] in your env? I think it should be fairly simple to backport it to the Rocky version.

[1] https://review.opendev.org/754815

I did some analysis about the other move operations to see if they are affected by the same issue.

During cold migrate and resize the instance.host is set to the dest host when the instance reaches the VERIFY_RESIZE state. At this point the migration is set to 'finished', but 'finished' is not an end state of the migration: a migration in 'finished' state is still considered in progress by the resource tracker. Later, when the resize is confirmed (or reverted), the migration status is set to 'confirmed' ('reverted'), which is an end state. So a simple delay in the resource tracker is not enough to trigger the race; the delay would have to be long enough for the user to actually confirm the migration by hand, or the automatic confirmation task would have to be configured and actually run. This makes the race pretty unlikely.

In case of live migration the process is a lot more complex. There are multiple steps in the live migration process that grab the COMPUTE_RESOURCE_SEMAPHORE in the resource tracker (e.g. the PCI claim), so simply starting and then slowing down the update_available_resources task in the resource tracker does not work as a reproduction. I gave up on trying to reproduce the same problem, but I'm not convinced that such a fault does not exist for live migration. A reproduction would need to trigger the periodic task after the last COMPUTE_RESOURCE_SEMAPHORE-grabbing live migration step but before the live migration finishes.

The unshelve (after shelve offload) is not affected by the bug as it uses the same instance_claim as the boot process. The instance_claim sets instance.host under the COMPUTE_RESOURCE_SEMAPHORE lock, so there is no chance of an overlapping update_available_resource periodic run.

Wonil Choi (wonil22) wrote :

@Balazs, Sure I will try the fix and share the result.

Wonil Choi (wonil22) wrote :

I cannot reproduce it with the patch backported to my Rocky env. Thanks.

@Wonil: thanks for the feedback

Reviewed: https://review.opendev.org/754100
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3f348602ae4a40c52c7135b2cb48deaa6052c488
Submitter: Zuul
Branch: master

commit 3f348602ae4a40c52c7135b2cb48deaa6052c488
Author: Balazs Gibizer <email address hidden>
Date: Thu Sep 24 15:04:21 2020 +0200

    Reproduce bug 1896463 in func env

    There is a race condition between the rebuild and the
    _update_available_resource periodic task on the compute. This patch adds
    a reproducer functional test. Unfortunately it needs some injected sleep
    to make the race happen in a stable way. This is suboptimal but only
    adds 3 seconds of slowness to the test execution.

    Change-Id: Id0577bceed9808b52da4acc352cf9c204f6c8861
    Related-Bug: #1896463

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/754815
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7675964af81472f3dd57db952d704539e61a3e6e
Submitter: Zuul
Branch: master

commit 7675964af81472f3dd57db952d704539e61a3e6e
Author: Balazs Gibizer <email address hidden>
Date: Mon Sep 28 18:44:02 2020 +0200

    Set instance host and drop migration under lock

    The _update_available_resources periodic makes resource allocation
    adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE based on the
    list of instances assigned to this host of the resource tracker and
    based on the migrations where the source or the target host is the host
    of the resource tracker. So if the instance.host or the migration
    context changes without holding the COMPUTE_RESOURCE_SEMAPHORE while
    the _update_available_resources task is running there will be data
    inconsistency in the resource tracker.

    This patch makes sure that during evacuation the instance.host and the
    migration context is changed while holding the semaphore.

    Change-Id: Ica180165184b319651d22fe77e076af036228860
    Closes-Bug: #1896463
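
The commit message describes the idea of the fix; the self-contained toy below (stand-in names, not the actual patch) shows why doing the finish-rebuild writes under the same lock as the periodic task's reads closes the window seen in the earlier sketches:

# Toy model of the fix: the finish-rebuild writes and the periodic task's reads
# both run under the same lock, so they can no longer interleave.
import threading

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()
db = {"instance_host": "source", "migration_status": "accepted", "vf_claimed": True}

def finish_evacuation():
    with COMPUTE_RESOURCE_SEMAPHORE:      # held while setting host and migration
        db["instance_host"] = "dest"      # instance.save() with the new host
        db["migration_status"] = "done"   # migration marked done afterwards

def update_available_resource():
    with COMPUTE_RESOURCE_SEMAPHORE:      # the periodic task holds the same lock
        instances = ["vm"] if db["instance_host"] == "dest" else []
        migrations = ["vm"] if db["migration_status"] != "done" else []
        if "vm" not in instances and "vm" not in migrations:   # clean_usage()
            db["vf_claimed"] = False

# Whichever side wins the lock, the periodic task now sees a consistent pair of
# reads: either "instance not here yet but migration still in progress" or
# "instance already on this host". Either way the VF claim survives.
update_available_resource()
finish_evacuation()
update_available_resource()
print(db["vf_claimed"])  # True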

This issue was fixed in the openstack/nova 22.1.0 release.

This issue was fixed in the openstack/nova 21.2.0 release.

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
