confirm resize fails with CPUUnpinningInvalid when resizing to the same host

Bug #1961188 reported by Pavlo Shchelokovskyy
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This is very similar to https://bugs.launchpad.net/nova/+bug/1944759 (which should already be fixed), but it still happens when resizing to the same host.

Reproduction:

Fresh single-node devstack/master (Nova commit b5029890c1c5b1b5153c9ca2fc9a8ea2437f635d).

In nova-cpu.conf I set the following (my devstack VM has 4 vCPUs):

[DEFAULT]
allow_resize_to_same_host = True # already set by default on a single node devstack
update_resources_interval = 20 # to increase chances of a race

[compute]
cpu_shared_set = 0
cpu_dedicated_set = 1-3

Create two flavors with 1 and 2 pinned CPUs respectively, then repeatedly resize (and confirm) a cirros-based instance back and forth between them; a scripted version of this loop is sketched below.
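
For reference, the loop can be scripted against the compute API with openstacksdk. The sketch below is a minimal illustration only: it assumes a clouds.yaml profile named devstack-admin, two pre-created pinned flavors (hw:cpu_policy=dedicated, 1 and 2 vCPUs) named pinned-1 and pinned-2, and a running cirros server named pin-test. None of those names come from the report.

# Minimal reproduction sketch (cloud profile, flavor and server names are assumptions).
import openstack

conn = openstack.connect(cloud='devstack-admin')
server = conn.compute.find_server('pin-test')
flavors = [conn.compute.find_flavor('pinned-1'),
           conn.compute.find_flavor('pinned-2')]

for _ in range(50):                       # bounded loop; repeat as needed
    for flavor in flavors:
        # Resize to the other pinned flavor and wait for VERIFY_RESIZE ...
        conn.compute.resize_server(server, flavor.id)
        server = conn.compute.wait_for_server(server, status='VERIFY_RESIZE')
        # ... then confirm; the intermittent failure shows up during confirm.
        conn.compute.confirm_server_resize(server)
        server = conn.compute.wait_for_server(server, status='ACTIVE')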

Sometimes the resize confirmation fails with:

Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "a3b3ecbe-2039-42fb-8365-da12e3c93bae" acquired by "nova.compute.manager.ComputeManager.confirm_resize.<locals>.do_confirm_resize" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Going to confirm migration 33 {{(pid=136855) do_confirm_resize /opt/stack/nova/nova/compute/manager.py:4287}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Acquired lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:294}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Building network info cache for instance {{(pid=136855) _get_instance_nw_info /opt/stack/nova/nova/network/neutron.py:1997}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'info_cache' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Instance cache missing network info. {{(pid=136855) _get_preexisting_port_ids /opt/stack/nova/nova/network/neutron.py:3300}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Updating instance_info_cache with network_info: [] {{(pid=136855) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/neutron.py:117}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Releasing lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:312}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'migration_context' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" "released" by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: held 0.037s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Confirm resize failed on source host master-dsvm. Resource allocations in the placement service will be removed regardless because the instance is now on the destination host master-dsvm. You can try hard rebooting the instance to correct its state.: nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Traceback (most recent call last):
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/manager.py", line 4316, in do_confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._confirm_resize(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/manager.py", line 4401, in _confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self.rt.drop_move_claim_at_source(context, instance, migration)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] return f(*args, **kwargs)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 563, in drop_move_claim_at_source
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._drop_move_claim(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 638, in _drop_move_claim
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._update_usage(usage, nodename, sign=-1)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1321, in _update_usage
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] cn.numa_topology = hardware.numa_usage_from_instance_numa(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/virt/hardware.py", line 2476, in numa_usage_from_instance_numa
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] new_cell.unpin_cpus(pinned_cpus)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/objects/numa.py", line 106, in unpin_cpus
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] raise exception.CPUUnpinningInvalid(requested=list(cpus),
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]

full log snippet is at https://paste.opendev.org/show/biKlHnGI4PPt451riHXn/
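
For context, the failing check is the CPU unpin validation on the host NUMA cell object (nova/objects/numa.py, reached via drop_move_claim_at_source -> _update_usage -> numa_usage_from_instance_numa, as in the traceback). A simplified sketch of that validation, not the verbatim Nova code, shows why the error fires once the tracker's view of the cell no longer contains the old flavor's pinning:

# Simplified sketch of the host NUMA cell unpin check (not verbatim Nova code).
class NUMACellSketch:
    def __init__(self, pinned_cpus):
        # CPUs the resource tracker currently believes are pinned on this cell.
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Unpinning is only valid for CPUs that are still tracked as pinned.
        # If the cell's usage has already been rebuilt from the new flavor's
        # pinning (e.g. by a racing update_available_resource run), dropping
        # the old flavor's pinning fails exactly like the error above.
        if not cpus.issubset(self.pinned_cpus):
            raise ValueError(
                'CPU set to unpin %s must be a subset of pinned CPU set %s'
                % (sorted(cpus), sorted(self.pinned_cpus)))
        self.pinned_cpus -= cpus

# The case from the log: only [2, 3] are tracked as pinned, but confirm
# tries to drop the old instance pinning [1].
NUMACellSketch([2, 3]).unpin_cpus([1])  # raises the equivalent of CPUUnpinningInvalid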

melanie witt (melwitt)
tags: added: compute numa resize
Sylvain Bauza (sylvain-bauza) wrote:

Trying to see whether we changed the logic in master. Looks like we did not.
