update_available_resource periodic fails with exception.CPUPinningInvalid if there is an incoming post-migrating migration with CPU pinning

Bug #1953359 reported by Balazs Gibizer
This bug affects 1 person
Affects                     Status         Importance   Assigned to      Milestone
OpenStack Compute (nova)    Fix Released   Medium       Balazs Gibizer
  Victoria                  Fix Released   Medium       Balazs Gibizer
  Wallaby                   Fix Released   Medium       Balazs Gibizer
  Xena                      Fix Released   Medium       Balazs Gibizer

Bug Description

The update_available_resource() periodic task in the compute service fails with an exception.CPUPinningInvalid exception (and stops processing the rest of the instances) if there is an incoming migration (or resize or evacuation) that is in the post-migrating state (finish_resize has not yet been executed) and the instance has CPU pinning.
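To make the "stops processing the rest of the instances" part concrete, here is a toy Python model, not nova code; account_instances and PinningConflict are hypothetical names. It assumes per-instance accounting runs in a plain loop with no per-instance exception handling, so the first pinning conflict aborts the loop and every later instance stays unaccounted until a later periodic run succeeds.

```python
class PinningConflict(Exception):
    """Toy stand-in for nova.exception.CPUPinningInvalid (hypothetical)."""


def account_instances(instances, owners):
    """Toy periodic: record which instance owns each pinned pCPU.

    instances: list of (name, set_of_pcpus) tuples, processed in order.
    owners: dict {pcpu: instance_name}, updated in place.
    There is no per-instance try/except, so the first conflict aborts the
    loop and every instance after it is skipped.
    """
    for name, pcpus in instances:
        for cpu in sorted(pcpus):
            if cpu in owners:
                raise PinningConflict(
                    f"pCPU {cpu} of {name} already pinned by {owners[cpu]}")
            owners[cpu] = name
    return owners


owners = {}
try:
    account_instances([("inst2", {0}), ("inst1", {0}), ("inst3", {1})], owners)
except PinningConflict as exc:
    print(exc)   # pCPU 0 of inst1 already pinned by inst2
print(owners)    # {0: 'inst2'}; inst3's pCPU 1 was never accounted
```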

Reproduce:
* build a multinode environment with dedicated CPUs and CPU pinning configured
* configure update_available_resource to run frequently, just to make the race easier to reproduce (e.g. set [DEFAULT]update_resources_interval = 10)
* create inst1 on the first node and inst2 on the second node, each requesting one pinned CPU
* check that inst1 is pinned to the same pCPU id on node1 as inst2 is on node2
* slow down the processing of finish_resize messages in the system to make the race easier to reproduce (e.g. inject a sleep or load RabbitMQ)
* migrate inst1 to node2
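For the fourth step, one way to compare the pinning on the two nodes, assuming the libvirt driver and that you fetch each guest's domain XML with virsh dumpxml on its compute host, is to read the vcpupin elements under cputune. The helper below is a hypothetical sketch, not part of nova; parse_cpuset handles libvirt's range (0-3) and exclusion (^1) cpuset syntax.

```python
import xml.etree.ElementTree as ET


def parse_cpuset(spec):
    """Expand a libvirt cpuset string such as '0-3,^1,6' into a set of ints."""
    cpus, excluded = set(), set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        target = cpus
        if part.startswith("^"):
            # '^N' removes N from the set built from the other entries
            target, part = excluded, part[1:]
        if "-" in part:
            start, end = part.split("-")
            target.update(range(int(start), int(end) + 1))
        else:
            target.add(int(part))
    return cpus - excluded


def pinned_host_cpus(domain_xml):
    """Return {guest_vcpu: set_of_host_pcpus} from <cputune><vcpupin> elements."""
    root = ET.fromstring(domain_xml)
    pins = {}
    for pin in root.findall("./cputune/vcpupin"):
        pins[int(pin.get("vcpu"))] = parse_cpuset(pin.get("cpuset"))
    return pins
```

Running pinned_host_cpus() on inst1's XML from node1 and on inst2's XML from node2 should return the same host pCPU set (for example {0: {0}}) before you trigger the migration.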

If you manage to hit the case where the periodic runs on node2 just after the resize_claim of inst1 has finished but before the finish_resize RPC call of inst1 is processed (the migration context is not yet applied to the instance, and the migration is not in the finished state but in post-migrating), then you will see a CPU pinning conflict. This happens because the resource tracker already tracks the incoming instance [1] (the host and node are already set in resize_instance [2]), but the migration context is not yet applied to the instance (that only happens in finish_resize [3]), so instance.numa_topology still points to the source topology.
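The race described above can be replayed step by step with a second toy model. Everything here is hypothetical: Instance.pcpus stands in for instance.numa_topology, and the resize_instance and finish_resize functions only mimic the two effects described in this report (the node switches in resize_instance, the new topology is applied in finish_resize).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class MigrationContext:
    old_pcpus: Set[int]
    new_pcpus: Set[int]


@dataclass
class Instance:
    name: str
    node: str
    pcpus: Set[int]  # stands in for instance.numa_topology
    migration_context: Optional[MigrationContext] = None


def resize_instance(inst, dest, new_pcpus):
    # The node already points at the destination, but the instance topology
    # is left untouched; only a migration context is recorded.
    inst.migration_context = MigrationContext(set(inst.pcpus), set(new_pcpus))
    inst.node = dest


def finish_resize(inst):
    # Only now is the destination topology applied to the instance.
    inst.pcpus = set(inst.migration_context.new_pcpus)


def pcpus_tracked_on(node, instances):
    """What a naive periodic would try to pin for each instance on `node`."""
    return {i.name: i.pcpus for i in instances if i.node == node}


inst1 = Instance("inst1", "node1", {0})
inst2 = Instance("inst2", "node2", {0})
resize_instance(inst1, "node2", new_pcpus={1})
print(pcpus_tracked_on("node2", [inst1, inst2]))  # {'inst1': {0}, 'inst2': {0}}
finish_resize(inst1)
print(pcpus_tracked_on("node2", [inst1, inst2]))  # {'inst1': {1}, 'inst2': {0}}
```

In the window between the two calls both instances claim pCPU 0 on node2, which is exactly the conflict the periodic trips over.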

Reproduced both downstream on stable/victoria and upstream on the latest master in devstack.

2021-12-06 15:07:18,013 ERROR [nova.compute.manager] Error updating resources for node compute2.
Traceback (most recent call last):
  File "/root/rtox/nova/functional-py38/nova/compute/manager.py", line 10011, in _update_available_resource_for_node
    self.rt.update_available_resource(context, nodename,
  File "/root/rtox/nova/functional-py38/nova/compute/resource_tracker.py", line 895, in update_available_resource
    self._update_available_resource(context, resources, startup=startup)
  File "/root/rtox/nova/functional-py38/.tox/functional-py38/lib/python3.8/site-packages/oslo_concurrency/lockutils.py", line 391, in inner
    return f(*args, **kwargs)
  File "/root/rtox/nova/functional-py38/nova/compute/resource_tracker.py", line 936, in _update_available_resource
    instance_by_uuid = self._update_usage_from_instances(
  File "/root/rtox/nova/functional-py38/nova/compute/resource_tracker.py", line 1500, in _update_usage_from_instances
    self._update_usage_from_instance(context, instance, nodename)
  File "/root/rtox/nova/functional-py38/nova/compute/resource_tracker.py", line 1463, in _update_usage_from_instance
    self._update_usage(self._get_usage_dict(instance, instance),
  File "/root/rtox/nova/functional-py38/nova/compute/resource_tracker.py", line 1268, in _update_usage
    cn.numa_topology = hardware.numa_usage_from_instance_numa(
  File "/root/rtox/nova/functional-py38/nova/virt/hardware.py", line 2382, in numa_usage_from_instance_numa
    new_cell.pin_cpus(pinned_cpus)
  File "/root/rtox/nova/functional-py38/nova/objects/numa.py", line 95, in pin_cpus
    raise exception.CPUPinningInvalid(requested=list(cpus),
nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free CPU set [1]

[1] https://github.com/openstack/nova/blob/7670303aabe16d1d7c25e411d7bd413aee7fdcf3/nova/compute/resource_tracker.py#L928-L929
[2] https://github.com/openstack/nova/blob/7670303aabe16d1d7c25e411d7bd413aee7fdcf3/nova/compute/manager.py#L5639-L5653
[3] https://github.com/openstack/nova/blob/7670303aabe16d1d7c25e411d7bd413aee7fdcf3/nova/compute/manager.py#L5780
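The last line of the traceback can be read with a simplified model of the destination NUMA cell. ToyNUMACell is hypothetical and only mirrors the subset check that the traceback shows in pin_cpus; it assumes node2 has two dedicated pCPUs, 0 and 1, in one cell, with inst2 already pinned to pCPU 0.

```python
class CPUPinningInvalid(Exception):
    """Toy stand-in for nova.exception.CPUPinningInvalid."""


class ToyNUMACell:
    """Hypothetical, simplified model of a host NUMA cell's pinning state."""

    def __init__(self, pcpuset):
        self.pcpuset = set(pcpuset)  # dedicated pCPUs in this cell
        self.pinned_cpus = set()     # pCPUs already pinned by instances

    @property
    def free_pcpus(self):
        return self.pcpuset - self.pinned_cpus

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # Pinning is only valid if every requested pCPU is still free.
        if not cpus <= self.free_pcpus:
            raise CPUPinningInvalid(
                "CPU set to pin %s must be a subset of free CPU set %s"
                % (sorted(cpus), sorted(self.free_pcpus)))
        self.pinned_cpus |= cpus


cell = ToyNUMACell({0, 1})  # node2 with two dedicated pCPUs
cell.pin_cpus({0})          # inst2, legitimately pinned to pCPU 0
try:
    cell.pin_cpus({0})      # inst1, still carrying its node1 topology
except CPUPinningInvalid as exc:
    print(exc)  # CPU set to pin [0] must be a subset of free CPU set [1]
```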

tags: added: numa
tags: added: compute resource-tracker
tags: added: resize
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/820540

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/820549

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/820550

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/820553

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/820554

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/820555

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/820558

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/820559

Changed in nova:
importance: Undecided → Medium
melanie witt (melwitt) wrote :

It looks like this might be a duplicate of:

https://bugs.launchpad.net/nova/+bug/1952915

?

Balazs Gibizer (balazs-gibizer) wrote :

I am marking this as a duplicate of https://bugs.launchpad.net/nova/+bug/1952915

While this bug reports an error in the periodic task and that bug reports one in a resize revert, the root cause of the two bugs is the same, and both reporters independently reached the same conclusion about it.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820540
Committed: https://opendev.org/openstack/nova/commit/c59224d715a21998f40f72cf4e37efdc990e4d7e
Submitter: "Zuul (22348)"
Branch: master

commit c59224d715a21998f40f72cf4e37efdc990e4d7e
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Reproduce bug 1953359

    This patch adds a functional test that reproduces a race between
    an incoming migration and the update_available_resource periodic.

    Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
    Related-Bug: #1953359

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/820856

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/820859

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Balazs Gibizer <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/820856
Reason: abandon it for now

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820859
Committed: https://opendev.org/openstack/nova/commit/9f296d775d8f58fcbd03393c81a023268c7071cb
Submitter: "Zuul (22348)"
Branch: master

commit 9f296d775d8f58fcbd03393c81a023268c7071cb
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Extend the reproducer for 1953359 and 1952915

    This patch extends the original reproducer
    I4be429c56aaa15ee12f448978c38214e741eae63 to cover
    bug 1952915 as well, as they have a common root cause.

    Change-Id: I57982131768d87e067d1413012b96f1baa68052b
    Related-Bug: #1953359
    Related-Bug: #1952915

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820549
Committed: https://opendev.org/openstack/nova/commit/32c1044d86a8d02712c8e3abdf8b3e4cff234a9c
Submitter: "Zuul (22348)"
Branch: master

commit 32c1044d86a8d02712c8e3abdf8b3e4cff234a9c
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 17:06:51 2021 +0100

    [rt] Apply migration context for incoming migrations

    There is a race condition between an incoming resize and an
    update_available_resource periodic in the resource tracker. The race
    window starts when the resize_instance RPC finishes and ends when the
    finish_resize compute RPC finally applies the migration context on the
    instance.

    In the race window, if the update_available_resource periodic is run on
    the destination node, then it will see the instance as being tracked on
    this host as the instance.node is already pointing to the dest. But the
    instance.numa_topology still points to the source host topology as the
    migration context is not applied yet. This leads to a CPU pinning error
    if the source topology does not fit the dest topology. It also stops the
    periodic task and leaves the tracker in an inconsistent state. The
    inconsistent state is only cleaned up after the periodic runs outside of
    the race window.

    This patch applies the migration context temporarily to the specific
    instances during the periodic to keep resource accounting correct.

    Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
    Closes-Bug: #1953359
    Closes-Bug: #1952915
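The idea of the fix, applying the migration context only for the duration of the periodic's accounting, can be sketched with a context manager. This is a hypothetical toy, not the merged nova change: temporarily_applied, usage_on_dest and the incoming flag are illustrative names, and how the real patch selects the affected instances is not modeled here.

```python
import contextlib
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class MigrationContext:
    new_pcpus: Set[int]


@dataclass
class Instance:
    name: str
    pcpus: Set[int]  # stands in for instance.numa_topology
    migration_context: Optional[MigrationContext] = None
    incoming: bool = False  # hypothetical: this host is the migration dest


@contextlib.contextmanager
def temporarily_applied(inst):
    """Swap in the destination topology only for the body of the with-block."""
    if not (inst.incoming and inst.migration_context):
        yield inst
        return
    saved = inst.pcpus
    inst.pcpus = set(inst.migration_context.new_pcpus)
    try:
        yield inst
    finally:
        inst.pcpus = saved  # restore: the accounting step does not persist it


def usage_on_dest(instances):
    """Toy periodic on the destination: account each instance's pCPUs."""
    usage = {}
    for inst in instances:
        with temporarily_applied(inst):
            usage[inst.name] = set(inst.pcpus)
    return usage


inst1 = Instance("inst1", {0}, MigrationContext({1}), incoming=True)
inst2 = Instance("inst2", {0})
print(usage_on_dest([inst1, inst2]))  # {'inst1': {1}, 'inst2': {0}}
print(inst1.pcpus)                    # {0}
```

In this sketch, restoring the original topology in the finally clause keeps the accounting step from changing the instance itself, so finish_resize remains the step that applies the new topology.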

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/821941

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/821943

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/822047

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/822048

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/822050

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820550
Committed: https://opendev.org/openstack/nova/commit/f0a6d946aaa6c30f826cfced75c2fb06fdb379a8
Submitter: "Zuul (22348)"
Branch: stable/xena

commit f0a6d946aaa6c30f826cfced75c2fb06fdb379a8
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Reproduce bug 1953359

    This patch adds a functional test that reproduces a race between
    an incoming migration and the update_available_resource periodic.

    Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
    Related-Bug: #1953359
    (cherry picked from commit c59224d715a21998f40f72cf4e37efdc990e4d7e)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/821941
Committed: https://opendev.org/openstack/nova/commit/0411962938ae1de39f8dccb03efe4567f82ad671
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 0411962938ae1de39f8dccb03efe4567f82ad671
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Extend the reproducer for 1953359 and 1952915

    This patch extends the original reproducer
    I4be429c56aaa15ee12f448978c38214e741eae63 to cover
    bug 1952915 as well, as they have a common root cause.

    Change-Id: I57982131768d87e067d1413012b96f1baa68052b
    Related-Bug: #1953359
    Related-Bug: #1952915
    (cherry picked from commit 9f296d775d8f58fcbd03393c81a023268c7071cb)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820553
Committed: https://opendev.org/openstack/nova/commit/1235dc324ebc1c6ac6dc94da0f45ffffcc546d2c
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 1235dc324ebc1c6ac6dc94da0f45ffffcc546d2c
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 17:06:51 2021 +0100

    [rt] Apply migration context for incoming migrations

    There is a race condition between an incoming resize and an
    update_available_resource periodic in the resource tracker. The race
    window starts when the resize_instance RPC finishes and ends when the
    finish_resize compute RPC finally applies the migration context on the
    instance.

    In the race window, if the update_available_resource periodic is run on
    the destination node, then it will see the instance as being tracked on
    this host as the instance.node is already pointing to the dest. But the
    instance.numa_topology still points to the source host topology as the
    migration context is not applied yet. This leads to a CPU pinning error
    if the source topology does not fit the dest topology. It also stops the
    periodic task and leaves the tracker in an inconsistent state. The
    inconsistent state is only cleaned up after the periodic runs outside of
    the race window.

    This patch applies the migration context temporarily to the specific
    instances during the periodic to keep resource accounting correct.

    Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
    Closes-Bug: #1953359
    Closes-Bug: #1952915
    (cherry picked from commit 32c1044d86a8d02712c8e3abdf8b3e4cff234a9c)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820554
Committed: https://opendev.org/openstack/nova/commit/d8859e4f95f5abb20c844d914f2716cba047630e
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit d8859e4f95f5abb20c844d914f2716cba047630e
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Reproduce bug 1953359

    This patch adds a functional test that reproduces a race between
    an incoming migration and the update_available_resource periodic.

    Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
    Related-Bug: #1953359
    (cherry picked from commit c59224d715a21998f40f72cf4e37efdc990e4d7e)
    (cherry picked from commit f0a6d946aaa6c30f826cfced75c2fb06fdb379a8)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/821943
Committed: https://opendev.org/openstack/nova/commit/94f17be190cce060ba8afcafbade4247b27b86f0
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 94f17be190cce060ba8afcafbade4247b27b86f0
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Extend the reproducer for 1953359 and 1952915

    This patch extends the original reproducer
    I4be429c56aaa15ee12f448978c38214e741eae63 to cover
    bug 1952915 as well, as they have a common root cause.

    Change-Id: I57982131768d87e067d1413012b96f1baa68052b
    Related-Bug: #1953359
    Related-Bug: #1952915
    (cherry picked from commit 9f296d775d8f58fcbd03393c81a023268c7071cb)
    (cherry picked from commit 0411962938ae1de39f8dccb03efe4567f82ad671)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820555
Committed: https://opendev.org/openstack/nova/commit/5f2f283a75243d2e2629d3c5f7e5ef4b3994972d
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 5f2f283a75243d2e2629d3c5f7e5ef4b3994972d
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 17:06:51 2021 +0100

    [rt] Apply migration context for incoming migrations

    There is a race condition between an incoming resize and an
    update_available_resource periodic in the resource tracker. The race
    window starts when the resize_instance RPC finishes and ends when the
    finish_resize compute RPC finally applies the migration context on the
    instance.

    In the race window, if the update_available_resource periodic is run on
    the destination node, then it will see the instance as being tracked on
    this host as the instance.node is already pointing to the dest. But the
    instance.numa_topology still points to the source host topology as the
    migration context is not applied yet. This leads to a CPU pinning error
    if the source topology does not fit the dest topology. It also stops the
    periodic task and leaves the tracker in an inconsistent state. The
    inconsistent state is only cleaned up after the periodic runs outside of
    the race window.

    This patch applies the migration context temporarily to the specific
    instances during the periodic to keep resource accounting correct.

    Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
    Closes-Bug: #1953359
    Closes-Bug: #1952915
    (cherry picked from commit 32c1044d86a8d02712c8e3abdf8b3e4cff234a9c)
    (cherry picked from commit 1235dc324ebc1c6ac6dc94da0f45ffffcc546d2c)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820558
Committed: https://opendev.org/openstack/nova/commit/e549fec76fd2015e6e21ee5138bf06142a71e71a
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit e549fec76fd2015e6e21ee5138bf06142a71e71a
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Reproduce bug 1953359

    This patch adds a functional test that reproduces a race between
    an incoming migration and the update_available_resource periodic.

    Fixes:
        - Added more memory to mock 'host_info', since the default would not
          fit the instance. The default was changed in later releases.

    Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
    Related-Bug: #1953359
    (cherry picked from commit c59224d715a21998f40f72cf4e37efdc990e4d7e)
    (cherry picked from commit f0a6d946aaa6c30f826cfced75c2fb06fdb379a8)
    (cherry picked from commit d8859e4f95f5abb20c844d914f2716cba047630e)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820856
Committed: https://opendev.org/openstack/nova/commit/8d4487465b60cd165dc76dea5a9fdb3c4dbf5740
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 8d4487465b60cd165dc76dea5a9fdb3c4dbf5740
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Extend the reproducer for 1953359 and 1952915

    This patch extends the original reproducer
    I4be429c56aaa15ee12f448978c38214e741eae63 to cover
    bug 1952915 as well, as they have a common root cause.

    Change-Id: I57982131768d87e067d1413012b96f1baa68052b
    Related-Bug: #1953359
    Related-Bug: #1952915
    (cherry picked from commit 9f296d775d8f58fcbd03393c81a023268c7071cb)
    (cherry picked from commit 0411962938ae1de39f8dccb03efe4567f82ad671)
    (cherry picked from commit 94f17be190cce060ba8afcafbade4247b27b86f0)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820559
Committed: https://opendev.org/openstack/nova/commit/d54bd316b331d439a26a7318ca68cab5f6280ab2
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit d54bd316b331d439a26a7318ca68cab5f6280ab2
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 17:06:51 2021 +0100

    [rt] Apply migration context for incoming migrations

    There is a race condition between an incoming resize and an
    update_available_resource periodic in the resource tracker. The race
    window starts when the resize_instance RPC finishes and ends when the
    finish_resize compute RPC finally applies the migration context on the
    instance.

    In the race window, if the update_available_resource periodic is run on
    the destination node, then it will see the instance as being tracked on
    this host as the instance.node is already pointing to the dest. But the
    instance.numa_topology still points to the source host topology as the
    migration context is not applied yet. This leads to a CPU pinning error
    if the source topology does not fit the dest topology. It also stops the
    periodic task and leaves the tracker in an inconsistent state. The
    inconsistent state is only cleaned up after the periodic runs outside of
    the race window.

    This patch applies the migration context temporarily to the specific
    instances during the periodic to keep resource accounting correct.

    Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
    Closes-Bug: #1953359
    Closes-Bug: #1952915
    (cherry picked from commit 32c1044d86a8d02712c8e3abdf8b3e4cff234a9c)
    (cherry picked from commit 1235dc324ebc1c6ac6dc94da0f45ffffcc546d2c)
    (cherry picked from commit 5f2f283a75243d2e2629d3c5f7e5ef4b3994972d)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 22.4.0

This issue was fixed in the openstack/nova 22.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.2.0

This issue was fixed in the openstack/nova 23.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.1.0

This issue was fixed in the openstack/nova 24.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.0.0.0rc1

This issue was fixed in the openstack/nova 25.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/839353

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/839354

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/839355

johjuhyun (juhyun-joh) wrote :

Hello. Do you have any plans to backport this to the rocky branch?

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/822047
Committed: https://opendev.org/openstack/nova/commit/c92e7821e3b97c8469fc2a68621428549d36d755
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit c92e7821e3b97c8469fc2a68621428549d36d755
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Reproduce bug 1953359

    This patch adds a functional test that reproduces a race between
    an incoming migration and the update_available_resource periodic.

    Conflicts: fixed a conflict in test_numa_server to add only the test
    case for 1953359

    Fixes:
        - Changed 'start_compute' call to 'start_computes', since the former
          is not present in Ussuri
        - Added more memory to mock 'host_info', since the default would not
          fit the instance. The default was changed in later releases.
        - Bumped the API version from 2.0 to 2.1 in the test, since
          microversion 2.47 is required for creating an instance
          on a specific host and 2.0 does not support microversions. This
          was not needed for later releases, because the API version was
          bumped by some changes made in [1]
        - Reset the original microversion in 'create_server' after the POST
          request, so that subsequent calls are not affected

    [1] Later change that bumps API version on parent classes
    https://review.opendev.org/c/openstack/nova/+/741282

    Co-Authored-By: Gabriel Silva Trevisan <email address hidden>

    Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
    Related-Bug: #1953359
    (cherry picked from commit c59224d715a21998f40f72cf4e37efdc990e4d7e)
    (cherry picked from commit f0a6d946aaa6c30f826cfced75c2fb06fdb379a8)
    (cherry picked from commit d8859e4f95f5abb20c844d914f2716cba047630e)
    (cherry picked from commit e549fec76fd2015e6e21ee5138bf06142a71e71a)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/822048
Committed: https://opendev.org/openstack/nova/commit/9b8e5cec303a621824366e1794665d6b849fefad
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 9b8e5cec303a621824366e1794665d6b849fefad
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 16:36:41 2021 +0100

    Extend the reproducer for 1953359 and 1952915

    This patch extends the original reproducer
    I4be429c56aaa15ee12f448978c38214e741eae63 to cover
    bug 1952915 as well, as they have a common root cause.

    Change-Id: I57982131768d87e067d1413012b96f1baa68052b
    Related-Bug: #1953359
    Related-Bug: #1952915
    (cherry picked from commit 9f296d775d8f58fcbd03393c81a023268c7071cb)
    (cherry picked from commit 0411962938ae1de39f8dccb03efe4567f82ad671)
    (cherry picked from commit 94f17be190cce060ba8afcafbade4247b27b86f0)
    (cherry picked from commit 8d4487465b60cd165dc76dea5a9fdb3c4dbf5740)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/822050
Committed: https://opendev.org/openstack/nova/commit/1d0b7051da430ed00ae49901a32ec6af46c1a64e
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 1d0b7051da430ed00ae49901a32ec6af46c1a64e
Author: Balazs Gibizer <email address hidden>
Date: Mon Dec 6 17:06:51 2021 +0100

    [rt] Apply migration context for incoming migrations

    There is a race condition between an incoming resize and an
    update_available_resource periodic in the resource tracker. The race
    window starts when the resize_instance RPC finishes and ends when the
    finish_resize compute RPC finally applies the migration context on the
    instance.

    In the race window, if the update_available_resource periodic is run on
    the destination node, then it will see the instance as being tracked on
    this host as the instance.node is already pointing to the dest. But the
    instance.numa_topology still points to the source host topology as the
    migration context is not applied yet. This leads to a CPU pinning error
    if the source topology does not fit the dest topology. It also stops the
    periodic task and leaves the tracker in an inconsistent state. The
    inconsistent state is only cleaned up after the periodic runs outside of
    the race window.

    This patch applies the migration context temporarily to the specific
    instances during the periodic to keep resource accounting correct.

    Conflicts: on resource_tracker: changed
    'MigrationList.get_in_progress_and_error' call back to
    'MigrationList.get_in_progress_by_host_and_node', since this change was
    only added by 255b3f2f918843ca5dd9b99e109ecd2189b6b749, and is not
    present in stable/ussuri.

    Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
    Closes-Bug: #1953359
    Closes-Bug: #1952915
    (cherry picked from commit 32c1044d86a8d02712c8e3abdf8b3e4cff234a9c)
    (cherry picked from commit 1235dc324ebc1c6ac6dc94da0f45ffffcc546d2c)
    (cherry picked from commit 5f2f283a75243d2e2629d3c5f7e5ef4b3994972d)
    (cherry picked from commit d54bd316b331d439a26a7318ca68cab5f6280ab2)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/839353
Reason: The stable/train branch of the nova projects has been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/839354
Reason: The stable/train branch of the nova projects has been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/839355
Reason: The stable/train branch of the nova projects has been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova ussuri-eol

This issue was fixed in the openstack/nova ussuri-eol release.
