Comment 15 for bug 1944759

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/810913
Committed: https://opendev.org/openstack/nova/commit/c8b04d183f560a616a79577c4d4ae9465d03e541
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit c8b04d183f560a616a79577c4d4ae9465d03e541
Author: Balazs Gibizer <email address hidden>
Date: Fri Sep 24 15:17:28 2021 +0200

    Store old_flavor already on source host during resize

    During resize, resize_instance() on the source host updates
    instance.host and instance.node to point to the destination host. This
    tells the source host's resource tracker that the allocation of this
    instance must no longer be tracked as an instance but as an outbound
    migration instead. However, to do that the source host's resource
    tracker needs instance.old_flavor. Unfortunately, instance.old_flavor
    is only set during finish_resize() on the destination host
    (resize_instance() casts to finish_resize()). So it is possible that a
    running resize_instance() sets instance.host to point to the
    destination and then, before finish_resize() can set old_flavor, an
    update_available_resources periodic runs on the source host. As a
    result, the allocation of this instance is not tracked as an instance,
    because instance.host points to the destination, but it is not tracked
    as a migration either, because instance.old_flavor is not yet set. So
    the allocation on the source host is simply dropped by the periodic
    job.

    When such a migration is confirmed, confirm_resize() tries to drop
    the same resource allocation again but fails because the pinned CPUs
    of the instance have already been freed.

    When such a migration is reverted instead, the revert succeeds, but
    the source host's resource allocation will not contain the allocation
    of the instance until the next update_available_resources periodic
    runs and corrects it.

    This does not affect resources tracked exclusively in placement (e.g.
    VCPU, MEMORY_MB, DISK_GB), but it does affect NUMA-related resources
    that are still tracked in the resource tracker (e.g. huge pages,
    pinned CPUs).

    This patch moves the setting of instance.old_flavor to the source
    node, into the same transaction that sets instance.host to point to
    the destination host, which closes the race condition.

    Change-Id: Ic0d6c59147abe5e094e8f13e0d3523b178daeba9
    Closes-Bug: #1944759
    (cherry picked from commit b841e553214be9a732703e2dfed6c97698ef9b71)
    (cherry picked from commit d4edcd62bae44c01885268a6cf7b7fae92617060)
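
The race described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not nova code: ToyInstance, ToySourceTracker, resize_racy, resize_fixed and run are hypothetical names invented for illustration, and the periodic is reduced to a full rebuild of a pinned-CPU map. It shows how switching the host without old_flavor lets the periodic drop the source-side usage, so a later confirm fails, and how setting both fields together avoids that.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyInstance:
    """Hypothetical stand-in for the instance record; not the nova object."""
    uuid: str
    host: str
    flavor: str
    old_flavor: Optional[str] = None


@dataclass
class ToySourceTracker:
    """Hypothetical model of the source host's pinned-CPU bookkeeping."""
    host: str
    pinned: dict = field(default_factory=dict)  # instance uuid -> set of CPU ids

    def periodic(self, instance, cpus):
        """Rebuild usage from scratch, the way a periodic audit would."""
        self.pinned.clear()
        if instance.host == self.host:
            # Tracked as a local instance.
            self.pinned[instance.uuid] = set(cpus)
        elif instance.old_flavor is not None:
            # Tracked as an outbound migration.
            self.pinned[instance.uuid] = set(cpus)
        # Otherwise neither branch matches and the usage is dropped.

    def confirm_resize(self, instance):
        """Free the source-side usage; fails if it was already dropped."""
        if instance.uuid not in self.pinned:
            raise RuntimeError("pinned CPUs already freed")
        del self.pinned[instance.uuid]


def resize_racy(instance, dest, new_flavor, tracker, cpus):
    # Source host: only the host is switched.
    instance.host = dest
    # The periodic fires in the window before the destination side runs.
    tracker.periodic(instance, cpus)
    # Destination host: old_flavor is set, but too late for the periodic.
    instance.old_flavor, instance.flavor = instance.flavor, new_flavor


def resize_fixed(instance, dest, new_flavor, tracker, cpus):
    # Source host: host and old_flavor change together in one step.
    instance.host, instance.old_flavor = dest, instance.flavor
    tracker.periodic(instance, cpus)
    # Destination host: only the new flavor is applied afterwards.
    instance.flavor = new_flavor


def run(resize):
    """Run one resize variant and report whether the confirm step succeeds."""
    inst = ToyInstance(uuid="inst-1", host="src", flavor="small")
    tracker = ToySourceTracker(host="src")
    tracker.periodic(inst, {2, 3})
    resize(inst, "dst", "large", tracker, {2, 3})
    try:
        tracker.confirm_resize(inst)
        return "confirmed"
    except RuntimeError as exc:
        return f"failed: {exc}"
```

In this model, run(resize_racy) reports the failed confirm, while run(resize_fixed) reports a successful one, because the periodic can classify the instance as an outbound migration as soon as the host changes.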