nova needs to disallow resource consumption changes on image rebuild

Bug #1763766 reported by Chris Friesen on 2018-04-13
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
sean mooney
Ocata
Medium
Unassigned
Pike
Medium
Unassigned
Queens
Medium
sean mooney

Bug Description

When doing a rebuild the assumption throughout the code is that we are not changing the resources consumed by the guest (that is what a resize is for). The complication here is that there are a number of image properties which might affect the instance resource consumption (in conjunction with a suitable flavor):

hw_numa_nodes=X
hw_numa_cpus.X=Y
hw_numa_mem.X=Y
hw_mem_page_size=X
hw_cpu_thread_policy=X
hw_cpu_policy=X

Due to the assumptions made in the rest of the code, we need to add a check to ensure that on a rebuild the above image properties do not differ between the old and new images.

While they might look suspicious, I think that the following image properties *should* be allowed to differ, since they only affect the topology seen by the guest:

hw_cpu_threads
hw_cpu_cores
hw_cpu_sockets
hw_cpu_max_threads
hw_cpu_max_cores
hw_cpu_max_sockets
hw_cpu_realtime_mask

Revision history for this message
Matt Riedemann (mriedem) wrote :

For background, this was discussed in bug 1750623 and deemed to be a separate bug which requires its own fix. As far as I can tell, this has been a problem since NUMA support was added, which I think was Juno.

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

Some related discussion in the mailing list here:

http://lists.openstack.org/pipermail/openstack-dev/2018-April/129726.html

Revision history for this message
Hu Zhou (hu.zhou) wrote :

I am looking for a way to tell NUMAtopology filter's hardware.numa_fit_instance_to_host to skip tune above host topology in rebuild case.

Furthermore:
What's point of running all filters during rebuild with new image (not evacuation to new host)?
Except instance side changes are allowed e.g. guest CPU/NUMA topology, all image and flavor metadata changes should not be allowed when rebuild. Only image is replaced with whatever partitions and content inside.

Revision history for this message
Chris Friesen (cbf123) wrote :

Currently we allow rebuilding to a new image with different image properties, so we need to ensure that the new image properties take effect on a rebuild. For example, the new image could interact with the AggregateImagePropertiesIsolation or ImagePropertiesFilter scheduler filters so we need to ensure that these filters get run.

I'm not sure we need to skip hardware.numa_fit_instance_to_host() since I think it's okay if the topology as seen by the guest is changed over a rebuild.

The important thing is that the resource consumption as seen by nova and reported in placement can't change. So we need to disallow rebuilds to an image with properties that are incompatible with the current resource consumption.

summary: - nova needs to disallow topology changes on image rebuild
+ nova needs to disallow resource consumption changes on image rebuild
Revision history for this message
Hu Zhou (hu.zhou) wrote :

I hit a situation when free vcpus on host <= instance flavor used vcpus (CPU pinning requested). The rebuild such instance on the same host (e.g. Guest OS Upgrade with new image) will fail.
Even with 3 other bugs fixes:
    https://bugs.launchpad.net/nova/+bug/1750623
    https://bugs.launchpad.net/nova/+bug/1750618
    https://bugs.launchpad.net/nova/+bug/1772523
NUMAtopology filter's hardware.numa_fit_instance_to_host will will call _numa_fit_instance_cell_with_pinning() which compares host_cell.avail_cpus < required_cpus, then skip the host selection with debug message "Not enough available CPUs to schedule instance. Oversubscription is not possible ...".
With educated guess, memory fit may also fail.

What you pointed out is truly valid.
    "The important thing is that the resource consumption as seen by nova and reported in placement can't
    change. So we need to disallow rebuilds to an image with properties that are incompatible with the
    current resource consumption."

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/687957

Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/687957
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6f5358ac1992b17b7f3f99d9a32290e0d4740dae
Submitter: Zuul
Branch: master

commit 6f5358ac1992b17b7f3f99d9a32290e0d4740dae
Author: Sean Mooney <email address hidden>
Date: Thu Oct 10 18:00:07 2019 +0100

    Block rebuild when NUMA topology changed

    If the image change during a rebuild it's possible for the request
    NUMA topology to change. As a rebuild uses a noop claim in the
    resource tracker the NUMA topology will not be updated as part of
    a rebuild.

    If the NUMA constraints do not change, a rebuild will continue as normal.
    If the new constraints conflict with the existing NUMA constraints of the
    instance the rebuild will be rejected without altering the status of the
    instance.

    This change introduces an API check to block rebuild when the NUMA
    requirements for the new image do not match the existing NUMA constraints.
    This is in line with the previous check introduced to prevent the rebuild of
    volume-backed instances which similarly are not supported.

    This change adds functional tests to assert the expected behaviour of
    rebuilding NUMA instances with new images. This change also asserts that
    in place rebuilds of numa instances is currently not supported.

    Closes-Bug: #1763766
    Partial-implements: blueprint inplace-rebuild-of-numa-instances
    Change-Id: I0322d872bdff68936033a6f5a54e8296a6fb3434

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/698260

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/698530

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/700127

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/698260
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f6060ab6b54261ff50b8068732f6e509619d713e
Submitter: Zuul
Branch: master

commit f6060ab6b54261ff50b8068732f6e509619d713e
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000

    FUP for in-place numa rebuild

    This patch addresses a number of typos and minor
    issues raised during review of [1][2]. A summary
    of the changes are corrections to typos in comments,
    a correction to the exception message, an update to
    the release note and the addition of debug logging.

    [1] I0322d872bdff68936033a6f5a54e8296a6fb3434
    [2] I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132

    Related-Bug: #1804502
    Related-Bug: #1763766

    Change-Id: I8975e524cd5a9c7dfb065bb2dc8ceb03f1b89e7b

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/698530
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=745de99063bf77704a7f0610fe9e3647257eaa50
Submitter: Zuul
Branch: stable/train

commit 745de99063bf77704a7f0610fe9e3647257eaa50
Author: Sean Mooney <email address hidden>
Date: Thu Oct 10 18:00:07 2019 +0100

    Block rebuild when NUMA topology changed

    If the image change during a rebuild it's possible for the request
    NUMA topology to change. As a rebuild uses a noop claim in the
    resource tracker the NUMA topology will not be updated as part of
    a rebuild.

    If the NUMA constraints do not change, a rebuild will continue as normal.
    If the new constraints conflict with the existing NUMA constraints of the
    instance the rebuild will be rejected without altering the status of the
    instance.

    This change introduces an API check to block rebuild when the NUMA
    requirements for the new image do not match the existing NUMA constraints.
    This is in line with the previous check introduced to prevent the rebuild of
    volume-backed instances which similarly are not supported.

    This change adds functional tests to assert the expected behaviour of
    rebuilding NUMA instances with new images. This change also asserts that
    in place rebuilds of numa instances is currently not supported.

    Modifications:
     nova/tests/functional/libvirt/test_numa_servers.py

    NOTE(stephenfin): The new tests added in 'test_numa_servers.py' had to
    be modified to use the old-style '_wait_for_state_change' function,
    since change I80cdc0a33ec27b1389130c22f9c3a8ff69f6b1a0 isn't present on
    'stable/train' and it's too large and invasive to justify backporting.
    In addition, a 'super()' call had to be updated to use the Python 2
    compatible 'super(ClassName, self)' style.

    Closes-Bug: #1763766
    Partial-implements: blueprint inplace-rebuild-of-numa-instances
    Change-Id: I0322d872bdff68936033a6f5a54e8296a6fb3434
    (cherry picked from commit 6f5358ac1992b17b7f3f99d9a32290e0d4740dae)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/700127
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=48bb9a9663374936221144bb6a24688128a51146
Submitter: Zuul
Branch: stable/train

commit 48bb9a9663374936221144bb6a24688128a51146
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000

    FUP for in-place numa rebuild

    This patch addresses a number of typos and minor
    issues raised during review of [1][2]. A summary
    of the changes are corrections to typos in comments,
    a correction to the exception message, an update to
    the release note and the addition of debug logging.

    [1] I0322d872bdff68936033a6f5a54e8296a6fb3434
    [2] I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132

    Related-Bug: #1804502
    Related-Bug: #1763766

    Change-Id: I8975e524cd5a9c7dfb065bb2dc8ceb03f1b89e7b
    (cherry picked from commit f6060ab6b54261ff50b8068732f6e509619d713e)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/702972

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/702974

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/703116

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/703118

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/703140

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/703142

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/702972
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9f57d16f38f0290718e3ba78393d064746f7e527
Submitter: Zuul
Branch: stable/stein

commit 9f57d16f38f0290718e3ba78393d064746f7e527
Author: Sean Mooney <email address hidden>
Date: Thu Oct 10 18:00:07 2019 +0100

    Block rebuild when NUMA topology changed

    If the image change during a rebuild it's possible for the request
    NUMA topology to change. As a rebuild uses a noop claim in the
    resource tracker the NUMA topology will not be updated as part of
    a rebuild.

    If the NUMA constraints do not change, a rebuild will continue as normal.
    If the new constraints conflict with the existing NUMA constraints of the
    instance the rebuild will be rejected without altering the status of the
    instance.

    This change introduces an API check to block rebuild when the NUMA
    requirements for the new image do not match the existing NUMA constraints.
    This is in line with the previous check introduced to prevent the rebuild of
    volume-backed instances which similarly are not supported.

    This change adds functional tests to assert the expected behaviour of
    rebuilding NUMA instances with new images. This change also asserts that
    in place rebuilds of numa instances is currently not supported.

    Conflicts:
        nova/api/openstack/compute/servers.py
        nova/tests/functional/libvirt/test_numa_servers.py

    NOTE(sean-k-mooney): due to the lack of
    Ifcda7336d56c9b623720ee018ec5697740986273 the Fake HostInfo objects
    created in the functional tests were updated to use the NUMAHostInfo
    class. Prior to Ifcda7336d56c9b623720ee018ec5697740986273 the
    Fake HostInfo class did not construct a numa topology from the kwargs
    and instead only set a numa topology if it was passed in during
    construction. In older release the initialization of the numa
    topology from kwargs was a feature of NUMAHostInfo.
    The servers.py conflicts are due to a lack of
    I5576fa2a67d2771614266022428b4a95487ab6d5 in Stein.

    Closes-Bug: #1763766
    Partial-implements: blueprint inplace-rebuild-of-numa-instances
    Change-Id: I0322d872bdff68936033a6f5a54e8296a6fb3434
    (cherry picked from commit 6f5358ac1992b17b7f3f99d9a32290e0d4740dae)
    (cherry picked from commit 745de99063bf77704a7f0610fe9e3647257eaa50)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/702974
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8346c527b379395851a9de063b4978b489076bf6
Submitter: Zuul
Branch: stable/stein

commit 8346c527b379395851a9de063b4978b489076bf6
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000

    FUP for in-place numa rebuild

    This patch addresses a number of typos and minor
    issues raised during review of [1][2]. A summary
    of the changes are corrections to typos in comments,
    a correction to the exception message, an update to
    the release note and the addition of debug logging.

    [1] I0322d872bdff68936033a6f5a54e8296a6fb3434
    [2] I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132

    Related-Bug: #1804502
    Related-Bug: #1763766

    Conflicts:
        nova/tests/functional/libvirt/test_numa_servers.py
    NOTE(sean-k-mooney): conflict was due to the use of
    NUMAHostInfo instead of HostInfo.

    Change-Id: I8975e524cd5a9c7dfb065bb2dc8ceb03f1b89e7b
    (cherry picked from commit f6060ab6b54261ff50b8068732f6e509619d713e)
    (cherry picked from commit 48bb9a9663374936221144bb6a24688128a51146)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.1.0

This issue was fixed in the openstack/nova 20.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.1.0

This issue was fixed in the openstack/nova 19.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/703116
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=643405b3a751f7547417e48e00db05d9fe8e99c4
Submitter: Zuul
Branch: stable/rocky

commit 643405b3a751f7547417e48e00db05d9fe8e99c4
Author: Sean Mooney <email address hidden>
Date: Thu Oct 10 18:00:07 2019 +0100

    Block rebuild when NUMA topology changed

    If the image change during a rebuild it's possible for the request
    NUMA topology to change. As a rebuild uses a noop claim in the
    resource tracker the NUMA topology will not be updated as part of
    a rebuild.

    If the NUMA constraints do not change, a rebuild will continue as normal.
    If the new constraints conflict with the existing NUMA constraints of the
    instance the rebuild will be rejected without altering the status of the
    instance.

    This change introduces an API check to block rebuild when the NUMA
    requirements for the new image do not match the existing NUMA constraints.
    This is in line with the previous check introduced to prevent the rebuild of
    volume-backed instances which similarly are not supported.

    This change adds functional tests to assert the expected behaviour of
    rebuilding NUMA instances with new images. This change also asserts that
    in place rebuilds of numa instances is currently not supported.

    Conflicts:
        nova/api/openstack/compute/servers.py
        nova/tests/functional/integrated_helpers.py
        nova/tests/functional/libvirt/test_numa_servers.py
        nova/tests/unit/compute/test_compute_api.py

    Modified:
        nova/tests/unit/api/openstack/compute/test_server_actions.py
        nova/tests/unit/compute/test_compute.py

    NOTE(sean-k-mooney): Due to the lack of changes
    I3b4c1153bebdeab2eb8bc2108aa177732f5c6a97 and
    I06fad233006c7bab14749a51ffa226c3801f951b,
    nova/api/openstack/compute/servers.py was modified to
    add exception handling for the 'ImageNUMATopologyRebuildConflict'
    on rebuild and 'nova/tests/functional/libvirt/test_numa_servers.py'
    was modified to mock the libvirt connection in the
    'NUMAServersRebuildTests.setUp' function.
    Due to a lack of I34ffaf285718059b55f90e812b57f1e11d566c6f
    'nova/tests/unit/api/openstack/compute/test_server_actions.py' and
    'nova/tests/unit/compute/test_compute.py' were updated to use valid
    image UUIDs.

    Closes-Bug: #1763766
    Partial-implements: blueprint inplace-rebuild-of-numa-instances
    Change-Id: I0322d872bdff68936033a6f5a54e8296a6fb3434
    (cherry picked from commit 6f5358ac1992b17b7f3f99d9a32290e0d4740dae)
    (cherry picked from commit 745de99063bf77704a7f0610fe9e3647257eaa50)
    (cherry picked from commit 9f57d16f38f0290718e3ba78393d064746f7e527)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/703118
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=84c63816602dcdf91885d20bb5d26cec336fb71e
Submitter: Zuul
Branch: stable/rocky

commit 84c63816602dcdf91885d20bb5d26cec336fb71e
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000

    FUP for in-place numa rebuild

    This patch addresses a number of typos and minor
    issues raised during review of [1][2]. A summary
    of the changes are corrections to typos in comments,
    a correction to the exception message, an update to
    the release note and the addition of debug logging.

    [1] I0322d872bdff68936033a6f5a54e8296a6fb3434
    [2] I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132

    Change-Id: I8975e524cd5a9c7dfb065bb2dc8ceb03f1b89e7b
    Related-Bug: #1804502
    Related-Bug: #1763766
    (cherry picked from commit f6060ab6b54261ff50b8068732f6e509619d713e)
    (cherry picked from commit 48bb9a9663374936221144bb6a24688128a51146)
    (cherry picked from commit 8346c527b379395851a9de063b4978b489076bf6)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers