Rebuild server with NUMATopologyFilter enabled fails (in some cases)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | | Medium | sean mooney |
Queens | | Low | sean mooney |
Rocky | | Medium | Lee Yarwood |
Stein | | Medium | sean mooney |
Train | | Medium | sean mooney |
Bug Description
Description
===========
Server rebuild fails in the nova scheduler on the NUMATopologyFilter if the compute hosts do not have enough spare capacity, even though the running server's resources are already accounted for in that calculation.
To resolve the issue, NUMATopologyFilter needs a fix so that it does not perform its capacity check when the request is a rebuild.
In such a case, the server rebuild fails with a "No valid host found" error.
(Do not confuse the resize and rebuild functions.)
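To make the failure mode concrete, here is a toy model with made-up numbers (not nova code; the `Host` class and both functions are hypothetical): a host with 8 dedicated pCPUs, 6 of them in use, 4 of which belong to the instance being rebuilt. Evaluating the rebuild as a new spawn asks for 4 more free pCPUs and fails, even though the instance would simply keep the 4 it already holds.

```python
from dataclasses import dataclass


@dataclass
class Host:
    total_pcpus: int
    used_pcpus: int  # includes the pCPUs held by the instance being rebuilt


def passes_as_new_spawn(host: Host, requested_pcpus: int) -> bool:
    # Simplified pre-fix behaviour: the request is evaluated as if a second
    # copy of the instance had to fit next to the existing one.
    return host.total_pcpus - host.used_pcpus >= requested_pcpus


def passes_as_rebuild(host: Host, requested_pcpus: int, current_pcpus: int) -> bool:
    # A rebuild stays on the same host with the same flavor, so the pCPUs the
    # instance already holds count as available to it.
    free_for_instance = host.total_pcpus - host.used_pcpus + current_pcpus
    return free_for_instance >= requested_pcpus


host = Host(total_pcpus=8, used_pcpus=6)
print(passes_as_new_spawn(host, requested_pcpus=4))  # False -> "No valid host found"
print(passes_as_rebuild(host, requested_pcpus=4, current_pcpus=4))  # True
```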
Steps to reproduce
==================
1. Create a flavor that contains metadata pointing to a specific compute host (use a host aggregate with the same key:value metadata).
Make sure the flavor contains topology-related metadata:
hw:cpu_cores='1', hw:cpu_
2. Create a server on that compute host (preferably using a Heat stack).
3. (Try to) rebuild the server using a stack update.
4. The issue is reproduced.
Expected result
===============
Server in an active, running state (if the image was replaced in the rebuild command, then with a reference to the new image in the server details).
Actual result
=============
Server in an error state with a "No valid host found" error.
Message
No valid host was found. There are not enough hosts available.
Code
500
Details
File "/usr/lib/
Environment
===========
detected in Rocky release
KVM hypervisor
Ceph storage
Neutron networks
Logs & Configs
==============
in nova.conf:
enabled_
logs: tbd
Changed in nova: | |
assignee: | nobody → Inbar Stolberg (inbarsto) |
description: | updated |
tags: | added: numa scheduler |
description: | updated |
Changed in nova: | |
status: | New → Confirmed |
Changed in nova: | |
status: | Confirmed → In Progress |
Inbar Stolberg (inbarsto) wrote : | #2 |
suggested fix: https:/
sean mooney (sean-k-mooney) wrote : | #3 |
As per my comment https:/
we cannot skip validating the NUMA topology of a host on rebuild, as the image can alter the guest NUMA topology.
If we rebuild with the same image we skip going back to the scheduler, so the only time we go to the scheduler on rebuild is when the image changed, which means we cannot assume there is enough space on the current host.
As presented, this bug is invalid; however, if you can present a way to revalidate that the existing NUMA topology is still valid with the new image, instead of just skipping the check, that may be reasonable.
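The routing rule described in this comment can be sketched as follows. This is a minimal sketch; `RebuildRequest` and `needs_scheduler` are hypothetical names, not nova's API.

```python
from dataclasses import dataclass


@dataclass
class RebuildRequest:
    current_image_id: str
    new_image_id: str


def needs_scheduler(request: RebuildRequest) -> bool:
    # Same image: nothing that could change the guest NUMA topology has
    # changed, so the scheduler is not consulted at all.
    # Different image: the new image's properties may alter the guest NUMA
    # topology, so the current host has to be re-checked by the filters.
    return request.new_image_id != request.current_image_id


print(needs_scheduler(RebuildRequest("img-a", "img-a")))  # False
print(needs_scheduler(RebuildRequest("img-a", "img-b")))  # True
```

In other words, any rebuild that reaches the NUMATopologyFilter is, by construction, a rebuild with a different image.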
Changed in nova: | |
status: | In Progress → Invalid |
Inbar Stolberg (inbarsto) wrote : | #4 |
@sean-k-mooney the bug still exists, so please don't disqualify the bug just because you don't like the PR attached to it.
Also, the PR solves most of the issue without causing new issues. The only scenario it does not fix is the one you rightfully mentioned, but solving it would require an extremely large change, and that is not likely to happen any time soon.
Please reconsider the PR (as mentioned, it fixes some, but not all, of the cases).
Changed in nova: | |
status: | Invalid → In Progress |
Changed in nova: | |
status: | In Progress → Confirmed |
David Hill (david-hill-ubisoft) wrote : | #5 |
Couldn't we simply skip scheduling if the image's properties remained unchanged? Let's say I have a RHEL 7.5 image with given metadata and create a new RHEL 7.6 image containing the exact same metadata: why should we go through scheduling again? This is an issue for customers lacking resources...
OpenStack Infra (hudson-openstack) wrote : | #6 |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | Inbar Stolberg (inbarsto) → David Hill (david-hill-ubisoft) |
status: | Confirmed → In Progress |
sean mooney (sean-k-mooney) wrote : | #7 |
@Inbar Stolberg
I was not marking it as invalid because I did not like the proposed change.
I marked it as invalid because in-place rebuild for an instance with a NUMA topology has never
been supported, so this is a new feature, not a bug.
The NUMA topology filter works by delegating to the nova.virt.
a new CPU and memory assignment for an instance on a given host. If the hardware module
is able to calculate an assignment given the constraints of the image and flavor, then
the filter reports that the host passes.
The hardware module does not have the concept of a rebuild, so it always calculates the assignment
as if it was a new instance. Making the filter work in the in-place rebuild case would require
extending the hardware module so that it can revalidate the existing assignment in the resource tracker.
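The difference between calculating a fresh assignment and revalidating the existing one can be sketched per NUMA cell. This is a toy model under loud assumptions: a cell is reduced to a free-pCPU count, the guest is assumed to fit in a single cell, and the function names are hypothetical. nova's hardware module also handles memory, page sizes, thread siblings and multi-cell guests.

```python
from typing import Dict, Optional


def fit_new_instance(free_pcpus_per_cell: Dict[int, int], requested: int) -> Optional[int]:
    """Return the first NUMA cell that can host `requested` pCPUs, or None.

    Models the 'always treat it as a new instance' behaviour: the pCPUs the
    instance already holds are counted as used.
    """
    for cell_id, free in sorted(free_pcpus_per_cell.items()):
        if free >= requested:
            return cell_id
    return None


def revalidate_existing(
    free_pcpus_per_cell: Dict[int, int],
    requested: int,
    current_cell: int,
    current_pcpus: int,
) -> bool:
    """Check whether the instance's existing placement still satisfies the request.

    The pCPUs it already holds on `current_cell` are handed back before the
    check, so only a larger request would need extra capacity.
    """
    available = free_pcpus_per_cell.get(current_cell, 0) + current_pcpus
    return available >= requested


cells = {0: 1, 1: 2}  # free pCPUs, with the instance's own 4 pCPUs pinned on cell 0
print(fit_new_instance(cells, requested=4))  # None -> host rejected
print(revalidate_existing(cells, requested=4, current_cell=0, current_pcpus=4))  # True
```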
@David Hill
Ack, it is an issue, but it is something that has never been supported, so it's not a bug.
That said, your approach could work, but I have left a review pointing out that you are checking the wrong
image properties. If you generalise your approach to check all image properties that start with
hw_numa and hw_cpu, plus hw_mem_page_size, it might be workable.
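The generalised check suggested here can be sketched as a comparison restricted to NUMA-relevant image properties. This is a minimal sketch, assuming the prefixes named in the comment are the complete set that matters; the function names are hypothetical. Flavor extra specs are left out because the flavor does not change on rebuild.

```python
from typing import Dict, Mapping

# Prefixes and key taken from the review comment; treating them as the full
# set of NUMA-relevant image properties is an assumption of this sketch.
NUMA_PREFIXES = ("hw_numa", "hw_cpu")
NUMA_EXACT_KEYS = ("hw_mem_page_size",)


def numa_relevant(props: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the image properties that can influence the guest NUMA topology."""
    return {
        key: value
        for key, value in props.items()
        if key.startswith(NUMA_PREFIXES) or key in NUMA_EXACT_KEYS
    }


def numa_props_unchanged(old: Mapping[str, str], new: Mapping[str, str]) -> bool:
    # Only the NUMA-related subset is compared, so unrelated metadata such
    # as os_version may differ between the two images.
    return numa_relevant(old) == numa_relevant(new)


rhel75 = {"os_version": "7.5", "hw_cpu_policy": "dedicated", "hw_numa_nodes": "1"}
rhel76 = {"os_version": "7.6", "hw_cpu_policy": "dedicated", "hw_numa_nodes": "1"}
shared = {"os_version": "7.6", "hw_cpu_policy": "shared", "hw_numa_nodes": "1"}
print(numa_props_unchanged(rhel75, rhel76))  # True
print(numa_props_unchanged(rhel75, shared))  # False
```

Comparing by prefix rather than by an explicit key list means newly added hw_cpu_* or hw_numa_* properties are covered automatically, at the cost of possibly rejecting rebuilds for properties that do not actually affect placement.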
tags: | added: rebuild |
Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https:/
Reason: This looks abandoned so I'm going to abandon it. I didn't read all of the details, but ignoring this filter during rebuild if the image changes risks just pushing the failure down to the compute since we don't actually claim for rebuild in the compute which is bug 1763766.
OpenStack Infra (hudson-openstack) wrote : | #9 |
Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https:/
Changed in nova: | |
status: | In Progress → Confirmed |
Changed in nova: | |
assignee: | David Hill (david-hill-ubisoft) → nobody |
importance: | Undecided → Medium |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | nobody → sean mooney (sean-k-mooney) |
status: | Confirmed → In Progress |
Inbar Stolberg (inbarsto) wrote : | #11 |
Please see the comments on the PR: https:/
The PR contains the same logic as https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 3f9411071d4c1a0
Author: Sean Mooney <email address hidden>
Date: Mon Oct 21 16:17:17 2019 +0000
Disable NUMATopologyFilter on rebuild
This change leverages the new NUMA constraint checking added in
I0322d872bdff68
NUMATopolog
As the new behavior of rebuild enforces that no changes
to the NUMA constraints are allowed on rebuild, we no longer
need to execute the NUMATopologyFilter. Previously
the NUMATopologyFilter would process the rebuild request
as if it was a request to spawn a new instance as the
numa_
As such, prior to this change a rebuild would only succeed
if a host had enough additional capacity for a second instance
on the same host meeting the requirements of the new image and
existing flavor. This behavior was incorrect on two counts, as
a rebuild uses a noop claim. First, the resource usage cannot
change, so it was incorrect to require additional capacity
to rebuild an instance. Second, it was incorrect not to assert
that the resource usage remained the same.
I0322d872bd
rebuild against altering the resource usage, and this change
allows in-place rebuild.
This change found a latent bug that will be addressed in a follow-up
change, and it updates the functional tests to note the incorrect
behavior.
Change-Id: I48bccc4b9adcac
Closes-Bug: #1804502
Implements: blueprint inplace-
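The two mechanisms described in this commit message can be sketched together: an up-front check that rejects a rebuild whose NUMA constraints differ from the instance's current ones, and a per-filter flag that lets capacity-only filters pass the existing host on rebuild. This is a toy model, not nova's code. All class and function names, the string summary of NUMA constraints, and the flag name RUN_ON_REBUILD are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestSpec:
    is_rebuild: bool
    # Hypothetical, simplified summary of the NUMA constraints derived from
    # the flavor and image, e.g. "1 cell, dedicated CPUs".
    numa_constraints: str


class HostFilter:
    # Assumed per-filter switch: filters that only measure spare capacity on
    # the existing host opt out of rebuild requests.
    RUN_ON_REBUILD = True

    def host_passes(self, host: str, spec: RequestSpec) -> bool:
        raise NotImplementedError

    def filter_one(self, host: str, spec: RequestSpec) -> bool:
        if spec.is_rebuild and not self.RUN_ON_REBUILD:
            return True  # a skipped filter defaults to passing the host
        return self.host_passes(host, spec)


class ToyNUMATopologyFilter(HostFilter):
    RUN_ON_REBUILD = False  # the behaviour change described in the commit

    def host_passes(self, host: str, spec: RequestSpec) -> bool:
        return False  # pretend the host has no spare capacity for a new spawn


class NUMAConstraintsChanged(Exception):
    pass


def validate_rebuild(old_constraints: str, spec: RequestSpec) -> None:
    # A rebuild uses a noop claim, so it must not change the NUMA
    # constraints: reject the request instead of silently skipping checks.
    if spec.numa_constraints != old_constraints:
        raise NUMAConstraintsChanged(f"{old_constraints!r} -> {spec.numa_constraints!r}")


def schedule(hosts: List[str], filters: List[HostFilter], spec: RequestSpec) -> List[str]:
    return [h for h in hosts if all(f.filter_one(h, spec) for f in filters)]


spec = RequestSpec(is_rebuild=True, numa_constraints="1 cell, dedicated CPUs")
validate_rebuild("1 cell, dedicated CPUs", spec)  # unchanged, so no exception
print(schedule(["compute-0"], [ToyNUMATopologyFilter()], spec))  # ['compute-0']
```

Skipping the filter is only safe because the constraint check runs first: without it, a rebuild to an image with a different NUMA request would bypass scheduling and fail later on the compute host.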
Changed in nova: | |
status: | In Progress → Fix Released |
Related fix proposed to branch: master
Review: https:/
Fix proposed to branch: stable/train
Review: https:/
Related fix proposed to branch: stable/train
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit f6060ab6b54261f
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000
FUP for in-place numa rebuild
This patch addresses a number of typos and minor
issues raised during review of [1][2]. A summary
of the changes: corrections to typos in comments,
a correction to the exception message, an update to
the release note, and the addition of debug logging.
[1] I0322d872bdff68
[2] I48bccc4b9adcac
Related-Bug: #1804502
Related-Bug: #1763766
Change-Id: I8975e524cd5a9c
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit 94c0362918169a1
Author: Sean Mooney <email address hidden>
Date: Mon Oct 21 16:17:17 2019 +0000
Disable NUMATopologyFilter on rebuild
Change-Id: I48bccc4b9adcac
Closes-Bug: #1804502
Implements: blueprint inplace-
(cherry picked from commit 3f9411071d4c1a0
tags: | added: in-stable-train |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit 48bb9a966337493
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000
FUP for in-place numa rebuild
Related-Bug: #1804502
Related-Bug: #1763766
Change-Id: I8975e524cd5a9c
(cherry picked from commit f6060ab6b54261f
Fix proposed to branch: stable/stein
Review: https:/
Related fix proposed to branch: stable/stein
Review: https:/
Fix proposed to branch: stable/rocky
Review: https:/
Related fix proposed to branch: stable/rocky
Review: https:/
Fix proposed to branch: stable/queens
Review: https:/
Related fix proposed to branch: stable/queens
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit 4a691c33d136117
Author: Sean Mooney <email address hidden>
Date: Mon Oct 21 16:17:17 2019 +0000
Disable NUMATopologyFilter on rebuild
Change-Id: I48bccc4b9adcac
Closes-Bug: #1804502
Implements: blueprint inplace-
(cherry picked from commit 3f9411071d4c1a0
(cherry picked from commit 94c0362918169a1
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit 8346c527b379395
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000
FUP for in-place numa rebuild
Related-Bug: #1804502
Related-Bug: #1763766
Conflicts:
NOTE(
NUMAHostInfo instead of HostInfo.
Change-Id: I8975e524cd5a9c
(cherry picked from commit f6060ab6b54261f
(cherry picked from commit 48bb9a966337493
tags: | added: in-stable-stein |
This issue was fixed in the openstack/nova 20.1.0 release.
This issue was fixed in the openstack/nova 19.1.0 release.
Laurent Dumont (baconpackets) wrote : | #29 |
Hey everyone,
We are tracking down a similar issue where an in-place rebuild through Heat might fail depending on the resources in use by other instances on the compute host. I'm trying to get a reproducible scenario, but so far I've been unable to.
Is there any specific combination of NUMA topology and SR-IOV that triggers this?
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit f08d0ccf844e127
Author: Sean Mooney <email address hidden>
Date: Mon Oct 21 16:17:17 2019 +0000
Disable NUMATopologyFilter on rebuild
Conflicts:
NOTE(
Change-Id: I48bccc4b9adcac
Closes-Bug: #1804502
Implements: blueprint inplace-
(cherry picked from commit 3f9411071d4c1a0
(cherry picked from commit 94c0362918169a1
(cherry picked from commit 4a691c33d136117
tags: | added: in-stable-rocky |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 84c63816602dcdf
Author: Sean Mooney <email address hidden>
Date: Tue Dec 10 14:20:33 2019 +0000
FUP for in-place numa rebuild
Change-Id: I8975e524cd5a9c
Related-Bug: #1804502
Related-Bug: #1763766
(cherry picked from commit f6060ab6b54261f
(cherry picked from commit 48bb9a966337493
(cherry picked from commit 8346c527b379395
Fix proposed to branch: master (review.openstack.org/629646)
Review: https:/