[SRU] confirm resize fails with CPUUnpinningInvalid
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Balazs Gibizer |
Ussuri | New | Undecided | Unassigned |
Ubuntu Cloud Archive | New | Undecided | Unassigned |
Ussuri | New | Undecided | Unassigned |
nova (Ubuntu) | New | Undecided | Unassigned |
Focal | New | Undecided | Unassigned |
Bug Description
* SRU DESCRIPTION BELOW *
Nova has a race condition between resize_instance() compute manager call and the update_
I've pushed a reproduction test: https:/
It is reproducible on at least master, Xena, Wallaby, and Victoria.
===============
SRU DESCRIPTION
===============
[Impact]
Due to a race condition, the tracking of pinned CPU resources can get out of sync, causing "No valid host" errors when creating new instances with CPU pinning, because the previously pinned CPUs are not marked as freed.
Part of the problem is addressed by the fix for LP#1953359, where the migration context does not point to the proper node during the race window, resulting in a CPUPinningInvalid error. This fix complements LP#1953359 by addressing the incorrect resource tracking that occurs only when the resource tracker's periodic job runs on the source node while the instance's registered flavor is already the destination flavor. This is solved by setting instance.old_flavor so that the pinned CPU resources are tracked properly.
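A toy model may help show why setting instance.old_flavor matters. This is not nova code: `Flavor`, `Instance`, `pinned_on_source`, `unpin`, and `simulate` are hypothetical names, the exception is a stand-in for nova's `CPUUnpinningInvalid`, and mapping a flavor's vCPU count onto host CPUs `0..vcpus-1` is a simplifying assumption. The sketch only illustrates the sequence described above: the periodic job runs on the source while the instance already carries the destination flavor, and confirm resize later tries to free CPUs that were never tracked.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Flavor:
    name: str
    vcpus: int  # toy: number of dedicated (pinned) vCPUs


@dataclass
class Instance:
    flavor: Flavor
    old_flavor: Optional[Flavor] = None


class CPUUnpinningInvalid(Exception):
    """Toy stand-in for nova's exception of the same name."""


def pinned_on_source(instance: Instance) -> Set[int]:
    # Toy periodic job on the source node: for an outbound resize it should
    # account the old flavor. If old_flavor is not set, it falls back to
    # instance.flavor, which may already be the destination flavor.
    flavor = instance.old_flavor or instance.flavor
    return set(range(flavor.vcpus))  # toy: pin host CPUs 0..vcpus-1


def unpin(tracked: Set[int], cpus: Set[int]) -> Set[int]:
    # Toy confirm resize: free the CPUs held with the old flavor.
    missing = cpus - tracked
    if missing:
        raise CPUUnpinningInvalid(f"CPUs {sorted(missing)} are not pinned")
    return tracked - cpus


def simulate(set_old_flavor_on_source: bool) -> Set[int]:
    old, new = Flavor("old", 4), Flavor("new", 2)
    inst = Instance(flavor=old)
    if set_old_flavor_on_source:
        inst.old_flavor = old  # toy version of the fix
    inst.flavor = new  # the registered flavor is now the destination one
    # Race window: the periodic job runs on the source node here.
    tracked = pinned_on_source(inst)
    # Confirm resize then frees the CPUs pinned by the old flavor.
    return unpin(tracked, set(range(old.vcpus)))
```

With this model, `simulate(True)` returns an empty set, while `simulate(False)` raises the toy `CPUUnpinningInvalid` for CPUs 2 and 3, mirroring the symptom in the bug title.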
[Test case]
The test cases for this are already implemented upstream as non-live functional tests:
in nova/tests/
- test_resize_
- test_resize_
- test_resize_
As this is a race condition, it is very difficult to validate, even upstream, so the functional tests mock certain parts of the code to simulate the full workflow. Because they are non-live functional tests, they are closer to broad unit tests.
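One common way to make such a race deterministic in a test is to patch a hook at the racy point so that the competing code runs exactly there. The sketch below is not nova's functional test: `ToyCompute`, `_race_window`, `periodic_update`, and the `fixed` switch are hypothetical, and flavors are plain strings. It only shows the technique of using `unittest.mock.patch.object` with a `side_effect` to inject the periodic job into the window between the flavor switch and the point where old_flavor would otherwise be set.

```python
import unittest
from unittest import mock


class ToyCompute:
    """Hypothetical, heavily simplified resize flow (not nova code)."""

    def __init__(self, fixed):
        self.fixed = fixed
        self.flavor = "old"
        self.old_flavor = None
        self.tracked_flavor = "old"

    def periodic_update(self):
        # Toy resource tracker: prefer old_flavor during an outbound resize.
        self.tracked_flavor = self.old_flavor or self.flavor

    def _race_window(self):
        """No-op hook marking where the periodic job could interleave."""

    def resize_instance(self):
        if self.fixed:
            self.old_flavor = self.flavor  # toy fix: record it up front
        self.flavor = "new"
        self._race_window()
        if not self.fixed:
            self.old_flavor = "old"  # toy: only set later in the buggy flow


class RaceTest(unittest.TestCase):
    def _run(self, fixed):
        compute = ToyCompute(fixed)
        # Run the periodic job exactly inside the race window.
        with mock.patch.object(compute, "_race_window",
                               side_effect=compute.periodic_update) as hook:
            compute.resize_instance()
        hook.assert_called_once_with()
        return compute.tracked_flavor

    def test_fixed_tracks_old_flavor(self):
        self.assertEqual(self._run(fixed=True), "old")

    def test_buggy_tracks_new_flavor(self):
        self.assertEqual(self._run(fixed=False), "new")
```

Forcing the interleaving through a mocked hook trades realism for repeatability, which is the same trade-off noted above for the upstream non-live tests.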
[Regression Potential]
The code is considered stable in newer releases, and the scope of the affected code is fairly limited. However, because this is a race condition that is difficult to validate even with the non-live functional tests, the regression potential is moderate.
[Other Info]
None.
Changed in nova: | |
assignee: | nobody → Balazs Gibizer (balazs-gibizer) |
importance: | Undecided → Medium |
description: | updated |
tags: | added: compute numa race-condition resize |
description: | updated |
summary: | confirm resize fails with CPUUnpinningInvalid → [SRU] confirm resize fails with CPUUnpinningInvalid |
description: | updated |
Fix proposed to branch: master /review.opendev.org/c/openstack/nova/+/810763
Review: https:/