Tempest failure due to possible affinity group race and CPU pinning

Bug #2023693 reported by Frank Ritchie
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Description
===========

The Tempest test:

test_create_server_with_scheduler_hint_group_affinity

fails in OpenStack Yoga but passes in OpenStack Victoria.

In both cases the test is run on the same hardware with the same configuration.

-----

Relevant info:

1: CPU pinning is enabled via vcpu_pin_set in nova.conf
2: the property hw:cpu_policy=dedicated is set in the flavor

This configuration has been working for years.

There appears to be a race condition in which both claims are made before the free CPU list is updated.
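
The suspected sequence can be illustrated with a small toy model. This is only a sketch of the hypothesis above, not Nova's code: ToyHost, PinningError, build_racy and build_serialized are hypothetical names invented for illustration, and the four-CPU list is a shortened stand-in for the real host.

```python
class PinningError(Exception):
    """Hypothetical stand-in for nova.exception.CPUPinningInvalid."""


class ToyHost:
    """Toy model of a host that tracks which pCPUs are already pinned."""

    def __init__(self, usable):
        self.usable = list(usable)  # pCPUs in preference order
        self.pinned = set()         # pCPUs already handed to an instance

    def free(self):
        return [cpu for cpu in self.usable if cpu not in self.pinned]

    def choose(self):
        # Pick a candidate pCPU from the current free list without reserving it.
        return self.free()[0]

    def pin(self, cpu):
        # Mirrors the "must be a subset of free CPU set" check seen in the error.
        if cpu not in self.free():
            raise PinningError(f"CPU {cpu} is not in free set {self.free()}")
        self.pinned.add(cpu)


def build_racy(host):
    # Both instances choose a pCPU before either pin is recorded,
    # so both see the same free list and pick the same pCPU.
    first = host.choose()
    second = host.choose()
    host.pin(first)
    host.pin(second)  # raises PinningError
    return first, second


def build_serialized(host):
    # The free list is updated between the two choices.
    first = host.choose()
    host.pin(first)
    second = host.choose()
    host.pin(second)
    return first, second


if __name__ == "__main__":
    print(build_serialized(ToyHost([64, 20, 8, 52])))
    try:
        build_racy(ToyHost([64, 20, 8, 52]))
    except PinningError as exc:
        print("racy build failed:", exc)
```

In the racy ordering the second pin fails in the same way as the error reported below, while the serialized ordering gives each instance a different pCPU.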

-----

Relevant logs:

CPU 64 is in the list of usable CPUs:

2023-06-09 21:26:01.223 858862 INFO nova.virt.hardware [-] Computed NUMA topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70]], vCPUs mapping: [(0, 64)]

The first claim is made:

2023-06-09 21:26:01.223 858862 INFO nova.compute.claims [-] [instance: ecc5bf99-9583-4acd-b075-19535e380c67] Claim successful on node foo.example.com

CPU 64 is still available:

2023-06-09 21:26:01.261 858862 INFO nova.virt.hardware [-] Computed NUMA topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70]], vCPUs mapping: [(0, 64)]

The second claim is made:

2023-06-09 21:26:01.262 858862 INFO nova.compute.claims [-] [instance: f65fe4dd-5733-4a9d-be71-32f79e514906] Claim successful on node foo.example.com

The error is now seen:

2023-06-09 21:26:01.351 858862 ERROR nova.compute.manager [-] [instance: f65fe4dd-5733-4a9d-be71-32f79e514906] Failed to build and run instance: nova.exception.CPUPinningInvalid: CPU set to pin [64] must be a subset of free CPU set [8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86]
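
As a quick cross-check of the logs, the following snippet compares the usable pCPU list from the "Computed NUMA topology CPU pinning" lines with the free CPU set from the error. The values are copied from the log lines above; the variable names are only illustrative.

```python
# Usable pCPU sibling pairs, copied from the "usable pCPUs" log lines.
usable_pairs = [
    [64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62],
    [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86],
    [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70],
]

# Free CPU set, copied from the CPUPinningInvalid error message.
free_at_failure = [
    8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
    52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86,
]

usable = {cpu for pair in usable_pairs for cpu in pair}
missing = usable - set(free_at_failure)

print(len(usable), len(free_at_failure))  # 36 35
print(sorted(missing))                    # [64]
```

The only pCPU missing from the free set is 64, which is consistent with the first instance having already been pinned to it by the time the second instance tried to pin.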

Additional error:

ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance...

Steps to reproduce
==================

Enable CPU pinning in OpenStack Nova and run the Tempest test:

test_create_server_with_scheduler_hint_group_affinity

It fails every time for me.

Expected result
===============

The test passes.

Actual result
=============

The test fails.

Environment
===========

Nova version: 25.0.2
