Pinned instance with thread policy can consume VCPU
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | High | Stephen Finucane | |
| Train | Fix Released | High | Stephen Finucane | |
| Ussuri | Fix Released | High | Stephen Finucane | |
# Bug Description
In Train, we introduced the concept of the 'PCPU' resource type to track pinned instance CPU usage. The '[compute] cpu_dedicated_set' option is used to indicate which host cores should be used by pinned instances and, once this config option was set, nova would start reporting 'PCPU' resource types in addition to (or entirely instead of, if '[compute] cpu_shared_set' was unset) 'VCPU'. Requests for pinned instances (via the 'hw:cpu_policy=dedicated' flavor extra spec or equivalent image metadata property) are now translated into requests for 'PCPU' inventory rather than 'VCPU' inventory.
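For illustration, new-style host configuration looks roughly like the following; the core ranges are made up, only the option names come from the description above. A flavor then requests pinned CPUs with the 'hw:cpu_policy=dedicated' extra spec (e.g. 'openstack flavor set FLAVOR --property hw:cpu_policy=dedicated').

```ini
# nova.conf on a compute node with new-style configuration
# (core ranges are illustrative)
[compute]
# cores handed to pinned instances; reported to placement as PCPU
cpu_dedicated_set = 4-15
# cores handed to unpinned instances; reported to placement as VCPU
cpu_shared_set = 0-3
```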
We anticipated some upgrade issues with this change, whereby there could be a period during an upgrade in which some hosts would have the new configuration, meaning they'd be reporting PCPU, but the remainder would still be on legacy config and therefore would continue reporting just VCPU. An instance could be reasonably expected to land on any host, but since only the hosts with the new configuration were reporting 'PCPU' inventory and the 'hw:cpu_policy=dedicated' extra spec now resulted in a request for 'PCPU' inventory, pinned instances could only be scheduled to the hosts that had already been reconfigured.
We worked around this issue by adding support for a fallback placement query, enabled by default, which would make a second request using 'VCPU' inventory instead of 'PCPU'. The idea behind this was that the hosts with 'PCPU' inventory would be preferred, meaning we'd only try the 'VCPU' allocation if the preferred path failed. Crucially, we anticipated that if a host with new-style configuration was picked up by this second 'VCPU' query, an instance would never actually be able to build there. This is because the new-style configuration would be reflected in the 'numa_topology' blob of the 'ComputeNode' object, specifically via the 'cpuset' (for cores allocated to 'VCPU') and 'pcpuset' (for cores allocated to 'PCPU') fields. With new-style configuration, both of these are set to disjoint values. If the scheduler had determined that there wasn't enough 'PCPU' inventory available for the instance, that would implicitly mean there weren't enough of the cores listed in the 'pcpuset' field still available.
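As a rough sketch of that fallback (resource amounts are illustrative and other request parameters are elided), the scheduler effectively asks placement:

```console
# Preferred query: only hosts reporting PCPU inventory can satisfy this
GET /allocation_candidates?resources=PCPU:2,MEMORY_MB:2048,DISK_GB:20

# Fallback query, tried when the preferred one yields nothing: the same
# request with PCPU swapped for VCPU, so legacy-config hosts are found
GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048,DISK_GB:20
```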
Turns out there's a gap in this thinking: thread policies. The 'isolate' CPU thread policy previously meant "give me a host with no hyperthreads, else a host with hyperthreads but mark the thread siblings of the cores used by the instance as reserved". This didn't translate to a new 'PCPU' world where we needed to know how many cores we were consuming up front before landing on the host. To work around this, we removed support for the latter case and instead relied on a trait, 'HW_CPU_HYPERTHREADING', to identify hosts with hyperthreading: instances using the 'isolate' policy request this trait as forbidden, so they should only land on hosts without hyperthreads. The gap is that this forbidden trait is only applied to the initial 'PCPU' query. A host with new-style configuration and hyperthreading enabled is rejected by that query because of the trait, not because it lacks free dedicated cores, so it can still be returned by the fallback 'VCPU' query, and the instance will happily build there, consuming 'VCPU' inventory.
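To make the gap concrete, here is a sketch of the two queries for a flavor using the 'isolate' thread policy (amounts illustrative). Per the above, the forbidden 'HW_CPU_HYPERTHREADING' trait only travels with the primary 'PCPU' request:

```console
# Primary query: free dedicated cores on a host *without* hyperthreading
GET /allocation_candidates?resources=PCPU:2,MEMORY_MB:2048&required=!HW_CPU_HYPERTHREADING

# Fallback query: PCPU is translated back to VCPU and the forbidden
# trait is not carried over, so a hyperthreaded new-style host with
# free shared cores is still returned
GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
```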
# Steps to reproduce
1. Using a host with hyperthreading support enabled, configure both '[compute] cpu_dedicated_set' and '[compute] cpu_shared_set'
2. Boot an instance with the 'hw:cpu_policy=dedicated' and 'hw:cpu_thread_policy=isolate' flavor extra specs (illustrative commands below)
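A hypothetical set of commands for step 2 (flavor, image and network names are made up):

```console
$ openstack flavor create pinned.isolate --vcpus 2 --ram 2048 --disk 20
$ openstack flavor set pinned.isolate \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_thread_policy=isolate
$ openstack server create --flavor pinned.isolate --image cirros \
    --network private repro-instance
```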
# Expected result
Instance should not boot since the host has hyperthreads.
# Actual result
Instance boots.
tags: added: libvirt
This has a significant upgrade impact so I think this is important to fix and backport.
I have reproduced this locally too so moving to Triaged.