With the introduction of the cpu-resources work [1], (libvirt) hosts can
now report 'PCPU' inventory separately from 'VCPU' inventory; the former is
consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As
part of that effort, we had to drop support for the ability to boot
instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want
hyperthreads) on hosts with hyperthreading. This had been previously
implemented by marking thread siblings of the host cores used by such an
instance as reserved and unusable by other instances, but such a design
wasn't possible in a world where we had to track resource consumption in
placement before landing in the host. Instead, the 'isolate' policy now
simply means "give me a host without hyperthreads". This is enforced by
hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and
instances with the 'isolate' policy requesting
'HW_CPU_HYPERTHREADING=forbidden'.
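For illustration, a minimal sketch of how the 'isolate' policy translates
into a placement request (the extra spec and trait names are real; the
query-building code below is only a simplified assumption, not the request
Nova actually constructs):

    # Illustrative sketch only, not Nova's code: how a dedicated/isolate
    # flavor maps to a placement allocation-candidates query. Forbidden
    # traits use a '!' prefix in the placement API's 'required' parameter.
    from urllib.parse import urlencode

    extra_specs = {
        'hw:cpu_policy': 'dedicated',
        'hw:cpu_thread_policy': 'isolate',
    }
    vcpus = 4

    params = {'resources': 'PCPU:%d' % vcpus}
    if extra_specs.get('hw:cpu_thread_policy') == 'isolate':
        params['required'] = '!HW_CPU_HYPERTHREADING'

    print('GET /allocation_candidates?' + urlencode(params))
    # GET /allocation_candidates?resources=PCPU%3A4&required=%21HW_CPU_HYPERTHREADING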
Or at least, that's how it should work. We also have a fallback query
to placement that looks for hosts with 'VCPU' inventory and doesn't care
about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure
hosts with old-style configuration ('[DEFAULT] vcpu_pin_set') could
continue to be scheduled to. We figured that this second fallback query
could accidentally pick up hosts with new-style configuration, but we
also track the available and used cores from those listed in the
'[compute] cpu_dedicated_set' as part of the host 'NUMATopology' object
(specifically, via the 'pcpuset' and 'cpu_pinning' fields of the
'NUMACell' child objects). These are validated by both the
'NUMATopologyFilter' and the virt driver itself, which means hosts with
new-style configuration that got caught up in this second query would be
rejected by the filter or by a late failure on the host. (Hint: there's
much more detail on this in the spec.)
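As a rough sketch of the kind of validation being described here (plain
dicts and sets standing in for the real 'NUMATopology'/'NUMACell' objects;
the field names and helper are simplified assumptions, not the actual
object API):

    # Simplified stand-in for the NUMATopologyFilter / virt driver check:
    # a host cell tracks the cores usable for pinning (from
    # '[compute] cpu_dedicated_set') and the cores already pinned; a
    # request fits only if enough free dedicated cores remain.
    host_cell = {
        'pcpuset': {0, 1, 2, 3, 4, 5, 6, 7},  # dedicated (pinnable) host cores
        'pinned': {0, 1},                     # cores already pinned to instances
    }

    def fits(cell, requested_pcpus):
        free = cell['pcpuset'] - cell['pinned']
        return len(free) >= requested_pcpus

    print(fits(host_cell, 4))  # True: six dedicated cores are still free

Note that a check like this only counts free dedicated cores; it says
nothing about whether those cores have thread siblings.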
Unfortunately we didn't think about hyperthreading. If a host gets
picked up by the second query, it might well have enough PCPU
inventory but simply have been rejected by the first query because it has
hyperthreads. In this case, because the host has enough free cores
available for pinning, neither the filter nor the virt driver will reject
the request. The instance ends up falling back to the old code paths,
consuming $flavor.vcpu host cores plus the thread siblings for each of
these cores. Despite this, it will be marked as consuming $flavor.vcpu
VCPU (not PCPU) inventory in placement.
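As a worked example of the mismatch (the four-vCPU flavor and two threads
per core are assumed numbers, purely for illustration):

    # Illustrative arithmetic only: the old 'isolate' code path pins one
    # host core per guest vCPU and reserves each core's thread sibling,
    # while placement records the instance as plain VCPU usage.
    flavor_vcpus = 4
    threads_per_core = 2

    host_cpus_consumed = flavor_vcpus * threads_per_core  # pinned cores + siblings
    placement_usage = {'VCPU': flavor_vcpus, 'PCPU': 0}

    print(host_cpus_consumed)  # 8 host CPUs unusable by other instances
    print(placement_usage)     # {'VCPU': 4, 'PCPU': 0} is what placement records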
This patch adds a reproducer test that proves this to be the case,
allowing us to resolve the issue in a later change.
Reviewed:  https://review.opendev.org/748251
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=49a793c8ee7a9be26e4e3d6ddd097a6ee6fea29d
Submitter: Zuul
Branch: stable/ussuri
commit 49a793c8ee7a9be26e4e3d6ddd097a6ee6fea29d
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:37:38 2020 +0100
tests: Add reproducer for bug #1889633
[1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html
Change-Id: I87cd4d14192b1a40cbdca6e3af0f818f2cab613e
Signed-off-by: Stephen Finucane <email address hidden>
Related-Bug: #1889633
(cherry picked from commit 737e0c0111acd364d1481bdabd9d23bc8d5d6a2e)