Pinned instance with thread policy can consume VCPU

Bug #1889633 reported by Stephen Finucane
Affects                     Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)    Fix Released  High        Stephen Finucane
Train                       Fix Released  High        Stephen Finucane
Ussuri                      Fix Released  High        Stephen Finucane

Bug Description

In Train, we introduced the concept of the 'PCPU' resource type to track pinned instance CPU usage. The '[compute] cpu_dedicated_set' config option is used to indicate which host cores should be used by pinned instances and, once this option was set, nova would start reporting the 'PCPU' resource type in addition to (or entirely instead of, if '[compute] cpu_shared_set' was unset) 'VCPU'. Requests for pinned instances (via the 'hw:cpu_policy=dedicated' flavor extra spec or equivalent image metadata property) would result in a query for 'PCPU' inventory rather than 'VCPU', as previously done.
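
For illustration, new-style configuration on a host might look like the following nova.conf snippet (the CPU ranges are made up; only the option names come from the description above):

    [compute]
    # cores reported as 'VCPU' inventory, used by unpinned instances
    cpu_shared_set = 0-3
    # cores reported as 'PCPU' inventory, used by pinned instances
    cpu_dedicated_set = 4-15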

We anticipated some upgrade issues with this change, whereby there could be a period during an upgrade in which some hosts would have the new configuration, meaning they'd be reporting PCPU, but the remainder would still be on legacy config and therefore would continue reporting just VCPU. An instance could be reasonably expected to land on any host, but since only the hosts with the new configuration were reporting 'PCPU' inventory and the 'hw:cpu_policy=dedicated' extra spec was resulting in a request for 'PCPU', the hosts with legacy configuration would never be consumed.

We worked around this issue by adding support for a fallback placement query, enabled by default, which would make a second request using 'VCPU' inventory instead of 'PCPU'. The idea behind this was that the hosts with 'PCPU' inventory would be preferred, meaning we'd only try the 'VCPU' allocation if the preferred path failed. Crucially, we anticipated that if a host with new style configuration was picked up by this second 'VCPU' query, an instance would never actually be able to build there. This is because the new-style configuration would be reflected in the 'numa_topology' blob of the 'ComputeNode' object, specifically via the 'cpuset' (for cores allocated to 'VCPU') and 'pcpuset' (for cores allocated to 'PCPU') fields. With new-style configuration, both of these are set to unique values. If the scheduler had determined that there wasn't enough 'PCPU' inventory available for the instance, that would implicitly mean there weren't enough of the cores listed in the 'pcpuset' field still available.
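
As a rough sketch (real 'NUMACell' objects carry more fields than this), the example configuration above would be reflected in the host topology along these lines:

    # Illustrative only: one NUMA cell under new-style configuration.
    # The 'cpuset'/'pcpuset' split mirrors the example nova.conf above.
    cell = {
        'cpuset': {0, 1, 2, 3},        # cores backing 'VCPU' inventory
        'pcpuset': set(range(4, 16)),  # cores backing 'PCPU' inventory
    }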

It turns out there's a gap in this thinking: thread policies. The 'isolate' CPU thread policy previously meant "give me a host with no hyperthreads, or else a host with hyperthreads but mark the thread siblings of the cores used by the instance as reserved". This didn't translate to the new 'PCPU' world, where we need to know up front how many cores we are consuming before landing on the host. To work around this, we removed support for the latter case and instead relied on a trait, 'HW_CPU_HYPERTHREADING', to indicate whether a host has hyperthreading. Using the 'isolate' policy means that trait must not be present on the host, i.e. the trait is requested as "forbidden". The gap comes from the combination of this trait request and the fallback query. If we request the isolate thread policy, hosts with new-style configuration and sufficient 'PCPU' inventory are nonetheless rejected if they report the 'HW_CPU_HYPERTHREADING' trait. However, such hosts can get picked up by the fallback query, and the instance will not fail to build there because there is no shortage of 'PCPU' inventory. This means we end up with a pinned instance on a host using new-style configuration that is consuming 'VCPU' inventory. Boo.
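
To make the gap concrete, here is a heavily simplified Python sketch of the scheduling behaviour described above. This is not nova's actual code; the function names and host dictionaries are invented purely for illustration:

    # Simplified sketch of the two placement queries described above.
    # NOT nova code: all names here are invented for illustration.

    def primary_query(hosts, vcpus):
        # New-style request: 'PCPU' inventory, with the 'isolate' policy
        # forbidding the hyperthreading trait.
        return [h for h in hosts
                if h['free_pcpu'] >= vcpus
                and 'HW_CPU_HYPERTHREADING' not in h['traits']]

    def fallback_query(hosts, vcpus):
        # Legacy fallback: only looks at 'VCPU' inventory, ignores traits.
        return [h for h in hosts if h['free_vcpu'] >= vcpus]

    # A new-style host with hyperthreading enabled.
    host = {'free_pcpu': 12, 'free_vcpu': 4,
            'traits': {'HW_CPU_HYPERTHREADING'}}

    # Rejected by the primary query because of the forbidden trait ...
    assert not primary_query([host], vcpus=4)
    # ... but accepted by the fallback, so the pinned instance can land
    # here and end up consuming 'VCPU' rather than 'PCPU' inventory.
    assert fallback_query([host], vcpus=4)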

# Steps to reproduce

1. Using a host with hyperthreading support enabled, configure both '[compute] cpu_dedicated_set' and '[compute] cpu_shared_set'

2. Boot an instance with the 'hw:cpu_thread_policy=isolate' extra spec.
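
For example, using a flavor named 'pinned.isolate' (the flavor and image names here are placeholders):

    openstack flavor set pinned.isolate \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_thread_policy=isolate
    openstack server create --flavor pinned.isolate --image cirros test-instance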

# Expected result

Instance should not boot since the host has hyperthreads.

# Actual result

Instance boots.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

This has a significant upgrade impact, so I think this is important to fix and backport.
I have reproduced this locally too, so moving to Triaged.

Changed in nova:
importance: Undecided → High
status: New → Triaged
tags: added: libvirt
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/744020

Changed in nova:
assignee: nobody → Stephen Finucane (stephenfinucane)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/744021

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/744020
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=737e0c0111acd364d1481bdabd9d23bc8d5d6a2e
Submitter: Zuul
Branch: master

commit 737e0c0111acd364d1481bdabd9d23bc8d5d6a2e
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:37:38 2020 +0100

    tests: Add reproducer for bug #1889633

    With the introduction of the cpu-resources work [1], (libvirt) hosts can
    now report 'PCPU' inventory separate from 'VCPU' inventory, which is
    consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As
    part of that effort, we had to drop support for the ability to boot
    instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want
    hyperthreads) on hosts with hyperthreading. This had been previously
    implemented by marking thread siblings of the host cores used by such an
    instance as reserved and unusable by other instances, but such a design
    wasn't possible in a world where we had to track resource consumption in
    placement before landing in the host. Instead, the 'isolate' policy now
    simply means "give me a host without hyperthreads". This is enforced by
    hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and
    instances with the 'isolate' policy requesting
    'HW_CPU_HYPERTHREADING=forbidden'.

    Or at least, that's how it should work. We also have a fallback query
    for placement to find hosts with 'VCPU' inventory and that doesn't care
    about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure
    hosts with old style configuration ('[DEFAULT] vcpu_pin_set') could
    continue to be scheduled to. We figured that this second fallback query
    could accidentally pick up hosts with new-style configuration, but we
    are also tracking the available and used cores from those listed in the
    '[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects
    (specifically, via the 'pcpuset' and 'cpu_pinning' fields of the
    'NUMACell' child objects). These are validated by both the
    'NUMATopologyFilter' and the virt driver itself, which means hosts with
    new style configuration that got caught up in this second query would be
    rejected by this filter or by a late failure on the host. (Hint: there's
    much more detail on this in the spec).

    Unfortunately we didn't think about hyperthreading. If a host gets
    picked up in the second request, it might well have enough PCPU
    inventory but simply be rejected in the first query since it had
    hyperthreads. In this case, because it has enough free cores available
    for pinning, neither the filter nor the virt driver will reject the
    request, resulting in a situation whereby the instance ends up falling
    back to the old code paths and consuming $flavor.vcpu host cores, plus
    the thread siblings for each of these cores. Despite this, it will be
    marked as consuming $flavor.vcpu VCPU (not PCPU) inventory in placement.

    This patch proves this to be the case, allowing us to resolve the issue
    later.

    [1] https://specs.openstack.org/openstack/nova-specs/specs/train/app...


Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/744021
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9c270332041d6b98951c0b57d7b344fd551a413c
Submitter: Zuul
Branch: master

commit 9c270332041d6b98951c0b57d7b344fd551a413c
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:36:24 2020 +0100

    hardware: Reject requests for no hyperthreads on hosts with HT

    Attempting to boot an instance with 'hw:cpu_policy=dedicated' will
    result in a request from nova-scheduler to placement for allocation
    candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an
    instance with 'hw:cpu_thread_policy=isolate' will result in a request
    for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e.
    hosts without hyperthreading. This has been the case since the
    cpu-resources feature was implemented in Train. However, as part of that
    work and to enable upgrades from hosts that predated Train, we also make
    a second request for candidates with $flavor.vcpu 'VCPU' inventory. The
    idea behind this is that old compute nodes would only report 'VCPU' and
    should be usable, and any new compute nodes that got caught up in this
    second request could never actually be scheduled to since there wouldn't
    be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset'
    available to schedule to, resulting in rejection by the
    'NUMATopologyFilter'. However, if a host was rejected in the first
    query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could
    get picked up by the second query and would happily be scheduled to,
    resulting in an instance consuming 'VCPU' inventory from a host that
    properly supported 'PCPU' inventory.

    The solution is simple, though also a huge hack. If we detect that the
    host is using new style configuration and should be able to report
    'PCPU', check if the instance asked for no hyperthreading and whether
    the host has it. If all are True, reject the request.

    Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1889633
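
The check added by the fix can be summarised with the following minimal sketch. This is not the actual patch; the function and argument names are invented to illustrate the logic ("new-style configuration present, instance forbids hyperthreads, host has them: reject"):

    def _reject_isolate_on_ht_host(host_has_dedicated_set,
                                   host_has_hyperthreads,
                                   wants_isolate):
        """Illustrative only: mirror the rejection logic of the fix.

        If the host uses new-style configuration (a dedicated CPU set is
        configured, so it reports 'PCPU'), the instance asked for the
        'isolate' thread policy (no hyperthreads), and the host does have
        hyperthreads, then the fitting attempt must fail rather than
        silently fall back to consuming 'VCPU' inventory.
        """
        return (host_has_dedicated_set
                and wants_isolate
                and host_has_hyperthreads)

    # Example: a new-style host with hyperthreads and an 'isolate' request
    assert _reject_isolate_on_ht_host(True, True, True)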

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/748251

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/748252

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/748254

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/748255

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/748251
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=49a793c8ee7a9be26e4e3d6ddd097a6ee6fea29d
Submitter: Zuul
Branch: stable/ussuri

commit 49a793c8ee7a9be26e4e3d6ddd097a6ee6fea29d
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:37:38 2020 +0100

    tests: Add reproducer for bug #1889633

    With the introduction of the cpu-resources work [1], (libvirt) hosts can
    now report 'PCPU' inventory separate from 'VCPU' inventory, which is
    consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As
    part of that effort, we had to drop support for the ability to boot
    instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want
    hyperthreads) on hosts with hyperthreading. This had been previously
    implemented by marking thread siblings of the host cores used by such an
    instance as reserved and unusable by other instances, but such a design
    wasn't possible in a world where we had to track resource consumption in
    placement before landing in the host. Instead, the 'isolate' policy now
    simply means "give me a host without hyperthreads". This is enforced by
    hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and
    instances with the 'isolate' policy requesting
    'HW_CPU_HYPERTHREADING=forbidden'.

    Or at least, that's how it should work. We also have a fallback query
    for placement to find hosts with 'VCPU' inventory and that doesn't care
    about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure
    hosts with old style configuration ('[DEFAULT] vcpu_pin_set') could
    continue to be scheduled to. We figured that this second fallback query
    could accidentally pick up hosts with new-style configuration, but we
    are also tracking the available and used cores from those listed in the
    '[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects
    (specifically, via the 'pcpuset' and 'cpu_pinning' fields of the
    'NUMACell' child objects). These are validated by both the
    'NUMATopologyFilter' and the virt driver itself, which means hosts with
    new style configuration that got caught up in this second query would be
    rejected by this filter or by a late failure on the host. (Hint: there's
    much more detail on this in the spec).

    Unfortunately we didn't think about hyperthreading. If a host gets
    picked up in the second request, it might well have enough PCPU
    inventory but simply be rejected in the first query since it had
    hyperthreads. In this case, because it has enough free cores available
    for pinning, neither the filter nor the virt driver will reject the
    request, resulting in a situation whereby the instance ends up falling
    back to the old code paths and consuming $flavor.vcpu host cores, plus
    the thread siblings for each of these cores. Despite this, it will be
    marked as consuming $flavor.vcpu VCPU (not PCPU) inventory in placement.

    This patch proves this to be the case, allowing us to resolve the issue
    later.

    [1] https://specs.openstack.org/openstack/nova-specs/specs/tr...


tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/748252
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7ddab327675d36a4ba59d02d22d042d418236336
Submitter: Zuul
Branch: stable/ussuri

commit 7ddab327675d36a4ba59d02d22d042d418236336
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:36:24 2020 +0100

    hardware: Reject requests for no hyperthreads on hosts with HT

    Attempting to boot an instance with 'hw:cpu_policy=dedicated' will
    result in a request from nova-scheduler to placement for allocation
    candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an
    instance with 'hw:cpu_thread_policy=isolate' will result in a request
    for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e.
    hosts without hyperthreading. This has been the case since the
    cpu-resources feature was implemented in Train. However, as part of that
    work and to enable upgrades from hosts that predated Train, we also make
    a second request for candidates with $flavor.vcpu 'VCPU' inventory. The
    idea behind this is that old compute nodes would only report 'VCPU' and
    should be usable, and any new compute nodes that got caught up in this
    second request could never actually be scheduled to since there wouldn't
    be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset'
    available to schedule to, resulting in rejection by the
    'NUMATopologyFilter'. However, if a host was rejected in the first
    query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could
    get picked up by the second query and would happily be scheduled to,
    resulting in an instance consuming 'VCPU' inventory from a host that
    properly supported 'PCPU' inventory.

    The solution is simple, though also a huge hack. If we detect that the
    host is using new style configuration and should be able to report
    'PCPU', check if the instance asked for no hyperthreading and whether
    the host has it. If all are True, reject the request.

    Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1889633
    (cherry picked from commit 9c270332041d6b98951c0b57d7b344fd551a413c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/748254
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b60be4a9416cb5b15b7accb99c6e5ecdac40c3c9
Submitter: Zuul
Branch: stable/train

commit b60be4a9416cb5b15b7accb99c6e5ecdac40c3c9
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:37:38 2020 +0100

    tests: Add reproducer for bug #1889633

    With the introduction of the cpu-resources work [1], (libvirt) hosts can
    now report 'PCPU' inventory separate from 'VCPU' inventory, which is
    consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As
    part of that effort, we had to drop support for the ability to boot
    instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want
    hyperthreads) on hosts with hyperthreading. This had been previously
    implemented by marking thread siblings of the host cores used by such an
    instance as reserved and unusable by other instances, but such a design
    wasn't possible in a world where we had to track resource consumption in
    placement before landing in the host. Instead, the 'isolate' policy now
    simply means "give me a host without hyperthreads". This is enforced by
    hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and
    instances with the 'isolate' policy requesting
    'HW_CPU_HYPERTHREADING=forbidden'.

    Or at least, that's how it should work. We also have a fallback query
    for placement to find hosts with 'VCPU' inventory and that doesn't care
    about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure
    hosts with old style configuration ('[DEFAULT] vcpu_pin_set') could
    continue to be scheduled to. We figured that this second fallback query
    could accidentally pick up hosts with new-style configuration, but we
    are also tracking the available and used cores from those listed in the
    '[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects
    (specifically, via the 'pcpuset' and 'cpu_pinning' fields of the
    'NUMACell' child objects). These are validated by both the
    'NUMATopologyFilter' and the virt driver itself, which means hosts with
    new style configuration that got caught up in this second query would be
    rejected by this filter or by a late failure on the host. (Hint: there's
    much more detail on this in the spec).

    Unfortunately we didn't think about hyperthreading. If a host gets
    picked up in the second request, it might well have enough PCPU
    inventory but simply be rejected in the first query since it had
    hyperthreads. In this case, because it has enough free cores available
    for pinning, neither the filter nor the virt driver will reject the
    request, resulting in a situation whereby the instance ends up falling
    back to the old code paths and consuming $flavor.vcpu host cores, plus
    the thread siblings for each of these cores. Despite this, it will be
    marked as consuming $flavor.vcpu VCPU (not PCPU) inventory in placement.

    This patch proves this to be the case, allowing us to resolve the issue
    later.

    [1] https://specs.openstack.org/openstack/nova-specs/specs/tra...


tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/748255
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44676ddf843ba84e26721cd2e3f65dc45a881f66
Submitter: Zuul
Branch: stable/train

commit 44676ddf843ba84e26721cd2e3f65dc45a881f66
Author: Stephen Finucane <email address hidden>
Date: Thu Jul 30 17:36:24 2020 +0100

    hardware: Reject requests for no hyperthreads on hosts with HT

    Attempting to boot an instance with 'hw:cpu_policy=dedicated' will
    result in a request from nova-scheduler to placement for allocation
    candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an
    instance with 'hw:cpu_thread_policy=isolate' will result in a request
    for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e.
    hosts without hyperthreading. This has been the case since the
    cpu-resources feature was implemented in Train. However, as part of that
    work and to enable upgrades from hosts that predated Train, we also make
    a second request for candidates with $flavor.vcpu 'VCPU' inventory. The
    idea behind this is that old compute nodes would only report 'VCPU' and
    should be usable, and any new compute nodes that got caught up in this
    second request could never actually be scheduled to since there wouldn't
    be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset'
    available to schedule to, resulting in rejection by the
    'NUMATopologyFilter'. However, if a host was rejected in the first
    query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could
    get picked up by the second query and would happily be scheduled to,
    resulting in an instance consuming 'VCPU' inventory from a host that
    properly supported 'PCPU' inventory.

    The solution is simple, though also a huge hack. If we detect that the
    host is using new style configuration and should be able to report
    'PCPU', check if the instance asked for no hyperthreading and whether
    the host has it. If all are True, reject the request.

    Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1889633
    (cherry picked from commit 9c270332041d6b98951c0b57d7b344fd551a413c)
    (cherry picked from commit 7ddab327675d36a4ba59d02d22d042d418236336)
