'hw:cpu_thread_policy=isolate' does not schedule on non-HT hosts

Bug #1550317 reported by Stephen Finucane
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Stephen Finucane | |
| Mitaka | Fix Released | Undecided | Stephen Finucane | |

Bug Description

The 'isolate' policy is supposed to function on both hosts with HyperThreading (HT) and those without. The former works, but the latter does not. This appears to be a regression. Results below.

---

# Platform

Testing was conducted on two single-node, Fedora 23-based
(4.3.5-300.fc23.x86_64) OpenStack deployments built with devstack. Both
systems are dual-socket with ten cores per socket; HT is disabled on one
and enabled on the other (2 sockets * 10 cores * 1/2 threads
= 20/40 "pCPUs"; 0-9/0-9,20-29 = node0, 10-19/10-19,30-39 = node1).
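The pCPU numbering above can be reproduced with a short sketch. It assumes one NUMA node per socket and that the second hardware thread of core N is numbered N + 20, which matches the ranges reported here; `pcpus_by_node` is a hypothetical helper, not part of Nova.

```python
def pcpus_by_node(sockets=2, cores_per_socket=10, threads_per_core=1):
    """Map each NUMA node to its pCPU IDs.

    Assumes one NUMA node per socket and that thread t of core c is
    numbered c + t * (sockets * cores_per_socket), as in this report.
    """
    total_cores = sockets * cores_per_socket
    layout = {}
    for node in range(sockets):
        first = node * cores_per_socket
        cores = range(first, first + cores_per_socket)
        layout[node] = sorted(
            core + thread * total_cores
            for thread in range(threads_per_core)
            for core in cores
        )
    return layout


no_ht = pcpus_by_node(threads_per_core=1)    # 20 pCPUs in total
with_ht = pcpus_by_node(threads_per_core=2)  # 40 pCPUs in total
print(no_ht[0])    # pCPUs 0-9
print(with_ht[1])  # pCPUs 10-19 and 30-39
```

On a real host the authoritative layout comes from the kernel and libvirt, so treat this only as a way to read the shorthand in the paragraph above.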

Commit `8bafc9` of Nova was used.

# Steps

## Create flavors

    $ openstack flavor create pinned.isolate \
        --id 103 --ram 2048 --disk 0 --vcpus 4
    $ openstack flavor set pinned.isolate \
        --property "hw:cpu_policy=dedicated" \
        --property "hw:cpu_thread_policy=isolate"

## Validate an HT-enabled node

This should match the expectations of the spec: each guest vCPU is given a
single host thread, and other guests are prevented from being scheduled on
the sibling threads of those host cores. Therefore, the guest should see
four sockets, one core per socket, and one thread per core.

    $ openstack server create --flavor=pinned.isolate \
        --image=cirros-0.3.4-x86_64-uec --wait test1

    $ sudo virsh list
     Id    Name                           State
    ----------------------------------------------------
     3     instance-00000003              running

    $ sudo virsh dumpxml 3
    <domain type='kvm' id='3'>
      <name>instance-00000003</name>
      ...
      <vcpu placement='static'>4</vcpu>
      <cputune>
        <shares>4096</shares>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='0'/>
        <vcpupin vcpu='2' cpuset='25'/>
        <vcpupin vcpu='3' cpuset='8'/>
        <emulatorpin cpuset='0-1,8,25'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
        <memnode cellid='0' mode='strict' nodeset='0'/>
      </numatune>
      ...
      <cpu>
        <topology sockets='4' cores='1' threads='1'/>
        <numa>
          <cell id='0' cpus='0-3' memory='2097152' unit='KiB'/>
        </numa>
      </cpu>
      ...
    </domain>

    $ openstack server delete test1

No problems here.
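One way to confirm that the pinning above honours 'isolate' is to check that no two vCPUs landed on threads of the same physical core. The sketch below assumes the sibling layout from the Platform section (pCPU N and pCPU N + 20 share a core); `core_of` and `violates_isolate` are hypothetical helpers, not Nova functions.

```python
from collections import Counter

TOTAL_CORES = 20  # 2 sockets * 10 cores, per the Platform section


def core_of(pcpu, total_cores=TOTAL_CORES):
    """Return the physical core index behind a pCPU ID (assumed layout)."""
    return pcpu % total_cores


def violates_isolate(pinned_pcpus):
    """True if two pinned pCPUs are sibling threads of one core."""
    counts = Counter(core_of(p) for p in pinned_pcpus)
    return any(n > 1 for n in counts.values())


# vcpupin values taken from the dumpxml output above
pins = [1, 0, 25, 8]
print(sorted(core_of(p) for p in pins))  # [0, 1, 5, 8]
print(violates_isolate(pins))            # False
```

This only checks the guest's own pins. It does not show that the sibling threads (21, 20, 5 and 28 under the assumed layout) are withheld from other guests, which would need a second instance to verify.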

## Validate an HT-disabled node

This should work exactly the same here as it did on the HT-enabled host,
minus the reservation of any thread siblings (there aren't any).

    $ openstack server create --flavor=pinned.isolate \
        --image=cirros-0.3.4-x86_64-uec --wait test1
    Error creating server: test1

    Error creating server

    $ openstack server list
    +--------------------------------------+-------+--------+----------+
    | ID                                   | Name  | Status | Networks |
    +--------------------------------------+-------+--------+----------+
    | 1f212d45-585e-41df-abd7-6abb12ca86a1 | test1 | ERROR  |          |
    +--------------------------------------+-------+--------+----------+

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/285321

Changed in nova:
assignee: nobody → Stephen Finucane (sfinucan)
status: New → In Progress
tags: added: mitaka-backport-potential
Changed in nova:
assignee: Stephen Finucane (sfinucan) → Waldemar Znoinski (wznoinsk)
Changed in nova:
assignee: Waldemar Znoinski (wznoinsk) → Stephen Finucane (sfinucan)
Jay Pipes (jaypipes)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/285321
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0b2e34f92507fd490faaec3285049b28446dc94c
Submitter: Jenkins
Branch: master

commit 0b2e34f92507fd490faaec3285049b28446dc94c
Author: Stephen Finucane <email address hidden>
Date: Fri Feb 26 13:07:56 2016 +0000

    virt/hardware: Fix 'isolate' case on non-SMT hosts

    The 'isolate' policy is supposed to function on both hosts with an
    SMT architecture (e.g. HyperThreading) and those without. The former
    is true, but the latter is broken due to an underlying implementation
    detail in how vCPUs are "packed" onto pCPUs.

    The '_pack_instance_onto_cores' function expects to work with a list of
    sibling sets. Since non-SMT hosts don't have siblings, the function is
    being given a list of all cores as one big sibling set. However, this
    conflicts with the idea that, in the 'isolate' case, only one sibling
    from each sibling set should be used. Using one sibling from the one
    available sibling set means it is not possible to schedule instances
    with more than one vCPU.

    Resolve this mismatch by instead providing the function with a list of
    multiple sibling sets, each containing a single core.

    This also resolves another bug. When booting instances on a non-HT
    host, the resulting NUMA topology should not define threads. By
    correctly considering the cores on these systems as non-siblings,
    the resulting instance topology will contain multiple cores with only
    a single thread in each.

    Change-Id: I2153f25fdb6382ada8e62fddf4215d9a0e3a6aa7
    Closes-bug: #1550317
    Closes-bug: #1417723
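
The mismatch described in this commit message can be illustrated with a toy model. This is a deliberately simplified sketch, not Nova's actual `_pack_instance_onto_cores`: the hypothetical `pack_isolate` function takes one pCPU from each sibling set and fails if there are fewer sets than vCPUs.

```python
def pack_isolate(sibling_sets, vcpus):
    """Toy model of 'isolate' packing: take one pCPU per sibling set.

    Returns a list of pCPU IDs, or None if the instance does not fit.
    This is a simplification for illustration, not Nova's implementation.
    """
    if len(sibling_sets) < vcpus:
        return None
    return [min(siblings) for siblings in sibling_sets[:vcpus]]


node0_cores = range(10)  # pCPUs 0-9 on the HT-disabled host

# Before the fix: every core lumped into one big "sibling set"
one_big_set = [set(node0_cores)]
# After the fix: one single-core sibling set per core
singleton_sets = [{core} for core in node0_cores]

print(pack_isolate(one_big_set, vcpus=4))     # None
print(pack_isolate(singleton_sets, vcpus=4))  # [0, 1, 2, 3]
```

With one big sibling set, only a single-vCPU instance fits, which is consistent with the four-vCPU flavor failing on the HT-disabled host. With singleton sets, each vCPU gets its own core.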

Changed in nova:
status: In Progress → Fix Released
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 14.0.0.0b1

This issue was fixed in the openstack/nova 14.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326944

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/326944
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=57638a8c83f98a8c139378b3b1e920b4f1c49a95
Submitter: Jenkins
Branch: stable/mitaka

commit 57638a8c83f98a8c139378b3b1e920b4f1c49a95
Author: Stephen Finucane <email address hidden>
Date: Fri Feb 26 13:07:56 2016 +0000

    virt/hardware: Fix 'isolate' case on non-SMT hosts

    The 'isolate' policy is supposed to function on both hosts with an
    SMT architecture (e.g. HyperThreading) and those without. The former
    is true, but the latter is broken due to an underlying implementation
    detail in how vCPUs are "packed" onto pCPUs.

    The '_pack_instance_onto_cores' function expects to work with a list of
    sibling sets. Since non-SMT hosts don't have siblings, the function is
    being given a list of all cores as one big sibling set. However, this
    conflicts with the idea that, in the 'isolate' case, only one sibling
    from each sibling set should be used. Using one sibling from the one
    available sibling set means it is not possible to schedule instances
    with more than one vCPU.

    Resolve this mismatch by instead providing the function with a list of
    multiple sibling sets, each containing a single core.

    This also resolves another bug. When booting instances on a non-HT
    host, the resulting NUMA topology should not define threads. By
    correctly considering the cores on these systems as non-siblings,
    the resulting instance topology will contain multiple cores with only
    a single thread in each.

    Change-Id: I2153f25fdb6382ada8e62fddf4215d9a0e3a6aa7
    Closes-bug: #1550317
    Closes-bug: #1417723
    (cherry picked from commit 0b2e34f92507fd490faaec3285049b28446dc94c)

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.1.1

This issue was fixed in the openstack/nova 13.1.1 release.
