OpenStack Compute (nova)

Odd number of vCPUs breaks 'prefer' threads policy

Bug #1467927 reported by Stephen Finucane on 2015-06-23

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	Undecided	Stephen Finucane

Bug Description

Using a CPU policy of dedicated ('hw:cpu_policy=dedicated') results in vCPUs being pinned to pCPUs, per the original blueprint:

http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-cpu-pinning.html

When scheduling instance with this extra spec there appears to be an implicit use of the 'prefer' threads policy, i.e. where possible vCPUs are pinned to thread siblings first. This is "implicit" because the threads policy aspect of this spec has not yet been implemented.

However, this implicit 'prefer' policy breaks when a VM with an odd number of vCPUs is booted. This has been seen on a Hyper-Threading-enabled host where "sibling sets" are two long, but it would presumably happen on any host where the number of siblings (or any number between this value and one) is not an factor of the number of vCPUs (i.e. vCPUs % n != 0, for siblings <= n > 0).

It is reasonable to assume that a three vCPU VM, for example, should try best effort and use siblings for at the first two vCPUs of the VM (assuming you're on a host system with HyperThreading and sibling sets are of length two). This would give us a true best effort implementation.

---

# Testing Configuration

Testing was conducted on a single-node, Fedora 21-based (3.17.8-300.fc21.x86_64) OpenStack instance (built with devstack). The system is a dual-socket, 10 core, HT-enabled system (2 sockets * 10 cores * 2 threads = 40 "pCPUs". 0-9,20-29 = node0, 10-19,30-39 = node1). Two flavors were used:

openstack flavor create --ram 4096 --disk 20 --vcpus 3 demo.odd
nova flavor-key demo.odd set hw:cpu_policy=dedicated

openstack flavor create --ram 4096 --disk 20 --vcpus 4 demo.even
nova flavor-key demo.even set hw:cpu_policy=dedicated

# Results

Correct case ("even" number of vCPUs)
=====================================

The output from 'virsh dumpxml [ID]' for the four vCPU VM is given below. Similar results can be seen for varying "even" numbers of vCPUs (2, 4, 10 tested):

Incorrect case ("odd" number of vCPUs)
======================================

The output from 'virsh dumpxml [ID]' for the three vCPU VM is given below. Similar results can be seen for varying "odd" numbers of vCPUs (3, 5 tested):

This isn't correct. We would expect something closer to this:

Tags:

Stephen Finucane (stephenfinucane) on 2015-06-23

Changed in nova:
assignee:	nobody → Stephen Finucane (sfinucan)

Revision history for this message

Markus Zoeller (markus_z) (mzoeller) wrote on 2015-06-24:

@Stephen Finucane (sfinucan):

Since you are set as assignee, I switch the status to "In Progress".

Changed in nova:
status:	New → In Progress

Revision history for this message

Daniel Berrange (berrange) wrote on 2015-06-29:

Before considering the placement on the host OS, you need to consider the topology exposed to the guest OS

For a 3 vCPU guests there are 3 possible topologies

- 1 socket, 1 core, 3 threads
- 1 socket, 3 cores, 1 thread
- 3 sockets, 1 core, 1 thread

In the description it does not mention any of the flavour / image settings for vCPU topology, so I'm assuming the testing was done with the defaults, which means 3 sockets, 1 core, 1 thread. Given that every vCPU would appear as a socket, Nova should not be attempting to prefer hyperthread siblings at all, because telling the guest that it has separate sockets, when in fact it is running on thread siblings would result in bad guest schedular decisions. The Nova schedular should only prefer pinning to hyperthread siblings in the host, if the guest topology also has been setup with hyperthreads.

Revision history for this message

Stephen Finucane (stephenfinucane) wrote on 2015-06-30:

I was unsure how important the CPU topology exposed to the guest was, but you're correct in saying that using a best-effort prefer policy would result in bad scheduler decisions. We still have an 'implicit' separate policy for odd numbers of cores and an implicit 'prefer' policy for even numbers, but seeing as we don't really support thread policies yet then this isn't really a bug.

I will close this bug and keep the above in mind when adding support for the thread policies.

Changed in nova:
status:	In Progress → Invalid

OpenStack Infra (hudson-openstack) on 2015-10-20

Changed in nova:
assignee:	Stephen Finucane (sfinucan) → Nikola Đipanov (ndipanov)
status:	Invalid → In Progress

Revision history for this message

Chris Friesen (cbf123) wrote on 2015-11-09:

I think Daniel has a good point, that it's invalid to prefer thread siblings and then tell the guest it's on separate sockets. I would suggest that it's also not valid to use host cores and tell the guest they're separate sockets.

Ideally the topology we specify to the guest should match the topology on the host in order to allow the guest scheduler to make the best possible decisions.

So I think we should fix the fact that odd/even gives different behaviour, but ideally we should also fix https://bugs.launchpad.net/nova/+bug/1417723

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-12: Fix merged to nova (master)

Reviewed: https://review.openstack.org/229573
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=46e98cfeb8463c68f912052ace036b490774a89f
Submitter: Jenkins
Branch: master

commit 46e98cfeb8463c68f912052ace036b490774a89f
Author: Nikola Dipanov <email address hidden>
Date: Wed Sep 30 16:24:28 2015 +0100

Revert "Store correct VirtCPUTopology"

This reverts commit 8358936a24cd223046580ddfa3bfb37a943abc91.

    The bug patch being reverted was trying to fix is not a bug bug at all
    but is actually by design. The NUMA cell CPU
    topology was meant to carry information about threads that we want to
    expose to the single cell based on how it was fitted with regards to
    threading on the host, so that we can expose this information to the
    guest OS if possible for optimal performance.

    The final topology exposed to the guest takes this information into
    account as well as any request for particular topology passed in by the
    user and decides on the final solution. There is no reason to store this
    as it will be different for different hosts.

We want to revert this as it is really not something that we want to be
doing, and will make it easier to cleanly fix 1501358.

    Change-Id: Iae06c468c076337b9d6e85d0d0dc85a063b827d1
    Partial-bug: 1501358
    Partial-bug: 1467927

OpenStack Infra (hudson-openstack) on 2015-12-01

Changed in nova:
assignee:	Nikola Đipanov (ndipanov) → Stephen Finucane (sfinucan)

OpenStack Infra (hudson-openstack) on 2015-12-03

Changed in nova:
assignee:	Stephen Finucane (sfinucan) → Nikola Đipanov (ndipanov)

OpenStack Infra (hudson-openstack) on 2015-12-10

Changed in nova:
assignee:	Nikola Đipanov (ndipanov) → Stephen Finucane (sfinucan)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-01-05: Related fix merged to nova (master)

Reviewed: https://review.openstack.org/229574
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cc500b2c9880ceb62686e9f1f5db8e6d0e7613d1
Submitter: Jenkins
Branch: master

commit cc500b2c9880ceb62686e9f1f5db8e6d0e7613d1
Author: Nikola Dipanov <email address hidden>
Date: Wed Sep 30 17:21:49 2015 +0100

hardware: stop using instance cell topology in CPU pinning logic

    Currently we consider the existing topology of the instance NUMA cell when
    deciding on the proper way to fit instance CPUs onto host. This is
    actually wrong after https://review.openstack.org/#/c/198312/. We don't
    need to consider the requested topology in the CPU fitting any more as
    the code that decides on the final CPU topology takes all of this into
    account.

    We still need to expose the topology so that the spawning process can
    use that to calculate the final CPU topology for the instance, but we
    should not consider this information relevant on any of the subsequent
    CPU pinning fit recalculations.

    Change-Id: I9a947a984bf8ec6413e9b4c45d61e8989bf903a1
    Co-Authored-By: Stephen Finucane <email address hidden>
    Related-bug: 1501358
    Related-bug: 1467927

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-01-05: Fix merged to nova (master)

Reviewed: https://review.openstack.org/229575
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d5bed8fad9d51dd0d4ba9cf452c6f5c8c543b264
Submitter: Jenkins
Branch: master

commit d5bed8fad9d51dd0d4ba9cf452c6f5c8c543b264
Author: Nikola Dipanov <email address hidden>
Date: Wed Sep 30 19:30:59 2015 +0100

Fix CPU pinning for odd number of CPUs w hyperthreading

    Previous logic was not handling the cases when the instance cell has an
    odd number of vcpus that is larger than any sibling set on the host. The
    logic would (accidentally) either reject to pin such an instance even
    though there is ample free cores, or would (in case there were enough
    free sibling sets on the host) spread the instance instead of pack it
    onto siblings as the default policy suggests.

    This patch fixes some incorrect assumptions in the code, while also
    simplifying it. As an added bonus - we still attempt to expose
    (via the Topology, and this time correctly) the largest possible number
    of threads that we can expose to the instance.

    Finally - we add some more comments to clear up the intent behind the
    current packing logic with pointers how it could be tweaked to achieve
    different results in the future.

    Change-Id: I2c0b3b250ffb1a7483299df13b317cdb24f8141d
    Co-Authored-By: Stephen Finucane <email address hidden>
    Closes-bug: 1501358
    Closes-bug: 1467927

Changed in nova:
status:	In Progress → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1501358

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.