when using dedicated cpus, the emulator thread should be affined as well

Bug #1417671 reported by Chris Friesen
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Daniel Berrange
Milestone: 2015.1.0

Bug Description

I'm running nova trunk, commit 752954a.

I configured a flavor with two vcpus and extra specs "hw:cpu_policy=dedicated" in order to enable vcpu pinning.

I booted an instance with this flavor, and "virsh dumpxml" shows that the two vCPUs were suitably affined to host CPUs, but the emulator thread was left free to float across the available host cores on that NUMA node.

  <cputune>
    <shares>2048</shares>
    <vcpupin vcpu='0' policy='other' priority='0' cpuset='4'/>
    <vcpupin vcpu='1' policy='other' priority='0' cpuset='5'/>
    <emulatorpin cpuset='3-11'/>
  </cputune>

Looking at the kvm process shortly after creation, we see quite a few emulator threads running with the emulatorpin affinity:

compute-2:~$ taskset -apc 136143
pid 136143's current affinity list: 3-11
pid 136144's current affinity list: 0,3-24,27-47
pid 136146's current affinity list: 4
pid 136147's current affinity list: 5
pid 136149's current affinity list: 0
pid 136433's current affinity list: 3-11
pid 136434's current affinity list: 3-11
pid 136435's current affinity list: 3-11
pid 136436's current affinity list: 3-11
pid 136437's current affinity list: 3-11
pid 136438's current affinity list: 3-11
pid 136439's current affinity list: 3-11
pid 136440's current affinity list: 3-11
pid 136441's current affinity list: 3-11
pid 136442's current affinity list: 3-11
pid 136443's current affinity list: 3-11
pid 136444's current affinity list: 3-11
pid 136445's current affinity list: 3-11
pid 136446's current affinity list: 3-11
pid 136447's current affinity list: 3-11
pid 136448's current affinity list: 3-11
pid 136449's current affinity list: 3-11
pid 136450's current affinity list: 3-11
pid 136451's current affinity list: 3-11
pid 136452's current affinity list: 3-11
pid 136453's current affinity list: 3-11
pid 136454's current affinity list: 3-11

Since the purpose of "hw:cpu_policy=dedicated" is to provide a dedicated host CPU for each guest CPU, the libvirt emulatorpin cpuset for a given guest should be set to one (or possibly more) of the CPUs specified for that guest. Otherwise, any work done by the emulator threads could rob CPU time from another guest instance.

Personally I'd like to see the emulator thread affined the same as guest vCPU 0 (we use guest vCPU0 as a maintenance processor while doing the "real work" on the other vCPUs), but an argument could be made that it should be affined to the logical OR of all the guest vCPU cpusets.
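
For illustration only, here is a rough Python sketch (not nova code) that parses the <cputune> element shown above and computes the two candidate emulatorpin values discussed here: the host CPU used by vCPU 0, and the union of all the vCPU cpusets. The parse_cpuset helper is a name invented for this sketch.

# Illustrative sketch, not nova code: derive candidate <emulatorpin>
# values from the <cputune> element in the dumpxml above.
import xml.etree.ElementTree as ET

CPUTUNE = """
<cputune>
  <shares>2048</shares>
  <vcpupin vcpu='0' policy='other' priority='0' cpuset='4'/>
  <vcpupin vcpu='1' policy='other' priority='0' cpuset='5'/>
  <emulatorpin cpuset='3-11'/>
</cputune>
"""

def parse_cpuset(spec):
    """Expand a libvirt cpuset string such as '4' or '3-11' into a set of ints."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

root = ET.fromstring(CPUTUNE)
pins = {int(e.get('vcpu')): parse_cpuset(e.get('cpuset'))
        for e in root.findall('vcpupin')}

print(pins[0])                      # "same as vCPU 0" policy -> {4}
print(set().union(*pins.values()))  # union policy            -> {4, 5}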

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/154580

Changed in nova:
assignee: nobody → Daniel Berrange (berrange)
status: Confirmed → In Progress
Revision history for this message
Daniel Berrange (berrange) wrote :

> Personally I'd like to see the emulator thread affined the same as guest vCPU 0 (we use guest vCPU0 as a maintenance
> processor while doing the "real work" on the other vCPUs), but an argument could be made that it should be affined to the
> logical OR of all the guest vCPU cpusets.

The design intention from the original blueprint was to use the union of the vCPU cpusets, so that's what my proposed fix does.

It is certainly valid to file a wishlist bug to support other configurable policies though. There are at least 3 credible alternative policies and no one solution will be perfect for all use cases.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/154845

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/154846

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/154845
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6e7f916aac885615d8cef5b0df52d24919836cc3
Submitter: Jenkins
Branch: master

commit 6e7f916aac885615d8cef5b0df52d24919836cc3
Author: Daniel P. Berrange <email address hidden>
Date: Wed Feb 11 12:19:49 2015 +0000

    libvirt: Fix logically inconsistent host NUMA topology

    The _fake_caps_numa_topology() method sets up a fake host NUMA
    topology that the tests will use. The definition it sets up,
    though, is logically inconsistent. The topology has 4 nodes with
    2 CPUs in each cell, but it then says there are 2 cores per socket
    and 2 thread siblings per core. This implies 4 CPUs per cell,
    not 2. Either the topology should have 2 cores with 1 thread, or
    1 core with 2 threads. The test cases using this topology are
    written to treat it as meaning 1 core with 2 threads, so fix it
    according to that interpretation.

    Related-bug: 1417671
    Change-Id: I9d7928a6f9b15f8eb29ef92606e6ca1d2b688a6e
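
As a quick illustrative check of the arithmetic in the commit message above (not the actual test code; the function is invented for this sketch): with one socket per cell, the CPUs a cell exposes equal sockets x cores per socket x threads per core, so 2 cores with 2 threads would mean 4 CPUs per cell rather than the 2 the fake topology declared.

# Illustrative arithmetic only; not nova's test helper.
def cpus_per_cell(sockets_per_cell, cores_per_socket, threads_per_core):
    return sockets_per_cell * cores_per_socket * threads_per_core

# The original fake topology claimed 2 CPUs per cell, but its own
# parameters (2 cores per socket, 2 threads per core) imply 4:
assert cpus_per_cell(1, 2, 2) == 4
# The interpretation the fix settles on (1 core with 2 threads) gives 2:
assert cpus_per_cell(1, 1, 2) == 2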

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/154846
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=face0fd9251c923c9aecaa19db9779f908840b5d
Submitter: Jenkins
Branch: master

commit face0fd9251c923c9aecaa19db9779f908840b5d
Author: Daniel P. Berrange <email address hidden>
Date: Wed Feb 11 12:25:38 2015 +0000

    libvirt: rewrite NUMA topology generator to be more flexible

    The current _fake_caps_numa_topology method returns a NUMA topology
    with 4 cells, 1 socket per cell, 1 core per socket & 2 threads per
    core.

    Rewrite the method so it can generate topologies with arbitrary
    values for cells, sockets, cores & threads.

    Related-bug: 1417671
    Change-Id: I94d5b1eeb34d6681683996a5d01033f12eb7c5b8
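
Purely as a rough sketch of what such a parameterized generator could look like (hypothetical code, not nova's actual _fake_caps_numa_topology, which builds libvirt capabilities objects; the dict fields and the simplification to one socket per cell are assumptions made for this sketch):

def fake_numa_topology(cells=4, sockets_per_cell=1, cores_per_socket=1,
                       threads_per_core=2, kb_mem=1048576):
    """Build a toy NUMA topology description with arbitrary dimensions."""
    cpus_per_cell = sockets_per_cell * cores_per_socket * threads_per_core
    topology = []
    for cell_id in range(cells):
        first = cell_id * cpus_per_cell
        cpus = []
        for n in range(cpus_per_cell):
            core = n // threads_per_core      # core index within the cell
            siblings = [first + core * threads_per_core + t
                        for t in range(threads_per_core)]
            cpus.append({'id': first + n,
                         'socket_id': cell_id,   # assumes 1 socket per cell
                         'core_id': core,
                         'siblings': siblings})
        topology.append({'id': cell_id,
                         'memory_kb': kb_mem // cells,
                         'cpus': cpus})
    return topology

# The defaults reproduce the shape described above:
# 4 cells, 1 socket per cell, 1 core per socket, 2 threads per core.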

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/154580
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=353e823cc31a62464049c6abbc62a67152e64bae
Submitter: Jenkins
Branch: master

commit 353e823cc31a62464049c6abbc62a67152e64bae
Author: Daniel P. Berrange <email address hidden>
Date: Tue Feb 10 17:46:15 2015 +0000

    libvirt: fix emulator thread pinning when doing strict CPU pinning

    When guest vCPUs are confined to particular set of host pCPUs, the
    emulator threads from QEMU are intended to be pinned to the union
    of the host pCPUs that the vCPUs are associated with.

    The code in Nova was in fact confining the emulator threads to the
    union of pCPUs in the host NUMA nodes that the guest was confined
    to. When running guests in NUMA mode with shared CPUs / overcommit
    this was functionally identical, but when running guests with
    dedicated CPUs this was incorrect.

    The (incorrect) libvirt config being generated was

      <cputune>
          <shares>4096</shares>
          <vcpupin vcpu='0' cpuset='0'/>
          <vcpupin vcpu='1' cpuset='1'/>
          <vcpupin vcpu='2' cpuset='4'/>
          <vcpupin vcpu='3' cpuset='5'/>
          <emulatorpin cpuset='0-5'/>
      </cputune>

    when it should have been

      <cputune>
          <shares>4096</shares>
          <vcpupin vcpu='0' cpuset='0'/>
          <vcpupin vcpu='1' cpuset='1'/>
          <vcpupin vcpu='2' cpuset='4'/>
          <vcpupin vcpu='3' cpuset='5'/>
          <emulatorpin cpuset='0-1,4-5'/>
      </cputune>

    This was not caught because the unit tests were only checking with
    a host NUMA topology that had the same number of CPUs as the guest
    topology. The test is changed to make the host topology massive
    compared to the guest topology.

    Closes-bug: 1417671
    Change-Id: I81e0fa1d9b09ec2df2af5e21c7d95b21be435f90
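
For completeness, a small standalone sketch (not the patch itself; format_cpuset is a name invented here) showing how the union of the per-vCPU cpusets from the corrected example collapses into libvirt's range syntax:

# Illustrative only: collapse a set of host CPU ids into libvirt's
# cpuset range syntax, as used in the corrected <emulatorpin> above.
def format_cpuset(cpus):
    cpus = sorted(cpus)
    ranges = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ','.join(str(lo) if lo == hi else '%d-%d' % (lo, hi)
                    for lo, hi in ranges)

# Per-vCPU pinnings from the corrected example: vCPUs 0-3 -> host CPUs 0,1,4,5
vcpu_pins = {0: {0}, 1: {1}, 2: {4}, 3: {5}}
print(format_cpuset(set().union(*vcpu_pins.values())))   # prints: 0-1,4-5

The same union over the earlier two-vCPU example (host CPUs 4 and 5) would simply give '4-5'.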

tags: added: juno-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0