NUMATopologyFilter not behaving as expected (returns 0 hosts)

Bug #1464286 reported by Dave Johnston
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Low
Assigned to: Stephen Finucane

Bug Description

I have a system with 32 cores (2 sockets, 8 cores, hyperthreading enabled).
The NUMA topology is as follows:

numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65501 MB
node 0 free: 38562 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65535 MB
node 1 free: 63846 MB
node distances:
node 0 1
  0: 10 20
  1: 20 10

I have defined a flavor in OpenStack with 12 vCPUs as follows:
nova flavor-show c4.3xlarge
+----------------------------+------------------------------------------------------+
| Property                   | Value                                                |
+----------------------------+------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                    |
| disk                       | 40                                                   |
| extra_specs                | {"hw:cpu_policy": "dedicated", "hw:numa_nodes": "1"} |
| id                         | 1d76a225-90c1-4f6f-a59b-000795c33e63                 |
| name                       | c4.3xlarge                                           |
| os-flavor-access:is_public | True                                                 |
| ram                        | 24576                                                |
| rxtx_factor                | 1.0                                                  |
| swap                       | 8192                                                 |
| vcpus                      | 12                                                   |
+----------------------------+------------------------------------------------------+

I expect to be able to launch two instances of this flavor on the 32-core host, one contained within each NUMA node.

When I launch two instances, the first succeeds, but the second fails. The instance XML is attached, along with the system capabilities.

If I change hw:numa_nodes to 2, then I can launch two copies of the instance.
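
For reference, here is the arithmetic behind that expectation as a rough Python sketch (illustrative set arithmetic only, not Nova's actual NUMATopologyFilter code; the node layout comes from the numactl output above):

    # pCPUs per host NUMA node, from the numactl output above.
    node_cpus = {
        0: set(range(0, 8)) | set(range(16, 24)),   # 16 pCPUs
        1: set(range(8, 16)) | set(range(24, 32)),  # 16 pCPUs
    }

    def place(vcpus, numa_nodes, free):
        """Naive fitting check: claim vcpus/numa_nodes dedicated pCPUs
        on each of numa_nodes host nodes, or return None."""
        per_node = vcpus // numa_nodes
        claimed = {}
        for node, cpus in free.items():
            if len(claimed) < numa_nodes and len(cpus) >= per_node:
                claimed[node] = set(sorted(cpus)[:per_node])
        return claimed if len(claimed) == numa_nodes else None

    free = {n: set(c) for n, c in node_cpus.items()}
    for i in (1, 2):
        placement = place(12, 1, free)   # hw:numa_nodes=1
        assert placement is not None, "instance %d should fit" % i
        for node, cpus in placement.items():
            free[node] -= cpus
    # Each node starts with 16 free pCPUs, so the two 12-vCPU,
    # single-node instances land on different nodes and both fit.

With hw:numa_nodes=2, each instance needs only 6 pCPUs per node, which is consistent with two copies launching successfully in that case.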

N.B. for the purposes of testing I have disabled all vcpu_pin_set and isolcpus settings.

This was tested on RDO Kilo running on CentOS 7.
I had to upgrade the hypervisor with packages from the ovirt master branch in order to support NUMA pinning.

Tags: libvirt numa
Revision history for this message
Dave Johnston (dave-johnston) wrote :

Some more info.

When I change hw:numa_nodes to 2 in my 12-vCPU flavor, I can launch two instances.
I can also launch a 4-vCPU flavor that has the following extra_specs:

{"hw:cpu_policy": "dedicated", "hw:numa_nodes": "2"}

I then enable vcpu_pin_set in nova, with the following:
    vcpu_pin_set = 2-7,16-23,10-15,24-31

i.e. I want to pin to all host CPUs apart from 0,1 and 8,9 (the first two from each NUMA node).

If I now try to launch the two 12-vCPU instances plus one 4-vCPU instance, the smaller instance fails.

The two 12-vCPU instances get the following pinning:

  <cputune>
    <shares>12288</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='20'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='23'/>
    <vcpupin vcpu='4' cpuset='5'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='10'/>
    <vcpupin vcpu='7' cpuset='26'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <emulatorpin cpuset='4-5,7,10,12-13,20-21,23,26,28-29'/>
  </cputune>

and

  <cputune>
    <shares>12288</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='20'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='23'/>
    <vcpupin vcpu='4' cpuset='5'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='10'/>
    <vcpupin vcpu='7' cpuset='26'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <emulatorpin cpuset='4-5,7,10,12-13,20-21,23,26,28-29'/>
  </cputune>

I have excluded 0,1 and 8,9, but I believe 16,17 from NUMA node 0 and 24,25 from NUMA node 1 should still be available for the small instance.
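
For reference, the same kind of set arithmetic backs this up (again a rough, illustrative Python sketch, not Nova's code; the node layout is from the description and the pinned set is taken from the cputune blocks above, which as pasted list the same pCPUs for both instances):

    # Parse a vcpu_pin_set-style string of ranges into a set of pCPU ids.
    def parse_ranges(spec):
        cpus = set()
        for part in spec.split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
        return cpus

    vcpu_pin_set = parse_ranges("2-7,16-23,10-15,24-31")

    node_cpus = {
        0: set(range(0, 8)) | set(range(16, 24)),
        1: set(range(8, 16)) | set(range(24, 32)),
    }

    # pCPUs claimed by the running instances, per the XML above.
    pinned = parse_ranges("4-5,7,10,12-13,20-21,23,26,28-29")

    for node, cpus in sorted(node_cpus.items()):
        usable = cpus & vcpu_pin_set
        print("node %d: free %s" % (node, sorted(usable - pinned)))
    # node 0: free [2, 3, 6, 16, 17, 18, 19, 22]
    # node 1: free [11, 14, 15, 24, 25, 27, 30, 31]

That leaves eight pCPUs per node inside the pin set, including 16,17 and 24,25, which should comfortably hold a 4-vCPU, two-node instance (2 pCPUs per node).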

description: updated
tags: added: numa
Changed in nova:
assignee: nobody → Dave Johnston (dave-johnston)
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

@Dave Johnston:

It's been over 2 months since you were set as assignee, but there has
been no commit to solve this bug. To signal to other contributors that
this is not in progress and can be worked on, I am removing you as
assignee. If you still plan to work on this, please set yourself as
assignee again and provide a patch in Gerrit in the near future.

Please consider updating your Launchpad profile with your IRC nickname
and hanging around in #openstack-nova on irc.freenode.net; this makes
it easier to communicate with each other (see [1] for more).

If you have any questions about this process, just ping me (markus_z)
in IRC.

[1] https://wiki.openstack.org/wiki/Nova/Mentoring#Top_Tips_for_working_with_the_Nova_community

tags: added: libvirt
Changed in nova:
assignee: Dave Johnston (dave-johnston) → nobody
Revision history for this message
Matt Riedemann (mriedem) wrote :

Are you able to recreate this with Liberty?

Revision history for this message
Sean Dague (sdague) wrote :

A recreate has been requested; moving to Incomplete.

Changed in nova:
status: New → Incomplete
importance: Undecided → Low
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

So I have a system with 40 cores (2 sockets, 10 cores, hyperthreading enabled).
The NUMA topology is as follows:

    $ numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
    node 0 size: 32083 MB
    node 0 free: 16652 MB
    node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
    node 1 size: 32237 MB
    node 1 free: 25386 MB
    node distances:
    node 0 1
      0: 10 21
      1: 21 10

I'm using OpenStack provisioned by DevStack on a Fedora 23 host:

    $ cat /etc/*-release*
    Fedora release 23 (Twenty Three)
    ...
    $ uname -r
    4.3.5-300.fc23.x86_64

    $ cd /opt/stack/nova
    $ git show --oneline
    8bafc99 Merge "remove the unnecessary parem of set_vm_state_and_notify"

I defined a flavor similar to yours, but without the unnecessary swap and
disk space and with a smaller RAM allocation (KISS?).

    $ openstack flavor create bug.1464286 --id 100 --ram 8192 --disk 0 \
        --vcpus 12

    $ openstack flavor set bug.1464286 \
        --property "hw:cpu_policy=dedicated" \
        --property "hw:numa_nodes=1"

    $ openstack flavor show bug.1464286
    +----------------------------+----------------------------------------------+
    | Field                      | Value                                        |
    +----------------------------+----------------------------------------------+
    | OS-FLV-DISABLED:disabled   | False                                        |
    | OS-FLV-EXT-DATA:ephemeral  | 0                                            |
    | disk                       | 0                                            |
    | id                         | 100                                          |
    | name                       | bug.1464286                                  |
    | os-flavor-access:is_public | True                                         |
    | properties                 | hw:cpu_policy='dedicated', hw:numa_nodes='1' |
    | ram                        | 8192                                         |
    | rxtx_factor                | 1.0                                          |
    | swap                       |                                              |
    | vcpus                      | 12                                           |
    +----------------------------+----------------------------------------------+

I also modified the default quotas to allow allocation of more than 20 cores:

    $ openstack quota set --cores 40 demo

I boot one instance...

    $ openstack server create --flavor=bug.1464286 \
        --image=cirros-0.3.4-x86_64-uec --wait test1

    $ sudo virsh list
     Id    Name                           State
    ----------------------------------------------------
     20    instance-00000010              running

    $ sudo virsh dumpxml 20
    <domain type='kvm' id='20'>
      <name>instance-00000010</name>
      ...
      <vcpu placement='static'>12</vcpu>
      <cputune>
        <shares>12288</shares>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='21'/>
        <vcpupin vcpu='2' cpuset='0'/>
        <vcpupin vcpu='3' cpuset='20'/>
    ...


Changed in nova:
status: Incomplete → Invalid
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Just to clarify, I think one of these commits fixes this:

* https://review.openstack.org/#/c/229574/
* https://review.openstack.org/#/c/229575/

Changed in nova:
assignee: nobody → Stephen Finucane (sfinucan)