Comment 1 for bug 1734204

Andreas Karis (akaris) wrote :

Root Cause

Nova by default fills up NUMA node 0 first as long as it has free pCPUs. This issue happens when the requested pCPUs still fit on NUMA node 0, but the free hugepages on NUMA node 0 are not sufficient to hold the instance memory. Unfortunately, at the time of this writing, one cannot tell nova to spawn an instance on a specific NUMA node.
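
This failure mode can be sketched as follows (a simplified, hypothetical model; nova's actual NUMA fitting logic is more involved): node placement is driven fill-first by free pCPUs, and the strict hugepage claim then fails on the chosen node even though another node has memory to spare.

```python
# Hypothetical, simplified sketch of the fill-first failure mode; the real
# scheduler logic in nova is more involved than this.

def pick_node_by_cpus(nodes, vcpus):
    """Fill-first: return the first node index with enough free pCPUs."""
    for idx, node in enumerate(nodes):
        if node["free_pcpus"] >= vcpus:
            return idx
    return None

def memory_fits(node, mem_mb):
    """Strict hugepage check on the chosen node."""
    return node["free_hugepage_mb"] >= mem_mb

# Node 0 still has free pCPUs, but its hugepages are exhausted.
nodes = [
    {"free_pcpus": 2, "free_hugepage_mb": 0},     # node 0
    {"free_pcpus": 4, "free_hugepage_mb": 1024},  # node 1
]
chosen = pick_node_by_cpus(nodes, vcpus=1)
print(chosen, memory_fits(nodes[chosen], mem_mb=512))  # -> 0 False
```

Node 0 is selected because it still has free pCPUs, and the strict memory claim on it then fails, even though node 1 could have hosted the instance.
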
Diagnostic Steps

On a hypervisor with 2MB hugepages and 512 free hugepages per NUMA node:

[root@overcloud-compute-1 ~]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 512
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
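
The free hugepage memory per node follows from HugePages_Free times the page size; with 2 MB pages and 512 free pages, each node has 1024 MB available:

```python
PAGE_SIZE_KB = 2048        # 2 MB hugepages on this hypervisor
free_pages_per_node = 512  # HugePages_Free on node 0 and node 1 above

free_mb = free_pages_per_node * PAGE_SIZE_KB // 1024
print(free_mb)  # -> 1024 MB of free hugepage memory per node
```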

And with the following NUMA architecture:

[root@overcloud-compute-1 nova]# lscpu | grep -i NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
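
For scripting against this layout, an lscpu-style CPU list such as "0-3" can be expanded into individual CPU IDs with a small helper (illustrative only, not part of any tool shown here):

```python
def expand_cpu_list(spec):
    """Expand an lscpu-style CPU list such as '0-3' or '0-3,8-11' into ints."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

print(expand_cpu_list("0-3"))  # -> [0, 1, 2, 3] (NUMA node 0)
print(expand_cpu_list("4-7"))  # -> [4, 5, 6, 7] (NUMA node 1)
```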

Spawn 3 instances with the following flavor (1 vCPU and 512 MB of memory):

[stack@undercloud-4 ~]$ nova flavor-show m1.tiny
+----------------------------+-------------------------------------------------------------+
| Property | Value |
+----------------------------+-------------------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 8 |
| extra_specs | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "large"} |
| id | 49debbdb-c12e-4435-97ef-f575990b352f |
| name | m1.tiny |
| os-flavor-access:is_public | True |
| ram | 512 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+-------------------------------------------------------------+
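
Given hw:mem_page_size=large, which on this host resolves to 2 MB pages, each such instance needs 256 hugepages on a single NUMA node:

```python
ram_mb = 512         # flavor 'ram' from the table above
page_size_kb = 2048  # 2 MB hugepages on this hypervisor

pages_needed = ram_mb * 1024 // page_size_kb
print(pages_needed)  # -> 256 hugepages per instance
```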

The new instance will boot and use memory from NUMA node 1:

[stack@undercloud-4 ~]$ nova list | grep d98772d1-119e-48fa-b1d9-8a68411cba0b
| d98772d1-119e-48fa-b1d9-8a68411cba0b | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe8d:a6ef, 10.0.0.102 |


[root@overcloud-compute-1 nova]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 256
Node 1 HugePages_Surp: 0

Each instance was booted with:

nova boot --nic net-id=$NETID --image cirros --flavor m1.tiny --key-name id_rsa cirros-test0

The 3rd instance fails to boot:

[stack@undercloud-4 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------+
| 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc | cirros-test0 | ERROR | - | NOSTATE | |
| a44c43ca-49ad-43c5-b8a1-543ed8ab80ad | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe0f:565b, 10.0.0.105 |
| e21ba401-6161-45e6-8a04-6c45cef4aa3e | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe69:18bd, 10.0.0.111 |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------+

From the compute node, we can see that the free hugepages on NUMA node 0 are exhausted, whereas in theory there is still enough space on NUMA node 1:

[root@overcloud-compute-1 qemu]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
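
A quick capacity check against the HugePages_Free counters above confirms this: node 0 cannot hold the 256 pages the instance needs, but node 1 can:

```python
page_kb = 2048
node_free_pages = {0: 0, 1: 512}       # HugePages_Free from the output above
needed_pages = 512 * 1024 // page_kb   # 512 MB instance -> 256 pages

fits = {n: free >= needed_pages for n, free in node_free_pages.items()}
print(fits)  # -> {0: False, 1: True}: node 1 could still host the instance
```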

/var/log/nova/nova-compute.log reveals that the instance's CPU will be pinned to NUMA node 0:

  <name>instance-00000006</name>
  <uuid>1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="14.0.8-5.el7ost"/>
      <nova:name>cirros-test0</nova:name>
      <nova:creationTime>2017-11-23 19:53:00</nova:creationTime>
      <nova:flavor name="m1.tiny">
        <nova:memory>512</nova:memory>
        <nova:disk>8</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="5d1785ee87294a6fad5e2bdddd91cc20">admin</nova:user>
        <nova:project uuid="8c307c08d2234b339c504bfdd896c13e">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="6350211f-5a11-4e02-a21a-cb1c0d543214"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1024</shares>
    <vcpupin vcpu='0' cpuset='2'/>
    <emulatorpin cpuset='2'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>

In the XML above, note also the nodeset='0' in the numatune section, which indicates that memory will be claimed from NUMA node 0.
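
When inspecting many such domain XMLs, the numatune binding can be extracted programmatically; a minimal sketch using a fragment mirroring the section above:

```python
import xml.etree.ElementTree as ET

# Minimal fragment mirroring the <numatune> section from the domain XML above.
xml = """
<domain>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
</domain>
"""

root = ET.fromstring(xml)
mem = root.find("./numatune/memory")
print(mem.get("mode"), mem.get("nodeset"))  # -> strict 0: memory must come from node 0
```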