Insufficient memory for guest pages when using NUMA

Bug #1863757 reported by Andre Ruiz
Affects: OpenStack Compute (nova) — Status: New — Importance: Undecided — Assigned to: Unassigned

Bug Description

This is a Queens / Bionic openstack deploy.

Compute nodes are using hugepages for nova instances (reserved at boot time):

root@compute1:~# cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 332
HugePages_Free: 184
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB

There are two NUMA nodes, as follows:

root@compute1:~# lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-19,40-59
NUMA node1 CPU(s): 20-39,60-79

Compute nodes are using DPDK, and memory for it has been reserved with the following directive:

reserved-huge-pages: "node:0,size:1GB,count:8;node:1,size:1GB,count:8"
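For reference, the charm option above maps onto nova's multi-valued `reserved_huge_pages` configuration option; the rendered nova.conf on the compute node would look roughly like this (exact rendering assumed):

```
# nova.conf on the compute node -- one entry per NUMA node,
# reserving 8 x 1 GiB hugepages per node for DPDK
[DEFAULT]
reserved_huge_pages = node:0,size:1GB,count:8
reserved_huge_pages = node:1,size:1GB,count:8
```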

A number of instances have already been created on node "compute1", to the point that current memory usage is as follows:

root@compute1:~# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 0 kB
Node 0 ShmemHugePages: 0 kB
Node 0 HugePages_Total: 166
Node 0 HugePages_Free: 26
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 0 kB
Node 1 ShmemHugePages: 0 kB
Node 1 HugePages_Total: 166
Node 1 HugePages_Free: 158
Node 1 HugePages_Surp: 0

Problem:

When a new instance is created (8 cores and 32 GB of RAM), nova tries to schedule it on NUMA node 0 and fails with "Insufficient free host memory pages available to allocate guest RAM", even though there is enough memory available on NUMA node 1.
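The failure on node 0 follows directly from the per-node counters above. A minimal sketch of the fit arithmetic (helper names are hypothetical, not nova's actual code; values come from the meminfo output above):

```python
# Per-NUMA-node hugepage fit check, using the counters reported above.

PAGE_SIZE_KIB = 1048576  # 1 GiB hugepages, per Hugepagesize in /proc/meminfo

def pages_needed(guest_ram_mib):
    """Number of 1 GiB hugepages required to back the guest RAM."""
    guest_kib = guest_ram_mib * 1024
    return (guest_kib + PAGE_SIZE_KIB - 1) // PAGE_SIZE_KIB

def node_fits(free_pages, guest_ram_mib):
    """True if a NUMA node has enough free hugepages for the guest."""
    return free_pages >= pages_needed(guest_ram_mib)

# The 8c-32768m flavor needs 32768 MiB -> 32 x 1 GiB pages.
print(pages_needed(32768))    # 32
print(node_fits(26, 32768))   # node 0: 26 free pages  -> False
print(node_fits(158, 32768))  # node 1: 158 free pages -> True
```

Node 0 has only 26 free pages against the 32 required, so the strict node 0 placement fails, while node 1 (158 free) would fit easily.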

This behavior has also been seen by other users here (although the resolution in that report looks more like a coincidence than a proper fix -- it was classified as not a bug, which I don't believe is the case):

https://bugzilla.redhat.com/show_bug.cgi?id=1517004

The flavor being used has nothing special except the property hw:mem_page_size='large'.
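That property can be reproduced and inspected with the standard openstack CLI (flavor name taken from the libvirt log below):

```
# Set the hugepage property on the flavor used by the failing instance
openstack flavor set 8c-32768m --property hw:mem_page_size=large

# Verify the extra spec
openstack flavor show 8c-32768m -c properties
```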

The instance is forced onto "zone1::compute1", but there is otherwise no pinning of CPUs or other resources. Placing the VM on NUMA node 0 appears to be entirely nova's decision when instantiating it.

Revision history for this message
Andre Ruiz (andre-ruiz) wrote :

Relevant logs while the creation fails:

2020-02-17 19:09:09.775 4544 ERROR nova.virt.libvirt.guest [req-0d75d9bd-ff40-4d1a-b80e-1bb029cb0bc2 066d2f8824a744d5b23b783a6a0c8dfe 86289cab83454823800f8119a8dfa16c - 9d7701f7eb2c467ca7d9bded8fa273c4 9d7701f7eb2c467ca7d9bded8fa273c4] Error launching a defined domain with XML: <domain type='kvm'>
  <name>instance-000005e4</name>
  <uuid>49d37209-4f8e-45fb-ba25-816da602f2e3</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="17.0.9"/>
      <nova:name>brtlvlts1169fu</nova:name>
      <nova:creationTime>2020-02-17 19:09:03</nova:creationTime>
      <nova:flavor name="8c-32768m">
        <nova:memory>32768</nova:memory>
        <nova:disk>0</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>8</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="066d2f8824a744d5b23b783a6a0c8dfe">ericsson</nova:user>
        <nova:project uuid="86289cab83454823800f8119a8dfa16c">ocs-prd-pal</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="59adaf6e-aaa1-4b8a-b7f0-36cab9db4e9e"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <shares>8192</shares>
    <vcpupin vcpu='0' cpuset='0-19,40-59'/>
    <vcpupin vcpu='1' cpuset='0-19,40-59'/>
    <vcpupin vcpu='2' cpuset='0-19,40-59'/>
    <vcpupin vcpu='3' cpuset='0-19,40-59'/>
    <vcpupin vcpu='4' cpuset='0-19,40-59'/>
    <vcpupin vcpu='5' cpuset='0-19,40-59'/>
    <vcpupin vcpu='6' cpuset='0-19,40-59'/>
    <vcpupin vcpu='7' cpuset='0-19,40-59'/>
    <emulatorpin cpuset='0-19,40-59'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>17.0.9</entry>
      <entry name='serial'>6dd001ba-a38a-4bd9-b54e-52ef8ee4a10c</entry>
      <entry name='uuid'>49d37209-4f8e-45fb-ba25-816da602f2e3</entry>
      <entry name='family'>Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-bionic'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='8' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-7' memory='33554432' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='fil...
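Note the `<numatune>` element in the XML above: memory is bound in strict mode to nodeset 0 only, which is why the guest cannot fall back to node 1. The per-node free 1 GiB pages can be cross-checked as libvirt sees them (page size in KiB):

```
# Free 1 GiB hugepages per NUMA cell, as reported by libvirt
virsh freepages --cellno 0 --pagesize 1048576
virsh freepages --cellno 1 --pagesize 1048576
```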

Revision history for this message
Andre Ruiz (andre-ruiz) wrote :

This bug seems similar, although it has been fixed.

https://bugs.launchpad.net/nova/+bug/1734204

Revision history for this message
Marcelo Subtil Marcal (msmarcal) wrote :

Nova version:

nova-common:
  Installed: 2:17.0.9-0ubuntu3
  Candidate: 2:17.0.12-0ubuntu1
  Version table:
     2:17.0.12-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
     2:17.0.10-0ubuntu2.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-security/main amd64 Packages
 *** 2:17.0.9-0ubuntu3 100
        100 /var/lib/dpkg/status
     2:17.0.1-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Revision history for this message
Andre Ruiz (andre-ruiz) wrote :

Just a side note, there's some related info here:

https://bugs.launchpad.net/nova/+bug/1601994

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Yes, this has been resolved since Stein, as noted in bug 1734204. Unfortunately, Queens is in Extended Maintenance and we no longer release new versions for it, so this is not likely to be fixed there.

Revision history for this message
Andre Ruiz (andre-ruiz) wrote :

I was informed by the Canonical OpenStack engineering team that they are evaluating the chances of backporting the change to Queens. This is affecting production environments, so that possibility is very welcome.
