Insufficient memory for guest pages when using NUMA
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | New | Undecided | Unassigned | |
Bug Description
This is a Queens / Bionic OpenStack deployment.
Compute nodes are using hugepages for nova instances (reserved at boot time):
root@compute1:~# cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 332
HugePages_Free: 184
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
There are two NUMA nodes, as follows:
root@compute1:~# lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-19,40-59
NUMA node1 CPU(s): 20-39,60-79
Compute nodes are using DPDK, and memory for it has been reserved with the following directive:
reserved-
A number of instances have already been created on node "compute1", to the point where the current per-node hugepage usage is as follows:
root@compute1:~# cat /sys/devices/
Node 0 AnonHugePages: 0 kB
Node 0 ShmemHugePages: 0 kB
Node 0 HugePages_Total: 166
Node 0 HugePages_Free: 26
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 0 kB
Node 1 ShmemHugePages: 0 kB
Node 1 HugePages_Total: 166
Node 1 HugePages_Free: 158
Node 1 HugePages_Surp: 0
Problem:
When a new instance is created (8 vCPUs and 32 GB of RAM), nova tries to place it on NUMA node 0 and fails with "Insufficient free host memory pages available to allocate guest RAM", even though there is enough memory available on NUMA node 1.
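As a sanity check on the numbers above, here is a minimal sketch of the page arithmetic. It assumes the 1 GiB page size reported by Hugepagesize and the per-node HugePages_Free values shown earlier. The function names and data layout are illustrative only and are not nova's scheduler code.

```python
# Illustrative arithmetic only; this is not nova's placement logic.
PAGE_SIZE_KIB = 1048576  # Hugepagesize from /proc/meminfo (1 GiB pages)

# HugePages_Free per NUMA node, copied from the per-node output above.
free_pages = {0: 26, 1: 158}


def pages_needed(guest_ram_mib, page_size_kib=PAGE_SIZE_KIB):
    """Return how many huge pages are needed to back the guest RAM."""
    guest_kib = guest_ram_mib * 1024
    # Ceiling division: a partially used page still consumes a whole page.
    return -(-guest_kib // page_size_kib)


def nodes_that_fit(guest_ram_mib, free):
    """Return the NUMA nodes that can hold the whole guest on their own."""
    need = pages_needed(guest_ram_mib)
    return [node for node, avail in sorted(free.items()) if avail >= need]


need = pages_needed(32768)              # flavor RAM is 32768 MiB
fits = nodes_that_fit(32768, free_pages)
print(need, fits)
```

With these inputs the guest needs 32 pages, node 0 has only 26 free, and node 1 is the only node that can hold the guest in a single NUMA cell.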
This behavior has also been seen by other users in the bug linked below (although the solution on that bug seems to be more a coincidence than a proper fix, and it was then classified as not a bug, which I don't believe is correct):
https:/
Flavor being used has nothing special except a property for hw:mem_
The instance is forced onto "zone1::compute1"; otherwise there is no pinning of CPUs or other resources. Placing the VM on node 0 appears to be nova's own decision when instantiating it.
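To confirm which NUMA node the generated domain XML targets, the cpuset on the vcpupin entries in the log below ('0-19,40-59') can be compared against the lscpu output. A minimal sketch, with a hypothetical helper name and the CPU lists copied from this report:

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as '0-19,40-59' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# CPU lists per NUMA node, copied from the lscpu output in this report.
numa_cpus = {
    0: expand_cpu_list("0-19,40-59"),
    1: expand_cpu_list("20-39,60-79"),
}

# cpuset used by the vcpupin and emulatorpin entries in the logged XML.
pinned = expand_cpu_list("0-19,40-59")

# Nodes whose CPU set fully contains the pinned cpuset.
matching = [node for node, cpus in sorted(numa_cpus.items()) if pinned <= cpus]
print(matching)
```

The pinned cpuset is exactly the CPU list of NUMA node 0, which is consistent with the numatune nodeset='0' in the same XML.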
Relevant logs while the creation fails:
2020-02-17 19:09:09.775 4544 ERROR nova.virt.libvirt.guest [req-0d75d9bd-ff40-4d1a-b80e-1bb029cb0bc2 066d2f8824a744d5b23b783a6a0c8dfe 86289cab83454823800f8119a8dfa16c - 9d7701f7eb2c467ca7d9bded8fa273c4 9d7701f7eb2c467ca7d9bded8fa273c4] Error launching a defined domain with XML: <domain type='kvm'>
  <name>instance-000005e4</name>
  <uuid>49d37209-4f8e-45fb-ba25-816da602f2e3</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="17.0.9"/>
      <nova:name>brtlvlts1169fu</nova:name>
      <nova:creationTime>2020-02-17 19:09:03</nova:creationTime>
      <nova:flavor name="8c-32768m">
        <nova:memory>32768</nova:memory>
        <nova:disk>0</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>8</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="066d2f8824a744d5b23b783a6a0c8dfe">ericsson</nova:user>
        <nova:project uuid="86289cab83454823800f8119a8dfa16c">ocs-prd-pal</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="59adaf6e-aaa1-4b8a-b7f0-36cab9db4e9e"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <shares>8192</shares>
    <vcpupin vcpu='0' cpuset='0-19,40-59'/>
    <vcpupin vcpu='1' cpuset='0-19,40-59'/>
    <vcpupin vcpu='2' cpuset='0-19,40-59'/>
    <vcpupin vcpu='3' cpuset='0-19,40-59'/>
    <vcpupin vcpu='4' cpuset='0-19,40-59'/>
    <vcpupin vcpu='5' cpuset='0-19,40-59'/>
    <vcpupin vcpu='6' cpuset='0-19,40-59'/>
    <vcpupin vcpu='7' cpuset='0-19,40-59'/>
    <emulatorpin cpuset='0-19,40-59'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>17.0.9</entry>
      <entry name='serial'>6dd001ba-a38a-4bd9-b54e-52ef8ee4a10c</entry>
      <entry name='uuid'>49d37209-4f8e-45fb-ba25-816da602f2e3</entry>
      <entry name='family'>Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-bionic'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='8' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-7' memory='33554432' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='fil...