Insufficient free host memory pages available to allocate guest RAM with Open vSwitch DPDK in Newton

Bug #1734204 reported by Andreas Karis
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Sahid Orentino
Ubuntu Cloud Archive
Invalid
Undecided
Unassigned
Queens
Incomplete
High
Unassigned
nova (Ubuntu)
Invalid
Undecided
Unassigned
Bionic
Incomplete
High
Unassigned

Bug Description

When spawning an instance and scheduling it onto a compute node which still has sufficient pCPUs for the instance and also sufficient free huge pages for the instance memory, nova returns:
Raw

[stack@undercloud-4 ~]$ nova show 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc
(...)
| fault | {"message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc. Last exception: internal error: process exited while connecting to monitor: 2017-11-23T19:53:20.311446Z qemu-kvm: -chardev pty,id=cha", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 492, in build_instances |
| | filter_properties, instances[0].uuid) |
| | File \"/usr/lib/python2.7/site-packages/nova/scheduler/utils.py\", line 184, in populate_retry |
| | raise exception.MaxRetriesExceeded(reason=msg) |
| | ", "created": "2017-11-23T19:53:22Z"}
(...)

And /var/log/nova/nova-compute.log on the compute node gives the following ERROR message:
Raw

2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [req-2ad59cdf-4901-4df1-8bd7-ebaea20b9361 5d1785ee87294a6fad5e2bdddd91cc20 8c307c08d2234b339c504bfdd896c13e - - -] [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] Instance failed
to spawn
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] Traceback (most recent call last):
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2087, in _build_resources
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] yield resources
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1928, in _build_and_run_instance
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] block_device_info=block_device_info)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2674, in spawn
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] destroy_disks_on_failure=True)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5013, in _create_domain_and_network
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] destroy_disks_on_failure)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] self.force_reraise()
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] six.reraise(self.type_, self.value, self.tb)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4985, in _create_domain_and_network
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] post_xml_callback=post_xml_callback)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4903, in _create_domain
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] guest.launch(pause=pause)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] self._encoded_xml, errors='ignore')
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] self.force_reraise()
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] six.reraise(self.type_, self.value, self.tb)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] return self._domain.createWithFlags(flags)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] rv = execute(f, *args, **kwargs)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] six.reraise(c, e, tb)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] rv = meth(*args, **kwargs)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in createWithFlags
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] libvirtError: internal error: process exited while connecting to monitor: 2017-11-23T19:53:20.311446Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/3 (label charserial1)
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc] 2017-11-23T19:53:20.477183Z qemu-kvm: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/7-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc]
2017-11-23 19:53:21.021 153615 ERROR nova.compute.manager [instance: 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc]

Additionally, libvirt creates the following log file:
Raw

[root@overcloud-compute-1 qemu]# cat instance-00000006.log
2017-11-23 19:53:02.145+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: overcloud-compute-1.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000006,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-instance-00000006/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge,vme=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/5-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.8-5.el7ost,serial=4f88fcca-0cd3-4e19-8dc4-4436a54daff8,uuid=1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-5-instance-00000006/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu9758ef15-d2 -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:d6:89:65,bus=pci.0,addr=0x3 -add-fd set=0,fd=29 -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.16.2.8:2 -k en-us -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-11-23T19:53:03.217386Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/3 (label charserial1)
2017-11-23T19:53:03.359799Z qemu-kvm: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/5-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM

2017-11-23 19:53:03.630+0000: shutting down, reason=failed
2017-11-23 19:53:10.052+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: overcloud-compute-1.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000006,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-instance-00000006/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge,vme=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/6-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.8-5.el7ost,serial=4f88fcca-0cd3-4e19-8dc4-4436a54daff8,uuid=1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-instance-00000006/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu9758ef15-d2 -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:d6:89:65,bus=pci.0,addr=0x3 -add-fd set=0,fd=29 -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.16.2.8:2 -k en-us -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-11-23T19:53:11.466399Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/3 (label charserial1)
2017-11-23T19:53:11.729226Z qemu-kvm: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/6-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM

2017-11-23 19:53:12.159+0000: shutting down, reason=failed
2017-11-23 19:53:19.370+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: overcloud-compute-1.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000006,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-instance-00000006/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge,vme=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/7-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.8-5.el7ost,serial=4f88fcca-0cd3-4e19-8dc4-4436a54daff8,uuid=1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-instance-00000006/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu9758ef15-d2 -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:d6:89:65,bus=pci.0,addr=0x3 -add-fd set=0,fd=29 -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.16.2.8:2 -k en-us -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-11-23T19:53:20.311446Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/3 (label charserial1)
2017-11-23T19:53:20.477183Z qemu-kvm: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/7-instance-00000006,share=yes,size=536870912,host-nodes=0,policy=bind: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM

2017-11-23 19:53:20.724+0000: shutting down, reason=failed

Revision history for this message
Andreas Karis (akaris) wrote :

Root Cause

Nova by default will first fill up NUMA node 0 if there are still free pCPUs. This issue happens when the requested pCPUs still fit into NUMA node 0, but NUMA node 0 no longer has enough free hugepages to hold the instance memory. Unfortunately, at the time of this writing, one cannot tell nova to spawn an instance on a specific NUMA node.
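The faulty placement decision can be illustrated with a small sketch (hypothetical and heavily simplified; the names and structures are illustrative, not nova's actual code): a fit check that only looks at free pCPUs picks NUMA node 0 even though its hugepage pool is exhausted, and qemu then fails at prealloc time.

```python
# Hypothetical, simplified sketch of the placement decision.
# Numbers mirror the reproducer: node 0 has free pCPUs but no free
# 2 MB hugepages left; node 1 has both.
nodes = [
    {"id": 0, "free_pcpus": 2, "free_hugepages_2m": 0},
    {"id": 1, "free_pcpus": 4, "free_hugepages_2m": 512},
]

def pick_node_cpu_only(nodes, vcpus):
    """Buggy behaviour: fill node 0 first, looking only at free pCPUs."""
    for node in nodes:
        if node["free_pcpus"] >= vcpus:
            return node["id"]
    return None

def pick_node_cpu_and_hugepages(nodes, vcpus, mem_mb):
    """Fixed idea: also require enough free hugepages for the guest RAM."""
    pages_needed = mem_mb * 1024 // 2048  # 2 MB pages
    for node in nodes:
        if (node["free_pcpus"] >= vcpus
                and node["free_hugepages_2m"] >= pages_needed):
            return node["id"]
    return None

print(pick_node_cpu_only(nodes, 1))                 # -> 0, qemu prealloc fails
print(pick_node_cpu_and_hugepages(nodes, 1, 512))   # -> 1
```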
Diagnostic Steps

On a hypervisor with 2MB hugepages and 512 free hugepages per NUMA node:
Raw

[root@overcloud-compute-1 ~]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 512
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
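The per-node arithmetic in the output above (512 free pages x 2048 kB = 1024 MB of hugepage-backed memory free per node) can be checked with a short parser. This is a sketch that consumes the same meminfo format; the sample text is taken from the output above, and 2 MB pages are assumed.

```python
# Sketch: compute free hugepage-backed memory (MB) per NUMA node from
# /sys/devices/system/node/node*/meminfo-style output. Assumes 2 MB pages.
SAMPLE = """\
Node 0 HugePages_Total:  1024
Node 0 HugePages_Free:    512
Node 1 HugePages_Total:  1024
Node 1 HugePages_Free:    512
"""

def free_hugepage_mb(meminfo_text, page_kb=2048):
    free = {}
    for line in meminfo_text.splitlines():
        parts = line.split()
        # Lines look like: "Node 0 HugePages_Free: 512"
        if len(parts) >= 4 and parts[2] == "HugePages_Free:":
            free[int(parts[1])] = int(parts[3]) * page_kb // 1024
    return free

print(free_hugepage_mb(SAMPLE))  # -> {0: 1024, 1: 1024}, i.e. 1024 MB free per node
```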

And with the following NUMA architecture:
Raw

[root@overcloud-compute-1 nova]# lscpu | grep -i NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7

Spawn 3 instances with the following flavor (1 vCPU and 512 MB of memory):
Raw

[stack@undercloud-4 ~]$ nova flavor-show m1.tiny
+----------------------------+-------------------------------------------------------------+
| Property | Value |
+----------------------------+-------------------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 8 |
| extra_specs | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "large"} |
| id | 49debbdb-c12e-4435-97ef-f575990b352f |
| name | m1.tiny |
| os-flavor-access:is_public | True |
| ram | 512 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+-------------------------------------------------------------+

The new instance will boot and will use memory from NUMA 1:
Raw

[stack@undercloud-4 ~]$ nova list | grep d98772d1-119e-48fa-b1d9-8a68411cba0b
| d98772d1-119e-48fa-b1d9-8a68411cba0b | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe8d:a6ef, 10.0.0.102 |

Raw

[root@overcloud-compute-1 nova]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 256
Node 1 HugePages_Surp: 0

Raw

nova boot --nic net-id=$NETID --image cirros --flavor m1.tiny --key-name ...


tags: added: libvirt numa openstack-version.newton
Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/532168
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8241deee197206d5458a61884ce08e1c2c2603d9
Submitter: Zuul
Branch: master

commit 8241deee197206d5458a61884ce08e1c2c2603d9
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Tue Jan 9 06:46:03 2018 -0500

    hardware: fix memory check usage for small/large pages

    When huge pages are requested, we need to examine the host cell's
    specific huge page amounts, not the amount of physical memory
    associated with the host NUMA node.

    Also when no pagesize is requested we should consider to compute
    memory usage based on small pages since the amount of physical memory
    available may also include some large pages.

    The unit-tests has been updated to pass correct NUMACell mempages
    information since the check is not done based on the pages available
    on host. Also that the test 'test_host_usage_contiguous' to reflect to
    correct update of pages availalbes on hist based on instance usage.

    Closes-bug: #1734204
    Change-Id: I5f5c621f2f0fa1bc18ee9a97d17085107a5dee53
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
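The idea described in the commit message can be sketched as follows (a simplified illustration, not the actual patch; the `MemPages` fields approximate nova's NUMACell mempages objects): when a page size is requested, the check accounts against the cell's pool of pages of that size rather than against the cell's total physical memory.

```python
# Simplified sketch of the fixed memory check: fit the requested memory
# against the cell's hugepage pool, not its total physical memory.
# Field names are illustrative, not nova's actual object model.
class MemPages:
    def __init__(self, size_kb, total, used=0):
        self.size_kb = size_kb
        self.total = total
        self.used = used

def can_fit_memory(cell_mempages, mem_mb, pagesize_kb):
    """True if mem_mb fits in the cell's pool of pagesize_kb pages."""
    for pages in cell_mempages:
        if pages.size_kb == pagesize_kb:
            avail_kb = (pages.total - pages.used) * pages.size_kb
            return avail_kb >= mem_mb * 1024
    return False  # no pool of the requested page size on this cell

# Node 0 with its 1024 x 2 MB pages fully consumed; node 1 with 256 free.
node0 = [MemPages(2048, total=1024, used=1024)]
node1 = [MemPages(2048, total=1024, used=768)]

print(can_fit_memory(node0, 512, 2048))  # -> False, fall back to node 1
print(can_fit_memory(node1, 512, 2048))  # -> True
```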

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/599413

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/599413
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e69018038176a60abbb3c879ae7ac7f12b8341a
Submitter: Zuul
Branch: stable/rocky

commit 5e69018038176a60abbb3c879ae7ac7f12b8341a
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Tue Jan 9 06:46:03 2018 -0500

    hardware: fix memory check usage for small/large pages

    When huge pages are requested, we need to examine the host cell's
    specific huge page amounts, not the amount of physical memory
    associated with the host NUMA node.

    Also when no pagesize is requested we should consider to compute
    memory usage based on small pages since the amount of physical memory
    available may also include some large pages.

    The unit-tests has been updated to pass correct NUMACell mempages
    information since the check is not done based on the pages available
    on host. Also that the test 'test_host_usage_contiguous' to reflect to
    correct update of pages availalbes on hist based on instance usage.

    Closes-bug: #1734204
    Change-Id: I5f5c621f2f0fa1bc18ee9a97d17085107a5dee53
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
    (cherry picked from commit 8241deee197206d5458a61884ce08e1c2c2603d9)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.1.0

This issue was fixed in the openstack/nova 18.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/716328

Changed in cloud-archive:
status: New → Invalid
Changed in nova (Ubuntu):
status: New → Invalid
Changed in nova (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Revision history for this message
Corey Bryant (corey.bryant) wrote :

For the Ubuntu SRU, the stable/queens upstream backport is failing tests. Additionally, the fix for https://bugs.launchpad.net/nova/+bug/1810977 would need backporting to stable/queens. Both would need to land upstream in stable/queens first, and we would need the capability to test both fixes and ensure no regressions. For now, marking as Won't Fix. The suggestion is to upgrade to rocky or later, where this is already fixed.

Changed in nova (Ubuntu Bionic):
status: Triaged → Won't Fix
Revision history for this message
James Page (james-page) wrote :

Picking this back up again - I'll fold the fix for the regression introduced by this change into the same SRU so it will consist of two patches.

Changed in nova (Ubuntu Bionic):
status: Won't Fix → Triaged
assignee: nobody → James Page (james-page)
James Page (james-page)
Changed in nova (Ubuntu Bionic):
status: Triaged → In Progress
Revision history for this message
Xav Paice (xavpaice) wrote :

Hi, just wondering if there's any update on the work to get this into Bionic?

Revision history for this message
David O Neill (dmzoneill) wrote :

Requested an update from engineering.

Revision history for this message
James Page (james-page) wrote :

Revised upstream review which passes tests:

https://review.opendev.org/c/openstack/nova/+/726867

I'll pick that into the packaging for bionic/queens and pop in a PPA for testing.

Revision history for this message
James Page (james-page) wrote :

Once I sort out backportpackage on Hirsute, the package for bionic will appear here:

https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4054

Revision history for this message
James Page (james-page) wrote :

The package has built for bionic - we also need a full test case to support the SRU process.

Revision history for this message
Zachary Zehring (zzehring) wrote :

subscribed ~field-high

Revision history for this message
James Page (james-page) wrote :

We have a patch for this but no reliable test case which is going to block any SRU activity - if field or bootstack teams have a reliable reproducer then that would help hugely.

Revision history for this message
James Page (james-page) wrote :

I've marked this as incomplete as we still don't have a standalone reproducer documented.

I'm happy to revisit if an impacted deployment would be able to validate the fixes.

Changed in nova (Ubuntu Bionic):
status: In Progress → Incomplete
assignee: James Page (james-page) → nobody
Revision history for this message
James Page (james-page) wrote :

Note that the PPA no longer contains the nova build including the proposed fix.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/queens
Review: https://review.opendev.org/c/openstack/nova/+/726867
Reason: This branch transitioned to End of Life for this project, open patches needs to be closed to be able to delete the branch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/queens
Review: https://review.opendev.org/c/openstack/nova/+/716328
Reason: This branch transitioned to End of Life for this project, open patches needs to be closed to be able to delete the branch.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Removing the field-high subscription, as the bug has been marked incomplete and no new data has come back.

