hw:mem_page_size is not respecting all documented values

Bug #1816454 reported by Tyler Stachecki
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Dongcan Ye
Milestone: (none)

Bug Description

Per the Rocky documentation for hugepages:
https://docs.openstack.org/nova/rocky/admin/huge-pages.html

2MB hugepages can be specified either as:
--property hw:mem_page_size=2Mb, or
--property hw:mem_page_size=2048

However, whenever I use the former notation (2Mb), conductor fails with the misleading NUMA error below. With the latter notation (2048), allocation succeeds and the resulting instance is backed with 2MB hugepages on an x86_64 platform (verified by running `grep HugePages_Free /proc/meminfo` before and after stopping the created instance).

ERROR nova.scheduler.utils [req-de6920d5-829b-411c-acd7-1343f48824c9 cb2abbb91da54209a5ad93a845b4cc26 cb226ff7932d40b0a48ec129e162a2fb - default default] [instance: 5b53d1d4-6a16-4db9-ab52-b267551c6528] Error from last host: node1 (node FQDN-REDACTED): ['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2106, in _build_and_run_instance\n with rt.instance_claim(context, instance, node, limits):\n', ' File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner\n return f(*args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 217, in instance_claim\n pci_requests, overhead=overhead, limits=limits)\n', ' File "/usr/lib/python3/dist-packages/nova/compute/claims.py", line 95, in __init__\n self._claim_test(resources, limits)\n', ' File "/usr/lib/python3/dist-packages/nova/compute/claims.py", line 162, in _claim_test\n "; ".join(reasons))\n', 'nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.\n', '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1940, in _do_build_and_run_instance\n filter_properties, request_spec)\n', ' File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2156, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=e.format_message())\n', 'nova.exception.RescheduledException: Build of instance 5b53d1d4-6a16-4db9-ab52-b267551c6528 was re-scheduled: Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.\n']
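
The before/after check of free huge pages described above can be sketched in Python as follows. This is a minimal illustration, not part of the original report: the helper names `hugepage_counters` and `read_hugepage_counters` are hypothetical, and the sample text is made up rather than taken from the reporter's host. It relies only on the `HugePages_*` and `Hugepagesize` fields that Linux exposes in `/proc/meminfo`.

```python
import re


def hugepage_counters(meminfo_text):
    """Return the HugePages_* counters and Hugepagesize (in kB) from meminfo text."""
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"^(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def read_hugepage_counters(path="/proc/meminfo"):
    """Read the counters from the running host (Linux only)."""
    with open(path) as handle:
        return hugepage_counters(handle.read())


# Illustrative sample only; these numbers are not from the reporter's host.
SAMPLE = """MemTotal:       65843020 kB
HugePages_Total:    1024
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""
```

Calling `read_hugepage_counters()` once while the instance is running and once after it is stopped, then comparing `HugePages_Free`, would show whether the instance was actually backed by huge pages.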

Additional info:
I am using Debian testing (buster) and all OpenStack packages included therein.

$ dpkg -l | grep nova
ii nova-common 2:18.1.0-2 all OpenStack Compute - common files
ii nova-compute 2:18.1.0-2 all OpenStack Compute - compute node
ii nova-compute-kvm 2:18.1.0-2 all OpenStack Compute - compute node (KVM)
ii python3-nova 2:18.1.0-2 all OpenStack Compute - libraries
ii python3-novaclient 2:11.0.0-2 all client library for OpenStack Compute API - 3.x

$ dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-20161027.b991c67-1 all PXE boot firmware - ROM images for qemu
ii qemu-block-extra:amd64 1:3.1+dfsg-2+b1 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm 1:3.1+dfsg-2+b1 amd64 QEMU Full virtualization on x86 hardware
ii qemu-system-common 1:3.1+dfsg-2+b1 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-data 1:3.1+dfsg-2 all QEMU full system emulation (data files)
ii qemu-system-gui 1:3.1+dfsg-2+b1 amd64 QEMU full system emulation binaries (user interface and audio support)
ii qemu-system-x86 1:3.1+dfsg-2+b1 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 1:3.1+dfsg-2+b1 amd64 QEMU utilities

* I forced nova to allocate on the same hypervisor (node1) when checking for the issue. Allocation repeatedly succeeds with a flavor that specifies hugepages with hw:mem_page_size=2048, and repeatedly fails with a flavor that is identical except that it specifies 2Mb instead of 2048.

* I am using libvirt+kvm. I don't think it matters, but I am using Ceph as a storage backend and neutron in a very basic VLAN-based segmentation configuration (no OVS or anything remotely fancy).

* I specified hw:numa_nodes='1' when creating the flavor... and all my hypervisors only have 1 NUMA node, so allocation should always succeed as long as there are free huge pages (which there are).

Tags: doc numa
tags: added: numa
Dongcan Ye (hellochosen)
Changed in nova:
assignee: nobody → Dongcan Ye (hellochosen)
Dongcan Ye (hellochosen) wrote :

Could you try hw:mem_page_size=2MB here?

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/673252

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/673252
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e8f5641aef4696374026b887c713175899f0719b
Submitter: Zuul
Branch: master

commit e8f5641aef4696374026b887c713175899f0719b
Author: Dongcan Ye <email address hidden>
Date: Mon Jul 29 09:53:38 2019 +0000

    Fix wrong huge pages in doc

    Change-Id: Ic3839eeca02c50451c884d23d313a135f04994ba
    Related-Bug: #1816454

Tyler Stachecki (tjstachecki) wrote :

Sorry for taking so long to respond; have not been able to test this until tonight.

Yes, '2MB' works (whereas the formerly documented value '2Mb' does not).
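
One possible explanation for the difference, which nothing in this report confirms, is that the size parser follows the common convention where a trailing `B` means bytes and a trailing `b` means bits. The sketch below is a hypothetical parser written only to illustrate that convention. `page_size_kib` is not nova's code, and the assumption that bare integers are KiB is taken from the fact that 2048 yields 2MB pages in the description above.

```python
import re

# Binary (IEC-style) multipliers; an assumption made for this sketch.
_PREFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_SIZE_RE = re.compile(r"^(\d+)([KMG]?)(b|B)$")


def page_size_kib(value):
    """Hypothetical parser: bare integers are KiB; 'B' is bytes, 'b' is bits."""
    if value.isdigit():
        return int(value)
    match = _SIZE_RE.match(value)
    if match is None:
        raise ValueError("unrecognized page size: %r" % value)
    number, prefix, unit = match.groups()
    size_bytes = int(number) * _PREFIX[prefix]
    if unit == "b":
        # Lowercase 'b' is read as bits, so convert to bytes.
        size_bytes //= 8
    return size_bytes // 1024
```

Under these assumptions, `2048` and `2MB` both resolve to 2048 KiB, while `2Mb` resolves to 256 KiB. An x86_64 host offers no 256 KiB page size, so a request parsed that way could not be satisfied by any host NUMA cell, which would be consistent with the "cannot fit the given host NUMA topology" error.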

Stephen Finucane (stephenfinucane) wrote :

Looks like this was resolved in https://review.opendev.org/#/c/673252/

Changed in nova:
status: New → Fix Released
tags: added: doc