NUMA Topology cell memory sent to xml in MiB, but qemu uses KiB
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Nikola Đipanov | 2014.2
Bug Description
Currently, when specifying NUMA cell memory via flavor extra_specs or image properties, MiB units are used. According to the libvirt XML domain format documentation (http://), however, the cell memory attribute is expected in KiB.
In this example, we use the following extra_specs:
"hw:numa_policy": "strict", "hw:numa_mem.1": "2048", "hw:numa_mem.0": "6144", "hw:numa_nodes": "2", "hw:numa_cpus.0": "0,1,2", "hw:numa_cpus.1": "3"
The flavor has 8192 MB of RAM and 4 vCPUs.
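To make the mismatch concrete, here is a minimal sketch (plain Python, not nova's actual driver code) that prints the <numa> element for the cell sizes above, first with the MiB values passed straight through as nova does today, then converted to the KiB that libvirt expects:

    # Illustrative only: libvirt's <cpu><numa><cell cpus=... memory=.../> element
    # takes memory in KiB, but nova currently writes the MiB values unchanged.
    cells = [{"cpus": "0,1,2", "mem_mib": 6144},
             {"cpus": "3", "mem_mib": 2048}]

    def numa_xml(cells, convert_to_kib):
        out = ["<numa>"]
        for cell in cells:
            mem = cell["mem_mib"] * 1024 if convert_to_kib else cell["mem_mib"]
            out.append("  <cell cpus='%s' memory='%d'/>" % (cell["cpus"], mem))
        out.append("</numa>")
        return "\n".join(out)

    print(numa_xml(cells, convert_to_kib=False))  # what gets generated today (MiB values)
    print(numa_xml(cells, convert_to_kib=True))   # what libvirt/qemu actually expect (KiB)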
When using qemu 2.1.0, the following will be seen in the n-cpu logs when booting a machine with NUMA specs.
"libvirtError: internal error: process exited while connecting to monitor: qemu-system-x86_64: total memory for NUMA nodes (8388608) should equal RAM size (200000000)"
Please note that 200000000 is 8388608 KiB converted to bytes and printed in hexadecimal (simply a quirk of the qemu error message). The error shows that a total of 8192 KiB is being requested for the NUMA nodes rather than the intended 8192 MiB; because the NUMA node total does not equal the RAM size, the machine fails to boot.
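For reference, the two numbers in the error message line up once the units and bases are untangled; a quick sanity check:

    # 8192 (the flavor's MiB total) read as KiB is only 8388608 bytes:
    assert 8192 * 1024 == 8388608
    # The RAM size in the error, 0x200000000 bytes, is the flavor's 8192 MiB:
    assert 0x200000000 == 8192 * 1024 * 1024
    # ...which is also 8388608 KiB expressed in bytes:
    assert 0x200000000 == 8388608 * 1024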
When using versions of qemu older than 2.1.0 the issue is not obvious: machines with NUMA specs still boot, but only because those versions lack the check that RAM size equals the total NUMA node memory (a qemu bug that has since been resolved).
In short, we should be using KiB units for NUMA cell memory, or at least converting from MiB to KiB before generating the XML. Otherwise, NUMA placement will not behave as intended.
To be fair, I haven't had the chance to look at the memory placement in a guest booted using qemu 2.0.0 or lower, though I suspect it would be incorrect. If anyone has the chance to look, it would be greatly appreciated.
I am currently investigating the appropriate fix for this alongside Tiago Mello. We made a quick fix in /nova/virt/
Multiplying by 1024 allowed the machine to boot properly, but it is probably a bit too quick and dirty; just thought it would be worth mentioning.
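Roughly, the change we tried boils down to something like the following (a sketch with an illustrative function name, not the actual nova code):

    def cell_memory_kib(cell_mem_mib):
        # Nova tracks NUMA cell memory in MiB; libvirt's <cell memory=...>
        # attribute is in KiB, so convert before writing it into the XML.
        return cell_mem_mib * 1024

    cell_memory_kib(6144)  # -> 6291456
    cell_memory_kib(2048)  # -> 2097152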
Sys-info:
x86_64 machine
Virt-info:
qemu version 2.1.0
libvirt version 1.2.2
Kernel-info:
3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
OS-info:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
Changed in nova: | |
assignee: | nobody → Michael Turek (mjturek) |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Low |
description: | updated |
summary: |
- NUMA Topology cell memory in MiB units rather than KiB units + NUMA Topology cell memory sent to libvirt in MiB when qemu expects KiB |
description: | updated |
summary: |
- NUMA Topology cell memory sent to libvirt in MiB when qemu expects KiB + NUMA Topology cell memory sent to xml in MiB when qemu expects KiB |
summary: |
- NUMA Topology cell memory sent to xml in MiB when qemu expects KiB + NUMA Topology cell memory sent to xml in MiB, but qemu uses KiB |
Changed in nova: | |
assignee: | Michael Turek (mjturek) → Nikola Đipanov (ndipanov) |
Changed in nova: | |
milestone: | none → juno-rc1 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | juno-rc1 → 2014.2 |
So after a bit more investigating, I have a better understanding of the consequences of specifying cell memory in MiB rather than the expected KiB.
When using qemu-2.1.0:
The feature simply does not work. Machines with NUMA specs that should boot instead fail at the libvirt/qemu level and go to the error state. This happens regardless of whether cell memory is specified explicitly or left to the default of distributing memory equally across the cells.
When using qemu-2.0.0 (or lower):
Machines boot, but with the wrong NUMA topology. For example, with either of the following extra_specs:
{"hw:numa_policy": "strict", "hw:numa_mem.1": "2048", "hw:numa_mem.0": "6144", "hw:numa_nodes": "2", "hw:numa_cpus.0": "0,1,2", "hw:numa_cpus.1": "3"}
{{"hw:numa_policy": "strict", "hw:numa_nodes": "2"}
The following topology is found on the guest:
node 0 cpus: 0 1 2 3
node 0 size: 7986 MB
node 0 free: 7568 MB
node distances:
node 0
0: 10
The quick fix that Tiago and I tried produces the following topology, which is the intended behavior:
When extra_specs are {"hw:numa_policy": "strict", "hw:numa_nodes": "2"}
node 0 cpus: 0 1
node 0 size: 3955 MB
node 0 free: 3728 MB
node 1 cpus: 2 3
node 1 size: 4031 MB
node 1 free: 3846 MB
node distances:
node 0 1
0: 10 20
1: 20 10
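(For reference, 8192 MiB split evenly across two nodes is 4096 MiB per node; the 3955 MB and 4031 MB the guest reports are consistent with that split once memory reserved by the guest kernel is accounted for.)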
When extra_specs are {"hw:numa_policy": "strict", "hw:numa_mem.1": "2048", "hw:numa_mem.0": "6144", "hw:numa_nodes": "2", "hw:numa_cpus.0": "0,1,2", "hw:numa_cpus.1": "3"}
available: 2 nodes (0-1)
node 0 cpus: 0 1 2
node 0 size: 5971 MB
node 0 free: 5587 MB
node 1 cpus: 3
node 1 size: 2015 MB
node 1 free: 1983 MB
node distances:
node 0 1
0: 10 20
1: 20 10
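(Likewise, the 5971 MB and 2015 MB the guest reports are consistent with the requested 6144 MiB and 2048 MiB cells minus memory reserved by the guest kernel.)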
So in short, the feature is not working as intended, and once qemu-2.1.0 becomes more common it will be completely broken. I'll be proposing a fix for this later today.