libvirt not recognizing NUMA architecture

Bug #614322 reported by Frank Müller
This bug affects 6 people
Affects: libvirt (Ubuntu)
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Ubuntu 10.04 LTS
Kernel 2.6.32-24-server

latest updates applied

libvirt-bin 0.7.5-5ubuntu27
qemu-kvm 0.12.3+noroms-0ubuntu9.2

Hardware
HP DL360 G6 server, 2 quad core Xeons E5520, 24GB RAM

Current Nehalem CPUs use the NUMA architecture to allocate RAM per CPU.
I ran a lot of benchmarks with KVM on these machines, and Ubuntu shows very slow network throughput and I/O performance.
We suspect that this is caused by libvirtd not recognizing the NUMA feature.

A simple "virsh capabilities" confirms this:

  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>core2duo</model>
      <topology sockets='4' cores='4' threads='1'/>
      <feature name='lahf_lm'/>
      <feature name='rdtscp'/>
      <feature name='popcnt'/>
      <feature name='dca'/>
      <feature name='xtpr'/>
      <feature name='cx16'/>
      <feature name='tm2'/>
      <feature name='est'/>
      <feature name='vmx'/>
      <feature name='ds_cpl'/>
      <feature name='pbe'/>
      <feature name='tm'/>
      <feature name='ht'/>
      <feature name='ss'/>
      <feature name='acpi'/>
      <feature name='ds'/>
    </cpu>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
  </host>

The expected result would be this (output from RHEL and CentOS on the same machine):

    <cpu>
      <arch>x86_64</arch>
      <model>core2duo</model>
      <topology sockets='2' cores='4' threads='2'/>
 --snip--
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='8'>
            <cpu id='0'/>
            <cpu id='2'/>
            <cpu id='4'/>
            <cpu id='6'/>
            <cpu id='8'/>
            <cpu id='10'/>
            <cpu id='12'/>
            <cpu id='14'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='8'>
            <cpu id='1'/>
            <cpu id='3'/>
            <cpu id='5'/>
            <cpu id='7'/>
            <cpu id='9'/>
            <cpu id='11'/>
            <cpu id='13'/>
            <cpu id='15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
    </secmodel>
  </host>

numactl --hardware:
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 12277 MB
node 0 free: 11659 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 12287 MB
node 1 free: 11677 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

numastat shows a lot of misses when running benchmarks in a VM (numa_miss counts allocations that ended up on a different node than the one the process preferred):

                   node0     node1
numa_hit        15527158  17015505
numa_miss        7032982   3512950
numa_foreign     3512950   7032982
interleave_hit      8078      8264
local_node      15525187  17006655
other_node       7034953   3521800

RedHat and CentOS also perform a lot better.
Where I get about 40-50 MB/s network throughput with Ubuntu (copying a file to an attached iSCSI RAID), I get 120 MB/s with CentOS.
Overall performance shows the same kind of difference between the systems.

This is a very big problem, and it makes KVM unusable for us.
We had to switch to CentOS 5.5 and Xen 3, which performs even better than KVM on CentOS.
But that is not a satisfying solution, because we have relied solely on Ubuntu for years.

Summary
Ubuntu's libvirt doesn't recognize NUMA features, and KVM performance is very slow as a result.
Other systems like CentOS recognize NUMA and perform a lot better.
Please fix this.

Additional info:

This posting (based on our experiences with these systems) gives a more detailed description of the case:
http://marc.info/?l=kvm&m=127921447903105&w=2

cat /proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
stepping : 5
cpu MHz : 2266.449
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 4532.89
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

and so on up to processor 15.
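
The topology values relevant to this bug can be read straight off this output - a quick sketch (siblings and cpu cores are per-socket figures, so siblings divided by cpu cores gives the threads per core):

grep 'physical id' /proc/cpuinfo | sort -u | wc -l    (number of sockets; 2 here)
grep -m1 siblings /proc/cpuinfo                       (logical CPUs per socket; 8 here)
grep -m1 'cpu cores' /proc/cpuinfo                    (physical cores per socket; 4 here)

That corresponds to the sockets='2' cores='4' threads='2' topology RHEL reports, not the sockets='4' cores='4' threads='1' that Ubuntu's libvirt shows.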

Cheers,
Frank

Tags: kvm libvirt numa

affects: linux (Ubuntu) → libvirt (Ubuntu)
Revision history for this message
Chuck Short (zulcss) wrote :

Thanks for the bug report. Which versions are you running on RedHat and CentOS?

Changed in libvirt (Ubuntu):
importance: Undecided → Wishlist
status: New → Opinion
Revision history for this message
Frank Müller (mueller-wave-computer) wrote :

Hi there,

I used the latest RHEL6 Beta with libvirt-0.8.1-13.el6.x86_64 and
the latest CentOS 5.5 with libvirt-0.6.3-33.el5_5.1.

I tried Ubuntu 8.04 and 10.04, and both show about 50% of the performance that RHEL and CentOS deliver.
This is a serious problem: no company that wants to virtualize performance-sensitive applications would choose KVM on Ubuntu as a platform this way.
If Ubuntu wants to be an enterprise-ready distribution, it has to compete with RHEL, Xen, ESXi and the like.

Cheers,
Frank

Revision history for this message
Frank Müller (mueller-wave-computer) wrote :

So, any news? Is anybody able to confirm this?

Frank

Revision history for this message
Mark Burgo (burgo-mark) wrote :

Yes, I see this with 10.04 and 10.10; it appears that libvirt was not built with libnuma. I agree and would also like to see this corrected.

10.04 libvirt version 0.7.5

10.10 libvirt version 0.8.3

If you need the capabilities output, I will be more than happy to upload it.
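
A quick way to check whether a given build was linked against libnuma is to inspect the binary - a sketch (the libvirtd path may differ between releases):

ldd /usr/sbin/libvirtd | grep numa

If NUMA support was compiled in, this lists libnuma.so.1; no output suggests the build lacks it.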

Revision history for this message
Frank Müller (mueller-wave-computer) wrote :

You can easily compile the latest libvirt version yourself:

aptitude install build-essential libxml2-dev libgnutls-dev libdevmapper-dev libparted0-dev libvirt-dev
./configure --prefix=/usr --exec-prefix=/usr --libdir=/usr/lib --includedir=/usr/include && make && make install

There is a configure option "--with-numa", but I didn't use it; libvirt recognizes the NUMA cells without this option.

This improves the network performance, but it's still slower than KVM on CentOS and still much slower than Xen.
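
After rebuilding, NUMA detection can be sanity-checked without starting a guest - a sketch that simply counts the cell entries in the capabilities XML:

virsh capabilities | grep -c '<cell id'

On the DL360 G6 described above this should print 2, matching numactl --hardware.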

Revision history for this message
EAB (erwin-true) wrote :

I can confirm this.

Most of our hosts have 64GB RAM and two Intel Nehalem hexa-core CPUs,
with 3 VMs using 16GB RAM each.
I noticed performance degradation, and the host also swapped out a lot of memory; the swapping degraded performance dramatically.
vm.swappiness=0 in /etc/sysctl.conf did not help.
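
For reference, the same setting can also be applied at runtime, though as noted it made no difference here:

sysctl -w vm.swappiness=0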

It seems that NUMA on Intel CPUs can be expensive, because RAM sometimes has to be transferred from other nodes. With only one node (socket) there is no problem; with two or more nodes you see slower performance.
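
The remote-node cost is easy to demonstrate with numactl itself - a sketch, assuming some memory-bound benchmark binary ./membench (hypothetical name):

numactl --cpunodebind=0 --membind=0 ./membench    (CPU and RAM on the same node)
numactl --cpunodebind=0 --membind=1 ./membench    (RAM forced to the remote node)

On Nehalem-class hardware the second run is typically noticeably slower, which is exactly the penalty the numa_miss counters above reflect.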

Even without the capabilities information you can prevent this behavior by pinning the vCPUs.
You should spread your VMs over the available nodes:

numactl --hardware | grep "node 0 cpus"
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22

Your XML should contain something like this (a single <vcpu> element, where the content is the number of vCPUs and cpuset restricts them to the host CPUs of one node):
<vcpu cpuset='0,2,4,6,8,10,12,14,16,18,20,22'>4</vcpu>

The next VM should use the other node:
numactl --hardware | grep "node 1 cpus"
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
<vcpu cpuset='1,3,5,7,9,11,13,15,17,19,21,23'>4</vcpu>

So you don't need the NUMA info from "virsh capabilities".
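
The same pinning can also be applied to an already running guest with virsh - a sketch, assuming a guest named vm1 (hypothetical) with 4 vCPUs:

virsh vcpupin vm1 0 0,2,4,6,8,10,12,14,16,18,20,22
virsh vcpupin vm1 1 0,2,4,6,8,10,12,14,16,18,20,22
virsh vcpupin vm1 2 0,2,4,6,8,10,12,14,16,18,20,22
virsh vcpupin vm1 3 0,2,4,6,8,10,12,14,16,18,20,22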

We now split up our hosts by the number of NUMA nodes to prevent performance degradation and swapping.

Revision history for this message
Ralf Spenneberg (ralq) wrote :

I can confirm this, and I am very disappointed that this issue has still not been fixed after more than a year.
I am currently running Ubuntu precise (development, the next LTS) on an
HP DL585 G2 with 4 dual-core CPUs and thus 4 NUMA cells.
Numactl shows the hardware:
# numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3

virsh does not display the NUMA information:
# virsh nodeinfo
CPU model: x86_64
CPU(s): 8
CPU frequency: 1000 MHz
CPU socket(s): 4
Core(s) per socket: 2
Thread(s) per core: 1
NUMA cell(s): 1
Memory size: 32948228 kB
# virsh freecell 0
error: this function is not supported by the connection driver: NUMA memory information not available on this platform

Just using numactl is not an option if the guests are to be migrated. The NUMA placement needs to be handled by libvirt!
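
For the record, once a NUMA-enabled libvirt is in place this can be expressed in the domain XML itself via numatune (supported upstream since libvirt 0.9.3) - a sketch that binds the guest's memory to cell 0:

<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

Combined with a matching vcpu cpuset, this keeps both the guest's CPUs and its memory on one cell, and the policy travels with the domain definition rather than living outside libvirt.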

Ralf Spenneberg (ralq)
Changed in libvirt (Ubuntu):
status: Opinion → Confirmed
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi,

No one seems to have commented on the reason why NUMA is disabled in libvirt. In Ubuntu, both libnuma1 and numactl are in universe, so in order to enable them in libvirt (which is in main) we would have to file a main inclusion request (MIR) for those packages.

But they are also disabled in Debian. It seems worth opening a bug against the Debian package to inquire why it is disabled there.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Actually, the Debian package's 0.7.7-3 changelog entry from March 2010 shows:

  * [b69d3cc] Revert "Enable NUMA support" since it breaks the python
    bindings.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Regarding the change from Opinion to Confirmed: ordinarily I'd object, but in this case I agree - this *is* a bug, whose resolution potentially depends on some other bugs. I don't object to NUMA being enabled once the underlying bugs are fixed, so if you have time, by all means please work on this, preferably through upstream and Debian. It's not an opinion. Thanks, Ralf.

Note that in the last year and a half the underlying bugs may well have been fixed upstream, so whoever looks into this, I suggest finding a test case (it's not in the changelog) and re-testing.

Revision history for this message
Ralf Spenneberg (ralq) wrote :

I have discussed the matter with Guido Günther from the Debian team. After some testing, he re-enabled the NUMA feature in the Debian package:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648323

Would you like to follow suit?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks very much, Ralf - we will sync that package shortly.

Revision history for this message
SaveTheRbtz (savetherbtz) wrote :

Any progress after two weeks?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Unfortunately we can't enable numa in libvirt until numa is in main. I opened bug 891232 to request that. Until that happens, this bug is blocked.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.9.8-2ubuntu14

---------------
libvirt (0.9.8-2ubuntu14) precise; urgency=low

  * re-enable numa (undo delta against debian) (LP: #614322):
    - debian/control: remove from dependencies
    - debian/rules: turn it off
 -- Serge Hallyn <email address hidden> Tue, 13 Mar 2012 11:25:53 -0500

Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released