[regression] libvirt cannot start guests on NUMA node 1
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt (Ubuntu) | Fix Released | High | Unassigned |
Trusty | Fix Released | High | Unassigned |
Bug Description
=======
SRU Justification:
Impact: libvirt cannot start guests on NUMA nodes other than 0
Test case: Use the libvirt XML below to start a guest on NUMA node 1
Regression potential: this is a cherry-pick of an upstream patch. However,
it did require some backporting, which increases the chance of error.
=======
On Ubuntu Trusty, with libvirt 1.2.2, libvirt cannot start guests on NUMA node 1 (probably any node other than 0):
$ virsh start pen2.office.
error: Failed to start domain pen2.office.
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
This worked on Ubuntu Precise with libvirt 0.9.8. Rebuilding libvirt 1.2.8 from vivid on Trusty works. It boots, and I've verified it allocates the memory from the correct NUMA node. Bisecting using released libvirt versions narrows it down to broken on 1.2.6 and working on 1.2.7.
There were a number of NUMA changes in 1.2.7, so it's not immediately clear which commit may be causing this. And it may take more than one commit being cherry-picked to address this.
So before I spend any more time bisecting, I'd like to know how you'd like to proceed on this. For example, if you just want to SRU libvirt 1.2.8 into Trusty, I won't bother narrowing it down further.
The relevant libvirt XML:
<vcpu placement='static' cpuset=
<numatune>
<memory mode='strict' nodeset='1'/>
</numatune>
The NUMA topology from virsh capabilities:
<topology>
<cells num='2'>
<cell id='0'>
<memory unit='KiB'
<cpus num='6'>
<cpu id='0' socket_id='0' core_id='0' siblings='0'/>
<cpu id='1' socket_id='0' core_id='1' siblings='1'/>
<cpu id='2' socket_id='0' core_id='2' siblings='2'/>
<cpu id='3' socket_id='0' core_id='8' siblings='3'/>
<cpu id='4' socket_id='0' core_id='9' siblings='4'/>
<cpu id='5' socket_id='0' core_id='10' siblings='5'/>
</cpus>
</cell>
<cell id='1'>
<memory unit='KiB'
<cpus num='6'>
<cpu id='6' socket_id='1' core_id='0' siblings='6'/>
<cpu id='7' socket_id='1' core_id='1' siblings='7'/>
<cpu id='8' socket_id='1' core_id='2' siblings='8'/>
<cpu id='9' socket_id='1' core_id='8' siblings='9'/>
<cpu id='10' socket_id='1' core_id='9' siblings='10'/>
<cpu id='11' socket_id='1' core_id='10' siblings='11'/>
</cpus>
</cell>
</cells>
</topology>
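For anyone reproducing this on other hardware, a minimal sketch of how the host CPUs belonging to a given NUMA cell could be read out of a topology like the one above. The function names `cpus_by_cell` and `to_cpuset` are hypothetical helpers, not part of libvirt, and the embedded XML omits the `<memory>` elements because they are truncated in this report. The resulting cpuset string is derived from the topology only; it is not the value from the original guest configuration, which is cut off above.

```python
import xml.etree.ElementTree as ET

# Copy of the topology above, minus the truncated <memory> elements.
CAPS_TOPOLOGY = """
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
        <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
        <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
        <cpu id='3' socket_id='0' core_id='8' siblings='3'/>
        <cpu id='4' socket_id='0' core_id='9' siblings='4'/>
        <cpu id='5' socket_id='0' core_id='10' siblings='5'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='6'>
        <cpu id='6' socket_id='1' core_id='0' siblings='6'/>
        <cpu id='7' socket_id='1' core_id='1' siblings='7'/>
        <cpu id='8' socket_id='1' core_id='2' siblings='8'/>
        <cpu id='9' socket_id='1' core_id='8' siblings='9'/>
        <cpu id='10' socket_id='1' core_id='9' siblings='10'/>
        <cpu id='11' socket_id='1' core_id='10' siblings='11'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""


def cpus_by_cell(topology_xml):
    """Map each NUMA cell id to the sorted list of host CPU ids it contains."""
    root = ET.fromstring(topology_xml)
    return {
        int(cell.get("id")): sorted(int(cpu.get("id")) for cpu in cell.iter("cpu"))
        for cell in root.iter("cell")
    }


def to_cpuset(cpu_ids):
    """Render CPU ids as a range-style cpuset string (consecutive ids collapsed)."""
    ranges = []
    for cpu in sorted(cpu_ids):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current consecutive run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    cells = cpus_by_cell(CAPS_TOPOLOGY)
    # CPUs local to NUMA cell 1, the node named in <numatune> above.
    print(to_cpuset(cells[1]))
```

Pinning the vCPUs to the CPUs of the same cell that `nodeset` names keeps the guest's CPU and memory on one node, which is the configuration this report exercises.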
Changed in libvirt (Ubuntu):
status: New → Fix Released
Changed in libvirt (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → High
Changed in libvirt (Ubuntu):
importance: Undecided → High
description: updated
Quoting Richard Laager (<email address hidden>):
> Public bug reported:
Thanks for reporting and looking into this bug.
> On Ubuntu Trusty, with libvirt 1.2.2, libvirt cannot start guests on NUMA node 1 (probably any node other than 0):
> $ virsh start pen2.office.
> error: Failed to start domain pen2.office.
> error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
>
> This worked on Ubuntu Precise with libvirt 0.9.8. Rebuilding libvirt
> 1.2.8 from vivid on Trusty works. It boots, and I've verified it
> allocates the memory from the correct NUMA node. Bisecting using
> released libvirt versions narrows it down to broken on 1.2.6 and working
> on 1.2.7.
>
> There were a number of NUMA changes in 1.2.7, so it's not immediately
> clear which commit may be causing this. And it may take more than one
> commit being cherry-picked to address this.
>
> So before I spend any more time bisecting, I'd like to know how you'd
> like to proceed on this. For example, if you just want to SRU libvirt
> 1.2.8 into Trusty, I won't bother narrowing it down further.
Unfortunately we can't just SRU 1.2.8 into Trusty. Hopefully the
actual fix will come down to a few patches we can cherrypick.