Libvirt CPU affinity error

Bug #1439280 reported by Matt Kassawara
This bug affects 16 people
Affects Status Importance Assigned to Milestone
libvirt (Ubuntu)
Fix Released
High
Unassigned
Vivid
Fix Released
High
Unassigned
nova (Ubuntu)
Fix Released
Undecided
Unassigned
Vivid
Confirmed
Undecided
Unassigned

Bug Description

=============================================
SRU Justification
1. Impact: VMs fail to launch with TCG (non-kvm-accelerated)
2. Stable fix: cherry-pick a patch from upstream.
3. Regression potential: this only slightly relaxes the check for multiple CPUs, and is a cherry-pick from upstream. It therefore should not introduce any regressions.
4. Test case: see below - or simply attempt to launch a VM with multiple CPUs on non-accelerated qemu.
=============================================

I'm testing the Kilo packages from the cloud archive staging PPA on 14.04 and cannot launch a VM due to a Libvirt CPU affinity error. I'm using QEMU because my environment resides on cloud servers.

Package versions:

ii nova-common 1:2015.1~b3-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 1:2015.1~b3-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2015.1~b3-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 1:2015.1~b3-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python-nova 1:2015.1~b3-0ubuntu1~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 1:2.22.0-0ubuntu1~cloud0 all client library for OpenStack Compute API
ii libvirt-bin 1.2.12-0ubuntu8~cloud0 amd64 programs for the libvirt library
ii libvirt0 1.2.12-0ubuntu8~cloud0 amd64 library for interfacing with different virtualization systems

Content of nova compute logs while attempting to launch a CirrOS/m1.tiny instance:

2015-03-31 23:00:07.106 31118 INFO nova.compute.manager [req-88539061-8d37-45a9-8edb-edd12f199d07 f214e083aa91455e8437996c2dfe815b 337052ec76e54c42ae891843a1451ca6 - - -] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Starting instance...
2015-03-31 23:00:07.180 31118 INFO nova.compute.claims [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Attempting claim: memory 512 MB, disk 1 GB
2015-03-31 23:00:07.181 31118 INFO nova.compute.claims [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Total memory: 1996 MB, used: 512.00 MB
2015-03-31 23:00:07.181 31118 INFO nova.compute.claims [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] memory limit: 2994.00 MB, free: 2482.00 MB
2015-03-31 23:00:07.182 31118 INFO nova.compute.claims [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Total disk: 39 GB, used: 0.00 GB
2015-03-31 23:00:07.182 31118 INFO nova.compute.claims [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] disk limit not specified, defaulting to unlimited
2015-03-31 23:00:07.206 31118 INFO nova.compute.claims [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Claim successful
2015-03-31 23:00:07.295 31118 INFO nova.scheduler.client.report [-] Compute_service record updated for ('msk-os4cpu1', 'msk-os4cpu1')
2015-03-31 23:00:07.407 31118 INFO nova.scheduler.client.report [-] Compute_service record updated for ('msk-os4cpu1', 'msk-os4cpu1')
2015-03-31 23:00:07.574 31118 INFO nova.virt.libvirt.driver [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Creating image
2015-03-31 23:00:07.819 31118 INFO nova.scheduler.client.report [-] Compute_service record updated for ('msk-os4cpu1', 'msk-os4cpu1')
2015-03-31 23:00:07.907 31118 INFO nova.virt.disk.vfs.api [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] Unable to import guestfs, falling back to VFSLocalFS
2015-03-31 23:00:10.902 31118 ERROR nova.virt.libvirt.driver [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] Error launching a defined domain with XML: <domain type='qemu'>
  <name>instance-00000003</name>
  <uuid>0551a285-e0ba-4f67-b562-04a6eef5db63</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="2015.1-b3"/>
      <nova:name>demo-instance1</nova:name>
      <nova:creationTime>2015-03-31 23:00:08</nova:creationTime>
      <nova:flavor name="m1.tiny">
        <nova:memory>512</nova:memory>
        <nova:disk>1</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="f214e083aa91455e8437996c2dfe815b">demo</nova:user>
        <nova:project uuid="337052ec76e54c42ae891843a1451ca6">demo</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="38047887-61a7-41ea-9b49-27987d5e8bb9"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static' cpuset='0-1'>1</vcpu>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>2015.1-b3</entry>
      <entry name='serial'>d6b7f537-d637-c7cd-c824-7b2b5be7ef4c</entry>
      <entry name='uuid'>0551a285-e0ba-4f67-b562-04a6eef5db63</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-utopic'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/0551a285-e0ba-4f67-b562-04a6eef5db63/disk'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='fa:16:3e:14:a5:ae'/>
      <source bridge='qbr4d42f321-ee'/>
      <target dev='tap4d42f321-ee'/>
      <model type='virtio'/>
      <driver name='qemu'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/0551a285-e0ba-4f67-b562-04a6eef5db63/console.log'/>
      <target port='0'/>
    </serial>
    <serial type='pty'>
      <target port='1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/0551a285-e0ba-4f67-b562-04a6eef5db63/console.log'/>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
      <stats period='10'/>
    </memballoon>
  </devices>
</domain>

2015-03-31 23:00:10.903 31118 ERROR nova.compute.manager [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Instance failed to spawn
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Traceback (most recent call last):
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2398, in _build_resources
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] yield resources
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2270, in _build_and_run_instance
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] block_device_info=block_device_info)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2356, in spawn
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] block_device_info=block_device_info)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4375, in _create_domain_and_network
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] power_on=power_on)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4306, in _create_domain
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] LOG.error(err)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] six.reraise(self.type_, self.value, self.tb)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4296, in _create_domain
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] domain.createWithFlags(launch_flags)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] rv = execute(f, *args, **kwargs)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] six.reraise(c, e, tb)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] rv = meth(*args, **kwargs)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 896, in createWithFlags
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] libvirtError: Requested operation is not valid: cpu affinity is not supported
2015-03-31 23:00:10.903 31118 TRACE nova.compute.manager [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63]
2015-03-31 23:00:10.907 31118 INFO nova.compute.manager [req-88539061-8d37-45a9-8edb-edd12f199d07 f214e083aa91455e8437996c2dfe815b 337052ec76e54c42ae891843a1451ca6 - - -] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Terminating instance
2015-03-31 23:00:10.913 31118 INFO nova.virt.libvirt.driver [-] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Instance destroyed successfully.
2015-03-31 23:00:10.936 31118 INFO nova.virt.libvirt.driver [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Deleting instance files /var/lib/nova/instances/0551a285-e0ba-4f67-b562-04a6eef5db63_del
2015-03-31 23:00:10.937 31118 INFO nova.virt.libvirt.driver [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] [instance: 0551a285-e0ba-4f67-b562-04a6eef5db63] Deletion of /var/lib/nova/instances/0551a285-e0ba-4f67-b562-04a6eef5db63_del complete
2015-03-31 23:00:11.053 31118 INFO nova.scheduler.client.report [req-5352dc7c-4474-4637-b6f1-1d2697580392 - - - - -] Compute_service record updated for ('msk-os4cpu1', 'msk-os4cpu1')

Revision history for this message
Stephen Gordon (sgordon) wrote :

This guest XML would be expected to fail if the guest is running on a host that is using qemu w/o kvm acceleration:

     <vcpu placement='static' cpuset='0-1'>1</vcpu>

...as qemu does not support pinning when kvm isn't available. What's not clear to me is why the Ubuntu version would be adding this line, since it sounds like master/kilo-3 is behaving correctly per the spec (only adding these lines where the user explicitly requests direct pinning of CPUs on the image or flavor [1]).

[1] http://specs.openstack.org/openstack/nova-specs/specs/juno/approved/virt-driver-cpu-pinning.html
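The spec's behavior can be sketched as a minimal check. This is an illustrative sketch only; the helper name and logic are assumptions, not nova's actual code, though the `hw:cpu_policy` extra spec is the real knob discussed later in this thread:

```python
# Illustrative sketch, not nova's implementation: per the Juno CPU-pinning
# spec, a cpuset should only be emitted when the user has explicitly
# requested dedicated pinning via the flavor or image.
def wants_pinning(extra_specs):
    """Hypothetical helper: True only for an explicit 'dedicated' policy."""
    return extra_specs.get('hw:cpu_policy') == 'dedicated'

print(wants_pinning({}))                              # m1.tiny default -> False
print(wants_pinning({'hw:cpu_policy': 'dedicated'}))  # -> True
```

Under this reading, the bare m1.tiny flavor (empty extra_specs) should never produce a cpuset in the domain XML.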

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Yes, to test CPU pinning/NUMA with libvirt you ought to use Nested KVM.
Please report results after testing with that.

That said, some notes below.

Quoting Dan Berrange from a different review with a complete response on
*why*:

   It is fundamentally impossible to test CPU pinning with TCG (aka plain
   QEMU) because TCG only has a single thread for all virtual CPUs. As such
   there is no mechanism to pin vCPU threads with TCG. Nested KVM is thus
   the only possible option for testing any of the NUMA / CPU pinning
   stuff. Instructions for nested KVM setup on a KVM host are documented
   here

    http://docs.openstack.org/developer/nova/devref/testing/libvirt-numa.html#provisioning-a-virtual-machine-for-testing

From my testing, a Nova guest booted with a NUMA flavor will have the
below contextual XML snippets w.r.t. vCPU placement:

    . . .
      <vcpu placement='static'>4</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='0-3'/>
        <vcpupin vcpu='1' cpuset='0-3'/>
        <vcpupin vcpu='2' cpuset='0-3'/>
        <vcpupin vcpu='3' cpuset='0-3'/>
        <emulatorpin cpuset='0-3'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
        <memnode cellid='0' mode='strict' nodeset='0'/>
      </numatune>
    . . .
      <cpu>
        <topology sockets='4' cores='1' threads='1'/>
        <numa>
          <cell id='0' cpus='0-3' memory='1048576' unit='KiB'/>
        </numa>
      </cpu>
    . . .

Here's the working example XMLs from my testing.

Libvirt XML for the guest hypervisor (also called L1), which runs DevStack
and hosts the Nova instances as nested guests:

    https://kashyapc.fedorapeople.org/virt/openstack/nova-libvirt-numa-testing/devstack-vm-libvirt.xml

And, Nova guest XML, booted with a NUMA flavor:

    https://kashyapc.fedorapeople.org/virt/openstack/nova-libvirt-numa-testing/nova-guest-libvirt.xml

Changed in nova (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Matt Kassawara (ionosphere80) wrote :

I'm not testing NUMA. I am launching a basic CirrOS image using the m1.tiny flavor, neither of which should trigger NUMA bits. In fact, adding "hw:cpu_policy=shared" to extra_specs in the m1.tiny flavor has no impact on this issue.

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Matt, you're right; allow me to correct myself below. In short: I still cannot reproduce it.

I just tested this in a single-node DevStack with today's Nova git,
with the Nova instance QEMU-emulated, but I cannot reproduce the
failure described in this bug.

Test environment
----------------

    $ uname -r; rpm -q libvirt-daemon-kvm qemu-system-x86
    4.0.0-0.rc5.git4.1.fc22.x86_64
    libvirt-daemon-kvm-1.2.13-2.fc22.x86_64
    qemu-system-x86-2.3.0-0.2.rc1.fc22.x86_64

I'm at these commits in my All-In-One DevStack environment:

    cinder:
    commit c7ca4b95b56539dd560dc88038ab994d50c8394d
    devstack:
    commit 72bdc8c27102db3b65651ded3a9944798238a2d4
    glance:
    commit f84e49db5a455b36901b642125b5cf850f90c81d
    keystone:
    commit af568dd1afdcdc9ed7275a2824a2ca5ca50b004c
    neutron:
    commit 483de6313fab5913f9e68eb24afe65c36bd9b623
    nova:
    commit 74ca660ab688e15ccd59ddfbfcdc9e1cecdc553d
    requirements:
    commit 56ab196ad1fb0e356d3fe0ec63e744ed10104a5d

Test
----

    $ nova flavor-show 1
    +----------------------------+---------+
    | Property | Value |
    +----------------------------+---------+
    | OS-FLV-DISABLED:disabled | False |
    | OS-FLV-EXT-DATA:ephemeral | 0 |
    | disk | 1 |
    | extra_specs | {} |
    | id | 1 |
    | name | m1.tiny |
    | os-flavor-access:is_public | True |
    | ram | 512 |
    | rxtx_factor | 1.0 |
    | swap | |
    | vcpus | 1 |
    +----------------------------+---------+

Boot the instance:

    $ nova boot --config-drive false --flavor 1 \
        --key_name oskey1 --image cirros-0.3.3-x86_64-disk cirrvm1

Nova instance's guest XML attached.

That's the QEMU invocation
--------------------------

$ ps -ef | grep qemu-system-x86_64
qemu 2889 1 14 11:12 ? 00:00:11 /usr/bin/qemu-system-x86_64 -name instance-00000001 -S -machine pc-i440fx-2.3,accel=tcg,usb=off -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid cc5cae21-129b-4c63-b8e9-c642a871efdf -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2015.1,serial=d23b2cbb-f02d-4a8e-b6dd-184a86aa8348,uuid=cc5cae21-129b-4c63-b8e9-c642a871efdf -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000001.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/kashyapc/src/cloud/data/nova/instances/cc5cae21-129b-4c63-b8e9-c642a871efdf/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:d1:f0:3e,bus=pci.0,addr=0x2 -chardev file,id=charserial0,path=/home/kashyapc/src/cloud/data/nova/instances/cc5cae21-129b-4c63-b8e9-c642a871efdf/console.log -d...


Revision history for this message
Matt Kassawara (ionosphere80) wrote :

I'm fairly certain this is specific to the Ubuntu packages, not upstream nova or libvirt.

Tom Fifield (fifieldt)
Changed in nova (Ubuntu):
status: Incomplete → New
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

@Matt: Since you're fairly certain that this is specific to Ubuntu, I hope Ubuntu's Nova package maintainers will take a look. . .

Revision history for this message
Mark Vanderwiel (vanderwl) wrote :

Any ideas on how one could work around this?

How does one disable affinity in the nova, libvirt, or qemu configs?

Some easy place in the code to hack?

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

Still an issue in the 2015.1~rc1-0ubuntu1~cloud0 packages. I built an environment on hosts that support KVM and successfully launched an instance, so this issue only affects QEMU. I built another environment using upstream source and successfully launched an instance using both QEMU and KVM, so I'm fairly certain this only applies to the Ubuntu packages... probably an errant patch.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed
Revision history for this message
James Page (james-page) wrote :

Nothing jumps out at me from the list of patches we have in the nova package - most are working around testing challenges due to offline build environments.

Revision history for this message
James Page (james-page) wrote :

Raising a libvirt task to get the libvirt maintainers attention - I'll poke on irc as well.

Revision history for this message
Martin Mailand (todin) wrote :

@Mark V: I hacked an easy place in /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py, line 4720. Change
if CONF.libvirt.virt_type not in ['qemu', 'kvm']:
to
if CONF.libvirt.virt_type not in ['kvm']:

The commit that changed the NUMA behavior is 945ab28.
I am not sure: does qemu without kvm have NUMA support?

@Matt K: Are you certain that it applies only to the Ubuntu packages? The changes were made upstream.

I tested in VirtualBox.

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

My installation with packages uses the cloud archive repo that includes libvirt 1.2.12. My installation from source uses the generic 14.04 LTS repo that includes libvirt 1.2.2. Both installations use the same nova code (RC1), but the older version of libvirt doesn't exhibit this issue. Also, I'm fairly certain that QEMU doesn't support any form of NUMA.

Revision history for this message
Martin Mailand (todin) wrote :

@Matt:
Line 359 of driver.py defines the minimum libvirt version for which the NUMA code is activated:
MIN_LIBVIRT_NUMA_VERSION = (1, 2, 7)

Therefore you did not trigger the behavior with the source-code installation.
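The version gate can be illustrated with a plain tuple comparison (a sketch, not nova's actual code; the function name is hypothetical, but Python's element-wise tuple ordering is exactly how such version checks behave):

```python
# Sketch of the gating logic: nova compares libvirt's version tuple
# against MIN_LIBVIRT_NUMA_VERSION. The stock 14.04 libvirt (1.2.2)
# falls below the minimum, so the NUMA/cpuset path is never taken;
# the Kilo cloud-archive libvirt (1.2.12) is above it, so it is.
MIN_LIBVIRT_NUMA_VERSION = (1, 2, 7)

def numa_code_active(libvirt_version):
    """Hypothetical helper: True if the NUMA code path would be enabled."""
    return libvirt_version >= MIN_LIBVIRT_NUMA_VERSION

print(numa_code_active((1, 2, 2)))   # 14.04 source install -> False
print(numa_code_active((1, 2, 12)))  # Kilo cloud archive   -> True
```

This matches Matt's observation above: same nova code, but only the newer packaged libvirt crosses the threshold that activates the code emitting the cpuset.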

Revision history for this message
Chuck Short (zulcss) wrote :

I wasn't able to reproduce this on Vivid.

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

Chuck,

What version of libvirt?

Revision history for this message
Liusheng (liusheng) wrote :

I have met the same issue:

root@openstack:~# virsh version
Compiled against library: libvirt 1.2.12
Using library: libvirt 1.2.12
Using API: QEMU 1.2.12
Running hypervisor: QEMU 2.2.0

-----------------------------------------------------------------------------
root@openstack:~# dpkg -l |grep nova
ii nova-api 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - API frontend
ii nova-cert 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - certificate management
ii nova-common 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii nova-conductor 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - conductor service
ii nova-consoleauth 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 1:2.22.0-0ubuntu1~cloud0 all client library for OpenStack Compute API

Revision history for this message
Mark Vanderwiel (vanderwl) wrote :

I also see this is still an issue with http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/kilo main

VERSION="14.04.1 LTS, Trusty Tahr"

# virsh version
Compiled against library: libvirt 1.2.12
Using library: libvirt 1.2.12
Using API: QEMU 1.2.12
Running hypervisor: QEMU 2.2.0

nova-compute 1:2015.1~rc1-0ubuntu1~cloud0

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed
Revision history for this message
Rui Chen (kiwik-chenrui) wrote :

Looks like a nova bug; I guess it will occur with virt_type=qemu, libvirt >= 1.2.7, and a compute host that supports NUMA.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I'm a bit confused - what exactly is the bug? Is it that cpusets inside the guest do not work right? Or that the qemu guest's vCPUs are not pinned to the specified CPUs on the host? Is the host a true hardware host? Is the guest running accelerated KVM? How is it being verified that it is misbehaving - is the kvm process just not in a proper cpuset?

Revision history for this message
Martin Mailand (todin) wrote :

The host system is VirtualBox, and the guest system is qemu without kvm, because VirtualBox doesn't support hardware acceleration.
The problem is that nova-compute generates invalid XML for this combination.
The offending part is "<vcpu placement='static'>4</vcpu>".

This part is not accepted by libvirt, and an error is logged.

As a result, I am unable to test OpenStack in a Vagrant/VirtualBox environment.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks - so the solution is for nova to drop the placement='static' from that line whenever it knows it is using TCG?

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

On the same system (QEMU only) and same version of nova (Kilo), I can launch an instance with libvirt 1.2.2 (included with Ubuntu 14.04) but receive this error with libvirt 1.2.12 (included with the Kilo cloud-archive repo). Either libvirt 1.2.12 reports the wrong capabilities to nova, or nova makes incorrect assumptions about it.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1439280] Re: Libvirt CPU affinity error

Can you show 'virsh capabilities' output with both packages?

Revision history for this message
Liusheng (liusheng) wrote :
Revision history for this message
Tony Breeds (o-tony) wrote :

But that makes no sense. If you were changing

# While earlier versions could support NUMA reporting and
# NUMA placement, not until 1.2.7 was there the ability
# to pin guest nodes to host nodes, so mandate that. Without
# this the scheduler cannot make guaranteed decisions, as the
# guest placement may not match what was requested
MIN_LIBVIRT_NUMA_VERSION = (1, 2, 7)

I could see that helping, but MIN_LIBVIRT_BLOCKCOMMIT_RELATIVE_VERSION?

Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

@Tony
Yes, you are right. I'm sorry, I made a wrong file diff. I have hidden my last comment and pasted a correct one.

I met this issue too. After applying the following patch, it works. I think that on Ubuntu, libvirt (1.2.12, from cloud-archive) doesn't support NUMA.

diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index 98a4537..4d573e1 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -355,7 +355,7 @@ REQ_HYPERVISOR_DISCARD = "QEMU"
 # to pin guest nodes to host nodes, so mandate that. Without
 # this the scheduler cannot make guaranteed decisions, as the
 # guest placement may not match what was requested
-MIN_LIBVIRT_NUMA_VERSION = (1, 2, 7)
+MIN_LIBVIRT_NUMA_VERSION = (1, 2, 99)
 # While earlier versions could support hugepage backed
 # guests, not until 1.2.8 was there the ability to request
 # a particular huge page size. Without this the scheduler

Revision history for this message
Tony Breeds (o-tony) wrote :

It seems that nova's libvirt driver is generating an invalid domain XML. If I understand correctly, specifying a 'vcpu' node with a cpuset is invalid in TCG *unless* you also specify emulatorpin. See: https://libvirt.org/formatdomain.html#elementsCPUAllocation

Revision history for this message
Tony Breeds (o-tony) wrote :

This patch (which hasn't gone anywhere near upstream yet) forces the libvirt driver in nova to avoid generating a cpuset, and therefore no longer generates the invalid domain XML.

Next steps are to discuss my findings with upstream libvirt and nova developers to see if I'm correct or have just fluked it.
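A minimal sketch of that workaround (hypothetical helper, not Tony's actual patch): emit the cpuset attribute only for kvm guests, and a bare vcpu element under TCG.

```python
# Hypothetical sketch of the workaround described above, not the real
# patch: only attach a cpuset to the <vcpu> element for kvm guests;
# for plain qemu (TCG), emit the element without one, which libvirt
# accepts.
def vcpu_element(vcpus, virt_type, cpuset=None):
    """Illustrative builder for the domain XML <vcpu> element."""
    if virt_type == 'kvm' and cpuset:
        return "<vcpu placement='static' cpuset='%s'>%d</vcpu>" % (cpuset, vcpus)
    return "<vcpu placement='static'>%d</vcpu>" % vcpus

print(vcpu_element(1, 'qemu', '0-1'))  # no cpuset under TCG
print(vcpu_element(1, 'kvm', '0-1'))
```

For the failing instance in the bug description this would produce `<vcpu placement='static'>1</vcpu>` instead of the rejected `<vcpu placement='static' cpuset='0-1'>1</vcpu>`.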

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "1439280.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

I found the root cause in my environment.

I use libvirt in LXC, and the LXC container doesn't enable the cgroup with read and write permission. I added the following line to the config file:

    lxc.aa_profile = lxc-container-default-with-nesting

After installing `cgroup-lite` in the LXC guest, libvirt works with CPU affinity.

Revision history for this message
Mark Vanderwiel (vanderwl) wrote :

Using the patch above (which basically hacks out qemu specifically), nova boot works fine.
Same qemu environment as I noted in my 4-23 post.

Can we get this patch out for formal review by nova folks?

Revision history for this message
Tony Breeds (o-tony) wrote :

We're in discussions with the libvirt devs to work out if the fix is correct and/or exposes a libvirt bug.

Once that discussion concludes, there will be a nova patch posted (and tagged for backport).

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks - the patch seems to make sense.

Revision history for this message
Tony Breeds (o-tony) wrote :

Summary of the libvirt discussion: current upstream works. The libvirt team would like to identify the libvirt fixes required and get them backported to the maintenance releases.

With reference to:

https://launchpad.net/ubuntu/+source/libvirt ; and
http://wiki.libvirt.org/page/Maintenance_Releases

If I read those links correctly, we're still going to need to fix nova and/or get the backports into the appropriate Ubuntu libvirt packages.

Revision history for this message
Tony Breeds (o-tony) wrote :

For the record. Applying this patch to the cloud-archive libvirt package should fix the problem.

http://libvirt.org/git/?p=libvirt.git;a=commit;h=a103bb105c0c189c3973311ff1826972b5bc6ad6

Revision history for this message
Tony Breeds (o-tony) wrote :

I was pointed at the v1.2.12-maint head in the libvirt git which contains this fix already.
http://libvirt.org/git/?p=libvirt.git;a=shortlog;h=refs/heads/v1.2.12-maint

I suggest we close the nova issue with won't fix and get the correctly backported patch into the libvirt package.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

"Won't-fixing" the nova side will leave it broken for quite some time, until the backport has made its way into all relevant distro images. I'd prefer to add your patch to the nova code as a workaround for older libvirt versions.

Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released
importance: Undecided → High
Changed in libvirt (Ubuntu Vivid):
importance: Undecided → High
description: updated
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Matt, or anyone else affected,

Accepted libvirt into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/1.2.12-0ubuntu13 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
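For testers unfamiliar with the -proposed pocket, the verification steps above amount to roughly the following on vivid; the sources.list line is a standard pattern from the EnableProposed wiki page, and the version pin matches the package named in this bug.

```shell
# Enable the vivid-proposed pocket (assumed standard archive URL)
echo 'deb http://archive.ubuntu.com/ubuntu/ vivid-proposed main universe' | \
    sudo tee /etc/apt/sources.list.d/vivid-proposed.list
sudo apt-get update

# Install the proposed libvirt build and confirm the version
sudo apt-get install libvirt-bin=1.2.12-0ubuntu13 libvirt0=1.2.12-0ubuntu13
dpkg -l libvirt-bin | grep 1.2.12-0ubuntu13
```

After installing, re-run the failing scenario (a multi-vCPU TCG guest) and report the result on this bug with the appropriate verification tag.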

Changed in libvirt (Ubuntu Vivid):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Tony Breeds (o-tony) wrote :

I can verify that installing 1.2.12-0ubuntu13 on vivid fixes the issue for me.

Please forgive my ignorance but can that package be tagged into cloud-archive once it's officially a vivid update?

tags: added: verification-done
removed: verification-needed
Revision history for this message
Tony Breeds (o-tony) wrote :

@j-rosenboom-j My "fix" for nova will never be accepted upstream. I won't speak for the Ubuntu developers, but I strongly suspect that they'll be unwilling to diverge from upstream, especially as they've already shown the fix will land in vivid.

Revision history for this message
Mark Vanderwiel (vanderwl) wrote :

When will this get fixed for Trusty? That's where it was originally reported.

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

The cloud archive repository currently does not contain a fix for Kilo on Trusty.

Revision history for this message
Tony Breeds (o-tony) wrote :

@vanderwl: The original report was against the trusty cloud-archive repo.

If you look at https://launchpad.net/ubuntu/+source/libvirt you can see that the version in trusty is NOT affected by this issue.

Only vivid and the cloud archive PPA are affected.

Revision history for this message
Mark Vanderwiel (vanderwl) wrote :

Matt, Tony, thanks for the clarification. I'm still a bit confused as to when the cloud archive used by trusty (http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/trusty-updates/) would be updated to point to this new libvirt level. (Sorry for being a newbie on this.)

Revision history for this message
Tony Breeds (o-tony) wrote :

@vanderwl No problem I'm new to cloud archive as well.

So the URL I have is: https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/kilo-staging/+index
and that shows (for me) that 4 hours ago the (kilo-staging) cloud-archive PPA got the fixed libvirt.

So in theory we're done here. Matt and I just need to verify that the new package is correct (I have no doubt it is).

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

I'm waiting until the package appears in the official cloud archive repository.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu Vivid):
status: New → Confirmed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.2.12-0ubuntu13

---------------
libvirt (1.2.12-0ubuntu13) vivid-proposed; urgency=medium

  * 9038-qemu-fix-setting-of-VM-CPU-affinity-with-TCG (LP: #1439280)

 -- Serge Hallyn <email address hidden> Wed, 13 May 2015 10:48:53 -0500

Changed in libvirt (Ubuntu Vivid):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Chuck Short (zulcss) wrote :

This should be fixed now; please re-open if it isn't.

Changed in nova (Ubuntu):
status: Confirmed → Fix Released