aarch64: MSI is not supported by interrupt controller

Bug #1706630 reported by James Page
This bug affects 3 people
Affects: libvirt (Ubuntu)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hit this while trying to launch OpenStack instances on aarch64 using libvirt 3.5.0 from Artful (backported to Xenial via the UCA):

2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [req-fe2bcb79-cffd-48e8-bcd6-4d9ec6352adf e0a9b16991df405a9535658e3b32bc72 f13578a679244c43aa8d0900ba0e6f48 - - -] [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] Instance failed to spawn
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] Traceback (most recent call last):
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2124, in _build_resources
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] yield resources
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1930, in _build_and_run_instance
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] block_device_info=block_device_info)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2698, in spawn
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] destroy_disks_on_failure=True)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5114, in _create_domain_and_network
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] destroy_disks_on_failure)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] self.force_reraise()
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] six.reraise(self.type_, self.value, self.tb)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5086, in _create_domain_and_network
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] post_xml_callback=post_xml_callback)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5004, in _create_domain
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] guest.launch(pause=pause)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 145, in launch
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] self._encoded_xml, errors='ignore')
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] self.force_reraise()
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] six.reraise(self.type_, self.value, self.tb)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 140, in launch
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] return self._domain.createWithFlags(flags)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] rv = execute(f, *args, **kwargs)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] six.reraise(c, e, tb)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] rv = meth(*args, **kwargs)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1065, in createWithFlags
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] libvirtError: internal error: process exited while connecting to monitor: 2017-07-26T12:58:26.803461Z qemu-system-aarch64: -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1: MSI is not supported by interrupt controller
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46] 2017-07-26T12:58:26.805038Z qemu-system-aarch64: -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1: Device initialization failed
2017-07-26 12:58:28.282 45046 ERROR nova.compute.manager [instance: 42c06630-1fe3-4fd0-8cec-549f809e0c46]
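
For triage, the QEMU part of the libvirtError above can be split into the failing device, its properties, and the reason. The following is a minimal Python sketch; the function name `parse_qemu_device_error` and the regular expression are illustrative assumptions based only on the two error lines in this report, not part of nova, libvirt, or QEMU.

```python
import re

# Assumed shape, taken from the error lines in this report:
#   <timestamp> qemu-system-<arch>: -device <model>,<k=v>,...: <reason>
QEMU_DEVICE_ERROR = re.compile(
    r"(?P<binary>qemu-system-[\w-]+): -device (?P<model>[^,:\s]+)"
    r"(?P<props>(?:,[^:\s]+)?): (?P<reason>.+)$"
)


def parse_qemu_device_error(line):
    """Return (binary, device model, properties dict, reason), or None."""
    match = QEMU_DEVICE_ERROR.search(line)
    if match is None:
        return None
    props = {}
    # Properties are comma-separated key=value pairs after the model name.
    for item in match.group("props").lstrip(",").split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            props[key] = value
    return (match.group("binary"), match.group("model"),
            props, match.group("reason"))
```

Applied to the first QEMU line in the traceback, this reports the device model `ioh3420` (a PCIe root port) on bus `pcie.0` as the one that failed to initialize.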

James Page (james-page) wrote :

Attaching XML generated by OpenStack for the failed instance launch.

tags: added: openstack
James Page (james-page) wrote :
James Page (james-page) wrote :

The attached XML is the one returned by libvirt; the XML generated by nova does not include a gic version:

<domain type="kvm">
  <uuid>42c06630-1fe3-4fd0-8cec-549f809e0c46</uuid>
  <name>instance-0000000d</name>
  <memory>1048576</memory>
  <vcpu>1</vcpu>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="15.0.5"/>
      <nova:name>openstack-on-lxd-new-libvirt</nova:name>
      <nova:creationTime>2017-07-26 12:58:23</nova:creationTime>
      <nova:flavor name="m1.small">
        <nova:memory>1024</nova:memory>
        <nova:disk>20</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>40</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="e0a9b16991df405a9535658e3b32bc72">admin</nova:user>
        <nova:project uuid="f13578a679244c43aa8d0900ba0e6f48">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="94303814-8ffe-4ea6-bbfb-a960757a608d"/>
    </nova:instance>
  </metadata>
  <os>
    <type machine="virt">hvm</type>
    <loader type="pflash" readonly="yes">/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>

....
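
Whether a given domain XML carries a GIC version can be checked programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `gic_version` is hypothetical and not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET


def gic_version(domain_xml):
    """Return the version attribute of <features>/<gic>.

    Returns None if there is no <gic> element (or it has no version).
    """
    root = ET.fromstring(domain_xml)
    gic = root.find("./features/gic")
    return None if gic is None else gic.get("version")
```

For the nova-generated excerpt above, whose `<features>` block contains only `<acpi/>` and `<apic/>`, this returns None.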

Ryan Beisner (1chb1n)
tags: added: arm64 uosci
Kevin Zhao (kevin-zhao) wrote :

I hit the same problem on AArch64 with the latest Nova, qemu 2.8.1, and libvirt 3.0.0.

tags: added: virt-fixed-by-2.10
Christian Ehrhardt  (paelzer) wrote :

This does not match James's finding that the gic version is missing from the generated XML.
Also, the error here is that the guest does not start at all.
Still, given the changed MSI interrupt behaviour, the following change might be related:

commit b4b9862b536f41fcdf6ad193a306a852c5b5b71a
Author: Michael S. Tsirkin <email address hidden>
Date: Fri Feb 17 04:52:16 2017 +0200

    virtio: Fix no interrupt when not creating msi controller

    For ARM virt machine, if we use virt-2.7 which will not create ITS node,
    the virtio-net can not recieve interrupts so it can't get ip address
    through dhcp.
    This fixes commit 83d768b(virtio: set ISR on dataplane notifications).

    Signed-off-by: Shannon Zhao <email address hidden>
    Reviewed-by: Michael S. Tsirkin <email address hidden>
    Signed-off-by: Michael S. Tsirkin <email address hidden>

This commit is included in 2.9.
James suggested that this might be fixed in 2.9/2.10, so I'll update here once I have a 2.10 to test.
If anyone wants to test on Artful, a daily qemu build is available at https://launchpad.net/~ubuntu-virt/+archive/ubuntu/virt-daily-upstream (I'd recommend not pulling libvirt from there as well, so that only one component changes).

Andrew McLeod (admcleod) wrote :

Can confirm this bug exists as of today, with xenial-pike/proposed

Linux juju-25b69e-14 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:30 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux
libvirt0: 3.6.0
qemu: 2.8

Christian Ehrhardt  (paelzer) wrote :

Well, as of today that should have qemu 2.10-rc3, right?
Is that in xenial-pike/proposed already?

If so, James's assumption that a fix was in qemu 2.9 was wrong and we need a new theory.

Is the generated XML still the same as in comment #3? Could you please generate a new one on xenial-pike/proposed and attach it here?

Andrew McLeod (admcleod) wrote :

Regarding the qemu version, I've asked someone else to comment on that. The XML from before was:

==> nova/nova-compute.log <==
2017-08-28 13:15:46.413 23162 ERROR nova.virt.libvirt.guest [req-e2fc9a90-7433-479c-9d96-c2818641fe45 b147293e0a7941899301c59b48068e51 18b9fc2bdb6b45a0a682d79d9130a975 - default default] Error launching a defined domain with XML: <domain type='kvm'>
  <name>instance-00000007</name>
  <uuid>674a2e53-cfbb-4043-a6f0-09536acc1781</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="16.0.0"/>
      <nova:name>openstack-on-lxd-ftw</nova:name>
      <nova:creationTime>2017-08-28 13:15:40</nova:creationTime>
      <nova:flavor name="m2.small">
        <nova:memory>256</nova:memory>
        <nova:disk>3</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="b147293e0a7941899301c59b48068e51">admin</nova:user>
        <nova:project uuid="18b9fc2bdb6b45a0a682d79d9130a975">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="149520da-94a3-43e6-b2ac-25be1004d16a"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>262144</memory>
  <currentMemory unit='KiB'>262144</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <os>
    <type arch='aarch64' machine='virt-2.8'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/instance-00000007_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <gic version='3'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>

Christian Ehrhardt  (paelzer) wrote :

From the log:
<type arch='aarch64' machine='virt-2.8'>hvm</type>

This is still running on the older qemu.
So the earlier assumption has not been checked yet.
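
The inference above reads the QEMU generation off the versioned machine type in the domain XML. A minimal Python sketch of that check follows; the helper name `machine_type_version` is hypothetical, and it only understands versioned `virt-X.Y` names as they appear in this report. Note that the machine type records what the domain was defined with, so it is a hint about the installed QEMU rather than proof.

```python
import re
import xml.etree.ElementTree as ET


def machine_type_version(domain_xml):
    """Return (major, minor) from a machine type such as 'virt-2.8'.

    Returns None for unversioned aliases such as 'virt' or if
    <os>/<type> is missing.
    """
    os_type = ET.fromstring(domain_xml).find("./os/type")
    if os_type is None:
        return None
    match = re.fullmatch(r"virt-(\d+)\.(\d+)", os_type.get("machine", ""))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Comparing the result as a tuple, for example `machine_type_version(xml) < (2, 10)`, avoids the string-comparison trap where "2.8" sorts after "2.10".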

2.10 is expected to be released today [1].
I already addressed some of corecb's feedback regarding your builds in the -rc4 I prepared.
I had thought you already had the -rc3 I made available a while ago in the UCA.
So we have to re-test with that once it is available in the UCA as well.

To verify that it should work, I took your XML and
- removed the disks I don't have (added a local img file as a replacement)
- replaced the bridge I don't have with the default bridge (still a virtio net device)
All device initialization still runs with this, so it should trigger your bug, but it works just fine.
See the XML [2], which gets expanded at runtime to [3].

For now I'm still assuming it is fixed in qemu 2.10; please get back once it has been tested on that in pike.

[1]: https://wiki.qemu.org/index.php/Planning/2.10
[2]: http://paste.ubuntu.com/25423471/
[3]: http://paste.ubuntu.com/25423472/

Changed in libvirt (Ubuntu):
status: New → Incomplete
Raghuram Kota (rkota) wrote :

@admcleod: Any updates on the Pike testing with qemu 2.10 mentioned in comment #9?

Jason Hobbs (jason-hobbs) wrote :

I'm running qemu-system-arm 2.10+dfsg-0ubuntu3.1~cloud0 and still seeing this issue.

Changed in libvirt (Ubuntu):
status: Incomplete → New
tags: added: cdoqa-blocker
dann frazier (dannf) wrote :

Is this reproducible w/ the hwe kernel?

Jason Hobbs (jason-hobbs) wrote :

No - with the hwe-kernel (4.10.0-42-generic from 4.10.0-42.46~16.04.1), this doesn't happen.

RussianNeuroMancer (russianneuromancer) wrote :

I'm getting the same issue with both libvirt 4.0.0/qemu 2.11 (18.04.2) and libvirt 5.0.0/qemu 3.1 (19.04) when trying to run an AArch64 guest on an AArch64 host. The virtual machine settings are the defaults generated by the virt-manager wizard, so there are no big changes to the default devices; I just filled in the ISO, HDD, and network bridge.

Is there something I'm missing?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

@RussianNeuromancer for aarch64 you need to tweak quite a few things to get it working.
OpenStack does so by default, but I doubt that virt-manager comes up with a great default config.
The most common related mistake is the lack of
  <gic version='3'/>

I'll reset this bug to the Incomplete state it was in before (the issue was gone with the HWE kernel, so an HWE fix for arm was likely needed).
If you want to discuss the virt-manager@arm case further, I'd ask you to open a new bug (refer to this one, but keep the discussions separate).
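
As an illustration of the `<gic version='3'/>` tweak mentioned above, the following minimal Python sketch adds the element to a domain XML only when it is missing. The helper name `ensure_gic` is hypothetical. It is a sketch for experimenting with a dumped definition, not a guaranteed fix for this bug, and the right GIC version depends on the host.

```python
import xml.etree.ElementTree as ET


def ensure_gic(domain_xml, version="3"):
    """Return domain XML with <gic version=...> under <features>.

    An existing <gic> element is left untouched. Caveat: ElementTree
    re-serializes the document, so namespace prefixes such as 'nova:'
    may be renamed unless registered with ET.register_namespace().
    """
    root = ET.fromstring(domain_xml)
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    if features.find("gic") is None:
        ET.SubElement(features, "gic", {"version": version})
    return ET.tostring(root, encoding="unicode")
```

The result could then be reviewed by hand before redefining the domain with it.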

Changed in libvirt (Ubuntu):
status: Confirmed → Incomplete
RussianNeuroMancer (russianneuromancer) wrote :

> The most common related mistake is the lack of <gic version='3'/>

This is what I found too, but it seems that in my case something else causes this error. Below is an attempt to run https://paste.ubuntu.com/25423471/ from comment #9, which definitely has <gic version='3'/>:

~$ virsh start instance-00000007
error: Failed to start domain instance-00000007
error: internal error: process exited while connecting to monitor: 2019-06-11T18:49:31.843203Z qemu-system-aarch64: -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1: MSI is not supported by interrupt controller

Detailed log in bug 1832394.

> If you want to discuss the virt-manager@arm case further, I'd ask you to open a new bug (refer to this one, but keep the discussions separate)

Done:
https://bugs.launchpad.net/ubuntu/+source/virt-manager/+bug/1832394
https://bugs.launchpad.net/ubuntu/+source/virt-manager/+bug/1832395

Rafael David Tinoco (rafaeldtinoco) wrote :

Fully functional aarch64 (arm64) kvm xml definition example:

https://pastebin.ubuntu.com/p/jvN6SYHbtD/

Fully functional aarch32 (armhf) kvm xml definition example:

https://pastebin.ubuntu.com/p/nTmSQRf2CQ/

Both are running on an ARMv8 machine with KVM support for AArch64 and AArch32.

RussianNeuroMancer (russianneuromancer) wrote :

Rafael, I posted results of testing aarch64 xml example in bug 1832394.
