[ocata] unsupported configuration: CPU mode 'host-model' for aarch64 kvm domain on aarch64 host is not supported by hypervisor

Bug #1673467 reported by Larry Michel on 2017-03-16
Affects (Importance / Assigned to):
  OpenStack Compute (nova): Undecided / Unassigned
  OpenStack nova-compute charm: High / James Page
  libvirt (Ubuntu): Undecided / Unassigned
    Zesty: Undecided / Unassigned
  qemu (Ubuntu): Undecided / Unassigned

Bug Description

We hit this error in Ocata while trying to launch an arm64 instance:

2017-03-16 08:01:42.329 144245 ERROR nova.virt.libvirt.guest [req-2ad2d5d9-696d-4baa-a071-756e460ca3de 8f431f83f7e44ef1a084e7e27b40a685 a904dd389c5d4817a4d95b8f3268cf4d - - -] Error launching a defined domain with XML: <domain type='kvm'>
  <name>instance-00000001</name>
  <uuid>220bec1b-8907-4da9-9862-9cc2354abf39</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="15.0.0"/>
      <nova:name>guestOS-test-arm64-kvm-xenial-ci_oil_slave14_0</nova:name>
      <nova:creationTime>2017-03-16 08:01:38</nova:creationTime>
      <nova:flavor name="m1.small">
        <nova:memory>2048</nova:memory>
        <nova:disk>20</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="8f431f83f7e44ef1a084e7e27b40a685">admin</nova:user>
        <nova:project uuid="a904dd389c5d4817a4d95b8f3268cf4d">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="4e864421-efc0-4c39-8b49-d619356f72de"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <os>
    <type arch='aarch64' machine='virt-2.8'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/instance-00000001_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <gic version='3'/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/220bec1b-8907-4da9-9862-9cc2354abf39/disk'/>
      <target dev='vda' bus='virtio'/>
      <address type='virtio-mmio'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'/>
    <interface type='bridge'>
      <mac address='fa:16:3e:74:b0:97'/>
      <source bridge='qbr9a95f1e8-d5'/>
      <target dev='tap9a95f1e8-d5'/>
      <model type='virtio'/>
      <address type='virtio-mmio'/>
    </interface>
    <serial type='pty'>
      <log file='/var/lib/nova/instances/220bec1b-8907-4da9-9862-9cc2354abf39/console.log' append='off'/>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <log file='/var/lib/nova/instances/220bec1b-8907-4da9-9862-9cc2354abf39/console.log' append='off'/>
      <target type='serial' port='0'/>
    </console>
    <memballoon model='virtio'>
      <stats period='10'/>
      <address type='virtio-mmio'/>
    </memballoon>
  </devices>
</domain>

2017-03-16 08:01:42.333 144245 ERROR nova.virt.libvirt.driver [req-2ad2d5d9-696d-4baa-a071-756e460ca3de 8f431f83f7e44ef1a084e7e27b40a685 a904dd389c5d4817a4d95b8f3268cf4d - - -] [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] Failed to start libvirt guest
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1930, in _build_and_run_instance
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] block_device_info=block_device_info)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2688, in spawn
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] destroy_disks_on_failure=True)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5099, in _create_domain_and_network
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] destroy_disks_on_failure)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] self.force_reraise()
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] six.reraise(self.type_, self.value, self.tb)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5071, in _create_domain_and_network
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] post_xml_callback=post_xml_callback)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4989, in _create_domain
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] guest.launch(pause=pause)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] self._encoded_xml, errors='ignore')
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] self.force_reraise()
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] six.reraise(self.type_, self.value, self.tb)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 140, in launch
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] return self._domain.createWithFlags(flags)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] rv = execute(f, *args, **kwargs)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] six.reraise(c, e, tb)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] rv = meth(*args, **kwargs)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1065, in createWithFlags
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39] libvirtError: unsupported configuration: CPU mode 'host-model' for aarch64 kvm domain on aarch64 host is not supported by hypervisor
2017-03-16 08:01:43.522 144245 ERROR nova.compute.manager [instance: 220bec1b-8907-4da9-9862-9cc2354abf39]

dann frazier (dannf) wrote :

Can you provide the versions of libvirt and QEMU on the system?

ChristianEhrhardt (paelzer) wrote :

Hi Larry,
I quickly tried to reproduce, but the only arm system I had around was blocking me for other reasons. Before I spend too much time on a system that may not be comparable anyway, I'd like to confirm what type of arm system you have exactly.

Also - assuming it is in your OS test infra - if there is any chance to log into your system please ping "cpaelzer" on IRC.

host-model is supposed to copy the host's features from virsh capabilities - could you attach the output of the following:
$ virsh capabilities

Finally, is this a regression of a case that worked e.g. with Newton/Yakkety but now fails with Ocata/Zesty?

ChristianEhrhardt (paelzer) wrote :

Also subscribing Dannf for virt-on-arm experience

Larry Michel (lmic) wrote :

Hi Christian, here's the requested data below. This is with Xenial and Ocata. It has worked with Xenial and Mitaka.

ubuntu@phanpy:~$ virsh capabilities
<capabilities>

  <host>
    <uuid>12ded42d-2831-44a4-b0ba-cc3af3481077</uuid>
    <cpu>
      <arch>aarch64</arch>
      <topology sockets='1' cores='48' threads='1'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
    </cpu>
    <power_management/>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
        <uri_transport>rdma</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>131989624</memory>
          <pages unit='KiB' size='4'>32997406</pages>
          <pages unit='KiB' size='2048'>0</pages>
          <distances>
            <sibling id='0' value='10'/>
          </distances>
          <cpus num='48'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='4' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='5' siblings='5'/>
            <cpu id='6' socket_id='0' core_id='6' siblings='6'/>
            <cpu id='7' socket_id='0' core_id='7' siblings='7'/>
            <cpu id='8' socket_id='0' core_id='8' siblings='8'/>
            <cpu id='9' socket_id='0' core_id='9' siblings='9'/>
            <cpu id='10' socket_id='0' core_id='10' siblings='10'/>
            <cpu id='11' socket_id='0' core_id='11' siblings='11'/>
            <cpu id='12' socket_id='0' core_id='12' siblings='12'/>
            <cpu id='13' socket_id='0' core_id='13' siblings='13'/>
            <cpu id='14' socket_id='0' core_id='14' siblings='14'/>
            <cpu id='15' socket_id='0' core_id='15' siblings='15'/>
            <cpu id='16' socket_id='0' core_id='16' siblings='16'/>
            <cpu id='17' socket_id='0' core_id='17' siblings='17'/>
            <cpu id='18' socket_id='0' core_id='18' siblings='18'/>
            <cpu id='19' socket_id='0' core_id='19' siblings='19'/>
            <cpu id='20' socket_id='0' core_id='20' siblings='20'/>
            <cpu id='21' socket_id='0' core_id='21' siblings='21'/>
            <cpu id='22' socket_id='0' core_id='22' siblings='22'/>
            <cpu id='23' socket_id='0' core_id='23' siblings='23'/>
            <cpu id='24' socket_id='0' core_id='24' siblings='24'/>
            <cpu id='25' socket_id='0' core_id='25' siblings='25'/>
            <cpu id='26' socket_id='0' core_id='26' siblings='26'/>
            <cpu id='27' socket_id='0' core_id='27' siblings='27'/>
            <cpu id='28' socket_id='0' core_id='28' siblings='28'/>
            <cpu id='29' socket_id='0' core_id='29' siblings='29'/>
            <cpu id='30' socket_id='0' core_id='30' siblings='30'/>
            <cpu id='31' socket_id='0' core_id='31' siblings='31'/>
            <cpu id='32' socket_id='0' core_id='32' siblings='32'/>
            <cpu id='33' socket_id...

ChristianEhrhardt (paelzer) wrote :

Hi,
I was able to reproduce at least partially.

I must admit I had never heard of host-model before; host-passthrough is what one usually sees.
Checking the documentation [1] reveals that the feature is broken before libvirt 3.2 / QEMU 2.9 - and that QEMU version is not even released yet.

So even if we made host-model pass this initial check, it would not "do what it is supposed to".

OTOH the more common host-passthrough is working - that much I could confirm.

I see why host-model is nicer than passthrough, as it would be a bit more portable.
Maybe OpenStack changed behavior and on Xenial/Mitaka selected passthrough - we might want to do so again on Ocata.

@Openstack Team
1. could you check which CPU mode was passed on Mitaka/Newton? I'd assume host-passthrough, but that is only an assumption
2. if #1 is true, what would you think about fixing this by going back to host-passthrough for Ocata on aarch64?

[1]: https://libvirt.org/formatdomain.html
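In a plain (non-charm) Ocata deployment, the switch Christian suggests corresponds to a one-line nova.conf change on the aarch64 compute nodes; the [libvirt] cpu_mode option is the standard Nova knob for this. An illustrative sketch (on a charm-managed cloud the charm owns this file, so set the charm option instead):

```ini
# /etc/nova/nova.conf on the aarch64 compute host (illustrative)
[libvirt]
virt_type = kvm
# host-model is rejected by libvirt 2.5 on aarch64; host-passthrough works
cpu_mode = host-passthrough
```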

Changed in libvirt (Ubuntu):
status: New → Incomplete
ChristianEhrhardt (paelzer) wrote :

incomplete on libvirt waiting for openstack expertise on this case

ChristianEhrhardt (paelzer) wrote :

Simple test on the option itself without Openstack

 $ dd if=/dev/zero of=flash0.img bs=1M count=64
 $ dd if=/usr/share/qemu-efi/QEMU_EFI.fd of=flash0.img conv=notrunc
 $ dd if=/dev/zero of=flash1.img bs=1M count=64
 $ wget http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-arm64-uefi1.img

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        <name>testguest</name>
        <uuid>f9c9e534-6233-482c-842d-88b8606a4604</uuid>
        <memory unit='KiB'>1048576</memory>
        <currentMemory unit='KiB'>1048576</currentMemory>
        <vcpu placement='static'>1</vcpu>
        <os>
                <type arch='aarch64' machine='virt-2.8'>hvm</type>
        </os>
        <features>
                <gic version='3'/>
        </features>
  <cpu mode='host-model'>
          <model fallback='allow'/>
  </cpu>
<!--
  <cpu mode='host-passthrough'>
          <model fallback='allow'/>
  </cpu>
        <cpu mode='custom' match='exact'>
                <model fallback='allow'>host</model>
        </cpu>
-->
        <clock offset='utc'/>
        <on_poweroff>destroy</on_poweroff>
        <on_reboot>restart</on_reboot>
        <on_crash>destroy</on_crash>
        <devices>
                <emulator>/usr/bin/qemu-system-aarch64</emulator>
                <disk type='file' device='disk'>
                        <driver name='qemu' type='raw'/>
                        <source file='/home/ubuntu/xenial-server-cloudimg-arm64-uefi1.img'/>
                        <target dev='hdc' bus='virtio'/>
                        <address type='virtio-mmio'/>
                </disk>
                <controller type='pci' index='0' model='pcie-root'/>
                <memballoon model='none'/>
                <interface type='network'>
                        <mac address='52:54:00:af:8f:2f'/>
                        <source network='default'/>
                        <model type='virtio'/>
                </interface>
                <serial type='pty'>
                        <target port='0'/>
                </serial>
                <console type='pty'>
                        <target type='serial' port='0'/>
                </console>
        </devices>
</domain>

Corey Bryant (corey.bryant) wrote :

Larry, can you get domain xml from newton for comparison?

Also, I assume you hit this on xenial-ocata, so I think it would make sense to just use xenial-newton (vs yakkety-newton) to compare.

Note that xenial-ocata has a new version of libvirt (and qemu):
libvirt: 2.5.0-3ubuntu3
qemu: 1:2.8+dfsg-3ubuntu2

xenial-mitaka and xenial-newton both use:
libvirt: 1.3.1-1ubuntu10.8
qemu: 1:2.5+dfsg-5ubuntu10.9

Corey Bryant (corey.bryant) wrote :

I'm not seeing any major changes in nova between newton and ocata wrt host-model. That's not to say this isn't a bug in nova.

As I noted previously, xenial-ocata is using libvirt 2.5.0 whereas xenial-mitaka and xenial-newton are using libvirt 1.3.1. This error and the function that it comes from are new since sometime after libvirt 1.3.1: "libvirtError: unsupported configuration: CPU mode 'host-model' for aarch64 kvm domain on aarch64 host is not supported by hypervisor".

This is coming from the following code, pasted from libvirt 2.5.0:

static int
qemuProcessUpdateGuestCPU(virDomainDefPtr def,
                          virQEMUCapsPtr qemuCaps,
                          virCapsPtr caps,
                          unsigned int flags)
{
    int ret = -1;
    size_t nmodels = 0;
    char **models = NULL;

    if (!def->cpu)
        return 0;

    /* nothing to do if only topology part of CPU def is used */
    if (def->cpu->mode == VIR_CPU_MODE_CUSTOM && !def->cpu->model)
        return 0;

    /* Old libvirt added host CPU model to host-model CPUs for migrations,
     * while new libvirt just turns host-model into custom mode. We need
     * to fix the mode to maintain backward compatibility and to avoid
     * the CPU model to be replaced in virCPUUpdate.
     */
    if (!(flags & VIR_QEMU_PROCESS_START_NEW) &&
        ARCH_IS_X86(def->os.arch) &&
        def->cpu->mode == VIR_CPU_MODE_HOST_MODEL &&
        def->cpu->model) {
        def->cpu->mode = VIR_CPU_MODE_CUSTOM;
    }

    if (!virQEMUCapsIsCPUModeSupported(qemuCaps, caps, def->virtType,
                                       def->cpu->mode)) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                       _("CPU mode '%s' for %s %s domain on %s host is not "
                         "supported by hypervisor"),
                       virCPUModeTypeToString(def->cpu->mode),
                       virArchToString(def->os.arch),
                       virDomainVirtTypeToString(def->virtType),
                       virArchToString(caps->host.arch));
        return -1;
    }

This code was introduced by the following commit:

commit 7ce711a30eaf882ccd0217b2528362b563b6d670
Author: Jiri Denemark <email address hidden>
Date: Wed Jun 22 15:53:48 2016 +0200

    qemu: Update guest CPU def in live XML

    Storing the updated CPU definition in the live domain definition saves
    us from having to update it over and over when we need it. Not to
    mention that we will soon further update the CPU definition according to
    QEMU once it's started.

    A highly wanted side effect of this patch, libvirt will pass all CPU
    features explicitly specified in domain XML to QEMU, even those that are
    already included in the host model.

    This patch should fix the following bugs:
        https://bugzilla.redhat.com/show_bug.cgi?id=1207095
        https://bugzilla.redhat.com/show_bug.cgi?id=1339680
        https://bugzilla.redhat.com/show_bug.cgi?id=1371039
        https://bugzilla....


ChristianEhrhardt (paelzer) wrote :

Thanks Corey!
Also for pulling out the code.

It still smells like "was never working well, but now it tells you".

To be sure it was specified that way before: @Larry, would you add a domain XML from Newton to round out the picture, so we can continue thinking about where a fix would be appropriate.

Sean Dague (sdague) wrote :

This looks like an upstream libvirt/qemu issue that got exposed with the version bump in Ubuntu. Closing the upstream Nova side.

Changed in nova:
status: New → Opinion
Ryan Beisner (1chb1n) on 2017-03-20
tags: added: arm64 uosci
Jason Hobbs (jason-hobbs) wrote :

Here's the requested xml from an arm64 newton instance:

https://pastebin.ubuntu.com/24230654/

Raghuram Kota (rkota) wrote :

@Corey @Christian the xml in comment #12 from Jason is Xenial-Newton.

ChristianEhrhardt (paelzer) wrote :

Snipping the interesting parts:

    <type arch='aarch64' machine='virt'>hvm</type>
[...]
  <features>
    <gic version='3'/>
[...]
  <cpu mode='host-model'>
    <model fallback='allow'/>
[...]

That means it worked for you on Xenial-Newton.
Yet as outlined before, "worked" is special here: it meant more "passed the check but didn't actually work", as outlined in comment #5 - I don't know the details of this.

With that in mind, let's simplify the test and take OpenStack out of the equation for now (although OpenStack might eventually be the best place for a solution).

Will update after some tests ...

ChristianEhrhardt (paelzer) wrote :

# Similar to my test in comment #7, but with slightly modified XML as in [1]
Start on Xenial, then upgrade the virt stack one package at a time.

At the spot marked "xx CPU xx" in the XML I used one of the following:

hm - host model
  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>
hp - host passthrough
  <cpu mode='host-passthrough'>
     <model fallback='allow'/>
  </cpu>
maf - match allow fallback
  <cpu mode='custom' match='exact'>
     <model fallback='allow'>host</model>
  </cpu>
mff - match forbid fallback
  <cpu mode='custom' match='exact'>
     <model fallback='forbid'>host</model>
  </cpu>

Those four modes to some extent mean the same thing - "make it as close to the host CPU as possible" - with slight differences, see [2]. Plus the "bonus" that host-model was broken until the soon-to-be-released qemu/libvirt versions.

Adding next release in sources and using "apt-get install -t" I was able to test qemu/libvirt upgrades one by one.

Testing:                hm    hp    maf   mff
LV 1.3.1 / Qemu 2.5     ok    ok    ok    ok
LV 1.3.1 / Qemu 2.6.1   fails: LV 1.3.1 not able to handle virt-2.6
LV 2.1   / Qemu 2.6.1   ok    ok    ok    ok
LV 2.1   / Qemu 2.8     ok    ok    ok    ok
LV 2.5   / Qemu 2.8     fail  ok    ok    ok

All of the rest of the system is still as on Xenial, so we can exclude other packages.

We can see that "only" host-model regressed, certainly due to the mentioned changes that it was broken so far.

By that we get to the libvirt code that Corey already posted.
That code changed a lot between 2.1 and 2.5.
To some extent it comes down to the check in virQEMUCapsIsCPUModeSupported, but I need to find the changes that led to adding the "host-model doesn't work" statement in the XML doc to get more background.

The whole call to qemuProcessUpdateGuestCPU did not exist in 2.1; to some extent the old check was qemuProcessStartValidateGuestCPU.
The check that now breaks was introduced in 803497a8, which added virQEMUCapsIsCPUModeSupported; that checks against "!!qemuCaps->hostCPUModel".
Related changes:
- http://libvirt.org/git/?p=libvirt.git;a=commit;h=7ce711a30eaf882ccd0217b2528362b563b6d670 (2.3)
- http://libvirt.org/git/?p=libvirt.git;a=commit;h=803497a8acdc76b9b229bd27d595ec89beed2f3e (2.3)

I checked that on x86 host-model works with libvirt 2.5.
That means there the qemuCaps hold this flag.

With that known, it was "easy" to construct a much simpler check.
The capability being checked amounts to "the guest wants X; can the emulator run X".
Now that is as easy as:
$ virsh domcapabilities --emulatorbin /usr/bin/qemu-system-aarch64 | grep host-model
    <mode name='host-model' supported='no'/>
While
$ virsh domcapabilities --emulatorbin /usr/bin/qemu-system-x86_64 | grep host-model
    <mode name='host-model' supported='yes'>

With that I went back to libvirt 2.1 on arm, and there the whole cpu section was not populated (and, as outlined, libvirt did not check against it even if it were).

I want to check how this gets populated, and also the commits that led to the statement in the XML doc that host-model was flawed before L3.2/Q2.9.

But before knowing better I'd say libvirt is right to reject ...


ChristianEhrhardt (paelzer) wrote :

On x86, due to a compat fix, host-model effectively becomes CUSTOM; see qemuProcessUpdateGuestCPU (in Corey's post).

I also checked where this gets populated but didn't find a sweet spot to see if it is now different.
But qemuBuildCpuModelArgStr was interesting.

On 2.5 and master, stripped down to what matters for aarch64, it is:
- host-passthrough -> add "host"
- host-model -> fail
  (remember x86 converts host-model to custom)
- custom -> add "whatever was given"

So that won't work even if we let it pass the check.

And while the code changed, the semantics did not - yet back then it handled host-model "like" passthrough, not in the command-line construction but due to a fallback in detection.

Knowing that, I checked the qemu command strings it constructed in the past; it was: "-cpu host".
Aha, nothing special for host-model (as it didn't work), so on aarch64 host-model was effectively equal to host-passthrough, except that the "advertised" extra that host-model has over host-passthrough did not work.

Now libvirt has changed to know that and tells you that host-model is not working - and that is actually correct. Even if it seemed like one at first, I'd not consider this a regression in the usual sense:

- Regression: something that worked now fails
- This: something that you thought worked, but didn't, now fails and tells you so

Summarizing in the next comment to catch everybody up and sync with the OpenStack team on this.

ChristianEhrhardt (paelzer) wrote :

I know it is a lot of text, but not all of you have to read everything, so here is my TL;DR (IMHO):

- no libvirt regression; host-model never worked, just no one noticed so far
- OpenStack should change (at least for that arch) to host-passthrough
  - if the "feature" of host-model is really wanted, it can be re-enabled behind a version check for qemu/libvirt >= 2.9/3.2 (or later, depending on how things work out)
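The version gate in that last bullet could be sketched as follows (a hypothetical helper for illustration, not actual Nova or charm code; the 3.2/2.9 thresholds come from the libvirt documentation cited earlier):

```python
def choose_cpu_mode(arch, libvirt_version, qemu_version):
    """Pick a libvirt CPU mode for KVM guests.

    host-model only became functional on aarch64 with libvirt >= 3.2
    and QEMU >= 2.9, so fall back to host-passthrough before that.
    Version arguments are (major, minor) tuples, e.g. (2, 5).
    """
    if arch != 'aarch64':
        return 'host-model'
    if libvirt_version >= (3, 2) and qemu_version >= (2, 9):
        return 'host-model'
    return 'host-passthrough'
```

For example, the Xenial-Ocata combination discussed here (libvirt 2.5, QEMU 2.8) would select host-passthrough on aarch64.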

Corey Bryant (corey.bryant) wrote :

Switching the status back to New for upstream nova since there may be a need to switch from host-model to host-passthrough in nova.

Changed in nova:
status: Opinion → New
Kevin Zhao (kevin-zhao) on 2017-03-28
Changed in nova:
assignee: nobody → Kevin Zhao (kevin-zhao)
ChristianEhrhardt (paelzer) wrote :

FYI - on the libvirt side, the new 3.2 release has the following statement, which is - at least - related to the semantics of a host-* CPU specification. That is a major change which we will likely not SRU, but will pick up naturally when we move to this or a later libvirt version, likely with a later Ubuntu release. On top of that, this change is x86_64-only for now; I'd expect further changes for other architectures down the road (CPU feature detection is very different per arch anyway, so other arches might go different routes - but this sheds some light on some of the insufficiencies of the old detection at least).

So far just FYI:

- qemu: Detect host CPU model by asking QEMU on x86_64
Previously, libvirt detected the host CPU model using CPUID
instruction, which caused libvirt to detect a lot of CPU features that
are not supported by QEMU/KVM. Asking QEMU makes sure we don't start it
with unsupported features.

Raghuram Kota (rkota) wrote :

The plan is to implement an arch-specific charm change to pass "host-passthrough" as the CPU mode until a newer libvirt with a functional "host-model" mode is available in a future Ubuntu release.

Ryan Beisner (1chb1n) on 2017-04-03
Changed in charm-nova-compute:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Ryan Beisner (1chb1n)
Ryan Beisner (1chb1n) wrote :

I've tried cpu modes: none, host-model, and host-passthrough. I'm not able to create nova instances on Ocata arm64 with any of these modes.

Apr 03 17:46:03 juju-b15153-15 libvirtd[25804]: unsupported configuration: logfile not supported in this QEMU binary
Apr 03 17:55:31 juju-b15153-15 libvirtd[25804]: End of file while reading data: Input/output error
Apr 03 17:55:48 juju-b15153-15 libvirtd[25804]: this function is not supported by the connection driver: cannot get node CPU data for aarch64 architecture
Apr 03 17:55:48 juju-b15153-15 libvirtd[25804]: Failed to get host CPU
Apr 03 17:55:49 juju-b15153-15 libvirtd[25804]: End of file while reading data: Input/output error
Apr 03 17:56:06 juju-b15153-15 libvirtd[25804]: this function is not supported by the connection driver: cannot get node CPU data for aarch64 architecture
Apr 03 17:56:06 juju-b15153-15 libvirtd[25804]: Failed to get host CPU
Apr 03 17:56:07 juju-b15153-15 libvirtd[25804]: End of file while reading data: Input/output error
Apr 03 17:56:25 juju-b15153-15 libvirtd[25804]: this function is not supported by the connection driver: cannot get node CPU data for aarch64 architecture
Apr 03 17:56:25 juju-b15153-15 libvirtd[25804]: Failed to get host CPU

Ryan Beisner (1chb1n) on 2017-04-04
Changed in charm-nova-compute:
status: Confirmed → Incomplete
ChristianEhrhardt (paelzer) wrote :

Hi Ryan, that is most disturbing, as we proved this working in the lower layers (just switching the mode in the guest XML) with only libvirt/qemu.
Can you provide the guest XMLs it tries to deploy in these cases?

Hua Zhang (zhhuabj) wrote :

I hit this problem with yakkety-newton as well today; it seems nested KVM doesn't work on yakkety (mitaka doesn't have this problem). It said:

2017-04-11 12:12:21.047 19019 ERROR nova.compute.manager [instance: 4912dc24-fb89-4b83-9b59-d139547b7c28] libvirtError: invalid argument: could not find capabilities for domaintype=kvm

root@juju-5e1208-basic-yakkety-newton-7:~# kvm-ok
INFO: /dev/kvm does not exist
HINT: sudo modprobe kvm_intel
INFO: Your CPU supports KVM extensions
KVM acceleration can be used

root@juju-5e1208-basic-yakkety-newton-7:~# modprobe kvm_intel
modprobe: ERROR: could not insert 'kvm_intel': Input/output error

That is the cause of the OpenStack errors we see above; more info is at http://paste.ubuntu.com/24364083/

ChristianEhrhardt (paelzer) wrote :

Hi Hua, this is a different issue - forked that into 1683670

Ryan Beisner (1chb1n) on 2017-06-09
Changed in charm-nova-compute:
assignee: Ryan Beisner (1chb1n) → Andrew McLeod (admcleod)
Ryan Beisner (1chb1n) on 2017-06-09
Changed in charm-nova-compute:
status: Incomplete → New
status: New → Confirmed
ChristianEhrhardt (paelzer) wrote :

Got this today via Mail, linking here:

none:

https://pastebin.canonical.com/190574/

host-model:

https://pastebin.canonical.com/190578/

host-passthrough

https://pastebin.canonical.com/190579/

ChristianEhrhardt (paelzer) wrote :

The former came down to the difference between:
none:
  <cpu>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>

host-model:
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>

host-passthrough:
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>

ChristianEhrhardt (paelzer) wrote :

@Andrew McLeod
To clarify - your data is a follow-on to Ryan's in comment #21, right?
And I assume you still see all of them fail with "cannot get node CPU data".

Which is odd, as I stated in #22, because it worked before; on my and Ryan's systems it was only a matter of tweaking host-model to host-passthrough to get things working with regard to the initial bug report.

Thanks for the full XMLs; I'm now testing with these to see if I can reproduce any of these new/related issues.

ChristianEhrhardt (paelzer) wrote :

@admcleod - While my system is preparing to test this, I think the logs you added already suggest that the original issue this bug was reported about is solved.

In regard to your logs - the related errors:
none:
-> Passes the initialization but then breaks on the logfile error

host-model:
-> Fails due to host-model being broken

host-passthrough:
-> Passes the initialization but then breaks on the logfile error

That might be an issue, but a different one - so I forked off bug 1697610

In regard to the cpu-types here I got it working with host-passthrough as stated before.
Keeping this bug incomplete.

ChristianEhrhardt (paelzer) wrote :

BTW - if you want to teach the charm to check such things FYI:

$ virsh domcapabilities | grep host-model
    <mode name='host-model' supported='no'/>
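A charm-side check could also parse the full domcapabilities XML rather than grepping. A sketch, assuming the XML shape shown above (the function name is mine; `virsh` must be on the PATH when no XML string is supplied):

```python
import subprocess
import xml.etree.ElementTree as ET

def host_model_supported(domcaps_xml=None):
    """Return True if domcapabilities advertises host-model support.

    Pass the XML as a string for testing; otherwise it is fetched by
    running `virsh domcapabilities`.
    """
    if domcaps_xml is None:
        domcaps_xml = subprocess.check_output(
            ["virsh", "domcapabilities"]).decode()
    root = ET.fromstring(domcaps_xml)
    # domcapabilities nests <mode name='host-model' supported='...'/>
    # under <cpu>.
    mode = root.find(".//cpu/mode[@name='host-model']")
    return mode is not None and mode.get("supported") == "yes"
```

On the aarch64 hosts discussed here this would return False, which is exactly the condition under which the charm should fall back to host-passthrough.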

Andrew McLeod (admcleod) wrote :

@paelzer #27 - yes, the data was a follow-up to #21

Sean Dague (sdague) on 2017-06-23
Changed in nova:
assignee: Kevin Zhao (kevin-zhao) → nobody
Sean Dague (sdague) wrote :

Automatically discovered version ocata in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.ocata
Andrew McLeod (admcleod) wrote :

I managed to get around this issue:

libvirtError: unsupported configuration: logfile not supported in this QEMU binary

By modifying the domain xml and removing the child nodes for serial and console.

So now there are more reasonable errors for cpu-modes none and host-model

none:

https://pastebin.canonical.com/192075/

host-model:

https://pastebin.canonical.com/192074/

however, host-passthrough also does not work:

https://pastebin.canonical.com/192073/

Andrew McLeod (admcleod) wrote :

A general description of the workaround for 'logfile not supported in this QEMU binary' on arm64 with ocata is:

in nova/virt/libvirt/guest.py:115, in the create function, added:

        xml = etree.fromstring(xml)
        for bad in xml.xpath("//log"):
            bad.getparent().remove(bad)
        # for item in xml.findall('console'):
        #     xml.remove(item)
        # for item in xml.findall('serial'):
        #     xml.remove(item)
        xml = etree.tostring(xml)

        # for debugging...
        with open("/tmp/xml_out.xml", "w") as txt_file:
            txt_file.write(xml)

It appears removing just the 'log' element should be sufficient, as this was the specific error, but I also tested more thoroughly by removing the console and serial elements entirely.

The workaround for the error 'libvirtError: Requested operation is not valid: domain is already running' was to modify dist-packages/libvirt.py:1097 and make the resume function simply pass; it seems the check of whether the domain is already running is faulty?

Once these changes were made I could launch an instance with host-passthrough, give it a floating IP and ssh into it.

Corey Bryant (corey.bryant) wrote :

It seems like two possible fixes are:

1) Update libvirt's qemuProcessUpdateGuestCPU() to not limit the switch of cpu mode from host-model->custom to ARCH_IS_X86(). Would there be side-effects if we also switch host-model->custom when ARCH_IS_ARM()? Perhaps we can check with upstream libvirt devs about this if someone hasn't already.

2) In nova, it looks like host-model is set in _get_guest_cpu_model_config(). Could we update that to set mode = "host-passthrough" if AARCH64, where it currently sets mode = "host-model"? If I understand Christian's comments above correctly, this should result in the same behavior we had prior to ocata as host-model was switched to host-passthrough.
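The second suggestion amounts to a small conditional. An illustrative sketch of the idea (this is not nova's actual _get_guest_cpu_model_config code; the function name and structure are simplified for clarity):

```python
def pick_cpu_mode(configured_mode, arch):
    """Choose a libvirt cpu mode along the lines of suggestion 2.

    If nothing is explicitly configured and we are on aarch64, prefer
    host-passthrough, since host-model is rejected by the hypervisor
    there; otherwise keep the existing host-model default.
    """
    if configured_mode:
        return configured_mode  # always honour explicit configuration
    if arch == "aarch64":
        return "host-passthrough"
    return "host-model"
```

Honouring an explicit setting first means operators who really want host-model (e.g. on a future libvirt/qemu that supports it on ARM) are not locked out.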

ChristianEhrhardt (paelzer) wrote :

Hi Andrew and Corey,
first of all thanks for the re-checks.
- None is usually not supported on ARM, as there are too many different CPUs and you have to select one.
- Host-passthrough fails with "domain is already running"; that seems unrelated to the CPU types and actually means it might be running after all :-)
- Host-model, as I said, is known broken, so the refusal via "'host-model' for aarch64 kvm domain on aarch64 host is not supported by hypervisor" still seems correct.

About Coreys suggestions:
I'm pretty sure there were reasons for the "ARCH_IS_X86" check in [1] - especially on a platform as diverse in CPU models as ARM, just turning it to custom might have side effects we don't see.
I haven't discussed #1 with upstream and at least I didn't see anyone else doing it.
Therefore #2 might be the much safer solution if you want it to work on arm for now.

It might be worth checking that the host-passthrough case works before that, but as I said above, "domain already running" sounds like an unrelated issue.

[1]: http://libvirt.org/git/?p=libvirt.git;a=commit;h=7ce711a30

ChristianEhrhardt (paelzer) wrote :

Hi, trying to get the status right here.
AFAIK these things were reworked in libvirt 3.2 to be good - at least for x86; I'm not sure if/how ARM followed, but since there were major changes we should consider it fixed and re-analyze from there for the development release.
So the coming merge of a newer libvirt will fix it for current Ubuntu-dev.

Since it is a "known behavior" on zesty's version, I added a task as Won't Fix; so far - as I understood - we thought it better to fix this in OpenStack, which creates the XMLs.

Furthermore I'm adding a qemu task, since qemu >=2.9 is needed to let the code in libvirt >=3.2 really work correctly.

Read [1] for the version references.

[1]: https://libvirt.org/formatdomain.html#elementsCPU

Changed in libvirt (Ubuntu Zesty):
status: New → Triaged
status: Triaged → Won't Fix
no longer affects: qemu (Ubuntu Zesty)
Changed in qemu (Ubuntu):
status: New → Triaged
Changed in libvirt (Ubuntu):
status: Incomplete → Triaged
Ryan Beisner (1chb1n) wrote :

Update: we've got a rough diff patch as a work-around for nova, using host-passthrough. That patch is likely not exactly what we will propose upstream to Nova. Next steps will be to revise and minimize that patch, and propose a fix upstream in nova master. Once that lands, we can propose a cherry pick back to Ocata, etc.

In parallel to that, we will need to reproduce and triage the "domain is already running" issue, if it is indeed persistent and present.

Ryan Beisner (1chb1n) wrote :

We are working to re-confirm this issue exists on Pike-B3, since that is where any upstream nova patches will be initially proposed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 3.5.0-1ubuntu1

---------------
libvirt (3.5.0-1ubuntu1) artful; urgency=medium

  * Merged with Debian unstable (3.5)
    This closes several bugs:
    - improved handling of host-model since libvirt 3.2 (LP: #1673467)
    - Adding POWER9 cpu model to cpu_map.xml (LP: #1690209)
  * Remaining changes:
    - Disable sheepdog (universe dependency)
    - Disable libssh2 support (universe dependency)
    - Disable firewalld support (universe dependency)
    - Disable selinux
    - Enable esx support
      + Add build-dep to libcurl4-gnutls-dev (required for esx)
    - Set qemu-group to kvm (for compat with older ubuntu)
    - Regularly clear AppArmor profiles for vms that no longer exist
    - Additional apport package-hook
    - Modifications to adapt for our delayed switch away from libvirt-bin (can
      be dropped >18.04).
      + d/p/ubuntu/libvirtd-service-add-bin-alias.patch: systemd: define alias
        to old service name so that old references work
      + d/p/ubuntu/libvirtd-init-add-bin-alias.patch: sysv init: define alias
        to old service name so that old references work
      + d/control: transitional package with the old name and maintainer
        scripts to handle the transition
    - Backwards compatible handling of group rename (can be dropped >18.04).
    - config details and autostart of default bridged network. Creating that is
      now the default in general, yet our solution provides the following on
      top as of today:
      + nat only on some ports <port start='1024' end='65535'/>
      + autostart the default network by default
      + do not autostart if 192.168.122.0 is already taken (e.g. in containers)
    - d/p/ubuntu/Allow-libvirt-group-to-access-the-socket.patch: This is
      the group based access to libvirt functions as it was used in Ubuntu
      for quite long.
      + d/p/ubuntu/daemon-augeas-fix-expected.patch fix some related tests
        due to the group access change.
    - ubuntu/parallel-shutdown.patch: set parallel shutdown by default.
    - d/p/ubuntu/enable-kvm-spice.patch: compat with older Ubuntu qemu/kvm
      which provided a separate kvm-spice.
    - d/p/ubuntu/storage-disable-gluster-test: gluster not enabled, skip test
    - d/p/ubuntu/ubuntu-libxl-qemu-path.patch: this change was split. The
      section that adapts the path of the emulator to the Debian/Ubuntu
      packaging is kept.
    - d/p/ubuntu/ubuntu-libxl-Fix-up-VRAM-to-minimum-requirements.patch: auto
      set VRAM to minimum requirements
    - d/p/ubuntu/xen-default-uri.patch: set default URI on xen hosts
    - Add libxl log directory
    - libvirt-uri.sh: Automatically switch default libvirt URI for users on
      Xen dom0 via user profile (was missing on changelogs before)
    - d/p/ubuntu/apibuild-skip-libvirt-common.h: drop libvirt-common.h from
      included_files to avoid build failures due to duplicate definitions.
    - Update README.Debian with Ubuntu changes
    - Convert libvirt0, libnss_libvirt and libvirt-dev to multi-arch.
    - Enable some additional features on ppc64el and s390x (for arch parity)
      + systemtap, zfs, numa and numad on s390x.
      + sys...

Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Ryan Beisner (1chb1n) on 2017-07-20
Changed in charm-nova-compute:
importance: High → Critical
Ryan Beisner (1chb1n) wrote :

We need this fix in Ocata, which is delivered via Zesty versions. I see that it is triaged as wont-fix for Zesty. Can we revisit that decision? Is the fix something that can be picked back to the libvirt in Zesty?

ChristianEhrhardt (paelzer) wrote :

Hi Ryan,
the "fix" on the libvirt side is to actually understand host-model on x86.
As outlined before that doesn't mean anything for arm yet.
The changes are too huge in libvirt and also need a much newer qemu (2.9) to work - so they make no sense for zesty, which is why it is Won't Fix.

The fix for Ocata has to be done in OpenStack (as discussed before) to send host-passthrough instead of host-model - the fix you mentioned in comment #37 is the one that will apply to UCA-O.

And even for UCA-P based on Artful we will have to test if the final combination of libvirt/qemu in Artful is enough to get it working on arm as well (as most development around this feature-fix was driven by x86 only).

James Page (james-page) wrote :

OK - some updates

I've tested with 2.5.0 and 3.5.0 of libvirt using OpenStack Ocata.

2.5.0 libvirt/qemu does not support host-model - only host-passthrough - so I agree that we should just do the right thing either in the charm (as we've done for other non-x86 arches) or in the nova codebase; my preference is for the former, as we avoid too much magic behaviour.

3.5.0 has the same limitation; I'm actually bumping into a different issue with arm64 instances on that libvirt stack, for which I'll raise another bug.

tl;dr - let's just use host-passthrough and move on.

Changed in charm-nova-compute:
status: Confirmed → Triaged
importance: Critical → High
Changed in qemu (Ubuntu):
status: Triaged → Won't Fix
Changed in libvirt (Ubuntu):
status: Fix Released → Invalid
status: Invalid → Won't Fix
Changed in nova:
status: New → Invalid
Changed in charm-nova-compute:
assignee: Andrew McLeod (admcleod) → James Page (james-page)

Fix proposed to branch: master
Review: https://review.openstack.org/487422

Changed in charm-nova-compute:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/487422
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=b5d9b18c0afd06b721d78bced96b4c6c19f77834
Submitter: Jenkins
Branch: master

commit b5d9b18c0afd06b721d78bced96b4c6c19f77834
Author: James Page <email address hidden>
Date: Wed Jul 26 14:35:33 2017 +0100

    aarch64: set default cpu_mode to host-passthrough

    Unless explicit configuration is supplied by the charm user, set
    the cpu_mode configuration on the aarch64 architecture to
    host-passthrough; host-model is not supported by the underlying
    hypervisor.

    Change-Id: I6df2d70e7b5fed7e614ca981864f6f737a1a90eb
    Closes-Bug: 1673467

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Ryan Beisner (1chb1n) wrote :

Typo in previous comment. Sept 7 2017 is the correct date.

FYI: The charm fix component of this is slated to be fix-released along with the OpenStack Charms release on Sept 7 2017. Do bear in mind that there may still be some package changes in flight at that time, as the Charms release doesn't align to the Ubuntu distro release in all cases.

Changed in charm-nova-compute:
milestone: none → 17.08
James Page (james-page) on 2017-09-12
Changed in charm-nova-compute:
status: Fix Committed → Fix Released