qemu dropped osxsave/ospke feature triggering upgrade issues

Bug #1825195 reported by Christian Ehrhardt on 2019-04-17
This bug affects 1 person

Affects            Status                   Importance   Assigned to
libvirt (Ubuntu)   Status tracked in Eoan
  Disco            Fix Released             Undecided    Unassigned
  Eoan             Fix Released             Medium       Unassigned
qemu (Fedora)      Won't Fix                Undecided
qemu (Ubuntu)      Won't Fix                Medium       Unassigned
  Disco            Won't Fix                Undecided    Unassigned
  Eoan             Won't Fix                Medium       Unassigned

Bug Description

[Impact]

 * Newer qemu dropped a few features (that never worked) without a
   deprecation period. In some edge cases this can prevent qemu from
   starting at all.

 * Fix by backporting the upstream fix (in those Ubuntu releases that have
   an affected qemu, which means >=Disco)

[Test Case]

 * Define a libvirt guest in your preferred style (uvtool / virt-install /
   ...).
 * Shut down the guest and then modify it to contain the following:
    <cpu mode='custom' match='exact' check='partial'>
      <model fallback='allow'>core2duo</model>
      <feature policy='disable' name='osxsave'/>
      <feature policy='disable' name='ospke'/>
    </cpu>
 * Start the guest again.
   Without the fix it will fail and show:
   ... Property '.osxsave' not found
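To check ahead of an upgrade whether a guest definition carries either flag, the XML can be scanned for them. A minimal Python sketch (standard library only), assuming the domain XML was saved with `virsh dumpxml <guestname>`; the helper name `find_dropped_features` is hypothetical and not part of libvirt:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Features removed from qemu without a deprecation period (see [1][2]).
DROPPED_FEATURES = {"osxsave", "ospke"}


def find_dropped_features(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, policy) for each dropped feature in a <domain> or <cpu> XML.

    The policy is None if the <feature> element has no policy attribute.
    """
    root = ET.fromstring(xml_text)
    # Accept either a full domain definition or just its <cpu> element.
    if root.tag == "cpu":
        features = root.findall("feature")
    else:
        features = root.findall("./cpu/feature")
    return [
        (f.get("name"), f.get("policy"))
        for f in features
        if f.get("name") in DROPPED_FEATURES
    ]


sample = """<domain type='kvm'>
  <name>test-osx</name>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>core2duo</model>
    <feature policy='disable' name='osxsave'/>
    <feature policy='disable' name='ospke'/>
  </cpu>
</domain>"""

print(find_dropped_features(sample))
# [('osxsave', 'disable'), ('ospke', 'disable')]
```

An empty list means the guest does not use either flag and is not affected by this bug.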

[Regression Potential]

 * The features never did anything in qemu (which is why they were
   dropped), so there can't be an actual feature regression.
   The one issue I can imagine is people from before Disco who are used to
   checking their qemu command line and expect ospke/osxsave to be there.
   Those people will now see them gone (but if the flags are there in their
   case, they are affected by this bug and their guests won't start anymore).
   So the people affected by this corner case are exactly those who would
   most likely want the fix.

[Other Info]

 * n/a

---

## Bug ##

There are conditions where old qemu/libvirt created guests with the osxsave or ospke CPU feature:

    <feature policy='disable' name='osxsave'/>
    <feature policy='optional' name='ospke'/>

These features were removed in recent qemu, which breaks starting some old guest definitions that use them after an upgrade:

  qemu-system-x86_64: can't apply global Broadwell-noTSX-x86_64-cpu.osxsave=on: Property '.osxsave' not found
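The error names the CPU model type, the property and the value that was passed, which helps to tell which guest flag is at fault when reading /var/log/libvirt/qemu/$vmname.log. A minimal Python sketch that extracts those fields; the pattern is an assumption derived only from the messages quoted in this report (x86_64 CPU types), and `parse_global_error` is a hypothetical name:

```python
import re
from typing import Dict, Optional

# Pattern derived from the messages in this report, e.g.
#   can't apply global core2duo-x86_64-cpu.osxsave=off: Property '.osxsave' not found
# It assumes the x86_64 CPU type-name suffix "-x86_64-cpu".
GLOBAL_ERROR = re.compile(
    r"can't apply global (?P<model>\S+)-x86_64-cpu\."
    r"(?P<prop>\w+)=(?P<value>\w+): Property '\.(?P=prop)' not found"
)


def parse_global_error(line: str) -> Optional[Dict[str, str]]:
    """Return the CPU model, property and value, or None if the line doesn't match."""
    match = GLOBAL_ERROR.search(line)
    return match.groupdict() if match else None


log_line = ("2019-05-22T06:47:37.379603Z qemu-system-x86_64: can't apply global "
            "core2duo-x86_64-cpu.osxsave=off: Property '.osxsave' not found")
print(parse_global_error(log_line))
# {'model': 'core2duo', 'prop': 'osxsave', 'value': 'off'}
```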

The same bug exists in Fedora [5], where it was left incomplete as non-reproducible; they probably didn't notice that the newer virt-install avoids (but does not fix) it.

## What happened ##

Both dropped command-line options, "osxsave" and "ospke", were effectively no-ops all along. Quote: "never configurable: KVM never returned OSXSAVE on GET_SUPPORTED_CPUID". See the removal commits [1][2].

Whether these should be warnings instead of errors was discussed for a while (the deprecation discussion is ongoing anyway). But for now we are in the situation that calling qemu with those features makes it fail to start. I would not want to deviate from upstream qemu, and, as mentioned, the discussion on deprecation is a longer one that won't be resolved soon, so we can't wait for it.

The discussion at [3] reached no conclusion and was forgotten. I checked with Jiri and there was no follow-up.

But since specifying the features never meant anything to qemu, we should at least consider teaching libvirt about that so it simply does not pass the flags.
If there is a good place for a warning, we might emit one when required was set, but that is not strictly needed at first.

## when does it trigger => severity ##

Comment 14 [4] adds more details. We checked usage through virsh, python-libvirt (uvtool, multipass, openstack) and virt-install. We only found two cases, both in virt-install:
- guests created in the past with --cpu=host-model
  These guests will fail to start after the upgrade.
- with virt-install older than 2.0 (not a combination Ubuntu ships), --cpu=host-copy also breaks,
  e.g. virt-install 1.5 with qemu 3.1
=> Neither option was documented anymore even back in Xenial; they are only kept for compatibility.
Overall, unless we missed a more important use case, this is ugly but not a show-stopper in priority.

To further underline that this is not a common case: had you set required, you would have got the following all along:

  error: the CPU is incompatible with host CPU:
  Host CPU does not provide required features: ospke

This is because the features rarely (never?) exist on the host.
But if set to off or optional (which becomes off when the host doesn't have the feature), you would hit the new issue. The same applies to "disable", which passes the pre-check but then makes the new qemu fail.
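The policy cases above can be summarized as a small decision function. This is a hypothetical model of the behaviour before the fix, assuming (as described) that the host never reports osxsave/ospke; `pre_fix_outcome` is an illustrative name, and the other libvirt policies (force, forbid) are deliberately left out because this report does not cover them:

```python
from typing import Tuple

DROPPED_FEATURES = {"osxsave", "ospke"}


def pre_fix_outcome(name: str, policy: str) -> Tuple[str, str]:
    """Hypothetical model of where a guest failed before the fix.

    Assumes the host never reports osxsave/ospke, so 'optional' resolves to off.
    Returns (failing stage, detail).
    """
    if name not in DROPPED_FEATURES:
        raise ValueError(f"not a dropped feature: {name}")
    if policy == "require":
        # Caught by libvirt's host compatibility check; qemu is never started.
        return ("libvirt pre-check",
                f"Host CPU does not provide required features: {name}")
    if policy in ("optional", "disable"):
        # libvirt passes name=off to qemu, which no longer knows the property.
        return ("qemu start", f"{name}=off")
    raise ValueError(f"policy not covered by this report: {policy}")


for policy in ("require", "optional", "disable"):
    print(policy, pre_fix_outcome("ospke", policy))
```

Only the last two cases reach qemu, which is why guests using require never noticed the change while optional/disable guests stop starting.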

Obviously it also triggers if:
- libvirt XML with added <feature policy='optional' name='osxsave'/>
- virt-install with --cpu ...,+osxsave / ospke

## Workaround until resolved ##

Per the analysis above, this should hopefully affect only a very small number of people with very old virtual machines, but if you are affected you want a way out, and that is fair.

For now, just remove those features:
    $ virsh edit <guestname>
    # remove lines like these
    <feature policy='optional' name='osxsave'/>
    <feature policy='optional' name='ospke'/>
Or the same lines with disable instead of optional.

Now your guest will start again.
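If many guests are affected, the same edit can be scripted instead of done by hand in virsh edit. A minimal Python sketch, assuming you export the definition with `virsh dumpxml <guestname> > guest.xml`, transform it, and re-import it with `virsh define`; `strip_dropped_features` is a hypothetical helper, and its output should be reviewed before defining it:

```python
import xml.etree.ElementTree as ET

# Features removed from qemu without a deprecation period (see [1][2]).
DROPPED_FEATURES = {"osxsave", "ospke"}

# Keep the conventional prefix for libvirt's qemu namespace (used by
# <qemu:commandline>); otherwise ElementTree would serialize it as "ns0".
ET.register_namespace("qemu", "http://libvirt.org/schemas/domain/qemu/1.0")


def strip_dropped_features(domain_xml: str) -> str:
    """Return the domain XML with osxsave/ospke <feature> elements removed."""
    root = ET.fromstring(domain_xml)
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is not None:
        # findall() returns a new list, so removing while looping is safe.
        for feature in cpu.findall("feature"):
            if feature.get("name") in DROPPED_FEATURES:
                cpu.remove(feature)
    return ET.tostring(root, encoding="unicode")


before = """<domain type='kvm'>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>core2duo</model>
    <feature policy='optional' name='osxsave'/>
    <feature policy='optional' name='ospke'/>
  </cpu>
</domain>"""
print(strip_dropped_features(before))
```

Write the result to e.g. fixed.xml and run `virsh define fixed.xml`. ElementTree drops XML comments and may shift indentation, which libvirt does not care about, but virsh edit remains the simpler route for a single guest.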

---

[1]: https://git.qemu.org/?p=qemu.git;a=commit;h=f1a23522b03a569f13aad49294bb4c4b1a9500c7
[2]: https://git.qemu.org/?p=qemu.git;a=commit;h=9ccb9784b57804f5c74434ad6ccb66650a015ffc
[3]: https://<email address hidden>/msg561877.html
[4]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1825195/comments/14
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1644848

Description of problem:
An existing VM (running Fedora 28) will not start in F29.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Attempt to run a VM created under F28, using Virtual Machine Manager
2. Fail

Actual results:

Error starting domain: internal error: process exited while connecting to monitor: 2018-10-31T17:12:46.079682Z qemu-system-x86_64: can't apply global IvyBridge-IBRS-x86_64-cpu.osxsave=on: Property '.osxsave' not found

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1344, in startup
    self._backend.create()
  File "/usr/lib64/python3.7/site-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: process exited while connecting to monitor: 2018-10-31T17:12:46.079682Z qemu-system-x86_64: can't apply global IvyBridge-IBRS-x86_64-cpu.osxsave=on: Property '.osxsave' not found

Expected results:
VM working as before

Additional info:
Guest is a basic F28 server with no GUI and has worked correctly before updating the host to F29.

The "osxsave" property was removed from QEMU upstream as it was never actually exposed to the guests.

I expect that your existing guest has this CPU flag encoded in its XML config, as it was a previously supported flag.

Fixing it should be as simple as "virsh edit $guest" as root and delete the mention of "osxsave" feature flag.

Newly provisioned guests shouldn't get given this flag in the first place, only upgraded guests will suffer.

(In reply to Daniel Berrange from comment #1)
> The "osxsave" property was removed from QEMU upstream as it was never
> actually exposed to the guests.
>
> I expect that your existing guest has this CPU flag encoded in its XML
> config, as it was a previously supported flag.
>
> Fixing it should be as simple as "virsh edit $guest" as root and delete the
> mention of "osxsave" feature flag.
>
> Newly provisioned guests shouldn't get given this flag in the first place,
> only upgraded guests will suffer.

That solved it, thanks. Curiously, a Windows 10 guest, also inherited from F28, does not have this problem as the osxsave feature was not set. I've no idea why.

Maybe the problematic guest was in fact installed under an even earlier Fedora release than the Windows guest ?

In any case, while this is a genuine problem, I don't think we're going to try to do anything to automagically remove the flags on upgrade, so I'm moving this to WONTFIX.

(In reply to Daniel Berrange from comment #3)
> Maybe the problematic guest was in fact installed under an even earlier
> Fedora release than the Windows guest ?
>
> In any case, while this is a genuine problem, I don't think we're going to
> try to do anything to automagically remove the flags on upgrade, so I'm
> moving this to WONTFIX.

In fact it's the other way round. The Windows guest was installed over a year ago. The Fedora guest is only a couple of months old at most.

(In reply to Patrick O'Callaghan from comment #4)
> In fact it's the other way round. The Windows guest was installed over a year
> ago. The Fedora guest is only a couple of months old at most.

FYI an attempt to create a new VM triggered this error again. Since the VM XML file was never created, I had to track down the offending line in /usr/share/libvirt/cpu_map/x86_features.xml and remove it.

(In reply to Patrick O'Callaghan from comment #5)
> FYI an attempt to create a new VM triggered this error again. Since the VM
> XML file was never created, I had to track down the offending line in
> /usr/share/libvirt/cpu_map/x86_features.xml and remove it.

Removing the line from /usr/share/libvirt/cpu_map/x86_features.xml didn't fix the problem. I'm still getting a complaint about .osxsave so presumably it's being set somewhere else.

Reopening. Patrick can you provide:

* full virt-manager --debug output from app startup to reproducing the bug
* /var/log/libvirt/qemu/$vmname.log , for the VM name you are trying to create
* output of: sudo virsh domcapabilities

(In reply to Cole Robinson from comment #7)
> Reopening. Patrick can you provide:
>
> * full virt-manager --debug output from app startup to reproducing the bug
> * /var/log/libvirt/qemu/$vmname.log , for the VM name you are trying to
> create
> * output of: sudo virsh domcapabilities

On a second attempt, the error refuses to show itself. I tried both with and without the change to /usr/share/libvirt/cpu_map/x86_features.xml and it made no difference. I can only assume it's because I rebooted after some system updates (though these were not to qemu or libvirt directly). Now on kernel 4.19.13-300.fc29.x86_64 if it matters.

Sorry for the noise. I'll come back to this if it happens again.

TODO:
- manual adding of the osxsave flag
- Test whether creating guests with host-model on old releases would have added osxsave

Ref: https://<email address hidden>/msg561877.html

TODO:
- test on Bionic host going in/out of stein-staging PPA (easiest test base to go back and forth)

Note: I pinged the people from last year's discussion on IRC, as it seems to have just ended without an actual resolution of the issue other than "bad luck, update the guest XML".
After all, maybe they know where the discussions continued ...

Changed in qemu (Fedora):
importance: Unknown → Undecided
status: Unknown → Won't Fix

Added an updated summary of the reasons; now starting tests on how much of a corner case this most likely is.

Note: I also pinged the OpenStack team to check whether their configs could be affected.
Subscribed jamespage and coreycb here now as well.

description: updated
description: updated
description: updated
summary: - qemu lost osxsave feature bit (ok) which might cause upgrade issues (not
- so ok)
+ qemu dropped osxsave/ospke feature triggering upgrade issues

Triggers that would have added the flag in the past. It has to be fixed anyway, but this is mostly to evaluate the severity/urgency of the issue.

TL;DR: the flag seems to have been added only rarely, yet we haven't covered all cases, and there must be a reason why it occurred.

These will trigger it, but such usage is very unlikely/silly and not to be considered important:
- any new libvirt XML creation with manual <feature policy='require' name='osxsave'/>
- virt-install with --cpu ...,+osxsave / ospke

OpenStack:
- I checked a few OpenStack-created x86 XMLs I found on bugs, and IS provided me with a few that are currently running. I didn't find the feature there, but I'll wait for feedback from the OpenStack team to be sure

Host-model:
I expected host-model to trigger the old feature to be added, but that was not the case:
- any new libvirt XML creation with <cpu mode='host-model'>
- virt-install with --cpu host-model-only
When using host-model we see:
- Xenial: host-model was not working properly back then and did not add features
- Bionic/Cosmic/Disco: host-model is expanded, but since the kernel never reported osxsave in CPUID, it isn't added in those cases.

Virt-manager:
- by default did not add cpu/features
- using "copy host cpu" is equal to host-model

virt-install has many old options, some of them are affected.
--cpu=host-model-only - no problem
--cpu=host-passthrough - no problem
--cpu=hv-default - no problem
--cpu=clear - no problem
--cpu=host - no problem
But these will trigger it:
--cpu=host-copy
 => This still ADDS osxsave even with qemu 3.1, but only in some environments
    when run on real hardware; it does not add it on all systems.
--cpu=host-model
 => This no longer adds osxsave, but guests created with it on an old system did get it, and those guests no longer start.

The two affected options were deprecated (no longer in the man page, only kept for command-line compatibility) even back in Xenial. It was reported that before Xenial --cpu=host might also have added osxsave, but we are not sure how far back you'd need to go.

Tested in containers, servers, laptops - I think the summary is somewhat complete.

The easiest steps to reproduce, even with modern components, are:
#1 if your combination is qemu >3.0 with virt-install <2.0
$ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow 20M
$ sudo virt-install --name test-osxsave --memory 512 --disk /var/lib/libvirt/images/test.qcow --import --check all=off --cpu=host-copy
=> can't apply global Haswell-noTSX-IBRS-x86_64-cpu.osxsave=on: Property '.osxsave' not found

#2 if your virt-install is already newer, it won't make the mistake, but any older guest breaks on upgrade
# start on qemu <3.0 virt-install <2.0
$ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow 20M
$ sudo virt-install --name test-osxsave --memory 512 --disk /var/lib/libvirt/images/test.qcow --import --check all=off --cpu=host-copy
# the guest has the osxsave feature; upgrade to qemu 3.1 and the guest will no longer start

description: updated
description: updated

The fix in newer virt-install might be related to https://github.com/virt-manager/virt-manager/commit/469fed08

description: updated
Changed in libvirt (Ubuntu):
status: New → Triaged
Changed in qemu (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in libvirt (Ubuntu):
importance: Undecided → Medium

I have created a preliminary patch and a PPA for Disco with it.
The plan is to do some testing on that and then suggest the change upstream.
History has shown that such fixes can differ greatly depending on the "preferred style" of implementing them, though, so there might be a few iterations before the change is accepted.

PPA: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1825195-libvirt-osxsave/+packages

Some details from testing that make it even less severe ...

Note: since this feature was never present on any host, any required setting would have triggered a pre-check like:
  error: the CPU is incompatible with host CPU: Host CPU does not provide required features: ospke

But if set to optional:
        <cpu match="exact">
                <model>core2duo</model>
                <feature policy='optional' name='osxsave'/>
                <feature policy='optional' name='ospke'/>
        </cpu>

Then you'd hit:
error: Failed to start domain disco-osxsave
error: internal error: process exited while connecting to monitor: 2019-04-25T12:12:01.698646Z qemu-system-x86_64: can't apply global core2duo-x86_64-cpu.osxsave=off: Property '.osxsave' not found

Command line:
 -cpu core2duo,osxsave=off,ospke=off

With the fix in place the above becomes:
  -cpu core2duo
and works again.
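The effect of the fix on the generated command line can be pictured as a filter applied while building the -cpu value. The following Python is only a hypothetical illustration of that idea, not the actual libvirt change (which is C code carried in d/p/ubuntu/lp-1825195-*.patch); `build_cpu_arg` and its (name, enabled) input format are assumptions:

```python
from typing import Iterable, Tuple

DROPPED_FEATURES = {"osxsave", "ospke"}


def build_cpu_arg(model: str, features: Iterable[Tuple[str, bool]]) -> str:
    """Build a qemu -cpu value, skipping features qemu no longer accepts."""
    parts = [model]
    for name, enabled in features:
        if name in DROPPED_FEATURES:
            # These never did anything in qemu; passing them now aborts startup.
            continue
        parts.append(f"{name}={'on' if enabled else 'off'}")
    return ",".join(parts)


print(build_cpu_arg("core2duo", [("osxsave", False), ("ospke", False)]))
# core2duo
print(build_cpu_arg("core2duo", [("vmx", True), ("osxsave", False)]))
# core2duo,vmx=on
```

The first call reproduces the before/after shown above: "core2duo,osxsave=off,ospke=off" becomes plain "core2duo", while unrelated features pass through unchanged.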

description: updated
description: updated
description: updated
description: updated
Changed in qemu (Ubuntu Eoan):
status: Triaged → Won't Fix
Changed in qemu (Ubuntu Disco):
status: New → Won't Fix
Changed in libvirt (Ubuntu Disco):
status: New → Triaged
Changed in libvirt (Ubuntu Eoan):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 5.0.0-1ubuntu4

---------------
libvirt (5.0.0-1ubuntu4) eoan; urgency=medium

  * d/p/ubuntu/lp-1825195-*.patch: fix issues with old guests that defined
    the never functional osxsave and ospke features (LP: #1825195).
  * d/p/series: reorder ubuntu Delta
  * d/p/ubuntu-aa/lp-1815910-allow-vhost-net.patch: avoid apparmor issues
    with vhost-net/vhost-vsock/vhost-scsi hotplug (LP: #1815910)
  * d/p/ubuntu-aa/lp-1829223-virt-aa-helper-allow-vhost-scsi.patch fix
    vhost-scsi hotplug in virt-aa-helper (LP: #1829223)

libvirt (5.0.0-1ubuntu3) eoan; urgency=medium

  * SECURITY UPDATE: Add support for md-clear functionality
    - debian/patches/ubuntu/md-clear.patch: Define md-clear CPUID bit in
      src/cpu_map/x86_features.xml.
    - CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091

 -- Christian Ehrhardt <email address hidden> Thu, 16 May 2019 10:42:09 +0200

Changed in libvirt (Ubuntu Eoan):
status: In Progress → Fix Released
Changed in qemu (Ubuntu):
status: Triaged → Won't Fix

Now that it is complete in Eoan and accepted upstream, I uploaded it to Disco-unapproved as well.

Hello Christian, or anyone else affected,

Accepted libvirt into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/5.0.0-1ubuntu2.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in libvirt (Ubuntu Disco):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-disco

Thanks, setting up the case on Disco (without proposed) as outlined and hitting it:

Using:
    <cpu mode='custom' match='exact' check='partial'>
      <model fallback='allow'>core2duo</model>
      <feature policy='disable' name='osxsave'/>
      <feature policy='disable' name='ospke'/>
    </cpu>

root@d:~# virsh start test-osx
error: Failed to start domain test-osx
error: internal error: process exited while connecting to monitor: 2019-05-22T06:47:37.379603Z qemu-system-x86_64: can't apply global core2duo-x86_64-cpu.osxsave=off: Property '.osxsave' not found

or if I drop osxsave

error: internal error: process exited while connecting to monitor: 2019-05-22T06:47:56.169672Z qemu-system-x86_64: can't apply global core2duo-x86_64-cpu.ospke=off: Property '.ospke' not found

Now upgrading to proposed.

root@d:~# apt install $(apt list --upgradable 2>/dev/null | awk --field-separator='/' '/^libvirt/ {print $1}' | xargs)
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-storage-zfs numad auditd nfs-common pm-utils radvd systemtap zfsutils
The following packages will be upgraded:
  libvirt-clients libvirt-daemon libvirt-daemon-driver-storage-rbd libvirt-daemon-system libvirt0
5 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Need to get 4014 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 libvirt-daemon-driver-storage-rbd amd64 5.0.0-1ubuntu2.2 [68.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 libvirt-daemon-system amd64 5.0.0-1ubuntu2.2 [76.3 kB]
Get:3 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 libvirt-clients amd64 5.0.0-1ubuntu2.2 [665 kB]
Get:4 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 libvirt0 amd64 5.0.0-1ubuntu2.2 [1460 kB]
Get:5 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 libvirt-daemon amd64 5.0.0-1ubuntu2.2 [1744 kB]
Fetched 4014 kB in 1s (2801 kB/s)
Preconfiguring packages ...
(Reading database ... 46130 files and directories currently installed.)
Preparing to unpack .../libvirt-daemon-driver-storage-rbd_5.0.0-1ubuntu2.2_amd64.deb ...
Unpacking libvirt-daemon-driver-storage-rbd (5.0.0-1ubuntu2.2) over (5.0.0-1ubuntu2.1) ...
Preparing to unpack .../libvirt-daemon-system_5.0.0-1ubuntu2.2_amd64.deb ...
Unpacking libvirt-daemon-system (5.0.0-1ubuntu2.2) over (5.0.0-1ubuntu2.1) ...
Preparing to unpack .../libvirt-clients_5.0.0-1ubuntu2.2_amd64.deb ...
Unpacking libvirt-clients (5.0.0-1ubuntu2.2) over (5.0.0-1ubuntu2.1) ...
Preparing to unpack .../libvirt0_5.0.0-1ubuntu2.2_amd64.deb ...
Unpacking libvirt0:amd64 (5.0.0-1ubuntu2.2) over (5.0.0-1ubuntu2.1) ...
Preparing to unpack .../libvirt-daemon_5.0.0-1ubuntu2.2_amd64.deb ...
Unpacking libvirt-daemon (5.0.0-1ubuntu2.2) over (5.0.0-1ubuntu2.1) ...
Setting up libvirt0:amd64 (5.0.0-1ubuntu2.2) ...
Setting up libvirt-daemon (5.0.0-1ubuntu2.2) ...
Setting up libvirt-daemon-driver-storage-rbd (5.0.0-1ubuntu2.2) ...
Setting up libvirt-clients (5.0.0-1ubuntu2.2) ...
Setting up libvir...


tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 5.0.0-1ubuntu2.2

---------------
libvirt (5.0.0-1ubuntu2.2) disco; urgency=medium

  * d/p/ubuntu/lp-1825195-*.patch: fix issues with old guests that defined
    the never functional osxsave and ospke features (LP: #1825195).

 -- Christian Ehrhardt <email address hidden> Thu, 16 May 2019 10:42:09 +0200

Changed in libvirt (Ubuntu Disco):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
