[Ubuntu 20.04] Stale libvirt cache leads to VM startup failures

Bug #1874647 reported by bugproxy
This bug affects 1 person
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  Medium      Skipper Bug Screeners
libvirt (Ubuntu)         Fix Released  Medium      Unassigned
Focal                    Fix Released  Undecided   Christian Ehrhardt
Groovy                   Fix Released  Medium      Unassigned

Bug Description

[Impact]

 * capability caching by libvirt is required for efficiency, but
   often misses changes that should trigger a refresh

 * This backports several fixes to catch more of such situations
   and refresh the caches in those cases
   - AMD SEV changed
   - s390x protvirt changed
   - CPU changed

 * Backporting these changes

[Test Case]

 * For AMD SEV and s390x protvirt you'd need the respective HW and
   environments. Maybe IBM can test the latter then.
   - For the nested case we can test it though:
    1. create a guest with host-model type
    2. install libvirt in the guest
    3. run "virsh capabilities" and save it to a file
    4. shut down guest
    5. edit the guest and take away some cpu features
    6. start guest again and run "virsh capabilities" again
       It will still report these features as present (wrong)

   With the fix, at step 6 it will realize the CPU has changed and refresh
   the capabilities cache.
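
   The comparison between steps 3 and 6 can be sketched as follows. This is
   a minimal Python illustration, not part of the fix. It assumes the two
   "virsh capabilities" outputs were saved as XML text and that the host CPU
   is described under <host><cpu> with <model> and <feature name='...'/>
   children. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple


def host_cpu_summary(capabilities_xml: str) -> Tuple[str, Set[str]]:
    """Return (model, feature names) from the <host><cpu> element."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host><cpu> element in capabilities XML")
    model = cpu.findtext("model", default="")
    features = {f.get("name") for f in cpu.findall("feature") if f.get("name")}
    return model, features


def compare_capabilities(before_xml: str, after_xml: str) -> Dict[str, object]:
    """Report how the host CPU section differs between two saved dumps."""
    model_before, feats_before = host_cpu_summary(before_xml)
    model_after, feats_after = host_cpu_summary(after_xml)
    return {
        "model_changed": model_before != model_after,
        "removed": sorted(feats_before - feats_after),
        "added": sorted(feats_after - feats_before),
    }
```

   If the cache is stale, the two dumps would be expected to show no
   difference even though CPU features were taken away from the guest.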

[Regression Potential]

 * This increases the amount of capability refreshes, the regression that
   comes to mind is that if this contains false-positives it might trigger
   too often and therefore slow down operations on systems where this
   happens.
   Functionally that would not be a breakage (even not caching at all works
   fine), only a performance issue. The added tests seem fine though, as a
   CPU attribute has to change, which isn't a high-frequency event.

[Other Info]

 * n/a

---

Stale libvirt cache leads to VM startup failures

Contact Information = Viktor Mihajlovski <email address hidden>

---Additional Hardware Info---
Z15 with IBM Secure Execution

---uname output---
Linux linux02 5.4.0-21-generic #25-Ubuntu SMP Sat Mar 28 13:10:00 UTC 2020 s390x s390x s390x GNU/Linux

Machine Type = 8562 (IBM Z15)

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Install Ubuntu 20.04 in the LPAR
2. Modify the host kernel command line in /etc/zipl.conf to include prot_virt=1, run zipl and reboot.
3. Define at least one KVM guest with host CPU model and start and stop it
4. Define a secure KVM guest using the host CPU model and start and stop it.
5. Change back the host kernel command line, re-run zipl, reboot.
6. Try to start the first KVM guest, which fails with a message like:
error: internal error: qemu unexpectedly closed the monitor: 2020-04-23T13:55:30.889152Z qemu-system-s390x: Some features requested in the CPU model are not available in the configuration: unpack

The reason is that libvirt caches the domain capabilities reported during the first boot and doesn't update them after the reboot in step 5, even though changing prot_virt= on the command line changes the CPU features reported by domcapabilities. So even though the guest may not require the unpack feature, libvirt constructs a CPU model that can't be satisfied on this configuration.

The issue also occurs the other way around, going from prot_virt=0 to prot_virt=1, in which case the guest will fail to boot as it requires the unpack feature.

Manually removing the content of /var/cache/libvirt/qemu/capabilities/ will force libvirt to refresh its capabilities cache and temporarily resolve the situation.
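
The workaround can be sketched in Python as below. This is only an illustration under assumptions: the helper name is hypothetical, it deletes only *.xml files (a conservative choice, the report says to remove the whole content), and it must run with sufficient privileges. The report does not say whether libvirtd has to be restarted afterwards.

```python
from pathlib import Path
from typing import List

# Cache location as given in this report; adjust if your setup differs.
DEFAULT_CACHE_DIR = Path("/var/cache/libvirt/qemu/capabilities")


def clear_capabilities_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> List[str]:
    """Delete cached *.xml capability files and return their names."""
    removed = []
    for entry in sorted(cache_dir.glob("*.xml")):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Calling clear_capabilities_cache() with no argument targets the path from the report; passing another directory makes it easy to try out safely first.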


bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-185546 severity-medium targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

Was this problem already reported upstream and a fix made available?
Btw. the latest (and with that GA) kernel of 20.04 is: 5.4.0-26

Changed in ubuntu-z-systems:
importance: Undecided → Medium
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-04-24 04:49 EDT-------
I just realized that AMD SEV faces the equivalent issue, as can be seen in https://github.com/libvirt/libvirt/blob/master/docs/kbase/launch_security_sev.rst. The only difference is that for Secure Execution it's the kernel command line setting, whereas SEV uses a module parameter.

Christian Ehrhardt  (paelzer) wrote :

The case with the module parameter in SEV needs a fix similar to
  https://libvirt.org/git/?p=libvirt.git;a=commit;h=b183a75319b90d0af5512be513743e1eab950612
as it can be re-loaded at any time.

Essentially the checks in virQEMUCapsLoadCache check qemu/kernel version and many other things.
They also check microcode version ...

I think the decision there can go two ways: either "on any reboot we need to refresh caps", which makes the code small but probably also wastes quite some refreshes that were not needed.
Or it could, along with all the versions that it checks, also keep a full string of /proc/cmdline and compare it. If it changed, the cache needs to be refreshed.
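
A minimal sketch of the second option, in Python rather than libvirt's C, and not how libvirt implements it. The stamp file, its JSON layout and the function names are assumptions made for illustration.

```python
import json
from pathlib import Path


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the kernel command line of the running system."""
    return Path(path).read_text().strip()


def cache_is_stale(stamp_file: Path, current_cmdline: str) -> bool:
    """True if no stamp exists yet or the stored cmdline differs."""
    if not stamp_file.exists():
        return True
    stored = json.loads(stamp_file.read_text())
    return stored.get("cmdline") != current_cmdline


def write_stamp(stamp_file: Path, current_cmdline: str) -> None:
    """Record the cmdline the cache was built with."""
    stamp_file.write_text(json.dumps({"cmdline": current_cmdline}))
```

Comparing the full string is simple but coarse: any change to the command line triggers a refresh, including changes that do not affect capabilities.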

I agree with Frank that this should be reported upstream.
No matter if you or I write a fix, it needs to go upstream then - so a bug tracker there can't hurt.
OTOH you can as well start right away with an RFC patch there if you have something prepared already - there is no "strict need" for an upstream bug, only if you hope someone there will fix it for you.

Since workarounds are available the prio is medium IMHO, eager to see what upstream thinks.

P.S. Even if you have no patch, it would be great if you (as the affected) would kick off the discussion upstream. Please provide a link here to the mailing list entry.

affects: linux (Ubuntu) → libvirt (Ubuntu)
Changed in libvirt (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-04-24 06:03 EDT-------
(In reply to comment #8)

I agree with you. Please regard this as a tracking bug to pick up the upstream commits.

We (IBM) will prepare a write-up for Secure Execution comparable to what exists for AMD SEV and also provide a fix for the cap caching invalidity problem. Whether that is going to look similar to what was done for nesting remains to be seen, but it is a good starting point. Thanks for providing it.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-04-24 06:08 EDT-------
(In reply to comment #9)

I think the proper solution is to rebuild the capabilities whenever the date of /dev/kvm changes. There are module parameters (hpage and nested) that will change the capabilities. Caching beyond the lifetime of /dev/kvm is just wrong.
When the startup takes slightly longer after reboot - so be it.
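
The idea can be sketched like this in Python. The comment does not say which timestamp is meant, so using the inode change time (st_ctime_ns) is an assumption, as are the function names. This is not what was eventually merged upstream.

```python
import os
from typing import Optional


def device_stamp(path: str = "/dev/kvm") -> int:
    """Return the inode change time of the device node in nanoseconds."""
    return os.stat(path).st_ctime_ns


def needs_rebuild(stored_stamp: Optional[int], path: str = "/dev/kvm") -> bool:
    """True if the cache predates the current device node or none is stored."""
    try:
        current = device_stamp(path)
    except FileNotFoundError:
        # No device node at all: treat any cached data as invalid.
        return True
    return stored_stamp is None or stored_stamp != current
```

A caller would store device_stamp() next to the cached capabilities and call needs_rebuild() with that value on the next start.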

Agreed to discuss this upstream. Please cc me

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-05-12 02:37 EDT-------
I sent a patch series to the libvirt mailing list yesterday addressing the issues of this bugzilla.
https://www.redhat.com/archives/libvir-list/2020-May/msg00416.html

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-05-26 07:14 EDT-------
*** Bug 185971 has been marked as a duplicate of this bug. ***

Frank Heimes (fheimes) wrote :

(just for the records: "Bug 185971" in comment #7 is an IBM Bugzilla number, not a LP bug number, hence pointing with the automatic link generation to a wrong LP bug)

Christian Ehrhardt  (paelzer) wrote :

Note: the referred patch set has not landed yet.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-17 12:27 EDT-------
(In reply to comment #14)
> Note: the referred patch set has not landed yet.

Please note that the patch series has been accepted upstream.
It will be released as part of libvirt v6.5.0.

Here is the list of the commit ids:
c5fffb959d util: Introduce a parser for kernel cmdline arguments
b611b620ce qemu: Check if s390 secure guest support is enabled
657365e74f qemu: Check if AMD secure guest support is enabled
0254ceab82 tools: Secure guest check on s390 in virt-host-validate
4b561d49ad tools: Secure guest check for AMD in virt-host-validate
2c3ffa3728 docs: Update AMD launch secure description
f0d0cd6179 docs: Describe protected virtualization guest setup

The issue described in this bugzilla (stale capability cache) should be resolved already by the first two patches of the series. Patch 4 extends virt-host-validate with checks of the host's readiness to support IBM Secure Execution and patch 6 provides documentation on how to set up and use IBM Secure Execution with libvirt on Linux on Z.
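
To illustrate what a kernel cmdline check for prot_virt might look like, here is a rough Python sketch. It is not the upstream C code from the commits above. The set of values treated as "enabled" and the last-occurrence-wins rule are assumptions, and quoted values containing spaces are not handled.

```python
from typing import Dict


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Map each parameter name to its last value; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        params[key.replace("-", "_")] = value
    return params


def prot_virt_enabled(cmdline: str) -> bool:
    """True if prot_virt is set to a value assumed to mean 'enabled'."""
    value = parse_cmdline(cmdline).get("prot_virt", "")
    return value in {"1", "y", "Y"}
```

A cache validity check could store this boolean alongside the cached capabilities and refresh when the current value differs.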

Please also note that there is another series that might come in handy for some other stale capability cache scenarios. This might e.g. occur if the KVM host system itself is being moved/migrated/upgraded.
Here is the list of commit ids:
8cb9d2495c util: Define g_autoptr callback for FILE
a551dd5fdf hostcpu: Introduce virHostCPUGetSignature
44f826e4a0 hostcpu: Implement virHostCPUGetSignature for x86
2a68ceaa6e hostcpu: Implement virHostCPUGetSignature for ppc64
d3d87e0cef hostcpu: Implement virHostCPUGetSignature for s390
004804a7d7 qemu: Invalidate capabilities when host CPU changes
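
The host CPU signature idea behind this second series can be illustrated with a small Python sketch. The fields used and the signature format are assumptions chosen for illustration. They are not claimed to match what virHostCPUGetSignature produces.

```python
# Fields assumed to identify the CPU; taken from x86 /proc/cpuinfo keys.
SIGNATURE_FIELDS = ("vendor_id", "cpu family", "model", "model name")


def cpu_signature(cpuinfo_text: str) -> str:
    """Build a signature string from the first processor block."""
    values = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if values:
                break  # blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in SIGNATURE_FIELDS and key not in values:
            values[key] = value.strip()
    return ", ".join(f"{f}={values.get(f, '')}" for f in SIGNATURE_FIELDS)


def host_cpu_changed(cached_signature: str, cpuinfo_text: str) -> bool:
    """True if the cached signature no longer matches the current CPU."""
    return cached_signature != cpu_signature(cpuinfo_text)
```

When host_cpu_changed() returns True, the cached capabilities would be treated as outdated and probed again.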

Christian Ehrhardt  (paelzer) wrote :

awesome, thanks Boris!

I think I'm looking at making sure to catch these when merging libvirt for 20.10 and then we might take just the two commits for the initial bug into Focal.

tags: added: libvirt-20.10
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Christian Ehrhardt  (paelzer)
status: Incomplete → Triaged
Changed in libvirt (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Christian Ehrhardt  (paelzer)
Changed in ubuntu-z-systems:
assignee: Christian Ehrhardt  (paelzer) → Skipper Bug Screeners (skipper-screen-team)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-26 07:24 EDT-------
@Canonical. Any update available?

Christian Ehrhardt  (paelzer) wrote :

Libvirt 6.6 contains the fix, but is stuck in groovy proposed for a while already.
After a fix (that you also want for virtiofsd) and another issue around libtirpc being resolved, there will be a follow-up upload that should resolve those two today and hopefully migrate to groovy-release soon after.

Christian Ehrhardt  (paelzer) wrote :
Changed in libvirt (Ubuntu Groovy):
status: Triaged → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-26 09:01 EDT-------
Thx for the update.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.6.0-1ubuntu2

---------------
libvirt (6.6.0-1ubuntu2) groovy; urgency=medium

  * d/p/u/lp-1892826-Revert-m4-virt-xdr-rewrite-XDR-check.patch: avoid clashes
    between libtripc and glibc that break libvirt-lxc (LP: #1892826)
  * d/p/ubuntu-aa/lp-1892736-apparmor-allow-libvirtd-to-call-virtiofsd.patch:
    allow libvirt to control virtiofsd (LP: #1892736)

libvirt (6.6.0-1ubuntu1) groovy; urgency=medium

  * Merge with Debian 6.6.0-1 from experimental
    Among many other new features and fixes this includes fixes for:
    (LP: #1874647) - Stale libvirt cache leads to VM startup failures
    (LP: #1869796) - bad ordering and dependent restarts of services/sockets
    Remaining changes:
    - d/p/ubuntu-aa/lp-1847361-load-versioned-module.patch: allow loading
      versioned modules after qemu package upgrades (LP 1847361)
    - libvirt-uri.sh: Automatically switch default libvirt URI for users
      via user profile (xen URI on dom0, qemu:///system otherwise)
    - Disable libssh2 support (universe dependency)
    - Disable firewalld support (universe dependency)
    - Set qemu-group to kvm (for compat with older ubuntu)
    - Additional apport package-hook
    - Autostart default bridged network (As upstream does, but not Debian).
      In addition to just enabling it our solution provides:
      + do not autostart if subnet is already taken (e.g. in guests).
      + iterate some alternative subnets before giving up
    - d/p/ubuntu/Allow-libvirt-group-to-access-the-socket.patch: This is
      the group based access to libvirt functions as it was used in Ubuntu
      for quite long.
      + d/p/ubuntu/daemon-augeas-fix-expected.patch fix some related tests
        due to the group access change.
      + d/libvirt-daemon-system.postinst: add users in sudo to the libvirt
        group.
    - ubuntu/parallel-shutdown.patch: set parallel shutdown by default.
    - Update README.Debian with Ubuntu changes
    - d/p/ubuntu/ubuntu_machine_type.patch: accept ubuntu types as pci440fx
    - fix autopkgtests
      + d/t/control, d/t/smoke-qemu-session: fixup smoke-qemu-session by making
        vmlinuz available and accessible (Debian bug 848314)
      + d/t/control: fix smoke-qemu-session by ensuring the service will run
        installing libvirt-daemon-system
      + d/t/smoke-lxc: fix smoke-lxc by ignoring potential issues on destroy as
        long as the following undefine succeeds
      + d/t/smoke-lxc: use systemd instead of sysV to restart the service
    - dnsmasq related enhancements
      + run dnsmasq as libvirt-dnsmasq (LP: 1743718)
      + d/libvirt-daemon-system.postinst: add libvirt-dnsmasq user and group
      + d/libvirt-daemon-system.postrm: remove libvirt-dnsmasq user and group
        on purge
      + d/p/ubuntu/dnsmasq-as-priv-user: write dnsmasq config with user
        libvirt-dnsmasq and adapt the self tests to expect that config
      + d/libvirt-daemon-system.postinst: fix old libvirt-dnsmasq users group
      + Add dnsmasq configuration to work with system wide dnsmasq-base
    - debian/rules: disable the netcf backend. (LP: 1764314)
    - debian/patches/ubuntu/ovmf_paths.patch...

Changed in libvirt (Ubuntu Groovy):
status: In Progress → Fix Released
description: updated
Changed in libvirt (Ubuntu Groovy):
assignee: Christian Ehrhardt  (paelzer) → nobody
Changed in libvirt (Ubuntu Focal):
assignee: nobody → Christian Ehrhardt  (paelzer)
status: New → In Progress
Christian Ehrhardt  (paelzer) wrote :

Testing the test case to make sure it is helpful:

Run the guest with a defined CPU model like:
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>SandyBridge-IBRS</model>
    <vendor>Intel</vendor>
  </cpu>

In the guest check the cache file update time:
$ sudo ls -laF /var/cache/libvirt/qemu/capabilities/
-rw------- 1 root root 75171 Sep 2 08:58 926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml
-rw------- 1 root root 85376 Sep 2 08:58 f11008721aacc79c97e592178e61264d75be551864cd79cc41fe820e31262f27.xml
$ virsh domcapabilities > dpre
$ virsh capabilities > cpre
$ head -n5 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel Xeon E312xx (Sandy Bridge, IBRS update)

Then shut down and change that to IvyBridge-IBRS (similar but not the same).
When started again it should refresh the cache, as the CPU changed.
But without the fix it doesn't.

While cpuinfo changed, the cache and results didn't.

$ head -n5 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS)
$ virsh domcapabilities > dpost
$ virsh capabilities > cpost
$ md5sum cp* dp*
f0df917463798992d5842297fa039f86 cpost
4f214575cab864502576a0da1bceb031 cpre
94d6fe53ee5f9f6270e001788a225328 dpost
94d6fe53ee5f9f6270e001788a225328 dpre

With the fix:

-rw------- 1 root root 75171 Sep 2 09:04 926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml
-rw------- 1 root root 85376 Sep 2 09:04 f11008721aacc79c97e592178e61264d75be551864cd79cc41fe820e31262f27.xml

And after reboot into the other type we see the host section updated.
$ md5sum cp* dp*
4f214575cab864502576a0da1bceb031 cpost
f0df917463798992d5842297fa039f86 cpre
94d6fe53ee5f9f6270e001788a225328 dpost
94d6fe53ee5f9f6270e001788a225328 dpre
-rw------- 1 root root 75171 Sep 2 09:04 926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml
-rw------- 1 root root 85376 Sep 2 09:04 f11008721aacc79c97e592178e61264d75be551864cd79cc41fe820e31262f27.xml

$ diff -Naur cpre cpost
--- cpre 2020-09-02 09:04:41.058583142 +0000
+++ cpost 2020-09-02 09:05:50.652000000 +0000
@@ -4,10 +4,11 @@
     <uuid>058268cb-6229-46c7-84f8-5cd8edac4c5d</uuid>
     <cpu>
       <arch>x86_64</arch>
- <model>IvyBridge-IBRS</model>
+ <model>SandyBridge-IBRS</model>
       <vendor>Intel</vendor>
       <microcode version='1'/>
       <topology sockets='1' cores='1' threads='1'/>
+ <feature name='vme'/>
       <feature name='osxsave'/>
       <feature name='hypervisor'/>
       <feature name='arat'/>

There are more complex feature checks that might go deeper e.g. when actually starting a guest and deriving host-model. But this should be ok as a test of the area I hope.

P.S. I've yet to see the "Outdated capabilities for '%s': host CPU changed" message in a debug log, maybe the test needs to be further adapted for the specific case.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Incomplete
Christian Ehrhardt  (paelzer) wrote :

The MP is also reviewed, and this is uploaded for an SRU.
Once the SRU team gets to it, it would be great to have IBM test -proposed for this case.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Incomplete → In Progress
Christian Ehrhardt  (paelzer) wrote :

FYI - Re-uploaded since the first shot (while testing fine) had formally an incorrect format for referencing a Launchpad bug.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-15 09:55 EDT-------
Upgraded an Ubuntu Focal installation to Groovy on an SE-capable LPAR.
Successfully checked that the installed libvirt packages at least match the above description.

# apt list --installed | grep libvirt

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

gir1.2-libvirt-glib-1.0/groovy,now 3.0.0-1 s390x [installed,automatic]
libvirt-clients/groovy,now 6.6.0-1ubuntu2 s390x [installed]
libvirt-daemon-driver-qemu/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
libvirt-daemon-system-systemd/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
libvirt-daemon-system/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
libvirt-daemon/groovy,now 6.6.0-1ubuntu2 s390x [installed]
libvirt-dev/groovy,now 6.6.0-1ubuntu2 s390x [installed]
libvirt-glib-1.0-0/groovy,now 3.0.0-1 s390x [installed,automatic]
libvirt0/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
python3-libvirt/groovy,now 6.1.0-1 s390x [installed,automatic]

Protected virtualization was enabled on the system before and after upgrade.
Libvirt's cached capabilities file was up to date, virt-host-validate returned "pass" for "QEMU: Checking for secure guest support" and the SE guest was able to start successfully.

Removed "prot-virt=1" kernel parameter and rebooted host.
Libvirt's cached capabilities file had been updated, virt-host-validate returned "WARN (IBM Secure Execution appears to be disabled in kernel. Add prot_virt=1 to kernel cmdline arguments)" for "QEMU: Checking for secure guest support" and as expected the SE guest was NOT able to start.

Added "prot_virt=1" kernel parameter and rebooted host.
Libvirt's cached capabilities file had been updated, virt-host-validate returned "pass" for "QEMU: Checking for secure guest support" and the SE guest was able to start successfully.

Played around with kernel parameter values (prot_virt or prot-virt setting it to 0 or 1) a couple of times.
All seems to work as expected.

Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted libvirt into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/6.0.0-0ubuntu8.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in libvirt (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (libvirt/6.0.0-0ubuntu8.4)

All autopkgtests for the newly accepted libvirt (6.0.0-0ubuntu8.4) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

ruby-libvirt/unknown (armhf)
nova/unknown (armhf)
vagrant-libvirt/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#libvirt

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Christian Ehrhardt  (paelzer) wrote :

Thank you Brian, IBM just did the s390x specific eval on the equivalent build from the PPA - no need to redo all of this.
I re-ran a regression test and the tests I did before in comment #17.
They behaved the same as with my PPA tests, which is (combined with the IBM tests on protvirt) enough confirmation on this.

Setting verified tags on this, yet we still need to sort out the odd autopkgtest issues that we hit...

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Christian Ehrhardt  (paelzer) wrote :

The three test issues on armhf all seemed to be from the test environment, but not the involved components. For now I just restarted them as-is ...

Christian Ehrhardt  (paelzer) wrote :

@Boris - while not strictly required, if you still have the test system around, also testing the version in -proposed would be awesome.

Christian Ehrhardt  (paelzer) wrote :

It seems vorlon already retried two of the three cases. One recovered, one did not.
Still an indication that it is at least flaky ... :-/

Christian Ehrhardt  (paelzer) wrote :

All tests should be resolved by now, having reached their former result state.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-09-16 08:38 EDT-------
Edited /etc/apt/sources.list by duplicating all lines containing groovy-update and replacing it with groovy-proposed.
Running "apt update" and "apt upgrade"
Rebooting host

Seems like libvirt remained unchanged and qemu and kernel updated
# apt list --installed | grep -e libvirt -e qemu -e linux-image

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

gir1.2-libvirt-glib-1.0/groovy,now 3.0.0-1 s390x [installed,automatic]
libvirt-clients/groovy,now 6.6.0-1ubuntu2 s390x [installed]
libvirt-daemon-driver-qemu/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
libvirt-daemon-system-systemd/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
libvirt-daemon-system/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
libvirt-daemon/groovy,now 6.6.0-1ubuntu2 s390x [installed]
libvirt-dev/groovy,now 6.6.0-1ubuntu2 s390x [installed]
libvirt-glib-1.0-0/groovy,now 3.0.0-1 s390x [installed,automatic]
libvirt0/groovy,now 6.6.0-1ubuntu2 s390x [installed,automatic]
linux-image-5.4.0-21-generic/now 5.4.0-21.25 s390x [installed,local]
linux-image-5.4.0-47-generic/now 5.4.0-47.51 s390x [installed,local]
linux-image-5.8.0-18-generic/groovy,now 5.8.0-18.19 s390x [installed,automatic]
linux-image-5.8.0-19-generic/groovy-proposed,groovy-proposed,now 5.8.0-19.20 s390x [installed,automatic]
linux-image-generic/groovy-proposed,groovy-proposed,now 5.8.0.19.23 s390x [installed,automatic]
linux-image-unsigned-5.4.0-9019-generic/now 5.4.0-9019.23 s390x [installed,local]
python3-libvirt/groovy,now 6.1.0-1 s390x [installed,automatic]
qemu-block-extra/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 s390x [installed,automatic]
qemu-kvm/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 s390x [installed]
qemu-system-common/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 s390x [installed,automatic]
qemu-system-data/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 all [installed,automatic]
qemu-system-s390x/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 s390x [installed]
qemu-utils/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 s390x [installed,automatic]
qemu/groovy-proposed,groovy-proposed,now 1:5.0-5ubuntu8 s390x [installed]

Protected virtualization was enabled on the system before and after upgrade with kernel parameter "prot_virt=1".
Libvirt's cached capabilities file was updated due to the new kernel booting and virt-host-validate returned "pass" for "QEMU: Checking for secure guest support".
Unexpectedly, starting an SE guest with console (virsh start guest01 --console) resulted in a crash of the guest.

Here is what I caught in the log:
2020-09-16 10:48:14.805+0000: starting up libvirt version: 6.6.0, package: 1ubuntu2 (Christian Ehrhardt <email address hidden> Tue, 25 Aug 2020 14:53:26 +0200), qemu version: 5.0.0Debian 1:5.0-5ubuntu8, kernel: 5.8.0-19-generic, hostname: linux02.
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-1-focal \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-focal/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-focal/.cache \
XDG_CONFIG_...


Chris Halse Rogers (raof) wrote :

This looks like it should be ready for release into focal-updates, but it's unclear to me whether anyone has tested the packages currently in focal-proposed?

I see that some PPA packages have been tested and confirmed to work, but for SRUs we generally require that the actual packages that will end up in -updates (ie: the packages in -proposed) be tested.

This is to avoid human error where the packages in a PPA are not exactly the same code as that in -proposed, and also to avoid the rare cases where some quirk of the build environment causes PPA builds to build correctly but the archive packages to fail.

Christian Ehrhardt  (paelzer) wrote : Re: [Bug 1874647] Re: [Ubuntu 20.04] Stale libvirt cache leads to VM startup failures

On Wed, Sep 23, 2020 at 4:05 AM Chris Halse Rogers
<email address hidden> wrote:
>
> This looks like it should be ready for release into focal-updates, but
> it's unclear to me whether anyone has tested the packages currently in
> focal-proposed?

Hi Chris, I did test the x86 part of it.
AFAIK the s390x part was only tested on the PPA, but I'm unsure if we
can expect them to be tested twice without actively pinging/polling
for it.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-09-24 09:56 EDT-------
Installed focal.
Added proposed repo
$ cat <<EOF >/etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://ports.ubuntu.com/ubuntu-ports $(lsb_release -cs)-proposed restricted main multiverse universe
EOF

Ran
$ apt-get upgrade

Checked installed packages
$ apt list --installed | grep -e libvirt -e qemu -e linux-image
libvirt-clients/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed]
libvirt-daemon-driver-qemu/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed]
libvirt-daemon-driver-storage-rbd/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed,automatic]
libvirt-daemon-system-systemd/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed,automatic]
libvirt-daemon-system/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed]
libvirt-daemon/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed]
libvirt0/focal-proposed,now 6.0.0-0ubuntu8.4 s390x [installed,automatic]
linux-image-5.4.0-48-generic/focal-updates,focal-security,focal-proposed,now 5.4.0-48.52 s390x [installed,automatic]
linux-image-generic/focal-updates,focal-security,now 5.4.0.48.51 s390x [installed,upgradable to: 5.4.0.49.52]
qemu-block-extra/focal-updates,focal-security,now 1:4.2-3ubuntu6.6 s390x [installed,automatic]
qemu-kvm/focal-updates,focal-security,now 1:4.2-3ubuntu6.6 s390x [installed]
qemu-system-common/focal-updates,focal-security,now 1:4.2-3ubuntu6.6 s390x [installed,automatic]
qemu-system-data/focal-updates,focal-security,now 1:4.2-3ubuntu6.6 all [installed,automatic]
qemu-system-s390x/focal-updates,focal-security,now 1:4.2-3ubuntu6.6 s390x [installed,automatic]
qemu-utils/focal-updates,focal-security,now 1:4.2-3ubuntu6.6 s390x [installed,automatic]
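
Since the SRU process asks for the packages from -proposed themselves to be verified, the listing above could also be checked mechanically. The following is only a rough sketch: the function name libvirt_pkgs_at_version is made up for illustration, and it assumes input lines in the "name/pocket,now version arch [state]" layout shown above.

```shell
# Hypothetical helper (not part of libvirt or apt): reads "apt list --installed"
# style lines on stdin and succeeds only if at least one libvirt* package is
# listed and every libvirt* package carries the expected version string.
libvirt_pkgs_at_version() {
    awk -F'[/ ]' -v want="$1" '
        # $1 is the package name, $3 the version (after "pocket,now")
        $1 ~ /^libvirt/ { seen = 1; if ($3 != want) bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}
```

It could be used as "apt list --installed 2>/dev/null | libvirt_pkgs_at_version 6.0.0-0ubuntu8.4", with the exit status showing whether all libvirt packages are at the version under test. It compares version strings literally and does not do Debian version ordering.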

Ran virt-host-validate and received the expected warning: "WARN (IBM Secure Execution appears to be disabled in kernel. Add prot_virt=1 to kernel cmdline arguments)"

Enabled protected virtualization by adding kernel parameter "prot_virt=1" and rebooted.
Libvirt's cached capabilities file was updated, virt-host-validate returned "PASS" for "QEMU: Checking for secure guest support", and as expected an SE guest was able to start.

Disabled protected virtualization by changing kernel parameter "prot_virt=1" to "prot_virt=0" and rebooted.
Libvirt's cached capabilities file was updated, virt-host-validate returned "WARN..." for "QEMU: Checking for secure guest support", and as expected an SE guest was NOT able to start.

Re-enabled protected virtualization by changing kernel parameter "prot_virt=0" to "prot_virt=1" and rebooted.
Libvirt's cached capabilities file was updated, virt-host-validate returned "PASS" for "QEMU: Checking for secure guest support", and as expected an SE guest was able to start.
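
For anyone repeating the enable/disable cycle above, a small helper can make it explicit which state the booted kernel was asked for before looking at the cache. This is a minimal sketch, not something taken from the fix: the function name prot_virt_enabled is hypothetical, it only recognises the literal "prot_virt=1" spelling, and it assumes the last occurrence on the command line wins.

```shell
# Hypothetical helper: succeeds if the given kernel command line string
# requests protected virtualization via prot_virt=1.
# Assumption: only the literal value 1 counts as enabled in this sketch,
# and a later prot_virt= argument overrides an earlier one.
prot_virt_enabled() {
    cmdline=$1
    state=0
    for arg in $cmdline; do
        case "$arg" in
            prot_virt=1) state=1 ;;
            prot_virt=*) state=0 ;;
        esac
    done
    [ "$state" -eq 1 ]
}
```

Calling it as prot_virt_enabled "$(cat /proc/cmdline)" after each reboot gives the requested state. The timestamps of the files under /var/cache/libvirt/qemu/capabilities/ can then be compared with the boot time to see whether the cache was regenerated.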

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.0.0-0ubuntu8.4

---------------
libvirt (6.0.0-0ubuntu8.4) focal; urgency=medium

  * avoid stale libvirt capability cache (LP: #1874647)
    - when host cpu changes (e.g. nested with different configuration)
    - when s390x protvirt or AMD SEV changes
    - d/p/ubuntu/lp-1874647-*

 -- Christian Ehrhardt <email address hidden> Mon, 31 Aug 2020 08:41:25 +0200

Changed in libvirt (Ubuntu Focal):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-09-30 06:46 EDT-------
IBM Bugzilla status-> closed, Fix Released with all requested distros
