Upgrade of qemu binaries causes running instances not able to dynamically load modules
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ubuntu Cloud Archive |
Fix Released
|
Undecided
|
Unassigned | ||
Stein |
Fix Released
|
Medium
|
Unassigned | ||
libvirt (Ubuntu) |
Fix Released
|
Low
|
Unassigned | ||
Bionic |
Fix Released
|
Undecided
|
Christian Ehrhardt | ||
Eoan |
Fix Released
|
Undecided
|
Christian Ehrhardt | ||
qemu (Ubuntu) |
Fix Released
|
Wishlist
|
Unassigned | ||
Bionic |
Fix Released
|
Undecided
|
Christian Ehrhardt | ||
Eoan |
Fix Released
|
Undecided
|
Christian Ehrhardt |
Bug Description
[Impact]
* An infrequent but annoying issue is QEMUs problem to not be able to
hot-add capabilities IF since starting the instance qemu has been
upgraded. This is due to qemu modules only working with exactly the
same build.
* We brought changes upstream that allow the packaging to keep the old
files around and make qemu look after them as a fallback.
[Test Case]
I:
* $ apt install uvtool-libvirt
$ uvt-simplestrea
$ uvt-kvm create --password ubuntu lateload arch=amd64 release=bionic label=daily
cat > curldisk.xml << EOF
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol="http" name="ubuntu/
<host name="archive.
</source>
<target dev='vdc' bus='virtio'/>
<readonly/>
</disk>
EOF
# Here up or downgrade the installed packages, even a minor
# version or a rebuild of the same version
# Instead if you prefer (easier) you can run
$ apt install --reinstall qemu-*
Next check if they appeared (action of the maintainer scripts)
in the /var/run/
# And then rm/mv the original .so files of qemu-block-extra
# Trying to load a .so now would after an upgrade fail as the old qemu can't load the build id
$ virsh attach-device lateload curldisk.xml
Reported issue happens on attach:
root@b:~# virsh attach-device lateload cdrom-curl.xml
error: Failed to attach device from cdrom-curl.xml
error: internal error: unable to execute QEMU command 'device_add': Property 'virtio-
In the log we can see:
Failed to initialize module: /usr/lib/
One can also check files mapped into a process and we should see the /var/run/.. path being used now.
II:
* As it had issues in the first iteration of the fix worth a
try is also the use of an environment var for an extra path:
$ QEMU_MODULE_
[Regression Potential]
I:
* libvirt just allows a few more paths to be read from in the apparmor
isolation that is usually safe unless these paths are considered
sensitive. But /var/run/qemu is new, /var/run in general not meant
for permanent or secure data and as always if people want to ramp up
isolation they can always add deny rules to the local overrides.
II:
* the qemu change has two components.
In qemu code it looks for another path if the former ones failed.
I see no issues there yet, but can imagine that odd versions might
make it access odd paths which would then be denied by apparmor or
just don't exist. But that is no different than the former built-in
paths it tries, so nothing bad should happen.
The code change to the maintainer scripts has to backup the files.
If that goes wrong upgrades could be broken, but so far no tests have
shown issues.
[Other Info]
* To really use the functionality users will need the new qemu AND the
new libvirt that are uploaded for this bug.
But it felt wrong to add versioned dependencies from qemu->libvirt
(that is the semantically correct direction) also conflicts/breaks
might cause issues in many places that want to control these. OTOH
while the fix is great for some installations the majority of users
won't care and therefore be happy if extra dependencies are not
causing any oddity on apt upgrade. Therefore no versioned
dependencies were added intentionally.
---
[Feature Freeze Exception]
Hi,
this is IMHO a just a bugfix. But since it involves some bigger changes I wanted to be on the safe side and get an ack by the release Team.
Problem:
- on upgrade qemu processes are left running as they
represent a guest VM
- later trying to add features e.g. ceph disk hot-add will
need to load .so files e.g. from qemu-block-extra package
- those modules can on ly be loaded from the same build, but those are
gone after upgrade
Solution:
- If qemu fails to load from its usual paths it will
now also look in /var/run/<version/
- package upgrade code will place the .so's there
- things will be cleaned on reboot which is much simpler
and error-proof than trying to detect which versions
binaries are running
- libvirt has a change to allow just reading and
mapping from that path (apparmor)
@Release team it would be great if you would agree to this being safe for an FFe.
--- initial report ---
Upgrading qemu binaries causes the on-disk versions to change, but the in-memory running instances still attempt to dynamically load a library which matches its same version. This can cause running instances to fail actions like hotplugging devices. This can be alleviated by migrating the instance to a new host or restarting the instance, however in cloud type environments there may be instances that cannot be migrated (sriov, etc) or the cloud operator does not have permission to reboot.
This may be resolvable for many situations by changing the packaging to keep older versions of qemu libraries around on disk (similar to how the kernel package keeps older kernel versions around).
--- solution options (WIP) ---
For a packaging solution we would need:
- qemu-block-extra / qemu-system-gui binary packages would need
sort of a -$buildid in the name. That could be the version
string (sanitized for package name)
- /usr/lib/
- loading of modules from qemu would need to consider $buildid
when creating module names.
util/module.c in module_load_one / module_load_file
It already searches in multiple dirs, maybe it could insert
the $buildid there
- We'd need a way of detecting running versions of qemu binaries
and only make them uninstallable once the binaries are all
gone. I have not seen something like that in apt yet (kernel
is easy in comparison as only one can be loaded at a time).
ALTERNATIVES:
- disable loadable module support
- add an option to load all modules in advance (unlikely to be
liked upstream) and not desirable for many setups using qemu
(especially not as default)
- add an option to load a module (e.g via QMP/HMP) which would
allow an admin
to decide doing so for the few setups that benefit.
- that could down the road then even get a libvirt interface
for easier consumption
Heads up - None of the above would be SRUable
--- mitigation options ---
- live migrate for upgrades
- prohibited by SR-IOV usage
- Tech to get SR-IOV migratable is coming (e.g. via net_failover,
bonding in DPDK, ...)
- load the modules you need in advance
- Note: lacking an explicit "load module" command makes this
slightly odd for now
- but using iscsi or ceph is never spontaneous, a deployment
has or hasn't the setup to use those
- Create a single small read-only node and attach this to each guest,
that will load the driver and render you immune to the issue. While
more clunky, this isn't so much different than how it would be
with an explicit "load module" command.
Actually the target doesn't have to exist it can fail to attach
and still achieves what is needed comment #17 has an example.
Related branches
- Rafael David Tinoco (community): Approve
- Canonical Server: Pending requested
-
Diff: 411 lines (+333/-0)8 files modifieddebian/changelog (+12/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu/lp-1847361-modules-load-upgrade.patch (+131/-0)
debian/qemu-block-extra.postrm.in (+43/-0)
debian/qemu-block-extra.prerm.in (+45/-0)
debian/qemu-system-gui.postrm.in (+44/-0)
debian/qemu-system-gui.prerm.in (+46/-0)
debian/rules (+11/-0)
- Rafael David Tinoco (community): Approve
- Canonical Server: Pending requested
-
Diff: 74 lines (+52/-0)3 files modifieddebian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu-aa/lp-1847361-load-versioned-module.patch (+44/-0)
- Rafael David Tinoco (community): Approve
- Canonical Server: Pending requested
-
Diff: 317 lines (+244/-1)7 files modifieddebian/changelog (+11/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu/lp-1847361-modules-load-upgrade.patch (+133/-0)
debian/qemu-block-extra.postrm.in (+43/-0)
debian/qemu-block-extra.prerm.in (+45/-0)
debian/rules (+11/-0)
dev/null (+0/-1)
- Rafael David Tinoco (community): Approve
- Canonical Server: Pending requested
-
Diff: 74 lines (+52/-0)3 files modifieddebian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu-aa/lp-1847361-load-versioned-module.patch (+44/-0)
- Andreas Hasenack: Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
-
Diff: 3765 lines (+3470/-0)43 files modifieddebian/changelog (+21/-0)
debian/patches/lp-1867519-block-nbd-extract-the-common-cleanup-code.patch (+78/-0)
debian/patches/series (+38/-0)
debian/patches/stable/lp-1867519-arm-arm-powerctl-rebuild-hflags-after-setting-CP15-b.patch (+48/-0)
debian/patches/stable/lp-1867519-arm-arm-powerctl-set-NSACR.-CP11-CP10-bits-in-arm_se.patch (+49/-0)
debian/patches/stable/lp-1867519-backup-top-Begin-drain-earlier.patch (+46/-0)
debian/patches/stable/lp-1867519-block-Activate-recursively-even-for-already-active-n.patch (+108/-0)
debian/patches/stable/lp-1867519-block-backup-top-fix-failure-path.patch (+97/-0)
debian/patches/stable/lp-1867519-block-block-copy-fix-progress-calculation.patch (+201/-0)
debian/patches/stable/lp-1867519-block-fix-crash-on-zero-length-unaligned-write-and-r.patch (+107/-0)
debian/patches/stable/lp-1867519-block-io-fix-bdrv_co_do_copy_on_readv.patch (+44/-0)
debian/patches/stable/lp-1867519-block-nbd-fix-memory-leak-in-nbd_open.patch (+76/-0)
debian/patches/stable/lp-1867519-block-qcow2-threads-fix-qcow2_decompress.patch (+79/-0)
debian/patches/stable/lp-1867519-hw-i386-pc-fix-regression-in-parsing-vga-cmdline-par.patch (+58/-0)
debian/patches/stable/lp-1867519-intel_iommu-a-fix-to-vtd_find_as_from_bus_num.patch (+44/-0)
debian/patches/stable/lp-1867519-intel_iommu-add-present-bit-check-for-pasid-table-en.patch (+202/-0)
debian/patches/stable/lp-1867519-iotests-add-test-for-backup-top-failure-on-permissio.patch (+138/-0)
debian/patches/stable/lp-1867519-job-refactor-progress-to-separate-object.patch (+230/-0)
debian/patches/stable/lp-1867519-plugins-core-add-missing-break-in-cb_to_tcg_flags.patch (+41/-0)
debian/patches/stable/lp-1867519-qcow2-Fix-alloc_cluster_abort-for-pre-existing-clust.patch (+39/-0)
debian/patches/stable/lp-1867519-qcow2-Fix-qcow2_alloc_cluster_abort-for-external-dat.patch (+44/-0)
debian/patches/stable/lp-1867519-qcow2-bitmaps-fix-qcow2_can_store_new_dirty_bitmap.patch (+102/-0)
debian/patches/stable/lp-1867519-qemu-img-Fix-convert-n-B-for-backing-less-targets.patch (+54/-0)
debian/patches/stable/lp-1867519-s390-sclp-improve-special-wait-psw-logic.patch (+40/-0)
debian/patches/stable/lp-1867519-target-arm-Return-correct-IL-bit-in-merge_syn_data_a.patch (+46/-0)
debian/patches/stable/lp-1867519-target-arm-Set-ISSIs16Bit-in-make_issinfo.patch (+42/-0)
debian/patches/stable/lp-1867519-target-arm-arm-semi-fix-SYS_OPEN-to-return-nonzero-f.patch (+79/-0)
debian/patches/stable/lp-1867519-target-arm-ensure-we-use-current-exception-state-aft.patch (+127/-0)
debian/patches/stable/lp-1867519-target-i386-kvm-initialize-feature-MSRs-very-early.patch (+169/-0)
debian/patches/stable/lp-1867519-tcg-save-vaddr-temp-for-plugin-usage.patch (+98/-0)
debian/patches/stable/lp-1867519-tpm-ppi-page-align-PPI-RAM.patch (+47/-0)
debian/patches/stable/lp-1867519-vfio-pci-Don-t-remove-irqchip-notifier-if-not-regist.patch (+50/-0)
debian/patches/stable/lp-1867519-virtio-gracefully-handle-invalid-region-caches.patch (+331/-0)
debian/patches/stable/lp-1867519-virtio-mmio-update-queue-size-on-guest-write.patch (+40/-0)
debian/patches/stable/lp-1867519-virtio-net-delete-also-control-queue-when-TX-RX-dele.patch (+41/-0)
debian/patches/stable/lp-1867519-virtio-update-queue-size-on-guest-write.patch (+40/-0)
debian/patches/ubuntu/lp-1847361-modules-load-upgrade.patch (+125/-0)
debian/patches/ubuntu/lp-1847361-vhost-correctly-turn-on-VIRTIO_F_IOMMU_PLATFORM.patch (+61/-0)
debian/qemu-block-extra.postrm.in (+43/-0)
debian/qemu-block-extra.prerm.in (+45/-0)
debian/qemu-system-gui.postrm.in (+44/-0)
debian/qemu-system-gui.prerm.in (+46/-0)
debian/rules (+12/-0)
- Rafael David Tinoco (community): Approve
- Jamie Strandboge (community): Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
-
Diff: 74 lines (+52/-0)3 files modifieddebian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu-aa/lp-1847361-load-versioned-module.patch (+44/-0)
- Rafael David Tinoco (community): Approve
- Ubuntu Security Team: Pending requested
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
-
Diff: 483 lines (+392/-0)9 files modifieddebian/changelog (+14/-0)
debian/patches/series (+2/-0)
debian/patches/ubuntu/lp-1847361-modules-load-upgrade.patch (+125/-0)
debian/patches/ubuntu/lp-1847361-vhost-correctly-turn-on-VIRTIO_F_IOMMU_PLATFORM.patch (+61/-0)
debian/qemu-block-extra.postrm.in (+43/-0)
debian/qemu-block-extra.prerm.in (+45/-0)
debian/qemu-system-gui.postrm.in (+44/-0)
debian/qemu-system-gui.prerm.in (+46/-0)
debian/rules (+12/-0)
tags: | added: server-triage-discuss |
tags: | added: server-next |
Changed in qemu (Ubuntu): | |
status: | In Progress → Fix Committed |
Changed in libvirt (Ubuntu): | |
status: | Triaged → Fix Released |
Changed in qemu (Ubuntu): | |
status: | Fix Committed → Fix Released |
Changed in qemu (Ubuntu Bionic): | |
assignee: | nobody → Christian Ehrhardt (paelzer) |
Changed in qemu (Ubuntu Bionic): | |
status: | New → Triaged |
Changed in qemu (Ubuntu Eoan): | |
status: | New → Triaged |
Changed in libvirt (Ubuntu Eoan): | |
status: | New → Triaged |
Changed in libvirt (Ubuntu Bionic): | |
status: | New → Triaged |
Changed in qemu (Ubuntu Eoan): | |
assignee: | nobody → Christian Ehrhardt (paelzer) |
Changed in libvirt (Ubuntu Eoan): | |
assignee: | nobody → Christian Ehrhardt (paelzer) |
Changed in libvirt (Ubuntu Bionic): | |
assignee: | nobody → Christian Ehrhardt (paelzer) |
description: | updated |
description: | updated |
description: | updated |
Changed in qemu (Ubuntu Eoan): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed verification-needed-eoan |
Changed in libvirt (Ubuntu Eoan): | |
status: | Triaged → Fix Committed |
Changed in qemu (Ubuntu Bionic): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed-bionic |
Changed in libvirt (Ubuntu Bionic): | |
status: | Triaged → Fix Committed |
Changed in cloud-archive: | |
status: | Incomplete → Confirmed |
tags: |
added: verification-stein-done removed: verification-stein-needed |
Interesting bug, thanks Billy for the report.
Packaging of qemu (and actually many other applications as well) splits its capabilities into a main binaries and .so files (or similar tech, even am extra python module if you want). That is for flexibility of attack surface (less active code), size management (install footprint), dependency management (some might be optional or less supported), ...
Some of these technologies can load these modules "late" and this is what happens here.
Qemu will not load all its .so files that are installed immediately, but when needed.
That is what happens when
I personally don't think packaging it with versioned binary packages like a lib or the kernel is the right way (as more packages are design-wise affected by the same and wouldn't we want to switch all of them then), but I don't have a great alternative yet either - this clearly needs some discussions about the way to go with it.
We might need to analyze which of these can load late and maybe provide an option to load them early or automatically before upgrades? OTOH that seems a very big and undetected change in behavior if e.g. reducing attack surface was your intention.
As I said, very interesting bug ...