virtio-balloon change breaks post 4.0 upgrade

Bug #1838569 reported by Bjoern
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned
Ubuntu Cloud Archive
Fix Released
Undecided
Unassigned
qemu (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

We upgraded the libvirt UCA packages from 3.6 to 4.0 and qemu 2.10 to 2.11 as part of a queens upgrade and noticed that
virtio-ballon is broken when instances live migrate (started with a prior 3.6 version) with:

2019-07-24T06:46:49.487109Z qemu-system-x86_64: warning: Unknown firmware file in legacy mode: etc/msr_feature_control
2019-07-24T06:47:22.187749Z qemu-system-x86_64: VQ 2 size 0x80 < last_avail_idx 0xb57 - used_idx 0xb59
2019-07-24T06:47:22.187768Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2019-07-24T06:47:22.187771Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
2019-07-24T06:47:22.188194Z qemu-system-x86_64: load of migration failed: Operation not permitted
2019-07-24 06:47:22.430+0000: shutting down, reason=failed

This seem to be the exact problem as reported by https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02228.html

Listed the packages which changed:

Start-Date: 2019-07-06 06:40:55
Commandline: /usr/bin/apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold install libvirt-bin python-libvirt qemu qemu-utils qemu-system qemu-system-arm qemu-system-mips qemu-system-ppc qemu-system-sparc qemu-system-x86 qemu-system-misc qemu-block-extra qemu-utils qemu-user qemu-kvm
Install: librdmacm1:amd64 (17.1-1ubuntu0.1~cloud0, automatic), libvirt-daemon-driver-storage-rbd:amd64 (4.0.0-1ubuntu8.10~cloud0, automatic), ipxe-qemu-256k-compat-efi-roms:amd64 (1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0, automatic)
Upgrade: qemu-system-mips:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-misc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-ppc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), python-libvirt:amd64 (3.5.0-1build1~cloud0, 4.0.0-1~cloud0), qemu-system-x86:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-clients:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-user:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-bin:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-utils:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon-system:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-system-sparc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-user-binfmt:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-kvm:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt0:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-system-arm:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-block-extra:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-common:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0)
End-Date: 2019-07-06 06:41:08

At this point the instances would have to be hard rebooted or stopped/started to fix the issue for future live migration attemps

Bjoern (bjoern-t)
description: updated
Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Hi Bjoern,
  I don't think this is the same bug as the one you reference; that's a config space disagreement, not a virtio queue disagreeement.

It almost feels like the old 4a1e48becab81020adfb74b22c76a595f2d02a01 stats migration fix; but that went in with 2.8 so shouldn't be a problem going 2.10 to 2.11

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In regard to "similar bugs" it sounds more like [1] to me.
Which was around needing [2].

But just like the commit David mentioned is in 2.8 this one is in since 2.6 (and backported).

[1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1647389
[2]: https://git.qemu.org/?p=qemu.git;a=commit;h=4eae2a657d1ff5ada56eb9b4966eae0eff333b0b

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Revision history for this message
Daniel 'f0o' Preussker (dpreussker) wrote :

With recent release of OpenStack Train this issue reappears...

Upgrading from Stein to Train will require all VMs to be hard-rebooted to be migrated as a final step because Live Migration fails with:

Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: Unable to read from monitor: Connection reset by peer
Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: internal error: qemu unexpectedly closed the monitor: 2019-10-17T10:28:42.981201Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
                                                          2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config
                                                          2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
                                                          2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
                                                          2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
                                                          2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
                                                          2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Dnaiel: That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug.

I'd bet on this being the one fixed by 2bbadb08ce272d65e1f78621002008b07d1e0f03

Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: [Bug 1838569] Re: virtio-balloon change breaks post 4.0 upgrade

> I'd bet on this being the one fixed by
> 2bbadb08ce272d65e1f78621002008b07d1e0f03

But wasn't the breakage this fixes only added in qemu 4.0?
He reports his change is from qemu 2.10 to 2.11.

Unfortunately 2bbadb08 doesn't have a "fixes" line, maybe Ubuntu has
something backported that makes the Ubuntu 2.11 being affected by it.
I checked the Delta but there was nothing related backported that I
could identify.

But I agree, that first of all this should be a new bug instead of
reviving the old one here.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> I'd bet on this being the one fixed by
> 2bbadb08ce272d65e1f78621002008b07d1e0f03

But wasn't the breakage this fixes only added in qemu 4.0?
He reports his change is from qemu 2.10 to 2.11.

Unfortunately 2bbadb08 doesn't have a "fixes" line, maybe Ubuntu has
something backported that makes the Ubuntu 2.11 being affected by it.
I checked the Delta but there was nothing related backported that I
could identify.

But I agree, that first of all this should be a new bug instead of
reviving the old one here.

P.S. this will race as I sent the same via mail :-/, but I need to extend.

I have now realized that the later report (comment #4) is about Openstack Train which at least on Ubuntu would come with qemu 4.0 - and since 2bbadb08 was released only with 4.1 that would be an open issue.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I forked that new discussion into https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1848497
Please follow there and leave this bug here to the originally reported error signature.

Revision history for this message
Bjoern (bjoern-t) wrote :

It seems the fix is comitted with https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1848497. Closing this issue

Changed in qemu:
status: New → Fix Committed
status: Fix Committed → Fix Released
Changed in cloud-archive:
status: New → Fix Released
Changed in qemu (Ubuntu):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.