virtio-balloon change breaks rocky -> stein live migrate

Bug #1882416 reported by Sam Morrison
This bug affects 1 person
Affects               Status  Importance  Assigned to  Milestone
Ubuntu Cloud Archive  New     Undecided   Unassigned
qemu (Ubuntu)         New     Undecided   Unassigned

Bug Description

Live migrating a VM from a Rocky environment to a Stein environment fails.

Source host (OpenStack Rocky release):
qemu-system-x86:
  Installed: 1:2.11+dfsg-1ubuntu7.26

Destination host (OpenStack Stein release):
qemu-system-x86:
  Installed: 1:3.1+dfsg-2ubuntu3.7~cloud0

Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2020-05-24T22:47:19.677896Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
2020-05-24T22:47:19.677922Z qemu-system-x86_64: Failed to load PCIDevice:config
2020-05-24T22:47:19.677926Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2020-05-24T22:47:19.677929Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
2020-05-24T22:47:19.678086Z qemu-system-x86_64: load of migration failed: Invalid argument: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-05-24T22:47:19.677896Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'
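
For readers decoding the "Bad config data" line above, here is a minimal sketch that parses it and recomputes which bits failed. It assumes the check in QEMU's get_pci_config_device() is (read ^ device) & cmask & ~wmask & ~w1cmask, which is my understanding of hw/pci/pci.c and is not stated anywhere in this report. The function name decode_bad_config is hypothetical.

```python
import re

# Matches the "Bad config data" line QEMU prints when a PCI config-space
# byte in the incoming migration stream fails its compatibility check.
BAD_CONFIG_RE = re.compile(
    r"Bad config data: i=0x(?P<i>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
    r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
    r"wmask: (?P<wmask>[0-9a-f]+) w1cmask:\s*(?P<w1cmask>[0-9a-f]+)"
)


def decode_bad_config(line):
    """Return the parsed fields plus the failing bits, or None if no match."""
    m = BAD_CONFIG_RE.search(line)
    if m is None:
        return None
    # All values in the message are printed in hex.
    f = {name: int(value, 16) for name, value in m.groupdict().items()}
    # ASSUMPTION: mirrors the comparison in get_pci_config_device().
    # Bits that differ, must be compared (cmask), and are not guest-writable
    # (wmask) or write-1-to-clear (w1cmask) cause the load to be rejected.
    f["mismatch"] = (
        (f["read"] ^ f["device"]) & f["cmask"] & ~f["wmask"] & ~f["w1cmask"] & 0xFF
    )
    return f


line = (
    "qemu-system-x86_64: get_pci_config_device: Bad config data: "
    "i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0"
)
result = decode_bad_config(line)
print(f"offset=0x{result['i']:02x} mismatch=0x{result['mismatch']:02x}")
# → offset=0x10 mismatch=0x20
```

Offset 0x10 is the first byte of BAR0 in a standard PCI header. Under the assumption above, the failing bit is 0x20: the source set it, but the destination treats it as read-only. That would be consistent with the two QEMU versions giving the virtio-balloon device a different BAR0 size, though this reading is an interpretation and not something the log states directly.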

Please see https://bugs.launchpad.net/cloud-archive/+bug/1848497 for a related issue.

Tags: seg sts
Sam Morrison (sorrison) wrote :

I should note that, to get around this issue, I upgraded to the qemu packages from the Ussuri cloud archive, and that worked.

So live migration from

1:2.11+dfsg-1ubuntu7.26

to

1:4.2-3ubuntu6~cloud0

works fine.

Brett Milford (brettmilford) wrote :

I'm getting the same:

2020-06-05T13:00:38.579356Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
2020-06-05T13:00:38.579375Z qemu-system-x86_64: Failed to load PCIDevice:config
2020-06-05T13:00:38.579379Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2020-06-05T13:00:38.579382Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
2020-06-05T13:00:38.579434Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2399983 kHz) and host (2399997 kHz), and TSC scaling unavailable
2020-06-05T13:00:38.579562Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2399983 kHz) and host (2399997 kHz), and TSC scaling unavailable
2020-06-05T13:00:38.579707Z qemu-system-x86_64: load of migration failed: Invalid argument

Migrating from:
qemu-system-x86 1:3.1+dfsg-2ubuntu3.2~cloud0

To:
qemu-system-x86 1:3.1+dfsg-2ubuntu3.7~cloud0

tags: added: sts
masterpe (michiel-y) wrote :

This issue has been fixed by qemu commit 2bbadb08ce272d65e1f78621002008b07d1e0f03: https://git.qemu.org/?p=qemu.git;a=commit;h=2bbadb08ce272d65e1f78621002008b07d1e0f03

The fix first shipped around version 4.1.0-rc0.

Sam Morrison (sorrison) wrote :

Yes, I understand that, but the Stein cloud archive has version 1:3.1+dfsg-2ubuntu3.7~cloud0.

Brett Milford (brettmilford) wrote :

@sorrison FYI, the Rocky UCA doesn't package qemu; your source package is likely from updates/main of Bionic.

Regardless, I can replicate this bug in both cases, i.e. when migrating from 3.1+dfsg-2ubuntu3.2~cloud0 to 3.1+dfsg-2ubuntu3.7~cloud0 and from Bionic 1:2.11+dfsg-1ubuntu7.28 to 3.1+dfsg-2ubuntu3.7~cloud0.

Brett Milford (brettmilford) wrote :

From paelzer (https://bugs.launchpad.net/cloud-archive/+bug/1848497/comments/15):

BTW, 2.11 -> 3.1, leaving the cloud archive aside, matches a migration from Bionic to Disco.
I have checked my test logs (from a while ago, since Disco itself is EOL).
At least on January 13 and 16 of this year, the 2.11 -> 3.1 migrations were still OK.

In my log that was between
B: qemu: 1:2.11+dfsg-1ubuntu7.21 libvirt: 4.0.0-1ubuntu8.14
D: qemu: 1:3.1+dfsg-2ubuntu3.7 libvirt: 5.0.0-1ubuntu2.6

  7.2.0 (11:51:19): Test live migration (extra option '') of a bionic guest testkvm-bionic-from/testkvm-disco-from
    7.2.1 (11:51:19): live migration (extra option '') testkvm-bionic-from -> testkvm-disco-from => Pass
    7.2.2 (11:51:26): Check if guest kvmguest-bionic-normal on testkvm-disco-from is alive => Pass

Trent Lloyd (lathiat)
tags: added: seg
Trent Lloyd (lathiat) wrote :

I think the issue here is that Stein's qemu comes from Disco, which was EOL before Bug #1848497 was fixed, and so the change wasn't backported.

While Stein is EOL next month, this breaks live migrations, which are often needed during OpenStack upgrades to get through Stein onto Train. So I think we'll need to backport the fix.

Dan Streetman (ddstreet) wrote :

> While Stein is EOL next month

No, Stein is alive until 2022:
https://ubuntu.com/about/release-cycle#ubuntu-openstack-release-cycle

> Bug #1848497 was fixed

So this bug should probably be a duplicate of that bug, and that bug should have the UCA target Stein added, right?

There is also bug 1847361 for qemu in Stein...

Dan Streetman (ddstreet) wrote :

OK, I'm marking this as a duplicate of bug 1848497 and will handle preparing the patch in that bug.
