virtio-balloon change breaks migration from qemu prior to 4.0
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Stein | Fix Released | High | Unassigned |
Train | Fix Released | Undecided | Unassigned |
Ussuri | Fix Released | Undecided | Unassigned |
qemu (Ubuntu) | Fix Released | High | Christian Ehrhardt |
Eoan | Fix Released | High | Christian Ehrhardt |
Focal | Fix Released | High | Christian Ehrhardt |
Bug Description
[Impact]
* Due to a bug in qemu 4.0, the config size of virtio-balloon changed.
* This breaks migration from pre-4.0 qemu because the PCI BAR size is
affected.
* Upstream noticed this and fixed it in 4.1; this backports the fix
to qemu 4.0 in Ubuntu Eoan.
[Test Case]
* Take a pre-Eoan (pre qemu 4.0) guest and check that your setup can
migrate it back and forth with an Eoan/qemu-4.0 target.
Note: always use a versioned machine type like pc-i440fx-disco (also
the default if you use Disco as source).
Then add a virtio-balloon device to the guest on pre-4.0 and migrate it
again.
Without the fix, the following error will show up:
get_
* Migrations from unfixed to fixed qemu 4.0 should work as well. The
other way around could work (the size didn't change), but there are no
guarantees (no logic in the target).
[Regression Potential]
* Messing with machine types is always dangerous, because a mistake
makes things even more complex. But in this case things seem rather
straightforward. Pre-4.0 code all behaves the same, only 4.0 gets the
new attribute set, and later code has logic to handle dynamic sizes.
That way I think we are safe from machine-type regressions.
* As for the change in behavior, it changes pre-4.0 migrations, which atm
are broken if a virtio-balloon device is present. There is nothing to
break further in that use case, and if such a device isn't present it
shouldn't change anything. Therefore IMHO safe again.
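Since only guests with a balloon device are affected, it can help to check a guest's domain XML before migrating. The following is a minimal sketch, not part of the fix: the helper name `has_virtio_balloon` and the sample XML are hypothetical, and it assumes every affected model string starts with `virtio`.

```python
import xml.etree.ElementTree as ET


def has_virtio_balloon(domain_xml: str) -> bool:
    """Return True if the libvirt domain XML declares a virtio memballoon.

    Hypothetical helper for illustration; assumes affected models all
    start with the string "virtio".
    """
    root = ET.fromstring(domain_xml)
    for balloon in root.iter("memballoon"):
        if balloon.get("model", "").startswith("virtio"):
            return True
    return False


# Illustrative sample only, trimmed to the relevant element.
SAMPLE = """
<domain type='kvm'>
  <name>testguest</name>
  <devices>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
    </memballoon>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(has_virtio_balloon(SAMPLE))
```

The XML to feed in can be obtained with `virsh dumpxml testguest`.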
[Other Info]
* n/a
---
Related but not the same as bug 1838569 which had two error signatures.
The first being covered there and the second handled here.
--- ---
Quote from https:/
Daniel 'f0o' Preussker (dpreussker) wrote 1 hour ago: #4
With recent release of OpenStack Train this issue reappears...
Upgrading from Stein to Train will require all VMs to be hard-rebooted to be migrated as a final step because Live Migration fails with:
Oct 17 10:28:43 h2.1.openstack.
Oct 17 10:28:43 h2.1.openstack.
--- ---
Identified as:
Dr. David Alan Gilbert (dgilbert-h) wrote 1 hour ago: #5
Daniel: That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug.
I'd bet on this being the one fixed by 2bbadb08ce272d6
--- ---
That fix is only in qemu 4.1, so this would be an open bug for Ubuntu and the Cloud Archive.
Related branches
- Rafael David Tinoco (community): Approve
- Canonical Server packageset reviewers: Pending requested
- git-ubuntu developers: Pending requested
Diff: 329 lines (+301/-0), 4 files modified
debian/changelog (+10/-0)
debian/patches/series (+2/-0)
debian/patches/ubuntu/lp-1848497-virtio-balloon-fix-QEMU-4.0-config-size-migration-in.patch (+137/-0)
debian/patches/ubuntu/lp-1848556-curl-Handle-success-in-multi_check_completion.patch (+152/-0)
- Rafael David Tinoco (community): Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
Diff: 329 lines (+301/-0), 4 files modified
debian/changelog (+10/-0)
debian/patches/series (+2/-0)
debian/patches/ubuntu/lp-1848497-virtio-balloon-fix-QEMU-4.0-config-size-migration-in.patch (+137/-0)
debian/patches/ubuntu/lp-1848556-curl-Handle-success-in-multi_check_completion.patch (+152/-0)
- Christian Ehrhardt (community): Needs Resubmitting
- Canonical Server: Pending requested
- Canonical Server packageset reviewers: Pending requested
Diff: 167 lines (+145/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu/lp-1848497-virtio-balloon-fix-QEMU-4.0-config-size-migration-in.patch (+137/-0)
tags: added: server-next
Changed in qemu (Ubuntu Eoan):
status: New → Triaged
Changed in qemu (Ubuntu Focal):
status: Confirmed → Triaged
Changed in qemu (Ubuntu Eoan):
assignee: nobody → Christian Ehrhardt (paelzer)
importance: Undecided → High
Changed in cloud-archive:
status: New → Fix Released
I can confirm this with a Bionic to Eoan migration of a guest that has a balloon device.
Guestconfig:
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</memballoon>
root@testkvm-bionic-from:~# virsh migrate --unsafe --live testguest qemu+ssh://10.192.69.27/system
error: internal error: qemu unexpectedly closed the monitor: 2019-10-21T13:44:16.155100Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
2019-10-21T13:44:18.530641Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: e1 device: 1 cmask: ff wmask: c0 w1cmask:0
2019-10-21T13:44:18.530657Z qemu-system-x86_64: Failed to load PCIDevice:config
2019-10-21T13:44:18.530660Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2019-10-21T13:44:18.530663Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0/virtio-balloon'
2019-10-21T13:44:18.530839Z qemu-system-x86_64: load of migration failed: Invalid argument
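To help read the "Bad config data" message, here is a small sketch of the per-byte comparison that, as far as I understand it, the target performs on incoming PCI config space. The expression is modeled on my reading of get_pci_config_device in qemu and should be treated as an assumption; the function name `config_byte_mismatch` is hypothetical.

```python
def config_byte_mismatch(read: int, device: int, cmask: int,
                         wmask: int, w1cmask: int) -> int:
    """Return the bits of one config byte that the target would reject.

    Assumed rule: a bit is checked if it is set in cmask and is neither
    guest-writable (wmask) nor write-1-to-clear (w1cmask). Any checked
    bit that differs between the incoming value ("read") and the
    target's own value ("device") counts as a mismatch.
    """
    return (read ^ device) & cmask & ~wmask & ~w1cmask & 0xFF


if __name__ == "__main__":
    # Values taken from the two error messages quoted in this bug.
    for read in (0xE1, 0xA1):
        bad = config_byte_mismatch(read, 0x01, 0xFF, 0xC0, 0x00)
        print(f"read={read:#04x} mismatching bits={bad:#04x}")
```

Under that assumption, both reported values (read: e1 and read: a1) differ from the target only in bit 5 of the byte at offset 0x10, which is the low byte of the first BAR. That bit is writable on the source but fixed on the target, which is consistent with the BAR size differing between the two qemu versions.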