Removal of former release machine types breaks VMs using them

Bug #1626070 reported by Iain Lane
This bug affects 7 people
Affects: qemu (Ubuntu)
Status: Fix Released
Importance: Critical
Assigned to: Christian Ehrhardt

Bug Description

I just got 1:2.6.1+dfsg-0ubuntu3. Now, when starting some of my VMs:

laney@nightingale> virsh start xenial-vm
error: Failed to start domain xenial-vm
error: internal error: process exited while connecting to monitor: 2016-09-21T12:20:27.252537Z qemu-system-x86_64: -enable-kvm: unsupported machine type
Use -machine help to list supported machines

It's because the configuration has

  <os>
    <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
  </os>

(I created the VM on a system which didn't know about xenial at the time). Fixing it to xenial there works.
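
The manual fix described above amounts to changing one attribute in the domain XML. A minimal Python sketch of that edit, assuming the XML has already been obtained (for example from `virsh dumpxml`); the function name `retarget_machine_type` is hypothetical and not part of libvirt:

```python
import xml.etree.ElementTree as ET


def retarget_machine_type(domain_xml: str, old: str, new: str) -> str:
    """Return domain_xml with <os><type machine=old> changed to new.

    Hypothetical helper for illustration; it only edits the XML text and
    does not talk to libvirt.
    """
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    if os_type is None:
        raise ValueError("domain XML has no <os><type> element")
    if os_type.get("machine") == old:
        os_type.set("machine", new)
    return ET.tostring(root, encoding="unicode")


# Trimmed stand-in for the affected domain definition.
example = """<domain type='kvm'>
  <name>xenial-vm</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
  </os>
</domain>"""

fixed = retarget_machine_type(example, "pc-i440fx-wily", "pc-i440fx-xenial")
print(fixed)
```

In practice the same change can be made interactively with `virsh edit xenial-vm`.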

However, this took some effort to work out - as you can see the error message is particularly cryptic. In fact I was using virt-manager to launch it, but that just prints the same message.

Is it possible to deal with this somehow? Like by restoring the old releases, migrating VMs to a newer one, or at least printing a better message.

Tags: patch
Iain Lane (laney) wrote :

Christian - thoughts? :)

Changed in qemu (Ubuntu):
assignee: nobody → ChristianEhrhardt (paelzer)
Christian Ehrhardt  (paelzer) wrote :

We knew it would happen for migrations, but you are right, it is critical if people just start old (wrong) definitions.
We certainly fixed the right issue, but yes, the (correct) removal of the wily type, which was the (wrong) default of xenial before, can break new starts of existing machines.

Damn - yes we have to add the type again ASAP - but not as default.
To avoid too much fallout.

Thanks for the report, we did so much testing on migrations and pre/post upgrade things that we seemed to miss the obvious one.

I'll create a debdiff asap and upload.
That wily type will be identical to the one we had (which also is the same as the new ubuntu.xenial type which is why the reported fix worked)

Christian Ehrhardt  (paelzer) wrote :

Well, actually, on checking while prepping something - we didn't remove the pc-i440fx-wily type in Xenial.
Precisely to avoid this issue ...

It was only removed in Yakkety, as migrations have to go "through" Xenial.

So to confirm - you are likely on a Yakkety system and currently struggle to start machines you formerly created on xenial.

Christian Ehrhardt  (paelzer) wrote :

Yes, I see from your qemu version that you are on yakkety - ok, good.
So the SRU did not break Xenial; "only" certain cases on Yakkety are affected.

thinking ...

So while anything you'd start once the Xenial SRU lands will work on yakkety, you are right and have identified a gap where things can break: Xenial guests created earlier (pre SRU) - which due to an older bug appear as wily guests - can't be started on Yakkety.

https://wiki.ubuntu.com/QemuKVMMigration defined the dropping of all types before a former LTS release on the next dev release.

But due to the former issue on Xenial that means on Yakkety we have to maintain the "Xenial type" in its old AND new form.

Ok, now all fits together - I can prep something for that.

Changed in qemu (Ubuntu):
status: New → In Progress
importance: Undecided → Critical
Christian Ehrhardt  (paelzer) wrote :

Debdiff ready and currently building for a local test - here attached for review and later reference.

Christian Ehrhardt  (paelzer) wrote :

Add the broken wily type that xenial used as default.
Drop the default and alias, but keep everything else as-is to avoid issues when upgrading with VMs created on earlier Xenial systems.
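
To illustrate what "drop the default and alias, but keep everything else" means for lookups, here is a toy Python model of a machine-type table. It is not QEMU's implementation: the `MachineType` class, the `resolve` function, and the `pc-i440fx-yakkety` entry (with the `ubuntu` alias pointing at it) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MachineType:
    name: str
    alias: Optional[str] = None   # extra name that resolves to this type
    is_default: bool = False      # used when no machine type is requested


# Toy table: the wily type is retained, but without alias or default flag.
MACHINE_TYPES = [
    MachineType("pc-i440fx-yakkety", alias="ubuntu", is_default=True),  # assumed entry
    MachineType("pc-i440fx-xenial"),
    MachineType("pc-i440fx-wily"),  # retained so old Xenial guests still start
]


def resolve(requested: Optional[str]) -> MachineType:
    """Look up a machine type by name or alias; None selects the default."""
    for mt in MACHINE_TYPES:
        if requested is None and mt.is_default:
            return mt
        if requested is not None and requested in (mt.name, mt.alias):
            return mt
    raise LookupError(f"unsupported machine type: {requested!r}")


print(resolve("pc-i440fx-wily").name)  # old guests resolve by explicit name
print(resolve(None).name)              # new guests never get wily implicitly
```

With this shape, existing definitions that name `pc-i440fx-wily` keep working, while nothing newly created picks it up through the default or the alias.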

Christian Ehrhardt  (paelzer) wrote :

Things look good to me now; packages to verify are currently building at https://launchpad.net/~paelzer/+archive/ubuntu/qemu-machine-type-dev

While this is urgent, and I expect more affected users to show up over the day, there is no reason to rush and make it worse - so testing again ...

@Iain - As just mentioned on IRC, I'll ping you once they are published so you can test as well.

Christian Ehrhardt  (paelzer) wrote :

Still building, but for what it is worth, the test suite that checks proposed/PPAs now has an "upgrade test" that would have covered this. Learn from your mistakes ...

Christian Ehrhardt  (paelzer) wrote :

Did a local test while Launchpad builds and publishes:
1. create a KVM-capable xenial container
2. in the Xenial container, check the default machine type
  KVM defaults to wily (that was one part of the old bug)
  $ kvm -M help | grep wily
    pc-i440fx-wily Ubuntu 15.04 PC (i440FX + PIIX, 1996) (default)
3. create a guest and check its machine type
  $ virsh dumpxml testme | xmllint --xpath "string(//domain/os/type/@machine)" -
    pc-i440fx-wily
4. upgrade to yakkety
   dpkg -l shows 1:2.6.1+dfsg-0ubuntu3 now
5. check status of guest
  $ virsh list
     1 testme running
     Note: not killed on upgrade of the package as expected
  $ virsh dumpxml testme | xmllint --xpath "string(//domain/os/type/@machine)" -
     pc-i440fx-wily
  $ virsh shutdown testme
  $ virsh start testme
     error: Failed to start domain testme
     error: internal error: process exited while connecting to monitor: 2016-09-21T14:08:54.005107Z qemu-system-x86_64: -enable-kvm: unsupported machine type
     Use -machine help to list supported machines
6. insert locally built new version of qemu via local signed archive
    dpkg -l shows 1:2.6.1+dfsg-0ubuntu4 now
7. check if it starts correctly now
  $ virsh start testme
  $ virsh list
     3 testme running
  $ virsh dumpxml testme | xmllint --xpath "string(//domain/os/type/@machine)" -
     pc-i440fx-wily
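
The repeated `virsh dumpxml ... | xmllint --xpath ...` check in steps 3, 5, and 7 can also be scripted. A small Python sketch, assuming the domain XML and the set of supported machine types are passed in as plain values; `machine_type_of` and `check_startable` are hypothetical names, and the message wording is only a suggestion:

```python
import xml.etree.ElementTree as ET


def machine_type_of(domain_xml: str) -> str:
    """Return the machine attribute of <domain><os><type>, or "" if absent.

    Mirrors the XPath string(//domain/os/type/@machine) used with xmllint.
    """
    os_type = ET.fromstring(domain_xml).find("./os/type")
    return "" if os_type is None else os_type.get("machine", "")


def check_startable(domain_xml: str, supported: set) -> str:
    """Return a readable verdict instead of a bare 'unsupported machine type'."""
    machine = machine_type_of(domain_xml)
    if machine in supported:
        return f"ok: machine type '{machine}' is supported"
    known = ", ".join(sorted(supported))
    return f"cannot start: machine type '{machine}' is not supported (known: {known})"


# Trimmed stand-in for `virsh dumpxml testme` output.
testme_xml = """<domain type='kvm'>
  <name>testme</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
  </os>
</domain>"""

# Abbreviated, illustrative sets; a real qemu lists many more types.
before_fix = {"pc-i440fx-xenial"}
after_fix = before_fix | {"pc-i440fx-wily"}

print(check_startable(testme_xml, before_fix))
print(check_startable(testme_xml, after_fix))
```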

Christian Ehrhardt  (paelzer) wrote :

ppa now built and published, ready for verification

Scott Moser (smoser) wrote :

Christian, I just verified that adding your ppa and upgrading qemu and libvirt-bin fixes the problem.

I reproduced the original reported issue simply with:
uvt-kvm create sm1 release=yakkety

tags: added: patch
Christian Ehrhardt  (paelzer) wrote :

Yeah, Iain Lane just reported the same; thanks smoser for the extra check - uploading ...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.6.1+dfsg-0ubuntu4

---------------
qemu (1:2.6.1+dfsg-0ubuntu4) yakkety; urgency=medium

  * retain older xenial machine type to avoid issues starting guests
    created on xenial prior to the SRU for bug 1621042. In that regard the old
    broken xenial machine type and the new fixed one have both to be considered
    as valid LTS machine types (LP: #1626070).

 -- Christian Ehrhardt <email address hidden> Wed, 21 Sep 2016 14:57:09 +0200

Changed in qemu (Ubuntu):
status: In Progress → Fix Released
richud (richud.com) wrote :

This bug has re-appeared as pc-i440fx-zesty (17.04) is not in the qemu machine types in cosmic (18.10) - I can now no longer run any snapshots I made when I was running 17.04.

error: unsupported configuration: Target domain machine type pc-i440fx-cosmic does not match source pc-i440fx-zesty

$ qemu-system-x86_64 -machine help
Supported machines are:
pc-i440fx-xenial Ubuntu 16.04 PC (i440FX + PIIX, 1996)
pc-i440fx-wily Ubuntu 15.04 PC (i440FX + PIIX, 1996)
pc-i440fx-trusty Ubuntu 14.04 PC (i440FX + PIIX, 1996)
ubuntu Ubuntu 18.10 PC (i440FX + PIIX, 1996) (alias of pc-i440fx-cosmic)
pc-i440fx-cosmic Ubuntu 18.10 PC (i440FX + PIIX, 1996) (default)
pc-i440fx-cosmic-hpb Ubuntu 18.10 PC (i440FX + PIIX +host-phys-bits=true, 1996)
pc-i440fx-bionic Ubuntu 18.04 PC (i440FX + PIIX, 1996)
pc-i440fx-bionic-hpb Ubuntu 18.04 PC (i440FX + PIIX, +host-phys-bits=true, 1996)
pc-i440fx-artful Ubuntu 17.10 PC (i440FX + PIIX, 1996)
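
A listing like the one above can be checked mechanically before trying to start or restore a guest. A rough Python sketch, assuming the `-machine help` output has the "name, then description" shape shown here; `parse_machine_help` is a hypothetical helper and the sample is an excerpt of the listing above:

```python
def parse_machine_help(text: str) -> dict:
    """Map each machine name from `-machine help` output to its description."""
    machines = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the "Supported machines are:" header
        name, _, description = line.partition(" ")
        machines[name] = description.strip()
    return machines


# Excerpt of the listing above.
sample = """Supported machines are:
pc-i440fx-xenial Ubuntu 16.04 PC (i440FX + PIIX, 1996)
ubuntu Ubuntu 18.10 PC (i440FX + PIIX, 1996) (alias of pc-i440fx-cosmic)
pc-i440fx-cosmic Ubuntu 18.10 PC (i440FX + PIIX, 1996) (default)
pc-i440fx-artful Ubuntu 17.10 PC (i440FX + PIIX, 1996)
"""

machines = parse_machine_help(sample)
wanted = "pc-i440fx-zesty"
if wanted not in machines:
    print(f"{wanted} is not offered by this qemu; available: {sorted(machines)}")
```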

richud (richud.com) wrote :

recurrence of bug

Christian Ehrhardt  (paelzer) wrote :

Hi Richud,
the bug here was a special case: the wily type was used in Xenial and therefore has to be retained until the EOL of xenial has passed.
It is different for other machine types like the ones you see issues with at the moment.

Keeping the old types "almost forever" is a common habit, as people usually just don't think about them. That has caused several users much more pain later on. Therefore the removal is intended to (please choose your preferred word) hint/encourage/enforce users to finally upgrade machine types to pick up new features, stability and security fixes bound to new types only.

See https://wiki.ubuntu.com/QemuKVMMigration#Support_Matrix

Due to that the qemu in Ubuntu 18.10 dropped Zesty and Yakkety intentionally.

richud (richud.com) wrote :

Hi Christian,

Thank you for replying so quickly.

The 'old type' is only a year old which seems a bit early to be deprecated?
Sorry if I am missing something, but I see no way to fix all my broken snapshots without the previous machine type existing? (FAQ doesn't mention snapshots)

Perhaps this should be reworded too?
"Each following release will keep the previously defined aliases to the specific types for compat. Adding a delta added have to make sure to not only add a new, but also maintain compatibility for the old type. "

Christian Ehrhardt  (paelzer) wrote :

Hi again Richud,
the non LTS releases are on a 9 month period until EOL - that is what I followed.
Anyway we are not here to discuss support times but trying to help you.

Unfortunately, in libvirt/qemu there are so many things called snapshots that I can't help right away. Would you mind opening a new bug to discuss this there and leaving the old one here as-is?

Please outline there:
1. the way (command) you have taken your snapshots
2. the way (command) you try to re-start your snapshots
3. versions of the components installed (I guess apport-bug will do that for you)
