migration fails between 12.04 Precise and 14.04 Trusty

Bug #1291321 reported by Valentijn Sessink
This bug affects 6 people
Affects            Status         Importance  Assigned to  Milestone
libvirt (Ubuntu)   Fix Released   High        Unassigned
  Trusty           Fix Released   Undecided   Unassigned
qemu (Ubuntu)      Fix Released   High        Unassigned
  Trusty           Fix Released   Undecided   Unassigned

Bug Description

We're trying to live migrate machine "fhdhvalentijn" to a server named "ranja". This used to work perfectly when both systems ran 12.04 Precise. After upgrading the target machine ("ranja") to Trusty, migration fails; sometimes, migration results in shutting down the VM.
Command: virsh migrate --live --copy-storage-all --verbose fhdhvalentijn qemu+ssh://ranja/system
Expected: live migration
Result: "error: operation failed: migration job: unexpectedly failed"

Logfile on server "Ranja" says:
Length mismatch: vga.vram: 1000000 in != 800000
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed
2014-03-12 11:05:30.325+0000: shutting down

Logfile on server "Duikboot" (the source host) sometimes suddenly says: "2014-03-12 11:18:25.645+0000: shutting down" and then the VM is no longer running.

I also tried to start the migration from 14.04 (virsh -c qemu+ssh://duikboot/system migrate --live --copy-storage-all ...), and also tried to leave out "--live" (and tried various options like migrating with virt-manager). Also tried to migrate a "virgin" machine (i.e. a newly created machine with default options) - to no avail.

I did not try to migrate directly with qemu - as I have no experience with that.

Serge Hallyn (serge-hallyn) wrote :

Thanks for submitting this bug. Would it be possible for you to test with the qemu package in ppa:ubuntu-virt/candidate to see whether that works any better?

Changed in libvirt (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Changed in qemu (Ubuntu):
status: New → Incomplete
importance: Undecided → High
Valentijn Sessink (valentijn) wrote :

Yes, that's possible. Should I install the ppa on source, target or both?

V.

Serge Hallyn (serge-hallyn) wrote :

Only on the target, as it only has a package for trusty.

Valentijn Sessink (valentijn) wrote :

Hi,

Installed qemu-common qemu-keymaps qemu-kvm qemu-system-common
qemu-system-x86 qemu-utils from the ppa, same error:

source (12.04): # error: operation failed: migration job: unexpectedly
failed

Log file on target (14.04, log_level = 1) says:

2014-03-13 07:03:56.550+0000: starting up
LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
QEMU_AUDIO_DRV=none /usr/bin/kvm -name fhdhvalentijn -S -machine
pc-1.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp
1,sockets=1,cores=1,threads=1 -uuid f46f2877-62b2-5c2c-0b76-f62e6773dbff
-no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/fhdhvalentijn.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -boot strict=on -kernel
/tmp/migrate/vmlinuz-2.6.32-53-generic -initrd
/tmp/migrate/initrd.img-2.6.32-53-generic -append root=/dev/vda ro
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/dev/mapper/OpenOffice-filmhuisrestorevalentijn,if=none,id=drive-virtio-disk0,format=raw
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b3:d1:57,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming tcp:[::]:49152 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/1 (label charserial0)
Length mismatch: vga.vram: 1000000 in != 800000
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed
2014-03-13 07:03:56.922+0000: shutting down

Valentijn Sessink (valentijn) wrote :

The "1000000" comes from vga.h in the source for qemu-kvm-1.0+noroms: #define VGA_RAM_SIZE (16 * 1024 * 1024)
I could not find the "800000" value - as far as I can see, vga ram size is not fixed anymore in the qemu-1.7.0+dfsg source - but I did not fully understand the code.
Anyway, I tried something new and removed the video device from the machine. Now a new error pops up:
Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000

Anything else I can check?
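
The sizes in these "Length mismatch" messages appear to be hexadecimal byte counts: 0x1000000 is 16 * 1024 * 1024, which matches the VGA_RAM_SIZE definition quoted above. A minimal Python sketch for decoding such lines follows. The helper name decode_mismatch is made up for illustration, and reading the first number as the size sent by the source and the second as the size allocated on the target is an assumption based on the wording of the message.

```python
import re

# Pattern for the message seen in the target's log. Both sizes are
# assumed to be hexadecimal byte counts printed without a 0x prefix.
MISMATCH = re.compile(
    r"Length mismatch: (?P<block>\S+): "
    r"(?P<incoming>[0-9a-f]+) in != (?P<local>[0-9a-f]+)"
)


def decode_mismatch(line):
    """Return (block name, incoming bytes, local bytes), or None if no match."""
    m = MISMATCH.search(line)
    if m is None:
        return None
    return m["block"], int(m["incoming"], 16), int(m["local"], 16)


# The two messages reported in this bug.
log_lines = [
    "Length mismatch: vga.vram: 1000000 in != 800000",
    "Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000",
]

for line in log_lines:
    block, incoming, local = decode_mismatch(line)
    print(f"{block}: source sent {incoming // 1024} KiB, "
          f"target expects {local // 1024} KiB")
```

Read this way, the source sends 16 MiB of VGA RAM where the target expects 8 MiB, and a 64 KiB virtio-net option ROM where the target expects 128 KiB.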

Valentijn Sessink (valentijn) wrote :

Hi, is there anything I can do to help this bug's status?

Serge Hallyn (serge-hallyn) wrote :

Sorry, that was meant to be reset to New. I'm intending to set up reproducers today and get to the bottom of it. This definitely should be fixed somehow before release. Thanks for reporting this.

Changed in libvirt (Ubuntu):
status: Incomplete → New
Changed in qemu (Ubuntu):
status: Incomplete → New
Changed in qemu (Ubuntu):
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Adding

 -global cirrus-vga.vgamem_mb=10

to the receiving end's qemu command line options brings us past that error. You may end up with other incompatibilities (I did) but perhaps libvirt will ensure the rest of the devices match up. Please let us know how it fares with that option.

Serge Hallyn (serge-hallyn) wrote :

Sorry, I guess that doesn't help much with your use of libvirt.

Can you please show the xml for the original VM you are migrating? It can and should be specifying a vga ram size.

Valentijn Sessink (valentijn) wrote :

When you add a machine with virt-manager, the display setting is <model type='cirrus' vram='9216' heads='1'/>. I tried to set it to 10240 (which is 10mb) but that didn't help - the error is the same.

Valentijn Sessink (valentijn) wrote :

BTW, that last test was with the "regular" qemu-kvm in trusty, which is 1.7.0+dfsg-3ubuntu6. I upgraded to the ppa-version again and ... it seems to do something. BRB.

Valentijn Sessink (valentijn) wrote :

"error: Unable to read from monitor: Connection reset by peer" and the machine crashed :-(
Next try, same as always: "Length mismatch: vga.vram: 1000000 in != 800000"
That's a weird side effect, sometimes not only migration fails, but the machine on the source host crashes, too. Did you reproduce that as well?

Serge Hallyn (serge-hallyn) wrote :

Quoting Valentijn Sessink (<email address hidden>):
> When you add a machine with virt-manager, the display setting is <model
> type='cirrus' vram='9216' heads='1'/>. I tried to set it to 10240 (which
> is 10mb) but that didn't help - the error is the same.

Yes, oddly the way to specify this is to add

 <qemu:commandline>
  <qemu:arg value='-global'/>
  <qemu:arg value='cirrus-vga.vgamem_mb=8'/>
 </qemu:commandline>

to the domain xml. However, when I tried adding that in precise it
seems to immediately disappear. It could be something odd about my
setup so is worth your trying it.

Serge Hallyn (serge-hallyn) wrote :

Quoting Valentijn Sessink (<email address hidden>):
> "error: Unable to read from monitor: Connection reset by peer" and the machine crashed :-(
> Next try, same as always: "Length mismatch: vga.vram: 1000000 in != 800000"
> That's a weird side effect, sometimes not only migration fails, but the machine on the source host crashes, too. Did you reproduce that as well?

I've not gotten the original VM to crash yet, that would be
particularly bad.

Valentijn Sessink (valentijn) wrote :

Hi,

Adding <qemu:commandline> tags needs a namespace. What you do is:
12.04~# virsh edit machine
Now add the namespace. The resulting first line of the domain xml-file should read:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
After that, adding <qemu:commandline> tags will work, but alas:

# virsh start fhdhvalentijn
error: Failed to start domain fhdhvalentijn
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/12
kvm: Property 'cirrus-vga.vgamem_mb' not found

So 12.04 doesn't have the option. I even tried to trick the system into thinking it did have a cirrus-vga.vgamem_mb thing, by 1) leaving the option out; 2) starting fhdhvalentijn 3) virsh edit fhdhvalentijn and adding the cirrus-option; then migrating the thing, but that doesn't work either.
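
The XML edit described in this comment (declare the qemu namespace on <domain>, then append a <qemu:commandline> block) can be sketched in Python with the standard library. This is only an illustration of the edit itself: the function name add_qemu_args and the stripped-down sample domain are made up, and, as found above, the qemu-kvm in 12.04 rejects the cirrus-vga.vgamem_mb property regardless of how the XML is produced.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the comment above.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"

# Make ElementTree serialise the namespace with the "qemu" prefix.
ET.register_namespace("qemu", QEMU_NS)


def add_qemu_args(domain_xml, args):
    """Append a <qemu:commandline> element holding one <qemu:arg> per argument.

    Sketch only: it does not merge with an existing <qemu:commandline>.
    """
    root = ET.fromstring(domain_xml)
    cmdline = ET.SubElement(root, f"{{{QEMU_NS}}}commandline")
    for arg in args:
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": arg})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily reduced domain definition.
sample = "<domain type='kvm'><name>fhdhvalentijn</name></domain>"
patched = add_qemu_args(sample, ["-global", "cirrus-vga.vgamem_mb=8"])
print(patched)
```

Because the namespace is registered before serialising, the xmlns:qemu declaration ends up on the root <domain> element, which is the part that is easy to forget when editing by hand.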

Serge Hallyn (serge-hallyn) wrote :

> So 12.04 doesn't have the option. I even tried to trick the system into

Right, that was the conclusion I unfortunately came to yesterday. So I
will look at the qemu source and see if there is a reasonable way to
resize the vram when the mismatch is found at migration time.

Valentijn Sessink (valentijn) wrote :

Please note my comment #5: removing the vga adapter exposed a new mismatch, this time for the virtio network adapter. I did not look into that, but you might want to check whether there are more memory mismatches that should be addressed before you start adding code.

Serge Hallyn (serge-hallyn) wrote :

Yes, I did run into that yesterday as well. However, we have to
start somewhere :)

Serge Hallyn (serge-hallyn) wrote :

After trying a few more things and discussing with upstream, I'm afraid the answer will be to mark this as unsupported in the release notes. Because users could be using -M pc-1.0 in raring+, we cannot simply use the old qemu-kvm values for pc-1.0 and lower machines, as that would break users from newer releases.

The one remaining possibility would be to publish an alternate qemu package (perhaps in a ppa) which uses the older values, and to advise users who need to migrate images from precise to install that package. Key to this would be to make sure that users pick a higher machine type when starting new VMs, so that those VMs can be migrated to the new qemu values in subsequent releases.

In order to make sure that this is not an issue after trusty, we will define a trusty-specific machine type.

Changed in libvirt (Ubuntu):
status: New → Confirmed
Valentijn Sessink (valentijn) wrote :

Ok, I understand. Please note that this was also an issue for 10.04 -> 12.04, so it's definitely a good idea to have it fixed for future upgrades (aka "trusty specific machine type").
I'm not sure about the extra qemu. The next action after migration would be to install the "real" qemu, right? After which I should stop-start (as opposed to re-start!) all VMs to make sure they're under a trusty-compatible qemu? Or am I missing something here?

Serge Hallyn (serge-hallyn) wrote :

> I'm not sure about the extra qemu. The next action after migration
> would be to install the "real" qemu, right? After which I should
> stop-start (as opposed to re-start!) all VM's to make sure they're
> under a trusty-compatible qemu? Or am I missing something here?

Right, it would be a very short-term solution, and really not worth
the time and risk.

Valentijn Sessink (valentijn) wrote :

Otoh, now we're facing a non-migratable 12.04 in general, because you cannot move your VMs out of the way :-(
Anyway, proper documentation is helpful in any case. Finally: there is no real workaround now, or is there? What would you do, having several 12.04 host machines? Just migrate everything with the VM's running?

Serge Hallyn (serge-hallyn) wrote :

Now I suppose one simple workaround would be to build the qemu-kvm package
from precise on trusty...

Valentijn Sessink (valentijn) wrote :

Unfortunately, no, that doesn't seem to work either. I managed to compile qemu-kvm-1.0+noroms (required a few minor changes, linking with librt and a documentation difference) and the supporting bios and pxe files on Trusty. Migration now seems to work (Completed 100 %), but then the target libvirt spews out a "kvm: Features 0x100000d4 unsupported. Allowed features: 0x71000454" and the guest machine becomes non responsive (feels like a disk issue) - a hard shutdown (aka destroy) - on the target - is required, which makes the whole live migration effort a bit silly :-S

Doug Smythies (dsmythies) wrote :

My comment may not apply to this, but I'll make it anyhow:

I upgraded my host server from ubuntu 12.04 to 14.04. Afterwards, none of my desktop VMs worked (I haven't even tried the server VMs yet). I made a new guest VM using the exact same "virt-install" command I had used for the original VM creation on the 12.04 host server, and compared the resulting .xml file. There were 3 or 4 differences, but this one:

<type arch='x86_64' machine='pc-1.0'>hvm</type>

in 14.04 became this:

<type arch='x86_64' machine='pc-i440fx-1.7'>hvm</type>

I used "virsh edit" and changed that line and the guest VM originally created under a 12.04 server host then worked under the 14.04 server host.

Note: So far, this change has been made to 6 VMs (5 are either 14.04 originally or as upgraded from 13.10, and one was a 12.04 Desktop guest VM) and 5 of them are now working. The remaining one fails for only one of the 5 possible login session types (GNOME; GNOME Classic; GNOME Flashback (Compiz); GNOME Flashback (Metacity); Ubuntu (Default)).

Ryan Beisner (1chb1n) wrote :

@Doug I saw the same in a 12.04 to 14.04 upgrade, also had to edit the machine type. Other than that one edit, all of my VMs worked after the Trusty upgrade. Take note that this triggers Windows VMs to have to re-activate due to the changed 'hardware.'

But I think this bug is more about a live migration scenario between two virtual machine hosts, source on 12.04 and the destination running 14.04.

Serge Hallyn (serge-hallyn) wrote :

Indeed, as Ryan suggests this is different. Doug, would you mind filing
a new bug for what you are seeing? pc-1.0 should certainly still work,
although its meaning has changed a bit since 12.04. My understanding
(which may be inaccurate) was that while the change in meaning will
prevent live migration, simply booting a pre-existing VM should still
work.

Changed in qemu (Ubuntu):
status: Confirmed → Won't Fix
Changed in libvirt (Ubuntu):
status: Confirmed → Won't Fix
Matt Mullins (mokomull) wrote :

This is also blocking a precise -> trusty migration for me.

Would it be possible / feasible to provide a "pc-1.0-precise" machine type in qemu for trusty and an updated libvirt for precise that allows you to change the machine attribute on the <type> element for migration, so I could specify a migration-compatible domain configuration?

Serge Hallyn (serge-hallyn) wrote :

Hi Matt,

some feeble attempts at that were made, but the result was deemed too fragile. The safest option is to shut down the VMs, change the machine type in the xml definition, migrate the shut-down images, and re-start them in trusty.
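
The "change the machine type in the xml definition" step of this offline procedure can be sketched in Python as below. The function name set_machine_type and the reduced sample XML are made up for illustration, and pc-i440fx-1.7 is used only because it is the value a fresh 14.04 install produced in an earlier comment; which machine type suits a given guest is an assumption to verify on the trusty host.

```python
import xml.etree.ElementTree as ET


def set_machine_type(domain_xml, machine):
    """Return domain_xml with the machine attribute of <os><type> replaced."""
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    if os_type is None:
        raise ValueError("domain XML has no <os><type> element")
    os_type.set("machine", machine)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily reduced definition of a guest created on 12.04.
sample = (
    "<domain type='kvm'>"
    "<os><type arch='x86_64' machine='pc-1.0'>hvm</type></os>"
    "</domain>"
)
updated = set_machine_type(sample, "pc-i440fx-1.7")
print(updated)
```

In practice the same change is a one-line edit under "virsh edit" while the guest is shut down. Note that ElementTree rewrites namespace prefixes it does not know about, so a definition that uses the qemu: namespace would need ET.register_namespace first.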

Alex Bligh (ubuntu-alex-org) wrote :

This is pretty annoying. In a situation where you have many customer VMs running on 12.04, and want to migrate them to a host running 14.04 (so you can do a rolling OS upgrade), I'm afraid "shut down all your customer VMs and restart" isn't really an option for obvious reasons.

Equally, installing two versions of qemu, or custom versions of qemu is not really an option.

In my situation I'm not using virsh / libvirt, so adding '-machine pc-1.0 -global cirrus-vga.vgamem_mb=10' or similar would be a reasonable fix; when the VM is eventually rebooted, I can reboot without that, and the hardware will appear to be upgraded (not great, but ok).

However, this doesn't work as (as far as I can tell) there is no way to get past:
'Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000'

through command-line skulduggery.

What it seems to me one should do is define a pc-1.0-precise machine type (which is obviously not going to be used by anyone using raring etc.), and use this solely for incoming migrations. I'd produce the patch myself save I've not yet discovered where the relevant tweak for changing virtio-net-pci.rom size is.

If I find it (I've contributed to qemu before) would you take this as an SRU?

Alex Bligh (ubuntu-alex-org) wrote :

Looks like there is a patch here:
 http://pkgs.fedoraproject.org/cgit/qemu.git/tree/0001-Fix-migration-from-qemu-kvm.patch?h=f20

but it's either take it (and break inbound migrates from quantal etc.) or don't (and break inbound migrates from precise). Another possibility (unhelpful for libvirt possibly), would simply be a second binary for this purpose.

Serge Hallyn (serge-hallyn) wrote :

Thanks very much to Alex Bligh for posting a working patchset for qemu!

Test packages for both qemu and libvirt are in ppa:serge-hallyn/virt. The source of these allowed me to successfully migrate a cirros VM from precise to utopic. To trigger this, /etc/libvirt/qemu.conf must contain "incoming_assume_qemukvm = 1".
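
A quick way to confirm that the setting is really active (and not just present in a commented-out line) is sketched below in Python. The helper conf_value is made up for illustration and assumes the simple "key = value" layout with "#" comments; it would mis-handle a quoted value that itself contains "#".

```python
def conf_value(conf_text, key):
    """Return the last uncommented value assigned to key, or None."""
    value = None
    for raw in conf_text.splitlines():
        # Drop comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, _, rhs = line.partition("=")
        if name.strip() == key:
            value = rhs.strip()
    return value


# Sample text standing in for the contents of /etc/libvirt/qemu.conf.
sample = """
# incoming_assume_qemukvm = 0
incoming_assume_qemukvm = 1
"""
enabled = conf_value(sample, "incoming_assume_qemukvm") == "1"
print("incoming_assume_qemukvm enabled:", enabled)
```

On a real host the text would come from reading /etc/libvirt/qemu.conf, and libvirt typically has to be restarted before a changed value takes effect.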

Changed in qemu (Ubuntu):
status: Won't Fix → Triaged
Changed in libvirt (Ubuntu):
status: Won't Fix → Triaged
Matt Mullins (mokomull) wrote :

That's awesome. I just got a chance to test it out, and with a couple hiccups (below), I seem to have successfully migrated a guest from a precise host to a new trusty one.

It looks like qemu failed to build in Serge's PPA due to the spice that also lives in that PPA. I built the qemu from the source in that PPA, and it built without modification.

I also got 'pci_add_option_rom: failed to find romfile "pxe-virtio.rom.12.04"' the first time I did the migration with libvirt; a symlink seems to have papered over that problem. libvirt is passing "virtio-net-pci.romfile=pxe-virtio.rom.12.04", but nothing provides that file.

Serge Hallyn (serge-hallyn) wrote :

Quoting Matt Mullins (<email address hidden>):
> That's awesome. I just got a chance to test it out, and with a couple
> hiccups (below), I seem to have successfully migrated a guest from a
> precise host to a new trusty one.
>
> It looks like qemu failed to build in Serge's PPA due to the spice that
> also lives in that PPA. I built the qemu from the source in that PPA,
> and it built without modification.

D'oh, indeed I misspoke in my last comment.
I didn't want to drop that spice package so the trusty packages
were built in ppa:serge-hallyn/qemu-p-migration

> I also got 'pci_add_option_rom: failed to find romfile "pxe-
> virtio.rom.12.04"' the first time I did the migration with libvirt; a

Ah yes, I forgot to add that file to the qemu package.

Thanks for testing.

Serge Hallyn (serge-hallyn) wrote :

(Note, the patches being tested are the ones from this thread: https://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg03160.html )

Raphi (raphithom) wrote :

I installed the patched package from the ppa on our upgraded (12.04->14.04) system - however the live-migration is still not working as expected.

The kvm process just dies without any message/logs as soon as the migrate command is started on the source node. We're using Google's Ganeti to manage our cluster. The KVM instance is started with the -machine pc-1.0 option. Is there anything that needs to be adjusted manually to get live migration working?

Serge Hallyn (serge-hallyn) wrote :

@Raphi,

You say you started the instance with -machine pc-1.0. To accept incoming
migration from precise you must use -machine pc-1.0-qemu-kvm

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu Trusty):
status: New → Confirmed
Changed in qemu (Ubuntu Trusty):
status: New → Confirmed
Raphael Thoma (raphithom-4) wrote :

In the meantime I could find out that the problem only pops up when "virtio" (or virtio-net) is being used for network devices. With the e1000 nic_type everything is working as expected. Can this be seen as related to this bug?

Serge Hallyn (serge-hallyn) wrote :

Quoting Raphael Thoma (<email address hidden>):
> In the meantime I could find out that the problem only pops up when
> "virtio" (or virtio-net) is being used for network devices. With the
> e1000 nic_type everything is working as expected. Can this be seen as
> related to this bug?

Perhaps, but if so then only as evidence of why we would not want to
try to officially support this - there are too many possible remaining
incompatibilities to be able to generally say, with confidence, that
migration between 12.04 and 14.04 is reliable.

Serge Hallyn (serge-hallyn) wrote :

@Raphael,

if you're having trouble with the virtio network type, then it sounds like either you are not passing the romfile option to qemu in the trusty host (-global virtio-net-pci.romfile=pxe-virtio.rom.12.04), or you do not have the pxe-virtio.rom.12.04 file installed (in /usr/share/qemu/ as a link to /usr/lib/ipxe/qemu/pxe-virtio.rom.12.04).
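
Both conditions in this comment can be checked before attempting a migration. The Python sketch below is one way to do it; the helper names romfile_problem and romfile_global_arg are made up for illustration, and the demo runs against a temporary directory rather than the real /usr/share/qemu.

```python
import os
import tempfile
from pathlib import Path

# File name taken from the comment above.
ROM_NAME = "pxe-virtio.rom.12.04"


def romfile_problem(share_dir="/usr/share/qemu"):
    """Return a description of what is wrong with the ROM file, or None if it resolves."""
    rom = Path(share_dir) / ROM_NAME
    if rom.is_symlink() and not rom.exists():
        return f"{rom} is a dangling symlink to {os.readlink(rom)}"
    if not rom.exists():
        return f"{rom} is missing"
    return None


def romfile_global_arg():
    """Arguments for callers that build the qemu command line themselves."""
    return ["-global", f"virtio-net-pci.romfile={ROM_NAME}"]


# Demo in a scratch directory: first without the file, then with it.
with tempfile.TemporaryDirectory() as tmp:
    before = romfile_problem(tmp)
    (Path(tmp) / ROM_NAME).write_bytes(b"")
    after = romfile_problem(tmp)

print(before)
print(after)
print(" ".join(romfile_global_arg()))
```

Calling romfile_problem() with no argument checks the default location named in the comment. The dangling-symlink case is handled separately because a link whose target package is not installed looks present in a directory listing but still fails when qemu tries to load it.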

Raphael Thoma (raphithom-4) wrote :

@Serge

Thank you very much. This was exactly the missing piece in my puzzle! The live-migration between 12.04 and 14.04 is now working as expected.

Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Changed in qemu (Ubuntu):
status: Triaged → Fix Released
Changed in libvirt (Ubuntu Trusty):
status: Confirmed → Fix Released
Changed in qemu (Ubuntu Trusty):
status: Confirmed → Fix Released
Pete Ashdown (pashdown-xmission) wrote :

Having trouble with this still. I've installed the packages and made the changes to machine type and qemu as stated above. These are the errors I'm getting.

2015-10-23 19:25:57.491+0000: 7277: warning : qemuDomainObjEnterMonitorInternal:1274 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2015-10-23 19:25:57.583+0000: 7276: error : qemuMonitorIO:656 : internal error: End of file from monitor

This is the command I used to migrate:

virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose test qemu+ssh://root@kvm15/system

Serge Hallyn (serge-hallyn) wrote :

That seems like an unrelated bug. Please open a new bug, giving as much information as possible about the two hosts and the vms.

Serge Hallyn (serge-hallyn) wrote :

This doesn't really make sense, but the patch adding the pc-1.0-qemu-kvm machine
type seems to also add pc-1.0-qemu-kvm as an alias for pc-1.0, all the way
back to the original version where we introduced the patch. That is puzzling, as the package did pass SRU testing.

When I build a new package without that extra alias, the migration gets further, though it then stops on

Unknown savevm section or instance 'kvm-tpr-opt' 0

Christian Ehrhardt (paelzer) wrote :

I can confirm Serge's finding and reproduce this latest recurrence of the bug, but let's keep this one closed and continue in bug 1536331, where Serge already started on this in the past.
