KVM crashes when attempting to restart migration

Bug #855800 reported by Justin Fletcher
This bug affects 1 person
Affects              Status    Importance   Assigned to   Milestone
QEMU                 Expired   Undecided    Unassigned
qemu-kvm (Ubuntu)    Expired   High         Unassigned

Bug Description

Sequence of operations that triggers the crash:

    * Start two kvm systems, one on gerph (primary), one on nbuild2 (listening for incoming migration) - do not use -daemonize
    * On gerph, connect to monitor.
    * "migrate -d -b tcp:nbuild2:4444"
    * "info migrate"
    * "migrate_cancel"
    * "info migrate"
    * "migrate -d -b tcp:nbuild2:4444"
    * crashed with assertion:

        kvm: block-migration.c:355: flush_blks: Assertion `block_mig_state.read_done >= 0' failed.
        Connection closed by foreign host.
        [1]+ Aborted (core dumped) kvm -drive file=./copy-disk2.img,boot=on -m 4096 -serial mon:telnet::23023,server,nowait -balloon virtio -vnc :99 -usbdevice tablet -net nic,macaddr=f6:a6:31:53:89:9a,model=rtl8139,vlan=0 -net tap,vlan=0

Repeating the operations above often crashes at different points; just keep repeating the cancel and restart. Because the KVM process dies, the guest VM running in it is terminated as well.
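Since the crash shows up after repeated cancel/restart cycles, the monitor steps can be scripted. The following is a minimal, hypothetical sketch that is not part of the original report. It assumes the source VM on gerph was also started with a dedicated monitor such as `-monitor tcp:127.0.0.1:4445,server,nowait`. The original command line multiplexes the monitor onto the serial telnet port, which this script does not handle. The names `cycle_commands`, `run_commands`, `MONITOR_ADDR`, and the 1-second delay are illustrative choices.

```python
import socket
import time

# Assumption: the source VM exposes a dedicated HMP monitor on this address,
# e.g. via "-monitor tcp:127.0.0.1:4445,server,nowait" (NOT in the original report).
MONITOR_ADDR = ("127.0.0.1", 4445)
# Destination from the report; nbuild2 was started listening for an incoming migration.
TARGET_URI = "tcp:nbuild2:4444"


def cycle_commands(target_uri, cycles):
    """Build the monitor command sequence from the report, repeated `cycles` times."""
    commands = []
    for _ in range(cycles):
        commands += [
            f"migrate -d -b {target_uri}",  # start a background block migration
            "info migrate",
            "migrate_cancel",
            "info migrate",
        ]
    # The restart after a cancel is the step that hit the assertion in the report.
    commands.append(f"migrate -d -b {target_uri}")
    return commands


def run_commands(addr, commands, delay=1.0):
    """Send each command to the monitor.

    Returns the index of the command at which the connection was lost,
    or None if every command was sent without the monitor going away.
    """
    with socket.create_connection(addr, timeout=5) as sock:
        for index, command in enumerate(commands):
            try:
                sock.sendall((command + "\n").encode())
                time.sleep(delay)  # arbitrary pause so QEMU can act before we read
                output = sock.recv(65536)
            except socket.timeout:
                continue  # no output yet; the process may still be alive
            except OSError:
                return index  # reset or broken pipe: QEMU most likely aborted
            if not output:
                return index  # orderly close: the monitor went away
            print(output.decode(errors="replace"), end="")
    return None
```

For example, from a Python session on gerph: `run_commands(MONITOR_ADDR, cycle_commands(TARGET_URI, 5))`. A non-None result means the monitor connection dropped at that command. This is consistent with the abort in the log, but the script cannot tell a crash apart from any other disconnect, so the QEMU process and its core dump still need to be checked.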

Distribution:

jfletcher@gerph:~$ lsb_release -rd
Description: Ubuntu 10.04.3 LTS
Release: 10.04

Package:

jfletcher@gerph:~$ apt-cache policy kvm
kvm:
  Installed: 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.15
  Candidate: 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.15
  Version table:
 *** 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.15 0
        500 http://gb.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        500 http://security.ubuntu.com/ubuntu/ lucid-security/main Packages
        100 /var/lib/dpkg/status
     1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 0
        500 http://gb.archive.ubuntu.com/ubuntu/ lucid/main Packages

Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to submit this bug and helping to make Ubuntu better.

Just to be sure I understand right, if you simply let the migration continue rather than canceling it, you don't get an error, right? I'll mark this low priority under that assumption. If I'm wrong, then priority should be raised.

(Leaving status New until I manage to reproduce)

Changed in qemu-kvm (Ubuntu):
importance: Undecided → Low
Justin Fletcher (gerph) wrote :

That's correct for the testing I have performed.

I have been able to perform repeated migrate/migrate_cancel cycles much more quickly than complete migrations, so I have tested restarting a migration after a cancel at least an order of magnitude more often than I have tested letting a migration complete.

Background in case it's relevant:
I was doing this to test the behaviour if (for example) the target system failed during the migration and it was necessary to cancel and restart, as such resilience is important for the services I maintain.

If there's any more information required, I'm happy to provide help :-)

Justin Fletcher (gerph) wrote :

If you *need* to use live migration (rather than offline migration by copying the disk images), you have already decided that the service is so important that you cannot afford downtime on it. If live migration can fail, and resuming it can crash (as reported), this is a serious concern and most likely not a risk you would wish to take with a service you have already decided is too vital to go down.

A migration feature that might crash when used is not one I would trust my valuable services to.

Therefore I would suggest that this crash be given the same priority as the migration feature itself. If migration is a low-priority feature, then 'Low' priority would be fine, but if live migration is an important feature to have, then it needs to be solid.

As an administrator of services, I play have a game of Russian-roulette with them, and migration is that game at present.

Justin Fletcher (gerph) wrote :

Oops, I meant "I cannot play a game of ..."

tags: added: lucid migration
tags: added: crash
Serge Hallyn (serge-hallyn) wrote :

Definitely confirmed on lucid.

Changed in qemu-kvm (Ubuntu):
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

In Quantal, migration always fails, but even so, after one failed attempt, if I do 'migrate_cancel' and retry the migration, I get the same error.

Serge Hallyn (serge-hallyn) wrote :

This causes you to lose your VM state.

Changed in qemu-kvm (Ubuntu):
importance: Low → High
Serge Hallyn (serge-hallyn) wrote :

Upstream git HEAD QEMU still behaves the same as Quantal's qemu-kvm (1.1.0); marking as affecting upstream.

Thomas Huth (th-huth) wrote :

Can you still reproduce this issue with the latest version of QEMU (currently v2.8)?

Changed in qemu:
status: New → Incomplete
Justin Fletcher (gerph) wrote :

I haven't attempted to reproduce the issue recently, I'm afraid. I've changed jobs twice in the intervening time, so the immediate issue for me has gone away. If I find an opportunity, I shall try to reproduce with the most recent versions.

Thomas Huth (th-huth)
Changed in qemu-kvm (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for qemu-kvm (Ubuntu) because there has been no activity for 60 days.]

Changed in qemu-kvm (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired