Comment 5 for bug 1524898

Dr. David Alan Gilbert (dgilbert-h) wrote :

Looking at today's failures, which use a much more modern qemu, we've got a similar case:

http://logs.openstack.org/55/328055/12/check/gate-tempest-dsvm-multinode-live-migration/e4d241f/logs/subnode-2/libvirt/libvirtd.txt.gz#_2016-06-28_12_51_58_804
http://logs.openstack.org/55/328055/12/check/gate-tempest-dsvm-multinode-live-migration/e4d241f/logs/libvirt/qemu/instance-00000003.txt.gz

a) 12:51:58.804+0000: receives a query-migrate reply saying all is fine
b) 12:51:58.908+0000: receives a STOP from the source qemu saying it's nearly finished
c) 12:52:01.358+0000: ! receives a query-migrate reply saying migration has failed
d) 12:52:01.377760Z qemu-system-x86_64: load of migration failed: Input/output error
e) 12:52:01.360+0000: "event": "RESUME"
f) 12:52:03.136445Z qemu-system-x86_64: terminating on signal 15 from pid 18359

(d) comes from the destination.
OK, so migration failed - but we can't really tell who blinked first; the destination gives the I/O error probably because the source migration went away from its point of view (probably!), either due to a network issue or the source dying. Given that we see the migration failure slightly earlier on the source, it's probably the source dying by itself - but unfortunately there's no error in the source qemu log before it quits.
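
To make the "slightly earlier" comparison concrete, here is a rough sketch that puts the six timestamps above on one timeline. It is only an illustration: the date is taken from the anchor in the libvirtd log URL, the helper names are made up for this comment, and it assumes the clocks on the two hosts are closely in sync, which these logs don't prove.

```python
from datetime import datetime, timezone

LOG_DATE = "2016-06-28"  # taken from the anchor in the libvirtd log URL


def parse_libvirtd(ts):
    """Parse a libvirtd-style time such as '12:51:58.804+0000' (hypothetical helper)."""
    return datetime.strptime(f"{LOG_DATE} {ts}", "%Y-%m-%d %H:%M:%S.%f%z")


def parse_qemu(ts):
    """Parse a qemu-log-style time such as '12:52:01.377760Z' (hypothetical helper)."""
    naive = datetime.strptime(f"{LOG_DATE} {ts.rstrip('Z')}", "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


# The events (a)-(f) exactly as listed in the comment.
events = [
    ("a", parse_libvirtd("12:51:58.804+0000")),  # query-migrate reply: all fine
    ("b", parse_libvirtd("12:51:58.908+0000")),  # STOP from the source qemu
    ("c", parse_libvirtd("12:52:01.358+0000")),  # query-migrate reply: failed
    ("d", parse_qemu("12:52:01.377760Z")),       # destination: load of migration failed
    ("e", parse_libvirtd("12:52:01.360+0000")),  # RESUME event
    ("f", parse_qemu("12:52:03.136445Z")),       # terminating on signal 15
]

# Sort chronologically; this is only meaningful if the host clocks agree.
ordered = sorted(events, key=lambda item: item[1])

by_label = dict(events)
# Gap between the failure being reported (c) and the destination's I/O error (d).
gap_ms = (by_label["d"] - by_label["c"]).total_seconds() * 1000

for label, when in ordered:
    print(label, when.isoformat())
print(f"(c) -> (d) gap: {gap_ms:.2f} ms")
```

The gap between (c) and (d) is only a few tens of milliseconds, so even a small clock skew between the hosts could flip the order; that's why this is a "probably" rather than a conclusion.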

We need some debugging in the version of qemu that's run - unfortunately the Ubuntu qemu packages don't seem to have any of qemu's tracing turned on either.