Comment 15 for bug 1297218

Paul Boven (p-boven) wrote :

I've repeated the experiment without any shared storage, so that eliminates GlusterFS as a suspect.

server-a# virsh migrate --live --persistent --undefinesource --copy-storage-inc guest qemu+tls://server-b/system

Result: After about a week of uptime, the guest froze solid for 27 seconds after the migration. The freeze happens once the migration itself has completed: the guest is already running on the destination server, using up a full core, and is no longer present on the originating server. CPU usage goes back to normal once the guest becomes responsive again.

Just before the migration, NTP was perfectly locked to well within 100 µs. Right after the machine became responsive again, the NTP status shows that the machine simply lost more than 27 seconds:

root@guest:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*cl0             xx.xx.xx.xx      3 u   15   16  377    0.457  27388.3   0.100
 cl1             xx.xx.xx.xx      3 u   13   16  377    0.429  27388.4   0.178

root@guest:~# uptime
 16:03:30 up 8 days, 23:45, 1 user, load average: 0.02, 0.02, 0.05
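
For anyone comparing their own numbers: ntpq prints the offset column in milliseconds, so the 27388.3 above is roughly 27.4 seconds, which matches the length of the freeze. A minimal sketch for pulling that figure out of `ntpq -p` output follows. The function name `parse_ntpq_offsets` is made up for illustration, and the sketch assumes the standard 10-column peer layout with a one-character tally code in the first column.

```python
def parse_ntpq_offsets(text):
    """Map each peer in `ntpq -p` output to its offset in seconds."""
    offsets = {}
    for line in text.splitlines():
        # The first character is the tally code ('*', '+', ' ', ...),
        # so drop it before splitting the row into columns.
        fields = line[1:].split()
        # Peer rows have exactly 10 columns; this skips the separator line,
        # and the explicit check skips the header row.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        # Index 8 is the offset column, which ntpq reports in milliseconds.
        offsets[fields[0]] = float(fields[8]) / 1000.0
    return offsets


# Sample rows copied from the output in this comment.
sample = """\
     remote refid st t when poll reach delay offset jitter
==============================================================================
*cl0 xx.xx.xx.xx 3 u 15 16 377 0.457 27388.3 0.100
 cl1 xx.xx.xx.xx 3 u 13 16 377 0.429 27388.4 0.178
"""

for peer, seconds in parse_ntpq_offsets(sample).items():
    print(f"{peer}: clock is off by {seconds:.1f} s")
```

Splitting on whitespace rather than fixed column positions keeps the sketch working whether or not the column alignment survived copy and paste.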

During these 27 seconds, the guest did not respond to any network activity or on the (virtual) console. There is no mention of clock jumps or anything else in dmesg this time.

Note that I have now reproduced this on two different pairs of machines: our original KVM cluster, and a pair of compute nodes (different hardware) that I used to test this with a supported Ubuntu release.