Live migration locks up Linux 3.2-based guests

Bug #1398718 reported by Matt Mullins
This bug affects 1 person
Affects               Status     Importance  Assigned to  Milestone
qemu (Ubuntu)         Invalid    High        Unassigned
qemu (Ubuntu Trusty)  Invalid    High        Unassigned
qemu (Ubuntu Utopic)  Won't Fix  High        Unassigned

Bug Description

In the thread at http://thread.gmane.org/gmane.comp.emulators.kvm.devel/127042/focus=129294, three commits were identified that fix live migration for qemu 2.0 (at least), which is the version I am using on trusty. I would like the package maintainer to pull these in.

I have cherry-picked those three commits (with some considerable fix-up for the first, which may or may not be correct; the others apply cleanly) and built packages locally. Installing those packages on the migration receiver seems to fix my guest lockups after live migration. I can attach the patches I'm using if someone is able to review my fix-ups to the first one.
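For reference, here is a rough sketch of how I produced the local packages on trusty. The three commit IDs come from the linked thread and are shown only as placeholders here; the unpacked directory name may also differ on other systems:

    # Fetch and unpack the trusty qemu source package:
    apt-get source qemu
    cd qemu-2.0.0*/

    # Export the three fixes from an upstream qemu checkout as one patch
    # (<commit1> and <commit3> are placeholders for the first and last IDs):
    git -C ~/src/qemu format-patch --stdout '<commit1>^..<commit3>' \
        > /tmp/migration-fixes.patch

    # Apply them (the first needed considerable manual fix-up in my case),
    # then rebuild unsigned binary packages:
    patch -p1 < /tmp/migration-fixes.patch
    dpkg-buildpackage -us -uc -b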

My original problem description was:
Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that narrows
it down a /whole lot/ ...), live migration started killing my Ubuntu precise
(kernel 3.2.x) guests, causing all of their vcpus to go into a busy loop. Once
(and only once) I observed the guest eventually become responsive again,
with a clock nearly 600 years in the future and a negative uptime.
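For anyone checking for the same symptom, the guest's clock state can be inspected from standard sysfs/procfs paths (kvm-clock is the usual clocksource for a 3.2 KVM guest):

    # Run in the guest before and after migration:
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    # expected: kvm-clock
    date; cat /proc/uptime
    # in my one recovered case, the date was centuries ahead and the
    # uptime negative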

I haven't been able to dig up any previous threads about this problem, so my
gut instinct is that I've configured something wonky. Any pointers toward
/what/ I may have done wrong are appreciated.

It only seems to happen if I've given the guests Nehalem-class CPU features.
My longest-running VMs, from before I started passing through the CPU
capabilities to the guest, seem to migrate without issue.
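Concretely, the difference comes down to the CPU model the guest is started with; simplified, hypothetical invocations of the two cases (not my exact command lines):

    # Guests that lock up after migration were started with Nehalem features:
    qemu-system-x86_64 -enable-kvm -cpu Nehalem -m 2048 ...
    # My long-running guests use the default model and migrate fine:
    qemu-system-x86_64 -enable-kvm -cpu qemu64 -m 2048 ...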

It also seems to happen reliably when the guest has been running for a while;
it's easily reproducible with guests that have been up ~1 day, and I've
reproduced it in VMs with an uptime of ~20 hours. I haven't yet figured out a
lower bound, which makes the testing cycle a little longer for me.
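The migration itself is a plain live migration; a simplified example of the kind of invocation involved (hypothetical host and guest names, via libvirt):

    # Live-migrate the guest to the (patched) receiving host:
    virsh migrate --live --verbose precise-guest qemu+ssh://receiver.example/system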

The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
the current 3.2 kernel that Canonical distributes. Recent Fedora kernels
(3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
case exhaustively, and I haven't written down very good notes for the tests I
have done with Fedora.

The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu 14.04
and the associated 3.13 kernel. I had previously reproduced this with 12.04
running a raring-backport 3.11 kernel as well, but I (seemingly erroneously)
assumed it might have been a qemu userspace discrepancy.

Changed in qemu (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in qemu (Ubuntu Trusty):
status: New → Confirmed
Changed in qemu (Ubuntu Utopic):
status: New → Confirmed
Changed in qemu (Ubuntu Trusty):
importance: Undecided → High
Changed in qemu (Ubuntu Utopic):
importance: Undecided → High
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Could you confirm that this is still an issue in vivid?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Ping - can you confirm whether this is still an issue in vivid or wily?

Changed in qemu (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
Matt Mullins (mokomull) wrote :

Apologies for the delay. I can no longer reproduce the original issue on the affected hardware, even before upgrading, so I can't confirm the fix in vivid or wily.

I'd be fine with closing this as "can (no longer) reproduce"; maybe my recollection is playing tricks on me.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks. I'm afraid the status for "cannot be reproduced" is "invalid". If you see this happening again, please do re-open this bug.

Changed in qemu (Ubuntu):
status: Incomplete → Invalid
Changed in qemu (Ubuntu Trusty):
status: Confirmed → Invalid
Changed in qemu (Ubuntu Utopic):
status: Confirmed → Won't Fix