qemu-system-ppc64 hanging occasionally in disk writes

Bug #1013241 reported by Richard W.M. Jones
This bug affects 1 person
Affects: QEMU
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I found last week that qemu-system-ppc64 (from git) hangs occasionally
under load, and I have a reproducer for it now. Unfortunately the
reproducer really takes a long time to run -- usually I can get a hang
in under 12 hours.

Here is the reproducer case:

  https://lists.fedoraproject.org/pipermail/ppc/2012-June/001698.html

Notes:

(1) Verified by one other person. Happens on both ppc64 and x86-64
    hosts.

(2) Happens with both Fedora guest kernel 3.3.4-5.fc17.ppc64 and kernel
    3.5.0 that I compiled myself. The test case above contains 3.3.4-5.

(3) Seems to be a problem in qemu, not the guest. I think this
    because I tried to capture a backtrace of the hang using
    remote gdb, but gdb just hung when trying to connect to qemu
    (gdb connects fine before the bug happens).

(4) Judging by guest messages, appears to be happening when writing
    to the disk.
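
Note (3) relies on the gdb stub no longer answering once the hang occurs. A minimal sketch of how that check could be automated, assuming QEMU was started with its gdb stub listening on TCP port 1234 (for example via `-s`); the host, port, and timeout below are illustrative assumptions and not taken from the reproducer:

```python
import socket


def rsp_packet(payload: str) -> bytes:
    """Frame a payload as a GDB remote serial protocol packet: $payload#checksum."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}".encode("ascii")


def stub_responds(host: str = "127.0.0.1", port: int = 1234,
                  timeout: float = 5.0) -> bool:
    """Return True if the gdb stub acknowledges a '?' query within the timeout.

    host, port and timeout are assumptions for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(rsp_packet("?"))
            # With acknowledgements enabled (the protocol default), a live
            # stub answers a well-formed packet with '+'.
            return sock.recv(1) == b"+"
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
```

Polling `stub_responds()` during a long run would give an approximate time at which qemu stopped servicing the stub, which may help correlate the hang with guest messages.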

Richard W.M. Jones (rich-annexia) wrote :

I switched to using virtio-scsi (instead of virtio-blk). This appears to have solved
this problem, although it brings another problem. I also tried vscsi, which fixes
both problems.

Therefore I will (not definitively) claim that the problem lies somewhere in virtio-blk,
but a workaround seems to be available.
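
For readers comparing the three disk attachments mentioned here, the following is a rough sketch of how the QEMU arguments might differ. The device names (`virtio-blk-pci`, `virtio-scsi-pci`, `spapr-vscsi`, `scsi-hd`) are believed to match QEMU's device models, but the helper `disk_args`, the image name `guest.img`, and the drive id `hd0` are hypothetical and are not taken from the reproducer:

```python
def disk_args(bus, image="guest.img"):
    """Build QEMU arguments attaching one disk via the given bus.

    bus is one of "virtio-blk", "virtio-scsi" or "vscsi".
    The image name and the drive id "hd0" are placeholders.
    """
    # Define the backing drive without attaching it to any default bus.
    drive = ["-drive", f"file={image},if=none,id=hd0"]
    if bus == "virtio-blk":
        return drive + ["-device", "virtio-blk-pci,drive=hd0"]
    if bus == "virtio-scsi":
        # A virtio SCSI controller plus a SCSI disk on it.
        return drive + ["-device", "virtio-scsi-pci",
                        "-device", "scsi-hd,drive=hd0"]
    if bus == "vscsi":
        # The pseries (sPAPR) virtual SCSI controller plus a SCSI disk.
        return drive + ["-device", "spapr-vscsi",
                        "-device", "scsi-hd,drive=hd0"]
    raise ValueError(f"unknown bus: {bus}")


if __name__ == "__main__":
    for bus in ("virtio-blk", "virtio-scsi", "vscsi"):
        print(bus, " ".join(disk_args(bus)))
```

Only the `-device` portion changes between the three cases, so switching buses for a long-running test should not require touching the rest of the command line.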

Benjamin Herrenschmidt (benh-kernel) wrote : Re: [Qemu-devel] [Bug 1013241] Re: qemu-system-ppc64 hanging occasionally in disk writes

On Tue, 2012-06-19 at 10:16 +0000, Richard W.M. Jones wrote:
> I switched to using virtio-scsi (instead of virtio-blk). This appears to have solved
> this problem, although it brings another problem. I also tried vscsi, which fixes
> both problems.
>
> Therefore I will (not definitively) claim that the problem lies somewhere in virtio-blk,
> but a workaround seems to be available.

What was the virtio-scsi problem? (Other than that SLOF doesn't know
about it yet :-) I haven't audited/tested it, so it might have endian
issues...

I have reproduced a similar hang with vscsi in full emulation, but I
haven't observed your problem with virtio-blk. I plan to spend more time
doing some torture testing & debugging this week to see if I can find
out what's going on.

BTW, what was your guest kernel version?

Cheers,
Ben.

Richard W.M. Jones (rich-annexia) wrote :

The problem with virtio-scsi is that only a single disk shows up:

https://bugs.launchpad.net/qemu/+bug/1013691

I've been using guest kernels 3.3.4 and 3.5.0-rc2+ (i.e. Linus git), and both behave the same way.

Thomas Huth (th-huth) wrote :

Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU? Or could we close this ticket nowadays?

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired