assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

Bug #1087114 reported by Brad Smith
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

QEMU 1.3.0 on OpenBSD now crashes with an error as shown below and the command line params do not seem to matter.

assertion "QLIST_EMPTY(&bs->tracked_requests)" failed: file "block.c", line 1220, function "bdrv_drain_all"

#1 0x0000030d1bce24aa in abort () at /usr/src/lib/libc/stdlib/abort.c:70
        p = (struct atexit *) 0x30d11897000
        mask = 4294967263
        cleanup_called = 1
#2 0x0000030d1bc5ff44 in __assert2 (file=Variable "file" is not available.
) at /usr/src/lib/libc/gen/assert.c:52
No locals.
#3 0x0000030b0d383a03 in bdrv_drain_all () at block.c:1220
        bs = (BlockDriverState *) 0x30d13f3b630
        busy = false
        __func__ = "bdrv_drain_all"
#4 0x0000030b0d43acfc in bmdma_cmd_writeb (bm=0x30d0f5f56a8, val=8) at hw/ide/pci.c:312
        __func__ = "bmdma_cmd_writeb"
#5 0x0000030b0d43b450 in bmdma_write (opaque=0x30d0f5f56a8, addr=0, val=8, size=1) at hw/ide/piix.c:76
        bm = (BMDMAState *) 0x30d0f5f56a8
#6 0x0000030b0d5c2ce6 in memory_region_write_accessor (opaque=0x30d0f5f57d0, addr=0, value=0x30d18c288f0, size=1, shift=0, mask=255)
    at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:334
        mr = (MemoryRegion *) 0x30d0f5f57d0
        tmp = 8
#7 0x0000030b0d5c2dc5 in access_with_adjusted_size (addr=0, value=0x30d18c288f0, size=1, access_size_min=1, access_size_max=4,
    access=0x30b0d5c2c6b <memory_region_write_accessor>, opaque=0x30d0f5f57d0) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:364
        access_mask = 255
        access_size = 1
        i = 0
#8 0x0000030b0d5c3222 in memory_region_iorange_write (iorange=0x30d1d5e7400, offset=0, width=1, data=8)
    at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:439
        mrio = (MemoryRegionIORange *) 0x30d1d5e7400
        mr = (MemoryRegion *) 0x30d0f5f57d0
        __func__ = "memory_region_iorange_write"
#9 0x0000030b0d5c019a in ioport_writeb_thunk (opaque=0x30d1d5e7400, addr=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:212
        ioport = (IORange *) 0x30d1d5e7400
#10 0x0000030b0d5bfb65 in ioport_write (index=0, address=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:83
        func = (IOPortWriteFunc *) 0x30b0d5c0148 <ioport_writeb_thunk>
        default_func = {0x30b0d5bfbbc <default_ioport_writeb>, 0x30b0d5bfc61 <default_ioport_writew>, 0x30b0d5bfd0c <default_ioport_writel>}
#11 0x0000030b0d5c0704 in cpu_outb (addr=49216, val=8 '\b') at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:289
No locals.
#12 0x0000030b0d6067dd in helper_outb (port=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/target-i386/misc_helper.c:72
No locals.

Revision history for this message
Stefan Hajnoczi (stefanha) wrote : Re: [Qemu-devel] [Bug 1087114] [NEW] assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

On Thu, Dec 06, 2012 at 04:02:57AM -0000, Brad Smith wrote:
> QEMU 1.3.0 on OpenBSD now crashes with an error as shown below and the
> command line params do not seem to matter.

Please use git-bisect(1) to identify the commit that caused the
regression.

I was unable to hit this code path with qemu-system-i386 with an IDE
disk. Please do share your command-line.

> assertion "QLIST_EMPTY(&bs->tracked_requests)" failed: file "block.c",
> line 1220, function "bdrv_drain_all"

bdrv_drain_all() waits until in-flight requests have completed. The
assertion verifies that all I/O requests are really done. Something is
wrong here.

> #1 0x0000030d1bce24aa in abort () at /usr/src/lib/libc/stdlib/abort.c:70
> p = (struct atexit *) 0x30d11897000
> mask = 4294967263
> cleanup_called = 1
> #2 0x0000030d1bc5ff44 in __assert2 (file=Variable "file" is not available.
> ) at /usr/src/lib/libc/gen/assert.c:52
> No locals.
> #3 0x0000030b0d383a03 in bdrv_drain_all () at block.c:1220
> bs = (BlockDriverState *) 0x30d13f3b630
> busy = false
> __func__ = "bdrv_drain_all"
> #4 0x0000030b0d43acfc in bmdma_cmd_writeb (bm=0x30d0f5f56a8, val=8) at hw/ide/pci.c:312
> __func__ = "bmdma_cmd_writeb"
> #5 0x0000030b0d43b450 in bmdma_write (opaque=0x30d0f5f56a8, addr=0, val=8, size=1) at hw/ide/piix.c:76
> bm = (BMDMAState *) 0x30d0f5f56a8

The device is an IDE disk.

> #6 0x0000030b0d5c2ce6 in memory_region_write_accessor (opaque=0x30d0f5f57d0, addr=0, value=0x30d18c288f0, size=1, shift=0, mask=255)
> at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:334
> mr = (MemoryRegion *) 0x30d0f5f57d0
> tmp = 8
> #7 0x0000030b0d5c2dc5 in access_with_adjusted_size (addr=0, value=0x30d18c288f0, size=1, access_size_min=1, access_size_max=4,
> access=0x30b0d5c2c6b <memory_region_write_accessor>, opaque=0x30d0f5f57d0) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:364
> access_mask = 255
> access_size = 1
> i = 0
> #8 0x0000030b0d5c3222 in memory_region_iorange_write (iorange=0x30d1d5e7400, offset=0, width=1, data=8)
> at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:439
> mrio = (MemoryRegionIORange *) 0x30d1d5e7400
> mr = (MemoryRegion *) 0x30d0f5f57d0
> __func__ = "memory_region_iorange_write"
> #9 0x0000030b0d5c019a in ioport_writeb_thunk (opaque=0x30d1d5e7400, addr=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:212
> ioport = (IORange *) 0x30d1d5e7400
> #10 0x0000030b0d5bfb65 in ioport_write (index=0, address=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:83
> func = (IOPortWriteFunc *) 0x30b0d5c0148 <ioport_writeb_thunk>
> default_func = {0x30b0d5bfbbc <default_ioport_writeb>, 0x30b0d5bfc61 <default_ioport_writew>, 0x30b0d5bfd0c <default_ioport_writel>}
> #11 0x0000030b0d5c0704 in cpu_outb (addr=49216, val=8 '\b') at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:289
> No locals.
> #12 0x0000030b0d6067dd in helper_outb (port=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/target-i386/misc_helper.c:72
> No locals.

Revision history for this message
Brad Smith (t-brad-3) wrote :

qemu-system-x86_64 -cdrom [image] -boot -d -hda virtual.img

is the command line I was using.

Revision history for this message
Paolo Bonzini (bonzini) wrote :

Please attach config.log, also please try (if you're using recent openbsd with rthreads) --with-coroutine=sigaltstack.

Revision history for this message
Brad Smith (t-brad-3) wrote :

I'm just finishing the bisection and think I have the commit that caused this but I'm now just testing commits +-1 from that commit to make sure and if it is will try reverting just that commit against HEAD as well. Using the sigaltstack coroutine backend did not make any difference. I actually am using that now and then reverted it when initially testing 1.3 to make sure that was not the source of the regression with no change in behaviour at all. Also yes I would be using rthreads. All development happens against -current.

Revision history for this message
Brad Smith (t-brad-3) wrote :

So what is causing this is this commit... c166cb72f1676855816340666c3b618beef4b976

semaphore: implement fallback counting semaphores with mutex+condvar

OpenBSD and Darwin do not have sem_timedwait. Implement a fallback for them.

If I remove that, since OpenBSD 5.2/-current has sem_timedwait, then it works just fine.

Revision history for this message
Brad Smith (t-brad-3) wrote : Re: [Qemu-devel] [Bug 1087114] [NEW] assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

On Thu, Dec 13, 2012 at 04:26:50PM +0800, Zhi Yong Wu wrote:
> On Thu, Dec 6, 2012 at 12:02 PM, Brad Smith <email address hidden> wrote:
> > Public bug reported:
> >
> > QEMU 1.3.0 on OpenBSD now crashes with an error as shown below and the
> > command line params do not seem to matter.
> >
> > assertion "QLIST_EMPTY(&bs->tracked_requests)" failed: file "block.c",
> > line 1220, function "bdrv_drain_all"
> Just i hit the same issue on my large scale perf testing, mayb i
> should try virtio-blk to work around before it is fixed by some guy.

What OS are you using to host QEMU?

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Revision history for this message
Brad Smith (t-brad-3) wrote :

Paolo,

As you wrote the fallback code which is used when sem_timedwait() is missing could you please take a look at this when you have some time? I can test any patches you might come up with.

Revision history for this message
Austin Seipp (thoughtpolice) wrote :

I was experiencing this bug fairly regularly with QEMU 1.3.0 on OS X 10.8. All my emulations of debian environments couldn't even get past installation, because this bug would hit too early.

Brad, it looks like 2 weeks ago you got a patch authored that fixes this fallback code. That's commit a795ef8dcb8cbadffc996c41ff38927a97645234, which was originally from Paolo.

I have applied this patch locally to a copy of QEMU 1.3.0 and my problems went away. Thus I think this bug is fixed in HEAD, but I do not know if the commit has been put in another branch.

Revision history for this message
Aaron Jackson (limes102) wrote :

I am currently experiencing this on Mac OS X 10.8 also, custom built yesterday evening from the master branch.

I have looked at the diff and it looks like it has been applied in qemu-thread-posix.c file, but no luck.

Any pointers? Thanks

Revision history for this message
Rainer Müller (raimue) wrote :

I had the same problem on Mac OS X 10.8.2 with qemu 1.3.0, but it is now fixed in the current master branch. I can confirm that the commit a795ef8dcb8cbadffc996c41ff38927a97645234 fixes this problem. This commit can also be applied to the 1.3.0 source.

Revision history for this message
Aaron Jackson (limes102) wrote :

I am still having this error even though I compile from the master branch and commit a795ef8dcb8cbadffc996c41ff38927a97645234 is definitely there.

Revision history for this message
Brad Smith (t-brad-3) wrote :

Before the patch in question was commited running QEMU 1.3.0 hosted on OpenBSD I was able to cause QEMU to crash reproducibly by just booting OpenBSD within QEMU and upon the kernel accessing the virtual disk to read the disklabel or during an install writing the disklabel. After the patch was applied I was not able to cause any crashes and went through a handful of installs without any issues.

Are you able to build QEMU with debug symbols and get a backtrace once it has crashed on your OS X system?

Revision history for this message
Brad Smith (t-brad-3) wrote :

The other question I have is if you look at the commit I mentioned as causing the crash (at least on OpenBSD) and revert that change from either 1.3.0 or HEAD branch and build QEMU on OS X does the crashing you're experiencing go away?

Revision history for this message
Aaron Jackson (limes102) wrote :

On line 216 of qemu-thread-posix.c I have commented out the ++sem->count; which seems to be the only change made in that commit. Unfortunately it still crashes with that error.

I have compiled with --enable-debug but not sure how to get a backtrace or even a log of what goes wrong.

Revision history for this message
Rainer Müller (raimue) wrote :

Aaron, this added line in qemu-thread-posix.c is the fix, qemu is expected to crash once this is removed.

I guess Brad meant to revert c166cb72f1676855816340666c3b618beef4b976 which introduced the fallback code. However, reverting this commit alone will not work on Mac OS X as sem_timedwait() is not available (and the reason why the fallback code was added at all).

Revision history for this message
Brad Smith (t-brad-3) wrote :

So this is still an issue with 1.4.x and/or master?

Revision history for this message
Brad Smith (t-brad-3) wrote :

Any OS X and NetBSD users still affected by this issue should test this patch..

http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg05335.html

Revision history for this message
Brad Smith (t-brad-3) wrote :

Austin, Aaron and Reiner... Would you guys be able to test master on OS X and report back if this issue has been fully resolved or not?

Revision history for this message
Rainer Müller (raimue) wrote :

I was unable to reproduce the original issue on Mac OS X 10.8.4 using the current master. However, I was also unable to reproduce the original issue on the stable-1.5 branch which does not have the fix by Izumi Tsutsui linked above. As this second fix is only for a problem that appears in certain load situations, of course I might not be able to reproduce it.

I also reviewed the code on master I am confident that the solution is correct now.

Revision history for this message
Thomas Huth (th-huth) wrote :

As mentioned in previous comments already, this issue should be solved by commit a795ef8dcb8cbadffc996c4, so setting the status to "Fix released" now.

Changed in qemu:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.