Comment 3 for bug 1681439

Revision history for this message
Michał Kępień (kempniu) wrote : Re: qemu-system-x86_64: hw/ide/core.c:685: ide_cancel_dma_sync: Assertion `s->bus->dma->aiocb == NULL' failed.

> I don't think the assert you are talking about in the subject is added
> by 9972354856. That assertion was added by 86698a12f and has been
> present since QEMU 2.6. I don't see the relation immediately to
> AioContext patches.

You are right, of course. Sorry for misleading you about this. What I
meant to write was that git bisect pinpoints commit 9972354856 as the
likely culprit ("likely" because of the makeshift testing methodology
used).

> Is this only during boot/shutdown? If not, it looks like there might be
> some other errors occurring that aggravate the device state and cause a
> reset by the guest.

In fact this has never happened to me upon boot or shutdown. I believe
the operating system installed on the storage volume I am testing this
with has some kind of disk-intensive activity scheduled to run about
twenty minutes after booting. That is why I have to wait that long
after booting the VM to determine whether the issue appears.

> Anyway, what should happen is something like this:
>
> - Guest issues a reset request (ide_exec_cmd -> cmd_device_reset)
> - The device should now be "busy" and cannot accept any more requests (see the conditional early in ide_exec_cmd)
> - cmd_device_reset drains any existing requests.
> - we assert that there are no handles to BH routines that have yet to return
>
> Normally I'd say this is enough; because:
>
> Although blk_drain does not prohibit future DMA transfers, it is being
> called after an explicit reset request from the guest, and so the device
> should be unable to service any further requests. After existing DMA
> commands are drained we should be unable to add any further requests.
>
> It generally shouldn't be possible to see new requests show up here,
> unless;
>
> (A) We are not guarding ide_exec_cmd properly and a new command is sneaking in while we are trying to reset the device, or
> (B) blk_drain is not in fact doing what we expect it to (draining all pending DMA from an outstanding IDE command we are servicing.)

ide_cancel_dma_sync() is also invoked from bmdma_cmd_writeb() and this
is in fact the code path taken when the assertion fails.

> Since you mentioned that you need to enable TRIM support in order to see
> the behavior, perhaps this is a function of a TRIM command being
> improperly implemented and causing the guest to panic, and we are indeed
> not draining TRIM requests properly.

I am not sure what the relation of TRIM to BMDMA is, but I still cannot
reproduce the issue without TRIM being enabled.

> That's my best wild guess, anyway. If you can't reproduce this
> elsewhere, can you run some debug version of this to see under which
> codepath we are invoking reset, and what the running command that we are
> failing to terminate is?

I recompiled QEMU with --enable-debug --extra-cflags="-ggdb -O0" and
attached the output of "bt full". If this is not enough, please let me
know.