Comment 2 for bug 1681439

John Snow (jnsnow) wrote : Re: qemu-system-x86_64: hw/ide/core.c:685: ide_cancel_dma_sync: Assertion `s->bus->dma->aiocb == NULL' failed.

I don't think the assert you are talking about in the subject was added by 9972354856. That assertion was added by 86698a12f and has been present since QEMU 2.6. I don't immediately see how it relates to the AioContext patches.

Is this only during boot/shutdown? If not, it looks like there might be some other errors occurring that aggravate the device state and cause a reset by the guest.

Anyway, what should happen is something like this:

- Guest issues a reset request (ide_exec_cmd -> cmd_device_reset)
- The device should now be "busy" and cannot accept any more requests (see the conditional early in ide_exec_cmd)
- cmd_device_reset drains any existing requests.
- We assert that there are no remaining handles to BH routines that have yet to return.
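
As a rough illustration of the sequence above, here is a toy model in C. It is not QEMU's code: ToyIde, toy_exec_dma, toy_drain and toy_reset are hypothetical names, and a single pointer stands in for the aiocb handle that the assertion checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not QEMU's. */
typedef struct ToyIde {
    bool busy;       /* set while a reset is in progress */
    int in_flight;   /* DMA requests submitted but not yet completed */
    void *aiocb;     /* stand-in for the pending completion handle */
} ToyIde;

static int toy_token; /* dummy object whose address plays the role of an aiocb */

/* Submit a DMA command; refused while the device is busy. */
static bool toy_exec_dma(ToyIde *s)
{
    if (s->busy) {
        return false;
    }
    s->in_flight++;
    s->aiocb = &toy_token;
    return true;
}

/* Complete everything outstanding (stand-in for draining the backend). */
static void toy_drain(ToyIde *s)
{
    s->in_flight = 0;
    s->aiocb = NULL;
}

/* Reset: mark the device busy, drain, then check nothing is left behind. */
static void toy_reset(ToyIde *s)
{
    s->busy = true;
    toy_drain(s);
    assert(s->aiocb == NULL);
    s->busy = false;
}
```

In this model the assertion can only fire if a request is added between the drain and the check, or if the drain fails to clear the handle.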

Normally I'd say this is enough, because:

Although blk_drain does not prohibit future DMA transfers, it is being called after an explicit reset request from the guest, and so the device should be unable to service any further requests. After existing DMA commands are drained we should be unable to add any further requests.

It generally shouldn't be possible to see new requests show up here, unless:

(A) We are not guarding ide_exec_cmd properly and a new command is sneaking in while we are trying to reset the device, or
(B) blk_drain is not in fact doing what we expect it to (draining all pending DMA from an outstanding IDE command we are servicing).
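
To make case (B) concrete, here is a toy sketch in C of a drain routine that misses one class of request. Everything in it is an assumption made for illustration (LeakyDev, leaky_submit, leaky_drain and the tracked/untracked split are hypothetical); it does not describe how blk_drain actually behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: some requests are invisible to the drain routine. */
typedef struct LeakyDev {
    int tracked;     /* requests the drain routine knows about */
    int untracked;   /* requests issued through a path the drain misses */
    void *aiocb;     /* stand-in for the pending completion handle */
} LeakyDev;

static int leaky_token; /* dummy object whose address plays the role of an aiocb */

/* Submit a request through either the tracked or the untracked path. */
static void leaky_submit(LeakyDev *d, bool is_tracked)
{
    assert(d != NULL);
    if (is_tracked) {
        d->tracked++;
    } else {
        d->untracked++;
    }
    d->aiocb = &leaky_token;
}

/* Drain only what we know about; untracked requests keep the handle alive. */
static void leaky_drain(LeakyDev *d)
{
    d->tracked = 0;
    if (d->untracked == 0) {
        d->aiocb = NULL;
    }
}

/* Returns true if the reset-time invariant (no pending handle) holds. */
static bool leaky_reset_invariant_holds(LeakyDev *d)
{
    leaky_drain(d);
    return d->aiocb == NULL;
}
```

With only tracked requests the invariant holds after the drain; a single untracked request is enough to leave the handle set, which is the condition the real assertion would trip on.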

Since you mentioned that you need to enable TRIM support in order to see the behavior, perhaps this is a function of a TRIM command being improperly implemented and causing the guest to panic, and we are indeed not draining TRIM requests properly.

That's my best wild guess, anyway. If you can't reproduce this elsewhere, can you run some debug version of this to see under which codepath we are invoking reset, and what the running command that we are failing to terminate is?

--js