"Assertion `!r->req.sg' failed." during live migration with VirtIO

Bug #1646610 reported by Peter
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

We've hit this issue twice so far, but don't have an obvious repro yet. It's pretty rare for us to hit it but I'm still trying so I can get a core and backtrace. The guest was Windows running a constant workload. We were using VirtIO SCSI drivers in both cases.

In both cases we hit the assert here:

hw/scsi/scsi-generic.c:

static void scsi_generic_save_request(QEMUFile *f, SCSIRequest *req)
{
    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);

    qemu_put_sbe32s(f, &r->buflen);
    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
*** assert(!r->req.sg);
        qemu_put_buffer(f, r->buf, r->req.cmd.xfer);
    }
}

From code inspection, it seems that this will always happen if scsi_req_enqueue_internal in hw/scsi/scsi-bus.c is ever invoked.

static void scsi_req_enqueue_internal(SCSIRequest *req)
{
    assert(!req->enqueued);
    scsi_req_ref(req);
    if (req->bus->info->get_sg_list) {
        req->sg = req->bus->info->get_sg_list(req);
    } else {
        req->sg = NULL;
    }
    req->enqueued = true;
    QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);
}

req->bus->info->get_sg_list will return &req->qsgl for a virtio-scsi bus since it's a method stored on the SCSIBusInfo struct. I didn't see anything that would clear the req->sg if scsi_req_enqueue_internal is called at least once.

I think this can only happen if scsi_dma_restart_bh in hw/scsi/scsi-bus.c is called. The only other location I see scsi_req_enqueue_internal is on the load side for the destination of a migration.

static void scsi_dma_restart_bh(void *opaque)
{
    SCSIDevice *s = opaque;
    SCSIRequest *req, *next;

    qemu_bh_delete(s->bh);
    s->bh = NULL;

    QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
        scsi_req_ref(req);
        if (req->retry) {
            req->retry = false;
            switch (req->cmd.mode) {
            case SCSI_XFER_FROM_DEV:
            case SCSI_XFER_TO_DEV:
                scsi_req_continue(req);
                break;
            case SCSI_XFER_NONE:
                scsi_req_dequeue(req);
                scsi_req_enqueue(req); // *** this calls scsi_req_enqueue_internal
                break;
            }
        }
        scsi_req_unref(req);
    }
}

Finally when put_scsi_requests is called for migration, it seems like it will call both virtio_scsi_save_request (from bus->info->save_request) and scsi_generic_save_request (from req->ops->save_request) and trigger the assert.

I searched for a bit, but didn't find anyone else reporting this. Has anyone else seen this? It seems to me like that assert should check that the sg list is empty instead of checking that it exists. Is this an appropriate assessment? Assuming I find a repro, I'll try to test this solution.

Thanks!

Revision history for this message
Thomas Huth (th-huth) wrote :

Which version of QEMU are you using?

Revision history for this message
Peter (peter.turschm) wrote :

Hi Thomas,

Thanks for looking. We're using version 2.3.0.

Revision history for this message
Thomas Huth (th-huth) wrote :

OK. QEMU version 2.3 is not maintained by the QEMU project any more. Can you reproduce it with the latest version (release candidate of 2.8 preferably, or at least 2.7)?

Changed in qemu:
status: New → Incomplete
Revision history for this message
Peter (peter.turschm) wrote :

Thanks Thomas. Will do.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.