Comment 2 for bug 1848739

Joseph Salisbury (jsalisbury) wrote : Re: [linux-azure] Patch to prevent possible data corruption

Yes, we need that patch. The commit description is misleading: this commit actually fixes a data corruption bug for SCSI devices.

In the latest Ubuntu 4.15 kernel, __blk_mq_try_issue_directly() (in "block/blk-mq.c") calls blk_mq_sched_insert_request() if q->mq_ops->queue_rq() returns BLK_STS_RESOURCE. This is not correct and is prone to data corruption.

Commit c616cbee97ae makes the following change:

@@ -1785,7 +1764,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
        if (bypass_insert)
                return BLK_STS_RESOURCE;

-       blk_mq_sched_insert_request(rq, false, run_queue, false);
+       blk_mq_request_bypass_insert(rq, run_queue);
        return BLK_STS_OK;
 }

This change is subtle: the I/O will no longer get merged with other pending I/O if a direct issue has failed. This fixes the data corruption for SCSI devices. The bug was not introduced by commit ffe81d45322c.
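
To illustrate why the insertion path matters, here is a toy userspace model in C. It is a sketch only, not kernel code: every toy_* name and the prepared_sectors field are made up for illustration. It assumes, based on my reading of the upstream commit description, that the driver may already have set up its command for the request when the direct issue failed. A request put back on the scheduler list can still grow through a merge, which leaves that earlier setup stale. A request placed on the dispatch list is never considered for merging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. None of these names exist in the kernel. */
#define TOY_MAX_RQ 8

struct toy_request {
    unsigned long sector;          /* start sector of the request */
    unsigned int nr_sectors;       /* current length of the request */
    unsigned int prepared_sectors; /* length the driver set up for; 0 = not prepared (assumption) */
};

struct toy_queue {
    struct toy_request *sched[TOY_MAX_RQ];    /* scheduler list: merge candidates */
    size_t nr_sched;
    struct toy_request *dispatch[TOY_MAX_RQ]; /* dispatch list: issued as-is */
    size_t nr_dispatch;
};

/* Models the old behaviour: the failed request is visible to merging again. */
static inline void toy_sched_insert(struct toy_queue *q, struct toy_request *rq)
{
    assert(q->nr_sched < TOY_MAX_RQ);
    q->sched[q->nr_sched++] = rq;
}

/* Models the fixed behaviour: the failed request bypasses the scheduler. */
static inline void toy_bypass_insert(struct toy_queue *q, struct toy_request *rq)
{
    assert(q->nr_dispatch < TOY_MAX_RQ);
    q->dispatch[q->nr_dispatch++] = rq;
}

/* Try to back-merge a new I/O into a request on the scheduler list only. */
static inline bool toy_try_back_merge(struct toy_queue *q, unsigned long sector,
                                      unsigned int nr_sectors)
{
    for (size_t i = 0; i < q->nr_sched; i++) {
        struct toy_request *rq = q->sched[i];

        if (rq->sector + rq->nr_sectors == sector) {
            rq->nr_sectors += nr_sectors; /* the request grows after it was prepared */
            return true;
        }
    }
    return false;
}

/* True if the driver's earlier setup no longer matches the request. */
static inline bool toy_is_stale(const struct toy_request *rq)
{
    return rq->prepared_sectors != 0 && rq->prepared_sectors != rq->nr_sectors;
}
```

In this model, a prepared 8-sector request that goes through toy_sched_insert() and then absorbs an adjacent 8-sector I/O ends up 16 sectors long while its setup still describes 8. The same request sent through toy_bypass_insert() stays at 8 sectors because the merge code never sees it.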

I recommend that they pick up all the relevant patches leading to this commit. If this is difficult, we can do a special backported patch that changes the code in __blk_mq_try_issue_directly() to handle requeue.