LSI Logic MPT driver mapping of scsi device busy to scsi host+device busy leads to read-only ext3 fs remounts on VMware ESX Server.

Bug #137585 reported by Ed Goggin
This bug affects 1 person
Affects                       Status        Importance  Assigned to  Milestone
linux-source-2.6.20 (Ubuntu)  Won't Fix     Medium      Unassigned
linux-source-2.6.22 (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

The function mptscsih_io_done() in drivers/message/fusion/mptscsih.c maps a SCSI device BUSY status to a SCSI host+device busy condition (DID_BUS_BUSY) when the MPI IOC status is MPI_IOCSTATUS_SUCCESS. As a result, the Linux SCSI mid-layer retries such commands at most 5 times, with no delay between successive retries (see the DID_BUS_BUSY handling in scsi_decide_disposition() in drivers/scsi/scsi_error.c). The change was made so that host-resident Linux MPIO code could fail over more quickly when a device reports BUSY, but in VMware ESX Server configurations it also causes device-busy I/O failures to be reported to SCSI client code more readily. When such a failure hits a SCSI WRITE to the journal of an ext3 file system, the file system is remounted read-only.
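The count-based limit on the DID_BUS_BUSY path can be sketched as a small C model. It is a simplified, hypothetical stand-in for the retry logic in scsi_decide_disposition(), not the kernel code: the device is assumed to answer BUSY for a fixed number of submissions, every BUSY completion consumes one retry, and the command is resubmitted immediately until the retry count exceeds `allowed`. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the simplified count-based retry model (hypothetical). */
struct retry_outcome {
    int attempts;   /* total submissions, including the first one */
    bool failed;    /* true if the I/O error is reported upward */
};

/*
 * Hypothetical model: the device answers BUSY for the first busy_attempts
 * submissions and succeeds afterwards. Because the driver reports BUSY as
 * DID_BUS_BUSY, every BUSY completion consumes one retry, and the command
 * is resubmitted with no delay in between.
 */
static struct retry_outcome model_bus_busy_retries(int busy_attempts, int allowed)
{
    struct retry_outcome out = { 0, false };
    int retries = 0;

    assert(busy_attempts >= 0 && allowed >= 0);
    for (;;) {
        out.attempts++;
        if (out.attempts > busy_attempts)
            return out;             /* device no longer busy: success */
        if (++retries > allowed) {  /* retry budget exhausted */
            out.failed = true;
            return out;
        }
        /* otherwise: immediate resubmission, no back-off */
    }
}
```

With `allowed` set to 5, `model_bus_busy_retries(10, 5)` gives up after 6 submissions (the original plus 5 retries), even though in this model the device would have accepted the 11th.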

A fix for this issue is present in drivers/message/fusion/mptscsih.c:mptscsih_io_done() in upstream 2.6.22 code. I've included below the patch from Eric Moore from LSI Logic which contained the fix.

This addresses the issue of VMware guest OSes being remounted read-only because the underlying device was held busy too long, and at the same time addresses Engenio MPP driver concerns about infinite retries. This patch removes the code that snoops the SAM status on BUSY and returns DID_BUS_BUSY; instead, the status is returned as is. Retries appear to be handled properly in scsi_softirq_done(), where a command that keeps returning BUSY is retried only for the time given by (cmd->allowed + 1) * cmd->timeout_per_command.

Signed-off-by: Eric Moore <email address hidden>

diff -uarpN b/drivers/message/fusion/mptscsih.c a/drivers/message/fusion/mptscsih.c
--- b/drivers/message/fusion/mptscsih.c 2007-03-15 18:20:01.000000000 -0600
+++ a/drivers/message/fusion/mptscsih.c 2007-03-18 12:07:26.000000000 -0600
@@ -819,10 +819,7 @@ mptscsih_io_done(MPT_ADAPTER *ioc, MPT_F
 			sc->resid=0;
 		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR: /* 0x0040 */
 		case MPI_IOCSTATUS_SUCCESS: /* 0x0000 */
-			if (scsi_status == MPI_SCSI_STATUS_BUSY)
-				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
-			else
-				sc->result = (DID_OK << 16) | scsi_status;
+			sc->result = (DID_OK << 16) | scsi_status;
 			if (scsi_state == 0) {
 				;
 			} else if (scsi_state & MPI_SCSI_STATE_AUTOSENSE_VALID) {
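The effect of the patch can be sketched in C as a simplified model, not the kernel code. The result word handed to the mid-layer carries the host byte in bits 16-23 and the SAM status in the low byte: before the patch a BUSY status was promoted to DID_BUS_BUSY, afterwards it is passed through with DID_OK, and the retry window that scsi_softirq_done() applies is (cmd->allowed + 1) * cmd->timeout_per_command. The values DID_OK = 0x00, DID_BUS_BUSY = 0x02 and BUSY = 0x08 follow Linux and SAM; treating MPI_SCSI_STATUS_BUSY as the same value, using plain tick counts instead of jiffies, and all function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Host byte codes as in Linux <scsi/scsi.h>, and the SAM BUSY status code. */
#define DID_OK        0x00
#define DID_BUS_BUSY  0x02
#define SAM_BUSY      0x08   /* assumed equal to MPI_SCSI_STATUS_BUSY */

/* Result word before the patch: BUSY is promoted to a host-level condition. */
static unsigned int result_before_patch(unsigned int scsi_status)
{
    assert(scsi_status <= 0xff);   /* the status must fit in the low byte */
    if (scsi_status == SAM_BUSY)
        return (DID_BUS_BUSY << 16) | scsi_status;
    return (DID_OK << 16) | scsi_status;
}

/* Result word after the patch: the device status is passed through as is. */
static unsigned int result_after_patch(unsigned int scsi_status)
{
    assert(scsi_status <= 0xff);
    return (DID_OK << 16) | scsi_status;
}

/* Host byte (bits 16-23) and SAM status byte (bits 0-7) of a result word. */
static unsigned int host_byte_of(unsigned int result) { return (result >> 16) & 0xff; }
static unsigned int status_of(unsigned int result)    { return result & 0xff; }

/* Retry window used by scsi_softirq_done(): (allowed + 1) * timeout. */
static unsigned long retry_window(int allowed, unsigned long timeout_per_command)
{
    assert(allowed >= 0);
    return (unsigned long)(allowed + 1) * timeout_per_command;
}

/*
 * Simplified version of the scsi_softirq_done() check: a command that is
 * still not done is failed once it has existed longer than the window.
 * Plain tick counts are used; jiffies wraparound (time_before()) is ignored.
 */
static bool retry_window_expired(unsigned long alloc_time, unsigned long now,
                                 int allowed, unsigned long timeout_per_command)
{
    return now - alloc_time > retry_window(allowed, timeout_per_command);
}
```

For example, with the sd defaults of 5 retries and a 30-second per-command timeout (assumed here), `retry_window(5, 30)` is 180 seconds, so a device that reports BUSY is retried for up to about three minutes rather than six back-to-back attempts before the error reaches ext3.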

Leann Ogasawara (leannogasawara) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. This patch is in the current Gutsy kernel release, 2.6.22-11.32. Thanks!

Ed Goggin (egoggin) wrote :

Leann,

Thanks for the prompt reply. That is good news. We are also seeking to have this patch included in a maintenance release of Ubuntu 7.04. Is this possible?

Ed

Leann Ogasawara (leannogasawara) wrote :

Ed, I'm re-assigning this to the kernel team for their consideration to backport to 2.6.20. Just note that updates to a stable release are only done under certain circumstances as outlined here:

https://wiki.ubuntu.com/StableReleaseUpdates

Thanks.

Changed in linux-source-2.6.20:
assignee: nobody → ubuntu-kernel-team
status: New → Confirmed
Changed in linux-source-2.6.22:
importance: Undecided → Medium
Changed in linux-source-2.6.20:
importance: Undecided → Medium
Leann Ogasawara (leannogasawara) wrote :

Unfortunately, it looks like this never went in as an SRU for Feisty. Feisty Fawn 7.04 has now reached the end of its 18-month support period (see http://www.ubuntu.com/news/ubuntu-7.04-end-of-life). As a result, we are closing the linux-source-2.6.20 task. Thanks.

Changed in linux-source-2.6.20:
status: Confirmed → Won't Fix
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.
