I/O error after mpt2sas fault_state 0x265d

Bug #1625718 reported by RedShift
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
linux-lts-xenial (Ubuntu) | Fix Released | Medium | Unassigned |

Bug Description

After a write-intensive operation (such as dd if=/dev/zero of=bigfile), an I/O error is reported and the filesystem is remounted read-only. Further attempts to read from the storage result in more I/O errors.
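
Note that the dd command as quoted has no size limit and keeps writing until the filesystem is full. For anyone trying to reproduce this, a bounded variant could look like the sketch below. The helper name write_load, the 1 MiB block size, and the /mnt/array path in the usage comment are illustrative assumptions, not what was actually run on the affected server.

```shell
# Hypothetical helper: write SIZE_MB mebibytes of zeros to DIR/bigfile.
# conv=fsync makes dd flush the file to disk before exiting, so the
# data actually reaches the controller instead of staying in page cache.
write_load() {
    dir=$1
    size_mb=$2
    dd if=/dev/zero of="$dir/bigfile" bs=1M count="$size_mb" conv=fsync
}

# Example (not run here; /mnt/array is a placeholder mount point):
#   write_load /mnt/array 4096 && dmesg | tail -n 50
```

Capping the size with count= keeps the test repeatable and leaves free space on the filesystem, which matters if the array does go read-only halfway through.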

The RAID controller is a Dell/LSI SAS2008 FWVersion(06.00.00.00), ChipRevision(0x03), BiosVersion(07.07.00.00) managing just one RAID10 array with four disks.

The following kernels have been tried and gave the same problem:

* linux-image-4.4.0-36-generic
* linux-image-4.7.4-040704-generic
* linux-image-4.8.0-040800rc7-generic

All disks are clean when inspected with smartctl.

Tags: kernel-bug
RedShift (redshift-gmx) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1625718

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
RedShift (redshift-gmx)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
RedShift (redshift-gmx) wrote :

Downgraded the server to Ubuntu 14.04.5: no issues with the following kernel versions:

* linux-image-3.13.0-96-generic
* linux-image-4.2.0-42-generic

I am attaching some lines from dmesg where "fault_state(0x265d)" is present but does not lead to I/O errors or to filesystems being mounted read-only: even under high load the system behaves normally.

As stated before, I am unable to run apport-collect (or any other command) once the I/O errors have appeared.
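
Since nothing can be run once the I/O errors start, it may help to pull the fault codes out of a saved kernel log, or out of dmesg on a kernel that still works. A minimal sketch, assuming the driver logs the code in the same fault_state(0x....) form quoted above; fault_codes is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct fault_state code found in
# kernel log text read from stdin.
fault_codes() {
    grep -o 'fault_state(0x[0-9a-fA-F]*)' | sort -u
}

# Example (not run here):
#   dmesg | fault_codes
```

Running the same filter on a working and a failing kernel would show whether the two report the same fault code.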

RedShift (redshift-gmx)
affects: linux (Ubuntu) → linux-lts-xenial (Ubuntu)
Changed in linux-lts-xenial (Ubuntu):
importance: Undecided → Medium
RedShift (redshift-gmx) wrote :

It seems that one of the following two steps solved the issue, at least when using kernel 4.4.0:

* upgrading the firmware of the disks (model ST9900805SS) to version CS0C (hosts equipped with other disk models did not seem to be affected by this problem);

* adding "mpt3sas.msix_disable=1" to kernel boot parameters.
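
For the second option, the parameter carries the mpt3sas prefix because, as far as I know, 4.4 kernels drive SAS2008 controllers from the merged mpt3sas module. On a stock Ubuntu install the parameter is usually made persistent through /etc/default/grub followed by update-grub. The sketch below assumes that setup; has_kernel_param is a hypothetical helper, and the "quiet splash" values in the comment are only an example of what might already be on that line.

```shell
# Persisting the parameter (run manually as root, shown here as comments):
#   1. In /etc/default/grub, append it to the existing line, e.g.
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mpt3sas.msix_disable=1"
#   2. sudo update-grub
#   3. reboot

# Hypothetical helper: succeed if PARAM appears as a whole word in CMDLINE.
has_kernel_param() {
    param=$1
    cmdline=$2
    for word in $cmdline; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}

# Example (not run here): confirm the running kernel was booted with it.
#   has_kernel_param "mpt3sas.msix_disable=1" "$(cat /proc/cmdline)" && echo "MSI-X disabled at boot"
```

Checking /proc/cmdline after the reboot confirms the parameter was actually passed to the kernel and rules out a stale GRUB configuration.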

RedShift (redshift-gmx) wrote :

Workaround available

Changed in linux-lts-xenial (Ubuntu):
status: Confirmed → Fix Released