mpt2sas driver is unusable

Bug #906873 reported by Tamas Papp
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

I have a Supermicro computer with an integrated LSI RAID controller.

I tried many kernel and driver versions with no success.

All of them give me at least this error message:

[ 272.434551] scsi target6:1:0: volume handle(0x0143), volume wwid(0x095379ae0ead6e88)
[ 272.434557] sd 6:1:0:0: task abort: FAILED scmd(ffff88100da5ce00)
[ 272.434561] sd 6:1:0:0: attempting task abort! scmd(ffff88100da5d800)
[ 272.434565] sd 6:1:0:0: [sda] CDB: Write(10): 2a 00 00 a5 31 04 00 04 00 00
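
To gauge how often this happens, the failed aborts in the kernel log can be tallied per SCSI device. The following is a minimal sketch, not part of the original report: the function name `count_failed_aborts` is hypothetical, and it assumes log lines shaped like the excerpt above (`sd H:C:T:L: task abort: FAILED ...`).

```shell
# Tally "task abort: FAILED" lines per SCSI device (host:channel:target:lun).
# Reads kernel log text on stdin; assumes the "sd H:C:T:L:" prefix seen in
# the excerpt above. The function name is illustrative only.
count_failed_aborts() {
    awk '/task abort: FAILED/ {
        for (i = 1; i < NF; i++) {
            if ($i == "sd") {
                dev = $(i + 1)
                sub(/:$/, "", dev)   # drop the trailing colon
                n[dev]++
                break
            }
        }
    }
    END { for (d in n) print d, n[d] }' | sort
}
```

On an affected machine the input would come from the kernel log, for example `dmesg | count_failed_aborts`.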

The newest driver version (11.00) causes a kernel panic and oops. Unfortunately, I have no null modem cable to capture the oops message through a serial console.

I tried with the following scenarios (all amd64):

Ubuntu 10.04 2.6.32: this error message
Ubuntu 12.04 (Precise) beta: 3.2.0: this error message
Ubuntu 10.04 backported kernel 2.6.38: this error message
Ubuntu 10.04 backported kernel 2.6.38 + driver from ftp://ftp.supermicro.com/driver/SAS/LSI/2008/IR/Driver/Linux/PH10-00.00/ : kernel panic at boot time
Ubuntu 10.04 backported kernel 2.6.38 + driver from ftp://ftp.supermicro.com/driver/SAS/LSI/2008/IR/Driver/Linux/PH11-00.00/ : kernel panic at boot time
Scientific Linux 6.1: kernel panic at install boot time (some weird APIC function, probably not related)
Scientific Linux 6.1 with noapic kernel parameter: just hangs
Scientific Linux 5.7: this error message at install time, no hang, but mkfs had still not finished after an hour

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-6-generic 3.2.0-6.12
ProcVersionSignature: Ubuntu 3.2.0-6.12-generic 3.2.0-rc6
Uname: Linux 3.2.0-6-generic x86_64
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Dec 20 13:18 seq
 crw-rw---T 1 root audio 116, 33 Dec 20 13:18 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.90-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue Dec 20 14:03:33 2011
HibernationDevice: RESUME=/dev/mapper/vg0-swap
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Supermicro H8QG6
PciMultimedia:

ProcEnviron:
 LC_CTYPE=hu_HU
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-6-generic root=/dev/mapper/vg0-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-6-generic N/A
 linux-backports-modules-3.2.0-6-generic N/A
 linux-firmware 1.66
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: Upgraded to precise on 2011-12-20 (0 days ago)
dmi.bios.date: 09/26/2011
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2.00
dmi.board.asset.tag: 1234567890
dmi.board.name: H8QG6
dmi.board.vendor: Supermicro
dmi.board.version: 1234567890
dmi.chassis.asset.tag: 1234567890
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2.00:bd09/26/2011:svnSupermicro:pnH8QG6:pvr1234567890:rvnSupermicro:rnH8QG6:rvr1234567890:cvnSupermicro:ct17:cvr1234567890:
dmi.product.name: H8QG6
dmi.product.version: 1234567890
dmi.sys.vendor: Supermicro

Revision history for this message
Tamas Papp (tompos) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . If possible, please test the latest v3.2-rcN kernel (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.2-rc1 fixed an issue, the tag would be: 'kernel-fixed-upstream-v3.2-rc1'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'. If you believe this bug does not require upstream testing, please add the tag: 'kernel-upstream-testing-not-needed'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: needs-upstream-testing
tags: added: kernel-da-key
Revision history for this message
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-7.13)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-7.13
Revision history for this message
Tapani Tarvainen (ubuntu-tapani) wrote :

This looks similar to a problem I've been having with mpt2sas (on several machines with various LSI controller cards). In my case the crashes went away when I disabled hddtemp and smartd (and avoided running smartctl or hdparm when there's activity on the disks).
With hddtemp running mpt2sas would crash every time within 24 hours of boot, sooner if there was heavy disk activity; with sufficiently heavy disk action a single smartctl -a could crash it. After disabling hddtemp, no problems.
So it would appear there's a bug in mpt2sas command passthru handling.
I don't know if this is a different bug, but if hddtemp or smartd are running, it might be worthwhile to try disabling them.
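
A quick way to check for these daemons before disabling them is to filter the process list. This is a minimal sketch, not something from the thread: the function name `find_smart_pollers` is hypothetical, and the service names in the comments are assumptions that may differ between releases.

```shell
# Print which SMART-polling daemons appear in a list of process names read
# from stdin (one name per line). The function name is illustrative only.
find_smart_pollers() {
    grep -x -E 'hddtemp|smartd' | sort -u
}

# On a live system (service names below are assumptions; check your release):
#   ps -e -o comm= | find_smart_pollers
#   sudo service hddtemp stop
#   sudo service smartmontools stop
```

An empty result means neither daemon is running, in which case periodic SMART polling is probably not the trigger on that machine.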

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Popolon (popolon) wrote :

I have the same problem on Ubuntu 12.04 with kernel 3.2.0-29-generic #46-Ubuntu SMP (x86_64). What can I do to help resolve this bug now?

Revision history for this message
Popolon (popolon) wrote :

I've had no kernel panic (so far), but with a simple rsync over ssh, or any disk access, the load average climbs quickly and the system becomes unresponsive to other disk accesses; commands lock up (as does breaking out with Ctrl-C) until the rsync is interrupted.

Revision history for this message
Popolon (popolon) wrote :
Revision history for this message
Tamas Papp (tompos) wrote :

It's always better to use the apport system to collect and report the system information:

https://help.ubuntu.com/community/ReportingBugs#Adding_apport-collect_information_to_an_existing_Launchpad_bug

Revision history for this message
THCTLO (thctlo) wrote :

Hi,

This happened while formatting the first 512k of my tape.
Running Ubuntu 12.04 LTS, kernel -40, everything up to date, with only an LTO4 tape drive on the SAS controller.

[588841.364868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[588841.375947] dd D ffffffff81806240 0 26843 26827 0x00000000
[588841.375952] ffff881048fe1cd8 0000000000000082 ffff881048fe1c98 ffffffffa0039318
[588841.375968] ffff881048fe1fd8 ffff881048fe1fd8 ffff881048fe1fd8 00000000000137c0
[588841.375983] ffffffff81c0d020 ffff881049f58000 000000a43c794a10 7fffffffffffffff
[588841.375998] Call Trace:
[588841.376023] [<ffffffffa0039318>] ? mpt2sas_base_get_smid_scsiio+0x88/0xd0 [mpt2sas]
[588841.376034] [<ffffffff8165ba6f>] schedule+0x3f/0x60
[588841.376042] [<ffffffff8165c0b5>] schedule_timeout+0x2a5/0x320
[588841.376050] [<ffffffff8130efd7>] ? kobject_put+0x27/0x60
[588841.376059] [<ffffffff8165dc65>] ? _raw_spin_lock_irq+0x15/0x20
[588841.376067] [<ffffffff8165b8af>] wait_for_common+0xdf/0x180
[588841.376076] [<ffffffff81060620>] ? try_to_wake_up+0x200/0x200
[588841.376083] [<ffffffff8165ba2d>] wait_for_completion+0x1d/0x20
[588841.376092] [<ffffffffa0157b1a>] st_do_scsi.constprop.17+0x12a/0x280 [st]
[588841.376100] [<ffffffffa015a6c8>] st_flush+0x218/0x360 [st]
[588841.376110] [<ffffffff8117aad3>] ? __fput+0x153/0x210
[588841.376117] [<ffffffff8117772f>] filp_close+0x3f/0x90
[588841.376124] [<ffffffff81177832>] sys_close+0xb2/0x120
[588841.376133] [<ffffffff81665f82>] system_call_fastpath+0x16/0x1b
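
The trace above shows `dd` stuck in uninterruptible sleep (state `D`) while waiting on the SCSI layer. A minimal sketch for spotting other tasks in the same state follows; the function name `list_blocked` is hypothetical, and it assumes `pid stat comm` columns on stdin.

```shell
# Print "pid command" for every process whose state starts with D
# (uninterruptible sleep). Expects "pid stat comm" columns on stdin.
# The function name is illustrative only.
list_blocked() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# On a live system:
#   ps -e -o pid= -o stat= -o comm= | list_blocked
```

A growing list of `D`-state tasks that all touch disks or tapes behind the same controller would be consistent with the hang described here, though it does not by itself identify the driver as the cause.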

Revision history for this message
Tapani Tarvainen (ubuntu-tapani) wrote :

This problem still persists in Saucy using latest LSI firmware (10.00.00.07, bios 07.31.00.00),
although it's not as bad as it used to be - it doesn't crash within hours anymore but rather weeks
(although a few times twice within a few hours). But when it crashes it crashes hard - all disks
on the controller freeze (I can only see syslog entries before the crash by directing syslog to
another machine).

So, it's still too bad to allow using smartd or hddtemp in mission-critical machines.
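
Forwarding syslog to another machine, as this comment describes, can be set up with a one-line rsyslog rule. The sketch below assumes rsyslog is the syslog daemon; the function name `remote_syslog_rule`, the address 192.0.2.10, and the file name are placeholders, not details from this report.

```shell
# Print an rsyslog forwarding rule that sends all messages to host:port
# over UDP (a single "@"; "@@" would select TCP).
# The function name and the values used below are placeholders.
remote_syslog_rule() {
    printf '*.* @%s:%s\n' "$1" "$2"
}

# Possible use (file name is an assumption):
#   remote_syslog_rule 192.0.2.10 514 | sudo tee /etc/rsyslog.d/90-remote.conf
#   sudo service rsyslog restart
```

The receiving machine also has to be configured to accept remote messages, which is outside the scope of this sketch.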
