LSI SAS 1078 not detected when installing

Bug #343749 reported by Jaume Sabater on 2009-03-16
This bug affects 3 people
Affects         Status     Importance  Assigned to  Milestone
linux (Fedora)  Won't Fix  Critical
linux (Ubuntu)             Undecided   Unassigned

Bug Description

Binary package hint: linux-source-2.6.27

Trying to install Ubuntu 8.10 on an IBM 3950 M2 with:

- Chipset ICH7
- Integrated network card Broadcom NetXtreme II BCM5709 Gigabit Ethernet (rev 06)
- Network card Intel Pro/1000 PT 82571EB (4 ports)
- LSI Logic/Symbios Logic SAS 1078 PCI-Express Fusion MPT SAS (rev ff)
- RSA card

All devices appear correctly in the lspci output. During the install process, only the Broadcom network card is detected, although the e1000e driver is loaded. During disk detection, no disks are found, although the mptsas driver is loaded.

Here is some information I manually copied from the server's screen (dmesg while detecting the network hardware):

[..]
Fusion MPT base driver 3.04.07
Copyright (c) 1999-2008 LSI Corporation
mptsas 0000:04:00.0: PCI INT A -> GSI 46 (level, low) -> IRQ 46
mptbase: ioc0: Initiating bringup
e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
e1000e: Copyright (c) 1999-2008 Intel Corporation.
ioc0: LSISAS1078 C2: Capabilities={Initiator}
mptbase: ioc0: PCI-MSI enabled
mptsas 0000:04:00.0: setting latency timer to 64
Calgary: DMA error on CalIOC2 PHB 0x3
Calgary: 0x02000000@CSR 0x00000000@PLSSR 0xb0008000@CSMR 0x00000000@MCK
Calgary: 0x00000000@0x810 0xfee0c000@0x820 0x00000000@0x830 0x00000000@0x840 0x03804a00@0x850 0x00000000@0x860 0x00000000@0x870
Calgary: 0x00000000@0xcb0
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY!
mptbase: ioc0: WARNING - Cannot recover rc = -1!
mptbase: ioc0: WARNING - Firmware Reload FAILED!
Clocksource tsc unstable (delta = 48342981112 ns)
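
For anyone triaging a similar machine, the failure signature in the log above can be picked out of a saved dmesg mechanically. The following is only a minimal sketch and not part of the original report: the function name `find_failure_signatures` and the signature labels are hypothetical, and the patterns are taken solely from the message text quoted in this bug.

```python
import re

# Patterns copied from the kernel messages quoted in this report.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "calgary_dma_error": re.compile(r"Calgary: DMA error on \S+ PHB 0x[0-9a-fA-F]+"),
    "mpt_not_ready": re.compile(r"mptbase: ioc\d+: WARNING - NOT READY"),
    "mpt_recovery_failed": re.compile(r"mptbase: .*Cannot recover"),
}


def find_failure_signatures(log_text):
    """Return a dict mapping each signature label to the list of matching lines."""
    hits = {label: [] for label in SIGNATURES}
    for line in log_text.splitlines():
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[label].append(line.strip())
    return hits


if __name__ == "__main__":
    # A few lines taken from the excerpt above.
    sample = (
        "mptsas 0000:04:00.0: setting latency timer to 64\n"
        "Calgary: DMA error on CalIOC2 PHB 0x3\n"
        "mptbase: ioc0: Initiating recovery\n"
        "mptbase: ioc0: WARNING - NOT READY!\n"
        "mptbase: ioc0: WARNING - Cannot recover rc = -1!\n"
    )
    for label, lines in find_failure_signatures(sample).items():
        print(label, len(lines))
```

A Calgary DMA error followed by a failed mptbase recovery is the combination reported here; seeing only one of the two may point to a different problem.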

And here is part of the lspci output (also copied manually):

04:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1078 PCI-Express Fusion-MPT SAS (rev ff)
20:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
20:00.1 Idem.
21:00.0 Idem.
21:00.1 Idem.

I have been searching the Internet for hours with no luck. I have tried disabling the onboard LAN controller with no luck.

Jaume Sabater (jsabater) wrote :

I managed to work around the problem by booting both the installer and the installed 2.6.27-X kernel with the parameter iommu=soft; iommu=off did not work. An lspci listing is attached; I hope it helps. I've seen that the guys at Red Hat updated their kernel in Red Hat Enterprise 9 with a workaround for this issue, which is related to MSI (IBM's Calgary implementation) and not to the LSI disk controller.
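
To confirm that the workaround actually reached the running kernel, the boot command line can be inspected. This is a minimal sketch, assuming a Linux system that exposes `/proc/cmdline`; the helper names `iommu_options` and `read_running_cmdline` are hypothetical and exist only for this example.

```python
def iommu_options(cmdline):
    """Return the comma-separated values of every iommu= parameter, in order."""
    values = []
    for token in cmdline.split():
        if token.startswith("iommu="):
            values.extend(token[len("iommu="):].split(","))
    return values


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    try:
        options = iommu_options(read_running_cmdline())
    except OSError:
        options = []  # not on Linux, or /proc is not mounted
    print("iommu options:", options or "none set")
    print("iommu=soft requested:", "soft" in options)
```

If `soft` is not listed after a reboot, the parameter was probably added to the wrong boot entry or lost when the bootloader configuration was regenerated.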

Description of problem:
When doing "intensive" I/O on Fedora 12, the mpt* drivers crash the filesystem.

The problem occurs on an IBM x3850 M2 machine, using the integrated LSI SAS1078 C1 PCI-Express Fusion-MPT SAS.

Steps to Reproduce:
1. Create a big allocated space (20GB for example)
2. dd if=/dev/vg/mybigspace of=/dev/null
3. After a few minutes, the filesystem access becomes impossible. Looking at dmesg, you get the following:

Calgary: DMA error on CalIOC2 PHB 0x3
Calgary: 0x80000000@CSR 0x00000000@PLSSR 0xb0008000@CSMR 0x00000000@MCK
Calgary: 0x00000000@0x810 0x00000000@0x820 0x00000000@0x830 0x00000000@0x840 0x00000000@0x850 0x00000000@0x860 0x00000000@0x870
Calgary: 0x40000000@0xcb0
irq 46: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff8109aefc>] __report_bad_irq+0x3d/0x8c
 [<ffffffff8109b063>] note_interrupt+0x118/0x17d
 [<ffffffff8109b6f2>] handle_fasteoi_irq+0xa1/0xc6
 [<ffffffff8101463c>] handle_irq+0x8b/0x93
 [<ffffffff8141e9cc>] do_IRQ+0x5c/0xbc
 [<ffffffff810126d3>] ret_from_intr+0x0/0x11
 <EOI> [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff81019021>] ? mwait_idle+0x33/0xae
 [<ffffffff8141d079>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff814145be>] ? start_secondary+0x1f3/0x234
handlers:
[<ffffffffa00e3d7e>] (mpt_interrupt+0x0/0x8bb [mptbase])
Disabling IRQ #46
mptscsih: ioc0: attempting task abort! (sc=ffff880a0d8fa400)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 fb 5b a7 00 00 60 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a0d8fa400)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb3100)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 28 f5 ff 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a04bb3100)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb2600)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 61 de 27 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
[root@flanders ubuntu]#
Message from syslogd@mymachine at Nov 26 10:38:28 ...
 kernel:mpage_da_map_blocks block allocation failed for inode 11762 at logical offse...

Luc Stepniewski (lstep) wrote :

I have the same problem on an x3850 M2 (I think it's the same as the x3950 M2): when I do "intensive" I/O such as "dd if=/dev/vg/mydev of=/dev/null", I get the same errors. I have also tried Fedora 12, which gives the same error:

Calgary: DMA error on CalIOC2 PHB 0x3
Calgary: 0x80000000@CSR 0x00000000@PLSSR 0xb0008000@CSMR 0x00000000@MCK
Calgary: 0x00000000@0x810 0x00000000@0x820 0x00000000@0x830 0x00000000@0x840 0x00000000@0x850 0x00000000@0x860 0x00000000@0x870
Calgary: 0x40000000@0xcb0
irq 46: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff8109aefc>] __report_bad_irq+0x3d/0x8c
 [<ffffffff8109b063>] note_interrupt+0x118/0x17d
 [<ffffffff8109b6f2>] handle_fasteoi_irq+0xa1/0xc6
 [<ffffffff8101463c>] handle_irq+0x8b/0x93
 [<ffffffff8141e9cc>] do_IRQ+0x5c/0xbc
 [<ffffffff810126d3>] ret_from_intr+0x0/0x11
 <EOI> [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
 [<ffffffff81019021>] ? mwait_idle+0x33/0xae
 [<ffffffff8141d079>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff81010bb8>] ? enter_idle+0x25/0x27
 [<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
 [<ffffffff814145be>] ? start_secondary+0x1f3/0x234
handlers:
[<ffffffffa00e3d7e>] (mpt_interrupt+0x0/0x8bb [mptbase])
Disabling IRQ #46
mptscsih: ioc0: attempting task abort! (sc=ffff880a0d8fa400)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 fb 5b a7 00 00 60 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a0d8fa400)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb3100)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 28 f5 ff 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a04bb3100)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb2600)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 61 de 27 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
[root@flanders fc12]#
Message from syslogd@flanders at Nov 26 10:38:28 ...
 kernel:mpage_da_map_blocks block allocation failed for inode 11762 at logical offset 2 with max blocks 1 with error -30

Message from syslogd@flanders at Nov 26 10:38:28 ...
 kernel:This should not happen.!! Data will be lost

Changed in linux (Fedora):
status: Unknown → Confirmed

I also had a SERIOUS problem installing Fedora 12 on a Dell Optiplex GX620 which *appears* to be the same thing.

I tried to install Fedora 12 on a Dell Optiplex GX620 as a 64-bit OS (x86_64), using BIOS revision A10. When the disk was busy installing things it suddenly hung with this message:
 kernel: mpage_da_map_blocks block allocation failed for inode 211 at logical offset 0 with max blocks 1 with error -30
 kernel: This should not happen.!! Data will be lost
On each boot, I needed to add this kernel parameter:
 iommu=soft
Later I modified /boot/grub/grub.conf so that every kernel entry includes:
 iommu=soft

Previously I had tried to upgrade the BIOS to rev. A11; this caused complete loss of the USB keyboard/mouse, so I re-installed revision A10.

I'm in the middle of an install now that the iommu=soft setting has been added; so far, this *appears* to have solved the problem.
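
The grub.conf change described above can be sketched as a small text transformation. This is a hedged illustration rather than a tested tool: it assumes a GRUB legacy configuration in which each boot entry has a line whose first word is `kernel`, the function name `add_kernel_parameter` is hypothetical, and the sample entry (root device, initrd name) is invented for the example. It only prints the result and does not write to `/boot/grub/grub.conf`.

```python
def add_kernel_parameter(grub_conf_text, parameter="iommu=soft"):
    """Append `parameter` to every GRUB legacy `kernel` line that lacks it."""
    updated = []
    for line in grub_conf_text.splitlines():
        tokens = line.split()
        is_kernel_line = bool(tokens) and tokens[0] == "kernel"
        if is_kernel_line and parameter not in tokens:
            line = line.rstrip() + " " + parameter
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    # Illustrative entry only; paths and root device are assumptions.
    sample = (
        "title Fedora (2.6.31.5-127.fc12.x86_64)\n"
        "\troot (hd0,0)\n"
        "\tkernel /vmlinuz-2.6.31.5-127.fc12.x86_64 ro root=/dev/sda1 quiet\n"
        "\tinitrd /initramfs-2.6.31.5-127.fc12.x86_64.img\n"
    )
    print(add_kernel_parameter(sample), end="")
```

Running the function twice leaves the text unchanged, so it is safe to re-apply after a kernel update adds new entries. Keep a backup of the original file before replacing it with the output.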

Jeremy Foshee (jeremyfoshee) wrote :

Hi Jaume,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/lucid.

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 343749

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jaume Sabater (jsabater) wrote :

Unfortunately, I cannot try a newer Ubuntu release, as this is a live database cluster. But I can confirm it still happens with the newest kernel revision for Ubuntu 8.10. I am very sorry, but I am using PostgreSQL 8.3 in that cluster and the newest Ubuntu version comes with PostgreSQL 8.4. I am not sure how the upgrade would go, and I don't want to use live machines as test machines.

We may rearrange a few things on that platform, so in a few months one of those machines may become available and I could try this.

Did you check with the guys at Red Hat or with Upstream? I saw that you confirmed the bug.

thehighhat (thehighhat) wrote :

this is still an issue.

for tyan S7025WAGM2NR (S7025) which has LSI 1068E chipset.

if all drives (ssd,sata,sas,optical) are connected to the LSI controller, then the 10.04 installer cd cannot detect anything.

this is with the 64 bit alternative cd.

----

why are not _all_ kernel drivers compiled at least as modules and ready for autodetect on the cd?

I notice that Red Hat 6 Beta 2 has this in the release notes under known kernel problems:

Calgary IOMMU default detection has been disabled in this release. If you require Calgary IOMMU support add 'iommu=calgary' as a boot parameter.

So perhaps the new Enterprise kernel is now hitting this problem?

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 12 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Jason Unrein (diabelek) wrote :

It looks like Jaume Sabater might be correct about his issue in post #1. The problem he encountered appears to be an MSI problem. On some systems I've seen in the past, MSI causes issues with adapters and so it must be disabled. Newer versions of the driver have the option to disable MSI (see http://lxr.free-electrons.com/source/drivers/message/fusion/mptbase.c#L96). You should be able to pass these options on boot/install as well (see https://help.ubuntu.com/8.04/installation-guide/i386/boot-parms.html).

Luc Stepniewski in post #2 appears to have a different issue since it only occurs during heavy load. This would suggest to me a problem between fw/driver. An upgrade of one at a time to the latest would help narrow down things. My guess though is that during heavy IO, the card locked up and couldn't be recovered by the driver.

If anyone has the issue where the card can't be initialized during driver load (either at boot or by modprobe), you should try loading mptbase with "insmod /lib/modules/..../mptbase.ko mpt_msi_enable_sas=0".

thehighhat, are you able to reproduce the problem? The 1068E driver should be available in the kernel during boot. For some reason it isn't being detected or loaded. I know for a fact that a generic 1068E chip will load in Ubuntu 10.04/10.10, so some logs would be needed. If you can get either dmesg or /var/log/messages from boot, that might help clue us in on the problem.

André Scholz (info-m2) wrote :

Yes, this is still an issue with 14.04 server.
I have the same problem.
Here is what dmesg | tail says:

[ 201.820039] scsi 2:0:0:0: megasas: RESET cmd=12 retries=0
[ 201.820049] megasas: [ 0]waiting for 1 commands to complete
[ 202.824087] megaraid_sas: FW detected to be in faultstate, restarting it...
[ 205.836096] pcidata = ffffffff
[ 205.836104] mfi 1068 offset read=ffffffff
[ 207.844034] 1068 offset handshake read=ffffffff
[ 207.844042] megaraid_sas: FW restarted successfully,initiating next stage...
[ 207.844044] megaraid_sas: HBA recovery state machine,state 2 starting...
[ 237.964018] megasas: Waiting for FW to come to ready state
[ 237.964023] megasas: FW in FAULT state!!

André Scholz (info-m2) wrote :

Well, an addition:

it seems to work with kernel parameter iommu=soft

Thanks for that tip, Jaume.

Gad Maor (gad-maor) wrote :

I can confirm this issue has not been resolved yet in the latest Ubuntu 14.04 LTS ISO.

Tried the iommu=soft solution -> didn't work.

Tried to load every kernel module that seemed even remotely connected to the issue via the BusyBox tty -> no joy :(

This is a major issue, since it's hampering installation of Ubuntu Server on our servers.

I found out that the mfi module might fix this issue : http://manpages.ubuntu.com/manpages/trusty/man4/mfi.4freebsd.html
but for some reason it's not included in the Ubuntu Server 14.04 installation ISO.

Changed in linux (Fedora):
importance: Unknown → Critical
status: Confirmed → Won't Fix