mpt2sas not loaded with LSISAS2008 card in trusty 14.04.1

Bug #1363313 reported by Paul Johnson on 2014-08-30
This bug affects 3 people
Affects           Status    Importance  Assigned to  Milestone
Linux             Unknown   Unknown
linux (Ubuntu)              Medium      Unassigned
  Trusty                    Medium      Unassigned
  Utopic                    Medium      Unassigned

Bug Description

This card has worked without a problem in 12.04 up through the current kernel. After an upgrade to 14.04.1, mpt2sas tries to load during boot but fails. An apport file from this machine is attached; note the error messages during boot.

WORKAROUND: pci=realloc=off

---
ApportVersion: 2.14.1-0ubuntu3.3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: admsvr 3757 F.... pulseaudio
CurrentDesktop: XFCE
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=7192beea-6a28-4c3f-b274-3332e6d007b7
InstallationDate: Installed on 2010-09-14 (1445 days ago)
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. P55-UD3L
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
Package: linux (not installed)
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-35-generic root=UUID=46ac1ed7-062a-4ae3-b071-52ccee0cc346 ro splash nomdmonddf nomdmonisw nomdmonddf nomdmonisw vt.handoff=7
ProcVersionSignature: Ubuntu 3.13.0-35.62-generic 3.13.11.6
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-35-generic N/A
 linux-backports-modules-3.13.0-35-generic N/A
 linux-firmware 1.127.5
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.13.0-35-generic x86_64
UpgradeStatus: Upgraded to trusty on 2014-08-23 (6 days ago)
UserGroups: adm admin cdrom dialout lpadmin lpadmin mythtv plugdev sambashare vboxusers
_MarkForUpload: True
dmi.bios.date: 06/23/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F7
dmi.board.name: P55-UD3L
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF7:bd06/23/2010:svnGigabyteTechnologyCo.,Ltd.:pnP55-UD3L:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnP55-UD3L:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: P55-UD3L
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1363313

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty

apport information

tags: added: apport-collected
description: updated

Paul Johnson (pjay) wrote : CRDA.txt

Paul Johnson (pjay) wrote : Lspci.txt

Paul Johnson (pjay) wrote : Lsusb.txt

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Paul Johnson, thank you for reporting this and helping make Ubuntu better. Could you please test the latest upstream kernel, available from the very top line of the page (not the daily folder), following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested, written exactly as in this example:
kernel-fixed-upstream-3.17-rc2

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: regression-release
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Paul Johnson (pjay) wrote :

I tried kernel 3.17.0-031700rc2 and it did not work. These are the kernel messages:

Aug 30 08:16:55 ubu1004 kernel: [ 7.415334] mpt2sas version 16.100.00.00 loaded
Aug 30 08:16:55 ubu1004 kernel: [ 11.535205] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16424848 kB)
Aug 30 08:16:55 ubu1004 kernel: [ 11.597000] mpt2sas 0000:01:00.0: irq 31 for MSI/MSI-X
Aug 30 08:16:55 ubu1004 kernel: [ 11.597017] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 31
Aug 30 08:16:55 ubu1004 kernel: [ 11.658384] mpt2sas0: iomem(0x00000000efec0000), mapped(0xffffc90005e88000), size(16384)
Aug 30 08:16:55 ubu1004 kernel: [ 11.720734] mpt2sas0: ioport(0x000000000000ce00), size(256)
Aug 30 08:16:55 ubu1004 kernel: [ 11.782645] mpt2sas0: doorbell is in use (line=3035)
Aug 30 08:16:55 ubu1004 kernel: [ 11.843966] mpt2sas0: _base_get_ioc_facts: handshake failed (r=-14)
Aug 30 08:16:55 ubu1004 kernel: [ 11.905177] mpt2sas0: sending diag reset !!
Aug 30 08:16:55 ubu1004 kernel: [ 12.124300] mpt2sas0: diag reset: FAILED
Aug 30 08:16:55 ubu1004 kernel: [ 12.184756] mpt2sas0: failure at /home/apw/COD/linux/drivers/scsi/mpt2sas/mpt2sas_scsih.c:8236/_scsih_probe()!

tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-3.17.0-031700rc2
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream-3.17-rc2
removed: kernel-bug-exists-upstream-3.17.0-031700rc2

Paul Johnson, the next step is to perform a full commit bisection of the kernel in order to identify the offending commit. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection ?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Paul Johnson (pjay) wrote :

mpt2sas loads in 3.3.7 precise and 3.3.8 quantal kernels.
It does not load in 3.4.1 quantal, nor in any later kernel I happened to check.
Chasing down the exact code problem isn't something I'm set up to easily do.
If I can get some help there it would be appreciated.

These are the 3.3.8 messages when it succeeded -
[ 3.381754] mpt2sas version 12.100.00.00 loaded
[ 3.382155] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16427124 kB)
[ 3.382242] mpt2sas 0000:01:00.0: irq 47 for MSI/MSI-X
[ 3.382260] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 47
[ 3.382263] mpt2sas0: iomem(0x00000000fbdfc000), mapped(0xffffc900065e0000), size(16384)
[ 3.382267] mpt2sas0: ioport(0x000000000000ce00), size(256)
[ 3.666443] mpt2sas0: sending diag reset !!
[ 4.799390] mpt2sas0: diag reset: SUCCESS
[ 4.944914] mpt2sas0: Allocated physical memory: size(3379 kB)
[ 4.944918] mpt2sas0: Current Controller Queue Depth(1481), Max Controller Queue Depth(1720)
[ 4.944921] mpt2sas0: Scatter Gather Elements per IO(128)
[ 5.174963] mpt2sas0: LSISAS2008: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)
[ 5.174967] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[ 5.175105] mpt2sas0: sending port enable !!
[ 6.697688] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b001bdc971), phys(8)
[ 12.562718] mpt2sas0: port enable: SUCCESS

and these are the 3.4.1 messages when it failed -
[ 3.249035] mpt2sas version 12.100.00.00 loaded
[ 3.249277] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16426944 kB)
[ 3.249335] mpt2sas 0000:01:00.0: irq 46 for MSI/MSI-X
[ 3.249352] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 46
[ 3.249355] mpt2sas0: iomem(0x00000000eff40000), mapped(0xffffc900065f8000), size(16384)
[ 3.249358] mpt2sas0: ioport(0x000000000000ce00), size(256)
[ 3.249385] mpt2sas0: doorbell is in use (line=3015)
[ 3.249387] mpt2sas0: _base_get_ioc_facts: handshake failed (r=-14)
[ 3.249390] mpt2sas0: sending diag reset !!
[ 3.407606] mpt2sas0: diag reset: FAILED
[ 3.407675] mpt2sas0: failure at /home/apw/COD/linux/drivers/scsi/mpt2sas/mpt2sas_scsih.c:8038/_scsih_probe()!

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Paul Johnson, now that you have bisected the kernel versions, the next step is to perform a full commit bisection between these two versions following https://wiki.ubuntu.com/Kernel/KernelBisection . Please mark this bug Confirmed once the commit has been identified.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Paul Johnson (pjay) wrote :

The bisect log information is in the attached file.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: performing-bisect
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the CONFIG_PCI_REALLOC_ENABLE_AUTO config option not set. Commit 49cc9a18 does nothing except enable this config option. If this test kernel does not fix the bug, we will need to perform the bisect again, this time between the upstream versions and the Ubuntu versions.

Joseph Salisbury (jsalisbury) wrote :

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1363313/

Joseph Salisbury (jsalisbury) wrote :

Be sure to install both the linux-image and linux-image-extra .deb packages.

Paul Johnson (pjay) wrote :

Tested your kernel and it worked!

Travis Read (lordfolken) wrote :

I had the exact same error and the provided test kernel resolved it for me also.

Joseph Salisbury (jsalisbury) wrote :

Just to confirm the value of that option, can you run the following command while running the test kernel:

cat /boot/config-$(uname -r) | grep CONFIG_PCI_REALLOC_ENABLE_AUTO
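The check above only shows the build-time default. A slightly extended check can also show whether the pci=realloc=off override is active on the running kernel. This is a minimal sketch, not part of the original exchange: the function name `realloc_report` is hypothetical, and it assumes the Ubuntu layout where the kernel build config is installed as /boot/config-<release> and the boot parameters can be read from /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: report how PCI resource reallocation is configured.
# $1: kernel config file (default: the running kernel's /boot/config-<release>)
# $2: kernel command line file (default: /proc/cmdline)
realloc_report() {
    config=${1:-/boot/config-$(uname -r)}
    cmdline=${2:-/proc/cmdline}

    # Build-time default: "=y" means automatic reallocation detection is compiled in.
    # When unset, the config file contains "# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set".
    if grep -q '^CONFIG_PCI_REALLOC_ENABLE_AUTO=y' "$config"; then
        echo "config: CONFIG_PCI_REALLOC_ENABLE_AUTO=y"
    else
        echo "config: CONFIG_PCI_REALLOC_ENABLE_AUTO not set"
    fi

    # Boot-time override: accept both "pci=realloc=off" and combined forms
    # such as "pci=nocrs,realloc=off".
    if grep -Eq '(^|[[:space:]])pci=([^[:space:]]*,)?realloc=off([,[:space:]]|$)' "$cmdline"; then
        echo "cmdline: pci=realloc=off present"
    else
        echo "cmdline: pci=realloc=off absent"
    fi
}
```

The optional comma-separated prefix in the pattern is there because several `pci=` options can be combined into a single parameter; a plain substring match on `pci=realloc=off` would miss that form.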

Joseph Salisbury (jsalisbury) wrote :

I built an additional test kernel with just CONFIG_PCI_REALLOC_ENABLE_AUTO re-enabled. It consists of the following two .deb packages:

linux-image-3.13.0-36-generic_3.13.0-36.63~lp1363313v2_amd64.deb
linux-image-extra-3.13.0-36-generic_3.13.0-36.63~lp1363313v2_amd6

It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1363313/

Can you install this test kernel and confirm that the bug comes back?

Paul Johnson (pjay) wrote :

Hi Joe,
With the v1 kernel--
cat /boot/config-$(uname -r) | grep CONFIG_PCI_REALLOC_ENABLE_AUTO
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set

With the v2 kernel--
cat /boot/config-$(uname -r) | grep CONFIG_PCI_REALLOC_ENABLE_AUTO
CONFIG_PCI_REALLOC_ENABLE_AUTO=y

And yes, the problem is back with v2. Let me know if there is anything I can do to help. - Paul

tags: added: utopic
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

A similar issue was discussed upstream[0], but a permanent fix has not been implemented yet. A similar bug, bug 1245938, was also opened.

I'll ping upstream regarding this issue.

[0] https://lkml.org/lkml/2014/1/10/401

Joseph Salisbury (jsalisbury) wrote :

Also, a workaround found in the other bug was to add the kernel parameter: pci=realloc=off

tags: added: kernel-da-key
removed: performing-bisect
Paul Johnson (pjay) wrote :

Thanks Joseph. The regular release kernel in trusty works for me with pci=realloc=off in the command line.
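To keep the workaround across reboots on a stock Ubuntu install, the parameter is normally appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, after which the GRUB configuration is regenerated with update-grub. The sketch below assumes that layout and a double-quoted value. The function name `add_kernel_param` is hypothetical, and it only edits the file passed to it, so it can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# Assumes the usual Ubuntu form: GRUB_CMDLINE_LINUX_DEFAULT="..."
# $1: parameter to add, e.g. pci=realloc=off (must not contain '/' or '&')
# $2: grub defaults file (default: /etc/default/grub)
# If the file has no such line, it is left unchanged.
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}

    # Do nothing if the parameter is already on that line (keeps it idempotent).
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*$param" "$file"; then
        return 0
    fi

    # First expression: insert " $param" just before the closing quote.
    # Second expression: drop the leading space left behind when the value was empty.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"\)\(.*\)\"\$/\1\2 $param\"/; s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"\) /\1/" "$file"
}
```

On the affected machine this would be run as root, e.g. `add_kernel_param pci=realloc=off` followed by `update-grub`; after a reboot the parameter should then appear in /proc/cmdline.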

Joseph Salisbury (jsalisbury) wrote :

We would like to gather some debug information for upstream. Can you boot with the kernel option pci=earlydump?

Then post the kernel messages from dmesg or /var/log/syslog. This should be done without the pci=realloc=off parameter, so that the error does happen.
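When collecting these logs, it can help to pull out just the mpt2sas lines and check for the failure signature seen in this report: "doorbell is in use", then "handshake failed", then "diag reset: FAILED". The following is a minimal sketch. The function name `mpt2sas_summary` is hypothetical, and it only reads a saved log file, for example one written with `dmesg > dmesg-earlydump.txt`. The patterns are taken from the messages quoted in this bug and may not cover other driver versions.

```shell
#!/bin/sh
# Hypothetical helper: print the mpt2sas lines from a saved kernel log and
# classify the probe result using the messages seen in this bug report.
# $1: saved log file (e.g. output of `dmesg`, or /var/log/syslog)
# Returns 1 for the failing signature, 0 for a successful init, 2 otherwise.
mpt2sas_summary() {
    log=$1

    # Show only the driver's own messages, in their original order.
    grep 'mpt2sas' "$log" || echo "(no mpt2sas lines found)"

    # Failing boots log "diag reset: FAILED" before the _scsih_probe() failure;
    # the working 3.3.8 boot logs "diag reset: SUCCESS" and "port enable: SUCCESS".
    if grep -q 'mpt2sas[0-9]*: diag reset: FAILED' "$log"; then
        echo "RESULT: probe failed (diag reset FAILED)"
        return 1
    elif grep -q 'mpt2sas[0-9]*: port enable: SUCCESS' "$log"; then
        echo "RESULT: controller initialized"
        return 0
    fi
    echo "RESULT: no conclusive mpt2sas messages found"
    return 2
}
```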

Paul Johnson (pjay) wrote :

The dmesg file with pci=earlydump is attached.

Joseph Salisbury (jsalisbury) wrote :

Upstream has requested some additional data to help resolve this issue[0]. They would also like us to open an upstream bug for additional tracking.

For regressions, it's helpful if you can attach dmesg logs from working and non-working kernels that are as close together as possible. Is it possible to collect the additional dmesg output?

[0] https://lkml.org/lkml/2014/9/25/432

Paul Johnson (pjay) wrote :

Yes, I can get you the dmesg files from working and non-working kernels, whichever ones are needed. Since this didn't look like a quick fix, I have a separate install on the machine that is just a disposable desktop version. Making changes there is much easier for me, though it does, of course, still require booting out of the server install. The server is a production machine, but it's internal and not heavily used, so I can get some time on it most days.

Would you take care of opening the upstream bug?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Trusty):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Utopic):
status: Confirmed → Incomplete
Paul Johnson (pjay) wrote :

This bug looks like it will auto-expire. It is confirmed on trusty.

description: updated

Paul Johnson, the next step would be to test for this issue on the latest mainline kernel, currently 3.19-rc2. If it is reproducible, please gather the information requested by upstream in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1363313/comments/35 using that kernel. Then, you would want to personally report this via bugzilla as specifically requested by upstream, following https://wiki.ubuntu.com/Bugs/Upstream/kernel . Please feel free to post the URL to the new report here once it is made so it may be tracked.

tags: added: bisect-done
tags: added: kernel-bug-exists-upstream-3.19-rc2
removed: kernel-bug-exists-upstream-3.17-rc2
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in linux (Ubuntu Utopic):
status: Incomplete → Triaged
Changed in linux (Ubuntu Trusty):
status: Incomplete → Triaged
Rolf Leggewie (r0lf) wrote :

utopic has seen the end of its life and is no longer receiving any updates. Marking the utopic task for this ticket as "Won't Fix".

Changed in linux (Ubuntu Utopic):
status: Triaged → Won't Fix
Bjorn Helgaas (bjorn-helgaas) wrote :

https://bugzilla.kernel.org/show_bug.cgi?id=92351#c1 reports the problem occurs with LSI FW 18.
https://bugzilla.kernel.org/show_bug.cgi?id=92351#c8 reports the problem does NOT occur with LSI FW 19. So upgrading the adapter firmware to version 19 may be a fix.

There is a patch at https://bugzilla.kernel.org/show_bug.cgi?id=92351#c7, but it has not been tested and has not been merged upstream. This fix would be a possible workaround for systems with LSI FW 18. My understanding is that it would not be necessary with LSI FW 19.
