Kernel Oops - unable to handle kernel NULL pointer dereference; EIP is at mptsas_probe_expander_phys+0x72/0x610 [mptsas]

Bug #205162 reported by Matthew L. Dailey
Affects: linux (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Stefan Bader

Bug Description

Binary package hint: linux-source-2.6.24

Installing hardy alpha 6 on a Dell PowerEdge M600, the installer kernel panics when modprobing mptsas:

[ 176.439736] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
[ 176.448254] printing eip: f89cf622 *pde = 00000000
[ 176.453140] Oops: 0000 [#1] SMP
[ 176.456375] Modules linked in: sg mptsas mptscsih mptbase scsi_transport_sas af_packet sr_mod cdrom sd_mod rsrc_nonstatic pcmcia_core usbserial usbhid hid usbkbd fan usb_storage scsi_mod libusual ehci_hcd thermal evdev psmouse uhci_hcd bnx2 usbcore processor
[ 176.479455]
[ 176.480939] Pid: 10954, comm: modprobe Not tainted (2.6.24-12-generic #1)
[ 176.487704] EIP: 0060:[<f89cf622>] EFLAGS: 00010246 CPU: 7
[ 176.493174] EIP is at mptsas_probe_expander_phys+0x72/0x610 [mptsas]
[ 176.499508] EAX: 00000010 EBX: df993bc0 ECX: f6e29df4 EDX: 00000288
[ 176.505755] ESI: df900800 EDI: dfa5f800 EBP: 00000000 ESP: f6e29d34
[ 176.512001] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 176.517381] Process modprobe (pid: 10954, ti=f6e28000 task=f6819680 task.ti=f6e28000)
[ 176.525013] Stack: 0000ffff 61747300 00656c6c 00002aca f6e29df4 df900800 00000001 00000000
[ 176.533428] 00000000 00000000 00100100 00200200 00000000 00200200 ffff124f f89e0360
[ 176.541845] df900800 f7c8e000 00000000 70646f6d 65626f72 61747300 00656c6c 00002aca
[ 176.550265] Call Trace:
[ 176.552885] [<f89e0360>] mpt_timer_expired+0x0/0x70 [mptbase]
[ 176.558713] [<f89d17af>] mptsas_probe+0x3af/0x480 [mptsas]
[ 176.564279] [<c01d47c3>] sysfs_create_link+0x93/0x110
[ 176.569414] [<c0223526>] pci_device_probe+0x56/0x80
[ 176.574372] [<c027eab8>] driver_probe_device+0x88/0x190
[ 176.579680] [<c0212290>] kobject_uevent_env+0xf0/0x3d0
[ 176.584901] [<c027ed2e>] __driver_attach+0x9e/0xa0
[ 176.589774] [<c027deeb>] bus_for_each_dev+0x3b/0x60
[ 176.594733] [<c027e936>] driver_attach+0x16/0x20
[ 176.599431] [<c027ec90>] __driver_attach+0x0/0xa0
[ 176.604215] [<c027e26a>] bus_add_driver+0x8a/0x1e0
[ 176.606754] [<c02236d6>] __pci_register_driver+0x56/0x90
[ 176.611638] [<f884a0c2>] mptsas_init+0xc2/0xe1 [mptsas]
[ 176.617041] [<c01516c6>] sys_init_module+0x126/0x19c0
[ 176.622363] [<c0105442>] syscall_call+0x7/0xb
[ 176.622367] =======================
[ 176.622367] Code: 89 44 24 20 74 1e 8b 43 0c e8 8b aa 7b c7 89 d8 e8 84 aa 7b c7 8b 44 24 20 81 c4 8c 00 00 00 5b 5e 5f 5d c3 8b 43 0c 8b 4c 24 10 <0f> b7 00 89 01 8b 44 24 14 05 3c 05 00 00 89 44 24 18 e8 e7 8a
[ 176.622379] EIP: [<f89cf622>] mptsas_probe_expander_phys+0x72/0x610 [mptsas] SS:ESP 0068:f6e29d34
[ 176.622386] ---[ end trace d5327034c75fc0a7 ]---
[ 176.688271] Intel ISA PCIC probe: not found.

It looks like this is a known issue and is fixed in 2.6.25-rc5. See Bugzilla for details: http://bugzilla.kernel.org/show_bug.cgi?id=9909

I've attached the Bugzilla patch, just for additional info, but I haven't tried applying this to 2.6.24 myself to see if it works.

I'm guessing hardy is frozen at 2.6.24 - any chance of this fix being backported?

Originally reported as bug #204328; however, I realized I had filed it against linux-meta rather than linux-source-2.6.24. I'll mark the other bug as a duplicate.

Tags: kernel-oops
Matthew L. Dailey (matthew-l-dailey) wrote :

A few more data points on this. I consistently get this oops on our new Dell PowerEdge M600, but do not get the oops on a Dell PowerEdge 1955. They are the same model card, but slightly different revisions:
Dell M600: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
Dell 1955: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)

I applied the upstream patch to the 2.6.24-12 sources and recompiled the module. The patch location was slightly different, but it applied and compiled cleanly. I then swapped in this patched module after the "Loading Additional Components" step, but before the "Configuring the Clock" step. With the patched module, the installer is able to load the module and the installation proceeds normally. Of course, the unpatched module gets installed in /target, so I have to swap the patched one in there before the reboot so that the machine does not panic at boot. :-)

I'm not sure what the next steps are to help the kernel team get this patched, but I'm happy to provide any additional information and/or assistance. If this could make it into hardy final, we'd love to see this, since we're trying to move our servers to the LTS release.

I've attached the actual patch I used on the hardy linux 2.6.24-12 sources this morning.

Leann Ogasawara (leannogasawara) wrote :

Hi Matthew,

Thanks for the upstream reference and for testing. It's really helpful and very much appreciated. As you noted, we're currently in Beta Freeze; however, the kernel isn't frozen just yet. I'll reassign to the kernel team and milestone it for Hardy. They'll have the final say as to whether this will make it in on time. I'm adding the upstream git commit id and description for them to reference. Thanks.

commit 51f39eae14b4874618e73281c236e3a1c1572d4d
Author: Krzysztof Oledzki <email address hidden>
Date: Tue Mar 4 14:56:23 2008 -0800

    [SCSI] mpt fusion: don't oops if NumPhys==0

    Don't oops if NumPhys==0, instead return -ENODEV.
    This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=9909

    Signed-off-by: Krzysztof Piotr Oledzki <email address hidden>
    Acked-by: Eric Moore <email address hidden>
    Signed-off-by: Andrew Morton <email address hidden>
    Signed-off-by: James Bottomley <email address hidden>
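
The idea in the commit message above (return -ENODEV rather than continue when NumPhys is 0) can be sketched in userspace C. This is only an illustration under stated assumptions: `struct phy_info`, `struct port_info`, `port_info_init` and `probe_expander` are hypothetical stand-ins and are not the actual mptsas code. The faulting address 00000010 in the report would be consistent with dereferencing the kernel's sentinel pointer for zero-length allocations, although nothing in this thread confirms that.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures; NOT the real mptsas types. */
struct phy_info {
    unsigned short handle;
};

struct port_info {
    unsigned char num_phys;
    struct phy_info *phy_info;
};

/*
 * Set up port from the PHY count reported by the hardware.
 * Returns 0 on success, -ENODEV when no PHYs are reported,
 * and -ENOMEM when the allocation fails.
 */
int port_info_init(struct port_info *port, unsigned char num_phys)
{
    port->num_phys = 0;
    port->phy_info = NULL;

    /*
     * The guard: refuse a zero count up front. Without it, a
     * zero-element allocation may hand back a pointer that must
     * never be dereferenced (in userspace, calloc(0, ...) may
     * return NULL or a unique non-NULL pointer).
     */
    if (num_phys == 0)
        return -ENODEV;

    port->phy_info = calloc(num_phys, sizeof(*port->phy_info));
    if (!port->phy_info)
        return -ENOMEM;

    port->num_phys = num_phys;
    return 0;
}

/*
 * Caller that reads the first PHY entry, loosely mirroring the
 * access pattern that faulted. Returns a negative errno on failure,
 * otherwise the (zero-initialised) handle of the first PHY.
 */
int probe_expander(unsigned char reported_phys)
{
    struct port_info port;
    int rc = port_info_init(&port, reported_phys);

    if (rc)
        return rc; /* bail out before touching phy_info[0] */

    assert(port.phy_info != NULL && port.num_phys > 0);
    rc = port.phy_info[0].handle;
    free(port.phy_info);
    return rc;
}
```

With this guard, `probe_expander(0)` returns -ENODEV instead of reading through an unusable pointer, while a non-zero count proceeds as before.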

Changed in linux:
importance: Undecided → High
milestone: none → ubuntu-8.04
status: New → Triaged
Steve Langasek (vorlon)
Changed in linux:
assignee: nobody → stefan-bader-canonical
Stefan Bader (smb)
Changed in linux:
status: Triaged → In Progress
Stefan Bader (smb)
Changed in linux:
status: In Progress → Fix Committed
Matthew L. Dailey (matthew-l-dailey) wrote :

Glad to help. Thanks for your fast action on this!

Stefan Bader (smb) wrote :

For reference:

commit 7bbdb35e0281f90c1ffeadbe051a1fd5b72bc9ac
Date: Tue Mar 25 15:58:54 2008 -0400

    UBUNTU: SCSI: mpt fusion: don't oops if NumPhys==0
    OriginalAuthor: Krzysztof Oledzki <email address hidden>
    Bug: #205162

Stefan Bader (smb) wrote :

This actually should be released with 2.6.24-13.

Changed in linux:
status: Fix Committed → Fix Released
Matthew L. Dailey (matthew-l-dailey) wrote :

I can confirm that this bug is not present in the current hardy kernel (2.6.24-15). I just did an install from scratch and there was no kernel oops loading the mptsas module.

I looked through the changelogs for 2.6.24-13 through -15 and didn't see this fix listed. No big deal, but I just wanted to mention it.

Thanks again!

Vik (vik-catalyst) wrote :

Just got this with 2.6.24-20-generic:

[ 4990.258821] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
[ 4990.258828] printing eip: c031a034 *pde = 00000000
[ 4990.258832] Oops: 0000 [#1] SMP
[ 4990.258835] Modules linked in: usb_storage libusual tun ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack binfmt_misc rfcomm l2cap kvm_amd kvm ppdev powernow_k8 cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave container sbs sbshc dock iptable_filter ip_tables x_tables deflate zlib_deflate twofish twofish_common camellia serpent blowfish des_generic cbc ecb blkcipher xcbc sha256_generic sha1_generic crypto_null af_key nls_iso8859_1 nls_cp437 vfat fat ipv6 nfs lockd nfs_acl sunrpc aes_i586 dm_crypt dm_mod dm9601 usbnet mii loop parport_pc lp parport hci_usb bluetooth snd_hda_intel tifm_sd snd_usb_audio snd_pcm_oss snd_mixer_oss snd_seq_dummy mmc_core snd_pcm snd_seq_oss uvcvideo pcmcia snd_page_alloc snd_usb_lib compat_ioctl32 af_packet videodev joydev v4l1_compat snd_seq_midi nvidia(P) snd_rawmidi snd_seq_midi_event wlan_scan_sta ac v4l2_common agpgart snd_seq battery ath_rate_sample snd_hwdep snd_timer snd_seq_device tifm_7xx1 yenta_socket video ath_pci wlan serio_raw output i2c_nforce2 ath_hal(P) acer_acpi led_class shpchp snd psmouse tifm_core button rsrc_nonstatic pcmcia_core i2c_core pci_hotplug soundcore wmi_acer evdev k8temp pcspkr usbhid hid ext3 jbd mbcache sd_mod sg sr_mod cdrom sata_nv ata_generic pata_acpi pata_amd libata scsi_mod forcedeth ehci_hcd ohci_hcd usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 4990.258905]
[ 4990.258908] Pid: 10, comm: events/1 Tainted: P (2.6.24-20-generic #1)
[ 4990.258911] EIP: 0060:[<c031a034>] EFLAGS: 00010292 CPU: 1
[ 4990.258917] EIP is at klist_del+0x14/0x50
[ 4990.258919] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
[ 4990.258921] ESI: f082a900 EDI: f082a914 EBP: dfa7f6b8 ESP: f7c83f58
[ 4990.258923] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 4990.258926] Process events/1 (pid: 10, ti=f7c82000 task=f7c80000 task.ti=f7c82000)
[ 4990.258928] Stack: 00000000 f082a900 f7c0df00 c02803f5 00000000 f082a900 f7c0df00 f8c10c20
[ 4990.258933] f8c10c52 f082a8f0 f7c0df00 c013ce6f 00000000 000000ff 00000000 00000000
[ 4990.258938] f7c0df04 f7c0df0c f7c0df00 c013d910 f7c0df04 c013d994 00000000 f7c80000

My keyboard is dead, so I'm typing this with kvkbd!

Vik :v)
