cciss: hangs in modprobe, prevents server from booting

Bug #1435509 reported by Marius Gedminas
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I've upgraded an Ubuntu 10.04 LTS server to 12.04 LTS. It now fails to boot. More specifically, kernel 3.2.0-77 hangs in the cciss driver during initialization.

The hardware is an HP ProLiant DL385 G2. The RAID controller is HP Smart Array P600:

    lspci -knn -s 06:01
    06:01.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array P600 [103c:3220]
     Subsystem: Hewlett-Packard Company 3 Gb/s SAS RAID [103c:3225]
     Kernel driver in use: cciss
     Kernel modules: cciss

I can observe the boot process over serial console (attached: serial-console.log). It shows the kernel messages about cciss being detected, then udev messages about modprobe timing out, followed by boot process timeouts and a failure to mount the root filesystem.
You can also see hung task messages from the kernel (kworker/0:2 and modprobe are both hung inside cciss), and you can see some of my futile attempts to look around in the initramfs.
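To pull the hung task reports out of a captured console log, a small filter along these lines may help. This is only a sketch: `hung_tasks` is a hypothetical helper name, and it assumes the log is plain text in the "INFO: task NAME:PID blocked for more than N seconds" format shown in the kernel messages in this report, with no spaces in the task name.

```shell
# Print "task:pid seconds" for every hung task report found on stdin.
# Hypothetical helper; lines that do not match the pattern are skipped.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2/p'
}

# Example, assuming the attachment was saved locally:
#   hung_tasks < serial-console.log
```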

I can boot an older kernel (2.6.32-73-server, left over from Ubuntu 10.04 LTS) just fine. I'm afraid dmesg and the other information collected by apport for this bug report will misleadingly come from that older kernel.

===================== WORKAROUND =====================

Add 'pci=noioapicreroute' to the kernel command line.

===================== WORKAROUND =====================
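One way to make the workaround persistent is sketched below. It assumes the server boots with GRUB 2 and keeps its settings in /etc/default/grub with a GRUB_CMDLINE_LINUX="..." line; `add_kernel_param` is a hypothetical helper, not part of any package, and it relies on GNU sed's -i option.

```shell
# Append a kernel parameter to the GRUB_CMDLINE_LINUX="..." line of a GRUB
# defaults file, unless that line already contains it.
# Assumption: the file has such a line; if not, nothing is changed.
# Usage: add_kernel_param FILE PARAM
add_kernel_param() {
    # Already present on the GRUB_CMDLINE_LINUX line: leave the file alone.
    if grep -q "^GRUB_CMDLINE_LINUX=.*$2" "$1"; then
        return 0
    fi
    # First expression: insert " PARAM" before the closing quote.
    # Second expression: drop the leading space if the value was empty.
    sed -i \
        -e "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 $2\"/" \
        -e "s/^\(GRUB_CMDLINE_LINUX=\"\) /\1/" \
        "$1"
}

# Typical use on the affected server (as root), then regenerate grub.cfg:
#   add_kernel_param /etc/default/grub pci=noioapicreroute
#   update-grub
```

The GRUB_CMDLINE_LINUX_DEFAULT line is left untouched, so the parameter also applies to recovery-mode entries.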

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-77-generic 3.2.0-77.114
ProcVersionSignature: Ubuntu 2.6.32-73.141-server 2.6.32.63+drm33.26
Uname: Linux 2.6.32-73-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: aplay: device_list:252: no soundcards found...
ApportVersion: 2.0.1-0ubuntu17.8
Architecture: amd64
ArecordDevices: arecord: device_list:252: no soundcards found...
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
CurrentDmesg:
 [ 8.816593] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [ 19.233266] eth0: no IPv6 routers present
 [ 79.512524] Clocksource tsc unstable (delta = -264288670 ns)
Date: Mon Mar 23 19:07:30 2015
HibernationDevice: RESUME=UUID=06183660-d359-4e05-9f0f-941b61ae65cf
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.

 eth1 no wireless extensions.
MachineType: HP ProLiant DL385 G2
MarkForUpload: True
PciMultimedia:

ProcEnviron:
 LC_CTYPE=lt_LT.UTF-8
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VGA16 VGA
ProcKernelCmdLine: root=UUID=3241700a-e0ca-4d26-8a7f-4fa088b98e75 ro console=tty0 console=ttyS1,115200n8
RelatedPackageVersions:
 linux-restricted-modules-2.6.32-73-server N/A
 linux-backports-modules-2.6.32-73-server N/A
 linux-firmware 1.79.18
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: Upgraded to precise on 2015-03-20 (2 days ago)
dmi.bios.date: 04/07/2007
dmi.bios.vendor: HP
dmi.bios.version: A09
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrA09:bd04/07/2007:svnHP:pnProLiantDL385G2:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL385 G2
dmi.sys.vendor: HP

Marius Gedminas (mgedmin) wrote :
description: updated
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: lucid
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.0 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc5-vivid/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Marius Gedminas (mgedmin) wrote :

The same problem persists with kernel 3.4.0-030400-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/ (which is the latest precise kernel in that PPA).

Bits of dmesg that mention cciss:

    [ 1.679701] HP CISS Driver (v 3.6.26)
    [ 1.681001] cciss 0000:06:01.0: PCI IRQ 41 -> rerouted to legacy IRQ 17
    [ 1.681807] cciss 0000:06:01.0: Controller reports max supported commands of 0, an obvious lie. Using 16. Ensure that firmware is up to date.
    [ 1.800107] cciss 0000:06:01.0: cciss0: <0x3220> at PCI 0000:06:01.0 IRQ 17 using DAC

Hung task messages:

    [ 241.260240] INFO: task modprobe:342 blocked for more than 120 seconds.
    [ 241.266958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 241.273594] modprobe D 0000000000000001 0 342 164 0x00000004
    [ 241.280148] ffff88007af93ad8 0000000000000086 ffff88007af93fd8 00000000000137c0
    [ 241.286979] ffff88007af92010 00000000000137c0 00000000000137c0 00000000000137c0
    [ 241.293974] ffff88007af93fd8 00000000000137c0 ffff880037718000 ffff88003782adc0
    [ 241.293978] Call Trace:
    [ 241.293989] [<ffffffff8166ba29>] schedule+0x29/0x70
    [ 241.293993] [<ffffffff81669c7d>] schedule_timeout+0x1fd/0x2e0
    [ 241.293999] [<ffffffff810825aa>] ? task_rq_lock+0x5a/0xb0
    [ 241.294003] [<ffffffff8103ff59>] ? default_spin_lock_flags+0x9/0x10
    [ 241.294006] [<ffffffff8166b87b>] wait_for_common+0xdb/0x180
    [ 241.294009] [<ffffffff81086e20>] ? try_to_wake_up+0x2d0/0x2d0
    [ 241.294013] [<ffffffff81347170>] ? pci_bus_match+0x30/0x30
    [ 241.294016] [<ffffffff8166b9fd>] wait_for_completion+0x1d/0x20
    [ 241.294021] [<ffffffff8106e8ee>] work_on_cpu+0xce/0xf0
    [ 241.294023] [<ffffffff81347170>] ? pci_bus_match+0x30/0x30
    [ 241.294027] [<ffffffff813472ef>] __pci_device_probe+0xaf/0xf0
    [ 241.294031] [<ffffffff8140b299>] ? get_device+0x19/0x20
    [ 241.294034] [<ffffffff8134853a>] pci_device_probe+0x3a/0x60
    [ 241.294038] [<ffffffff8140f2a8>] really_probe+0x68/0x200
    [ 241.294041] [<ffffffff8140f485>] driver_probe_device+0x45/0x70
    [ 241.294044] [<ffffffff8140f54b>] __driver_attach+0x9b/0xa0
    [ 241.294047] [<ffffffff8140f4b0>] ? driver_probe_device+0x70/0x70
    [ 241.294050] [<ffffffff8140d7c8>] bus_for_each_dev+0x68/0x90
    [ 241.294053] [<ffffffff8140f0ee>] driver_attach+0x1e/0x20
    [ 241.294056] [<ffffffff8140ea20>] bus_add_driver+0xd0/0x270
    [ 241.294060] [<ffffffffa0051000>] ? 0xffffffffa0050fff
    [ 241.294062] [<ffffffffa0051000>] ? 0xffffffffa0050fff
    [ 241.294065] [<ffffffff8140fbf0>] driver_register+0x80/0x150
    [ 241.294068] [<ffffffffa0051000>] ? 0xffffffffa0050fff
    [ 241.294070] [<ffffffff813487c6>] __pci_register_driver+0x56/0xd0
    [ 241.294078] [<ffffffffa0051088>] cciss_init+0x88/0x1000 [cciss]
    [ 241.294082] [<ffffffff81002042>] do_one_initcall+0x42/0x180
    [ 241.294087] [<ffffffff810b655c>] sys_init_module+0xcc/0x220
    [ 241.294091] [<ffffffff81674e69>] system_call_fastpath+0x16/0x1b
    [ 241.294094] INFO: task work_for_cpu:358 blocked for more than 120 seconds.
    [ ...


Marius Gedminas (mgedmin) wrote :

The same problem persists with kernel 4.0.0-040000rc5-generic.

dmesg bits:

    [ 1.745316] HP CISS Driver (v 3.6.26)
    [ 1.746427] cciss 0000:06:01.0: PCI IRQ 41 -> rerouted to legacy IRQ 17
    [ 1.747134] cciss 0000:06:01.0: Controller reports max supported commands of 0, an obvious lie. Using 16. Ensure that firmware is up to date.
...
    [ 2.061355] cciss 0000:06:01.0: cciss0: <0x3220> at PCI 0000:06:01.0 IRQ 27 using DAC
...
    [ 240.416185] INFO: task modprobe:353 blocked for more than 120 seconds.
    [ 240.422489] Not tainted 4.0.0-040000rc5-generic #201503230035
    [ 240.429152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 240.435587] modprobe D ffff880037283a18 0 353 135 0x00000004
    [ 240.441897] ffff880037283a18 0000000000000008 ffff8800cf958000 ffff88007d069e00
    [ 240.448564] ffff880036910a00 ffff880037283a78 ffff880037283fd8 7fffffffffffffff
    [ 240.454923] ffff880037283b88 ffff880036910a00 0000000000000000 ffff880037283a38
    [ 240.461449] Call Trace:
    [ 240.467119] [<ffffffff817e8567>] schedule+0x37/0x90
    [ 240.473177] [<ffffffff817eb215>] schedule_timeout+0x1b5/0x210
    [ 240.479751] [<ffffffff810a78a5>] ? try_to_wake_up+0x215/0x2a0
    [ 240.486058] [<ffffffff8104afbb>] ? native_smp_send_reschedule+0x4b/0x60
    [ 240.492338] [<ffffffff817e9918>] wait_for_completion+0xa8/0x170
    [ 240.498443] [<ffffffff810a7930>] ? try_to_wake_up+0x2a0/0x2a0
    [ 240.504617] [<ffffffff81093859>] flush_work+0x29/0x40
    [ 240.510452] [<ffffffff8108f860>] ? worker_detach_from_pool+0xd0/0xd0
    [ 240.516900] [<ffffffff810938d8>] work_on_cpu+0x68/0x70
    [ 240.522786] [<ffffffff8108f270>] ? workqueue_congested+0x80/0x80
    [ 240.528734] [<ffffffff81401940>] ? pci_device_shutdown+0x90/0x90
    [ 240.534632] [<ffffffff81402c42>] __pci_device_probe+0xe2/0xf0
    [ 240.540620] [<ffffffff81402c8a>] pci_device_probe+0x3a/0x60
    [ 240.546675] [<ffffffff8150dfd1>] really_probe+0x91/0x380
    [ 240.552392] [<ffffffff8150e447>] driver_probe_device+0x47/0xa0
    [ 240.557861] [<ffffffff8150e54b>] __driver_attach+0xab/0xb0
    [ 240.563487] [<ffffffff8150e4a0>] ? driver_probe_device+0xa0/0xa0
    [ 240.569119] [<ffffffff8150c32e>] bus_for_each_dev+0x5e/0x90
    [ 240.574618] [<ffffffff8150db7e>] driver_attach+0x1e/0x20
    [ 240.580199] [<ffffffff8150d704>] bus_add_driver+0x124/0x260
    [ 240.586065] [<ffffffff8150ee24>] driver_register+0x64/0xf0
    [ 240.591450] [<ffffffff81401d0c>] __pci_register_driver+0x4c/0x50
    [ 240.597030] [<ffffffffc011307d>] cciss_init+0x7d/0x1000 [cciss]
    [ 240.602153] [<ffffffffc0113000>] ? 0xffffffffc0113000
    [ 240.607056] [<ffffffff81002157>] do_one_initcall+0xd7/0x200
    [ 240.612103] [<ffffffff811c246d>] ? __vunmap+0xad/0xf0
    [ 240.616746] [<ffffffff811de13d>] ? kmem_cache_alloc_trace+0x19d/0x210
    [ 240.621400] [<ffffffff817d59c7>] do_init_module+0x61/0x1ce
    [ 240.626026] [<ffffffff8110059e>] load_module+0x49e/0x5f0
    [ 240.630361] [<ffffffff810fd600>] ? show_initstate+0x50/0x50
    [ 240.634677] [<ffffffff811007a4>] SyS_init_module...


tags: added: kernel-bug-exists-upstream
Marius Gedminas (mgedmin) wrote :

Workaround: adding 'pci=noioapicreroute' to the kernel command line makes the 3.2.0-79-generic kernel boot on this hardware.

dmesg bits:

    [ 1.511457] HP CISS Driver (v 3.6.26)
    [ 1.512497] cciss 0000:06:01.0: PCI INT A -> GSI 41 (level, low) -> IRQ 41
    [ 1.513773] cciss 0000:06:01.0: Controller reports max supported commands of 0, an obvious lie. Using 16. Ensure that firmware is up to date.
...
    [ 1.712112] cciss 0000:06:01.0: cciss0: <0x3220> at PCI 0000:06:01.0 IRQ 41 using DAC
    [ 1.731161] cciss/c0d0: p1 p2 < p5 p6 p7 p8 >
    [ 1.732443] scsi2 : cciss

Marius Gedminas (mgedmin) wrote :

I'm hesitating to link to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=650119 because it seems like a subtly different bug (also one that was fixed in upstream 3.2.0), but it probably has emails of people who understand CCISS kernel code.

description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed