[feisty][linux-image-2.6.19-7] PCMCIA bridge driver i82365 causes "BUG: soft lockup detected on CPU#0!" on Asus motherboards

Bug #72895 reported by Allcolor-g
Affects: linux-source-2.6.20 (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Ben Collins

Bug Description

Binary package hint: linux-image-2.6.19-6-generic

I'm unable to boot the new 2.6.19-6 kernel on feisty. After Linux detects my LVM partition, "BUG: soft lockup detected on CPU#0!" appears on the screen and nothing happens after that.

I'm able to boot with 2.6.17-10-generic which does not show this problem.

Also, since the mdadm update, I get this message with either kernel: mdadm: no array defined in configuration file. (True, I have no mdadm arrays defined; I only use LVM.) If I leave my iPod plugged into a USB port by mistake and power on the computer, the boot takes 5-10 minutes and stays frozen on this message.

Revision history for this message
Allcolor-g (allcolor) wrote :

The bug affecting my system is not related to bug 63418 https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.19/+bug/63418

The solution there relates to a wifi problem, and I have no wifi.

My system is an amd athlon 2600+ 1GB ram with a geforcefx 5700. I have a bt848 tv card installed. 2 hdd of 120 gb merged as one big partition with lvm.

Again, no problem with 2.6.17-10-generic, but neither 2.6.19-5 nor 2.6.19-6 boots. They fail at the same point with the same message, so it definitely seems related to the 2.6.19 kernel.

Allcolor-g (allcolor) wrote :

As for the mdadm problem, removing "/usr/share/initramfs-tools/hooks/mdadm" solves it: no more warning at boot or from update-initramfs. The "BUG: soft lockup detected on CPU#0!" on 2.6.19-6 still occurs. I'll try compiling a custom 2.6.19 to see if it also occurs there.

Allcolor-g (allcolor) wrote :

I've installed "linux-source-2.6.19 - Linux kernel source for version 2.6.19 with Ubuntu patches" and compiled it with the config file from my working 2.6.17-10, but no luck: it hangs with the same message at the same place. I noticed that the Ubuntu package is based on the 2.6.19-rc6 kernel, so I downloaded the vanilla 2.6.18 kernel, applied the 2.6.19-rc6 patch, and compiled it with the same config file as above. That worked: it boots the system with no more "soft lockup". So the problem seems to lie in one of the Ubuntu patches to the kernel. I don't know how to pinpoint what changed between the vanilla 2.6.19-rc6 and the Ubuntu 2.6.19-rc6, but I can give more details as needed.

Allcolor-g (allcolor) wrote :

So for the system details:

Athlon 2600+, Motherboard: Asus A7N8X-X
1GB Ram
2 HDD 120 GB unified in one big / partition named /dev/mapper/Ubuntu-root with LVM
1 TV card BT878
Geforce fx5700

Randakar (randakar) wrote :

I can confirm this one. Same thing happening on my box.

Hardware:

AMD Sempron Processor 3400+
Asus mobo K8VXSE
NVidia GeForce FX 5200

1Gb ram
No LVM
No TV card.

In all, pretty similar to the original reporter's hardware.

More importantly the lockup appears to occur directly after the kernel loads the (binary) NVidia driver. (ugh ..)

Tried booting with noapic and pci=routeirq, neither helps.

I still have to try pci=noacpi and acpi=off, if those don't help I'll remove the restricted-modules package to see if it makes a difference.

Unfortunately I can't seem to get to the stage where disks are mounted, otherwise I'd be able to post a stacktrace.

Randakar (randakar) wrote :

Ok,

pci=noacpi and acpi=off don't fix the issue. Instead, 'acpi=off' turns the soft lockup into a full-blown kernel panic.

Removing the nvidia stuff now. Not expecting much of it since the 'tainted' flag wasn't set during that kernel panic, suggesting that it hadn't loaded yet.

Nonetheless ..

Randakar (randakar) wrote :

Alright, here comes a stack trace:

This trace appears right after it attempts to load the i82365 module. (some kind of pcmcia bridge device, I gather.)

Be aware this was typed in by hand from a blurry photo, zeroes might be 8's and vice-versa especially near the bottom.

===================
BUG: soft lockup detected on CPU#0!
 [<c0147b2>] dump_trace+0x192/0x1c0
 [<c01047f8>] show_trace_log_lvl+0x18/0x30
 [<c0104ecf>] show_trace+0xf/0x20
 [<c0105035>] dump_stack+0x15/0x20
 [<c01521eb>] softlockup_tick+0x9b/0xe0
 [<c012e481>] update_process_times+0x31/0x80
 [<c011501e>] smp_apic_timer_interrupt+0x8e/0xb0
 [<c010420f>] apic_timer_interrupt+0x1f/0x30
 [<c02ebe51>] _spin_lock_irqsave+0x11/0x30
 [<c02ebc69>] __down+0x46/0xf3
 [<c02eba87>] __down_failed+0x7/0x10
 [<c025601c>] device_attach+0x2c/0x80
 [<c0255226>] bus_attach_device+0x26/0x60
 [<c025426d>] device_add+0x37d/0x4d0
 [<c0257c25>] platform_device_add+0xf7/0x150
 [<f0069d0f>] init_i82365+0x2f/0x47c [i82365]
 [<c014282b>] sys_init_module+0x15b/0x1c70
 [<c010310d>] sysenter_past_esp+0x56/0x79
 [<b7f2c410>] 0xb7f2c410
===================

Randakar (randakar) wrote :

Correction, I spotted a typo in there ;-)

===
BUG: soft lockup detected on CPU#0!
 [<c01047b2>] dump_trace+0x192/0x1c0
 [<c01047f8>] show_trace_log_lvl+0x18/0x30
 [<c0104ecf>] show_trace+0xf/0x20
 [<c0105035>] dump_stack+0x15/0x20
 [<c01521eb>] softlockup_tick+0x9b/0xe0
 [<c012e481>] update_process_times+0x31/0x80
 [<c011501e>] smp_apic_timer_interrupt+0x8e/0xb0
 [<c010420f>] apic_timer_interrupt+0x1f/0x30
 [<c02ebe51>] _spin_lock_irqsave+0x11/0x30
 [<c02ebc69>] __down+0x46/0xf3
 [<c02eba87>] __down_failed+0x7/0x10
 [<c025601c>] device_attach+0x2c/0x80
 [<c0255226>] bus_attach_device+0x26/0x60
 [<c025426d>] device_add+0x37d/0x4d0
 [<c0257c25>] platform_device_add+0xf7/0x150
 [<f0069d0f>] init_i82365+0x2f/0x47c [i82365]
 [<c014282b>] sys_init_module+0x15b/0x1c70
 [<c010310d>] sysenter_past_esp+0x56/0x79
 [<b7f2c410>] 0xb7f2c410
===================
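
When reading a trace like the one above, the frames that belong to a loadable module end with the module name in square brackets. A minimal sketch, not part of the original report, for listing those names; `modules_in_trace` is a hypothetical helper name, and it assumes the trace is supplied on standard input with no trailing whitespace on the frame lines.

```shell
# Print the names of modules tagged at the end of stack-trace frames,
# e.g. "init_i82365+0x2f/0x47c [i82365]" yields "i82365".
# Reads the trace text on stdin; duplicates are collapsed by sort -u.
modules_in_trace() {
    sed -n 's/.*\[\([A-Za-z0-9_]*\)\]$/\1/p' | sort -u
}
```

Fed the corrected trace, this should report only i82365, which fits the observation that the lockup follows the attempt to load that module.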

Randakar (randakar) wrote :

To continue this barrage of comments, I removed the nvidia binary driver package before I made that trace as well - to no effect, obviously.

Randakar (randakar) wrote :
Allcolor-g (allcolor) wrote :

Installed linux-image-2.6.19-7 this morning and it is able to boot.

So the problem seems resolved (for me ;)

Randakar (randakar) wrote :

No such luck for me. Still getting the lockup on 2.6.19-7.

Allcolor-g (allcolor) wrote :

Maybe it was not 2.6.19-7 that solved the problem.

Yesterday I removed the two symlinks related to pcmcia and pcmciautils in /etc/rcS.d/ (S13 and S40). Pcmciautils is launched before LVM. I had not tried booting 2.6.19-6 with that change because I had the vanilla 2.6.19-rc6, which did boot. So could you try removing these two symlinks and see if it boots? It's just an idea and may well be unrelated, but besides the update to 2.6.19-7 it is the only modification I've made. As I have removed 2.6.19-6, I can't test whether pcmcia was the issue or 2.6.19-6 itself (in which case the last update did resolve the problem for me).
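
Before removing anything, it may help to list which boot-time symlinks actually mention pcmcia. A minimal sketch, assuming a find that supports -maxdepth and -iname (GNU findutils does); `list_pcmcia_links` is a hypothetical helper name, and the directory to inspect is passed as its only argument.

```shell
# List symlinks directly inside the given directory whose names
# contain "pcmcia" (case-insensitive), one path per line, sorted.
list_pcmcia_links() {
    find "$1" -maxdepth 1 -type l -iname '*pcmcia*' | sort
}
```

For example, `list_pcmcia_links /etc/rcS.d` prints only the matching links, so an unrelated S40 entry would not be removed by accident.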

Randakar (randakar) wrote :

S13 is pcmciautils alright, but S40 is S40networking. It doesn't sound like a good idea to remove that to me.

But I'll try this.

Randakar (randakar) wrote :

Removed S13pcmciautils, and now it boots!

You were dead right on that one.

It also seems to me that the bug here is definitely in either that i82365 PCMCIA bridge driver module, or in the code that attempts to find PCMCIA devices.

My guess is the latter.
One of the things I noticed was that this system doesn't even *have* an i82365 pci device when I look at the lspci output. No surprise there when I apply hindsight..
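
A rough way to double-check this is to search the lspci output for a PCMCIA or CardBus bridge. The sketch below is illustrative only: `has_pcmcia_bridge` is a hypothetical helper name, and it matches the device class names that lspci normally prints ("PCMCIA bridge", "CardBus bridge"). As far as I know the i82365 driver targets ISA-attached controllers, which would not show up in lspci at all, so an empty result is a hint rather than proof.

```shell
# Reads lspci-style output on stdin.
# Exit status 0 if a PCMCIA or CardBus bridge is listed, 1 otherwise.
has_pcmcia_bridge() {
    grep -Eiq 'pcmcia bridge|cardbus bridge'
}
```

Usage would be along the lines of `lspci | has_pcmcia_bridge && echo "bridge listed" || echo "no bridge listed"`.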

Here's what the S13pcmciautils script does (and note the kernel never reaches the end of this script):

---
        log_daemon_msg "Loading PCMCIA bridge driver module" "$PCIC"

        if [ "$CORE_OPTS" ]; then
            modprobe -Qb pcmcia_core $CORE_OPTS
        fi

        modprobe -Qb $PCIC $PCIC_OPTS

        log_end_msg $?
----

Randakar (randakar) wrote :

I can now confirm that running '/etc/init.d/pcmciautils start' as root after bootup hangs the kernel.

That script sources /etc/default/pcmciautils which lists what hardware it thinks I have:

------
root@claire:~# cat /etc/default/pcmciautils
# Defaults for PCMCIA (sourced by /etc/init.d/pcmcia)
PCMCIA=yes
PCIC=i82365
PCIC_OPTS=
CORE_OPTS=
CARDMGR_OPTS=
# If REFRAIN_FROM_IFUP is set to yes, cardmgr will not bring up
# network interfaces. They should be brought up by hotplug instead.
REFRAIN_FROM_IFUP=yes
-----

Presumably removing the S13pcmciautils link is overkill; just setting PCMCIA to 'no' should be enough to stop that module from loading.
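
A minimal sketch of that change, for anyone who prefers not to edit the file by hand. `disable_pcmcia` is a hypothetical helper name; it takes the path of the defaults file as its argument and rewrites any line starting with PCMCIA= to PCMCIA=no, leaving the other settings alone. It assumes mktemp is available and that the caller has write access to the file.

```shell
# Set PCMCIA=no in the given defaults file.
# The edited copy is written to a temp file first, then copied back
# with cat so the original file keeps its owner and permissions.
disable_pcmcia() {
    conf=$1
    tmp=$(mktemp) || return 1
    sed 's/^PCMCIA=.*/PCMCIA=no/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

On the system described here it would be run as root, as `disable_pcmcia /etc/default/pcmciautils`, after backing up the file.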

I wonder why it is set to this in the first place, I surely don't have any PCMCIA hardware ..

All that having been said though, I think it has been shown abundantly now that loading the i82365 module causes the lockup.

Allcolor-g (allcolor) wrote :

Great, I did as you said: set PCMCIA=no and put back the startup symlink, and it still boots.

Randakar (randakar) wrote :

Allcolor-g, can you try running a vanilla 2.6.19 kernel and then issuing the modprobe by hand? I wonder if this affects upstream.

On paper all that is needed to trigger the bug would be:

modprobe pcmcia_core
modprobe i82365

Allcolor-g (allcolor) wrote :

modprobe pcmcia_core is working ok.
modprobe i82365 is not working, "error inserting module, no such device", note that the i82365.ko file exists. "no such device" seems a correct message since I do not have any pcmcia device and/or controller.

So the vanilla kernel does not hang.

Dave Wickham (dave.wickham) wrote :

I thought I'd just add that I also hit this issue, and the workaround mentioned (changing PCMCIA=yes to =no) worked. However, I'm not using an ASUS motherboard; I'm using an ASRock K7S8X (details: http://www.asrock.com/product/K7S8X.htm ).

Jerome Haltom (wasabi) wrote :

The stacktrace occurs right after the kernel message:

pcmcia bridge drive module i82365

Pretty easy to see. ;)

I am having the same issue, also with an ASUS.

Changed in linux-source-2.6.19:
assignee: nobody → ben-collins
importance: Undecided → Medium
status: Unconfirmed → Fix Released
Randakar (randakar) wrote :

Ben,
The new kernel did NOT fix this bug.

Running package version 2.6.19-7.11, tested it as follows:

modprobe i82365
....
BUG: soft lockup detected on CPU#0!
*hang*

Hervé Fache (rvfh) wrote :

I shall test the work-around tonight. My computer is a Shuttle with nForce2 chipset (someone reported the bug on a Via chipset too above), GeForce 6 graphics card, and obviously no PCMCIA device.

Randakar (randakar) wrote :

That teaches me to speak too soon ;-)

2.6.20 landed. Trying to reproduce there, yields:

---
root@claire:~# modprobe i82365
FATAL: Error inserting i82365 (/lib/modules/2.6.20-2-generic/kernel/drivers/pcmcia/i82365.ko): No such device
---

As it should.
Consider this bug swatted.

Ben Collins (ben-collins) wrote :

There is no bug. What you have is filesystem corruption, and you need to fix that, else you will likely see this problem again.

Just wanted to make sure you know what the real cause of this is.

Changed in linux-source-2.6.20:
status: Fix Released → Rejected
status: Rejected → Fix Released
Allcolor-g (allcolor) wrote :

To whom are you talking, and about what problem? The fix works till 2.6.20 and it is in *no way* related to filesystem corruption. Anyway, it is corrected, so no bother.

Allcolor-g (allcolor) wrote :

What I mean is that I've run an fsck and it doesn't find any corruption or inconsistencies.

Also, 2.6.17-10, vanilla 2.6.19-rc6, and vanilla 2.6.19 boot up fine. A 2.6.19-6 compiled from the Ubuntu source does not boot. 2.6.19-6 and -7 from Ubuntu do not boot. 2.6.20-2 does boot.

So if you are right, you should open a bug against fsck, which does not find the corruption, and give the fsck developers the diff between the Ubuntu i82365 and the vanilla i82365 that catches this corruption. I'm worried about having a corrupt filesystem that fsck reports as fine.
