BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]

Bug #556350 reported by linker7474
This bug affects 5 people
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

I have installed Lucid (daily build) on a virtual machine, and it freezes randomly at boot time: sometimes the system boots successfully, sometimes it doesn't.

ProblemType: KernelOops
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-19-generic 2.6.32-19.28
Regression: Yes
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Annotation: Your system might become unstable now and might need to be restarted.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: I82801AAICH [Intel 82801AA-ICH], device 0: Intel ICH [Intel 82801AA-ICH]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: linkertux 1122 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'I82801AAICH'/'Intel 82801AA-ICH with STAC9700,83,84 at irq 9'
   Mixer name : 'SigmaTel STAC9700,83,84'
   Components : 'AC97a:83847600'
   Controls : 34
   Simple ctrls : 24
CurrentDmesg:
 [ 16.646592] ppdev: user-space parallel port driver
 [ 24.404242] eth0: no IPv6 routers present
 [ 480.169294] end_request: I/O error, dev fd0, sector 0
 [ 480.198622] end_request: I/O error, dev fd0, sector 0
Date: Tue Apr 6 01:27:47 2010
Failure: oops
Frequency: I don't know.
HibernationDevice: RESUME=UUID=665bf222-03b7-4289-91b2-593bd22310ca
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha i386 (20100329)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: innotek GmbH VirtualBox
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-19-generic root=UUID=c527fe82-7ac1-4d88-a71c-c706a889111e ro quiet splash
RelatedPackageVersions: linux-firmware 1.33
RfKill:

SourcePackage: linux
Title: BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

Revision history for this message
linker7474 (cglinker747) wrote :

This is the first time I have seen this kind of problem; the same thing happens on my desktop computer (bug #553753). I have done clean installations of Hardy, Jaunty and Karmic at different times with no problems at all.

Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

If you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

Changed in linux (Ubuntu):
status: New → Incomplete
summary: - BUG: soft lockup - CPU
+ BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]
Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

We'd like to figure out what's causing this bug for you, but we haven't heard back from you in a while. Could you please provide the requested information? Thanks!

Revision history for this message
David McGiven (davidmcgivenn) wrote :

Charlie,

I have the same problem as the user who posted the bug, although I'm using 8.04 LTS with kernel 2.6.24-27.

It's a very annoying bug because you don't really know whether it's caused by a bug in the kernel or by a hardware failure. Both options sound possible to me ...

Do you want me to post some more information?

Thanks

Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

David McGiven:
Please file a new bug using 'ubuntu-bug linux' for that issue. It is a different kernel and different hardware, which often means a different solution is needed to fix it. Thanks.

Revision history for this message
Kirk Wolff (kirk-stpaulinternet) wrote :

I have a similar problem, except that it happens every time. I tried two computers and both fail the same way. I tried the 'generic' and the 'i386' install, with the same result. Before reinstalling from a CD, I tried installing other 10.04 kernels on the broken upgraded system, to no avail (generic, generic-pae, i386: every kernel available under 10.04).

One machine is a K7; the other is a Via 800MHz ITX (everyone has one of those, right?) with nothing special other than an Asterisk card, a single ATA hard drive and a single ATA CDROM, each on its own IDE cable. I went through the normal ubuntu-server-10.04 release install in order to get a 'fresh' system, installing the entire operating system on /dev/sda5 (the first extended partition).

Once the install is finished, the system reboots and guess what? Nothing. It sits there and after one minute it prints the message "BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]". It repeats this message every minute.

The hardware works fine with Ubuntu 9.10 server, and I even ran memtest for a few hours. I never had a problem until I did a 'do-release-upgrade' to 10.04; then all hell broke loose and I lost an entire day of work trying to figure out what was going on. How could Ubuntu (or Linus, since it appears to be a kernel problem) have dared to release something without testing it on older hardware? I have Ubuntu 10.04 installed on a dual-proc P4 3GHz machine and two 64-bit AMD machines, and it works fine! It's just the older systems that won't seem to work.

Revision history for this message
Victor (vprea) wrote :

Same here. Fresh Ubuntu Server 10.04 install on a Pentium III 733MHz, SOYO 7BVA, 384 MB RAM, 4 GB hard disk. The system was fully operational with CentOS 5.2.

I booted with "rw init=/bin/bash" and ran aptitude update && aptitude upgrade to get the latest 2.6.32-23 kernel, but when I reboot I still get the same problem.

Charlie, if you explain how, I can test the upstream kernel.

thanks.

Revision history for this message
Victor (vprea) wrote :

Another test: I disabled ureadahead (renamed ureadahead.conf to ureadahead.conf.disable), but it still does not work.

Revision history for this message
Victor (vprea) wrote :

Well..... I tried with this kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32.16-lucid/

Here is what I get from console, running with --verbose option:
------------
Begin: Running /scripts/init-bottom ...
Done.
init: Handling startup event
init: mountall goal changed from stop to start
init: mountall state changed from waiting to starting
init: hostname goal changed from stop to start
init: hostname state changed from waiting to starting
init: Handling starting event
init: plymouth goal changed from stop to start
init: plymouth state changed from waiting to starting
init: hwclock goal changed from stop to start
init: hwclock state changed from waiting to starting
init: Handling starting event
init: hostname state changed from starting to pre-start
init: hostname state changed from pre-start to spawned
init: hostname main process (214)
init: hostname state changed from spawned to post-start
init: hostname soinit: Handling starting event
init: plymouth state changed from starting to pre-start
init: plymouth state changed from pre-start to spawned
init: plymouth mainit: Handling starting event
init: hwclock state changed from starting to pre-start
init: hwclock state changed from pre-start to spawned
init: hwclock main process (216)
init: hwclock state changed from spawned to post-start
init: hwclock state changed from post-start to running
init: Handling started event
init: Handling started event
init: hostname main process (214) exited normally
init: hostname goal changed from start to stop
init: hostinit: hwclock main process (216) exited normally
init: hwclock goal changed from start to stop
init: hwclock state changed from running to stopping
init: Handling stopping event
init: hostname state changed finit: hostname state changed from killed to post-stop
init: hostname state changed from post-stop to waiting
init: Handling stopping event
init: hwclock state changed from stopping to killed
init: hwclock state changed from killed to post-stop
init: hwclock state changed from post-stop to waiting
init: Handling stopped event
init: Handling stopped event
init: plymouth main process (215) executable changed
iinit: plymouth state changed from spawned to post-start
init: plymouth post-start process (218)
init: plymouth post-start process (218) exited normally
init: plymouth state changed from post-start to running
init: mountall state changed from starting to pre-start
init: mountall state changed from pre-start to spawned
init: mountall main process (219)
init: Handling started event
[ 84.828019] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]
------------------
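
For anyone sifting through a captured console log like the one above, a minimal Python sketch for picking out soft-lockup lines may help. The function name `find_soft_lockups` and the regular expression are illustrative assumptions based only on the message format quoted in this report, not part of any kernel tooling.

```python
import re

# Matches kernel lines of the form quoted in this report, e.g.
# [ 84.828019] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft-lockup line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            hits.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "task": match.group("task"),   # kernel thread name, e.g. events/0
                "pid": int(match.group("pid")),
            })
    return hits
```

Feeding it the saved output of `dmesg` or a serial console capture shows which task was stuck and for how long, which is useful when comparing reports from different machines.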

tags: removed: needs-upstream-testing
Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

Please file a new bug using 'ubuntu-bug linux' for that issue. It is a different kernel and different hardware, which often means a different solution is needed to fix it. You can mention this report and its bug number in the new one. Unless your hardware is exactly the same as the original reporter's, we will need a new report. Thanks.

Revision history for this message
Victor (vprea) wrote :

Thanks!
Discussion of this bug on i686 platforms can continue here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/607056

Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Francisco Adell Péricas (pericas) wrote : [Bug 556350] Re: BUG: soft lockup - CPU#0 stuck for 61s! [events/0:6]

Hi Kirk Wolff.

This exact problem exists on my system with an Asterisk wctdm Digium card: insert the card -> soft lockup; remove the card -> system boots fine.

I have looked around for help and have not yet found any solution.

I wouldn't want to regress to the Ubuntu 9 version (where the card used to work without any problem).

Does anybody know what I should do to bypass this bug?

Thanks,
Péricas.

Revision history for this message
Kirk Wolff (kirk-stpaulinternet) wrote :

Francisco,

I've found the solution to this problem. It's simpler than it appears; the hard part is tracking down the source of the problem, as Linux doesn't necessarily tell you when a driver fails or locks up.

The problem is this (drum roll): the netjet driver.

The netjet ISDN PCI card and the Digium wildcard PCI card happen to use the same PCI interface chip. This interface chip provides a PCI-bus abstraction to hardware designers and can be used in many applications, though it's marketed to VOIP. Unfortunately for us (or perhaps fortunately), the DAHDI and netjet drivers don't talk to the PCI interface chip itself, but to the hardware that uses the chip to interface with the PCI bus. Apparently, due to some poor compliance considerations by both the netjet and the wildcard designers, they use the default PCI VID/PID for that chip, perhaps from their development kit. Since netjet became part of the Linux kernel and DAHDI has not, Linux attempts to load the netjet driver. When a Digium wildcard is installed, the netjet driver thinks it's talking to a netjet PCI card, so it attempts to initialize it as such and locks the kernel up in the process. In some cases, and under what conditions I have no idea, DAHDI loads first and netjet doesn't get probed, so some people don't see a problem.
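
To illustrate the ID clash described above, here is a toy Python model. It is a simplified sketch, not how the kernel is implemented: the vendor/device pair in `SHARED_CHIP_ID` is a made-up placeholder, and `bind_driver` and the driver table are hypothetical names used only for this example.

```python
# Hypothetical (vendor, device) pair standing in for the shared interface chip.
# The real IDs are not given in this report.
SHARED_CHIP_ID = (0x1234, 0x0001)

# Each driver advertises the (vendor, device) IDs it is willing to claim.
# Both list the same pair because both cards expose the chip's default IDs.
DRIVER_ID_TABLES = {
    "netjet": [SHARED_CHIP_ID],
    "wctdm": [SHARED_CHIP_ID],
}


def bind_driver(device_id, load_order, blacklist=()):
    """Return the first non-blacklisted driver in load_order claiming device_id."""
    for name in load_order:
        if name in blacklist:
            continue  # a blacklisted driver is never tried
        if device_id in DRIVER_ID_TABLES.get(name, []):
            return name
    return None
```

With `load_order=["netjet", "wctdm"]` the model hands the card to netjet, which matches the lockup scenario; adding `"netjet"` to `blacklist` lets wctdm claim it instead.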

In order to prevent the netjet driver from taking over the digium wildcard, do the following:

1) Power off your machine and remove the Digium card
2) Boot your machine
3) Append the following line to the file /etc/modprobe.d/blacklist.conf: "blacklist netjet"
4) Power off your machine and re-install your Digium card
5) Boot your machine and check /proc/interrupts to confirm that your Digium card isn't sharing an interrupt with any other device in your system

- Kirk
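
Steps 3 and 5 above can be double-checked with a short Python sketch. The helper names `is_blacklisted` and `shared_irqs` are hypothetical, and the parser assumes the older /proc/interrupts layout in which each row has one count per CPU, a single controller-type field, and then a comma-separated list of handler names; newer kernels format that file differently.

```python
def is_blacklisted(conf_text, module):
    """True if conf_text contains an uncommented 'blacklist <module>' line."""
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if fields[:2] == ["blacklist", module]:
            return True
    return False


def shared_irqs(interrupts_text):
    """Map IRQ number -> handler names, for IRQs with more than one handler."""
    shared = {}
    lines = interrupts_text.splitlines()
    if not lines:
        return shared
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        # Assumed layout: per-CPU counts, one controller field, handler names.
        names = " ".join(fields[ncpus + 1:])
        handlers = [n.strip() for n in names.split(",") if n.strip()]
        if len(handlers) > 1:
            shared[int(irq)] = handlers
    return shared
```

On the affected machine, one might read /etc/modprobe.d/blacklist.conf and /proc/interrupts with `open(path).read()` and pass the text to these functions; an empty result from `shared_irqs` means no numbered IRQ line has more than one handler.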

Revision history for this message
Francisco Adell Péricas (pericas) wrote :

Kirk.

Thanks, thanks, thanks, thanks!!!! This workaround works perfectly. My Asterisk is up and running again. Thanks a lot.

Péricas.
