BUG: soft lockup - CPU#1 stuck for 61s! [hald-addon-stor:3486]

Bug #359653 reported by paraiko
This bug affects 5 people
Affects         Status       Importance  Assigned to  Milestone
linux (Fedora)  In Progress  Unknown
linux (Ubuntu)  Won't Fix    Medium      Unassigned

Bug Description

The system was not completely unresponsive, but the internet connection was gone.
The hald-addon-stor process was using 50% of the CPU (one full core) and was unresponsive.
It was no longer possible to start an administrative shell to kill the process.
Restarting the system was the only way out.

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Architecture: amd64
DistroRelease: Ubuntu 9.04
Failure: oops
MachineType: Zepto Znote
Package: linux-image-2.6.28-11-generic 2.6.28-11.41
ProcCmdLine: root=UUID=e0ecbde7-d23f-4ed2-8937-808d0a99cccb ro quiet splash
ProcVersionSignature: Ubuntu 2.6.28-11.41-generic
SourcePackage: linux
Tags: kernel-oops
Title: BUG: soft lockup - CPU#1 stuck for 61s! [hald-addon-stor:3486]

Revision history for this message
paraiko (paraiko) wrote :

The same bug occurred again. This time the PC did not resume, and I had to hold the power button for 4 seconds.
After booting up, kerneloops and apport reported this bug.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi paraiko,

Does this always happen when you attempt to resume? Or is there a series of steps you are able to take to reproduce this issue? Can you also comment if this just recently started happening or did this issue exist with previous kernels? Thanks.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
paraiko (paraiko) wrote :

Hello Leann,

I've been experiencing a lot of crashes and kernel freezes with Jaunty so far (multiple times a day), but I thought most of them were related to the nvidia binary drivers. I first saw this particular bug when I reported it on 11 April (that was actually without the nvidia drivers enabled), but it has now occurred both with and without the nvidia drivers. To answer your questions:
- This bug only recently started happening; before that I was mostly hitting bugs 348672 and 350609.
- I've never seen this bug on another kernel version.
- It is not only related to suspend/resume. I've had at least one complete system freeze where powering down was the only escape. When I do see it in relation to suspend, it is after resuming from suspend, but that only happens with the nvidia drivers enabled. With the nv drivers the system does not resume at all.

I hope that answers your questions. I'm happy to supply more information if needed.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Changed in linux (Fedora):
status: Unknown → Confirmed
Revision history for this message
Andres Mujica (andres.mujica) wrote :

Thanks for testing and confirming against the latest Jaunty release. Please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-2.6.28-11-generic <bug #>

If the issue remains in Jaunty, it would be great if you could also test the latest upstream kernel available. It will allow additional upstream developers to examine this issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
Revision history for this message
Andres Mujica (andres.mujica) wrote :

The right command would be: apport-collect -p linux-image-2.6.28-11-generic 359653

Revision history for this message
Andres Mujica (andres.mujica) wrote :

I'm putting these notes here for reference (if I put them somewhere else I'd have to search for them again).

similar type of bug: http://marc.info/?l=linux-wireless&m=121731102709834&w=4
collecting debug data: http://marc.info/?l=linux-wireless&m=121747000409226&w=4
some conclusions: http://marc.info/?l=linux-kernel&m=121757602319407&w=4

another similar type of bug:
http://marc.info/?l=linux-kernel&m=122014663919273&w=4
http://article.gmane.org/gmane.linux.kernel.wireless.general/32586

There are a lot of similar bugs reported in the upstream bugzilla. With the information collected I'll try to find a matching one; if there is none, a new report will need to be opened.

Related Launchpad bugs:

bug #376363
bug #363368
bug #338047

Revision history for this message
paraiko (paraiko) wrote : apport-collect data

Architecture: amd64
DistroRelease: Ubuntu 9.04
HibernationDevice: RESUME=UUID=fa83e644-4584-4375-bcd0-6661a3cf7f3c
MachineType: Zepto Znote
NonfreeKernelModules: nvidia
Package: linux-image-2.6.28-11-generic 2.6.28-11.42
PackageArchitecture: amd64
ProcCmdLine: root=UUID=e0ecbde7-d23f-4ed2-8937-808d0a99cccb ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.28-13.44-generic
Uname: Linux 2.6.28-13-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare

Revision history for this message
paraiko (paraiko) wrote : apport-collect data

Architecture: amd64
DistroRelease: Ubuntu 9.04
HibernationDevice: RESUME=UUID=fa83e644-4584-4375-bcd0-6661a3cf7f3c
MachineType: Zepto Znote
NonfreeKernelModules: nvidia
Package: linux-image-2.6.28-13-generic 2.6.28-13.44
PackageArchitecture: amd64
ProcCmdLine: root=UUID=e0ecbde7-d23f-4ed2-8937-808d0a99cccb ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.28-13.44-generic
Uname: Linux 2.6.28-13-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare

Revision history for this message
paraiko (paraiko) wrote :

I've actually not experienced this bug in quite some time now.

In the meantime my kernel was updated to 2.6.28-13-generic, so in addition to the output of apport-collect -p linux-image-2.6.28-11-generic 359653 I've also attached the output of apport-collect -p linux-image-2.6.28-13-generic 359653.

Revision history for this message
walec51 (me-adamwalczak) wrote :

I'm experiencing this almost once a day, even on 2.6.28-13-generic. It mostly happens randomly while browsing the net with Firefox.

I've attached:

- a screenshot taken when it happened,

- my syslog, where you can find the moments when it happened by searching for the phrase 'soft lockup - CPU#0 stuck for',

- my system's specs as dumped by SysInfo.

Tell me if you need anything more.
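
For anyone who wants to pull such entries out of a syslog without searching by hand, here is a minimal Python sketch. It assumes the message format shown in this report's title; the function names and the "unrelated" sample line are made up for illustration and do not come from any attached log.

```python
import re
from collections import Counter

# Pattern for the kernel message quoted in this report's title, e.g.
# "BUG: soft lockup - CPU#1 stuck for 61s! [hald-addon-stor:3486]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<proc>[^\]]+):(?P<pid>\d+)\]"
)

def find_lockups(lines):
    """Return a (cpu, seconds, process, pid) tuple for each soft-lockup line."""
    hits = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            hits.append((int(m["cpu"]), int(m["secs"]), m["proc"], int(m["pid"])))
    return hits

def count_by_process(lines):
    """Count soft-lockup reports per process name."""
    return Counter(proc for _, _, proc, _ in find_lockups(lines))

# Sample input: the title line of this bug plus a hypothetical unrelated line.
sample = [
    "BUG: soft lockup - CPU#1 stuck for 61s! [hald-addon-stor:3486]",
    "some unrelated log line (hypothetical)",
]
print(find_lockups(sample))
print(count_by_process(sample))
```

To run it against a real log, pass an open file object instead of the sample list, since the functions only need an iterable of lines.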

Revision history for this message
walec51 (me-adamwalczak) wrote :

OK. Now something similar happened, but with nullmailer. Generally it seems to have been acting bizarrely from the beginning. I've attached the syslog.

tags: removed: needs-kernel-logs
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Fedora):
status: Confirmed → In Progress
Revision history for this message
Brad Figg (brad-figg) wrote :

Can you also confirm this issue exists with the most recent Karmic Koala 9.10 Alpha release? ISO CD images are available at http://cdimage.ubuntu.com/releases/karmic/ . If the issue remains with Karmic it would be great to then also test the latest upstream mainline kernel available. This will allow additional upstream developers to examine this issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Thanks in advance.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
paraiko (paraiko) wrote : Re: [Bug 359653] Re: BUG: soft lockup - CPU#1 stuck for 61s! [hald-addon-stor:3486]

Hello Brad,

I upgraded my PC to Karmic yesterday.
So far I've not had a single crash or kernel oops, and it feels a lot more
stable than Jaunty, but that is hard to say after only 1 day of use.
On Jaunty I had on average 2 system freezes every day and several more
kernel oopses...

I will post back after some more days of testing.

On Tue, Sep 22, 2009 at 11:40 PM, Brad Figg <email address hidden> wrote:

> Can you also confirm this issue exists with the most recent Karmic Koala
> 9.10 Alpha release? ISO CD images are available at
> http://cdimage.ubuntu.com/releases/karmic/ . If the issue remains with
> Karmic it would be great to then also test the latest upstream mainline
> kernel available. This will allow additional upstream developers to
> examine this issue. Refer to
> https://wiki.ubuntu.com/KernelMainlineBuilds . Thanks in advance.

Revision history for this message
paraiko (paraiko) wrote :

Hello Brad,

I've been using Karmic for several days now and, as far as I know, I've not experienced this bug.

I had only 3 system freezes in total, but I do not know the cause because apport was not kicking in after the reboot.

Revision history for this message
walec51 (me-adamwalczak) wrote :

My Dell Studio 17 with Jaunty crashes 2-4 times a day when I'm using wifi.

I've been collecting the log traces from right before each crash. Take a look at the file I've attached. It has dozens of them.

I'll switch to the Karmic kernel to see if something changes.

Revision history for this message
walec51 (me-adamwalczak) wrote :

Besides some other severe bugs I had to work around in Karmic, this one seems to be fixed. I have not had a random crash like this since I updated two weeks ago.

Revision history for this message
David Huggins-Daines (dhuggins) wrote :

Hi. I have this problem every day with Karmic. It is always 61s. Like paraiko above, I thought it was related to the nvidia drivers, but I get it with the nv driver too. And, like paraiko, with the nv driver I also can't resume from hibernation - it just hangs.

Usually I get the BUG: when listening to music and web browsing at the same time. Lately, though, the process that seems to be hanging is usually 'fuser' (which is a bit odd, I wonder what is running this so often).

And why is it always 61s?

This is definitely NOT fixed in karmic for me, in fact, I did not get it with Jaunty.

One other data point here, though - I have a DVD-RW drive in my system which seems to be malfunctioning. I'm not really sure if it is the drive itself, or the cable (I swapped cables with the hard drive and it's still broken), the driver, or the motherboard. I have also gotten lockups with hald-addon-stor which are surrounded by a bunch of "ata1: lost interrupt" and similar messages.

See the attached file for examples. The motherboard has a VIA K8T890 chipset and the CPU is an Athlon 64 X2 3800+.

Revision history for this message
David Huggins-Daines (dhuggins) wrote :

I haven't experienced this problem since doing two things:

 1) Switching to a SATA hard drive with a separate controller card. However, this just caused my machine to lock up randomly without printing any useful messages...
 2) Upgrading my kernel to 2.6.32-10 from Lucid. This has (so far, knock on wood) fixed all of my hanging and stability issues.

I think there is something very wrong with 64-bit 2.6.31 on older AMD processors and VIA chipsets, from the sounds of it...

Revision history for this message
David Huggins-Daines (dhuggins) wrote :

Just wanted to follow up here - I'm now using Lucid, and haven't had this specific problem. I have had a lot of random system lockups and other errors. It seems for the moment, however, that setting the DRAM clock to 166MHz (rather than 200MHz as reported by the SPD) has fixed them.

I believe that my particular chipset/motherboard/cpu combo has some hardware bug in it, which from the sounds of it may be pretty common.

Revision history for this message
Wizzu (wizzu) wrote :

I've just experienced this exact problem on Lucid.

kern.log has, e.g.:
Aug 27 00:54:50 carrell kernel: [10790.032494] BUG: soft lockup - CPU#2 stuck for 61s! [tomboy:11040]

uname -a:
Linux carrell 2.6.32-24-generic #41-Ubuntu SMP Thu Aug 19 01:38:40 UTC 2010 x86_64 GNU/Linux

I upgraded from Karmic a few days ago, and the kernel on that was rock solid; this only started happening now.
There have also been some system freezes (not sure if it's just X or the entire kernel).
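
As a side note on reading the kern.log line above: the bracketed number is the kernel's time since boot in seconds, so it can be turned into an uptime figure. A minimal Python sketch, assuming printk timestamps in the "[seconds.microseconds]" form shown in that line (the function name is made up for illustration):

```python
import re
from datetime import timedelta

# Matches the "[seconds.microseconds]" prefix the kernel adds when printk
# timestamps are enabled, as in the kern.log line quoted above.
STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")

def seconds_since_boot(line):
    """Return the kernel timestamp of a log line as a float, or None."""
    m = STAMP_RE.search(line)
    return float(m.group(1)) if m else None

line = ("Aug 27 00:54:50 carrell kernel: [10790.032494] "
        "BUG: soft lockup - CPU#2 stuck for 61s! [tomboy:11040]")
secs = seconds_since_boot(line)
print(timedelta(seconds=int(secs)))  # 2:59:50
```

For the quoted line this gives roughly three hours of uptime at the moment the lockup was reported.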

Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix