Comment 77 for bug 245779

David McGiven (davidmcgivenn) wrote :

Dear Ubuntu Users,

I'm hitting the same bug you are. My setup is as follows:

- Sun Fire X4450 with 4 six-core Intel Xeon CPUs:
(Intel(R) Xeon(R) CPU E7450 @ 2.40GHz)

- Ubuntu 8.04 LTS:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 8.04.4 LTS
Release: 8.04
Codename: hardy

- Kernel version:
Linux xxxxx 2.6.24-27-server #1 SMP Wed Mar 24 11:32:39 UTC 2010 x86_64 GNU/Linux

- I'm running a 24-processor NAMD job (http://www.ks.uiuc.edu/Research/namd/)

Less than a minute after the job starts, the system becomes unresponsive for about 10 minutes and then returns to "normal" (no reboot needed if you are patient enough).

The dmesg buffer shows the soft lockup message already discussed in this bug:
[ 2618.201092] BUG: soft lockup - CPU#23 stuck for 11s! [events/23:98]

I've also seen some messages from the aacraid RAID driver:
[ 2625.160886] aacraid: Host adapter abort request (0,0,0,0)
[ 2625.161029] aacraid: Host adapter reset request. SCSI hang ?

I don't know how relevant they are, though: the software and the data are both accessed over NFS, so the job isn't really writing to a local disk.
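
In case it helps anyone compare logs, here is a minimal Python sketch for pulling soft lockup entries out of a longer dmesg dump. It is only an illustration, not an official tool: the names `SOFT_LOCKUP` and `find_soft_lockups` are made up for this example, and the pattern assumes the exact message wording quoted above, which other kernel versions may not share.

```python
import re

# Assumption: matches the soft lockup wording seen on this 2.6.24 kernel;
# other kernel versions may format the message differently.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup message found in the given log lines."""
    hits = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            hits.append({
                "timestamp": float(m.group("ts")),  # seconds since boot
                "cpu": int(m.group("cpu")),         # CPU reported as stuck
                "seconds": int(m.group("secs")),    # reported stuck duration
                "task": m.group("task"),            # text inside the brackets
            })
    return hits


# The three log lines quoted in this comment, used as sample input.
sample = [
    "[ 2618.201092] BUG: soft lockup - CPU#23 stuck for 11s! [events/23:98]",
    "[ 2625.160886] aacraid: Host adapter abort request (0,0,0,0)",
    "[ 2625.161029] aacraid: Host adapter reset request. SCSI hang ?",
]

for hit in find_soft_lockups(sample):
    print(hit)
```

To run it against a live system, the same function can be given `sys.stdin` instead of `sample` and fed from `dmesg` through a pipe. Counting hits per CPU would show whether the lockups are spread across cores or concentrated on one.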

Does this help in solving the problem? I can send more detailed logs to the Ubuntu LTS team if needed. This bug has to be solved! It's been more than a year now.