soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]

Bug #338047 reported by Shang Wu
Affects                  Status     Importance  Assigned to  Milestone
linux (Ubuntu)           Invalid    Medium      Unassigned
redhat-cluster (Ubuntu)  Won't Fix  Undecided   Unassigned

Bug Description

Hardware:
System - 2 HP servers
Ubuntu version - Hardy 8.04.2

Steps to reproduce:
1. Configure the hostname.
2. Configure a static IP address.
3. Install the redhat-cluster-suite software.
4. Create the cluster.conf file [1].
5. Reboot both servers. When the servers boot up, the following error message appears repeatedly, and it keeps appearing after further reboots:
soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]
soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]
soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]
soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]
soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]

(See the attached screenshot, soft_lockup.png.)

[1] Here is the cluster.conf file that I created:
<?xml version="1.0" ?>
<cluster alias="ubuntu" config_version="2" name="ubuntu">
 <fence_daemon post_fail_delay="0" post_join_delay="3"/>
 <clusternodes>
  <clusternode name="node0.ubuntu.local" nodeid="1" votes="1">
   <fence>
    <method name="1">
     <device name="m_fencing" nodename="node0.ubuntu.local"/>
    </method>
   </fence>
  </clusternode>
  <clusternode name="node1.ubuntu.local" nodeid="2" votes="1">
   <fence>
    <method name="1">
     <device name="m_fencing" nodename="node1.ubuntu.local"/>
    </method>
   </fence>
  </clusternode>
 </clusternodes>
 <cman expected_votes="1" two_node="1"/>
 <fencedevices>
  <fencedevice agent="fence_manual" name="m_fencing"/>
 </fencedevices>
 <rm>
  <failoverdomains/>
  <resources/>
 </rm>
</cluster>
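The report does not say whether the configuration itself was checked. As a minimal, hypothetical sanity check (not part of redhat-cluster or cman, and not a diagnosis of the lockup), the Python sketch below parses a cluster.conf with the standard-library ElementTree and flags a few structural mistakes: duplicate nodeid values, fence <device> references with no matching <fencedevice>, and a two_node="1" setting that is not paired with expected_votes="1" and exactly two nodes. The function name check_cluster_conf and the command-line wrapper are illustrative assumptions.

```python
import sys
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return a list of problems found in a cluster.conf document (str or bytes).

    Hypothetical helper: only checks a few structural rules, nothing more.
    """
    problems = []
    root = ET.fromstring(xml_text)

    # Names of all fence devices declared under <fencedevices>.
    declared = {fd.get("name") for fd in root.iter("fencedevice")}

    nodes = root.findall("./clusternodes/clusternode")
    seen_ids = set()
    for node in nodes:
        name, nodeid = node.get("name"), node.get("nodeid")
        if nodeid in seen_ids:
            problems.append(f"duplicate nodeid {nodeid} on {name}")
        seen_ids.add(nodeid)
        # Every <device> used in a fence method must refer to a declared device.
        for dev in node.iter("device"):
            if dev.get("name") not in declared:
                problems.append(f"{name}: fence device {dev.get('name')!r} is not declared")

    # In a two-node cman cluster, a single vote must be enough for quorum.
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if cman.get("expected_votes") != "1":
            problems.append('two_node="1" requires expected_votes="1"')
        if len(nodes) != 2:
            problems.append(f'two_node="1" but {len(nodes)} clusternode entries found')
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 check_conf.py /etc/cluster/cluster.conf
    with open(sys.argv[1], "rb") as f:  # bytes avoid encoding-declaration issues
        for line in check_cluster_conf(f.read()) or ["no problems found"]:
            print(line)
```

Run against the file quoted above, this sketch reports no problems. That only means the file is structurally consistent; it says nothing about why clurgmgrd locks up.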

Shang Wu (shangwu) wrote :
Chuck Short (zulcss) wrote :

Anything in the log files when this happens?
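If a copy of the kernel log survives the reboot (on Ubuntu it is typically /var/log/kern.log), a short script can pull out the soft-lockup lines and count them per process. The sketch below is a hypothetical helper, assuming the message format quoted in this report ("soft lockup - CPU#N stuck for Ns! [comm:pid]"); other kernel versions word the message differently, so the pattern may need adjusting. The function names are illustrative.

```python
import re
import sys
from collections import Counter

# Matches the message format quoted in this report, e.g.
#   soft lockup - CPU#0 stuck for 11s! [clurgmgrd:4587]
# Leading syslog/timestamp text is ignored because search() scans the whole line.
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Yield (cpu, seconds, command, pid) for every soft-lockup line."""
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            yield int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])


def count_by_process(lines):
    """Count soft-lockup reports per (command, pid) pair."""
    return Counter((comm, pid) for _cpu, _secs, comm, pid in find_soft_lockups(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 lockups.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as log:
        for (comm, pid), n in count_by_process(log).most_common():
            print(f"{comm}[{pid}]: {n} soft-lockup reports")
```

Counting per (command, pid) makes it easy to see whether a single clurgmgrd process is responsible for all of the reports or whether the lockup moves between processes.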

Changed in redhat-cluster (Ubuntu):
status: New → Triaged
Andres Mujica (andres.mujica) wrote :

Thanks for testing and confirming this against the latest Jaunty release. Please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-2.6.28-11-generic 338047

If the issue remains in Jaunty, it would also be great if you could test the latest upstream kernel available. This will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you have tested the upstream kernel, please remove the 'needs-upstream-testing' tag. To do this, click the yellow pencil icon next to the tag at the bottom of the bug description and delete the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Shang Wu (shangwu) wrote :

Hi Andres,

In this situation I cannot get to the console or SSH into the machine: one system is stuck in the soft lockup, while the other machine hits a kernel oops and then reboots. If there is another way to run the command, please let me know.

Shang Wu (shangwu)
Changed in linux (Ubuntu):
status: Incomplete → New
kernel-janitor (kernel-janitor) wrote :

Hi shangwu,

Please be sure to confirm that this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/releases/ . Please then run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-`uname -r` 338047

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Christian Ehrhardt  (paelzer) wrote :

While clarifying remaining work on (admittedly very old) bugs, I found that this bug's package has not been in Ubuntu since 12.04. Nowadays you would use corosync, pacemaker and the related *agents packages directly for this; see https://ubuntu.com/server/docs/ubuntu-ha-introduction.

Changed in redhat-cluster (Ubuntu):
status: Triaged → Won't Fix