After running the command fence_tool dump, the fenced process takes 100% CPU

Bug #290399 reported by Shang Wu on 2008-10-28
This bug affects 2 people
Affects                    Status         Importance  Assigned to  Milestone
Red Hat Cluster            Fix Released   Low
redhat-cluster (Ubuntu)                   Undecided   Unassigned
  Hardy                                   High        Unassigned

Bug Description

Binary package hint: cman

I have set up redhat-cluster-suite on two Hardy 8.04.1 servers running the server kernel. After the setup completes, run the command sudo fence_tool dump. The fenced process then starts using 100% CPU.

SRU justification:
Under certain conditions, groupd, fenced, and dlm_controld can all enter tight infinite loops around poll(2) because a file descriptor has been closed and is not handled correctly. The poll loop in these daemons checks for POLLHUP, but not for POLLERR or POLLNVAL, so descriptors in those states are never handled: poll(2) keeps reporting them, returns immediately, and the daemon spins at 100% CPU.

Versions affected: Hardy
Fix in development branch: Fixed in Intrepid and later through integration of a later upstream release that includes the fix.

Minimal patch: see comment 2 just above.

TEST CASE:
This is difficult to reproduce, as it requires a full RHCS setup and hitting the situation where those daemons enter the loop.

Regression potential:
Judging from the patch, regression potential is very low. It was taken from the Red Hat bug and was later integrated into subsequent RHCS releases. Users running the version from my PPA all reported success without any side effects.

Description of problem:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 4560 root      RT   0  7644  808  444 R  100  0.0 122:50.94 groupd

Version-Release number of selected component (if applicable):
2.6.9-68.26.ELsmp
dlm-kernel-smp-2.6.9-53.3
dlm-1.0.7-1

Created attachment 304047
Fixes behavior

Under certain conditions (I am not sure what causes them), groupd, fenced,
and dlm_controld can all enter tight infinite loops around poll(2) because
a file descriptor has been closed and is not handled correctly.

The poll loop for these daemons checks for POLLHUP, but not POLLERR or
POLLNVAL. As such, file descriptors in these states are unhandled.

This patch fixes these daemons.

Created attachment 304048
Updated patch

Fixing component; this is all user-space.

Shang Wu (shangwu) on 2008-10-28
description: updated
Shang Wu (shangwu) wrote :

The issue is caused by an upstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=444529

The redhat-cluster 2.20080227-0ubuntu1.1~ppa1 package at https://launchpad.net/~tcarrez/+archive fixes the issue.

Changed in redhat-cluster:
status: New → Confirmed
Changed in redhatcluster:
status: Unknown → Fix Committed
Thierry Carrez (ttx) wrote :

Debdiff for the hardy SRU

redhat-cluster (2.20080227-0ubuntu1.1) hardy-proposed; urgency=low

   * Applied fix from https://bugzilla.redhat.com/show_bug.cgi?id=444529
     to avoid getting stuck in infinite loops (LP: #290399)

Thierry Carrez (ttx) wrote :

Intrepid and Jaunty are already fixed through integration of a later upstream release.

Changed in redhat-cluster:
status: Confirmed → Fix Released

Hi Thierry,

Just curious if this fix will make it into the Hardy 8.04.2 point release? Thanks.


Leann Ogasawara wrote:
> Just curious if this fix will make it into the Hardy 8.04.2 point
> release? Thanks.

Hello Leann,
Chuck was preparing a combined SRU with bug 282249, but as far as I
know he got stuck setting up reproduction environments.

I'm CC'ing him for a status update.

-Thierry

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html

Changed in redhatcluster:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2009-02-23
Changed in redhat-cluster:
assignee: nobody → tcarrez
importance: Undecided → High
milestone: none → ubuntu-8.04.3
status: New → Triaged
Thierry Carrez (ttx) wrote :

Steve Langasek nominated this as an 8.04.3 candidate, so here is the SRU report. We won't wait for the other bug to be ready, since it is blocked on hardware availability for reproduction.

Hardy SRU report

Bug impact:
Under certain conditions, groupd, fenced, and dlm_controld can all enter tight infinite loops around poll(2) because a file descriptor has been closed and is not handled correctly. The poll loop in these daemons checks for POLLHUP, but not for POLLERR or POLLNVAL, so descriptors in those states are never handled: poll(2) keeps reporting them, returns immediately, and the daemon spins at 100% CPU.

Versions affected: Hardy
Fix in development branch: Fixed in Intrepid and later through integration of a later upstream release that includes the fix.

Minimal patch: see comment 2 just above.

TEST CASE:
This is difficult to reproduce, as it requires a full RHCS setup and hitting the situation where those daemons enter the loop.

Regression potential:
Judging from the patch, regression potential is very low. It was taken from the Red Hat bug and was later integrated into subsequent RHCS releases. Users running the version from my PPA all reported success without any side effects.

Changed in redhat-cluster:
assignee: tcarrez → nobody
status: Triaged → Confirmed
Martin Pitt (pitti) wrote :

Please upload. By the way, you know that you can just upload such changes to the queue, and the SRU team can still reject them if they deem them inappropriate or if something is wrong?

Thierry Carrez (ttx) wrote :

I can't upload to main (yet), so I'm subscribing ubuntu-main-sponsors.

Chuck Short (zulcss) wrote :

I have uploaded it for Thierry.

chuck

Martin Pitt (pitti) wrote :

Chuck, Thierry, redhat-cluster uses dpatch. Why did you apply the patch inline?

Thierry Carrez (ttx) wrote :

/me headdesks
I wrongly assumed that since there were no patches, no patch system was used.
Please reject it, I'll do a new one.

Thierry Carrez (ttx) wrote :

New debdiff, properly using dpatch

Michael Jeanson (mjeanson) wrote :

I have been running the patched package from Thierry's PPA since yesterday, I have not been able to trigger the bug since, and everything is running smoothly in the cluster.

*** Bug 492989 has been marked as a duplicate of this bug. ***

Thierry Carrez (ttx) on 2009-05-06
Changed in redhat-cluster (Ubuntu Hardy):
status: Confirmed → Fix Committed
Martin Pitt (pitti) wrote :

Accepted redhat-cluster into hardy-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Thierry Carrez (ttx) wrote :

@Michael, Shang Wu:
Could you please enable hardy-proposed and make sure the package in that repository fixes the issue as well as the one in my PPA does? Thanks!

Michael Jeanson (mjeanson) wrote :

I cannot work on this cluster right now, since it belongs to a client and is in production. I'll see what I can do next week.

Steve Langasek (vorlon) wrote :

Michael,

Have you had any chance to test the package in hardy-proposed? We need to know soon whether the fix is correct, to be able to include it in the 8.04.3 LTS point release.

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package redhat-cluster - 2.20080227-0ubuntu1.1

---------------
redhat-cluster (2.20080227-0ubuntu1.1) hardy-proposed; urgency=low

  * debian/patches/avoid-infinite-loops.dpatch: Avoid getting stuck in
    infinite loops (LP: #290399). Patch from Lon Hohberger <email address hidden>
    https://bugzilla.redhat.com/show_bug.cgi?id=444529

 -- Thierry Carrez <email address hidden> Tue, 10 Mar 2009 12:52:15 +0000

Changed in redhat-cluster (Ubuntu Hardy):
status: Fix Committed → Fix Released
Changed in redhatcluster:
importance: Unknown → Low