Pacemaker fails to start and reports a Library Error

Bug #1595627 reported by Eric Desrochers
This bug affects 4 people
Affects          Status        Importance  Assigned to      Milestone
libqb (Ubuntu)   Fix Released  Medium      Unassigned
  Trusty         Fix Released  Medium      Eric Desrochers
  Wily           Won't Fix     Medium      Eric Desrochers

Bug Description

[SRU JUSTIFICATION]

[Impact]

Pacemaker fails to start and reports a Library Error as follows:
"notice: mcp_read_config: Configured corosync to accept connections from group 124: Library error (2)"

[Test Case]

- Have a corosync/pacemaker cluster with libqb version <=0.16.0.real-1ubuntu4
- Start and stop Corosync/Pacemaker sequentially, repeating until the PIDs of both corosync and pacemaker are > 99999 and the file descriptor is >= 10, which triggers the issue.

[Regression Potential]

The patch is already in place in Debian, Xenial, and later Ubuntu releases.

This patch makes the description field large enough to hold all possible PID and file descriptor values.

[Other Info]

Upstream Commit:

0766a3ca Increase the length of description field
https://github.com/ClusterLabs/libqb/commit/0766a3ca5473a9e126e91022075b4b3798b8d5bc

Note: The commit was first introduced upstream in v0.17.2.

[Original Description]

A user brought the following to my attention:

Pacemaker fails to start if its PID is greater than 99999, and reports a Library error as follows:

"notice: mcp_read_config: Configured corosync to accept connections from group 124: Library error (2)"

Eric Desrochers (slashd) wrote :

HOWTO reproduce the problem:

 * Have a corosync/pacemaker cluster with libqb version <=0.16.0.real-1ubuntu4
 * Start and stop Corosync/Pacemaker sequentially, repeating enough times to trigger the issue.
 (Note: the pacemaker PID must be > 99999 to trigger this issue.)

Eric Desrochers (slashd) wrote :

The problem is the length of the CONNECTION_DESCRIPTION field.

filename: lib/ipc_int.h
#define CONNECTION_DESCRIPTION (16)

When both corosync and pacemaker have PIDs greater than 99999 and the file descriptor is >= 10, the description exceeds the 16-byte field.

The description field needs to be large enough to hold all possible PID and file descriptor values.

Eric Desrochers (slashd) wrote :

Trusty and Wily are affected:
#define CONNECTION_DESCRIPTION (16)

Xenial is not affected:
#define CONNECTION_DESCRIPTION (34) /* INT_MAX length + 3 */

Eric Desrochers (slashd) wrote :

The upstream fix is the following:

https://github.com/ClusterLabs/libqb/commit/0766a3ca5473a9e126e91022075b4b3798b8d5bc

It consists of increasing CONNECTION_DESCRIPTION:

-#define CONNECTION_DESCRIPTION (18)
+#define CONNECTION_DESCRIPTION (34) /* INT_MAX length + 3 */

Changed in libqb (Ubuntu Trusty):
assignee: nobody → Eric Desrochers (slashd)
Changed in libqb (Ubuntu Wily):
assignee: nobody → Eric Desrochers (slashd)
Changed in libqb (Ubuntu Trusty):
importance: Undecided → Medium
Changed in libqb (Ubuntu Wily):
importance: Undecided → Medium
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
Changed in libqb (Ubuntu):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libqb (Ubuntu Trusty):
status: New → Confirmed
Changed in libqb (Ubuntu Wily):
status: New → Confirmed
Eric Desrochers (slashd) wrote :

Patch for Trusty

Eric Desrochers (slashd) wrote :

Patch for Wily

tags: added: patch sts sts-sponsor sts-sru ubuntu-sponsors
Changed in libqb (Ubuntu Trusty):
status: Confirmed → In Progress
Changed in libqb (Ubuntu Wily):
status: Confirmed → In Progress
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd) wrote :

A user affected by this issue has tested a package from [ppa:slashd/sf99715] that includes the upstream commit [0766a3ca5473a9e126e91022075b4b3798b8d5bc].

Here is the user's feedback:

"Patched version does seem to fix the issue."

Eric

Martin Pitt (pitti) wrote : Please test proposed package

Hello Eric, or anyone else affected,

Accepted libqb into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libqb/0.16.0.real-1ubuntu5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libqb (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Eric Desrochers (slashd)
tags: removed: sts-sponsor
David A. Desrosiers (setuid) wrote :

If you're unable to modify your system's packages by installing patched .deb packages or by changing your default repositories to include -proposed, you can use the following command:

sysctl -w kernel.pid_max=99999

To make this permanent, you'll need to add the following line to /etc/sysctl.conf:

kernel.pid_max = 99999

**NOTE** This solution is not intended to be a long-term fix, but merely a hotfix or stopgap until you are able to install the updated package that contains the appropriate code-level fix.

The 'pid_max' change above lets you keep your system managed entirely with stable packages while keeping every PID below 99,999 (pid_max is one greater than the highest PID the kernel will allocate). That is just enough to stay under the limit this patch addresses.

When the package reaches the standard repository (after leaving -proposed), you'll receive the patched version with your regular updates and can install it as you would normally.

Hope that helps!

Eric Desrochers (slashd) wrote :

A user experiencing the issue on Trusty (14.04 LTS) reported the following:

"I installed libqb0 from proposed repo and it works as expected. No more issues when starting with pid >99,999"

Eric

tags: added: verification-done
removed: verification-needed
Roman Semenov (rsemenov-u) wrote :

Tested version 0.16.0.real-1ubuntu5 from the proposed repository; it works as expected. No more issues at startup when the pacemaker/corosync PID is > 99,999.

Mathew Hodson (mhodson)
Changed in libqb (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libqb - 0.16.0.real-1ubuntu5

---------------
libqb (0.16.0.real-1ubuntu5) trusty; urgency=medium

  * d/p/increase-the-length-of-description-field.patch: increase the
    length of description field. (LP: #1595627)

 -- Eric Desrochers <email address hidden> Fri, 24 Jun 2016 11:32:36 +0200

Changed in libqb (Ubuntu Trusty):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote : Update Released

The verification of the Stable Release Update for libqb has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Eric Desrochers (slashd) wrote :

I changed the status for Wily to "Won't Fix" since Ubuntu 15.10 (Wily Werewolf) reaches end of life on July 28, 2016.

Eric

Changed in libqb (Ubuntu Wily):
status: In Progress → Won't Fix
Louis Bouchard (louis)
tags: removed: sts-sru