"Invalid IPC credentials" after corosync, pacemaker service restarts

Bug #1490727 reported by JuanJo Ciarlante
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Landscape Server | Fix Released | High | Andreas Hasenack |
15.07 | Fix Released | High | Andreas Hasenack |
Cisco-odl | Fix Released | High | Andreas Hasenack |
hacluster (Juju Charms Collection) | Fix Released | Critical | Billy Olsen |

Bug Description

Follow-up from https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1439649;
see comments #14 and #15 there, as this _may be_ related to missing uidgid ACLs for
hacluster:haclient (as apparently presented by pacemaker).

FYI you can find relevant IPC resources with:
$ find /run/shm -user hacluster -group haclient -ls

Changed in hacluster (Juju Charms Collection):
status: New → Confirmed
importance: Undecided → Critical
Changed in hacluster (Juju Charms Collection):
assignee: nobody → Billy Olsen (billy-olsen)
Changed in hacluster (Juju Charms Collection):
milestone: none → 15.10
Changed in hacluster (Juju Charms Collection):
status: Confirmed → In Progress
Billy Olsen (billy-olsen) wrote :

The workaround presented by JuanJo seems to work reasonably well, so it may be a good path forward for the time being. However, the ACLs for pacemaker shouldn't really be enabled, so something else seems to be going on. For now, I think proposing the workaround is a good thing until the percona/corosync issue is straightened out.

JuanJo Ciarlante (jjo) wrote :

Thanks for the quick turnaround. Could you please backport the fix
to 1507 trunk?
We have several stacks where we need to manually apply
the above workaround for corosync/pacemaker to behave properly,
and several more coming down the line before 1510.

FYI, while fixing hacluster trunk (I essentially came up with the same
changes), I had to add a line to test_hacluster_utils.py to pass the unit tests:
http://paste.ubuntu.com/12272628/

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in pacemaker (Ubuntu):
status: New → Confirmed
tags: added: landscape-release-29
David Britton (dpb)
tags: added: landscape
tags: added: backport-potential
Robie Basak (racb) wrote :

Could you please complete the bug report from the perspective of the pacemaker task? I don't think there's enough here to go on; for example, it is not described in a way that upstream or the Debian maintainer would understand.

Changed in pacemaker (Ubuntu):
status: Confirmed → Incomplete
Changed in hacluster (Juju Charms Collection):
status: In Progress → Fix Released
David Britton (dpb)
no longer affects: landscape
Changed in landscape:
milestone: none → 15.08
importance: Undecided → High
tags: added: kanban
tags: removed: kanban
David Britton (dpb)
tags: added: kanban
removed: landscape-release-29
Changed in landscape:
status: New → In Progress
assignee: nobody → Andreas Hasenack (ahasenack)
Changed in landscape:
status: In Progress → Fix Committed
Changed in landscape:
status: Fix Committed → Fix Released
milestone: 15.08 → 15.07
Rafael David Tinoco (rafaeldtinoco) wrote :

For pacemaker, this is a duplicate of:

https://bugs.launchpad.net/bugs/1439649

So, instead of marking it as duplicate, since it had charm work, I'm removing pacemaker from the bug and leaving this comment:

https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1439649/comments/20

"""
From Corosync 2.4.1 Release Notes:

This release contains fix for one regression and few more smaller fixes.

"""
During 2.3.6 development, a bug was introduced that causes pacemaker to stop working after the corosync configuration file is reloaded. The solution is either to use this fixed version (recommended) or, as a quick workaround (for users who want to stay on 2.3.6 or 2.4.0), to create a file named pacemaker (the file name can be arbitrary) in the /etc/corosync/uidgid.d directory with the following content (you can also put the same stanza into /etc/corosync/corosync.conf):

uidgid {
    gid: haclient
}
"""

Anyone relying on Trusty or Xenial corosync:

 corosync | 2.3.3-1ubuntu1 | trusty
 corosync | 2.3.3-1ubuntu4 | trusty-updates
 corosync | 2.3.5-3ubuntu1 | xenial
 corosync | 2.3.5-3ubuntu2.3 | xenial-security
 corosync | 2.3.5-3ubuntu2.3 | xenial-updates

should apply the mitigation above, as previously discovered by commenters on this bug.

Note: Trusty is already EOS, so I'm marking it as "won't fix".

Xenial should include the mitigation in an SRU.
"""
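
For reference, the uidgid workaround quoted above could be applied with a small shell helper along these lines. This is only a sketch: write_uidgid_stanza is a hypothetical helper name (not part of corosync, pacemaker, or the hacluster charm), and the need to restart the services afterwards is an assumption, not something the release notes state.

```shell
#!/bin/sh
# Sketch only: write the uidgid stanza quoted from the Corosync 2.4.1
# release notes into a uidgid.d directory.
# write_uidgid_stanza is a hypothetical helper, not an existing tool.
write_uidgid_stanza() {
    dir="$1"                 # target directory, normally /etc/corosync/uidgid.d
    name="${2:-pacemaker}"   # the file name is arbitrary per the release notes
    mkdir -p "$dir" || return 1
    cat > "$dir/$name" <<'EOF'
uidgid {
    gid: haclient
}
EOF
}

# Typical use, as root:
#   write_uidgid_stanza /etc/corosync/uidgid.d
# Assumption: corosync and pacemaker then need a restart to pick this up.
```

Taking the directory as an argument keeps the helper easy to try against a scratch directory before touching /etc/corosync.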

no longer affects: pacemaker (Ubuntu)