Deadlock can occur between clustercheck and ceph-mon setup in pacemaker scenario

Bug #1598907 reported by Giulio Fidente on 2016-07-04
This bug affects 1 person
Affects: tripleo; Importance: Medium; Assigned to: Giulio Fidente
Affects: Mitaka; Importance: Undecided; Assigned to: Unassigned

Bug Description

During step 2 of a pacemaker deployment, every node attempts the ceph-mon setup, which blocks puppet until all initial members are available to form the Ceph cluster.

On the bootstrap node this can happen *before* the galera setup is initialized. If, in the same deployment, clustercheck is launched first on the non-bootstrap nodes, the deployment stops in a deadlock where:

1. ceph-mon on the bootstrap node is waiting for the non-bootstrap nodes
2. clustercheck on the non-bootstrap nodes is waiting for galera to come up on the bootstrap node

The clustercheck resource should really only be used on bootstrap nodes.
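
To illustrate the circular wait described above, here is a minimal Python sketch that models it as a wait-for graph and looks for a cycle. The node labels, the `WAITS_FOR` and `FIXED` dictionaries, and the `find_cycle` helper are all hypothetical names introduced for illustration; the edges encode my reading of the description (puppet blocks on whichever resource runs first on each node), not the actual puppet manifests.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph: each key waits for every entry in its list.
# Assumption: on the bootstrap node ceph-mon runs before galera is set up,
# and on non-bootstrap nodes clustercheck runs before ceph-mon.
WAITS_FOR: Dict[str, List[str]] = {
    "ceph-mon@bootstrap": ["ceph-mon@non-bootstrap"],
    "ceph-mon@non-bootstrap": ["clustercheck@non-bootstrap"],
    "clustercheck@non-bootstrap": ["galera@bootstrap"],
    "galera@bootstrap": ["ceph-mon@bootstrap"],
}

# Same scenario with clustercheck no longer run on non-bootstrap nodes.
FIXED: Dict[str, List[str]] = {
    "ceph-mon@bootstrap": ["ceph-mon@non-bootstrap"],
    "ceph-mon@non-bootstrap": [],
    "galera@bootstrap": ["ceph-mon@bootstrap"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (first == last), or None."""
    visiting = set()  # nodes on the current DFS path
    done = set()      # nodes fully explored without finding a cycle
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Back edge: the cycle is the path from the first occurrence.
            return path[path.index(node):] + [node]
        visiting.add(node)
        path.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print("deadlock cycle:", find_cycle(WAITS_FOR))
    print("after removing clustercheck on non-bootstrap nodes:", find_cycle(FIXED))
```

In this model, dropping the clustercheck dependency on the non-bootstrap nodes is enough to break the cycle, which matches the suggestion that clustercheck should only be used on the bootstrap node.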

Giulio Fidente (gfidente) wrote :

I have a WIP patch at https://review.openstack.org/#/c/337302/1

More information can also be found in the BZ at https://bugzilla.redhat.com/show_bug.cgi?id=1349456

Changed in tripleo:
importance: Undecided → Medium
assignee: nobody → Giulio Fidente (gfidente)
status: New → Triaged
Changed in tripleo:
status: Triaged → In Progress
Steven Hardy (shardy) on 2016-07-08
Changed in tripleo:
milestone: none → newton-2

Reviewed: https://review.openstack.org/337302
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3b9544b265ff239225ef7a23986182019a14d34b
Submitter: Jenkins
Branch: master

commit 3b9544b265ff239225ef7a23986182019a14d34b
Author: Giulio Fidente <email address hidden>
Date: Mon Jul 4 18:22:19 2016 +0200

    Merge pacemaker_master/sync_db conditionals

    By condensing the pacemaker_master and sync_db conditions we ensure
    there won't be unrelevant (clustercheck) execs deployed on
    non-bootstrap nodes.

    Closes-Bug: 1598907

    Change-Id: Iae6aa13682d63096265f4751b2f71019a49f6fa6

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/339444
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=64d70eb2f61de3ab1e47f54c932b4421da4a816a
Submitter: Jenkins
Branch: stable/mitaka

commit 64d70eb2f61de3ab1e47f54c932b4421da4a816a
Author: Giulio Fidente <email address hidden>
Date: Fri Jul 8 12:04:16 2016 +0200

    Merge pacemaker_master/sync_db conditionals

    By condensing the pacemaker_master and sync_db conditions we ensure
    there won't be unrelevant (clustercheck) execs deployed on
    non-bootstrap nodes.

    Closes-Bug: 1598907
    Change-Id: Iae6aa13682d63096265f4751b2f71019a49f6fa6
    (cherry picked from commit 3b9544b265ff239225ef7a23986182019a14d34b)

This issue was fixed in the openstack/tripleo-heat-templates 2.1.0 release.

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0b3 development milestone.
