stonith resource creation is racy when deploying on multiple nodes

Bug #1717566 reported by Michele Baldessari
This bug affects 1 person
Affects            Status        Importance  Assigned to          Milestone
puppet-pacemaker   Fix Released  Undecided   Michele Baldessari
tripleo            Fix Released  High        Unassigned

Bug Description

The way we currently create stonith resources is inherently racy, because the puppet-pacemaker stonith manifests call the 'pcs' command directly. Explanation as follows:
- Stonith resources get created on multiple nodes at deployment step 5
- If node A is in the middle of creating its own stonith resource and node B starts the same process at just the wrong moment, B may overwrite the CIB that A has just written.
- The only safe way to do this is already implemented, for all other resources, in the pcs() Ruby function

So the following might happen:
Sep 15 09:43:51 localhost os-collect-config: "Notice: /Stage[main]/Tripleo::Fencing/Pacemaker::Stonith::Fence_ipmilan[00:92:bb:63:40:9f]/Exec[Create stonith-fence_ipmilan-0092bb63409f]/returns: executed successfully",
Sep 15 09:43:51 localhost os-collect-config: "Notice: /Stage[main]/Tripleo::Fencing/Pacemaker::Stonith::Fence_ipmilan[00:92:bb:63:40:9f]/Exec[Add non-local constraint for stonith-fence_ipmilan-0092bb63409f]/returns: Error: Resource 'stonith-fence_ipmilan-0092bb63409f' does not exist"

This would leave us with one stonith resource fewer than needed.
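
To make the failure mode concrete, here is a toy Ruby model of the lost update described above. It is only an illustration under simplifying assumptions: the class and method names (`Cib`, `snapshot`, `replace!`) and node B's resource id are hypothetical, and the real CIB is an XML document managed by Pacemaker, not a Ruby object. Two nodes copy the same CIB, each adds its own stonith resource, and both write their copy back without checking whether it is still current, so node A's resource disappears and its constraint step fails just like in the log above.

```ruby
# Toy model of a read-modify-write race. NOT how Pacemaker stores the CIB;
# Cib, snapshot and replace! are hypothetical names for illustration only.
class Cib
  attr_reader :resources, :constraints

  def initialize
    @resources = []
    @constraints = []
  end

  # Independent copy of the current state, like dumping the CIB to a file.
  def snapshot
    copy = Cib.new
    copy.resources.concat(@resources)
    copy.constraints.concat(@constraints)
    copy
  end

  # Blindly overwrite the live state with a modified copy (no version check).
  def replace!(other)
    @resources = other.resources.dup
    @constraints = other.constraints.dup
  end
end

live = Cib.new

# Nodes A and B both take their copy before either has written anything.
copy_a = live.snapshot
copy_b = live.snapshot

copy_a.resources << "stonith-fence_ipmilan-0092bb63409f" # node A's device
copy_b.resources << "stonith-fence_ipmilan-nodeb"        # node B's device (hypothetical id)

live.replace!(copy_a) # A writes first...
live.replace!(copy_b) # ...then B overwrites it with its stale copy

# Node A now tries to add the location constraint for its own device.
res = "stonith-fence_ipmilan-0092bb63409f"
if live.resources.include?(res)
  live.constraints << res
  puts "constraint added"
else
  puts "Error: Resource '#{res}' does not exist"
end
```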

To fix this, we need to avoid calling pcs directly for both resource creation and constraint creation.
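
According to the commit message of the fix, the pcs() Ruby function contains retry logic for the case where the CIB is updated from another node. A general technique for this is optimistic concurrency: remember which version you read, commit only if it is still the current version, and otherwise re-read and retry. The Ruby sketch below illustrates that general idea only; it is not the pcs() code, and `VersionedCib`, `compare_and_swap` and `add_resource` are hypothetical names.

```ruby
# Hypothetical versioned store: a write succeeds only if the writer's copy
# was taken from the current version (compare-and-swap).
class VersionedCib
  attr_reader :version, :resources

  def initialize
    @version = 0
    @resources = []
    @lock = Mutex.new
  end

  # Return [version, private copy of resources].
  def read
    @lock.synchronize { [@version, @resources.dup] }
  end

  # Commit only if nobody else has written since base_version.
  def compare_and_swap(base_version, new_resources)
    @lock.synchronize do
      return false unless base_version == @version
      @resources = new_resources
      @version += 1
      true
    end
  end
end

class ConflictError < StandardError; end

# Read-modify-write with retries, in the spirit of (not copied from) the
# retry logic the pcs() function is described as having.
def add_resource(cib, name, max_tries: 10)
  max_tries.times do
    base, resources = cib.read
    return :exists if resources.include?(name) # safe to re-run
    resources << name
    return :created if cib.compare_and_swap(base, resources)
    # Another writer committed in between: loop, re-read and try again.
  end
  raise ConflictError, "gave up adding #{name} after #{max_tries} tries"
end

# Three concurrent writers (hypothetical resource ids).
cib = VersionedCib.new
ids = %w[stonith-a stonith-b stonith-c]
ids.map { |id| Thread.new { add_resource(cib, id) } }.each(&:join)
p cib.resources.sort # prints ["stonith-a", "stonith-b", "stonith-c"]
```

With three writers, each call can fail its compare-and-swap at most twice (every failure means one of the other two writers committed), so the default of 10 tries is ample here; with more contention the retry bound would need tuning.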

summary: - stonith resource creation is racy
+ stonith resource creation is racy when deploying on multiple nodes
OpenStack Infra (hudson-openstack) wrote: Fix proposed to puppet-pacemaker (master)

Fix proposed to branch: master
Review: https://review.openstack.org/504931

Changed in puppet-pacemaker:
assignee: nobody → Michele Baldessari (michele)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote:

Fix proposed to branch: master
Review: https://review.openstack.org/504932

OpenStack Infra (hudson-openstack) wrote: Fix merged to puppet-pacemaker (master)

Reviewed: https://review.openstack.org/504931
Committed: https://git.openstack.org/cgit/openstack/puppet-pacemaker/commit/?id=375ff165036d0b529b79876b273cc66566811d62
Submitter: Jenkins
Branch: master

commit 375ff165036d0b529b79876b273cc66566811d62
Author: Michele Baldessari <email address hidden>
Date: Sun Sep 17 10:01:33 2017 +0200

    Add pcmk_stonith provider

    This new pcmk_stonith provider exists in order to create
    stonith resources and location constraints using the common
    pcs() function. The reason for this is that this function
    contains the retry logic needed in case the CIB gets updated
    from another node.

    In this provider we also create the location constraint in
    a race-free way by first creating the stonith resource in
    disabled mode, then we create the location constraint and
    finally we enable the stonith resource.

    This is to avoid races when creating stonith resources from
    multiple nodes, which is how they are created within
    TripleO.

    Change-Id: I424302bbf8d0d5f233e3a7debc082be1c9a170bb
    Partial-Bug: #1717566
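
The ordering described in this commit message (create the stonith resource disabled, add its location constraint, then enable it) can be modelled as a small state machine. The Ruby sketch below is a toy under stated assumptions: `ToyCluster`, its step methods and the node name `controller-0` are hypothetical, and the constraint is modelled simply as the node the device must avoid, in line with the "non-local constraint" seen in the log. It records the state after each step and checks that the device was never enabled before its constraint existed, which is presumably the window the disabled-first ordering is meant to close.

```ruby
# Toy model of "create disabled -> add constraint -> enable".
# Stonith, ToyCluster and the node name are hypothetical.
Stonith = Struct.new(:id, :enabled, :avoids)

class ToyCluster
  attr_reader :history

  def initialize
    @resources = {}
    @history = [] # copies of every resource, taken after each step
  end

  def create_disabled(id)
    @resources[id] = Stonith.new(id, false, nil)
    record
  end

  # Fails like the log above if the resource is missing.
  def add_avoid_constraint(id, node)
    res = @resources.fetch(id) { raise "Error: Resource '#{id}' does not exist" }
    res.avoids = node
    record
  end

  def enable(id)
    @resources.fetch(id).enabled = true
    record
  end

  private

  def record
    @history << @resources.values.map(&:dup)
  end
end

# Hypothetical: the device that fences controller-0 must not run on controller-0.
cluster = ToyCluster.new
id = "stonith-fence_ipmilan-0092bb63409f"
cluster.create_disabled(id)
cluster.add_avoid_constraint(id, "controller-0")
cluster.enable(id)

# Invariant: no recorded state has an enabled device without its constraint.
unsafe = cluster.history.any? do |snapshot|
  snapshot.any? { |r| r.enabled && r.avoids.nil? }
end
puts unsafe ? "unsafe window observed" : "no unsafe window"
```

In the real provider each of these steps additionally goes through the retrying pcs() function, which is what protects them against concurrent CIB updates from other nodes.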

OpenStack Infra (hudson-openstack) wrote:

Reviewed: https://review.openstack.org/504932
Committed: https://git.openstack.org/cgit/openstack/puppet-pacemaker/commit/?id=1c56377c8ecfe65c24f5d7fba93a917254eca7e6
Submitter: Jenkins
Branch: master

commit 1c56377c8ecfe65c24f5d7fba93a917254eca7e6
Author: Michele Baldessari <email address hidden>
Date: Mon Sep 18 13:06:08 2017 +0200

    Switch the stonith_agent_generator to pcmk_stonith provider

    In change I424302bbf8d0d5f233e3a7debc082be1c9a170bb we added the
    pcmk_stonith provider to provide race-free stonith resource
    creation. With this commit we switch the agent_generator
    to make use of this new provider and we update all stonith
    manifests.

    Co-Authored-By: John Eckersberg <email address hidden>

    Change-Id: I8be5d5d1a9894b0e2915459b10ea2feed703ba8e
    Closes-Bug: #1717566

Changed in puppet-pacemaker:
status: In Progress → Fix Released
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/puppet-pacemaker 0.7.0

This issue was fixed in the openstack/puppet-pacemaker 0.7.0 release.
