HA: resource restart fails on stack update if status is disabled in pacemaker

Bug #1868533 reported by Damien Ciabrini
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Undecided
Assigned to: Damien Ciabrini
Milestone: (none)

Bug Description

In an HA control plane, each service has a dedicated container called "<service>_restart_bundle", which runs when the service's configuration files change.

During a stack update / minor update converge, the <service>_restart_bundle container calls pcs to restart the pacemaker resource associated with the service. For example, for redis:

    /usr/bin/bootstrap_host_exec redis /sbin/pcs resource restart --wait=600 redis_bundle

However, if the resource is, for whatever reason, in a state that prevents pacemaker from restarting it, pcs complains and returns an error:

    2020-03-20T15:57:32.161477524+00:00 stdout F Fri Mar 20 15:57:32 UTC 2020: Restarting redis-bundle globally
    2020-03-20T15:57:32.747605707+00:00 stderr F Error: Error performing operation: No such device or address
    2020-03-20T15:57:32.747605707+00:00 stderr F redis-bundle is not running anywhere and so cannot be restarted
    2020-03-20T15:57:32.747605707+00:00 stderr F

This causes the container to fail, and the stack update / minor update converge to fail as well.
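
As a rough illustration of the failure signature above, the following shell sketch classifies captured pcs output. The function name is_not_running_error is hypothetical, and the match string is copied from the log excerpt in this report; other pacemaker versions may word the message differently.

```shell
# Hypothetical helper (not part of tripleo-heat-templates): returns 0 when the
# captured output of a failed "pcs resource restart" contains the
# "not running anywhere" message quoted in this report, 1 otherwise.
is_not_running_error() {
    # $1: captured stdout/stderr text of the pcs call
    printf '%s\n' "$1" | grep -q 'is not running anywhere and so cannot be restarted'
}

# Example input: the two error lines from the log excerpt in this report.
sample='Error: Error performing operation: No such device or address
redis-bundle is not running anywhere and so cannot be restarted'

if is_not_running_error "$sample"; then
    echo "restart refused: resource is not running"
fi
```

Matching on message text is brittle, so this is only meant for spotting the condition in logs, not as a way to decide whether a restart is safe.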

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714407

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/714407
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=4d21bab8f2c9380ccf6ff02971101dfcfb4655fb
Submitter: Zuul
Branch: master

commit 4d21bab8f2c9380ccf6ff02971101dfcfb4655fb
Author: Damien Ciabrini <email address hidden>
Date: Mon Mar 23 10:48:08 2020 +0100

    HA: check before restarting resource on stack update

    When container <service>_restart_bundle is run, it checks whether
    it can call pcs to restart the associated pacemaker resource, when
    applicable.
    Make sure we enforce the checks in all cases (when we run during
    a stack update / update converge, and during a minor update).

    Change-Id: I0367a657ddf440f0b73c4de5346306f12439db15
    Closes-Bug: #1868533
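
The idea of checking before restarting can be sketched in shell as follows. This is not the code from the commit. It is a minimal illustration that assumes "crm_resource --locate" prints "is running on" for an active resource. The function names and the PCS_BIN / CRM_RESOURCE_BIN variables are hypothetical, and the bootstrap_host_exec wrapper used in the real deployment is omitted.

```shell
# Sketch only, not the actual tripleo-heat-templates change.
# Binaries are overridable so the logic can be tried without a cluster.
PCS_BIN="${PCS_BIN:-/sbin/pcs}"
CRM_RESOURCE_BIN="${CRM_RESOURCE_BIN:-crm_resource}"

# Returns 0 if pacemaker reports the resource as running on some node.
# Assumption: "crm_resource --locate" prints "... is running on: <node>".
resource_is_running() {
    "$CRM_RESOURCE_BIN" --resource "$1" --locate 2>&1 | grep -q 'is running on'
}

# Restart the resource only when it is running; otherwise skip and succeed,
# so a stopped or disabled resource does not fail the calling container.
restart_if_running() {
    if resource_is_running "$1"; then
        "$PCS_BIN" resource restart --wait=600 "$1"
    else
        echo "$1 is not running, skipping restart"
    fi
}
```

A call such as "restart_if_running redis-bundle" (resource name as it appears in the log excerpt) would then replace the unconditional restart. Skipping with a success exit code is a design choice of this sketch; whether the merged change skips, waits, or does something else should be checked against the review linked above.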

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/714634

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/714634
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=5f00163a07c7c69635489b91f5b50ae40a4c32c5
Submitter: Zuul
Branch: stable/train

commit 5f00163a07c7c69635489b91f5b50ae40a4c32c5
Author: Damien Ciabrini <email address hidden>
Date: Mon Mar 23 10:48:08 2020 +0100

    HA: check before restarting resource on stack update

    When container <service>_restart_bundle is run, it checks whether
    it can call pcs to restart the associated pacemaker resource, when
    applicable.
    Make sure we enforce the checks in all cases (when we run during
    a stack update / update converge, and during a minor update).

    Change-Id: I0367a657ddf440f0b73c4de5346306f12439db15
    Closes-Bug: #1868533
    (cherry picked from commit 4d21bab8f2c9380ccf6ff02971101dfcfb4655fb)

tags: added: in-stable-train

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.2.0

This issue was fixed in the openstack/tripleo-heat-templates 12.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
