OpenStack Pacemaker Remote Charm

Bug #1889094
Comment #1

OpenStack Infra (hudson-openstack) wrote on 2020-09-04: Fix merged to charm-hacluster (master)

Reviewed: https://review.opendev.org/743742
Committed: https://git.openstack.org/cgit/openstack/charm-hacluster/commit/?id=b40a6754b0256058213afcde80174ca7e730a403
Submitter: Zuul
Branch: master

commit b40a6754b0256058213afcde80174ca7e730a403
Author: Liam Young <email address hidden>
Date: Wed Jul 29 11:59:43 2020 +0000

Create null stonith resource for lxd containers.

    If stonith is enabled, a compute node that is detected as failed is
    powered down. This can include an lxd container that is also part of
    the cluster. Because stonith is enabled at a global level, pacemaker
    will then try to power off the lxd container too. However, the
    container has no stonith device, so it is marked as unclean (but not
    down). This running-but-unclean state prevents resources from being
    moved and causes any pacemaker-remotes associated with the lost
    container to lose their connection, which prevents masakari
    hostmonitor from ascertaining the cluster health.

    The workaround is to create a dummy stonith device for the lxd
    containers. This allows the cluster to correctly mark the lost
    container as down so that its resources are relocated.
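A minimal sketch of what such a dummy device could look like, assuming it is expressed as a crmsh `primitive` that uses the cluster-glue `stonith:null` agent with its `hostlist` parameter. The helper `null_stonith_primitive`, the resource name `st-null` and the example hostnames are hypothetical and for illustration only; they are not taken from the charm-hacluster change itself.

```python
from typing import Iterable


def null_stonith_primitive(hostnames: Iterable[str], resource_name: str = "st-null") -> str:
    """Build a crmsh 'primitive' definition for a dummy (null) stonith device.

    Hypothetical helper: the function and the default resource name are
    illustrative assumptions, not charm-hacluster code.
    """
    # Deduplicate and sort so the generated definition is stable across runs.
    hosts = sorted(set(hostnames))
    if not hosts:
        raise ValueError("at least one lxd container hostname is required")
    for host in hosts:
        # hostlist is space-separated and quoted, so reject names that would break it.
        if not host or any(c.isspace() or c == '"' for c in host):
            raise ValueError(f"invalid hostname: {host!r}")
    hostlist = " ".join(hosts)
    # stonith:null reports success for the listed hosts without fencing anything.
    return f'primitive {resource_name} stonith:null params hostlist="{hostlist}"'


if __name__ == "__main__":
    print(null_stonith_primitive(["juju-1-lxd-2", "juju-0-lxd-1"]))
```

Running it prints `primitive st-null stonith:null params hostlist="juju-0-lxd-1 juju-1-lxd-2"`, a definition that could then be applied with crmsh's `crm configure`. The null agent performs no real fencing, so this is only reasonable under the assumption stated above: a lxd container is already gone once the compute node hosting it has been powered down.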

Change-Id: Ic45dbdd9d8581f25549580c7e98a8d6e0bf8c3e7
Partial-Bug: #1889094