If stonith is enabled, a compute node that is detected as failed is
powered down. This can include an lxd container which is also part of
the cluster. Because stonith is enabled at a global level, pacemaker
will try to power off the lxd container too. But the container does
not have a stonith device, so it is marked as unclean (but not down).
This running unclean state prevents resources from being moved and
causes any pacemaker-remotes that are associated with the lost
container to lose their connection, which prevents masakari
hostmonitor from ascertaining the cluster health.
The way to work around this is to create a dummy stonith device for
the lxd containers. This allows the cluster to properly mark the lost
container as down, so that resources are relocated.
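
As a rough illustration of the workaround described above, the sketch below builds a crm shell command that defines a dummy fencing device using the stonith:null agent and its hostlist parameter. It is not the code from the charm: the function name, the resource name st-null, and the example container hostnames are all hypothetical, and the command is only constructed and printed, not executed.

```python
import shlex


def build_null_stonith_command(container_hosts, resource_name="st-null"):
    """Return the crm argv defining a dummy stonith device for the given nodes.

    Both the function and the default resource name are hypothetical; only
    the stonith:null agent and its hostlist parameter come from pacemaker.
    """
    if not container_hosts:
        raise ValueError("at least one container hostname is required")
    # stonith:null takes a space-separated list of the nodes it claims to fence.
    hostlist = " ".join(sorted(container_hosts))
    return [
        "crm", "configure", "primitive", resource_name, "stonith:null",
        "params", "hostlist={}".format(hostlist),
    ]


# Hypothetical lxd container hostnames, used only for illustration.
cmd = build_null_stonith_command(["juju-machine-1-lxd-0", "juju-machine-0-lxd-2"])
print(shlex.join(cmd))  # shlex.join needs Python 3.8 or later
```

Because the null agent always reports success without powering anything off, it is only appropriate for nodes such as these containers, where no real fencing device exists.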
Reviewed: https://review.opendev.org/743742
Committed: https://git.openstack.org/cgit/openstack/charm-hacluster/commit/?id=b40a6754b0256058213afcde80174ca7e730a403
Submitter: Zuul
Branch: master
commit b40a6754b0256058213afcde80174ca7e730a403
Author: Liam Young <email address hidden>
Date: Wed Jul 29 11:59:43 2020 +0000
Create null stonith resource for lxd containers.
If stonith is enabled then when a compute node is detected as failed
it is powered down. This can include a lxd container which is also
part of the cluster. In this case because stonith is enabled at a
global level, pacemaker will try and power off the lxd container
too. But the container does not have a stonith device and this causes
the container to be marked as unclean (but not down). This running
unclean state prevents resources being moved and causes any
pacemaker-remotes that are associated with the lost container from
losing their connection which prevents masakari hostmonitor from
ascertaining the cluster health.
The way to work around this is to create a dummy stonith device for
the lxd containers. This allows the cluster to properly mark the lost
container as down and resources are relocated.
Change-Id: Ic45dbdd9d8581f25549580c7e98a8d6e0bf8c3e7
Partial-Bug: #1889094