Comment 10 for bug 1884284

Liam Young (gnuoy) wrote :

I saw this on a recent deploy and was able to reproduce it using a stub MAAS server. The problem appears to be a window between a pacemaker remote resource being added and the location properties for that resource being added. During this window the resource is down, so pacemaker fences the node.

The charm currently does:

1) Set the stonith-enabled=true cluster property.
2) Add a MAAS stonith device that controls a pacemaker node that has not yet been added.
3) Add the pacemaker remote node.
4) Add the pacemaker location rules.
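
To make the window concrete, here is a toy model of the four steps above. It is only a sketch: the step names and the ToyCluster class are made up for illustration and are not the charm's code or a pacemaker API. It assumes a remote without location rules counts as down, and that a down node is fenced once stonith is enabled and a stonith device covers it.

```python
# Toy model of the ordering problem; hypothetical names, not the charm's code.
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    stonith_enabled: bool = False
    stonith_devices: set = field(default_factory=set)  # nodes with a stonith device
    remotes: set = field(default_factory=set)          # pacemaker remote resources
    located: set = field(default_factory=set)          # remotes with location rules
    fenced: list = field(default_factory=list)         # nodes fenced so far

    def apply(self, step, node):
        """Apply one configuration step, then let the toy cluster react."""
        if step == "enable_stonith":
            self.stonith_enabled = True
        elif step == "add_stonith_device":
            self.stonith_devices.add(node)
        elif step == "add_remote":
            self.remotes.add(node)
        elif step == "add_location_rules":
            self.located.add(node)
        else:
            raise ValueError(f"unknown step: {step}")
        self._react()

    def _react(self):
        # Assumption: a remote with no location rules yet is treated as down,
        # and a down node is fenced if stonith is on and a device covers it.
        for node in sorted(self.remotes - self.located):
            covered = node in self.stonith_devices
            if self.stonith_enabled and covered and node not in self.fenced:
                self.fenced.append(node)


def run(steps, node="remote-0"):
    """Replay the steps for a single node and return the list of fenced nodes."""
    cluster = ToyCluster()
    for step in steps:
        cluster.apply(step, node)
    return cluster.fenced


CURRENT_ORDER = [
    "enable_stonith",
    "add_stonith_device",
    "add_remote",
    "add_location_rules",
]

print(run(CURRENT_ORDER))  # ['remote-0']: fenced between steps 3 and 4
```

With the same model, moving enable_stonith to the end of the list leaves the fenced list empty.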

I think the following two fixes are needed:

For initial deploy, update the charm so it does not enable stonith until stonith resources and pacemaker remotes have been added.

For scale-out, do not add the stonith resource for the new pacemaker remote until the corresponding pacemaker resource has been added along with its location rules.
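
A rough sketch of what the two orderings could look like, with a small checker for them. The step names and the ordering_problems helper are hypothetical, not taken from the charm. Putting the location rules before the stonith device on initial deploy is my assumption; the proposal above only requires that stonith is enabled last.

```python
# Hypothetical step names; the charm's real functions are not shown here.
INITIAL_DEPLOY = [
    "add_remote",
    "add_location_rules",
    "add_stonith_device",
    "enable_stonith",  # last, once the remotes and stonith resources exist
]

# On scale-out stonith is already enabled, so the new node's stonith device
# is added only after the remote and its location rules are in place.
SCALE_OUT = [
    "add_remote",
    "add_location_rules",
    "add_stonith_device",
]


def ordering_problems(steps, stonith_already_enabled=False):
    """Return a list of ordering violations; an empty list means the order is fine."""
    problems = []
    seen = set()
    enabled = stonith_already_enabled
    for step in steps:
        if step == "enable_stonith":
            # Fix 1: stonith devices and remotes must exist before enabling.
            missing = {"add_stonith_device", "add_remote"} - seen
            if missing:
                problems.append(f"enable_stonith before {sorted(missing)}")
            enabled = True
        elif step == "add_stonith_device" and enabled:
            # Fix 2: with stonith live, the remote and its rules come first.
            missing = {"add_remote", "add_location_rules"} - seen
            if missing:
                problems.append(f"add_stonith_device before {sorted(missing)}")
        seen.add(step)
    return problems


print(ordering_problems(INITIAL_DEPLOY))                           # []
print(ordering_problems(SCALE_OUT, stonith_already_enabled=True))  # []
```

Running the checker on the order the charm uses today (enable stonith, add the stonith device, add the remote, add the location rules) reports a violation of each rule.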