I saw this on a recent deploy and was able to reproduce it using a stub MAAS server. The problem appears to be a window between a pacemaker remote resource being added and the location properties for that resource being added. During this window the resource is down, and pacemaker fences the node.
The charm currently does:
1) Set the stonith-enabled=true cluster property.
2) Add a MAAS stonith device that controls a pacemaker node that has not yet been added.
3) Add the pacemaker remote node.
4) Add the pacemaker location rules.
I think the following two fixes are needed:
For the initial deploy, update the charm so that it does not enable stonith until the stonith resources and pacemaker remotes have been added.
For scale-out, do not add the new pacemaker remote's stonith resource until the corresponding pacemaker remote resource has been added along with its location rules.
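One way to sanity-check the two proposed orderings is a toy model of the window described above. This is a rough sketch under stated assumptions: the step names are hypothetical labels (not hacluster charm functions or crm commands), and the model simply treats a remote as exposed to fencing when stonith is enabled, a stonith device covers it, and its remote resource exists without location rules yet. Real pacemaker behaviour is more involved.

```python
from typing import List, Set, Tuple

# (action, node) pairs; node is "" for the cluster-wide stonith property.
# Action names are hypothetical labels, not charm functions or crm commands.
Step = Tuple[str, str]


def fencing_windows(steps: List[Step]) -> List[int]:
    """Return the indices of steps after which a remote node could be fenced.

    Simplified model (an assumption, not pacemaker's real logic): a remote is
    exposed when stonith is enabled, a stonith device covers it, and its
    remote resource exists but has no location rules yet.
    """
    stonith_enabled = False
    devices: Set[str] = set()
    remotes: Set[str] = set()
    located: Set[str] = set()
    windows: List[int] = []
    for i, (action, node) in enumerate(steps):
        if action == "enable_stonith":
            stonith_enabled = True
        elif action == "add_stonith_device":
            devices.add(node)
        elif action == "add_remote":
            remotes.add(node)
        elif action == "add_location_rules":
            located.add(node)
        else:
            raise ValueError(f"unknown action: {action}")
        # Remotes that are up without location rules AND covered by a device.
        exposed = (remotes - located) & devices
        if stonith_enabled and exposed:
            windows.append(i)
    return windows


def current_initial(node: str) -> List[Step]:
    # Order the charm currently uses (steps 1-4 above).
    return [("enable_stonith", ""), ("add_stonith_device", node),
            ("add_remote", node), ("add_location_rules", node)]


def proposed_initial(node: str) -> List[Step]:
    # Fix 1: same resources, but stonith is enabled last.
    return current_initial(node)[1:] + [("enable_stonith", "")]


def proposed_scale_out(node: str) -> List[Step]:
    # Fix 2: the new remote's stonith device is added only after the
    # remote resource and its location rules exist.
    return [("add_remote", node), ("add_location_rules", node),
            ("add_stonith_device", node)]


if __name__ == "__main__":
    print(fencing_windows(current_initial("remote-1")))   # [2]
    print(fencing_windows(proposed_initial("remote-1")))  # []
    existing = proposed_initial("remote-1")  # cluster already has stonith on
    print(fencing_windows(existing + proposed_scale_out("remote-2")))  # []
```

In this model the current order is exposed right after the remote is added (index 2), while both proposed orderings leave no step where a remote is both fenceable and missing its location rules.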