Comment 1 for bug 1890491

Jorge Niedbalski (niedbalski) wrote :

I am able to reproduce a similar issue with the following bundle: https://paste.ubuntu.com/p/VJ3m7nMN79/

Resource created with:
sudo pcs resource create test2 ocf:pacemaker:Dummy op_sleep=10 op monitor interval=30s timeout=30s op start timeout=30s op stop timeout=30s
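
The configured options can be double-checked on the same unit; a hedged example, using the same pcs resource show form as further below:

$ juju ssh nova-cloud-controller/2 "sudo pcs resource show test2"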

juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-10.cloud.sts"
juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-11.cloud.sts"
juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-12.cloud.sts"
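
To confirm the location constraints took effect, a quick hedged check:

$ juju ssh nova-cloud-controller/2 "sudo pcs constraint"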

Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ]
RemoteOnline: [ juju-acda3d-pacemaker-remote-10.cloud.sts juju-acda3d-pacemaker-remote-11.cloud.sts juju-acda3d-pacemaker-remote-12.cloud.sts ]

Full list of resources:

Resource Group: grp_nova_vips
    res_nova_bf9661e_vip (ocf::heartbeat:IPaddr2): Started juju-acda3d-pacemaker-remote-7
Clone Set: cl_nova_haproxy [res_nova_haproxy]
    Started: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ]
juju-acda3d-pacemaker-remote-10.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-8
juju-acda3d-pacemaker-remote-12.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-8
juju-acda3d-pacemaker-remote-11.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-7

test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-remote-10.cloud.sts

## After running the following commands on juju-acda3d-pacemaker-remote-10.cloud.sts

1) sudo systemctl stop pacemaker_remote
2) forcefully shut down the VM (openstack server stop xxxx) less than 10 seconds after the pacemaker_remote stop is executed (see the sketch below).
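
The race window, roughly, as a sketch (xxxx is the Nova server ID as above; the openstack client runs from a host with API access):

$ sudo systemctl stop pacemaker_remote   # graceful stop begins; the in-flight Dummy stop takes ~10s (op_sleep=10)
$ openstack server stop xxxx             # issued within those ~10s, so the VM dies mid-stop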

The remote is now shut down:

RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ]

The resource status remains Stopped across the 3 machines and doesn't recover:

$ juju run --application nova-cloud-controller "sudo pcs resource show | grep -i test2"
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/0
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/1
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/2
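
For reference, a resource stuck in this state can normally be recovered by hand by clearing its failed state; a hedged example:

$ juju ssh nova-cloud-controller/2 "sudo pcs resource cleanup test2"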

However, if I do a clean shutdown (without interrupting the pacemaker_remote fence; see the sketch below), the resource ends up correctly migrated to another node.
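
By "clean shutdown" I mean letting the pacemaker_remote stop complete before the power-off; roughly, as a sketch:

$ sudo systemctl stop pacemaker_remote   # let it finish; the node goes RemoteOFFLINE cleanly
$ openstack server stop xxxx

After the clean shutdown, the status shows: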

6 nodes configured
9 resources configured

Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ]
RemoteOnline: [ juju-acda3d-pacemaker-remote-11.cloud.sts juju-acda3d-pacemaker-remote-12.cloud.sts ]
RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ]

Full list of resources:

[...]
test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-remote-12.cloud.sts

I will keep investigating this behavior to determine whether it is linked to the reported bug.
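
As a starting point, the cluster's view of the failed remote should be visible in the pacemaker logs on the corosync nodes; a hedged example:

$ juju ssh nova-cloud-controller/2 "sudo journalctl -u pacemaker | grep -i juju-acda3d-pacemaker-remote-10"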