Resource cleanup sometimes hangs
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack HA Cluster Charm | Triaged | Low | Unassigned | |
Bug Description
The charm performs resource cleanups, but these sometimes time out with messages like:
# crm resource cleanup juju-dd3082-
Cleaned up juju-dd3082-
Cleaned up juju-dd3082-
Cleaned up juju-dd3082-
Waiting for 3 replies from the CRMd..No messages received in 60 seconds.. aborting
These cleanup commands are housekeeping and do not seem to affect functionality, but it would be good to get to the bottom of what is going on. In most cases it is just one node not responding, as in the case below where juju-dd3082-
# crm resource cleanup juju-dd3082-
Cleaned up juju-dd3082-
Waiting for 1 replies from the CRMd. OK
# crm resource cleanup juju-dd3082-
Cleaned up juju-dd3082-
Waiting for 1 replies from the CRMd. OK
# crm resource cleanup juju-dd3082-
Cleaned up juju-dd3082-
Waiting for 1 replies from the CRMdNo messages received in 60 seconds.. aborting
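When triaging many of these runs, it can help to classify the cleanup output mechanically. The following is a minimal Python sketch, not part of the charm: the function name `classify_cleanup_output` is hypothetical, and the patterns are taken only from the messages quoted in this report, so other crmsh or Pacemaker versions may word them differently.

```python
import re

# Patterns copied from the transcripts in this report (assumption: the
# wording is stable for the crmsh/Pacemaker versions in use).
_WAITING = re.compile(r"Waiting for (\d+) repl(?:y|ies) from the CRMd")
_TIMEOUT = re.compile(r"No messages received in (\d+) seconds")


def classify_cleanup_output(output: str) -> dict:
    """Summarise the text printed by `crm resource cleanup`."""
    waiting = _WAITING.search(output)
    timeout = _TIMEOUT.search(output)
    return {
        # Number of "Cleaned up ..." lines printed before the wait.
        "cleaned": sum(
            1 for line in output.splitlines() if line.startswith("Cleaned up ")
        ),
        # How many CRMd replies the command said it was waiting for.
        "expected_replies": int(waiting.group(1)) if waiting else None,
        # True when the command gave up waiting for replies.
        "timed_out": timeout is not None,
        "timeout_seconds": int(timeout.group(1)) if timeout else None,
    }
```

Running this over the per-node output makes it easy to spot which cleanup invocation is the one that never gets its reply.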
Oddly, moving the resource onto the node that isn't responding works fine:
# crm_resource -M -r juju-dd3082-
# crm status | grep juju-dd3082-
RemoteOnline: [ juju-dd3082-
juju-dd3082-
But cleanup still fails:
# crm resource cleanup juju-dd3082-
Cleaned up juju-dd3082-
Waiting for 1 replies from the CRMdNo messages received in 60 seconds.. aborting
Until corosync is restarted on the errant node:
$ juju ssh hacluster/2 "uname -a; sudo systemctl restart corosync"
Linux juju-dd3082-
Connection to 10.5.0.109 closed.
# crm resource cleanup juju-dd3082-
Cleaned up juju-dd3082-
Waiting for 1 replies from the CRMd. OK
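The manual workaround above (cleanup, restart corosync on the errant node, cleanup again) could be scripted roughly as follows. This is only a sketch under stated assumptions: it runs from a Juju client, it issues both commands through `juju ssh`, and the names `cleanup_with_restart`, `cluster_unit` and `errant_unit` are hypothetical. It does not address the underlying cause, which is still unknown, and restarting corosync may not be acceptable in every deployment.

```python
import subprocess
from typing import Callable, Sequence

# Substring seen in this report when the cleanup gives up waiting.
TIMEOUT_MARKER = "No messages received in"


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its combined stdout/stderr as text."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=False
    )
    return proc.stdout


def cleanup_with_restart(
    resource: str,
    cluster_unit: str,
    errant_unit: str,
    runner: Callable[[Sequence[str]], str] = run,
) -> bool:
    """Clean up `resource`; on timeout restart corosync and retry once.

    Returns True if a cleanup completed without the timeout message.
    """
    cleanup = ["juju", "ssh", cluster_unit, f"sudo crm resource cleanup {resource}"]
    if TIMEOUT_MARKER not in runner(cleanup):
        return True
    # Workaround from this report: restart corosync on the unresponsive node.
    runner(["juju", "ssh", errant_unit, "sudo systemctl restart corosync"])
    return TIMEOUT_MARKER not in runner(cleanup)
```

The `runner` argument exists so the flow can be exercised with canned output before it is pointed at a real cluster.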
Changed in charm-hacluster: | |
status: | New → Triaged |
importance: | Undecided → Low |