Resource cleanup sometimes hangs

Bug #1822962 reported by Liam Young
This bug affects 3 people
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

The charm performs resource cleanups, but these sometimes time out with messages like:

# crm resource cleanup juju-dd3082-zaza-493257a251ad-16
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-10
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-11
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-9
Waiting for 3 replies from the CRMd..No messages received in 60 seconds.. aborting

These cleanup commands are housekeeping and do not seem to affect functionality, but it would be good to get to the bottom of what is going on. In most cases just one node is not responding, as in the case below, where juju-dd3082-zaza-493257a251ad-10 does not respond:

# crm resource cleanup juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-9
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-9
Waiting for 1 replies from the CRMd. OK

# crm resource cleanup juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-11
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-11
Waiting for 1 replies from the CRMd. OK

# crm resource cleanup juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-10
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-10
Waiting for 1 replies from the CRMdNo messages received in 60 seconds.. aborting
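
The per-node probing above could be scripted roughly as follows. This is only a sketch: it assumes the timeout is always reported with the "No messages received in" text shown in this report (other crmsh/pacemaker versions may word it differently), and the function names and the CRM_CMD override are hypothetical helpers, not part of the charm.

```shell
#!/bin/sh
# Sketch: run the cleanup against each node separately and list the nodes
# that never reply. Matches on the timeout text seen in this bug report.

# Returns 0 if the given cleanup output contains the timeout message.
cleanup_timed_out() {
    printf '%s\n' "$1" | grep -q 'No messages received in'
}

# probe_nodes RESOURCE NODE...
# Prints one line per node whose cleanup timed out.
# CRM_CMD is a hypothetical override (defaults to crm) so the loop can be
# exercised without a real cluster.
probe_nodes() {
    resource=$1
    shift
    for node in "$@"; do
        # Capture stdout and stderr; ignore the exit status and rely on the text.
        output=$(${CRM_CMD:-crm} resource cleanup "$resource" "$node" 2>&1) || true
        if cleanup_timed_out "$output"; then
            printf '%s\n' "$node"
        fi
    done
}
```

For the deployment in this report it would be called as `probe_nodes juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-9 juju-dd3082-zaza-493257a251ad-10 juju-dd3082-zaza-493257a251ad-11`. Each unresponsive node costs the full 60 second wait.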

Oddly, moving the resource onto the node that isn't responding works fine:

# crm_resource -M -r juju-dd3082-zaza-493257a251ad-16 -H juju-dd3082-zaza-493257a251ad-10
# crm status | grep juju-dd3082-zaza-493257a251ad-16
RemoteOnline: [ juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-17 juju-dd3082-zaza-493257a251ad-18 ]
 juju-dd3082-zaza-493257a251ad-16 (ocf::pacemaker:remote): Started juju-dd3082-zaza-493257a251ad-10

But cleanup still fails:

# crm resource cleanup juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-10
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-10
Waiting for 1 replies from the CRMdNo messages received in 60 seconds.. aborting

Until corosync is restarted on the errant node:

$ juju ssh hacluster/2 "uname -a; sudo systemctl restart corosync"
Linux juju-dd3082-zaza-493257a251ad-10 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Connection to 10.5.0.109 closed.

# crm resource cleanup juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-10
Cleaned up juju-dd3082-zaza-493257a251ad-16 on juju-dd3082-zaza-493257a251ad-10
Waiting for 1 replies from the CRMd. OK
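
As a stopgap, the manual sequence above (cleanup, restart corosync on the node that does not reply, cleanup again) might be wrapped like this. It is a sketch of the workaround only, not a fix for the underlying problem and not what the charm does. The function name and the CRM_CMD/JUJU_CMD overrides are hypothetical, the caller has to supply the juju unit that maps to the node (hacluster/2 for juju-dd3082-zaza-493257a251ad-10 in this report), and restarting corosync on a live cluster node may be disruptive.

```shell
#!/bin/sh
# Sketch: retry a timed-out cleanup once after restarting corosync on the
# unresponsive node, mirroring the manual steps in this report.

# cleanup_with_restart RESOURCE NODE UNIT
# Returns 0 if the cleanup got a reply (possibly after the restart), 1 otherwise.
# CRM_CMD and JUJU_CMD are hypothetical overrides (default: crm, juju).
cleanup_with_restart() {
    resource=$1 node=$2 unit=$3

    output=$(${CRM_CMD:-crm} resource cleanup "$resource" "$node" 2>&1) || true
    case $output in
        *'No messages received in'*) ;;   # timed out, fall through to restart
        *) return 0 ;;                    # no timeout text, nothing to do
    esac

    # Same restart command as used manually above.
    ${JUJU_CMD:-juju} ssh "$unit" "sudo systemctl restart corosync" || return 1

    output=$(${CRM_CMD:-crm} resource cleanup "$resource" "$node" 2>&1) || true
    case $output in
        *'No messages received in'*) return 1 ;;
    esac
    return 0
}
```

Example: `cleanup_with_restart juju-dd3082-zaza-493257a251ad-16 juju-dd3082-zaza-493257a251ad-10 hacluster/2`. The wrapper only detects the timeout text, so any other cleanup failure is treated as a reply.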

Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → Low