OpenStack HA Cluster Charm

Bug #1903745
Comment #21

Billy Olsen (billy-olsen) wrote on 2020-11-17:

Based on discussion with Trent, who has access to more logs and data than I currently do for this, all signs do point to the override timeouts provided by the charm itself as the cause.

A viable work-around is to raise the service_stop_timeout config option on the hacluster charm above its 60-second default. Setting it to 1800 restores the package's default.
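
For anyone applying the work-around, a minimal sketch of the config change follows. It assumes the charm is deployed under the application name "hacluster" (yours may differ; check `juju status`), and it only prints the command when the juju client is not installed.

```shell
# Assumption: the application is deployed as "hacluster";
# substitute the actual application name shown by `juju status`.
APP="hacluster"
TIMEOUT=1800   # seconds; the package default mentioned above

if command -v juju >/dev/null 2>&1; then
    # Raise the stop timeout, then read the option back to confirm it.
    juju config "$APP" service_stop_timeout="$TIMEOUT"
    juju config "$APP" service_stop_timeout
else
    echo "juju not found; would run: juju config $APP service_stop_timeout=$TIMEOUT"
fi
```

Any value above 60 lengthens the stop timeout; 1800 is simply the value that matches the package default.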

I am also going to invalidate the pacemaker task, as this wasn't caused by the change to pacemaker. A more targeted bug to tweak the behavior of whether the service starts/stops should be raised instead.

An investigation on possible alternatives for dealing with the upgrades and maintenance mode of the cluster should be pursued outside the bounds of this particular bug.

As a work-around is available, I'll reduce the escalation (subscribe field-high, remove field-critical) while working on a patch to change the service timeout defaults.