Comment 1 for bug 1902793

Billy Olsen (billy-olsen) wrote :

Are logs available from the juju units and the systemd journal to help corroborate the restarts of the nodes? The charm has built-in logic to roll restarts through the cluster (since the 17.11 charm release). By default, it waits 30 seconds between restarting each node.

It takes the unit number modulo the number of units found and multiplies the result by the known-wait time (30 seconds by default). I can see how this strategy could produce a collision and restart services at the same time on two different nodes in the cluster if the unit numbers all yield the same remainder (e.g. 0, 3, 6 or 1, 4, 7).
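
To make the collision concrete, here is a rough sketch of the calculation as described above. It is not the charm's actual code: the function names and the modulo_nodes parameter are made up for illustration, and it assumes the delay is simply (unit number % node count) * wait.

```python
from collections import defaultdict


def restart_delay(unit_number, modulo_nodes, known_wait=30):
    """Seconds a unit waits before restarting (illustrative, not charm code)."""
    return (unit_number % modulo_nodes) * known_wait


def colliding_units(unit_numbers, modulo_nodes, known_wait=30):
    """Group units by delay and keep only delays shared by more than one unit."""
    by_delay = defaultdict(list)
    for number in unit_numbers:
        by_delay[restart_delay(number, modulo_nodes, known_wait)].append(number)
    return {delay: units for delay, units in by_delay.items() if len(units) > 1}


if __name__ == "__main__":
    # Units 0, 1, 2 in a three-node cluster get distinct delays.
    print(colliding_units([0, 1, 2], modulo_nodes=3))
    # Units 0, 3, 6 all have remainder 0, so they share the same delay.
    print(colliding_units([0, 3, 6], modulo_nodes=3))
```

Under these assumptions, a cluster whose units are numbered 0, 3 and 6 would restart all three at once, while 0, 1 and 2 would be spaced 30 seconds apart.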

The charm also provides the modulo-nodes and known-wait config options to tune these values.