Comment 12 for bug 1875481

Ian Booth (wallyworld) wrote :

TL;DR: a quick win is to fix the charms to do an is_leader() check before making leader-only calls.

So there are a few things here.

In trying to reproduce on microk8s, I've had it work many times and fail a few times. The charms which have failed have been dex-auth and katib-controller. One thing to note about the charms is that start_charm() in dex-auth does not appear to have an is_leader() check. This check is needed in *all* charms that use leader-only API calls. So this needs to be fixed in any of the charms in the bundle that don't do that check.
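As a rough illustration of the kind of guard meant here, below is a minimal sketch, not the actual dex-auth code. The handler takes its dependencies as arguments so it runs on its own; in a real charm is_leader would be something like charmhelpers' hookenv.is_leader, while set_pod_spec and set_status are hypothetical stand-ins for whatever leader-only call and status call the charm uses.

```python
def start_charm(is_leader, set_pod_spec, set_status):
    """Hypothetical start handler; all three arguments are injected callables."""
    if not is_leader():
        # Non-leader units must not make leader-only calls, so bail out early.
        set_status("waiting", "Waiting for leadership")
        return False
    # Only the leader reaches this point. The spec below is a placeholder.
    set_pod_spec({"containers": [{"name": "dex-auth"}]})
    set_status("active", "")
    return True


# Simulate a non-leader unit and record what the handler tried to do.
calls = []
started = start_charm(
    is_leader=lambda: False,
    set_pod_spec=lambda spec: calls.append(("pod_spec", spec)),
    set_status=lambda state, msg: calls.append(("status", state, msg)),
)
```

With a guard like this, a unit created after a pod bounce that is not the leader would sit in a waiting state rather than erroring out.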

One way in which juju could trigger a pod bounce is in how it writes out the deployment yaml: it's not reading the existing yaml and updating it, so the result is a new replicaset, which means a bounce of the pod(s). That needs fixing in Juju, but doesn't appear to be the issue here.

When the issue has been observed, extra logging added to juju appears to show that juju is creating the deployment with scale 1 and correctly leaving it alone after that. Something else is causing the pod to bounce and this triggers the cycle of new pod -> new unit -> start_charm() -> error not leader. Fixing bug 1469731 will mask the issue somewhat.

Adding to 2.8 milestone to track the work to improve how juju creates deployments.