Switching primaries in a juju controller caused CrashLoopBackOff in some pods
Bug #2039418 reported by Tom Haddon
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Confirmed | Undecided | Unassigned |
Bug Description
Earlier today we had an issue where we were seeing high load on a juju controller. We switched primaries, and a number of k8s pods (not all) went into CrashLoopBackOff, with the following being the entirety of the log output in the charm-init container:
ERROR opening "/charm/
Deleting pods (or triggering a charm upgrade, which led to a rescheduling of pods) seems to fix the issue.
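For reference, the workaround amounted to something like the following (unit, namespace, and application names are placeholders; on older 2.9 clients `juju refresh` may appear as `juju upgrade-charm`):

$ kubectl delete pod <unit-0> -n <namespace>    # StatefulSet controller recreates the pod
$ juju refresh <application>                    # or a charm upgrade, which reschedules the pods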
The controller in this case was running Juju 2.9.44.
tags: added: canonical-is
description: updated
Changed in juju:
status: New → Confirmed
Similarly in our case, we have a k8s charm deployed against the same controller and saw the number of ready pods in the StatefulSet drop to 0 when the Juju controller primary was changed.
Looking at our Grafana dashboards, almost to the minute that the command `rs.stepDown(120)` was run on the primary, the number of ready pods dropped, and then started coming back up 4-5 minutes later. In this case the pods didn't enter CrashLoopBackOff, though.
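For context, this is the shape of the step-down; the connection string is a placeholder and the `--eval` form is illustrative rather than the exact invocation used:

$ mongo "<controller-replset-uri>" --eval 'rs.stepDown(120)'   # primary steps down for 120s
$ mongo "<controller-replset-uri>" --eval 'rs.status()'        # confirm which member is now primary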
And this behaviour is identical to what we see when the Juju controllers are restarted, as mentioned in https://bugs.launchpad.net/juju/+bug/2036594
Here is the output from `juju debug-log --replay` around the time. I've removed the controller IPs but out of an abundance of caution it's a Canonical-only pastebin - https://pastebin.canonical.com/p/hQC53MqGr3/
And finally, logs from the charm-init container don't seem especially helpful:
$ kubectl logs <unit-0> -n <namespace> -c charm-init
starting containeragent init command
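For anyone triaging the same symptom, commands along these lines should surface the affected units (pod and namespace names are placeholders):

$ kubectl get pods -n <namespace> | grep -i crashloopbackoff     # find pods stuck restarting
$ kubectl describe pod <unit-0> -n <namespace>                   # inspect charm-init's last exit state
$ kubectl logs <unit-0> -n <namespace> -c charm-init --previous  # logs from the previous crashed run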