jujuc unable to execute in k8s charm container
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Medium | Ian Booth |
Bug Description
A customer reported that a unit was in a bad state; the original cause is unknown. When we investigated, we found that the charm hooks could not run properly: whenever they ran `juju-log`, it died with a SIGKILL.
We found no connection to an out-of-memory kill or to any other cause coming from the k8s host. Nothing relevant was reported in the host's kern/syslog.
By manually doing a `kubectl exec -it ... -- /bin/bash` into the container, I was able to run `/charm/bin/jujuc` and it would die immediately with the message "KILLED". On a different pod which was working properly, this would instead exit with the message "ERROR JUJU_CONTEXT_ID not set".
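The check above can be made a little more explicit by looking at the exit status rather than only the printed message. The following is a minimal sketch, not part of Juju: `classify_exit` is a hypothetical helper name, and it relies on the bash/dash convention that a process terminated by signal N is reported as status 128+N, so a SIGKILL (signal 9) appears as 137.

```shell
#!/bin/sh
# Hypothetical helper (not part of Juju): describe a shell exit status.
# bash and dash report "terminated by signal N" as status 128+N,
# so SIGKILL (signal 9) shows up as 137.
classify_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Example use inside the charm container (path taken from this report):
#   /charm/bin/jujuc; classify_exit $?
```

On an affected pod this should report signal 9, assuming the shell in the container follows the 128+N convention; on a healthy pod it would report whatever ordinary status `jujuc` exits with after printing its error.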
The "fix" was simply to restart the pod; however, we don't know why it entered this state in the first place.
This happened for 3 pods, corresponding to the charms:
- jupyter-ui (rev 25)
- seldon-core (rev 354)
- training-operator (rev 215)
I tried some more debugging:
- Running `/charm/bin/pebble` manually in the container did not provoke the same KILL result. It exited with the expected help text.
- Installing strace and running `strace jujuc` showed that the process was being KILLed very early, even before it had a chance to set up its signal hooks.
- Running `strace --inject=
Could this be due to pebble? Or an odd configuration issue with the kubelet?
I forgot to include the Juju version: this is on a k8s model with Juju agent version 2.9.43.