K8s container terminates and is not recreated

Bug #1968745 reported by jarred wilson
This bug affects 1 person

Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Yang Kelvin Liu
Milestone: 2.9.29

Bug Description

I have hit an issue where the container for a Juju charm terminates and is not recreated. This may well be an issue with the charm code, but no logs indicate a problem in that code. The pod terminates when the SSL configuration is passed through, which suggests that the charm misconfigures the SSL secrets in the Kubernetes volumeConfig and causes the issue.

The result is that the agent stays up and active, but the working pod is no longer there.
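
For context, a pod-spec v3 charm mounts a TLS secret by declaring it in two places: as a secret under kubernetesResources and as a volumeConfig entry on the container that mounts it. A minimal sketch of the expected shape (illustrative only, mirroring the spec shown in a later comment, not the charm's actual code):

spec = {
    'version': 3,
    'containers': [{
        'name': 'minio',
        # Mount the secret's keys as files under the certs directory.
        'volumeConfig': [{
            'name': 'minio-ssl',
            'mountPath': '/root/.minio/certs/',
            'secret': {
                'name': 'minio-ssl',
                'files': [{'path': 'private.key', 'key': 'PRIVATE_KEY'}],
            },
        }],
    }],
    'kubernetesResources': {
        # The secret itself, with base64-encoded values.
        'secrets': [{
            'name': 'minio-ssl',
            'type': 'Opaque',
            'data': {'PRIVATE_KEY': 'a2V5'},
        }],
    },
}

The suspicion in this report is that the translation of this pairing into the workload's StatefulSet goes wrong once the SSL config is set.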

Here are the related logs from juju debug-log:

application-minio: 14:35:36 ERROR juju.worker.caasoperator.runner exited "minio/5": executing operation "remote init" for minio/5: attempt count exceeded: container not running not found
application-minio: 14:38:01 WARNING juju.worker.caasoperator.uniter.minio/5.operation we should run a leader-deposed hook here, but we can't yet
application-minio: 14:38:08 WARNING juju.worker.caasoperator stopping uniter for dead unit "minio/5": worker "minio/5" not found
application-minio: 14:39:14 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
application-minio: 14:47:17 WARNING juju.worker.caasoperator.uniter.minio/6.operation we should run a leader-deposed hook here, but we can't yet
application-minio: 14:51:28 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
application-minio: 17:08:13 WARNING juju.worker.caasoperator.uniter.minio/7.operation we should run a leader-deposed hook here, but we can't yet
application-minio: 17:08:20 WARNING juju.worker.caasoperator stopping uniter for dead unit "minio/7": worker "minio/7" not found
application-minio: 17:10:03 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
application-minio: 13:42:38 ERROR juju.worker.caasoperator.runner exited "minio/8": panic resulted in: runtime error: invalid memory address or nil pointer dereference
panic resulted in: runtime error: invalid memory address or nil pointer dereference
stacktrace:
goroutine 299 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juju/worker/v3/catacomb.runSafely.func1()
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/vendor/github.com/juju/worker/v3/catacomb/catacomb.go:292 +0x8f
panic({0x52bac00, 0x9b6fda0})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/juju/juju/worker/uniter/runner.(*runner).getRemoteEnviron(0xc000d9a210, 0xc0005bb1a0)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/runner/runner.go:748 +0x215
github.com/juju/juju/worker/uniter/runner.(*runner).runCommandsWithTimeout(0xc000d9a210, {0xc00014ef00, 0xd}, 0x45d964b800, {0x673f260, 0x9bfb4f0}, 0x2, 0xc0006b6520)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/runner/runner.go:225 +0x185
github.com/juju/juju/worker/uniter/runner.(*runner).runJujuRunAction(0xc000d9a210)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/runner/runner.go:311 +0x205
github.com/juju/juju/worker/uniter/runner.(*runner).RunAction(0xc000d9a210, {0xc00014eef0, 0xc0007f4830})
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/runner/runner.go:370 +0x85
github.com/juju/juju/worker/uniter/operation.(*runAction).Execute(0xc000fc7180, {0x1, 0x1, 0x0, 0x1, 0x0, 0x1, {0x5d3bbbb, 0xa}, {0x5d31bea, ...}, ...})
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/operation/runaction.go:107 +0x166
github.com/juju/juju/worker/uniter/operation.(*executor).do(0xc00006eb40, {0x6769668, 0xc0008c4378}, {{0x5d37064, 0x203000}, 0x5f6caa0})
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/operation/executor.go:133 +0x1f4
github.com/juju/juju/worker/uniter/operation.(*executor).Run(0xc00006eb40, {0x6769668, 0xc0008c4378}, 0xc0005baa80)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/operation/executor.go:113 +0x505
github.com/juju/juju/worker/uniter/resolver.Loop({{0x6667160, 0xc0008c66c0}, {0x673e0a8, 0xc0002ec600}, {0x66c1cc8, 0xc00006eb40}, {0x67f6c40, 0xc00080e600}, 0xc00032cf60, 0xc00046b250, ...}, ...)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/resolver/loop.go:121 +0x845
github.com/juju/juju/worker/uniter.(*Uniter).loop(0xc000507680, {{0xc00045c3b5, 0x7}})
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/uniter.go:519 +0x20a5
github.com/juju/juju/worker/uniter.newUniter.func2.1()
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/worker/uniter/uniter.go:283 +0x29
github.com/juju/worker/v3/catacomb.runSafely(0xc00067c070)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/vendor/github.com/juju/worker/v3/catacomb/catacomb.go:295 +0x62
github.com/juju/worker/v3/catacomb.Invoke.func3()
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/vendor/github.com/juju/worker/v3/catacomb/catacomb.go:116 +0x6d
gopkg.in/tomb%2ev2.(*Tomb).run(0xc000507680, 0xc00067c070)
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x36
created by gopkg.in/tomb%2ev2.(*Tomb).Go
        /home/jenkins/workspace/build-juju-amd64/_build/src/github.com/juju/juju/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xf3

application-minio: 13:42:41 ERROR juju.worker.caasoperator could not get pod "unit-minio-8" "minio-0" pod "minio-0" not found
application-minio: 13:56:51 WARNING juju.worker.caasoperator.uniter.minio/8.operation we should run a leader-deposed hook here, but we can't yet
application-minio: 13:56:59 WARNING juju.worker.caasoperator stopping uniter for dead unit "minio/8": worker "minio/8" not found
application-minio: 13:58:02 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
application-minio: 14:02:24 ERROR juju.worker.caasoperator.uniter.minio/9 resolver loop error: executing operation "remote init" for minio/9: attempt count exceeded: container not running not found
application-minio: 14:02:24 ERROR juju.worker.caasoperator.runner exited "minio/9": executing operation "remote init" for minio/9: attempt count exceeded: container not running not found

To reproduce the issue, follow these steps:

1) Install microk8s 1.21 and Juju 2.9.28
2) git clone https://github.com/jardon/minio-operator.git
3) git checkout b2a04cb63ec319b96b7e8861a9d1f2c38c2c23ae
4) charmcraft pack
5) juju deploy ./minio_ubuntu-20.04-amd64.charm --resource oci-image=minio/minio:RELEASE.2021-09-03T03-56-13Z
6) juju config minio ssl-key=key
7) juju config minio ssl-cert=cert
8) juju config minio ssl-root-ca=root-ca

Revision history for this message
jarred wilson (jardon) wrote :

And to add to this, I did some debugging on the charm code, and the pod terminates as soon as the pod_spec is set.
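
For reference, in a pod-spec charm built on the ops framework, the spec is applied from the leader unit roughly like this (a minimal sketch, not the actual minio-operator code; _build_spec is a hypothetical helper standing in for the charm's spec construction):

# Minimal sketch of how a pod-spec charm applies its spec via ops.
from ops.charm import CharmBase
from ops.main import main

class MinioCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.config_changed, self._on_config_changed)

    def _on_config_changed(self, event):
        if not self.unit.is_leader():
            return
        spec = self._build_spec()  # hypothetical helper; returns a dict like the one dumped later in this bug
        # This call is what ends up invoking 'pod-spec-set'; per the
        # observation above, the pod terminates as soon as it runs.
        self.model.pod.set_spec(spec)

if __name__ == "__main__":
    main(MinioCharm)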

Revision history for this message
Ian Booth (wallyworld) wrote :

A panic in the juju agent is always bad, regardless of the cause.

Changed in juju:
milestone: none → 2.9.29
importance: Undecided → Critical
status: New → Triaged
Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
Revision history for this message
jarred wilson (jardon) wrote :

Here is the output of kubectl describe statefulset -n kubeflow minio

https://pastebin.canonical.com/p/8xXXsJ2KPm/

Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :
Changed in juju:
status: Triaged → In Progress
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
jarred wilson (jardon) wrote :

While the nil pointer error is gone, the pod is still not recreated. The pastebin with the statefulset output contains an error that I am still observing:

Pod "minio-0" is invalid: spec.containers[0].volumeMounts[2].name: Not found: "minio-ssl"

After building the latest Juju from source, I did some debugging with pdb; self._run('pod-spec-set', *args) is called with the following arguments (in this case I simplified the secret config so there's only the ssl-key in the minio-ssl secret):

self = <ops.model._ModelBackend object at 0x7f401f3b51f0>
spec = {
    'version': 3,
    'containers': [{
        'name': 'minio',
        'args': ['server', '/data', '--console-address', ':9001'],
        'imageDetails': {'imagePath': 'minio/minio:RELEASE.2021-09-03T03-56-13Z', 'username': '', 'password': ''},
        'ports': [{'name': 'minio', 'containerPort': 9000},
                  {'name': 'console', 'containerPort': 9001}],
        'envConfig': {'minio-secret': {'secret': {'name': 'minio-secret'}},
                      'configmap-hash': '06bfdc922e5f7a557ef3b9b59e438c67bda1a0aaa14538ac725a791fa217f5cb'},
        'volumeConfig': [{'name': 'minio-ssl',
                          'mountPath': '/root/.minio/certs/',
                          'secret': {'name': 'minio-ssl',
                                     'defaultMode': 511,
                                     'files': [{'path': 'private.key', 'key': 'PRIVATE_KEY'}]}}],
    }],
    'kubernetesResources': {
        'secrets': [{'name': 'minio-secret', 'type': 'Opaque',
                     'data': {'MINIO_ACCESS_KEY': 'bWluaW8=',
                              'MINIO_SECRET_KEY': 'WjhKTklTNVhQR0VUNzhLNElBN1laM0RNTTFSVVNO'}},
                    {'name': 'minio-ssl', 'type': 'Opaque',
                     'data': {'PRIVATE_KEY': 'a2V5'}}],
    },
}
k8s_resources = None
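
For what it's worth, the Kubernetes error above is the standard volume-mount validation: every volumeMounts[].name must match a volume defined in the pod template's spec.volumes. An illustrative fragment of what the rendered pod template would need (assumed shape, not taken from the cluster):

# Illustrative fragment of the pod template Kubernetes is rejecting.
# The volumeMount names "minio-ssl", so spec.volumes must also contain
# a volume of that name backed by the "minio-ssl" secret.
pod_template = {
    'spec': {
        'containers': [{
            'name': 'minio',
            'volumeMounts': [
                # ... other mounts ...
                {'name': 'minio-ssl', 'mountPath': '/root/.minio/certs/'},
            ],
        }],
        'volumes': [
            # Apparently missing from the failing StatefulSet;
            # validation fails without it.
            {'name': 'minio-ssl',
             'secret': {'secretName': 'minio-ssl', 'defaultMode': 511}},
        ],
    },
}

So the pod-spec-set payload looks consistent, but the rendered StatefulSet apparently ends up with the minio-ssl mount and no matching volume.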

Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

Re: comment #5, this is caused by a different bug: https://bugs.launchpad.net/juju/+bug/1973097
And it's fixed in 2.9 now.
