Comment 0 for bug 1882600

Paul Collins (pjdc) wrote:

I've ended up with a failed k8s workload charm unit in a different model from the one in LP:1882146. However, there are differences.

1) The error is:

application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.leadership mattermost/15 waiting for mattermost leadership release gave err: error blocking on leadership release: connection is shut down
application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.caasoperator killing "mattermost/15"
application-mattermost: 2020-06-08 21:37:32 INFO juju.worker.caasoperator stopped "mattermost/15", err: leadership failure: error making a leadership claim: connection is shut down
application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.caasoperator "mattermost/15" done: leadership failure: error making a leadership claim: connection is shut down
application-mattermost: 2020-06-08 21:37:32 ERROR juju.worker.caasoperator exited "mattermost/15": leadership failure: error making a leadership claim: connection is shut down
application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.caasoperator no restart, removing "mattermost/15" from known workers
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.uniter starting uniter for "mattermost/15"
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.caasoperator start "mattermost/15"
application-mattermost: 2020-06-08 21:37:40 INFO juju.worker.caasoperator start "mattermost/15"
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.caasoperator "mattermost/15" started
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.leadership mattermost/15 making initial claim for mattermost leadership
application-mattermost: 2020-06-08 21:37:40 INFO juju.worker.uniter unit "mattermost/15" started
application-mattermost: 2020-06-08 21:37:50 INFO juju.worker.leadership mattermost leadership for mattermost/15 denied
application-mattermost: 2020-06-08 21:37:50 DEBUG juju.worker.leadership mattermost/15 is not mattermost leader
application-mattermost: 2020-06-08 21:37:50 DEBUG juju.worker.leadership mattermost/15 waiting for mattermost leadership release
application-mattermost: 2020-06-08 21:37:51 INFO juju.worker.uniter unit "mattermost/15" shutting down: open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 DEBUG juju.worker.uniter.remotestate got leadership change for mattermost/15: leader
application-mattermost: 2020-06-08 21:37:51 INFO juju.worker.caasoperator stopped "mattermost/15", err: open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 DEBUG juju.worker.caasoperator "mattermost/15" done: open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 ERROR juju.worker.caasoperator exited "mattermost/15": open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 INFO juju.worker.caasoperator restarting "mattermost/15" in 3s
application-mattermost: 2020-06-08 21:37:54 INFO juju.worker.caasoperator start "mattermost/15"
application-mattermost: 2020-06-08 21:37:54 DEBUG juju.worker.caasoperator "mattermost/15" started

2) Restarting the controller does not fix the problem.

I thought I'd try copying the charm back into the unit directories on the modeloperator unit to see what would happen. This triggered a panic in one of the units and put the other two into a state where bouncing the controller did remove them.
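
For anyone else hitting this, the copy-back amounts to something like the following sketch. The namespace, operator pod name and local charm path below are placeholders rather than the exact values from this deployment; only the unit directory comes from the error above, and a local copy of the charm is assumed to be available.

import subprocess

# Placeholder names; substitute the real namespace and operator pod
# (e.g. from "kubectl get pods") and the path to a local copy of the charm.
NAMESPACE = "mattermost"
OPERATOR_POD = "mattermost-operator-0"
UNIT_CHARM_DIR = "/var/lib/juju/agents/unit-mattermost-15/charm"
LOCAL_CHARM_DIR = "./mattermost-charm"

# Copy the charm contents back into the failed unit's agent directory
# inside the operator pod, so that metadata.yaml exists again.
subprocess.run(
    ["kubectl", "cp", LOCAL_CHARM_DIR,
     f"{NAMESPACE}/{OPERATOR_POD}:{UNIT_CHARM_DIR}"],
    check=True,
)

After the copy, bouncing the controller removed the stuck units as described above.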

So at least we have a workaround, although since this model is hosting a soon-to-be-production service, it'd be nice not to have to rely on it.