failed unit due to "shutting down: open [...]/charm/metadata.yaml: no such file or directory"

Bug #1882600 reported by Paul Collins
This bug affects 2 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: 2.9.14

Bug Description

I've ended up with a failed k8s workload charm unit in a different model from the one in LP:1882146 ("cannot exec"). However, there are differences.

1) The error is:

application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.leadership mattermost/15 waiting for mattermost leadership release gave err: error blocking on leadership release: connection is shut down
application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.caasoperator killing "mattermost/15"
application-mattermost: 2020-06-08 21:37:32 INFO juju.worker.caasoperator stopped "mattermost/15", err: leadership failure: error making a leadership claim: connection is shut down
application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.caasoperator "mattermost/15" done: leadership failure: error making a leadership claim: connection is shut down
application-mattermost: 2020-06-08 21:37:32 ERROR juju.worker.caasoperator exited "mattermost/15": leadership failure: error making a leadership claim: connection is shut down
application-mattermost: 2020-06-08 21:37:32 DEBUG juju.worker.caasoperator no restart, removing "mattermost/15" from known workers
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.uniter starting uniter for "mattermost/15"
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.caasoperator start "mattermost/15"
application-mattermost: 2020-06-08 21:37:40 INFO juju.worker.caasoperator start "mattermost/15"
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.caasoperator "mattermost/15" started
application-mattermost: 2020-06-08 21:37:40 DEBUG juju.worker.leadership mattermost/15 making initial claim for mattermost leadership
application-mattermost: 2020-06-08 21:37:40 INFO juju.worker.uniter unit "mattermost/15" started
application-mattermost: 2020-06-08 21:37:50 INFO juju.worker.leadership mattermost leadership for mattermost/15 denied
application-mattermost: 2020-06-08 21:37:50 DEBUG juju.worker.leadership mattermost/15 is not mattermost leader
application-mattermost: 2020-06-08 21:37:50 DEBUG juju.worker.leadership mattermost/15 waiting for mattermost leadership release
application-mattermost: 2020-06-08 21:37:51 INFO juju.worker.uniter unit "mattermost/15" shutting down: open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 DEBUG juju.worker.uniter.remotestate got leadership change for mattermost/15: leader
application-mattermost: 2020-06-08 21:37:51 INFO juju.worker.caasoperator stopped "mattermost/15", err: open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 DEBUG juju.worker.caasoperator "mattermost/15" done: open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 ERROR juju.worker.caasoperator exited "mattermost/15": open /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml: no such file or directory
application-mattermost: 2020-06-08 21:37:51 INFO juju.worker.caasoperator restarting "mattermost/15" in 3s
application-mattermost: 2020-06-08 21:37:54 INFO juju.worker.caasoperator start "mattermost/15"
application-mattermost: 2020-06-08 21:37:54 DEBUG juju.worker.caasoperator "mattermost/15" started

2) Restarting the controller does not fix the problem.
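
For what it's worth, the missing file can be confirmed from inside the operator pod. This is only a sketch; the namespace (normally the model name) is an assumption on my part:

    # assumption: the operator pod runs in a namespace named after the model
    kubectl exec -n <model-name> mattermost-operator-0 -- \
        ls /var/lib/juju/agents/unit-mattermost-15/charm/metadata.yaml
    # in the broken state this reports "No such file or directory",
    # matching the uniter error in the log above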

Next I tried scaling down the application to 0 units. The other two units also got stuck in a similar, although perhaps not identical, state.
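
(That was with the standard scaling command for a k8s model, roughly:)

    juju scale-application mattermost 0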

Then I thought I'd try copying the charm back into the unit directories on the mattermost-operator-0 pod, to see what would happen. This triggered a panic in one of the units and left the other two in a state where bouncing the controller did remove them.

So at least we have a workaround, although since this model is hosting a soon-to-be-production service, it'd be nice not to have to rely on it.
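
For reference, the copy step looks roughly like this; the source path and the namespace are assumptions on my part, and how you bounce the controller will depend on your deployment:

    # assumption: namespace == model name; source is the operator's application charm dir
    kubectl exec -n <model-name> mattermost-operator-0 -- \
        cp -r /var/lib/juju/agents/application-mattermost/charm \
              /var/lib/juju/agents/unit-mattermost-15/
    # (if a partial charm directory is already present under the unit dir,
    #  copy its contents instead)
    # then bounce (restart) the controller so the stuck unit gets cleaned up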

Paul Collins (pjdc)
description: updated
Paul Collins (pjdc)
description: updated
Ian Booth (wallyworld) wrote :

Is this juju 2.8.0?
Are the reproduction steps the same as the referenced bug?

Paul Collins (pjdc) wrote :

Yes, Juju 2.8.0. The reproduction steps are not entirely clear to me at this time.

However, it also just happened over the weekend when nobody was working on the model. Here's "juju debug-log"; mattermost/64 is the unit that got stuck: https://private-fileshare.canonical.com/~pjdc/lp1882600.txt

I was able to get rid of the stuck unit by copying the charm directory from application-mattermost to unit-mattermost-64 and bouncing the controllers.

Barry Price (barryprice) wrote :

Ran into this same issue today after an upgrade to 2.8.3

Pen Gale (pengale)
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Paul Collins (pjdc)
description: updated
Ian Booth (wallyworld) wrote :

Seen again today with mattermost charm

Changed in juju:
milestone: none → 2.9.12
importance: Medium → High
Haw Loeung (hloeung) wrote :

Ran into this earlier on a model running Juju 2.8.7. The workaround of copying the charm directory on the operator pod worked.

Changed in juju:
milestone: 2.9.12 → 2.9.13
Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.13 → 2.9.14
Changed in juju:
status: Fix Committed → Fix Released