Broken agent complains about tomb: dying

Bug #1661681 reported by Jacek Nykis
This bug affects 1 person
Affects     Status   Importance  Assigned to  Milestone
juju-core   Expired  Undecided   Unassigned

Bug Description

I have an environment where a subordinate unit got into a weird state.

juju status does not show any problems, but juju run fails for that unit with this error message:
dial unix /var/lib/juju/agents/unit-landscape-client-6/run.socket: connect: no such file or directory

Logs for the unit show this:
2017-02-03 16:41:15 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:19 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:22 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:25 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:29 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:32 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:35 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
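
To make the repeat rate of the error explicit, here is a minimal sketch that computes the gap in seconds between consecutive "tomb: dying" lines. It uses only the seven lines quoted above. The function name error_gaps is made up for illustration, and the sketch assumes every matching line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as these do.

```python
from datetime import datetime

# The seven log lines quoted in this report, unchanged.
LOG_LINES = """\
2017-02-03 16:41:15 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:19 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:22 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:25 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:29 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:32 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2017-02-03 16:41:35 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
""".splitlines()


def error_gaps(lines, needle="tomb: dying"):
    """Return the seconds between consecutive lines that contain needle.

    Hypothetical helper: assumes each matching line begins with a
    19-character "YYYY-MM-DD HH:MM:SS" timestamp.
    """
    stamps = [
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if needle in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(error_gaps(LOG_LINES))  # [4, 3, 3, 4, 3, 3]
```

In this extract the error recurs every 3 to 4 seconds, which looks like something being retried on a short fixed delay, though the extract alone does not show what is being retried.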

I tried restarting agents a few times but that did not help at all.

This is currently impacting a production environment. The charm will not do what it's supposed to, and juju run is non-functional.

This looks very similar to #1613992, which is marked "Fix Released" in 1.25.8, so this possibly has a different root cause.

Juju 1.25.8
Ubuntu 16.04.1 LTS

Jacek Nykis (jacekn)
description: updated
Anastasia (anastasia-macmood) wrote :

@Jacek,
Is it a newly bootstrapped environment or has it been running for a while before the error occurred?

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
importance: Critical → Undecided
status: Triaged → Incomplete
Jacek Nykis (jacekn) wrote :

It's been running for a while and was fine but without the subordinate in question.

When I added the subordinate it worked fine on 5 out of 6 units, but one ended up with the "tomb: dying" problem.

Changed in juju-core:
status: Incomplete → New
Anastasia (anastasia-macmood) wrote :

@Jacek,

Do you mean that 5 out of 6 failed to come up to start with? Or all units came up fine and 1 failed later?

The bug you've referenced seemed to be related to re-deploying, i.e. destroying and then deploying again. Is that what you were doing?

Also, is there any chance to attach the logs? The extract in the description tells us what Juju reports after the failure. It would be useful to know what's in the logs before...
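
If attaching the full log is not practical, the lines leading up to the first failure could be pulled out with something like the sketch below. It is only an illustration: the helper names lines_before_first and print_context are made up, the default of 50 lines of context is arbitrary, and the log path in the usage note is an assumption based on the unit name in the error message.

```python
from collections import deque


def lines_before_first(lines, needle="tomb: dying", context=50):
    """Return up to `context` lines preceding the first line that contains
    needle, followed by that line. Returns [] if needle never appears.

    Hypothetical helper; `lines` can be any iterable of strings,
    including an open file.
    """
    recent = deque(maxlen=context)  # rolling window of the latest lines
    for line in lines:
        if needle in line:
            return list(recent) + [line]
        recent.append(line)
    return []


def print_context(path, needle="tomb: dying", context=50):
    """Print the log context before the first occurrence of needle in path."""
    with open(path, errors="replace") as fh:
        for line in lines_before_first(fh, needle, context):
            print(line, end="")
```

For example, print_context("/var/log/juju/unit-landscape-client-6.log") would print the 50 lines before the first "tomb: dying" entry, assuming the unit log is at that path on the affected machine.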

What provider (cloud) is this environment on?

Changed in juju-core:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju-core because there has been no activity for 60 days.]

Changed in juju-core:
status: Incomplete → Expired