1.25.6 "ERROR juju.worker.uniter.filter filter.go:137 tomb: dying"

Bug #1613992 reported by Nick Moffitt
This bug affects 10 people
Affects   | Status       | Importance | Assigned to | Milestone
juju-core | Fix Released | Critical   | Unassigned  |
1.25      | Fix Released | Critical   | Unassigned  |

Bug Description

After destroying the only unit in a service and re-deploying a new one with a new charm revision, I got a lot of the following in the unit log:

    ==> unit-prometheus-3.log <==
    2016-08-16 08:33:38 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
    2016-08-16 08:33:41 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
    2016-08-16 08:33:44 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
    2016-08-16 08:33:48 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
    2016-08-16 08:33:51 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
    2016-08-16 08:33:54 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
    2016-08-16 08:33:58 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying

No amount of restarting agents seems to un-stick this, and a subsequent hard terminate and redeploy doesn't help.
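
To gauge how often the error recurs, the unit logs can be scanned for the message. The following is a minimal sketch rather than anything shipped with juju: the function name `count_tomb_dying` and the `default_unit` parameter are hypothetical, and the pattern only assumes the two line shapes quoted in this report (a bare timestamped line, or one prefixed with a syslog-style `unit-name[pid]:` tag).

```python
import re
from collections import Counter

# Matches lines such as:
#   2016-08-16 08:33:38 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
# optionally prefixed by a syslog-style "unit-name[pid]: " tag.
LINE_RE = re.compile(
    r"^(?:(?P<unit>[\w-]+)\[\d+\]:\s+)?"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"ERROR juju\.worker\.uniter\.filter \S+ tomb: dying\s*$"
)


def count_tomb_dying(lines, default_unit="unknown"):
    """Count "tomb: dying" filter errors per unit.

    Lines without a "unit-name[pid]:" prefix are attributed to
    default_unit (hypothetical parameter, e.g. the log file's unit).
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("unit") or default_unit] += 1
    return counts
```

For a per-unit log file, pass the open file object as `lines` and the unit name as `default_unit`, for example `count_tomb_dying(open("unit-prometheus-3.log"), "unit-prometheus-3")`.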

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.7
milestone: 1.25.7 → 2.0-beta17
importance: Critical → High
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta17 → none
milestone: none → 2.0-beta17
Changed in juju-core:
importance: Undecided → Critical
status: New → Triaged
Charles Butler (lazypower) wrote :
Changed in juju:
milestone: 2.0-beta17 → 2.0-beta18
Tim Penhey (thumper)
no longer affects: juju
tags: added: landscape
tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Changed in juju-core:
status: Triaged → Won't Fix
importance: Critical → Undecided
Chris Gregan (cgregan)
tags: added: cdo-qa-blocker
Free Ekanayaka (free.ekanayaka) wrote :

I experienced the same issue.

Junien F (axino) wrote :

So did I

Paul Larson (pwlars) wrote :

After several attempts at redeploying to work around this, I destroyed my environment, rebootstrapped, ran juju upgrade-juju --version=1.25.5, and redeployed. I'm not sure whether I just got lucky, but this got it going again for me.

Yangzheng Bai (zoy) wrote :

I experienced the same problem after removing an old service and redeploying a newer version. Nothing short of destroying the whole environment resolves this tomb dying problem. The juju version is 1.25.0-trusty-amd64.

unit-wrk9-0[14647]: 2016-09-16 19:49:01 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying

Yangzheng Bai (zoy) wrote :

juju agent-version: 1.25.6, service status message: Waiting for agent initialization to finish

This problem is not related to any specific charm; we hit it with 3 different charms. It arises when we remove a successfully deployed service and redeploy a modified one. Our environment is the latest xenial and the latest juju. It happens on both ARM aarch64 and Intel Xeon.

The juju server version, 1.25.0-trusty-amd64, differs from the juju agent-version, 1.25.6.

We will try bootstrapping the environment again with --no-auto-upgrade.

logs:
unit-ngi4-0[13689]: 2016-09-16 21:32:31 WARNING juju.worker.uniter.operation leader.go:115 we should run a leader-deposed hook here, but we can't yet
unit-ngi4-0[13689]: 2016-09-16 21:32:33 WARNING juju.worker.uniter.operation metrics.go:50 failed to create a metric reader: failed to open spool directory "/var/lib/juju/agents/unit-ngi4-0/state/spool/metrics": stat /var/lib/juju/agents/unit-ngi4-0/state/spool/metrics: no such file or directory
unit-ngi4-0[13689]: 2016-09-16 21:32:33 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
unit-ngi4-0[13689]: 2016-09-16 21:32:33 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: crypto/tls: use of closed connection
unit-ngi4-0[13689]: 2016-09-16 21:32:33 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
unit-ngi4-0[13689]: 2016-09-16 21:32:33 ERROR juju.worker runner.go:212 fatal "api": agent should be terminated
unit-ngi4-0[13689]: message repeated 5 times: [ 2016-09-16 21:32:33 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down]
machine-0: message repeated 5 times: [2016-09-16 21:30:38 ERROR juju.rpc server.go:573 error writing response: write tcp 10.118.zzz.xxx:17070->10.118.zzz.aaa:49670: write: connection reset by peer]
machine-0: 2016-09-16 21:32:33 ERROR juju.rpc server.go:573 error writing response: write tcp 10.118.zzz.xxx:17070->10.118.zzz.bbb:36298: write: broken pipe

Barry Price (barryprice)
tags: added: caonical-is
tags: added: canonical-is
removed: caonical-is
Changed in juju-core:
status: Won't Fix → Triaged
importance: Undecided → Critical
Anastasia (anastasia-macmood) wrote :

According to wgrant, the description of this failure matches bug #1626304, as well as the symptoms that blr and thomi saw. This may be a duplicate \o/

Brad Marshall (brad-marshall) wrote :

I've seen this on an OpenStack deployment with 1.25.6: a small subset of our agents would enter a failed agent state with the tomb dying error message. Since upgrading this deployment to 1.25.8, I haven't seen it again.

Anastasia (anastasia-macmood) wrote :

Based on Brad's comment, this may have been fixed in our highly anticipated next 1.25.x \o/

Changed in juju-core:
status: Triaged → Fix Committed
Changed in juju-core:
status: Fix Committed → Fix Released