juju-core

Bug #1465307
Comment #14

Andreas Hasenack (ahasenack) wrote on 2016-01-19:

I'm seeing something similar in 1.25.0. Juju status shows that all agents are lost:
[Services]
NAME             STATUS  EXPOSED CHARM
haproxy          unknown false   cs:trusty/haproxy-15
landscape-server unknown false   cs:trusty/landscape-server-13
postgresql       active  false   cs:trusty/postgresql-31
rabbitmq-server  active  false   cs:trusty/rabbitmq-server-42

[Units]
ID                 WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS                    PUBLIC-ADDRESS MESSAGE
haproxy/0          unknown        lost        1.25.0  0/lxc/0 80/tcp,443/tcp,10000/tcp 10.96.9.128    agent is lost, sorry! See 'juju status-history haproxy/0'
landscape-server/0 unknown        lost        1.25.0  0/lxc/1                          10.96.8.165    agent is lost, sorry! See 'juju status-history landscape-server/0'
postgresql/0       unknown        lost        1.25.0  0/lxc/2 5432/tcp                 10.96.9.108    agent is lost, sorry! See 'juju status-history postgresql/0'
rabbitmq-server/0  unknown        lost        1.25.0  0/lxc/3 5672/tcp                 10.96.3.82     agent is lost, sorry! See 'juju status-history rabbitmq-server/0'

[Machines]
ID         STATE   VERSION DNS               INS-ID                                                         SERIES HARDWARE
0          started 1.25.0  lds-ci.scapestack /MAAS/api/1.0/nodes/node-5348ce0e-9df3-11e5-ab31-2c59e54ace76/ trusty arch=amd64 cpu-cores=2 mem=4096M

On machine 0, this is the last machine-0.log entry:
2016-01-05 12:12:24 ERROR juju.state.leadership manager.go:72 stopping leadership manager with error: read tcp 10.96.10.219:37017: i/o timeout

all-machines.log is quite big, 244 MB, and is full of these at the end for all units:
unit-landscape-server-0[955]: 2016-01-19 17:54:08 WARNING juju.worker.dependency engine.go:304 failed to start "uniter" manifold worker: dependency not available
unit-rabbitmq-server-0[943]: 2016-01-19 17:54:08 WARNING juju.worker.dependency engine.go:304 failed to start "uniter" manifold worker: dependency not available

And this is the first message about tomb dying:
unit-rabbitmq-server-0[943]: 2016-01-05 12:12:31 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying

It roughly matches the timestamp of the last entry in the machine-0.log file: the "tomb: dying" message was logged 7 seconds later.
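
For anyone wanting to repeat the timestamp comparison on their own logs, here is a rough Python sketch. It is not part of juju; the helper names (parse_timestamp, first_line_containing, gap_seconds) are made up for illustration, and it assumes each log line carries a single "YYYY-MM-DD HH:MM:SS" timestamp as in the lines quoted in this comment.

```python
import re
from datetime import datetime

# Log lines copied verbatim from this comment.
MACHINE_LOG_LAST = (
    "2016-01-05 12:12:24 ERROR juju.state.leadership manager.go:72 "
    "stopping leadership manager with error: "
    "read tcp 10.96.10.219:37017: i/o timeout"
)
UNIT_LOG_LINES = [
    "unit-rabbitmq-server-0[943]: 2016-01-05 12:12:31 ERROR "
    "juju.worker.uniter.filter filter.go:137 tomb: dying",
    "unit-rabbitmq-server-0[943]: 2016-01-19 17:54:08 WARNING "
    "juju.worker.dependency engine.go:304 failed to start \"uniter\" "
    "manifold worker: dependency not available",
]

# Assumption: timestamps always look like "2016-01-05 12:12:24".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def parse_timestamp(line):
    """Return the first timestamp found in a log line (hypothetical helper)."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


def first_line_containing(lines, needle):
    """Return the first line that contains needle, or None if there is none."""
    for line in lines:
        if needle in line:
            return line
    return None


def gap_seconds(earlier_line, later_line):
    """Seconds elapsed between the timestamps of two log lines."""
    delta = parse_timestamp(later_line) - parse_timestamp(earlier_line)
    return delta.total_seconds()


if __name__ == "__main__":
    dying = first_line_containing(UNIT_LOG_LINES, "tomb: dying")
    print(gap_seconds(MACHINE_LOG_LAST, dying))
```

To scan a real file instead of the inline list, pass an open file object as lines, since first_line_containing only needs an iterable of strings.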

I'm attaching these logs to the bug.