Other interesting lines:
2019-05-15 02:21:35 INFO juju.state.presence presence.go:194 watcher loop failed: write tcp 127.0.0.1:38766->127.0.0.1:37017: i/o timeout
...
2019-05-15 02:21:35 INFO juju.state multiwatcher.go:212 store manager loop failed: get unit "neutron-openvswitch/29": cannot get unit "neutron-openvswitch/29": write tcp 127.0.0.1:39172->127.0.0.1:37017: i/o timeout
^- All of that indicates the database had stopped responding to queries.
2019-05-15 02:21:36 ERROR juju.worker.dependency engine.go:636 "is-responsible-flag" manifold worker returned unexpected error: lease manager stopped
...
2019-05-15 02:21:37 ERROR juju.worker.dependency engine.go:636 "is-responsible-flag" manifold worker returned unexpected error: lease manager stopped
^- That line is repeated 409 times in 1 second.
While that is happening, we do see a line like:
2019-05-15 02:21:37 INFO juju.apiserver.connection request_notifier.go:96 agent login: unit-nrpe-physical-24 for 24ecac33-8390-4ad6-80b6-6394a88c74e6
So some sort of login is working.
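A repeat count like the 409 above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming every relevant log line starts with a 19-character "YYYY-MM-DD HH:MM:SS" timestamp as in these excerpts; countPerSecond is a hypothetical helper written for this note, not part of juju.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countPerSecond returns, for each "YYYY-MM-DD HH:MM:SS" timestamp, how many
// log lines contain needle. Lines shorter than a timestamp are skipped.
// Hypothetical helper for illustration only.
func countPerSecond(log, needle string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		line := scanner.Text()
		if len(line) < 19 || !strings.Contains(line, needle) {
			continue
		}
		// The first 19 characters are the timestamp, truncated to the second.
		counts[line[:19]]++
	}
	return counts
}

func main() {
	// Three of the lines quoted in this comment; a real run would read the
	// whole machine log instead.
	sample := `2019-05-15 02:21:36 ERROR juju.worker.dependency engine.go:636 "is-responsible-flag" manifold worker returned unexpected error: lease manager stopped
2019-05-15 02:21:37 ERROR juju.worker.dependency engine.go:636 "is-responsible-flag" manifold worker returned unexpected error: lease manager stopped
2019-05-15 02:21:37 INFO juju.apiserver.connection request_notifier.go:96 agent login: unit-nrpe-physical-24 for 24ecac33-8390-4ad6-80b6-6394a88c74e6`

	counts := countPerSecond(sample, "lease manager stopped")
	for ts, n := range counts {
		fmt.Printf("%s: %d\n", ts, n)
	}
}
```

Grouping by the timestamp prefix keeps the sketch independent of the rest of the line format, so the same helper works for the "unknown config field" warning as well.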
2019-05-15 02:21:38 WARNING juju.environs.config config.go:1570 unknown config field "tools-metadata-url"
^- This is repeated *many* times. The config field is supposed to be "agent-metadata-url" (it was migrated from "tools-metadata-url" in the past).
...
2019-05-15 02:21:40 INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: unit-neutron-openvswitch-26 for 24ecac33-8390-4ad6-80b6-6394a88c74e6
2019-05-15 02:21:40 INFO juju.agent uninstall.go:36 marking agent ready for uninstall
2019-05-15 02:21:40 INFO juju.worker.stateconfigwatcher manifold.go:119 tomb dying
^- That is the call to SetCanUninstall
But note that there are 2 ways that we get to SetCanUninstall, namely:
connectFilter := func(err error) error {
	cause := errors.Cause(err)
	if cause == apicaller.ErrConnectImpossible {
		err2 := coreagent.SetCanUninstall(config.Agent)
		if err2 != nil {
			return errors.Trace(err2)
		}
		return jworker.ErrTerminateAgent
	} else if cause == apicaller.ErrChangedPassword {
		return dependency.ErrBounce
	}
	return err
}
and
w, err := NewMachiner(Config{
	MachineAccessor:              accessor,
	Tag:                          tag.(names.MachineTag),
	ClearMachineAddressesOnStart: ignoreMachineAddresses,
	NotifyMachineDead: func() error {
		return agent.SetCanUninstall(a)
	},
})
The latter is the case where the Machiner notices that the database record is actually flagged as Dead.
The only caller of NotifyMachineDead is Machiner.Handle, which should only call it after it has called Machine.EnsureDead(). That means it isn't a transitory failure; it really is something saying "this machine should be removed".
...
2019-05-15 02:21:40 INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: unit-ubuntu-0 for 7cc4a184-7867-412d-8c06-9c9780fd26a1
2019-05-15 02:21:40 INFO juju.worker.machineundertaker undertaker.go:131 tearing down machine undertaker
2019-05-15 02:21:40 INFO juju.apiserver.connection request_notifier.go:96 agent login: unit-landscape-131 for 24ecac33-8390-4ad6-80b6-6394a88c74e6
...