After reading through the logs, I can see that two units (machine-10-lxc-4 and machine-10-lxc-6)
are possibly manifesting bug LP: 1514874.
The sequence of events is described below:
Around 2016-04-10 10:18:55, the unit fails to authenticate against the API because an upgrade is still in progress (not yet completed):
2016-04-10 10:18:55 ERROR juju.worker runner.go:223 exited "api": login for "machine-10-lxc-4" blocked because upgrade in progress
2016-04-10 10:18:58 ERROR juju.worker runner.go:223 exited "api": login for "machine-10-lxc-4" blocked because upgrade in progress
2016-04-10 10:19:01 ERROR juju.worker runner.go:223 exited "api": try again
Then there is a gap in the logs for both affected units (was the machine rebooted or halted for a couple of days?),
followed by a failure to connect to the state server at around 05:39:02 local time on 2016-04-12:
154936-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
154937-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
154938-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
154939-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
The agent then tries to reconnect to the state server several times:
154953-2016-04-12 05:39:02 ERROR juju.worker runner.go:223 exited "api": watcher has been stopped
[...]
154959-2016-04-12 05:39:47 ERROR juju.worker runner.go:223 exited "api": try again
154960-2016-04-12 05:39:55 ERROR juju.worker runner.go:223 exited "api": try again
About 10 minutes later, the state server API is back online, but the agent cannot authenticate against the
state server, which replies with "unauthorized access" or "not provisioned":
1) https://github.com/juju/juju/blob/e5a77909f70c8d0b5bbc0bac9b2bf18546744614/apiserver/params/apierror.go#L165
2) https://github.com/juju/juju/blob/e5a77909f70c8d0b5bbc0bac9b2bf18546744614/apiserver/params/apierror.go#L113
At this point the unit falls into this check:
if params.IsCodeNotProvisioned(err) || params.IsCodeUnauthorized(err) {
	logger.Errorf("agent terminating due to error returned during API open: %v", err)
	return nil, false, worker.ErrTerminateAgent
}
Reference: https://github.com/juju/juju/blob/1.25/worker/apicaller/open.go#L170
This error shows up on both units, as you can check in the log:
machine-10-lxc-4.log
154961:2016-04-12 05:48:50 ERROR juju.worker.apicaller open.go:169 agent terminating due to error returned during API open: invalid entity name or password
This validation error causes worker.ErrTerminateAgent to be returned, which triggers this signal:
https://github.com/juju/juju/blob/1.25/worker/terminationworker/worker.go#L32
and that signal results in the juju agent being wiped from the machine.