Comment 2 for bug 1514874

Jorge Niedbalski (niedbalski) wrote : Re: Invalid entity name or password error, causes Juju to uninstall

After reading through the logs, I can see that 2 units (machine-10-lxc-4 and machine-10-lxc-6)
are possibly manifesting bug LP: 1514874.

The sequence of events is described below:

The unit fails to log in to the API because an upgrade is still in progress (not yet completed). This
happened around 2016-04-10 10:18:55:

2016-04-10 10:18:55 ERROR juju.worker runner.go:223 exited "api": login for "machine-10-lxc-4" blocked because upgrade in progress
2016-04-10 10:18:58 ERROR juju.worker runner.go:223 exited "api": login for "machine-10-lxc-4" blocked because upgrade in progress
2016-04-10 10:19:01 ERROR juju.worker runner.go:223 exited "api": try again

Then there is a gap in the logs for both affected units (was the machine rebooted or halted for a couple of days?),
followed by a failure to connect to the state server around 05:39:02 local time on 2016-04-12.

154936-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
154937-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
154938-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down
154939-2016-04-12 05:39:02 ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down

Then the agent tries to reconnect to the state server several times.

154953-2016-04-12 05:39:02 ERROR juju.worker runner.go:223 exited "api": watcher has been stopped
[...]
154959-2016-04-12 05:39:47 ERROR juju.worker runner.go:223 exited "api": try again
154960-2016-04-12 05:39:55 ERROR juju.worker runner.go:223 exited "api": try again

Around 10 minutes later, the state server API is back online, but the agent cannot authenticate against the
state server, which replies with "unauthorized access" or "not provisioned":

1) https://github.com/juju/juju/blob/e5a77909f70c8d0b5bbc0bac9b2bf18546744614/apiserver/params/apierror.go#L165
2) https://github.com/juju/juju/blob/e5a77909f70c8d0b5bbc0bac9b2bf18546744614/apiserver/params/apierror.go#L113

At this point the unit hits this check:

    if params.IsCodeNotProvisioned(err) || params.IsCodeUnauthorized(err) {
        logger.Errorf("agent terminating due to error returned during API open: %v", err)
        return nil, false, worker.ErrTerminateAgent
    }

Reference: https://github.com/juju/juju/blob/1.25/worker/apicaller/open.go#L170

This error appears on both units, as the logs show:

machine-10-lxc-4.log
154961:2016-04-12 05:48:50 ERROR juju.worker.apicaller open.go:169 agent terminating due to error returned during API open: invalid entity name or password

This validation error causes worker.ErrTerminateAgent to be returned, which triggers this signal https://github.com/juju/juju/blob/1.25/worker/terminationworker/worker.go#L32
that instructs the juju agent to be wiped from the machine.