juju (in HA) hangs if we connect but cannot login
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Unassigned |
Bug Description
At Scania they are seeing a fairly common occurrence where a command will occasionally just hang. When using '--debug' we see the lines about connecting to a list of IP addresses, and then it gets a successful connection to one of them.
However, it doesn't progress past that point.
We still need to diagnose why that is happening, but there are a few things we could do better on the Juju side:
1) We shouldn't just silently hang. If we get a failure connecting to a controller or are retrying, we might not report immediately, but within a few retries/a few seconds we should let the user know something isn't right.
2) AFAICT, we decide which controller to use by seeing which address accepts a TCP connection first. We probably don't want to close the other connection attempts until one of them has successfully completed a Login request.
3) It appears that we hang indefinitely, whatever it is that we are doing. The '--debug' output does not make it clear what we are hung on (it might show up at TRACE, but we haven't tried that yet). We should have some sort of timeout on our connection.
With `juju status --debug --logging-config="<root>=TRACE"` we were able to see that `juju status` was connecting to JIMM and redirecting to a controller, and successfully connecting to HAProxy, but the Juju RPC request for Login() was not returning.
However, the above points are still things we could address even if the underlying problem turns out to be in HAProxy.
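
To make points 1-3 concrete, here is a minimal sketch of the behaviour being asked for. It is written in Python for brevity and is not Juju's actual (Go) client code. The names `first_logged_in`, `dial`, `login` and `report` are hypothetical and exist only for this illustration. The idea is that an address only "wins" once Login has returned on it, the other attempts stay alive until then, failures are reported rather than swallowed, and the whole thing is bounded by a deadline.

```python
import asyncio


async def _dial_and_login(addr, dial, login):
    """Connect to one address and complete Login on it (both hypothetical)."""
    conn = await dial(addr)   # transport-level connect only
    await login(conn)         # not a usable controller until Login returns
    return addr, conn


async def first_logged_in(addrs, dial, login, timeout, report=print):
    """Return (addr, conn) for the first address that completes Login.

    Attempts on the other addresses are only cancelled once a Login has
    succeeded or the overall deadline has passed. A real client would
    also close any half-open connections held by the cancelled attempts.
    """
    tasks = [asyncio.ensure_future(_dial_and_login(a, dial, login))
             for a in addrs]
    pending = set(tasks)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining,
                return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
                # Point 1: tell the user instead of failing silently.
                report(f"connection attempt failed: {task.exception()!r}")
        # Point 3: never hang forever; give up with a clear error.
        raise ConnectionError(
            f"no controller completed Login within {timeout}s")
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()
```

With this shape, an address that accepts the connection but never answers Login (as seen here behind HAProxy) no longer blocks the command: another address can still win, and if none does, the caller gets an error once `timeout` expires.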