juju (in HA) hangs if we connect but cannot login

Bug #1961999 reported by John A Meinel
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (not set)

Bug Description

At Scania they fairly often see a Juju command simply hang. With '--debug' we see the lines about connecting to a list of IP addresses, followed by a successful connection to one of them.

However, it doesn't progress past that point.

We still need to diagnose why that is happening, but there are a few things we could do better on the Juju side:

1) We shouldn't just silently hang. If we get a failure connecting to a controller or are retrying, we might not report immediately, but within a few retries/a few seconds we should let the user know something isn't right.

2) AFAICT, we use the "who can I establish a TCP connection to" to decide which controller is responding first. We probably don't want to close the other connection attempts until we have successfully completed a Login request.

3) It appears that we hang indefinitely on whatever it is that we are doing. The '--debug' output doesn't make clear what we are hung on (TRACE might, but we haven't tried that yet). We should have some sort of timeout on our connection.

John A Meinel (jameinel) wrote :

With `juju status --debug --logging-config="<root>=TRACE"` we were able to see that `juju status` was connecting to JIMM and redirecting to a controller, and successfully connecting to HAProxy, but the Juju RPC request for Login() was not returning.

However, the above points are still things we could address even if it was a problem in HAProxy.
