juju client is not using the floating ip to connect to the state server

Bug #1308767 reported by Para Siva
Affects: juju-core · Status: Fix Released · Importance: High · Assigned to: Andrew Wilkins

Bug Description

With a bootstrap instance using the 'use-floating-ip: true' setting in environments.yaml for an environment in the cts cloud, deploying additional services and then running 'juju status' fails with: 'ERROR juju apiclient.go:119 state/api: websocket.Dial wss://192.168.1.2:17070/: dial tcp 192.168.1.2:17070: connection timed out'

The following is the bootstrap node.
$ nova list
+--------------------------------------+-----------------------+--------+------------+-------------+------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-----------------------+--------+------------+-------------+------------------------------------+
| 2010cf3b-dc41-49b9-b0b1-8c0d77a84fe0 | juju-mthood-machine-0 | ACTIVE | None | Running | int_net=192.168.1.2, 10.230.21.113 |
+--------------------------------------+-----------------------+--------+------------+-------------+------------------------------------+

$ juju status --debug
2014-04-16 20:14:06 INFO juju.cmd supercommand.go:297 running juju-1.18.1-trusty-amd64 [gc]
2014-04-16 20:14:06 DEBUG juju api.go:179 no cached API connection settings found
2014-04-16 20:14:06 INFO juju.provider.openstack provider.go:202 opening environment "mthood"
2014-04-16 20:14:07 DEBUG juju state.go:75 waiting for DNS name(s) of state server instances [2010cf3b-dc41-49b9-b0b1-8c0d77a84fe0]
2014-04-16 20:14:07 INFO juju apiclient.go:114 state/api: dialing "wss://192.168.1.2:17070/"
2014-04-16 20:16:14 ERROR juju apiclient.go:119 state/api: websocket.Dial wss://192.168.1.2:17070/: dial tcp 192.168.1.2:17070: connection timed out
2014-04-16 20:16:14 INFO juju apiclient.go:114 state/api: dialing "wss://192.168.1.2:17070/"

This was seen with juju versions 1.18.1-0ubuntu1 and 1.18.1-0ubuntu1~14.04.1~juju1.

Revision history for this message
Para Siva (psivaa) wrote :
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1308767] [NEW] juju client is not using the floating ip to connect to the state server

In 1.19.1 we should already be capable of trying more addresses when
connecting. However, the problem that you're seeing is because both
addresses are "private" (both 192.* and 10.* are considered private
networks). If it actually got a publicly routable address (something not in
10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16) then I believe we would
actually prefer that address.

However, I do believe current trunk is capable of listing all addresses and
trying all of them. Though it might still have an issue with "first
connect", so this may not be completely fixed.

On Thu, Apr 17, 2014 at 1:32 AM, Parameswaran Sivatharman wrote:

> [quoted bug report trimmed; it duplicates the description above]
>
> ** Attachment added: "machine-0 log from the bootstrap node"
> https://bugs.launchpad.net/bugs/1308767/+attachment/4085762/+files/machine-0.log

Curtis Hovey (sinzui)
tags: added: addressability
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.20.0
Revision history for this message
John A Meinel (jameinel) wrote :

In 1.19.1 we will try all of the addresses that are listed for a machine, which should at least let us attempt to connect.

Changed in juju-core:
milestone: 1.20.0 → 1.19.1
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Revision history for this message
John A Meinel (jameinel) wrote :

So the bug that Kiko was running into wasn't fixed by the early patch, because it is triggered at a different point.
Specifically, the fix we put in place only applies once you've successfully connected at least once (apiInfoConnect); there it tries all of the addresses listed in the environment's .jenv file.

However, the issue they ran into is the *first* connect, where we don't yet have any cached addresses and use apiConfigConnect instead.

There we use environAPIInfo, which chains down into providers/common/state.go for the implementation of common.StateInfo(). That reads the instance id of the first machine from provider storage, and then asks the provider:

  insts, err := env.Instances(st.StateInstances)
  ...
  hostnames = getDNSNames(insts)

And getDNSNames() fundamentally calls back into openstack.Instance.DNSName(), which does:
func (inst *openstackInstance) DNSName() (string, error) {
    addresses, err := inst.Addresses()
    if err != nil {
        return "", err
    }
    addr := instance.SelectPublicAddress(addresses)
    if addr == "" {
        return "", instance.ErrNoDNSName
    }
    return addr, nil
}

DNSName() can only ever return a single address.

And in this case, because both addresses are in private subnets (192.168.* and 172.16.*), neither clearly takes precedence over the other, so we just pick the 'first' one, which in this case is the non-routable one.
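A minimal sketch of that selection rule (`selectAddress` is illustrative, not juju's actual SelectPublicAddress; it uses Go's net.IP.IsPrivate as the RFC 1918 test):

```go
package main

import (
	"fmt"
	"net"
)

// selectAddress prefers the first publicly routable address; when every
// address is private, it falls back to plain list order.
func selectAddress(addrs []string) string {
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip != nil && !ip.IsPrivate() && !ip.IsLoopback() {
			return a // a routable public address wins outright
		}
	}
	if len(addrs) > 0 {
		return addrs[0] // all private: list order decides
	}
	return ""
}

func main() {
	// Both addresses are RFC 1918 private, so list order decides, and
	// the non-routable fixed address happens to come first.
	fmt.Println(selectAddress([]string{"192.168.1.2", "10.230.21.113"}))
}
```

With both candidates private, the fixed 192.168.1.2 address wins purely by position, which is exactly the failure mode in this bug.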

Once we have gotten over that hump, then the address we successfully connect to should get cached, along with any other IP addresses for machine-0.

The workaround we went with in the field is to apply this patch:
=== modified file 'instance/address.go'
--- instance/address.go 2014-05-01 00:54:26 +0000
+++ instance/address.go 2014-05-13 07:53:53 +0000
@@ -131,7 +131,7 @@

 func isIPv4PrivateNetworkAddress(ip net.IP) bool {
  return classAPrivate.Contains(ip) ||
- classBPrivate.Contains(ip) ||
+ //classBPrivate.Contains(ip) ||
   classCPrivate.Contains(ip)
 }

(don't treat the 172.* addresses as being private).

A more complete fix is to make the first-connect code support multiple IP addresses.
It might be as simple as changing common.StateInfo so that instead of using getDNSNames() it uses Instance.Addresses(), which can return multiple candidate addresses for a machine.

Ultimately, we probably still want to move to HostPort, which lets us tag an address we believe is more private so it is tried late rather than early; but I think the above would have gotten us out of this.
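The HostPort idea can be sketched as scope-tagged endpoints sorted before dialing. Everything below is hypothetical: the Scope names, the struct layout, and the 172.16.1.20 floating IP are made up for illustration; the point is that the provider assigns the scope from what it already knows (fixed vs. floating IP) instead of guessing from RFC 1918 ranges.

```go
package main

import (
	"fmt"
	"sort"
)

// Scope tags how widely an address is expected to be routable.
type Scope int

const (
	ScopePublic     Scope = iota // e.g. a floating IP
	ScopeCloudLocal              // e.g. the fixed tenant-network address
)

// HostPort pairs a dialable endpoint with its scope.
type HostPort struct {
	Addr  string
	Port  int
	Scope Scope
}

// orderByScope sorts endpoints so public-scoped addresses are dialed
// first and more-private ones are tried late rather than early.
func orderByScope(hps []HostPort) []HostPort {
	sorted := append([]HostPort(nil), hps...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Scope < sorted[j].Scope })
	return sorted
}

func main() {
	hps := []HostPort{
		{"192.168.1.2", 17070, ScopeCloudLocal}, // fixed address
		{"172.16.1.20", 17070, ScopePublic},     // floating IP (hypothetical value)
	}
	for _, hp := range orderByScope(hps) {
		fmt.Printf("wss://%s:%d/\n", hp.Addr, hp.Port)
	}
}
```

Because the floating IP is tagged public by the provider itself, it sorts first even though its 172.16.x value would look "private" to a range check.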

Changed in juju-core:
milestone: 1.19.1 → 1.19.3
status: Fix Released → Triaged
Revision history for this message
Christian Reis (kiko) wrote :

Since I can't reopen this myself, I just wanted to note that a) it is 3:43 AM in Atlanta, and b) this bug does not appear fixed, at least as of revno 2655. Our symptom is identical to comment 0: we are on an OpenStack installation with a 192.168.x private network and floating IPs on a 172.16.x network. juju bootstrap works fine (it tries the 192 address twice and then moves on to the 172 address), but juju status hangs connecting to the bootstrap node's address on the 192.168.x network.

It's obvious that the provider does have access to information on the interfaces (i.e. what OpenStack considers the private network and what it considers the floating IP) but it seems juju does not use it.

The workaround was a hack to juju-core/instance/address.go to kill the class B blacklist check; the updated juju binary did what we needed.

Revision history for this message
John A Meinel (jameinel) wrote :

I've associated a branch that should fix the initial connection by trying multiple addresses. I haven't thoroughly tested it myself, but it fits logically for me.

tags: added: landscape
Andrew Wilkins (axwalk)
Changed in juju-core:
assignee: nobody → Andrew Wilkins (axwalk)
status: Triaged → In Progress
Go Bot (go-bot)
Changed in juju-core:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
no longer affects: juju-core/1.18
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released