[2.1b3,5] juju deploy ignoring proxy

Bug #1654591 reported by Andreas Hasenack
Affects         Status        Importance  Assigned to         Milestone
Canonical Juju  Fix Released  High        Christian Muirhead
2.1             Fix Released  High        Christian Muirhead

Bug Description

TL;DR

13:06:08 ERROR cmd supercommand.go:458 storing charm for URL "cs:ubuntu-10": cannot retrieve charm "cs:ubuntu-10": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/ubuntu-10/archive?channel=stable: dial tcp 162.213.33.122:443: getsockopt: connection timed out

I have two 2.1b3 controllers on MAAS 2.1.3.
There is a mandatory proxy on "squid.vmwarestack" (10.245.200.27)

First one was bootstrapped with this config:
$ juju show-cloud vmwarestack-maas2
defined: local
type: maas
description: Metal As A Service
auth-types: [oauth1]
endpoint: http://10.245.200.25/MAAS/
config:
  bootstrap-timeout: 900
  default-series: xenial
  http-proxy: http://squid.vmwarestack:3128/
  https-proxy: http://squid.vmwarestack:3128/
  no-proxy: localhost,127.0.0.1,10.245.200.25,169.254.169.254,10.245.200.27

Then I switched to the controller model, so as not to waste a MAAS node, and deployed my first workload there. This is the juju status output (attached as juju-status-outer-controller.txt, also at http://pastebin.ubuntu.com/23752666/):

$ juju status
Model       Controller         Cloud/Region       Version
controller  vmwarestack-maas2  vmwarestack-maas2  2.1-beta3

App               Version  Status   Scale  Charm             Store       Rev  OS      Notes
haproxy                    unknown  1      haproxy           jujucharms  40   ubuntu  exposed
landscape-server           active   1      landscape-server  jujucharms  3    ubuntu
postgresql        9.5.5    active   1      postgresql        jujucharms  110  ubuntu
rabbitmq-server   3.5.7    active   1      rabbitmq-server   jujucharms  57   ubuntu

Unit                 Workload  Agent  Machine  Public address  Ports                     Message
haproxy/0*           unknown   idle   0/lxd/0  10.245.200.116  80/tcp,443/tcp,10000/tcp
landscape-server/0*  active    idle   0/lxd/1  10.245.200.117
postgresql/0*        active    idle   0/lxd/2  10.245.200.23   5432/tcp                  Live master (9.5.5)
rabbitmq-server/0*   active    idle   0/lxd/3  10.245.200.24   5672/tcp                  Unit is ready

Machine  State    DNS             Inst id              Series  AZ
0        started  10.245.202.66   4y3h8d               xenial  bucharest
0/lxd/0  started  10.245.200.116  juju-f53d84-0-lxd-0  xenial
0/lxd/1  started  10.245.200.117  juju-f53d84-0-lxd-1  xenial
0/lxd/2  started  10.245.200.23   juju-f53d84-0-lxd-2  xenial
0/lxd/3  started  10.245.200.24   juju-f53d84-0-lxd-3  xenial

Relation     Provides          Consumes          Type
peer         haproxy           haproxy           peer
website      haproxy           landscape-server  regular
db-admin     landscape-server  postgresql        regular
amqp         landscape-server  rabbitmq-server   regular
replication  postgresql        postgresql        peer
cluster      rabbitmq-server   rabbitmq-server   peer

Landscape then did a cloud deployment. It does that by bootstrapping a second controller on the same MAAS provider, switching to that controller's controller model, and then deploying applications there.

Landscape used the same proxy settings. This was the bootstrap config it used (attached as inner-bootstrap-config.txt):

{apt-http-proxy: 'http://squid.vmwarestack:3128', bootstrap-timeout: 1800, default-series: xenial,
  http-proxy: 'http://squid.vmwarestack:3128/', https-proxy: 'http://squid.vmwarestack:3128/',
  no-proxy: 'localhost,squid.vmwarestack,localhost,127.0.0.1,10.245.200.25,169.254.169.254,10.245.200.27',
  provisioner-harvest-mode: destroyed}

All charms failed to be added to this inner controller, though. The error from Landscape was:
Jan 6 15:34:59 job-handler-1 INFO Traceback (failure with no frames): <class 'txjuju.errors.APIRequestError'>: cannot retrieve charm "cs:xenial/ceilometer-240": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/xenial/ceilometer-240/archive: dial tcp 162.213.33.121:443: getsockopt: connection timed out (code: '')

I tried to deploy the ubuntu charm via the command line, into that inner model. It also failed (http://pastebin.ubuntu.com/23752682/ and attached as juju-deploy-ubuntu-inner-controller.txt):

landscape@juju-f53d84-0-lxd-1:~$ juju deploy ubuntu andreas-ubuntu --debug
(...)
13:06:08 ERROR cmd supercommand.go:458 storing charm for URL "cs:ubuntu-10": cannot retrieve charm "cs:ubuntu-10": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/ubuntu-10/archive?channel=stable: dial tcp 162.213.33.122:443: getsockopt: connection timed out
13:06:08 DEBUG cmd supercommand.go:459 (error details: [{github.com/juju/juju/cmd/juju/application/deploy.go:943: storing charm for URL "cs:ubuntu-10"} {github.com/juju/juju/cmd/juju/application/store.go:117: } {github.com/juju/juju/api/client.go:366: } {github.com/juju/juju/api/apiclient.go:649: } {github.com/juju/retry/retry.go:187: } {github.com/juju/juju/rpc/client.go:149: } {cannot retrieve charm "cs:ubuntu-10": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/ubuntu-10/archive?channel=stable: dial tcp 162.213.33.122:443: getsockopt: connection timed out}])

I'm now going to attach several files: command outputs, juju model configs, and a tarball of the inner controller (the one where deployments are not working).

Tags: eda landscape
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

machine-0 logs from the inner environment. This is where deployments are failing because the proxy is being ignored.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

logs from machine 0 in the outer environment. This is where deployments are working just fine behind the proxy, because the proxy is being used.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Info from the controller and model which is able to deploy applications behind the proxy.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Here is something that looks a bit odd. It's a diff between "juju model-config" of the working and not working models:
--- not-working 2017-01-06 11:11:22.999093197 -0200
+++ working 2017-01-06 11:11:19.647015159 -0200
-http-proxy model http://squid.vmwarestack:3128/
-https-proxy model http://squid.vmwarestack:3128/
+http-proxy controller http://squid.vmwarestack:3128/
+https-proxy controller http://squid.vmwarestack:3128/

You can see both have the proxy settings, but the not-working model has them set at the model level, while the working one inherits them from the controller.

Full diff attached.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

After I rebooted the inner controller node, juju deploy <app> started working... This is starting to feel very much like https://bugs.launchpad.net/juju/+bug/1614635/comments/3

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Except now, after the reboot, I'm being hit by https://bugs.launchpad.net/juju/+bug/1652161

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Ok, I think this is bug #1614625 back:
$ time juju-deployer -Wdv -c xenial-trunk-landscape-dense-maas.yaml -w 120
2017-01-06 15:01:52 [DEBUG] deployer.cli: Using runtime GoEnvironment on vmwarestack-maas2:admin/controller
2017-01-06 15:01:52 [INFO] deployer.cli: Using deployment xenial-trunk-landscape-dense-maas.yaml
2017-01-06 15:01:52 [INFO] deployer.cli: Starting deployment of xenial-trunk-landscape-dense-maas.yaml
2017-01-06 15:01:52 [DEBUG] deployer.import: Getting charms...
2017-01-06 15:01:52 [DEBUG] deployer.charm: Cache dir /home/andreas/.local/share/juju/.deployer-store-cache/cs_xenial_haproxy
2017-01-06 15:01:52 [DEBUG] deployer.charm: Cache dir /home/andreas/.local/share/juju/.deployer-store-cache/cs_xenial_rabbitmq-server
2017-01-06 15:01:52 [DEBUG] deployer.charm: Cache dir /home/andreas/.local/share/juju/.deployer-store-cache/cs_xenial_postgresql
2017-01-06 15:01:52 [DEBUG] deployer.charm: Cache dir /home/andreas/.local/share/juju/.deployer-store-cache/cs_~ahasenack_landscape-server
2017-01-06 15:01:52 [DEBUG] deployer.deploy: Resolving configuration
2017-01-06 15:01:52 [DEBUG] deployer.env: Connecting to vmwarestack-maas2:admin/controller...
2017-01-06 15:01:56 [DEBUG] deployer.env: Connected.
2017-01-06 15:01:56 [INFO] deployer.import: Creating machines...
2017-01-06 15:01:57 [INFO] deployer.import: Machine 0 already exists on environment, skipping creation
2017-01-06 15:01:57 [INFO] deployer.import: Deploying applications...
2017-01-06 15:01:57 [INFO] deployer.import: Deploying application haproxy using cs:xenial/haproxy
2017-01-06 15:01:57 [DEBUG] deployer.import: Refetching status for placement deploys
2017-01-06 15:06:22 [ERROR] deployer.env: Error deploying application 'haproxy'
2017-01-06 15:06:22 [ERROR] deployer.env: Command (juju deploy -m vmwarestack-maas2:admin/controller --config /tmp/tmpMF6dnD --to=lxd:0 --series xenial cs:xenial/haproxy haproxy) Output:

 ERROR storing charm for URL "cs:haproxy-40": cannot retrieve charm "cs:haproxy-40": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/haproxy-40/archive?channel=stable: dial tcp 162.213.33.121:443: getsockopt: connection timed out

2017-01-06 15:06:22 [INFO] deployer.cli: Deployment stopped. run time: 269.54

real 4m29.989s
user 0m0.840s
sys 0m0.180s

Forget the complication of inner and outer environments that I described in the comments above. It now happened right after bootstrap, in the first controller I created.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Typoed the link, sorry, it's bug #1614635

Changed in juju:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.1.0
tags: added: eda
Changed in juju:
assignee: nobody → Menno Smits (menno.smits)
Changed in juju:
milestone: 2.1.0 → 2.1-rc1
Changed in juju:
status: Triaged → In Progress
Changed in juju:
milestone: 2.1-rc1 → 2.2.0-alpha1
assignee: Menno Smits (menno.smits) → nobody
importance: Critical → High
status: In Progress → Triaged
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

I've had a good look at this and haven't been able to reproduce. My best guess is that there's some kind of race. There's a specific worker which applies proxy settings and perhaps it's not able to do its job before the first deploy attempt happens. If that's the case, then an automated deploy (using the deploy) is more likely to trigger the race. I'm about to dig through the supplied logs to confirm if this is the case.

For Andreas:

I want to make one fact clear: the downloads that are failing happen in the controller. This means the proxy settings in the controller model are used for the downloads. This is confusing and we should probably fix things so that a particular model's proxy settings are used for all downloads for that model, but that's not how it works right now. You need to ensure that the desired proxy settings are set for the controller model *and* the model(s) hosting workloads. It seems you already understand this but I thought it was prudent to spell this out.
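For example, the proxy settings can be applied to the controller model explicitly (a sketch; the values here are just the ones from this report's config):

```shell
# Make sure the controller model itself carries the proxy settings,
# in addition to any workload models.
juju model-config -m controller \
    http-proxy=http://squid.vmwarestack:3128/ \
    https-proxy=http://squid.vmwarestack:3128/ \
    no-proxy=localhost,127.0.0.1,10.245.200.25,169.254.169.254,10.245.200.27
```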

Is the issue intermittent (as per bug 1614635) or does it happen all the time? Based on what you've said so far, it seems intermittent.

Is it possible for you to supply logs for the simplest possible reproduction?

I will keep digging.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

gah "using the deployer", not "using the deploy"

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

The "inner" machine-0 logs supplied indicate that the proxy-config-updater worker *might* have had a problem, which may be the smoking gun. Unfortunately the usefulness of the logs is diminished by the log level switching to WARNING.

Can you please reproduce with the "logging-config" model option set to "<root>=DEBUG"? You could set this in the cloud's config (i.e. clouds.yaml) or at bootstrap time.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Happened again on the first try, and this time I bootstrapped with the debug logging config:

$ time juju bootstrap --config logging-config="<root>=DEBUG" vmwarestack
Creating Juju controller "vmwarestack" on vmwarestack
Looking for packaged Juju agent version 2.1-beta5 for amd64
Launching controller instance(s) on vmwarestack...
 - 4y3h8g (arch=amd64 mem=9.8G cores=2)
Fetching Juju GUI 2.2.7
Waiting for address
Attempting to connect to 10.245.203.221:22
Warning: Permanently added '10.245.203.221' (ECDSA) to the list of known hosts.
Logging to /var/log/cloud-init-output.log on the bootstrap machine
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Fetching Juju agent version 2.1-beta5 for amd64
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 10.245.203.221 to verify accessibility...
Bootstrap complete, "vmwarestack" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.

real 7m9.461s
user 0m1.564s
sys 0m0.164s

$ juju switch controller
vmwarestack:admin/default -> vmwarestack:admin/controller

$ juju deploy ubuntu --to lxd:0
ERROR storing charm for URL "cs:ubuntu-10": cannot retrieve charm "cs:ubuntu-10": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/ubuntu-10/archive?channel=stable: dial tcp 162.213.33.122:443: getsockopt: connection timed out

$

This was with 2.1b5. Logs coming up.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

And netstat on machine 0, run while "juju deploy ubuntu --to lxd:0" is stuck, confirms the connection is being attempted without the proxy:

tcp 0 1 10.245.203.221:39642 162.213.33.122:443 SYN_SENT 11294/jujud

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

/home/ubuntu/.juju-proxy contains the correct proxy information.

Shall I reboot this controller now to verify that a reboot "fixes" it?

summary: - [2.1b3] juju deploy ignoring proxy
+ [2.1b3,5] juju deploy ignoring proxy
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Andreas: Thanks for the reproduction and the new logs. I can now see that the proxyupdater worker is indeed setting the proxy in the controller agent at 11:30, yet the charm download fails at 11:35. I have a feeling we're missing something with respect to how environment variables affect Go processes.

The .juju-proxy file is only there for when a user SSHes into a machine; it is not relevant for Juju's own downloads. For those, Juju sets environment variables within its own process.

I've got a few ideas. Digging further.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Problem found: it turns out that Go caches the various proxy environment variables at the first request. This means that if any download is attempted before the proxyupdater worker gets to do its thing, the proxy config won't be used from then on. It also means that any future changes to Juju's proxy config settings will be ignored (until the controller agent restarts).

This is pretty terrible and could be hard to fix. I'm exploring options.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Ok, we have a fix in mind which doesn't involve major changes to Juju and its dependencies.

POC is here: https://gist.github.com/mjs/f4efd4c11d64ff57015ac601973794c0

Christian will take over from here.

Revision history for this message
Christian Muirhead (2-xtian) wrote :

Still working on this - should have a PR up this morning.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Fixes forward-ported to develop (2.2).

Proxy initialization: https://github.com/juju/juju/pull/7004
The rest of the fix went in as part of the bigger merge: https://github.com/juju/juju/pull/7001

Changed in juju:
status: Triaged → Fix Committed
assignee: nobody → Christian Muirhead (2-xtian)
Revision history for this message
Christian Muirhead (2-xtian) wrote : Re: [Bug 1654591] Re: [2.1b3,5] juju deploy ignoring proxy

Oops, thanks Anastasia! :)


Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released