1.25.4: Units attempt to go through the proxy to download charm from state server

Bug #1556207 reported by Andreas Hasenack
This bug affects 3 people
Affects         Status        Importance  Assigned to        Milestone
Canonical Juju  Fix Released  High        James Tunnicliffe
juju-core       Fix Released  Critical    James Tunnicliffe
1.25            Fix Released  Critical    James Tunnicliffe

Bug Description

1.25.4-0ubuntu1~16.04.1~juju1
behind a proxy
MAAS 1.9.1

unit logs have these lines, repeated every 3s:
2016-03-11 19:19:08 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2016-03-11 19:19:11 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying
2016-03-11 19:19:14 ERROR juju.worker.uniter.filter filter.go:137 tomb: dying

I narrowed this down to a simple test case:
juju bootstrap
juju deploy ubuntu --to lxc:0

With 1.25.3 it works.
With 1.25.4 it does not.

I have attached a tarball of /var/log of each case.

I also verified that 1.25.4 on another MAAS cluster (1.9.0, no proxy) works.

Juju status tabular of a failed deploy with 1.25.4:
$ juju status --format=tabular
[Services]
NAME    STATUS   EXPOSED  CHARM
ubuntu  unknown  false    cs:trusty/ubuntu-6

[Units]
ID        WORKLOAD-STATE  AGENT-STATE  VERSION  MACHINE  PORTS  PUBLIC-ADDRESS  MESSAGE
ubuntu/0  unknown         failed       1.25.4   0/lxc/0         10.245.201.175  Waiting for agent initialization to finish

[Machines]
ID  STATE    VERSION  DNS                 INS-ID                                                         SERIES  HARDWARE
0   started  1.25.4   node-1.vmwarestack  /MAAS/api/1.0/nodes/node-c77c8dd6-cb68-11e5-ad83-00505698101c/  trusty  arch=amd64 cpu-cores=2 mem=8192M

Tags: landscape
description: updated
summary: - "tomb: dying" in all units
+ 1.25.4: "tomb: dying" in all units
description: updated
tags: removed: kanban-cross-team
Andreas Hasenack (ahasenack) wrote : Re: 1.25.4: "tomb: dying" in all units

/var/log of the bootstrap node using juju 1.25.3. This is the working case.

description: updated
Andreas Hasenack (ahasenack) wrote :

Tarball of /var/log of the bootstrap node with juju 1.25.4. This is the non-working case.

Andreas Hasenack (ahasenack) wrote :

/var/log of the bootstrap node, this time the env was bootstrapped with debug logs.

description: updated
description: updated
Cheryl Jennings (cherylj) wrote :

Andreas - looks like the second upload was just the bootstrap log and not /var/log. Can you re-upload?

In the meantime, I'm going to try and set up one of my vMAASes to use a proxy to see if I can recreate.

Andreas Hasenack (ahasenack) wrote : Re: [Bug 1556207] Re: 1.25.4: "tomb: dying" in all units

Hmm, sorry, I'll fix that when I'm back in about 3h

Andreas Hasenack (ahasenack) wrote : Re: 1.25.4: "tomb: dying" in all units

And now the real juju debug logs for the failing case.

Cheryl Jennings (cherylj) wrote :

Looks like this was injected by the fix for bug 1515289. We added in the proxy, but need a way to specify to not use the proxy when downloading things from the state server.

If you add the state server to the no-proxy list, can you deploy?

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.5
milestone: 1.25.5 → 2.0-beta3
Andreas Hasenack (ahasenack) wrote :

I assigned a static IP to the maas node, added that IP to the no_proxy list, bootstrapped into it (bootstrap --to node-1) and then deployed ubuntu into lxc:0. That worked. Logs attached.

Andreas Hasenack (ahasenack) wrote :
Cheryl Jennings (cherylj) wrote :

I was able to recreate by setting up a tinyproxy in AWS (which obviously can't talk to my vMAAS) and attempting a basic deploy.

The fix will be to specify noproxy or add the state server to the no proxy list when downloading the charm from the state server. I will also need to go through the code to find any other places affected by the fix for bug 1515289 and fix them as well.

tags: added: blocker
Changed in juju-core:
importance: Critical → High
Cheryl Jennings (cherylj) wrote :

Discussed possible solutions with John and Andy a few days ago and came up with the general idea to make sure state server addresses are automatically added to the no-proxy list.

The trouble arises in that we cannot just get the no-proxy list from golang's http.ProxyFromEnvironment as it only reads the proxy information once, so if it is changed after the process starts running, the updates won't get applied.

The general gist of the proposal was:
1 - Implement tracking of proxy information inside juju/utils, so if updates are made to proxy information, the changes are reflected in any jujud processes currently running. (This information would be http(s)-proxy and the no-proxy lists). Post-discussion question - could we just update the env vars and not keep an internal list? Then just re-read the env vars whenever we create a new HTTP transport?

2 - Update the machine agent to populate the lists in #1 when starting, if not just updating env vars.

3 - Update the proxy list whenever APIHostPorts are changed

summary: - 1.25.4: "tomb: dying" in all units
+ 1.25.4: Units attempt to go through the proxy to download charm from
+ state server
James Tunnicliffe (dooferlad) wrote :

Don't environment variables get read when a process is spawned and inherited by its children? This would prevent us from performing a re-read.

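package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

func main() {
	// When true, read FOOBAR through a freshly spawned child process
	// instead of from this process's own environment.
	useExec := false
	var strout string

	for {
		if useExec {
			// The child inherits this process's environment.
			out, err := exec.Command("/usr/bin/printenv", "FOOBAR").Output()
			if err != nil {
				panic(err)
			}
			strout = string(out)
		} else {
			strout = os.Getenv("FOOBAR")
		}

		// Print only if the value differs from the one set at launch.
		strout = strings.TrimSpace(strout)
		if strout != "hello" {
			fmt.Println(strout)
		}

		time.Sleep(1000 * time.Millisecond)
	}
}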

To test:
FOOBAR=hello go run main.go &
export FOOBAR=foo

(now fg and kill it)

No change seen.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta3 → 2.0-beta4
tags: removed: blocker
James Tunnicliffe (dooferlad) wrote :

https://github.com/juju/juju/pull/4913 is up for review.

To fix this we add the API server addresses to the no-proxy list.
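As an illustration of the idea only (this is not the code from the pull request), the sketch below merges API server addresses into an existing comma-separated no-proxy value. The function name `mergeNoProxy` and the addresses are hypothetical. Ports are stripped because no-proxy entries are normally matched by host, and duplicates are dropped:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// mergeNoProxy is a hypothetical helper that appends the hosts of the given
// API server addresses to a comma-separated no-proxy list, skipping empty
// entries and duplicates and preserving the original order.
func mergeNoProxy(existing string, apiAddrs []string) string {
	seen := map[string]bool{}
	var out []string
	add := func(h string) {
		h = strings.TrimSpace(h)
		if h == "" || seen[h] {
			return
		}
		seen[h] = true
		out = append(out, h)
	}
	for _, h := range strings.Split(existing, ",") {
		add(h)
	}
	for _, a := range apiAddrs {
		host, _, err := net.SplitHostPort(a)
		if err != nil {
			// Address had no port; use it unchanged.
			host = a
		}
		add(host)
	}
	return strings.Join(out, ",")
}

func main() {
	// Made-up values: an existing no-proxy setting plus two API addresses.
	merged := mergeNoProxy("localhost,127.0.0.1",
		[]string{"10.0.0.5:17070", "127.0.0.1:17070"})
	fmt.Println(merged) // localhost,127.0.0.1,10.0.0.5
}
```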

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → James Tunnicliffe (dooferlad)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta4 → none
milestone: none → 2.0-beta4
Changed in juju-core:
assignee: nobody → James Tunnicliffe (dooferlad)
importance: Undecided → Critical
status: New → Fix Released