juju bootstrap is failing in the MAAS CI lab

Bug #1679948 reported by Brendan Donegan
Affects: Canonical Juju
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The juju bootstrap test that we run in our CI lab has been failing for the past few weeks. It fails only with the version of juju in the devel PPA, not with the one in the archive. We could use some input on how to debug the issue. See http://162.213.35.104:8080/job/maas-xenial-trunk-manual-juju/240/testReport/maas-integration/TestMAASIntegration/test_juju_bootstrap/ for the output of the bootstrap command.

Below is the version being used in testing:

Get:19 http://ppa.launchpad.net/juju/devel/ubuntu xenial/main amd64 juju-2.0 amd64 1:2.2~beta2-0ubuntu1~16.04.1~juju1 [49.0 MB]

Menno Finlay-Smits (menno.smits) wrote :

The "Forbidden" error that's coming back from the API server is unusual.

Is it possible to bootstrap with the --keep-broken flag so the controller machine is left behind following the failed bootstrap attempt? It would be useful to see /var/log/cloud-init-output.log and /var/log/juju/machine-0.log on that machine.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Changed in juju:
status: Triaged → Incomplete
Brendan Donegan (brendan-donegan) wrote :

Logs attached as requested, hopefully those tell you what you need to know.

Changed in juju:
status: Incomplete → New
John A Meinel (jameinel) wrote :

I don't see any connections coming in from outside of the machine. Are you sure you don't have something like ufw / iptables set up?

I see *one* attempt at connection: grep "API connection from" machine-0.log:
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [1] API connection from 127.0.0.1:36578
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [2] API connection from 127.0.0.1:36582
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [3] API connection from 127.0.0.1:36588
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [4] API connection from 127.0.0.1:36598
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [5] API connection from 10.245.136.8:45634

Which immediately disconnects without doing *any* RPC requests:
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:79 [5] API connection terminated after 419.581µs
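
The same check can be scripted when machine-0.log is large. What follows is only a rough sketch: the sample text is the excerpt quoted in this comment, the regular expressions are guesses based on those few lines (IPv4 sources only), and the names `summarize`, `OPENED` and `CLOSED` are made up for illustration.

```python
import ipaddress
import re

# Sample lines copied from the machine-0.log excerpt quoted in this comment.
LOG = """\
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [1] API connection from 127.0.0.1:36578
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [2] API connection from 127.0.0.1:36582
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [3] API connection from 127.0.0.1:36588
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [4] API connection from 127.0.0.1:36598
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:69 [5] API connection from 10.245.136.8:45634
2017-04-06 08:44:17 DEBUG juju.apiserver request_notifier.go:79 [5] API connection terminated after 419.581µs
"""

# Patterns inferred from the lines above; they assume IPv4 "host:port" sources.
OPENED = re.compile(r"\[(\d+)\] API connection from (\S+):(\d+)$")
CLOSED = re.compile(r"\[(\d+)\] API connection terminated after (\S+)$")


def summarize(log_text):
    """Map connection id -> source address, loopback flag and duration."""
    conns = {}
    for line in log_text.splitlines():
        opened = OPENED.search(line)
        if opened:
            conn_id, host = int(opened.group(1)), opened.group(2)
            conns[conn_id] = {
                "source": host,
                "loopback": ipaddress.ip_address(host).is_loopback,
                "duration": None,  # filled in if a "terminated" line follows
            }
            continue
        closed = CLOSED.search(line)
        if closed and int(closed.group(1)) in conns:
            conns[int(closed.group(1))]["duration"] = closed.group(2)
    return conns


summary = summarize(LOG)
# Connections that did not originate from the controller's loopback address.
external = {cid: c for cid, c in summary.items() if not c["loopback"]}
print(external)
```

For the excerpt here, only connection 5 is left in `external`, and its recorded duration is the sub-millisecond one shown in the log.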

When we've seen Forbidden in the past, it is usually some sort of firewall in the way.

Is there something different about the units that are being used w/ Juju in the PPA?

It is trying to connect to:
wss://10.245.136.8:17070/model/2f4c334e-00b0-4f45-869c-6046acbde49a/api

which is what the controller thinks its addresses are:
2017-04-06 08:44:17 DEBUG juju.worker.apiaddressupdater apiaddressupdater.go:88 updating API hostPorts to [[10.245.136.8:17070]]

Now, I don't see where it is getting "2f4c334e" as the model or controller UUID:
Internally we are trying to connect to a 69235fc7 UUID:

2017-04-06 08:44:17 DEBUG juju.api apiclient.go:678 dialing "wss://10.245.136.8:17070/model/69235fc7-3dea-470a-8567-8e31e5524c3e/api"

I don't see 2f4c334e anywhere in machine-0.log.
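
To make the comparison explicit, here is a small sketch that pulls the model UUID out of each URL and compares them. The two URLs are the ones quoted in this comment; the helper name `model_uuid` and the path pattern are assumptions made for illustration, not anything taken from juju itself.

```python
import re

# Matches the UUID segment in ".../model/<uuid>/api" (pattern assumed from
# the two URLs quoted in this comment).
UUID_IN_URL = re.compile(
    r"/model/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/api"
)


def model_uuid(url):
    """Return the model UUID embedded in an API URL, or None if absent."""
    match = UUID_IN_URL.search(url)
    return match.group(1) if match else None


# URL the failing client tried to reach.
client_url = "wss://10.245.136.8:17070/model/2f4c334e-00b0-4f45-869c-6046acbde49a/api"
# URL the controller dials internally, per machine-0.log.
controller_url = "wss://10.245.136.8:17070/model/69235fc7-3dea-470a-8567-8e31e5524c3e/api"

mismatch = model_uuid(client_url) != model_uuid(controller_url)
print(model_uuid(client_url), model_uuid(controller_url), mismatch)
```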

Where would a bogus UUID be coming from? Why would it be affecting this, but not showing up in our CI? I don't think we let you specify a UUID anywhere, but maybe your client library is doing something?

Changed in juju:
status: New → Incomplete
Brendan Donegan (brendan-donegan) wrote :

The environment is the same between archive and PPA based testing. We use
the juju bootstrap command directly. The only 'funny business' of any kind
is that this is behind a proxy, which is set in a config file before calling
juju bootstrap.


Brendan Donegan (brendan-donegan) wrote :

FYI, this is what we do:

http://paste.ubuntu.com/24354609/

Brendan Donegan (brendan-donegan) wrote :

I should also point out that this failure is not peculiar to MAAS 2.2 so I think it's unlikely *we* broke anything.

Changed in juju:
status: Incomplete → New
Christian Muirhead (2-xtian) wrote :

We tried connecting with a small websocket client that I made to talk to the API. That worked correctly, annoyingly, but we could see that trying to wget the websocket port with https_proxy did yield a Forbidden error that looked like the one in the logs.

I think the reason the client worked is that it's ignoring the http_proxy settings in the env and connecting directly. I've tweaked it a bit to explicitly get the proxy and use it, and I can see it failing if I use it with a proxy here (Tinyproxy). Brendan, when you get a chance can you try testing with the attached version of the client instead?

I can get it to work via my proxy if I change the Tinyproxy configuration to allow CONNECT on port 17070. So I think that might be what's happening here. One way to fix it might be to set no_proxy to the addresses of the possible juju controller nodes (although that's a pain, since it might be one of many nodes and the addresses are assigned by MAAS).

Can we try configuring the proxy to allow CONNECT on port 17070?
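
For anyone unfamiliar with why the proxy matters here: a wss:// connection that goes through an HTTP proxy starts with a CONNECT request for host:port, and the tunnel only opens if the proxy answers with a 2xx status. The sketch below shows that exchange in miniature. It does no network I/O, the function names are made up for illustration, and the "HTTP/1.1 403 Forbidden" status line is an assumed example of what a refusing proxy sends (the logs in this bug only show "Forbidden").

```python
def connect_request(host, port):
    """Build the request a client sends to a proxy to open a tunnel."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 403 Forbidden' into (403, 'Forbidden')."""
    _version, code, reason = line.strip().split(" ", 2)
    return int(code), reason


def tunnel_allowed(status_line):
    """A CONNECT tunnel is established only on a 2xx response."""
    code, _reason = parse_status_line(status_line)
    return 200 <= code < 300


request = connect_request("10.245.136.8", 17070)
print(request.decode("ascii").splitlines()[0])

# Assumed example responses, not copied from the CI logs.
print(tunnel_allowed("HTTP/1.1 403 Forbidden"))
print(tunnel_allowed("HTTP/1.1 200 Connection established"))
```

If the proxy refuses CONNECT to port 17070, the client never gets as far as the TLS or websocket handshake, which would fit a connection that the controller sees only briefly, if at all.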

Brendan Donegan (brendan-donegan) wrote :

Indeed, using the updated client reproduces the issue correctly. I tried allowing CONNECT on 17070 by adding:

acl Safe_ports port 17070

to /etc/squid3/squid.conf

and restarting squid, but it didn't help. I'm not deeply familiar with Squid so not sure if that was the right thing to do?

Changed in juju:
status: New → In Progress
assignee: nobody → Christian Muirhead (2-xtian)
Christian Muirhead (2-xtian) wrote :

I'm finding the Squid docs kind of hard to understand, I think because the ACL stuff is pretty flexible. From my reading it sounds like the Safe_ports ACL controls just whether you can request something from that port at all, and SSL_ports is whether you can use CONNECT to talk to that port (it's a more specific permission since CONNECT makes an unfiltered connection to the destination).

So I think you might also need to add this line to the config?

acl SSL_ports port 17070
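
To spell out that reading, here is a toy model of how the two ACLs would combine. It assumes the config still has the two stock rules `http_access deny !Safe_ports` and `http_access deny CONNECT !SSL_ports`, evaluated in that order, and that the client is otherwise allowed. The port sets are illustrative only, not read from the lab's squid.conf, and `squid_allows` is a made-up name.

```python
def squid_allows(method, port, safe_ports, ssl_ports):
    """Toy model of two stock Squid rules, checked in order:

        http_access deny !Safe_ports
        http_access deny CONNECT !SSL_ports

    Assumes no other rule denies the client.
    """
    if port not in safe_ports:
        return False  # port is not in Safe_ports at all
    if method == "CONNECT" and port not in ssl_ports:
        return False  # CONNECT is limited to SSL_ports
    return True


# Illustrative port sets (assumptions, not the lab's real configuration).
safe_ports = {80, 443, 17070}  # after adding "acl Safe_ports port 17070"

# With only Safe_ports extended, CONNECT to 17070 is still refused.
print(squid_allows("CONNECT", 17070, safe_ports, ssl_ports={443}))

# Adding "acl SSL_ports port 17070" as well lets the tunnel through.
print(squid_allows("CONNECT", 17070, safe_ports, ssl_ports={443, 17070}))
```

Under those assumptions, adding the port to Safe_ports alone would not be enough for a CONNECT request, which matches what Brendan saw.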

Changed in juju:
status: In Progress → Incomplete
assignee: Christian Muirhead (2-xtian) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired