1.25.6 cannot deploy on CI MAAS 1.9 or 1.8

Bug #1576021 reported by Curtis Hovey
This bug affects 1 person
Affects          Status         Importance   Assigned to          Milestone
juju-ci-tools    Invalid        Critical     Unassigned
juju-core        Invalid        Critical     Unassigned
  1.25           Fix Released   Critical     James Tunnicliffe

Bug Description

As seen in
    http://reports.vapour.ws/releases/3928
and as will be seen in
    http://reports.vapour.ws/releases/3931

Juju 1.25.6 cannot bootstrap on the two Juju CI MAASes. I suspect networking issues with the MAASes, but new eyes need to look at the evidence and data to determine whether CI or Juju is broken.

1. We know that the hourly 1.25.5 cloud health checks are fine: that Juju can deploy a trivial stack on MAAS 1.8 and MAAS 1.9, but the Juju under test cannot. We also know that Juju 2.0-beta7 can use these MAASes.

2. Upgrades work. We can deploy a trivial 1.25.5 (or older) stack and then upgrade it. The agents are found in streams and upgrades are quick.

3. Bootstrapping 1.25.6 on MAAS 1.9 fails to download the agents from streams:
    http://reports.vapour.ws/releases/3931/job/maas-1_9-deploy-trusty-amd64/attempt/2259
    curl: (7) Couldn't connect to server
    tools from https://swift.canonistack.canonical.com/v1/AUTH_526ad877f3e3464589dc1145dfeaac60/juju-dist/parallel-testing/agents/agent/revision-build-3931/juju-1.25.6-trusty-amd64.tgz downloaded: HTTP 000; time 0.000s; size 0 bytes; speed 0.000 bytes/s sha256sum: /var/lib/juju/tools/1.25.6-trusty-amd64/tools.tar.gz: No such file or directory

    The agents are present, and deploys on other substrates that use swift.canonistack.canonical.com succeeded.
    Munna can download the agents, as can Juju 1.25.5 hosts that are being upgraded. Juju 2.0-beta7 can bootstrap using these streams.

4. Bootstrapping 1.25.6 on MAAS 1.8 fails because the node's address cannot be resolved:
    http://reports.vapour.ws/releases/3931/job/maas-1_8-deploy-trusty-amd64/attempt/1716
    juju.network address.go:505 removing unresolvable address "maas-node-2.maas": lookup maas-node-2.maas on 10.125.0.1:53: no such host

   There is nothing wrong with maas-node-2.maas. 1.25.5 can bootstrap it, as can 2.0-beta7. The two addresses look correct, and I was able to ssh to the 10.0.200.136 address from munna.
   2016-04-28 00:40:19 INFO juju.network address.go:505 removing unresolvable address "maas-node-2.maas": lookup maas-node-2.maas on 10.125.0.1:53: no such host
   2016-04-28 00:40:19 INFO juju.api apiclient.go:262 dialing "wss://10.0.200.136:17070/environment/9fb17d1c-dba1-4746-8be6-c8d4f22cb9fa/api"

Curtis Hovey (sinzui)
Changed in juju-core:
status: Triaged → Incomplete
importance: Critical → Undecided
milestone: 1.25.6 → none
Changed in juju-ci-tools:
status: New → Triaged
importance: Undecided → High
importance: High → Critical
Curtis Hovey (sinzui)
tags: added: blocker
Cheryl Jennings (cherylj) wrote :

This was caused by PR: https://github.com/juju/juju/pull/5263

I verified that the commit before it works and that this commit causes 1.25 to fail.

Cheryl Jennings (cherylj) wrote :

After rebooting the node (after using --keep-broken in a failed bootstrap), I was able to wget the tools.

This problem exists on at least xenial and trusty.

Cheryl Jennings (cherylj) wrote :

Using --upload-tools seems to mask the problem: the bootstrap completes, but the node still has no networking. You can reproduce the bootstrap failure with a Juju built from the tip of the 1.25 branch and "agent-version: 1.25.5" set in environments.yaml.

James Tunnicliffe (dooferlad) wrote :

I have a fix. Will get a PR in ASAP.

Changed in juju-core:
assignee: nobody → James Tunnicliffe (dooferlad)
James Tunnicliffe (dooferlad) wrote :

http://reviews.vapour.ws/r/4741/ seems about ready to go.

Changed in juju-core:
status: Incomplete → In Progress
importance: Undecided → Critical
Curtis Hovey (sinzui) wrote :

Sorry. http://reports.vapour.ws/releases/3942 does not show much improvement for MAAS 1.9 or MAAS 1.8.

James Tunnicliffe (dooferlad) wrote :

Those don't look related.

Alexis Bruemmer (alexis-bruemmer) wrote :

If these are unrelated, can we please open another bug with details?

Cheryl Jennings (cherylj) wrote :

Looks like the latest runs on 1.8 failed because the MAAS was out of nodes.

The maas-1_9-deployer job succeeded, so we know MAAS 1.9 is able to bootstrap.

The maas-1_9-upgrade-win2012hvr2-amd64 job fails when the agents try to log in after the upgrade, and needs a new bug.

The maas-1_9-OS-deployer job looks like a problem with the charm config?

Changed in juju-core:
status: In Progress → Fix Released
status: Fix Released → In Progress
Curtis Hovey (sinzui)
Changed in juju-core:
status: In Progress → Incomplete
Curtis Hovey (sinzui)
Changed in juju-core:
status: Incomplete → Invalid
Changed in juju-core:
assignee: James Tunnicliffe (dooferlad) → nobody
Curtis Hovey (sinzui)
Changed in juju-ci-tools:
status: Triaged → Invalid