juju bootstrap hangs forever at "Attempting to connect to 10.0.4.130:22"

Bug #1644566 reported by Martin Pitt
This bug affects 4 people
Affects              Status        Importance  Assigned to           Milestone
juju-core            Invalid       Undecided   Unassigned
golang (Ubuntu)      Confirmed     Undecided   Michael Hudson-Doyle
juju-core (Ubuntu)   Invalid       High        Unassigned
lxd (Ubuntu)         Fix Released  High        Unassigned

Bug Description

In current zesty, "juju-2.0 bootstrap lxd localhost" does:

Creating Juju controller "lxd" on localhost/localhost
Looking for packaged Juju agent version 2.0-rc3 for amd64
Launching controller instance(s) on localhost/localhost...
 - juju-5fc6db-0
Fetching Juju GUI 2.2.3
Waiting for address
Attempting to connect to 10.0.4.130:22

and then hangs forever. It does create the container:

| juju-5fc6db-0 | RUNNING | 10.0.4.130 (eth0) | | PERSISTENT | 0 |

and it runs:

$ lxc exec juju-5fc6db-0 systemctl is-active ssh
active

but ssh into it does not work:

Warning: Permanently added '10.0.4.130' (ECDSA) to the list of known hosts.
Permission denied (publickey).

Indeed it does not install an ssh key:

$ lxc exec juju-5fc6db-0 -- ls -l /home/ubuntu/.ssh
total 0
-rw------- 1 ubuntu ubuntu 0 Nov 24 14:04 authorized_keys

I guess it's supposed to, as apparently password authentication is disabled.

I didn't find a --debug switch or log file or anything similar -- can you reproduce this? If not, what can I do to provide further debugging?

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: juju-2.0 2.0~rc3-0ubuntu4.16.10.1
ProcVersionSignature: Ubuntu 4.8.0-29.31~lp1626436ProposedWithTwoPatches-generic 4.8.8
Uname: Linux 4.8.0-29-generic x86_64
ApportVersion: 2.20.3-0ubuntu8
Architecture: amd64
CurrentDesktop: i3
Date: Thu Nov 24 15:01:52 2016
EcryptfsInUse: Yes
SourcePackage: juju-core
UpgradeStatus: No upgrade log present (probably fresh install)

Martin Pitt (pitti) wrote :

As a data point, I did the same last week in zesty, and it worked perfectly. Also, juju-core itself has not changed since then (it's the same version as in yakkety), so something underneath it broke.

I confirmed this bug with 2.0 final in zesty-proposed, same behaviour.

Martin Pitt (pitti)
summary: - juju bootstrap hangs forever at "Attempting to connect to 10.0.4.32:22"
+ juju bootstrap hangs forever at "Attempting to connect to 10.0.4.130:22"
description: updated
Martin Pitt (pitti) wrote :

This is perfectly reproducible on a pristine zesty cloud image in a VM after installing and configuring lxd first, then juju-2.0.

I also tried on a pristine xenial cloud VM, and I don't even get that far:

$ juju-2.0 bootstrap lxd localhost
Creating Juju controller "lxd" on localhost/localhost
Bootstrapping model "controller"
Starting new instance for initial controller
Launching instance
ERROR failed to bootstrap model: cannot start bootstrap instance: unable to get LXD image for ubuntu-xenial: The requested image couldn't be found.

However, "lxc launch ubuntu:xenial x1" works just fine there.

I tried in a pristine yakkety cloud VM, and it does work there. It installs my public ssh key:

ubuntu@juju-3d6bc3-0:~$ ls -la .ssh/authorized_keys
-rw------- 1 ubuntu ubuntu 402 Nov 24 14:23 .ssh/authorized_keys

So I'm fairly sure it's the authorized_keys setup which is to blame for breaking bootstrapping.

Martin Pitt (pitti) wrote :

Another attempt (back on zesty): I used lxc exec to get into the container and ran "ssh-import-id lp:pitti" as ~ubuntu, and now I can ssh to the created container. But juju bootstrap doesn't seem to notice this and retry; it still just sits there attempting to ssh. There is no ssh client process running.

Martin Pitt (pitti) wrote :

Downgrading lxd from zesty's 2.6 to yakkety's 2.4.1 (which was also the zesty version until last week) fixes this. So this is either an lxd regression or juju needing to be updated for lxd 2.6. As lxd usually gets backported fairly quickly, I am raising the severity.

Changed in juju-core (Ubuntu):
importance: Undecided → High
Changed in lxd (Ubuntu):
importance: Undecided → High
Stéphane Graber (stgraber) wrote :

I've confirmed the issue and so far also confirmed that an upstream build of 2.6 isn't affected, while a distro build is. The bug shows up as all files at /var/lib/cloud/seed/nocloud-net/* being empty.

My current guess is that it's either a Go shared library bug or a bug in one of the go source packages we pull from the archive. I'm doing more tests to track this down now.
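
The symptom described above (every file under /var/lib/cloud/seed/nocloud-net/ being empty) can be checked with a short script. This is a minimal sketch and not part of the original investigation: `list_empty_files` is a hypothetical helper name, and it only assumes a POSIX `find`.

```shell
#!/bin/sh
# list_empty_files: hypothetical helper, not part of LXD or juju.
# Prints every zero-length regular file below the given directory,
# one path per line. Prints nothing if all files have content.
list_empty_files() {
    find "$1" -type f -size 0
}

# Inside a container the same check could be run through lxc exec
# (requires LXD; the container name is whatever was launched):
#   lxc exec juju-5fc6db-0 -- find /var/lib/cloud/seed/nocloud-net -type f -size 0
```

On an affected build, the seed files would be expected to show up in the output; on an unaffected build the output should not include them.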

Stéphane Graber (stgraber) wrote :

Rebuilding LXD with shlibs disabled, but still using the in-archive source packages, fixes the issue. So I suspect this is yet another Go shared library regression with Go 1.7.

Closing lxd and juju tasks, adding golang-go task and assigning to mwhudson.

Changed in juju-core:
status: New → Invalid
Changed in juju-core (Ubuntu):
status: New → Invalid
Changed in lxd (Ubuntu):
status: New → Invalid
Changed in golang (Ubuntu):
assignee: nobody → Michael Hudson-Doyle (mwhudson)
Stéphane Graber (stgraber) wrote :

To reproduce the issue, do:

  lxc launch ubuntu:16.04 abcd
  lxc file pull abcd/var/lib/cloud/seed/nocloud-net/meta-data -

With an LXD built using shlibs, this returns nothing (an empty string). With an LXD built without shlibs, it returns the expected content:
  #cloud-config
  instance-id: abcd
  local-hostname: abcd
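
The reproduction steps above can be turned into a pass/fail check. The following is a minimal sketch, not something from the original report: `classify_seed` is a hypothetical helper name, and the container name `abcd` is the one used in the steps above. It reads the pulled file on standard input and reports whether any content came back.

```shell
#!/bin/sh
# classify_seed: hypothetical helper, not part of LXD.
# Reads seed file content on stdin and prints "empty" (the symptom
# described above) or "ok" (some content is present).
# Note: input consisting only of newlines is also reported as "empty",
# because command substitution strips trailing newlines.
classify_seed() {
    content=$(cat)
    if [ -z "$content" ]; then
        echo "empty"
    else
        echo "ok"
    fi
}

# Usage against a running container (requires LXD):
#   lxc file pull abcd/var/lib/cloud/seed/nocloud-net/meta-data - | classify_seed
```

Taking the content on standard input keeps the helper independent of LXD, so it can be tried on any file before pointing it at a container.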

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxd - 2.6-0ubuntu3

---------------
lxd (2.6-0ubuntu3) zesty; urgency=medium

  * Disable Go shared libraries as they cause a file templating regression.
    (LP: #1644566)

 -- Stéphane Graber <email address hidden> Thu, 24 Nov 2016 13:30:29 -0500

Changed in lxd (Ubuntu):
status: Invalid → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in golang (Ubuntu):
status: New → Confirmed