juju 2.0.0 bootstrap to lxd fails (connect to wrong "remote" IP address)

Bug #1633788 reported by Mikko Tanner on 2016-10-16
This bug affects 22 people
Affects                      Importance  Assigned to
OpenStack Charm Test Infra   High        Ryan Beisner
juju                         Critical    Andrew Wilkins

Bug Description

Trying to bootstrap a juju 2.0.0 controller on Xenial LXD host. Command and error message:

$ juju bootstrap --config juju2-bootstrap.yaml lxd lxd-xenial --debug

[...]
2016-10-15 23:02:54 INFO juju.cmd supercommand.go:63 running jujud [2.0.0 gc go1.6.2]
2016-10-15 23:02:54 DEBUG juju.cmd supercommand.go:64 args: []string{"/var/lib/juju/tools/2.0.0-xenial-amd64/jujud", "bootstrap-state", "--timeout", "20m0s", "--data-dir", "/var/lib/juju", "--debug", "/var/lib/juju/bootstrap-params"}
2016-10-15 23:02:54 DEBUG juju.agent agent.go:509 read agent config, format "2.0"
2016-10-15 23:02:54 DEBUG juju.tools.lxdclient client.go:199 connecting to LXD remote "remote": "10.10.1.254:8443"
2016-10-15 23:02:54 ERROR cmd supercommand.go:458 new environ: creating LXD client: Get https://10.10.1.254:8443/1.0: Unable to connect to: 10.10.1.254:8443
2016-10-15 23:02:54 DEBUG cmd supercommand.go:459 (error details: [{github.com/juju/juju/cmd/jujud/bootstrap.go:144: new environ} {github.com/juju/juju/provider/lxd/provider.go:32: } {github.com/juju/juju/provider/lxd/environ.go:59: } {github.com/juju/juju/provider/lxd/environ_raw.go:71: creating LXD client} {github.com/juju/juju/provider/lxd/environ_raw.go:107: } {github.com/juju/juju/tools/lxdclient/client.go:124: } {github.com/juju/juju/tools/lxdclient/client.go:241: } {Get https://10.10.1.254:8443/1.0: Unable to connect to: 10.10.1.254:8443}])
[...]

The issue seems to be that the agent running in the LXD container being created uses the gateway's IP (10.10.1.254) instead of the LXD host's (10.10.1.80). This is a clean juju-2 installation.
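A minimal sketch of the mismatch, assuming the routing table shown in the configuration section of this report. It only parses that `ip r` text to show that the default gateway and the host's own address on br-int are different machines; the helper names are illustrative and are not part of Juju.

```python
import ipaddress

# Routing table copied from this report (output of `ip r` on the LXD host).
IP_ROUTE_OUTPUT = """\
default via 10.10.1.254 dev br-int onlink
10.10.1.0/24 dev br-int proto kernel scope link src 10.10.1.80
10.10.3.2 via 10.10.1.254 dev br-int proto zebra metric 20
10.10.9.0/24 via 10.10.1.254 dev br-int proto zebra metric 20
10.10.10.0/24 dev ib0 proto kernel scope link src 10.10.10.80
"""


def default_gateway(route_output):
    """Return (gateway, device) from the `default via ... dev ...` line, or None."""
    for line in route_output.splitlines():
        fields = line.split()
        if (len(fields) >= 5 and fields[0] == "default"
                and fields[1] == "via" and fields[3] == "dev"):
            return ipaddress.ip_address(fields[2]), fields[4]
    return None


def host_address_on(route_output, device):
    """Return the host's own source address on `device`, or None."""
    for line in route_output.splitlines():
        fields = line.split()
        if "dev" in fields and "src" in fields:
            if fields[fields.index("dev") + 1] == device:
                return ipaddress.ip_address(fields[fields.index("src") + 1])
    return None


gateway, device = default_gateway(IP_ROUTE_OUTPUT)
host = host_address_on(IP_ROUTE_OUTPUT, device)
print(f"gateway on {device}: {gateway}")
print(f"host on {device}:    {host}")
print("gateway is the LXD host:", gateway == host)
```

On this host the last line reports that the gateway is not the LXD host, which is the situation in which the agent's connection to port 8443 goes to the wrong machine.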

Connections to outside internet work fine (through the gateway and http/apt proxy). I can isolate the issue to juju with the following little change (at the gateway machine): redirecting the port 8443 back to the LXD host from the gateway (masq/DNAT) allows the bootstrap to proceed and results in a working controller (LXD container) that can install charms. This way, traffic is bounced through the gateway which obviously is wrong. Removing the DNAT rules stops the controller from working after bootstrap.

There seems to be no way to specify the LXD host's address by hand. Referring to this pull: https://github.com/juju/juju/pull/6078 (info from bug https://bugs.launchpad.net/juju/+bug/1618636 ). However, the suggested workaround there is wrong/incomplete as I already have https_address set in LXD and the whole shebang works if I redirect port 8443 back from gateway.

---
Configuration (on LXD host machine):

$ ip -4 address show up
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65500 qdisc pfifo_fast state UP
    inet 10.10.10.80/24 brd 10.10.10.255 scope global ib0
10: br-int: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN
    inet 10.10.1.80/24 brd 10.10.1.255 scope global br-int

(br-int is an openvswitch bridge)

$ ip r
default via 10.10.1.254 dev br-int onlink
10.10.1.0/24 dev br-int proto kernel scope link src 10.10.1.80
10.10.3.2 via 10.10.1.254 dev br-int proto zebra metric 20
10.10.9.0/24 via 10.10.1.254 dev br-int proto zebra metric 20
10.10.10.0/24 dev ib0 proto kernel scope link src 10.10.10.80

$ cat juju2-bootstrap.yaml
http-proxy: http://10.10.1.254:8080
https-proxy: http://10.10.1.254:8080
apt-http-proxy: http://10.10.1.254:3142
apt-https-proxy: http://10.10.1.254:3142
no-proxy: localhost,10.10.1.254,10.10.1.80

$ groups
USERNAME adm disk cdrom sudo dip plugdev libvirtd lpadmin sambashare lxd

$ lxc info
apiextensions: []
apistatus: stable
apiversion: "1.0"
auth: trusted
environment:
  addresses:
  - 10.10.10.80:8443
  - 10.10.1.80:8443
  architectures:
  - x86_64
  - i686
  certificate: [...]
  driver: lxc
  driverversion: 2.0.4
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-38-generic
  server: lxd
  serverpid: 17369
  serverversion: 2.0.4
  storage: zfs
  storageversion: "5"
config:
  core.https_address: '[::]'
  core.proxy_http: http://10.10.1.254:8080
  core.proxy_https: http://10.10.1.254:8080
  core.proxy_ignore_hosts: localhost,10.10.1.254,10.10.1.80
  core.trust_password: true
  images.auto_update_interval: "24"
  images.remote_cache_expiry: "14"
  storage.zfs_pool_name: data/lxd
public: false

$ apt-cache policy juju
juju:
  Installed: 1:2.0.0-0ubuntu1~16.04.2~juju1
  Candidate: 1:2.0.0-0ubuntu1~16.04.2~juju1
  Version table:
 *** 1:2.0.0-0ubuntu1~16.04.2~juju1 100
        100 /var/lib/dpkg/status
     2.0~beta15-0ubuntu2.16.04.1 500
        500 http://XX.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://XX.archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
     2.0~beta4-0ubuntu2 500
        500 http://XX.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://XX.archive.ubuntu.com/ubuntu xenial/main i386 Packages

Changed in juju:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Richard Harding (rharding)
milestone: none → 2.0.1
Curtis Hovey (sinzui) on 2016-10-28
Changed in juju:
milestone: 2.0.1 → none
Mikko Tanner (shapemaker) wrote :

This issue completely prevents me from setting up a juju-2.0.0 controller on LXD. Looking at the code (now that I had some time) reveals clearly that the new (fallback?) mechanism is the reason for the wrong address being used:
https://github.com/juju/juju/blob/staging/provider/lxd/environ_raw.go#L140

Are bridges, where the LXD host itself is handling the subnetting (and is the default gateway) the only ones supported now? If not, there needs to be a way to define the host address manually, or the LXD address resolver needs to be fixed.

Changed in juju:
milestone: none → 2.0.2
Changed in juju:
milestone: 2.0.2 → 2.0.3
Benoit (benoit-i) wrote :

Can anyone describe a workaround? How can I update to 2.0.3, or otherwise use juju with LXD on a 2.0 version?

How can I stop bootstrap from looking for the gateway IP?

I can't understand why it uses the local LXD for everything and then logs > connecting to LXD remote "remote": IP_GW

In my case, this happens only when using a custom bridge without DNS resolution.
If I use lxdbr0 it works well.

I guess local DNS resolution of juju* is mandatory?

Benoit (benoit-i) wrote :

What I did to be able to run juju bootstrap was to wait for the new container to reach RUNNING and, before the end of the process, add this iptables rule:

    iptables -t nat -A OUTPUT -d ${WRONG_IP} -p tcp --dport 8443 -j DNAT --to-destination ${GOOD_IP}:8443
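A small sketch of how the rule above could be assembled with validated addresses, so that a typo in either IP fails before iptables is invoked. The flags are exactly those in the comment; the function name is hypothetical, and the example addresses are the gateway and host from the bug description. Later comments in this thread report that the rule did not help in every setup.

```python
import ipaddress
import shlex


def dnat_rule(wrong_ip, good_ip, port=8443):
    """Build the argv for the OUTPUT-chain DNAT rule quoted above."""
    # ip_address() raises ValueError on malformed input.
    wrong = ipaddress.ip_address(wrong_ip)
    good = ipaddress.ip_address(good_ip)
    if wrong == good:
        raise ValueError("wrong and good addresses are identical")
    return [
        "iptables", "-t", "nat", "-A", "OUTPUT",
        "-d", str(wrong), "-p", "tcp", "--dport", str(port),
        "-j", "DNAT", "--to-destination", f"{good}:{port}",
    ]


rule = dnat_rule("10.10.1.254", "10.10.1.80")
# Print the command for review instead of executing it (it needs root).
print(shlex.join(rule))
```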

Barry Price (barryprice) on 2016-11-10
tags: added: canonical-is
Mikko Tanner (shapemaker) wrote :

Ping! Could someone from the dev team who takes care of this piece of code comment on the logic behind using gateway IP as the target? Will you only support bridges where the LXD host manages IP addressing? This makes setups with custom bridges unusable.

John A Meinel (jameinel) wrote :

This was a mistake that we are working to correct. There was a patch that was trying to simplify the configuration for the LXD provider, but it was a mistake to remove the host IP from what we needed to track. I believe the simple test cases all worked because they were all using NAT, where the host *is* the gateway. However, many people have pointed out they want to expose the containers directly onto the network, and there the gateway is not the host machine.
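The distinction described above can be sketched as a simple membership test: the "gateway is the host" shortcut holds only when the gateway address is one of the addresses LXD listens on. This is an illustration, not Juju's logic. The bridged values come from the bug description (`ip r` and `lxc info`); the NAT listing is a hypothetical example in the shape of a default lxdbr0 setup.

```python
def listen_hosts(lxd_addresses):
    """Strip the port from entries such as '10.10.1.80:8443' (IPv4 only)."""
    return {entry.rsplit(":", 1)[0] for entry in lxd_addresses}


def gateway_is_lxd_host(gateway, lxd_addresses):
    """True when the container's gateway is also an LXD listen address."""
    return gateway in listen_hosts(lxd_addresses)


# Bridged setup from the bug description: the gateway is a separate router.
bridged = gateway_is_lxd_host(
    "10.10.1.254", ["10.10.10.80:8443", "10.10.1.80:8443"]
)

# NAT setup (hypothetical listing): the host owns the bridge address.
nat = gateway_is_lxd_host("10.232.128.1", ["10.232.128.1:8443"])

print("bridged: gateway is LXD host ->", bridged)
print("NAT:     gateway is LXD host ->", nat)
```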

AIUI we should expect to revert the behavior introduced in 2.0. I'm not sure whether that ends up in 2.1 or in 2.0.x.

Steven McCullie (stevyn) wrote :

Does anyone have a workaround for this? I tried the following:

Benoit (benoit-i) wrote on 2016-11-09:

"iptables -t nat -A OUTPUT -d ${WRONG_IP} -p tcp --dport 8443 -j DNAT --to-destination ${GOOD_IP}:8443"

But it didn't work for me, whether run in the LXD container, on MAAS, or on the host system.

Ryan Beisner (1chb1n) wrote :

This impacts OpenStack Charm CI's ability to leverage Juju 2 + LXD provider for charm testing.

tags: added: uosci
Ryan Beisner (1chb1n) wrote :

Tried with 2.0.2 proposed as well. Juju tries to talk to my default network gateway as if it were a lxd host.

2016-12-07 21:03:16 INFO juju.cmd supercommand.go:63 running jujud [2.0.2.1 gc go1.6.2]
2016-12-07 21:03:26 ERROR cmd supercommand.go:458 new environ: creating LXD client: Get https://10.x.x.1:8443/1.0: Unable to connect to: 10.x.x.1:8443
ERROR failed to bootstrap model: subprocess encountered error code 1

⟫ apt-cache policy juju
juju:
  Installed: 1:2.0.2-0ubuntu1~16.04.1~juju1
  Candidate: 1:2.0.2-0ubuntu1~16.04.1~juju1
  Version table:
 *** 1:2.0.2-0ubuntu1~16.04.1~juju1 500
        500 http://ppa.launchpad.net/juju/proposed/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.0.1-0ubuntu1~16.04.4~juju1 500
        500 http://ppa.launchpad.net/juju/stable/ubuntu xenial/main amd64 Packages
     2.0.0-0ubuntu0.16.04.2 500
        500 http://archive.ubuntu.com//ubuntu xenial-updates/main amd64 Packages
     2.0~beta4-0ubuntu2 500
        500 http://archive.ubuntu.com//ubuntu xenial/main amd64 Packages
     1.25.6-0ubuntu1~16.04.1~juju1 500
        500 http://ppa.launchpad.net/juju/proposed/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/stable/ubuntu xenial/main amd64 Packages

Ryan Beisner (1chb1n) on 2016-12-07
Changed in charm-test-infra:
status: New → Confirmed
tags: added: kanban-cross-team
tags: removed: kanban-cross-team
James Page (james-page) on 2017-01-03
Changed in charm-test-infra:
importance: Undecided → High
Vance Morris (vmorris) wrote :

Confirmed, this is still a problem in:
ubuntu@zs93kvi:~$ lxd --version
2.0.8
ubuntu@zs93kvi:~$ juju version
2.1-beta4-xenial-s390x

Bootstrap debug log: http://paste.ubuntu.com/23772244/

I have tried the iptables NAT workaround but it did not help.

With 10.20.123.254 as the default gateway:

2017-01-09 19:48:31 DEBUG juju.tools.lxdclient client.go:199 connecting to LXD remote "remote": "10.20.123.254:8443"
2017-01-09 19:48:31 ERROR cmd supercommand.go:458 new environ: creating LXD client: Get https://10.20.123.254:8443/1.0: Forbidden

Ryan Beisner (1chb1n) on 2017-01-10
tags: added: s390x
Ryan Beisner (1chb1n) on 2017-01-14
Changed in charm-test-infra:
assignee: nobody → Ryan Beisner (1chb1n)
Ryan McAdams (ryanmcadams) wrote :

2017-01-15 19:17:23 INFO juju.cmd supercommand.go:63 running jujud [2.0.2 gc go1.6.2]
2017-01-15 19:17:26 ERROR cmd supercommand.go:458 new environ: creating LXD client: Get https://172.16.10.1:8443/1.0: Unable to connect to: 172.16.10.1:8443
ERROR failed to bootstrap model: subprocess encountered error code 1
root@charms:/etc# lxd --version
2.7
root@charms:/etc# juju --version
2.0.2-zesty-amd64
root@charms:/etc#

Happening on Juju 2.0.2 and LXD 2.7 also.

Mikko Tanner (shapemaker) wrote :

It has now been 4 months, but apparently this bug has gone nowhere. Maybe as a quick fix just put tracking of LXD hostname/IP back? That would fix the misbehaviour while a more "proper" solution can be devised. As it is, Juju on LXD is broken, IMO.

Andrew Wilkins (axwalk) wrote :

@Mikko: apologies, I wasn't aware of this bug ID specifically, hence lack of communication. There is a workaround for the 2.1 branch referenced in lp:1640455. We'll look at reverting behaviour so that things work OOTB as they did before.

Mikko Tanner (shapemaker) wrote :

Thank you Andrew!

So to recap, the workaround for people who need this working now (confirmed) is:

$ sudo add-apt-repository ppa:juju/devel
$ sudo apt-get update && sudo apt-get upgrade
$ juju --version
2.1-beta4-xenial-amd64

$ cat ~/.local/share/juju/clouds.yaml
clouds:
    lxd:
      type: lxd
      endpoint: <lxd host IP address>

Then bootstrap as documented.
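For anyone scripting this, a minimal sketch that renders the clouds.yaml content above for a given host address. It only builds the text and validates the IP; the function name is hypothetical, and the example address is the LXD host from the bug description. Writing the result over an existing clouds.yaml would discard any other cloud definitions in that file, so the sketch deliberately prints rather than writes.

```python
import ipaddress

# Same layout as the clouds.yaml shown in the comment above.
CLOUDS_TEMPLATE = """\
clouds:
    lxd:
      type: lxd
      endpoint: {endpoint}
"""


def render_clouds_yaml(lxd_host_ip):
    """Return clouds.yaml text with `endpoint` set to the LXD host address."""
    # ip_address() raises ValueError if the argument is not a valid IP.
    endpoint = ipaddress.ip_address(lxd_host_ip)
    return CLOUDS_TEMPLATE.format(endpoint=endpoint)


print(render_clouds_yaml("10.10.1.80"), end="")
```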

Ryan Beisner (1chb1n) wrote :

I'd consider that a confirmation of an upcoming bug fix, and not a work-around. It requires users to deploy an unreleased development version, which is not a viable principle for many production users.

Andrew Wilkins (axwalk) on 2017-01-24
Changed in juju:
status: Triaged → In Progress
assignee: Richard Harding (rharding) → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk) wrote :

Ryan: indeed, it's only a workaround for those users who can use the development version. I'm working on getting a fix into 2.1. I'm not sure that we can reasonably fix this for 2.0.x, because it's a pretty significant change that could affect other non-LXD users.

My current WIP changes are against 2.2, but I will backport them once they're finalised. The WIP branches are at: https://github.com/juju/juju/pull/6878 and https://github.com/juju/juju/pull/6880.

Changed in juju:
milestone: 2.0.3 → 2.1-rc1
John A Meinel (jameinel) on 2017-02-01
Changed in juju:
milestone: 2.1-rc1 → 2.1-beta5
Andrew Wilkins (axwalk) on 2017-02-01
Changed in juju:
status: In Progress → Fix Committed
Andrew Wilkins (axwalk) wrote :

This should be fixed in 2.1 with https://github.com/juju/juju/pull/6887.

Curtis Hovey (sinzui) on 2017-02-03
Changed in juju:
status: Fix Committed → Fix Released
Ryan Beisner (1chb1n) wrote :

Where can this released fix be consumed? I don't see 2.1 available in the stable ppa.

Spyderdyne (spyderdyne) wrote :

That is an excellent question and one I have tried several times to find the answer to. That may be up in the air still actually, but I was unable to find it recorded anywhere. Unfortunately the release schedule on the wiki appears to have been abandoned and has not been updated since July 2016:

https://github.com/juju/juju/wiki/Juju-Release-Schedule

I was unable to find any other release schedules published and it appears that they are arbitrary unless they are scheduled somewhere else.

@spyderdyne Though not ideal, the milestones might give a better clue https://launchpad.net/juju/+milestones

So the 2.1 release date was "expected" over 20 days ago and nothing newer
is forecast. This project may benefit from a tighter ship on the scheduling
side. Let me know if you think there is value in pursuing this. I am happy
to help as long as it's not disruptive.

On Feb 3, 2017 2:11 PM, "Sandor Zeestraten" <email address hidden> wrote:

> @spyderdyne Though not ideal, the milestones might give a better clue
> https://launchpad.net/juju/+milestones

Anastasia (anastasia-macmood) wrote :

2.1-beta5 went out late last week \o/

Spyderdyne (spyderdyne) wrote :

The question is about when these Alpha and Beta releases should be expected
in the stable branch. There are no hard dates attached.

Thanks.

On Sun, Feb 5, 2017 at 6:27 PM, Anastasia <email address hidden>
wrote:

> 2.1-beta5 went out late last week \o/

Ryan Beisner (1chb1n) wrote :

Correction: beta5 is in the *devel* PPA. That should not be considered fix-released; it would be fix-committed.

Fix-released should mean that it is available in one or both of: the stable PPA, xenial-updates (latest LTS).

I think it is important for this bug status signaling to be in alignment with Ubuntu:
https://wiki.ubuntu.com/Bugs/Bug%20statuses

As of today, Feb 6, here is what I see:
ppa:juju/devel == 2.1~beta5-0ubuntu1~16.04.5~juju1
ppa:juju/proposed == 2.0.2-0ubuntu1~16.04.1~juju1
ppa:juju/stable == 2.0.2-0ubuntu1~16.04.1~juju1
xenial-updates == 2.0.0-0ubuntu0.16.04.2

⟫ apt-cache policy juju
juju:
  Installed: 1:2.1~beta3-0ubuntu1~16.04.1~juju1
  Candidate: 1:2.1~beta5-0ubuntu1~16.04.5~juju1
  Version table:
     1:2.1~beta5-0ubuntu1~16.04.5~juju1 500
        500 http://ppa.launchpad.net/juju/devel/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/devel/ubuntu xenial/main i386 Packages
 *** 1:2.1~beta3-0ubuntu1~16.04.1~juju1 100
        100 /var/lib/dpkg/status
     1:2.0.2-0ubuntu1~16.04.1~juju1 500
        500 http://ppa.launchpad.net/juju/proposed/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/proposed/ubuntu xenial/main i386 Packages
        500 http://ppa.launchpad.net/juju/stable/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/stable/ubuntu xenial/main i386 Packages
     2.0.0-0ubuntu0.16.04.2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
     2.0-beta15-0ubuntu1~16.04.1~juju1 500
        500 http://ppa.launchpad.net/juju/devel/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/devel/ubuntu xenial/main i386 Packages
     2.0~beta4-0ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu xenial/main i386 Packages
     1.25.6-0ubuntu1~16.04.1~juju1 500
        500 http://ppa.launchpad.net/juju/proposed/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/proposed/ubuntu xenial/main i386 Packages
        500 http://ppa.launchpad.net/juju/stable/ubuntu xenial/main amd64 Packages
        500 http://ppa.launchpad.net/juju/stable/ubuntu xenial/main i386 Packages

Reference:
http://packages.ubuntu.com/xenial-updates/juju
https://launchpad.net/~juju/+archive/ubuntu/stable
https://launchpad.net/~juju/+archive/ubuntu/proposed
https://launchpad.net/~juju/+archive/ubuntu/devel

Greg Lutostanski (lutostag) wrote :

Hitting this still, with the snap on both versions:
beta: 2.1-rc1 (814) 38MB devmode
edge: 2.1-edge (822) 57MB devmode

Nothing seems to help. Can't bootstrap at all.

Andrew Wilkins (axwalk) wrote :

Greg, I have just tested again with 814 and it worked for me. Can you please share your network and LXD default profile configuration, and the output of "juju bootstrap --debug"? My config is below.

$ cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto br0
iface br0 inet dhcp
    bridge-ifaces enp0s31f6
    bridge-ports enp0s31f6
    up ifconfig enp0s31f6 up

iface enp0s31f6 inet manual

$ lxc profile show default
name: default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic

Changed in juju:
status: Fix Released → Incomplete
status: Incomplete → Fix Released
Andrew Wilkins (axwalk) wrote :

Greg, seeing as you're just using lxdbr0, this is a bit different. It should just work, and should have been working before my latest changes.

Does your machine have 10.232.128.1 on lxdbr0? Is LXD listening on port 8443? Wild guess but do you have both the LXD snap and debs installed? Having them both might lead to this kind of error.

I've been testing this periodically as new juju 2.1 builds have appeared in the devel ppa but I'm still hitting the issue

Attempt 1 to download tools from https://streams.canonical.com/juju/tools/agent/2.1-rc1/juju-2.1-rc1-ubuntu-amd64.tgz...
tools from https://streams.canonical.com/juju/tools/agent/2.1-rc1/juju-2.1-rc1-ubuntu-amd64.tgz downloaded: HTTP 200; time 9.187s; size 24875903 bytes; speed 2707719.000 bytes/s Tools downloaded successfully.
9c73654ade643b0946bb17155def73ce330bd9fe09b4e440064c18b73a4b4e5d /var/lib/juju/tools/2.1-rc1-xenial-amd64/tools.tar.gz
8b82695119d505d766869244ca5cdcbbdf8d4ccc9afd4d97e91cd0cf48173a7c /var/lib/juju/gui/gui.tar.bz2
2017-02-16 11:59:41 INFO juju.cmd supercommand.go:63 running jujud [2.1-rc1 gc go1.6]
2017-02-16 12:00:56 ERROR cmd supercommand.go:458 new environ: Get https://10.87.204.1:8443/1.0: Service Unavailable

(after a juju-2.1 bootstrap --config http-proxy="${http_proxy}" --config https-proxy="${https_proxy}" --config apt-http-proxy="${http_proxy}" localhost prawn)

ii juju-2.0 1:2.1~rc1-0ubuntu1~16.04.1~juju1 amd64 Juju is devops distilled - client
ii lxd 2.8-0ubuntu1~ubuntu16.04.1~ppa1 amd64 Container hypervisor based on LXC - daemon

No snaps installed

Am I missing something? Or is this bug still there?

Andrew Wilkins (axwalk) wrote :

Stephen, I think in your case it's due to the proxy configuration. Are you expecting for Juju to connect to LXD via the proxy? It appears that we have a bug to do with proxying at bootstrap time, but I'm not sure if it's relevant here.

Andrew Wilkins (axwalk) wrote :

PR https://github.com/juju/juju/pull/7002, which will be in 2.1-rc2, may help when using a proxy.

Many thanks Andrew, adding --config no-proxy=... to the bootstrap command seems to have done the trick and I no longer hit this error. (I'm still struggling with proxy images when subsequently deploying things to this controller, but that's more an issue with those containers and the environment.)
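A rough sketch of assembling such a bootstrap command, with the LXD address included in no-proxy so that the controller does not try to reach LXD through the HTTP proxy. The `--config` keys and the cloud/controller names are those used earlier in this thread; the proxy URL is a placeholder, the function name is hypothetical, and the exact no-proxy value the commenter used is not given in the thread.

```python
import shlex


def bootstrap_args(cloud, controller, proxy_url, no_proxy_hosts):
    """Build a `juju bootstrap` argv with proxy and no-proxy settings."""
    # dict.fromkeys() drops duplicates while keeping the original order.
    no_proxy = ",".join(dict.fromkeys(no_proxy_hosts))
    return [
        "juju", "bootstrap",
        "--config", f"http-proxy={proxy_url}",
        "--config", f"https-proxy={proxy_url}",
        "--config", f"no-proxy={no_proxy}",
        cloud, controller,
    ]


args = bootstrap_args(
    "localhost", "prawn",
    "http://proxy.example:8080",              # placeholder proxy URL
    ["localhost", "10.87.204.1", "localhost"],  # LXD address from the error above
)
print(shlex.join(args))
```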

Greg Lutostanski (lutostag) wrote :

I do not have the lxd snap installed, and this only shows up with juju 2.1 releases for me.
It has always just worked in the past.
Yes, 10.232.128.1 is on lxdbr0, seems like a relatively clean install/never had problems before.
So definitely still a regression/blocker imho.

Andrew Wilkins (axwalk) wrote :

I agree this is critical, I'm reopening this. Can you please check that LXD is in fact listening on port 8443 on the host? The output of the following on the host might shed some light:

  sudo netstat -nap | grep 8443
  lxc info
  ifconfig

There were some more LXD-related changes in 2.1-rc2 (the new rc2), so please confirm the issue is still there with the latest.

Changed in juju:
status: Fix Released → In Progress
milestone: 2.1-beta5 → 2.1.0
Andrew Wilkins (axwalk) on 2017-02-18
Changed in juju:
status: In Progress → Incomplete
tags: added: regression
tags: added: eda
Greg Lutostanski (lutostag) wrote :

https://paste.ubuntu.com/24002750/

But I don't know how this happened. I am unable to bootstrap any juju version on my box, neither 2.0.2 nor 2.1.x.

LXD otherwise works 100% fine for me (other than juju falling over with this issue).

Greg Lutostanski (lutostag) wrote :

http://pastebin.ubuntu.com/24043354/ is the paste you are looking for. Sorry about that.

Greg Lutostanski (lutostag) wrote :

Apologies, it was ufw acting up. Thanks for the help, sorry for the noise.

Changed in juju:
status: Incomplete → Fix Released
Curtis Hovey (sinzui) on 2017-02-22
Changed in juju:
milestone: 2.1.0 → 2.1-rc2
Andrew Wilkins (axwalk) wrote :

Phew :)
Thanks anyway, Greg. Glad we got to the bottom of it! We should do an initial network connectivity check when bootstrapping on LXD. I'll file another bug to investigate doing that.
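One possible shape for such a check, as a sketch only: a plain TCP connect to the LXD port with a short timeout, which would also have failed in the ufw case above. This is not how Juju implements it; the function name and defaults are assumptions. The demonstration uses a throwaway local listener so it runs without an LXD host.

```python
import socket


def can_connect(host, port=8443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Demonstration against a local listener instead of a real LXD endpoint.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print("listener up:  ", can_connect("127.0.0.1", demo_port))
listener.close()
print("listener down:", can_connect("127.0.0.1", demo_port))
```

In practice the call would be something like can_connect("10.232.128.1"), run from wherever the controller will run, since a firewall on the host can block the container even when the host itself can reach the port.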

Gaël THEROND (fl1nt) wrote :

Hi @axwalk,

I'm currently facing the exact same issue using Juju AND lxd as snaps.

I've tried both the Juju stable 2.1.1 release and Juju edge 2.2-alpha1+develop-a6abc2e rev 1035.
My lxd snap is stable 2.8 rev 976

and I'm on Ubuntu 16.04 LTS.

Here is my debug output using the alpha version, which is exactly the same as with stable, along with my network configuration:

http://paste.ubuntu.com/24147192/

The LXD snap is working fine.

Andrew Wilkins (axwalk) wrote :

Gaël, do you also have the LXD deb package installed? Can you please confirm that the snap LXD is listening on 10.73.144.1:8443? If both deb and snap are installed, and both configured to listen on port 8443, then Juju may be connecting to the wrong one over HTTPS.

Gaël THEROND (fl1nt) wrote :

Hi Andrew, I don't have the DEB package installed; I just verified that it is NOT installed on the system.

snap LXD is correctly listening on 10.73.144.1:8443

I'll be happy to provide any needed logs/debug outputs.

Gaël THEROND (fl1nt) wrote :

For information, I just tested the APT method and everything works smoothly. I think the snap method may be missing some patch :D

Are the snaps built from the same source branches and kept in sync with the APT versions?

Gaël THEROND (fl1nt) wrote :

Ok, here is a complete trace:

http://paste.ubuntu.com/24153104/

What drives me nuts is that Juju, at the initial step, is able to communicate correctly with the LXD snap and retrieve the underlying container it created.

The issue seems to come from the juju.tools.lxdclient module, which apparently can't do one of the following:
- save the information retrieved from LXD for later reuse
- properly reuse the saved information
- call the information-retrieval mechanism once again at the end of the process

Or the issue has a completely different cause :D
In any case, the error message is not really pertinent, as it does not reflect what's going on.

Andrew Wilkins (axwalk) wrote :

Gaël, I cannot reproduce the issue with either the deb or snap LXD.

The failure is occurring in code that runs inside the container: "juju bootstrap" connects to LXD and starts a container, and then injects Juju code into the container; that code is what is failing to connect to LXD.

If you could run bootstrap with "--keep-broken", that initial container will be left behind. This would allow you to enter the container and find out why it cannot connect to the LXD socket.

Gaël THEROND (fl1nt) wrote :

Thanks a lot @axwalk, I've run juju bootstrap with the --keep-broken argument and here is what I have found so far in the LXC container.

dmesg output related to AppArmor:
http://paste.ubuntu.com/24169814/

There are a couple of denials related to snap lxc and capability requirements.

Here is the bootstrap-params YAML file:
http://paste.ubuntu.com/24169845/

We can see that it's trying to use the gateway as endpoint, I don't think that it should be that way.

I redacted the SSL certs as I don't think they're relevant to you.

There is no log at all in the /var/log/juju directory.

Let me know if you want me to extract something else from this container.

Andrew Wilkins (axwalk) wrote :

Gaël, sorry for the delay in responding.

> We can see that it's trying to use the gateway as endpoint, I don't think that it should be that way.

Why? Have you configured LXD to use an alternative bridge device? By default, we expect the containers to communicate with the LXD host on the lxdbr0 address.

Can you please provide the output of "lxc profile show default", and "lxc network list". Assuming the default profile uses lxdbr0 for the primary NIC, please also include "lxc network show lxdbr0".

Gaël THEROND (fl1nt) wrote :

Hi Andrew, no problem,

No, I mean, I just thought that juju would try to contact the container through its IP on the bridge range, which in my case would have been 10.131.62.135:8443 instead of 10.131.62.1:8443.

here is the result for lxc profile show default:
http://paste.ubuntu.com/24187741/

here is the lxc network list:
http://paste.ubuntu.com/24187746/

here is the lxc network show lxdbr0:
http://paste.ubuntu.com/24187748/

At this point of the bootstrap procedure, which binary is supposed to be called?
I could try to manually start it and see what's going on.

Andrew Wilkins (axwalk) wrote :

> No, I mean, I just thought that juju would try to contact the container through its IP on the bridge range, which in my case would have been 10.131.62.135:8443 instead of 10.131.62.1:8443.

Just to make sure we're both on the same page: it's happening the other way around. The container (.135) needs to contact the host (.1). Think of the host as the "cloud" API endpoint, with which the Juju controller (running in a container) communicates.

The command that Juju runs at bootstrap time, inside the container, is:
    /var/lib/juju/tools/2.1.1-xenial-amd64/jujud bootstrap-state --timeout 20m0s --data-dir /var/lib/juju --debug /var/lib/juju/bootstrap-params

You could also just try "curl -k https://10.131.62.1:8443" from inside the container. If that doesn't work, then something is screwy with the networking/firewalling.
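For containers without curl, a rough Python equivalent of that check is sketched below. It issues an HTTPS GET with certificate verification disabled (the analogue of `-k`). Unlike plain curl, it deliberately ignores proxy environment variables so that only the direct container-to-host path is tested; that choice, and the function name, are assumptions of this sketch.

```python
import ssl
import urllib.error
import urllib.request


def probe_lxd(url, timeout=5.0):
    """Return (answered, detail) for a direct HTTPS GET without cert checks."""
    context = ssl.create_default_context()
    context.check_hostname = False        # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE   # same effect as curl's -k
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # empty mapping: never use a proxy
        urllib.request.HTTPSHandler(context=context),
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # Something answered on that address, but with an error status.
        return True, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Refused, unreachable, timed out, or TLS handshake failure.
        return False, str(err)
```

Calling probe_lxd("https://10.131.62.1:8443/1.0") from inside the container should report an answer if the host is reachable. An error status is still worth reading closely: the "Forbidden" and "Service Unavailable" responses earlier in this thread may have come from a gateway or proxy rather than from LXD itself.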

I've tried making my LXD network configuration the same as yours, but still it all works for me.

Gaël THEROND (fl1nt) wrote :

Hi @andrew, Sorry for the late answer, I was on an extended weekend.

Ok, so, I had an issue that totally fucked up this laptop, so I clean installed Ubuntu 16.04.2 LTS again, everything is default now, and I've just made the snap install lxd/juju.

I'll test again with your new information and see whether this error comes back or not.

Thanks for your support and advice on how juju works, I'll take a look at this.

Ryan Beisner (1chb1n) on 2017-03-24
Changed in charm-test-infra:
status: Confirmed → Invalid