Juju 2.0.3 fails to deploy LXD container lxdbr0 overlapping subnets

Bug #1665648 reported by Teluka
This bug affects 2 people
Affects         Status        Importance  Assigned to     Milestone
Canonical Juju  Fix Released  High        Horacio Durán
2.1             Fix Released  High        John A Meinel

Bug Description

Juju fails to fully deploy an LXD container on a machine because the subnets on the main bridge interface and lxdbr0 overlap.

Because of longest-prefix matching, the juju agent on the machine is not able to dial back to the controller: the host stack tries to route the packets via lxdbr0, which is effectively a stub network.

To resolve the issue, the IP address has to be removed from the lxdbr0 interface.

- juju controller IP: 10.0.11.151/23
- juju machine 1 br-ens3 IP: 10.0.11.152/23
- juju machine 1 lxdbr0 IP: 10.0.11.1/24
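
The effect of these three addresses on route selection can be sketched in a few lines of Python. This is only an illustration of how the most specific (longest-prefix) connected route wins, not anything Juju or the kernel actually runs; the `pick_route` helper and the `INTERFACES` table are hypothetical names, and the default route is left out on purpose.

```python
import ipaddress

# Interface addresses on juju machine 1, as listed in this report.
INTERFACES = {
    "br-ens3": "10.0.11.152/23",
    "lxdbr0": "10.0.11.1/24",
}


def pick_route(destination, interfaces=INTERFACES):
    """Return (network, device) for the connected route with the longest
    prefix that contains destination, or None if no connected route matches."""
    addr = ipaddress.ip_address(destination)
    candidates = []
    for dev, cidr in interfaces.items():
        # ip_interface("10.0.11.152/23").network is 10.0.10.0/23
        network = ipaddress.ip_interface(cidr).network
        if addr in network:
            candidates.append((network, dev))
    if not candidates:
        return None
    # The most specific route is the one with the largest prefix length.
    return max(candidates, key=lambda item: item[0].prefixlen)


if __name__ == "__main__":
    # The controller address matches both networks; the /24 on lxdbr0 wins.
    print(pick_route("10.0.11.151"))
```

With the addresses above, the controller at 10.0.11.151 falls inside both 10.0.10.0/23 and 10.0.11.0/24, so the lookup selects lxdbr0, which matches the "no route to host" errors in the machine log.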

I found the following events logged by juju-machine:

2017-02-17 12:14:16 INFO juju.container.lxd initialisation_linux.go:404 LXD_IPV4_ADDR is not set; searching for unused subnet
2017-02-17 12:14:16 INFO juju.container.lxd initialisation_linux.go:409 setting LXD_IPV4_ADDR=10.0.11.1

This happened even though br-ens3 (the main interface) had already been assigned an IP address from the 10.0.10.0/23 network.
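
A check that would have flagged this choice can be sketched with Python's standard `ipaddress` module. This is not Juju's code (Juju is written in Go); `conflicting_networks` is a hypothetical helper that only shows that 10.0.11.0/24 lies inside the /23 already configured on br-ens3.

```python
import ipaddress


def conflicting_networks(candidate, host_cidrs):
    """Return the host networks (as strings) that overlap the candidate subnet.

    host_cidrs are interface addresses in CIDR form, e.g. "10.0.11.152/23";
    strict=False lets ip_network() accept an address with host bits set.
    """
    cand = ipaddress.ip_network(candidate)
    conflicts = []
    for cidr in host_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        if cand.overlaps(network):
            conflicts.append(str(network))
    return conflicts


if __name__ == "__main__":
    # The subnet chosen for lxdbr0 versus the address already on br-ens3.
    print(conflicting_networks("10.0.11.0/24", ["10.0.11.152/23"]))
```

Comparing only the first three octets of the host address with the candidate would not be enough in general: a /23 spans two consecutive /24 blocks, so the comparison has to be made on whole networks.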

root@maas-server:~# juju controllers --format=yaml
controllers:
  maas:
    current-model: default
    user: admin
    access: superuser
    recent-server: 10.0.11.151:17070
    uuid: fcae7a6e-6edd-41af-897c-205c7cfd0aa6
    api-endpoints: ['10.0.11.151:17070']
    ca-cert: |#removed
    cloud: maas
    agent-version: 2.0.3
    model-count: 2
    machine-count: 3
    controller-machines:
      active: 1
      total: 1
current-controller: maas

root@maas-server:~# juju deploy ubuntu --to lxd:1

root@maas-server:~# juju status --format=yaml
model:
  name: default
  controller: maas
  cloud: maas
  version: 2.0.3
machines:
  "1":
    juju-status:
      current: down
      message: agent is not communicating with the server
      since: 17 Feb 2017 13:01:20+01:00
      version: 2.0.3
    dns-name: 10.0.11.152
    ip-addresses:
    - 10.0.11.152
    instance-id: x834tt
    machine-status:
      current: running
      message: Deployed
      since: 17 Feb 2017 12:59:58+01:00
    series: xenial
    containers:
      1/lxd/0:
        juju-status:
          current: pending
          since: 17 Feb 2017 13:13:41+01:00
        instance-id: pending
        machine-status:
          current: pending
          since: 17 Feb 2017 13:13:41+01:00
        series: xenial
    hardware: arch=amd64 cores=1 mem=1024M tags=virtual availability-zone=default
applications:
  ubuntu:
    charm: cs:ubuntu-10
    series: xenial
    os: ubuntu
    charm-origin: jujucharms
    charm-name: ubuntu
    charm-rev: 10
    exposed: false
    application-status:
      current: waiting
      message: waiting for machine
      since: 17 Feb 2017 13:13:40+01:00
    units:
      ubuntu/0:
        workload-status:
          current: waiting
          message: waiting for machine
          since: 17 Feb 2017 13:13:40+01:00
        juju-status:
          current: allocating
          since: 17 Feb 2017 13:13:40+01:00
        machine: 1/lxd/0

root@maas-server:~# juju ssh ubuntu@10.0.11.152
ubuntu@juju-lxd-server:~$ sudo -s

root@juju-lxd-server:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ens3 state UP group default qlen 1000
    link/ether 52:54:00:71:54:2a brd ff:ff:ff:ff:ff:ff
3: br-ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:71:54:2a brd ff:ff:ff:ff:ff:ff
    inet 10.0.11.152/23 brd 10.0.11.255 scope global br-ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe71:542a/64 scope link
       valid_lft forever preferred_lft forever
5: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7a:c0:b7:12:23:e4 brd ff:ff:ff:ff:ff:ff
    inet 10.0.11.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::78c0:b7ff:fe12:23e4/64 scope link
       valid_lft forever preferred_lft forever

root@juju-lxd-server:~# ip route
default via 10.0.10.1 dev br-ens3 onlink
10.0.10.0/23 dev br-ens3 proto kernel scope link src 10.0.11.152
10.0.11.0/24 dev lxdbr0 proto kernel scope link src 10.0.11.1

root@juju-lxd-server:~# tail /var/log/juju/machine-1.log
2017-02-17 13:51:55 INFO juju.api apiclient.go:530 dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api"
2017-02-17 13:51:58 INFO juju.api apiclient.go:539 error dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api": websocket.Dial wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api: dial tcp 10.0.11.151:17070: getsockopt: no route to host
2017-02-17 13:51:58 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: unable to connect to API: websocket.Dial wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api: dial tcp 10.0.11.151:17070: getsockopt: no route to host
2017-02-17 13:52:01 INFO juju.api apiclient.go:530 dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api"
2017-02-17 13:52:04 INFO juju.api apiclient.go:539 error dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api": websocket.Dial wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api: dial tcp 10.0.11.151:17070: getsockopt: no route to host
2017-02-17 13:52:04 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: unable to connect to API: websocket.Dial wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api: dial tcp 10.0.11.151:17070: getsockopt: no route to host
2017-02-17 13:52:07 INFO juju.api apiclient.go:530 dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api"
2017-02-17 13:52:10 INFO juju.api apiclient.go:539 error dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api": websocket.Dial wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api: dial tcp 10.0.11.151:17070: getsockopt: no route to host
2017-02-17 13:52:10 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: unable to connect to API: websocket.Dial wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api: dial tcp 10.0.11.151:17070: getsockopt: no route to host
2017-02-17 13:52:13 INFO juju.api apiclient.go:530 dialing "wss://10.0.11.151:17070/model/a94d649e-d081-4a1d-8857-d6c6c5d4e0d6/api"

root@juju-lxd-server:~# ip addr del 10.0.11.1/24 dev lxdbr0
root@juju-lxd-server:~# systemctl restart jujud-machine-1.service

root@juju-lxd-server:~# tail /var/log/juju/machine-1.log
2017-02-17 13:58:41 INFO juju.tools.lxdclient client_image.go:136 copying image for ubuntu-xenial from https://cloud-images.ubuntu.com/releases: 1% (366.17kB/s)
2017-02-17 13:58:44 INFO juju.tools.lxdclient client_image.go:136 copying image for ubuntu-xenial from https://cloud-images.ubuntu.com/releases: 2% (366.29kB/s)
2017-02-17 13:58:48 INFO juju.tools.lxdclient client_image.go:136 copying image for ubuntu-xenial from https://cloud-images.ubuntu.com/releases: 3% (355.46kB/s)
...

root@maas-server:~# juju status --format=yaml
model:
  name: default
  controller: maas
  cloud: maas
  version: 2.0.3
machines:
  "1":
    juju-status:
      current: started
      since: 17 Feb 2017 14:58:26+01:00
      version: 2.0.3
    dns-name: 10.0.11.152
    ip-addresses:
    - 10.0.11.152
    instance-id: x834tt
    machine-status:
      current: running
      message: Deployed
      since: 17 Feb 2017 12:59:58+01:00
    series: xenial
    containers:
      1/lxd/0:
        juju-status:
          current: started
          since: 17 Feb 2017 15:07:43+01:00
          version: 2.0.3
        dns-name: 10.0.11.153
        ip-addresses:
        - 10.0.11.153
        instance-id: juju-d4e0d6-1-lxd-0
        machine-status:
          current: running
          message: Container started
          since: 17 Feb 2017 15:05:47+01:00
        series: xenial
    hardware: arch=amd64 cores=1 mem=1024M tags=virtual availability-zone=default
applications:
  ubuntu:
    charm: cs:ubuntu-10
    series: xenial
    os: ubuntu
    charm-origin: jujucharms
    charm-name: ubuntu
    charm-rev: 10
    exposed: false
    application-status:
      current: active
      message: ready
      since: 17 Feb 2017 15:11:37+01:00
    units:
      ubuntu/0:
        workload-status:
          current: active
          message: ready
          since: 17 Feb 2017 15:11:37+01:00
        juju-status:
          current: idle
          since: 17 Feb 2017 15:12:44+01:00
          version: 2.0.3
        leader: true
        machine: 1/lxd/0
        public-address: 10.0.11.153
    version: "16.04"

root@maas-server:~# dpkg -l | grep juju
ii juju 1:2.0.3-0ubuntu1~16.04.2~juju1 all next generation service orchestration system
ii juju-2.0 1:2.0.3-0ubuntu1~16.04.2~juju1 amd64 Juju is devops distilled - client

Tags: juju maas
Teluka (mateusz-p)
Changed in juju (Ubuntu):
importance: Undecided → High
Anastasia (anastasia-macmood) wrote :

@Mateusz,

I believe you mean "LXD" containers both in bug title and description, right? :D

It also looks like a duplicate of bug #1569106.

Changed in juju (Ubuntu):
status: New → Incomplete
Teluka (mateusz-p) wrote :

@Anastasia

Title and description updated.

Could you please confirm whether this issue will be addressed in 2.1?

Thanks

description: updated
summary: - Juju 2.0.3 fails to deploy LXC container lxdbr0 overlapping subnets
+ Juju 2.0.3 fails to deploy LXD container lxdbr0 overlapping subnets
Changed in juju (Ubuntu):
status: Incomplete → New
John A Meinel (jameinel) wrote :

See also bug #1657850: we probably shouldn't just be using the 'next available 10.0.x' address. (This was the original algorithm used by LXD when we added support, but they have changed their algorithm since then.)
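
For illustration only, a 'next available 10.0.x' search that takes prefix lengths into account could look like the sketch below. It is neither the algorithm Juju shipped nor the one LXD uses; `first_free_subnet` is a hypothetical helper, and the 10.0.x.0/24 candidate range is an assumption taken from the comment above.

```python
import ipaddress


def first_free_subnet(host_cidrs):
    """Return the first 10.0.x.0/24 network that overlaps none of the
    host's interface networks, or None if all 256 candidates are taken."""
    used = [ipaddress.ip_network(cidr, strict=False) for cidr in host_cidrs]
    for third_octet in range(256):
        candidate = ipaddress.ip_network(f"10.0.{third_octet}.0/24")
        # Compare whole networks, so a /23 or wider host network blocks
        # every /24 it contains.
        if not any(candidate.overlaps(network) for network in used):
            return candidate
    return None


if __name__ == "__main__":
    # Only br-ens3 is configured on the machine before lxdbr0 is set up.
    print(first_free_subnet(["10.0.11.152/23"]))
```

Even such a check only sees networks configured on the host itself; subnets reachable through the default gateway would still need to be considered separately.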

affects: juju (Ubuntu) → juju
Changed in juju:
status: New → Confirmed
milestone: none → 2.1.1
milestone: 2.1.1 → 2.2.0
John A Meinel (jameinel)
Changed in juju:
assignee: juju hackers (juju) → nobody
Changed in juju:
status: Confirmed → Triaged
Curtis Hovey (sinzui)
Changed in juju:
status: Triaged → Incomplete
status: Incomplete → Triaged
Anastasia (anastasia-macmood) wrote :
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/7054 for 2.1.1 which is a small tweak from Horacio's change.

John A Meinel (jameinel)
Changed in juju:
status: Triaged → Fix Committed
assignee: nobody → Horacio Durán (hduran-8)
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-rc1 → 2.2-beta1
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released