add-machine ssh:user@$IP fails if $IP is associated with lxdbr0

Bug #1655224 reported by Andrew McLeod
This bug affects 4 people
Affects                  Status   Importance  Assigned to  Milestone
Ubuntu on IBM z Systems  Triaged  High        Unassigned
juju                     Expired  High        Unassigned

Bug Description

Juju 2.0.2
Ubuntu 16.04
arch s390x
bootstrapped with manual provider (controller is s390x)

When add-machine is executed against a host via IP, e.g.

juju add-machine ssh:ubuntu@10.13.3.10

and the IP specified is associated with a bridge interface named lxdbr0, the machine stays stuck in the pending state:

1 pending 10.13.3.10 manual:10.13.3.10 xenial

2017-01-10 04:42:47 DEBUG juju.worker.apicaller connect.go:99 connecting with current password
2017-01-10 04:42:47 DEBUG juju.worker.apicaller connect.go:139 failed to connect
2017-01-10 04:42:47 DEBUG juju.worker.dependency engine.go:492 "api-caller" manifold worker stopped: cannot open api: validating info for opening an API connection: missing addresses not valid
2017-01-10 04:42:47 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: validating info for opening an API connection: missing addresses not valid
2017-01-10 04:42:50 DEBUG juju.worker.apicaller connect.go:99 connecting with current password

full log: http://paste.ubuntu.com/23774484/

If the interface is renamed (e.g. to "notlxdbr0"), the old bridge is removed (brctl delbr lxdbr0), and the same add-machine command is executed again, the agent starts:

2 started 10.13.3.10 manual:10.13.3.10 xenial

http://paste.ubuntu.com/23774518/

Andrew McLeod (admcleod)
affects: juju (Ubuntu) → juju
Ryan Beisner (1chb1n)
tags: added: s390x uosci
Ryan Beisner (1chb1n)
Changed in ubuntu-z-systems:
status: New → Confirmed
Ryan Beisner (1chb1n)
tags: added: multi-lpar
Ryan Beisner (1chb1n) wrote :

The following related bug is really the root cause here. If it didn't exist, we wouldn't have to try to munge the bridges ahead of Juju:

"manual provider lxc/lxd units are behind NAT, fail by default"
https://bugs.launchpad.net/juju/+bug/1614364

tags: added: network
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Confirmed → Triaged
importance: Undecided → High
John A Meinel (jameinel) wrote :

Here is why I think this happens.

In most deployments, there is a local-only 'lxdbr0' bridge that often gets an unroutable address (10.0.0.1, for example). There is a similar issue for 'virbr0', which always gets 192.168.122.1 (or something very similar to that).

When Juju introspects the machine to find out what addresses it should advertise to the outside world, we intentionally filter out those addresses, because for most people they are concretely wrong, and actually actively harmful. (Specifically for 'virbr0': because it always gets the same 192.168.122.1 address, if 'libvirt' is installed on both your laptop and the remote machine, *both* machines have a valid 192.168.122.1 address, but they are most definitely not the same machine.)

In the case of 'lxdbr0' we run into a similar issue. The default rules for IP address ranges that Juju assigns exacerbate this (we default to 10.0.0.X for lxdbr0, which is a different bug.)

I have the feeling that in your case you changed lxdbr0 to put 'eth0' onto the bridge, which means the *bridge* device no longer has a '.1' address; instead, lxdbr0 now has the remotely assigned address that eth0 used to have. That means we're now filtering out one of the addresses that the outside world *was* using to talk to the host machine. And especially if you only have one network interface (i.e., no eth1/ens4/etc), there is no other address that we could associate and have the containers find us. (And even if we have an eth1, it may be intentional that containers attached to eth0 can't route to the address on eth1.)
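
The filtering described above can be illustrated with a minimal sketch. This is not Juju's actual code: the function name, the set of filtered names, and the input structure are all assumptions made for illustration. It only shows why a host whose sole routable address lives on an interface named lxdbr0 ends up with nothing to advertise.

```python
# Illustrative sketch only; this is NOT Juju's implementation.
# The interface names come from the comment above; treating them as a
# simple name-based deny list is an assumption.
FILTERED_INTERFACES = {"lxdbr0", "virbr0"}


def advertised_addresses(interfaces):
    """Return the addresses left after dropping filtered interfaces.

    `interfaces` maps an interface name to a list of its IP addresses.
    """
    kept = []
    for name, addresses in interfaces.items():
        if name in FILTERED_INTERFACES:
            continue  # assumed local-only, so never advertised
        kept.extend(addresses)
    return kept


# Typical host: lxdbr0 is a local-only bridge, eth0 carries the real address.
typical = {"eth0": ["10.13.3.10"], "lxdbr0": ["10.0.0.1"]}

# Host from this report: eth0 was put onto lxdbr0, so the bridge carries
# the routable address and eth0 has none of its own.
bridged = {"eth0": [], "lxdbr0": ["10.13.3.10"]}

print(advertised_addresses(typical))
print(advertised_addresses(bridged))
```

In the second case the only usable address is dropped along with the bridge name, which would be consistent with the "missing addresses" error in the log.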

For now, the answer is to not use lxdbr0 for the name of a bridge that you want connected to the host network. Our standard convention is to use "br-DEVICENAME", so "br-eth0" or "br-ens3".
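
As a rough pre-flight check before running add-machine, the sketch below looks up which interface holds a given address in the one-line-per-address output of `ip -o -4 addr show` and flags the names mentioned in this thread. The helper names and the sample output are hypothetical, and the parsing assumes the usual iproute2 one-line format.

```python
import re

# Names called out in this thread as being filtered; assumed, not exhaustive.
RISKY_NAMES = {"lxdbr0", "virbr0"}

# Hypothetical sample of `ip -o -4 addr show` output for the host in this report.
SAMPLE = r"""1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
3: lxdbr0    inet 10.13.3.10/24 brd 10.13.3.255 scope global lxdbr0\       valid_lft forever preferred_lft forever"""


def interface_for_address(ip_output, address):
    """Return the name of the interface holding `address`, or None."""
    for line in ip_output.splitlines():
        match = re.match(r"\d+:\s+(\S+)\s+inet\s+(\S+?)/\d+", line)
        if match and match.group(2) == address:
            return match.group(1)
    return None


def is_risky(ip_output, address):
    """True if `address` sits on an interface whose name is in RISKY_NAMES."""
    return interface_for_address(ip_output, address) in RISKY_NAMES


if is_risky(SAMPLE, "10.13.3.10"):
    print("address is on a filtered bridge name; consider renaming to br-DEVICENAME")
```

On a real host the input would come from running `ip -o -4 addr show` on the target machine rather than from the sample string.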

Anastasia (anastasia-macmood) wrote :

Removing 2.1 milestone as we will not be addressing this issue in 2.1.

Changed in juju:
milestone: 2.1.0 → none
Frank Heimes (fheimes)
tags: added: openstack-ibm
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status: Triaged → Expired
tags: added: expirebugs-bot