Canonical Juju

add-machine ssh:user@$IP fails if $IP is associated with lxdbr0

Bug #1655224 reported by Andrew McLeod on 2017-01-10

This bug affects 4 people

Affects                   Status    Importance  Assigned to  Milestone
Canonical Juju            Expired   High        Unassigned
Ubuntu on IBM z Systems   Triaged   High        Unassigned

Bug Description

Juju 2.0.2
Ubuntu 16.04
arch s390x
bootstrapped with manual provider (controller is s390x)

When add-machine is executed against a host by IP address, e.g.

juju add-machine ssh:ubuntu@10.13.3.10

and the IP address specified is assigned to a bridge interface named lxdbr0, the machine stays stuck in the pending state:

1 pending 10.13.3.10 manual:10.13.3.10 xenial

2017-01-10 04:42:47 DEBUG juju.worker.apicaller connect.go:99 connecting with current password
2017-01-10 04:42:47 DEBUG juju.worker.apicaller connect.go:139 failed to connect
2017-01-10 04:42:47 DEBUG juju.worker.dependency engine.go:492 "api-caller" manifold worker stopped: cannot open api: validating info for opening an API connection: missing addresses not valid
2017-01-10 04:42:47 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: validating info for opening an API connection: missing addresses not valid
2017-01-10 04:42:50 DEBUG juju.worker.apicaller connect.go:99 connecting with current password

Full log: http://paste.ubuntu.com/23774484/

If the interface is renamed (e.g. to "notlxdbr0"), the old bridge is removed (brctl delbr lxdbr0), and the same add-machine command is executed again, the agent starts:

2 started 10.13.3.10 manual:10.13.3.10 xenial

http://paste.ubuntu.com/23774518/

Andrew McLeod (admcleod) on 2017-01-10

affects:

juju (Ubuntu) → juju

Ryan Beisner (1chb1n) on 2017-01-10

tags:

added: s390x uosci

Ryan Beisner (1chb1n) on 2017-01-10

Changed in ubuntu-z-systems:
status:	New → Confirmed

Ryan Beisner (1chb1n) on 2017-01-11

tags:

added: multi-lpar

Ryan Beisner (1chb1n) wrote on 2017-01-11:

The following related bug is really the root cause here. If it didn't exist, we wouldn't have to try to munge the bridges ahead of Juju:

"manual provider lxc/lxd units are behind NAT, fail by default"
https://bugs.launchpad.net/juju/+bug/1614364

Anastasia (anastasia-macmood) on 2017-01-12

tags:	added: network
Changed in juju:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → 2.1.0

Frank Heimes (fheimes) on 2017-01-12

Changed in ubuntu-z-systems:
status:	Confirmed → Triaged
importance:	Undecided → High

John A Meinel (jameinel) wrote on 2017-02-14:

To address why I think this happens:

In most deployments, there is a local-only 'lxdbr0' bridge that often gets an unroutable address (10.0.0.1, for example). There is a similar issue for 'virbr0', which always gets 192.168.122.1 (or something very similar to that).

When Juju introspects the machine to find out what addresses it should advertise to the outside world, we intentionally filter out those addresses, because for most people they are simply wrong and actively harmful. (Specifically for 'virbr0': because it always gets the same 192.168.122.1 address, if you install 'libvirt' on your laptop and it is also installed on the remote machine, *both* machines have a valid 192.168.122.1 address, but they are most definitely not the same.)

In the case of 'lxdbr0' we run into a similar issue. The default rules for IP address ranges that Juju assigns exacerbate this (we default to 10.0.0.X for lxdbr0, which is a different bug).

I have the feeling that in your case, you are changing lxdbr0 to put 'eth0' onto the lxdbr0 bridge, which means the *bridge* device no longer has a '.1' address; instead, lxdbr0 now has the remotely assigned address that eth0 used to have. That means we're now filtering one of the addresses that the host machine *was* using for things in the outside world to talk to it. And especially if you only have one network interface (i.e., no eth1/ens4/etc.), there is no other address that we could associate and have the containers find us. (And even if we have an eth1, it may be intentional that containers attached to eth0 can't route to the address on eth1.)
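
To make the reasoning above concrete, here is a minimal sketch of name-based address filtering. It is not Juju's actual code: the function name advertised_addresses, the FILTERED_BRIDGES set, and the interface data are hypothetical, and it assumes filtering is done purely on the interface name.

```python
# Hypothetical sketch of name-based address filtering.
# Not Juju's implementation; names and data are illustrative assumptions.

FILTERED_BRIDGES = {"lxdbr0", "virbr0"}  # assumed local-only bridge names


def advertised_addresses(interfaces):
    """Return addresses from interfaces not named like a local-only bridge."""
    return [
        addr
        for name, addrs in interfaces.items()
        if name not in FILTERED_BRIDGES
        for addr in addrs
    ]


# Typical host: lxdbr0 holds a local-only address, eth0 holds the real one.
typical = {"eth0": ["10.13.3.10"], "lxdbr0": ["10.0.0.1"]}

# Host from this report: eth0 is attached to lxdbr0, so the bridge
# itself now carries the externally reachable address.
bridged = {"eth0": [], "lxdbr0": ["10.13.3.10"]}

print(advertised_addresses(typical))  # ['10.13.3.10']
print(advertised_addresses(bridged))  # []
```

In the second case nothing is left to advertise, which would be consistent with the "missing addresses not valid" error in the agent log.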

For now, the answer is to not use lxdbr0 for the name of a bridge that you want connected to the host network. Our standard convention is to use "br-DEVICENAME", so "br-eth0" or "br-ens3".
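
As a rough illustration of that convention, the helpers below derive a "br-DEVICENAME" bridge name and flag names that are likely to be treated as local-only. The helper names and the RESERVED_BRIDGE_NAMES set are hypothetical and only reflect the two bridge names discussed in this bug.

```python
# Hypothetical helpers illustrating the "br-DEVICENAME" convention.
# The reserved set is an assumption based on the names mentioned in this bug.

RESERVED_BRIDGE_NAMES = {"lxdbr0", "virbr0"}


def host_bridge_name(device):
    """Build a bridge name for a host-network bridge, e.g. eth0 -> br-eth0."""
    return f"br-{device}"


def is_safe_host_bridge_name(name):
    """Return False for names expected to be filtered as local-only bridges."""
    return name not in RESERVED_BRIDGE_NAMES


print(host_bridge_name("eth0"))            # br-eth0
print(is_safe_host_bridge_name("lxdbr0"))  # False
print(is_safe_host_bridge_name("br-ens3")) # True
```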

Anastasia (anastasia-macmood) wrote on 2017-02-14:

Removing 2.1 milestone as we will not be addressing this issue in 2.1.

Changed in juju:
milestone:	2.1.0 → none

Frank Heimes (fheimes) on 2017-07-03

tags:

added: openstack-ibm

Canonical Juju QA Bot (juju-qa-bot) wrote on 2022-11-03:

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status:	Triaged → Expired
tags:	added: expirebugs-bot
