Canonical Juju

container addressability: lxc/lxd units are behind NAT on manual and openstack providers

Bug #1614364 reported by Ryan Beisner on 2016-08-18

This bug affects 14 people

	Status	Importance	Assigned to
Canonical Juju	Triaged	High	John A Meinel
2.1	Won't Fix	High	Unassigned
2.2	Won't Fix	Undecided	Unassigned
Ubuntu on IBM z Systems	Triaged	High	Unassigned
juju-core	Won't Fix	Critical	Unassigned
1.25	Won't Fix	Critical	Unassigned

Bug Description

1.25.6: Charm applications deployed to lxc units on multiple manual machines with the manual provider are guaranteed to fail by default.

This is because the lxc units sit behind a NAT bridge interface on each manual machine. The lxc units are not reachable from the controller, and lxc units on a manual machine cannot communicate with lxc units on another manual machine.

An over-simplification of what I'm seeing:

### One Simple Network
192.168.100.0/24

### Bastion (bootstrapped here) - 16.04
This could be your laptop.
192.168.100.10/24

### Machine 1 - 16.04
192.168.100.11/24

1/lxc/0:
10.0.3.12/24

1/lxc/1:
10.0.3.13/24

### Machine 2 - 16.04
192.168.100.12/24

2/lxc/0:
10.0.3.12/24

2/lxc/1:
10.0.3.15/24

### Machine 3 - 16.04
192.168.100.13/24

3/lxc/0:
10.0.3.13/24

3/lxc/1:
10.0.3.22/24

I think a more sane default behavior for the manual provider would be to configure the bridge as a pure L2 ('transparent') bridge, similar to what the Juju MAAS provider creates.

This would require that the user have pre-existing DHCP and DNS services ready on the network in advance. But I think that is in line with the spirit of the manual provider, and that can be documented accordingly.

If this turns out not to be something that is addressed, the docs should be updated to indicate --to lxc:foo is not supported with the manual provider in a default machine configuration.


Anastasia (anastasia-macmood) on 2016-08-19

Changed in juju-core:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → 2.0-beta17

Ryan Beisner (1chb1n) wrote on 2016-08-19:

juju-manual-provider-foo.txt (22.5 KiB, text/plain)

For a real-world example, including a manual reproducer, juju status output and connectivity checks, see the attached file.

The setup is two machines, with 10 containers on each machine.

From the "LAN", all containers are unreachable; metal hosts are reachable.

From a metal host, containers on other metal hosts are unreachable; containers on that metal host are reachable.

From any given container, containers on other metal hosts are unreachable; containers on the same metal host are reachable.

This is all as expected given the L3 NAT.
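The observations above follow a simple rule. This small Python sketch encodes that rule, and it models the reported results, not Juju's code. The node naming is hypothetical: "lan" stands for a LAN client such as the bastion, a bare machine id such as "1" for a metal host, and unit-style ids such as "1/lxc/0" for containers. Treating hosts as reachable from containers is an assumption, based on outbound container traffic being masqueraded.

```python
from typing import Optional, Tuple


def classify(node: str) -> Tuple[str, Optional[str]]:
    """Return (kind, machine id) for 'lan', a machine id, or a container id."""
    if node == "lan":
        return "lan", None
    if "/" in node:
        # e.g. "2/lxc/1" lives on machine "2"
        return "container", node.split("/", 1)[0]
    return "host", node


def reachable(src: str, dst: str) -> bool:
    """Whether dst can be reached from src when containers sit behind per-host NAT."""
    _, src_machine = classify(src)
    dst_kind, dst_machine = classify(dst)
    if dst_kind != "container":
        # Metal hosts are on the LAN; containers reach them via masquerading (assumed).
        return True
    # The NAT bridge only admits traffic originating on the same machine.
    return src_machine == dst_machine


for src, dst in [("lan", "1/lxc/0"), ("1", "1/lxc/1"), ("1/lxc/0", "2/lxc/0")]:
    print(src, "->", dst, reachable(src, dst))
```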

Canonical Juju QA Bot (juju-qa-bot) on 2016-08-23

affects:	juju-core → juju
Changed in juju:
milestone:	2.0-beta17 → none
milestone:	none → 2.0-beta17

Ryan Beisner (1chb1n) wrote on 2016-08-23:

Tagging with s390x for tracking purposes, although this also impacts amd64.

tags:

added: amd64 s390x

Canonical Juju QA Bot (juju-qa-bot) on 2016-08-23

Changed in juju-core:
importance:	Undecided → Critical
status:	New → Triaged

Ryan Beisner (1chb1n) wrote on 2016-08-23:

Workaround:

Pre-configure the bridge on each host as a true L2 (transparent) bridge before adding the machines into the model/environment. Units deployed to lxc/lxd will then be on the same L2 network as their host, and will obtain IP addresses via DHCP. This assumes that there is a DHCP server on the host's connected broadcast domain which is present, ready and willing to serve. :)
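The Python sketch below shows one way to pre-configure such a bridge on a 16.04-era host. It assumes ifupdown with the bridge-utils package installed. The function name and the NIC/bridge names eth0 and br0 are hypothetical placeholders. The bridge takes over the host's DHCP-assigned address and does no NAT, so anything attached to it sits on the same L2 segment as the host. Attaching the containers to br0 is not shown.

```python
def l2_bridge_stanza(nic: str = "eth0", bridge: str = "br0") -> str:
    """Render ifupdown config that enslaves `nic` to a transparent (non-NAT) bridge.

    The physical NIC gets no address of its own; the bridge requests one
    from whatever DHCP server is present on the host's broadcast domain.
    """
    return "\n".join([
        f"auto {nic}",
        f"iface {nic} inet manual",
        "",
        f"auto {bridge}",
        f"iface {bridge} inet dhcp",
        f"    bridge_ports {nic}",
        "    bridge_stp off",  # no spanning tree on a single-uplink bridge
        "    bridge_fd 0",     # no forwarding delay when the port comes up
        "",
    ])


print(l2_bridge_stanza())
```

The output is intended for /etc/network/interfaces (or a file under /etc/network/interfaces.d/). Applying it over a remote session will briefly drop the host's address while the bridge comes up.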

Alexis Bruemmer (alexis-bruemmer) on 2016-08-29

Changed in juju:
status:	Triaged → Invalid

Anastasia (anastasia-macmood) on 2016-09-01

Changed in juju:
milestone:	2.0-beta17 → none

Anastasia (anastasia-macmood) on 2016-09-01

Changed in juju-core:
status:	Triaged → Won't Fix

Andrew Cloke (andrew-cloke) wrote on 2016-09-08:

As the Juju manual provider is currently the only option for IBM z, this issue gives OpenStack deployments on z a less-than-ideal user experience. My understanding is that the workaround is not something that would be advised outside of a development environment.

Could you expand on the rationale for the decision not to fix?

Alexis Bruemmer (alexis-bruemmer) wrote on 2016-09-08: Re: manual provider lxc units are behind NAT, fail by default

Just to be clear, this bug has been marked Invalid for 2.0 because it concerns lxc on 1.25; it is still marked, and considered, Critical for 1.25.7.

Ryan Beisner (1chb1n) on 2016-09-09

description:

updated

Lou Peers (louie-pe) wrote on 2016-09-22:

Is there a different bug we should be tracking? Also, can we have an update indicating when this might be looked at?
Thank you!

Anastasia (anastasia-macmood) wrote on 2016-09-22:

We are planning to address this bug for the next release of 1.25 - 1.25.7.

Richard Harding (rharding) wrote on 2016-11-06: Re: manual provider lxc/lxd units are behind NAT, fail by default

#10

Renamed to be about lxd as well. The idea is that containers on providers that don't support spaces don't have the ability to get a DHCP address on the host network.

This requires changes to the way 1.25 works, and 1.25 is in critical-fixes-only support, so this is marked Won't Fix for 1.25.

However, for 2.0 we need to support networking to allow the user to let Juju know there's something on the network that can provide host-level addresses for containers. This would then work across any provider.

summary:
- manual provider lxc units are behind NAT, fail by default
+ manual provider lxc/lxd units are behind NAT, fail by default
Changed in juju:
status:	Invalid → Triaged
milestone:	none → 2.2.0

Ryan Beisner (1chb1n) wrote on 2016-11-08:

#11

OK. So IMHO, the 2.0 equivalent of this should be just as critical (it is an s390x multi-LPAR OpenStack deployment blocker). Unless a human manually configures bridges on all machines before the Juju manual provider is in the picture, the same containers-behind-NAT situation exists.

tags:

added: repeatability

Anastasia (anastasia-macmood) on 2016-11-08

Changed in juju:
importance:	High → Critical
milestone:	2.2.0 → 2.0.3
assignee:	nobody → Tim Penhey (thumper)

Anastasia (anastasia-macmood) on 2016-11-10

Changed in juju:
assignee:	Tim Penhey (thumper) → nobody
importance:	Critical → High
milestone:	2.0.3 → 2.2.0

John A Meinel (jameinel) wrote on 2016-11-10:

#12

Here are my thoughts.
This can't be Critical, as we clearly haven't stopped the line. And there is a reasonably simple workaround: juju run lxd init, then reconfigure the bridge. It's fully scriptable, and if you are using the manual provider I would be surprised if you have that many machines without scripting around adding them anyway.

That said, bridging to the host NIC seems the saner default. If you don't have DHCP available you'll get unusable containers, but they are unusable anyway if they are hidden behind NAT.

But just doing that doesn't really solve the manual provider's issues, because we haven't really brought those machines into the model. What happens if you add a machine and, when you get there, you find there are 2 NICs?

Even if you have added the 2-NIC machine, we don't have a way to describe where you want workloads to listen.

We also don't have anything like a description of storage on the machine, or of where to get more IP addresses, etc.: all the things that really bring a machine into the Juju model.

Now, if there is a stakeholder escalation, then we can make this Critical and focus on it, but standard bug triage wouldn't put it there.

Andrew Cloke (andrew-cloke) wrote on 2016-11-10:

#13

I would like to make a stakeholder escalation. This bug impacts running OpenStack on IBM z using single or multiple LPARs and LXD containers. The automation required around OpenStack installation makes the workaround impracticable. It is not something that we would want to recommend.

cargonza (cargonza) wrote on 2016-11-15:

#14

Hi, any decision on the path for this bug? A stakeholder escalation has been requested and we would like to get an update on the next steps. Thank you!

Richard Harding (rharding) wrote on 2016-11-15:

#15

At the moment we're in the state that John notes. We have a workaround, and we've got work in flight to improve container networking that the team is focused on. Unfortunately, this isn't a single "patch a bug" type of fix. Our goal is to get improved container networking, aimed primarily at correcting issues on MAAS, into the 2.1 release before the holiday break.

We will take this bug and attempt to provide a tasteful path forward as part of that work.

Changed in juju:
milestone:	2.2.0 → 2.1.0
assignee:	nobody → Richard Harding (rharding)

Frank Heimes (fheimes) on 2016-11-29

tags:

added: openstack-ibm

Ryan Beisner (1chb1n) wrote on 2017-01-11:

#16

Confirmed s390x multi-lpar blocker @ juju 2.1~beta4-0ubuntu1~16.04.1~juju1.

FWIW, we were able to work around this with Juju 1 by carefully crafting the bridge in advance to not do NAT.

For Juju2, we've not found a successful workaround yet. See bugs raised in attempts to work around this issue in Juju2:

https://bugs.launchpad.net/juju/+bug/1655224
https://bugs.launchpad.net/juju/+bug/1655229
https://bugs.launchpad.net/juju/+bug/1575676

tags:

added: multi-lpar

Ryan Beisner (1chb1n) on 2017-01-11

Changed in ubuntu-z-systems:
status:	New → Confirmed
summary:
- manual provider lxc/lxd units are behind NAT, fail by default
+ juju1 and juju2 - manual provider lxc/lxd units are behind NAT, fail by default

Ryan Beisner (1chb1n) wrote on 2017-01-11: Re: juju1 and juju2 - manual provider lxc/lxd units are behind NAT, fail by default

#17

To be clear, this affects s390x, but is not specific to s390x. The behavior is reproducible on any arch that I have tried.

Anastasia (anastasia-macmood) wrote on 2017-01-12:

#18

Changed to 'Critical', as attempts to work around this issue cause even more issues, per Ryan's comment #16.

Changed in juju:
importance:	High → Critical

Frank Heimes (fheimes) on 2017-01-12

Changed in ubuntu-z-systems:
importance:	Undecided → Critical
status:	Confirmed → Triaged

Anastasia (anastasia-macmood) on 2017-01-30

Changed in juju:
assignee:	Richard Harding (rharding) → John A Meinel (jameinel)

Anastasia (anastasia-macmood) on 2017-02-01

Changed in juju:
milestone:	2.1.0 → 2.1-rc1

Anastasia (anastasia-macmood) on 2017-02-08

Changed in juju:
milestone:	2.1-rc1 → 2.2.0-alpha1

Anastasia (anastasia-macmood) wrote on 2017-02-09:

#19

As this is not likely to be addressed in 2.1, we are removing it from current milestone.

Curtis Hovey (sinzui) on 2017-02-16

summary:
- juju1 and juju2 - manual provider lxc/lxd units are behind NAT, fail by default
+ container addresability: manual provider lxc/lxd units are behind NAT, fail by default on juju1 and juju2
summary:
- container addresability: manual provider lxc/lxd units are behind NAT, fail by default on juju1 and juju2
+ container addressability: manual provider lxc/lxd units are behind NAT, fail by default on juju1 and juju2

Ryan Beisner (1chb1n) wrote on 2017-02-16: Re: container addressability: manual provider lxc/lxd units are behind NAT, fail by default on juju1 and juju2

#20

Updating description as the corresponding Juju openstack-provider bug [1] was marked as a duplicate.
[1] https://bugs.launchpad.net/juju/+bug/1615917

tags:	added: openstack-provider
summary:
- container addressability: manual provider lxc/lxd units are behind NAT, fail by default on juju1 and juju2
+ container addressability: lxc/lxd units are behind NAT on manual and openstack providers

John A Meinel (jameinel) wrote on 2017-02-28:

#21

At this point, I won't be able to address this for 2.1.1, I should, however, be able to get it for 2.2. Doing this correctly is going to need a fair bit of testing, and it isn't reasonable to put it directly into a stable 2.1 series without that testing. As 2.2 is likely to be hot on the heels of 2.1.1 anyway, it doesn't really make sense to target a 2.1.X series, but *if* we've proven the work in 2.2, I'll likely make sure to do it in a way that we *could* backport it to 2.1 if there is reason to do so.

Anastasia (anastasia-macmood) wrote on 2017-02-28:

#22

Marking as Fix Committed for 2.1.1, as we have put in a solution deemed sufficient by stakeholders. The fix requires the bridge to be set up in advance.

A follow-on fix that will not require setting up the bridge in advance is being worked on as part of 2.2 (see comment #21).

Ryan Beisner (1chb1n) wrote on 2017-02-28:

#23

As a stakeholder in this issue, I'm not in agreement with #22. 'Won't Fix' is a more accurate triage level for 2.1, based on the stated timeline.

Anastasia (anastasia-macmood) wrote on 2017-03-01:

#24

@Ryan Beisner (1chb1n),
The stakeholder endorsement I referred to came from an internal conversation with a different group, not OpenStack/OSCI.
Are you saying that the fix that went into 2.1-beta5, which required the bridge to be set up beforehand and outside of Juju, does not address this issue for you?

We are saying that we have tackled it partially: the behavior and the workaround are improved. This got some of the affected people unstuck enough that they can proceed. We are working on a cleaner solution for later releases.

Ryan Beisner (1chb1n) wrote on 2017-03-01:

#25

The essence of this bug is that containers are behind NAT by default on the manual provider and the openstack provider. As far as I'm aware, there have been no commits to change that behavior.

Anastasia (anastasia-macmood) wrote on 2017-03-01:

#26

Marking as Won't Fix for 2.1, per the recommendation and feedback about the workaround.

Anastasia (anastasia-macmood) wrote on 2017-03-10:

#27

Due to time and resource constraints, moving this issue into next release.

Changed in juju:
milestone:	2.2-alpha1 → 2.3.0

Curtis Hovey (sinzui) on 2017-04-08

Changed in juju:
importance:	Critical → High

Tim Penhey (thumper) wrote on 2017-11-07:

#28

John, is this now solved with the FAN config?
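For context on the FAN question, here is a minimal Python sketch of the address arithmetic behind a fan-style overlay. It shows the mapping only, not how Juju configures the fan. The 192.168.0.0/16 underlay and 250.0.0.0/8 overlay are illustrative values, and fan_subnet is a hypothetical helper name. Each host's bits within the underlay are appended to the overlay prefix. That gives every host its own container subnet instead of every host reusing 10.0.3.0/24 behind NAT.

```python
import ipaddress


def fan_subnet(host_ip: str, underlay: str, overlay: str) -> ipaddress.IPv4Network:
    """Return the overlay subnet a host would own under a fan-style mapping.

    The host's bits within the underlay are appended to the overlay prefix,
    so a /16 underlay combined with a /8 overlay gives each host a /24.
    """
    host = ipaddress.IPv4Address(host_ip)
    under = ipaddress.IPv4Network(underlay)
    over = ipaddress.IPv4Network(overlay)
    if host not in under:
        raise ValueError(f"{host} is not inside underlay {under}")
    host_bits = int(host) - int(under.network_address)
    prefix = over.prefixlen + (32 - under.prefixlen)
    if prefix > 32:
        raise ValueError("overlay is too small for this underlay")
    # Place the host bits immediately after the overlay prefix.
    subnet = int(over.network_address) | (host_bits << (32 - prefix))
    return ipaddress.IPv4Network((subnet, prefix))


print(fan_subnet("192.168.100.11", "192.168.0.0/16", "250.0.0.0/8"))
```

With the machines from the description, 192.168.100.11 would own 250.100.11.0/24 and 192.168.100.12 would own 250.100.12.0/24. Those subnets do not overlap, so participating hosts can route between them via fan encapsulation.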

Changed in juju:
milestone:	2.3.0 → 2.3-beta3

John A Meinel (jameinel) on 2017-11-09

Changed in juju:
status:	Triaged → Fix Committed
status:	Fix Committed → Triaged

Canonical Juju QA Bot (juju-qa-bot) on 2017-11-10

Changed in juju:
milestone:	2.3-beta3 → none

Frank Heimes (fheimes) wrote on 2018-02-27:

#29

bump

Is there a revised plan to provide this improved container addressability?

John A Meinel (jameinel) wrote on 2018-03-19: Re: [Bug 1614364] Re: container addressability: lxc/lxd units are behind NAT on manual and openstack providers

#30

In 2.3 you should be able to set 'container-networking-method=provider', and then we will try to allocate IP addresses for containers via DHCP. It has not been tested very rigorously by us, so there may be issues that we are missing. (Manual machines and spaces are generally not tracked very well, so determining which network device needs to be bridged into the containers could easily be a missing linkage.)

Theoretically, with manual bootstrap and provisioning you would be able to use "juju add-space NAME CIDR" to declare a space for one of the network devices.
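The "missing linkage" described above comes down to one question. Given a space's CIDR, as declared with juju add-space NAME CIDR, which of a manual machine's devices should be bridged for containers? Below is a hypothetical Python sketch of that lookup. The function name, interface names and addresses are illustrative, and this is not Juju's implementation.

```python
import ipaddress
from typing import Dict, Optional


def device_for_space(interfaces: Dict[str, str], space_cidr: str) -> Optional[str]:
    """Return the first device (in name order) whose address lies in the space CIDR.

    `interfaces` maps device names to "address/prefix" strings, e.g. "eth0".
    Returns None when no device is on the space's subnet.
    """
    space = ipaddress.ip_network(space_cidr)
    for name, addr in sorted(interfaces.items()):
        if ipaddress.ip_interface(addr).ip in space:
            return name
    return None


# A two-NIC manual machine, as in the 2-NIC scenario raised in comment #12.
nics = {"eth0": "192.168.100.11/24", "eth1": "10.20.0.11/24"}
print(device_for_space(nics, "10.20.0.0/24"))
```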


Frank Heimes (fheimes) wrote on 2019-05-29:

#31

Lowering importance - based on discussion in mail thread ... (and #30).

Changed in ubuntu-z-systems:
importance:	Critical → High

David (liewebagency-deactivatedaccount) wrote on 2020-09-09:

#32

Hello,

https://linuxconfig.org/how-to-setup-a-static-ip-address-on-debian-linux

You know the issue that many of us have with netool? It generates a .yaml file and names the network interface enp5s0 (in my case) instead of eth0 and eth0:1 with DHCP, which would let us see what IP addresses we have.

I'm going to try the URL above now; it involves configuring grub.cfg. When I opened it, it said it was a Debian kernel or grub.cfg. I don't know if this is normal, but I'm starting to wonder whether my dedicated server provider uses the same rescue mode to install all net installers, in which case it would make sense.

I ask because I've been a customer at Leaseweb for 3 years and at altushost for a year, and they had whole images we could choose from, not a common rescue system we had to use to install the images.

Is it just me, or could this be the reason? Only a week ago I was at altushost, where it was a normal eth0 and I could see all my IPs because of DHCP.

//D

Canonical Juju QA Bot (juju-qa-bot) wrote on 2022-11-03:

#33

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance:	High → Low
tags:	added: expirebugs-bot

Felipe Reyes (freyes) wrote on 2023-09-20:

#35

This bug has been on our radar for a long time. This feature gap prevents us from testing configurations that are closer to what end users deploy (hyperconverged clouds), so I'm setting this bug back to High.