Bug #1627037 “rc1 bridges all nics, breaks neutron-gateway” : Bugs : Canonical Juju

Chris Gregan (cgregan) on 2016-09-23

tags:

added: cdo-qa-blocker

Andrew McDermott (frobware) wrote on 2016-09-23:

#1

Is it possible to change the charm to consider what "in use" means? If we don't bridge the device, it will never be available in an LXD container, should you want to run neutron-gateway there.

George Kraft (cynerva) on 2016-09-23

tags:

added: v-pil

🤖 Landscape Builder (landscape-builder) on 2016-09-23

tags:

removed: kanban-cross-team

Andreas Hasenack (ahasenack) wrote on 2016-09-23:

#2

Andrew, that's a question for the openstack guys.

From my side, I would like to better understand the LXD issue you mention. When using the MAAS provider, all containers (LXD or LXC) will, with beta18 and older at least, get an IP from MAAS. The PXE interface will be bridged and the containers will hook up there.

Is the problem that you have a use case where containers need to hook up to another NIC? How is this other NIC configured in MAAS for that node?

Ryan Beisner (1chb1n) wrote on 2016-09-23:

#3

I think that if a NIC is set as 'unconfigured' in MAAS, then Juju should not touch it in any way.

tags:

added: uosci

Torsten Baumann (torbaumann) on 2016-09-23

Changed in juju:
milestone:	none → 2.0-rc2

Richard Harding (rharding) wrote on 2016-09-23:

#4

Ryan, the issue is that we've had feedback in both directions: Ante and other folks report that things don't work unless Juju does touch the NICs. We're trying to find a middle ground to help both parties here.

Ante Karamatić (ivoks) wrote on 2016-09-23:

#5

There are two use cases. One is where one needs containers on a NIC and the other one where one wants to leave the NIC unconfigured.

In the betas (I haven't tested the RC yet), a bridge was created only when a subnet was configured on a NIC in MAAS. This allows connecting an LXD container (Juju creates a bridge) without exposing the host on the same layer 3.

If a NIC is not configured at all in MAAS, i.e. no subnet is configured for it, I would argue that it shouldn't be in /etc/network/interfaces in the first place. Therefore, Juju won't see it and won't do anything with it. That still doesn't prevent one from renaming the interface and using it for whatever is needed (in this case neutron-gateway).

So, Andreas, check whether your NIC has a subnet configured. If it does, remove the subnet, because you really don't need it (you only need a layer 2 connection, i.e. a fabric). If you still see the same behaviour after that, then the bug is in MAAS: it should not define the interface in ENI.

If you do not define a subnet, MAAS doesn't put the interface in ENI, and Juju still creates a bridge, then that's a bug in Juju. Considering how Juju works, my understanding is that this is impossible.

Configured fabric, configured subnet -> MAAS creates 'manual' entry in ENI, juju creates the bridge
Configured fabric, unconfigured subnet -> MAAS doesn't create entry in ENI, Juju doesn't create the bridge
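
The two cases above can be sketched as a small decision function. This is only an illustration of the rule as proposed in this comment, not Juju or MAAS code; the `Nic` type, its field names, and the sample fabric and subnet values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nic:
    # Hypothetical model of a NIC as configured in MAAS.
    name: str
    fabric: Optional[str] = None
    subnet: Optional[str] = None


def in_eni(nic: Nic) -> bool:
    """Proposed rule: MAAS writes an ENI entry only when a subnet is set."""
    return nic.fabric is not None and nic.subnet is not None


def juju_bridges(nic: Nic) -> bool:
    """Juju only sees NICs that appear in ENI, so it bridges only those."""
    return in_eni(nic)


nics = [
    Nic("eth0", fabric="fabric-0", subnet="10.0.0.0/24"),
    Nic("eth1", fabric="fabric-1"),  # layer 2 only, e.g. for neutron-gateway
]
for nic in nics:
    print(nic.name, "bridge" if juju_bridges(nic) else "leave alone")
```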

This way everything can be decided in MAAS. Another approach is the one the neutron-contrail charm takes: it provides an option for removing the bridge. While this works, it creates two places for networking configuration, which would be bad design.

George Kraft (cynerva) wrote on 2016-09-23:

#6

FWIW looks like we can work around this in VPIL infrastructure by configuring neutron-gateway to bridge to e.g. `br-bond0` instead of `bond0`. Don't know how reasonable that is though - I don't fully understand the implications of that change.

Ante Karamatić (ivoks) wrote on 2016-09-23:

#7

@George, while it's possible, you should not do it. OVS is a bridge by itself, and bridging the bridge might produce unwanted results (a bridge terminates some L2 traffic: LLDP, CDP, LACP, STP...). This might also be a bug: the charm should not take a bridge as an argument for a NIC, or it should remove the bridge.

Andreas Hasenack (ahasenack) wrote on 2016-09-23:

#8

We need the NIC connected to the subnet, because we essentially use that for placement and checklist purposes. In the autopilot we show which public networks are available and which machines are connected to each, so we won't, for example, let a user select a machine that has no public network connectivity to be a network gateway.
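
A minimal sketch of that kind of placement check, assuming a simple inventory of which subnets each machine's NICs are attached to. The data layout, names, and addresses are hypothetical and are not the autopilot's actual code.

```python
from typing import Dict, List, Set

# Hypothetical inventory: machine name -> subnets (CIDRs) its NICs are attached to.
MACHINES: Dict[str, Set[str]] = {
    "node-1": {"10.0.0.0/24", "192.0.2.0/24"},
    "node-2": {"10.0.0.0/24"},
}

# Hypothetical set of subnets treated as "public" for this sketch.
PUBLIC_SUBNETS: Set[str] = {"192.0.2.0/24"}


def gateway_candidates(
    machines: Dict[str, Set[str]], public_subnets: Set[str]
) -> List[str]:
    """Return machines with at least one NIC attached to a public subnet."""
    return sorted(
        name for name, subnets in machines.items() if subnets & public_subnets
    )


print(gateway_candidates(MACHINES, PUBLIC_SUBNETS))  # ['node-1']
```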

Ante Karamatić (ivoks) wrote on 2016-09-24:

#9

@Andreas, I see. It makes sense for layer 3 connections. "If a node is connected to subnet from space 'OAM', then it can serve management services", etc.

But in case of neutron-gateway's external port, the subnet it is connected to has no meaning. External port is (and should be) stripped of an IP as it serves as a layer 2 bridge. The only attribute that classifies a port on that machine as capable of running neutron-gateway is fabric. NG allows you to even split that interface into multiple other layer2 interfaces (different external networks on different VLANs).

I guess all I'm saying is that in case of neutron gateway 'public network' is a fabric (or, if we go into complicated setups - a vlan), not a subnet.
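
As a rough illustration of selecting by fabric (or VLAN) rather than by subnet, a candidate filter could ignore layer 3 details entirely. The `Port` type, the sample names, and the convention that VLAN 0 means untagged are assumptions made for this sketch only.

```python
from typing import List, NamedTuple, Optional


class Port(NamedTuple):
    # Hypothetical description of a port on a machine.
    name: str
    fabric: str
    vlan: int  # 0 = untagged (assumption for this sketch)


def external_port_candidates(
    ports: List[Port], fabric: str, vlan: Optional[int] = None
) -> List[str]:
    """Select ports by layer 2 attributes only; subnets and IPs are ignored."""
    return [
        p.name
        for p in ports
        if p.fabric == fabric and (vlan is None or p.vlan == vlan)
    ]


ports = [
    Port("eth0", "fabric-0", 0),
    Port("eth1", "fabric-1", 0),
    Port("eth1.100", "fabric-1", 100),
]
print(external_port_candidates(ports, "fabric-1"))
print(external_port_candidates(ports, "fabric-1", vlan=100))
```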

Anastasia (anastasia-macmood) on 2016-09-26

Changed in juju:
status:	New → Triaged
importance:	Undecided → High
importance:	High → Critical

Anastasia (anastasia-macmood) on 2016-09-26

Changed in juju:
importance:	Critical → Undecided
importance:	Undecided → High
assignee:	nobody → Richard Harding (rharding)

Andres Rodriguez (andreserl) wrote on 2016-09-26:

#10

FWIW, MAAS *can* have an interface that is attached to a subnet but is UNCONFIGURED. In this state, it is usable by neutron. So the user shouldn't have to detach the interface from a subnet in order for this scenario to work.

Richard Harding (rharding) on 2016-09-26

tags:

added: eda

Andrew McDermott (frobware) on 2016-09-27

Changed in juju:
assignee:	Richard Harding (rharding) → Andrew McDermott (frobware)
status:	Triaged → In Progress

Andrew McDermott (frobware) wrote on 2016-09-27:

#11

WIP branches:

https://github.com/frobware/juju/tree/master-lp1627037
https://github.com/dimitern/juju/tree/maas-bridge-some

We have run into an issue with aliases and are looking to resolve it now. Once done, we'll put together a branch/build that combines both of these trees.

Andrew McDermott (frobware) wrote on 2016-09-28:

#12

The combined PR - https://github.com/juju/juju/pull/6342

Andrew McDermott (frobware) on 2016-09-29

Changed in juju:
status:	In Progress → Fix Committed

Curtis Hovey (sinzui) on 2016-09-29

Changed in juju:
status:	Fix Committed → Fix Released

Ante Karamatić (ivoks) wrote on 2016-10-01:

#13

In my tests, RC2's behavior now breaks LXD containers because it creates bridges only on those interfaces that have an IP address configured in MAAS. Interfaces connected to a fabric and a subnet, but without a configured IP, are not converted to a bridge.

I thought the idea was to bridge everything that had a fabric configured (and to leave everything with no fabric configured unbridged). The other part of the work was then for the charms team to implement 'unbridging' of the neutron-gateway interface.

David Britton (dpb) wrote on 2016-10-03:

#14

Confirmed this behavior *works* for neutron-gateway and the autopilot, at least in the case where the second interface is 'Unconfigured' (as agreed).

But, I went further and also confirmed @Ante's concern that if the second NIC is set to 'auto-assign', juju does *not* bridge that interface.

http://paste.ubuntu.com/23271254/

If you look closely at the /etc/network/interfaces there, you will see that eth1 is set to 'manual' with no IP address set at all. In fact, I went another step and deployed just with MAAS. This is the result:

http://paste.ubuntu.com/23271312/

You'll note that both interfaces have IPs assigned from maas -- 'Auto assign' in the UI -- as expected.
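
For anyone wanting to run the same check on a rendered /etc/network/interfaces, here is a minimal sketch that lists interfaces left as 'manual' with no address and not enslaved to a bridge. It handles only the simple ifupdown stanzas shown in the sample; the sample text is made up for illustration and is not the content of the pastes above.

```python
from typing import Dict, List


def parse_eni(text: str) -> Dict[str, dict]:
    """Parse 'iface <name> inet <method>' stanzas from ENI-style text."""
    stanzas: Dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "iface" and len(words) >= 4:
            if words[2] == "inet":
                current = {"method": words[3], "address": None, "bridge_ports": []}
                stanzas[words[1]] = current
            else:
                current = None  # other address families are ignored here
        elif words[0] in ("auto", "source", "mapping") or words[0].startswith("allow-"):
            current = None  # these lines end the current stanza
        elif current is not None and words[0] == "address" and len(words) > 1:
            current["address"] = words[1]
        elif current is not None and words[0] == "bridge_ports":
            current["bridge_ports"] = words[1:]
    return stanzas


def unbridged_manual(stanzas: Dict[str, dict]) -> List[str]:
    """Interfaces that are 'manual', have no address, and sit under no bridge."""
    enslaved = {port for s in stanzas.values() for port in s["bridge_ports"]}
    return sorted(
        name
        for name, s in stanzas.items()
        if s["method"] == "manual"
        and s["address"] is None
        and not s["bridge_ports"]
        and name not in enslaved
    )


# Made-up sample: eth0 is bridged by br-eth0, eth1 is left manual and unbridged.
SAMPLE = """\
auto eth0
iface eth0 inet manual

auto br-eth0
iface br-eth0 inet static
    address 10.0.0.5/24
    bridge_ports eth0

auto eth1
iface eth1 inet manual
"""

print(unbridged_manual(parse_eni(SAMPLE)))  # ['eth1']
```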

Richard Harding (rharding) wrote on 2016-10-04:

#15

After additional discussions we came to the conclusion that Juju should respect the same ideas that MAAS does. In MAAS, a nic is not "configured" until it has an IP address. Setting a fabric, or even a subnet, does not return that as "configured" in MAAS. It's still unconfigured.

Juju will now bridge all "configured" interfaces. If we want to question the definition of configured then we need to address it in our complete stack so that we're consistent and have a reasonable explanation of what users can expect as they put together our tools into a final solution.

The behavior seen in RC2 is expected, and we admit is not ideal for all cases, but we decided that it was better to be consistent and predictable and to work to get layer 2 support into Juju so that it can properly handle the additional cases that folks are looking to put into play.
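
The rule described in this comment can be sketched as follows: an interface counts as "configured" only once it has an IP address, and only configured interfaces are bridged. This is an illustration of the stated rule, not the actual Juju implementation; the `Interface` type, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    # Hypothetical view of an interface as reported by MAAS.
    name: str
    fabric: Optional[str] = None
    subnet: Optional[str] = None
    ip_address: Optional[str] = None


def is_configured(iface: Interface) -> bool:
    """A fabric or subnet alone is not enough; an IP address is required."""
    return iface.ip_address is not None


def should_bridge(iface: Interface) -> bool:
    """RC2 behaviour as described above: bridge only configured interfaces."""
    return is_configured(iface)


examples = [
    Interface("eth0", "fabric-0", "10.0.0.0/24", "10.0.0.5"),
    Interface("eth1", "fabric-1", "192.0.2.0/24"),  # subnet set, no IP
    Interface("eth2", "fabric-1"),                  # fabric only
]
for iface in examples:
    print(iface.name, should_bridge(iface))
```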

Ante Karamatić (ivoks) wrote on 2016-10-04: Re: [Bug 1627037] Re: rc1 bridges all nics, breaks neutron-gateway

#16

I agree there needs to be a vertical agreement on what is what. I do think
we have opposite views on what 'configured NIC' means.

For instance, I'd argue that NIC is configured even without an IP. I would
argue that NIC is just a device, that can have, but doesn't have to,
attached layer2 properties (MTU settings, VLAN ID, etc). Then it can also
have layer 3 properties - an IP address. I would also argue that MAAS
doesn't say that a NIC without an IP is an unconfigured NIC. Not only is
such NIC available on the system, but it also has layer 2 properties - VLAN
ID, MTU. MAAS does some work to make this happen, it configures it. IMHO,
unconfigured NIC would be a NIC without an IP, but also without MTU
settings and VLAN ID and also a NIC that is 'DOWN' (no link-layer). Such
NIC would be unusable by any charm unless the charm goes to an extent of
managing the NIC.

Fabrics and spaces go well in line with this - NIC connected to fabric has
layer2 properties, NIC connected to a space has layer3 properties. Problem
with both Landscape and LXDs with multiple interfaces is that they both
require some layer3 properties to figure out layer2 device. In case of LXD,
this is not a hard requirement, rather a nice way for juju to figure out
which IP to assign to the container. Neutron gateway charm is a bit stupid
about it, it just uses whatever you give it. However, in an ideal world,
one should be able to say 'put neutron-gateway to a machine that has a NIC
attached to that fabric'. Question that we have then; is any NIC attached
to that fabric a candidate, or only those NICs that have only layer2
properties from that fabric, i.e. no subnet declaration?

I think juju should be aware that a NIC is exposed only with layer2
properties, but it should also know what layer3 properties are possible on
it. This is why in RC1, attaching a subnet to a NIC gave juju opportunity
to figure out which layer3 properties are available on the NIC. It used
that to create a bridge, connect a container to it, and assign an IP to
that container from that subnet. With RC2 we bring in hard requirement of
having an IP on the host to create a bridge for containers.

On Tue, Oct 4, 2016 at 6:26 AM Richard Harding <rick.harding@canonical.com>
wrote:

> After additional discussions we came to the conclusion that Juju should
> respect the same ideas that MAAS does. In MAAS, a nic is not
> "configured" until it has an IP address. Setting a fabric, or even a
> subnet, does not return that as "configured" in MAAS. It's still
> unconfigured.
>
> Juju will not bridge all "configured" interfaces. If we want to question
> the definition of configured then we need to address it in our complete
> stack so that we're consistent and have a reasonable explanation of what
> users can expect as they put together our tools into a final solution.
>
> The behavior seen in RC2 is expected, and we admit is not ideal for all
> cases, but we decided that it was better to be consistent and
> predictable and to work to get layer 2 support into Juju so that it can
> properly handle the additional cases that folks are looking to put into
> play.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1627037
>
> Title:
>   rc1 bridges all nics, breaks neutron-gateway
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1627037/+subscriptions
>
-- 
Ante Karamatić
ante.karamatic@canonical.com
Canonical

Björn Tillenius (bjornt) wrote on 2016-10-04:

#17

I think that Ante has some really good points, but I think it's too late to do anything about it. This is a complicated issue, and shouldn't be addressed in an RC.

Ante, I agree that using fabrics is a nice abstraction. But you need to get MAAS fixed first. Currently MAAS puts everything on the same fabric by default, even NICs that aren't connected to anything. So if you simply ask for a NIC on a fabric, things will most likely break.

We (Landscape) would gladly use fabrics to figure out which NICs we could use, but it's not possible with the way MAAS works today (unless we would require the user to manually reconfigure every node, which isn't feasible).

I don't think that Juju can solve this on their own. MAAS, OpenStack and Landscape need to be involved in the discussion as well.

Ante Karamatić (ivoks) on 2016-10-04

tags:

added: 4010
