rc1 bridges all nics, breaks neutron-gateway

Bug #1627037 reported by Andreas Hasenack
This bug affects 6 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Andrew McDermott

Bug Description

It looks like when rc1 is used with MAAS 2 (I didn't try MAAS 1), all NICs are bridged:

From /etc/network/interfaces on a MAAS node deployed with rc1:
"""
auto eno1
iface eno1 inet manual
    mtu 1500

auto br-eno1
iface br-eno1 inet static
    gateway 10.2.0.1
    address 10.2.0.3/16
    bridge_ports eno1

auto enx0
iface enx0 inet manual
    mtu 1500

auto br-enx0
iface br-enx0 inet manual
    bridge_ports enx0
"""

Note that br-enx0 has no IP (nor does enx0), which is correct, because that's how this node is configured in MAAS: the NIC (enx0) is connected but unconfigured. We need it that way because neutron-gateway will use it for its own purposes.

The problem is that this second NIC is used as the NIC for the public network in an OpenStack deployment using Neutron. The neutron-gateway charm skips NICs that are deemed "in use", and that includes NICs that are part of a bridge. This means we won't be able to reach the OpenStack instances via the public network when they come up.
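
A minimal sketch of why the second NIC counts as "in use": the hypothetical helpers `bridge_ports` and `nics_in_use` read the ENI excerpt above and collect every NIC listed under a `bridge_ports` option. This only approximates the rule described in this comment; it is not the neutron-gateway charm's actual check, which may inspect the running system rather than the file.

```python
# The /etc/network/interfaces excerpt from the rc1-deployed node above.
ENI = """\
auto eno1
iface eno1 inet manual
    mtu 1500

auto br-eno1
iface br-eno1 inet static
    gateway 10.2.0.1
    address 10.2.0.3/16
    bridge_ports eno1

auto enx0
iface enx0 inet manual
    mtu 1500

auto br-enx0
iface br-enx0 inet manual
    bridge_ports enx0
"""


def bridge_ports(eni_text):
    """Map each bridge interface to the NICs listed in its bridge_ports option."""
    bridges = {}
    current = None
    for raw in eni_text.splitlines():
        line = raw.strip()
        if line.startswith("iface "):
            # "iface <name> <family> <method>": remember which stanza we are in.
            current = line.split()[1]
        elif line.startswith("bridge_ports ") and current is not None:
            bridges[current] = line.split()[1:]
    return bridges


def nics_in_use(eni_text):
    """NICs enslaved to a bridge; a charm applying the rule above would skip them."""
    return {nic for ports in bridge_ports(eni_text).values() for nic in ports}


print(sorted(nics_in_use(ENI)))  # ['eno1', 'enx0']
```

Under this rule enx0 is excluded purely because br-enx0 lists it as a port, even though neither interface has an address.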

Juju beta18 leaves the second NIC (enx0) alone, so it stays however MAAS configured it.
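
Judging only from the excerpt above, rc1's rewrite of each NIC seems to follow one pattern: the physical NIC becomes a `manual` port that keeps its MTU, and a new `br-<nic>` stanza takes over the address method, the remaining options, and a `bridge_ports` line. The hypothetical `bridge_stanzas` helper sketches that pattern. The input stanzas are assumptions (the pre-Juju ENI is not shown in this report), and this is not Juju's bridging script.

```python
def bridge_stanzas(nic, method, options, prefix="br-"):
    """Split one ENI stanza into a port stanza and a bridge stanza.

    options is an ordered list of (key, value) pairs from the original
    stanza. Assumption: only 'mtu' stays on the physical NIC; everything
    else (address, gateway, ...) moves to the bridge.
    """
    port_only = {"mtu"}
    bridge = prefix + nic
    port_lines = [f"auto {nic}", f"iface {nic} inet manual"]
    bridge_lines = [f"auto {bridge}", f"iface {bridge} inet {method}"]
    for key, value in options:
        target = port_lines if key in port_only else bridge_lines
        target.append(f"    {key} {value}")
    bridge_lines.append(f"    bridge_ports {nic}")
    return "\n".join(port_lines) + "\n\n" + "\n".join(bridge_lines) + "\n"


# Assumed original stanzas for the two NICs on the node above.
print(bridge_stanzas("eno1", "static",
                     [("mtu", "1500"), ("gateway", "10.2.0.1"),
                      ("address", "10.2.0.3/16")]))
print(bridge_stanzas("enx0", "manual", [("mtu", "1500")]))
```

Applied to enx0 with method `manual`, the sketch reproduces the address-less br-enx0 stanza: the bridge exists even though there is nothing to configure on it.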

Chris Gregan (cgregan)
tags: added: cdo-qa-blocker
Andrew McDermott (frobware) wrote :

Is it possible to change the charm's notion of what "in use" means? If we don't bridge the device, it will never be available in an LXD container, should you want to run neutron-gateway there.

George Kraft (cynerva)
tags: added: v-pil
tags: removed: kanban-cross-team
Andreas Hasenack (ahasenack) wrote :

Andrew, that's a question for the OpenStack guys.

From my side, I would like to better understand this LXD issue you mention. When using the MAAS provider, all containers (LXD or LXC) get an IP from MAAS, at least with beta18 and older. The PXE interface is bridged and the containers hook up there.

Is the problem that you have a use case where containers want to hook up to another NIC? How is this other NIC configured in MAAS for that node?

Ryan Beisner (1chb1n) wrote :

I think that if a NIC is set as 'unconfigured' in MAAS, then Juju should not touch it in any way.

tags: added: uosci
Changed in juju:
milestone: none → 2.0-rc2
Richard Harding (rharding) wrote :

Ryan, the issue is that we've gotten feedback in both directions; Ante and others have told us that things don't work unless Juju does touch the NICs. We're trying to find a middle ground that helps both parties here.

Ante Karamatić (ivoks) wrote :

There are two use cases: one where containers need to be on a NIC, and another where the NIC must be left unconfigured.

In the betas (I haven't tested the RC yet), a bridge was created only when a subnet was configured on a NIC in MAAS. This allows connecting an LXD container (Juju creates a bridge) without exposing the host on the same layer 3 network.

If a NIC is not configured at all in MAAS, i.e. no subnet is configured for it, I would argue that it shouldn't be in /etc/network/interfaces in the first place. Juju then won't see it and won't do anything with it. That still doesn't prevent one from renaming the interface and using it for whatever purpose (in this case neutron-gateway).

So, Andreas, check whether your NIC has a subnet configured. If it does, remove the subnet, because you really don't need it (you only need a layer 2 connection, i.e. a fabric). If you still see the same behaviour after that, the bug is in MAAS: it should not define the interface in ENI.

If you do not define a subnet, MAAS doesn't put the interface in ENI, and Juju still creates a bridge, then that's a bug in Juju. Given how Juju works, my understanding is that this is impossible.

Configured fabric, configured subnet -> MAAS creates a 'manual' entry in ENI, Juju creates the bridge
Configured fabric, unconfigured subnet -> MAAS doesn't create an entry in ENI, Juju doesn't create the bridge
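
The two rules above can be written as a small decision function. `MaasNic` and `proposed_outcome` are hypothetical names for illustration, and the model covers only the proposal as stated: MAAS writes an ENI entry when both a fabric and a subnet are configured, and Juju, which only acts on what it finds in ENI, bridges exactly those interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MaasNic:
    """Hypothetical, simplified view of a NIC as modelled in MAAS."""
    name: str
    fabric: Optional[str] = None  # layer 2: the fabric the NIC is connected to
    subnet: Optional[str] = None  # layer 3: subnet linked to the NIC, if any


def proposed_outcome(nic: MaasNic) -> Tuple[bool, bool]:
    """Return (MAAS writes an ENI entry, Juju creates a bridge)."""
    # Under the proposal, MAAS writes a 'manual' ENI entry only when both are set.
    in_eni = nic.fabric is not None and nic.subnet is not None
    # Juju only bridges interfaces it finds in ENI.
    return in_eni, in_eni


for nic in (MaasNic("eno1", "fabric-0", "10.2.0.0/16"),
            MaasNic("enx0", "fabric-0")):
    print(nic.name, proposed_outcome(nic))
```

The second case is the layer-2-only NIC that neutron-gateway wants left alone. A two-field model like this cannot express a NIC that is linked to a subnet yet unconfigured, which MAAS also supports.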

This way everything can be decided in MAAS. Another approach is the one the neutron-contrail charm takes: it provides an option to remove the bridge. While this works, it creates two places for networking configuration, which would be a bad design.

George Kraft (cynerva) wrote :

FWIW, it looks like we can work around this in the VPIL infrastructure by configuring neutron-gateway to bridge to e.g. `br-bond0` instead of `bond0`. I don't know how reasonable that is, though; I don't fully understand the implications of that change.

Ante Karamatić (ivoks) wrote :

@George, while it's possible, you should not do it. OVS is a bridge itself, and bridging a bridge might produce unwanted results (a bridge terminates some L2 traffic: LLDP, CDP, LACP, STP...). This might also be a bug: the charm should not accept a bridge as the NIC argument, or it should remove the bridge.
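
If a charm wanted to guard against being handed a bridge, one possible check on Linux is sysfs: a Linux bridge device exposes a `bridge/` directory under `/sys/class/net/<name>/`. The helpers `is_linux_bridge` and `check_ext_port` are hypothetical, not part of the neutron-gateway charm, and the demo uses a temporary directory in place of the real `/sys/class/net` so it runs anywhere.

```python
import os
import tempfile


def is_linux_bridge(iface, sysfs_net="/sys/class/net"):
    """True if iface is a Linux bridge: bridges have a 'bridge' directory in sysfs."""
    return os.path.isdir(os.path.join(sysfs_net, iface, "bridge"))


def check_ext_port(iface, sysfs_net="/sys/class/net"):
    """Hypothetical guard: refuse a Linux bridge as the external port for OVS."""
    if is_linux_bridge(iface, sysfs_net):
        raise ValueError(f"{iface} is a Linux bridge; pass its member NIC instead")
    return iface


# Demo against a throwaway directory that mimics /sys/class/net.
with tempfile.TemporaryDirectory() as fake_sysfs:
    os.makedirs(os.path.join(fake_sysfs, "br-bond0", "bridge"))
    os.makedirs(os.path.join(fake_sysfs, "bond0"))
    results = {name: is_linux_bridge(name, fake_sysfs) for name in ("bond0", "br-bond0")}
    try:
        check_ext_port("br-bond0", fake_sysfs)
        rejected = False
    except ValueError:
        rejected = True

print(results)   # {'bond0': False, 'br-bond0': True}
print(rejected)  # True
```

Rejecting the bridge is one of the two options mentioned; the other, removing the bridge and using its member NIC, would need the charm to rewrite network configuration as well.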

Andreas Hasenack (ahasenack) wrote :

We need the NIC connected to the subnet because we use that essentially for placement and checklist purposes. In the autopilot we show which public networks are available and which machines are connected to each, so we won't, for example, let a user select a machine that has no public network connectivity as a network gateway.

Ante Karamatić (ivoks) wrote :

@Andreas, I see. It makes sense for layer 3 connections: "If a node is connected to a subnet from space 'OAM', then it can serve management services", etc.

But in the case of neutron-gateway's external port, the subnet it is connected to has no meaning. The external port is (and should be) stripped of an IP, as it serves as a layer 2 bridge. The only attribute that qualifies a port on that machine for running neutron-gateway is its fabric. Neutron-gateway even allows you to split that interface into multiple layer 2 interfaces (different external networks on different VLANs).

I guess all I'm saying is that, in the case of neutron-gateway, the 'public network' is a fabric (or, in more complicated setups, a VLAN), not a subnet.
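
A sketch of fabric-based placement as described here: pick machines that have a NIC on the external fabric, optionally counting only NICs with no subnet. `Nic`, `Machine`, `gateway_candidates`, the hostnames, fabric names and the 192.168.1.0/24 subnet are all made up for illustration. It also assumes fabrics are modelled accurately per machine, which MAAS does not guarantee by default.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Nic:
    name: str
    fabric: str
    subnet: Optional[str] = None  # None: layer 2 only


@dataclass
class Machine:
    hostname: str
    nics: List[Nic] = field(default_factory=list)


def gateway_candidates(machines, external_fabric, layer2_only=False):
    """Hostnames that have at least one NIC on external_fabric.

    With layer2_only=True, a NIC that is also linked to a subnet does not count.
    """
    def qualifies(nic):
        if nic.fabric != external_fabric:
            return False
        return not (layer2_only and nic.subnet is not None)

    return [m.hostname for m in machines if any(qualifies(n) for n in m.nics)]


# Made-up inventory: fabric-1 stands for the external (public) network.
machines = [
    Machine("node1", [Nic("eno1", "fabric-0", "10.2.0.0/16"), Nic("enx0", "fabric-1")]),
    Machine("node2", [Nic("eno1", "fabric-0", "10.2.0.0/16")]),
    Machine("node3", [Nic("eno1", "fabric-0", "10.2.0.0/16"),
                      Nic("eno2", "fabric-1", "192.168.1.0/24")]),
]
print(gateway_candidates(machines, "fabric-1"))                    # ['node1', 'node3']
print(gateway_candidates(machines, "fabric-1", layer2_only=True))  # ['node1']
```

The `layer2_only` flag marks the open design choice: whether any NIC on the fabric qualifies, or only NICs that carry no subnet.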

Changed in juju:
status: New → Triaged
importance: Undecided → High
importance: High → Critical
Changed in juju:
importance: Critical → Undecided
importance: Undecided → High
assignee: nobody → Richard Harding (rharding)
Andres Rodriguez (andreserl) wrote :

FWIW, in MAAS an interface *can* be attached to a subnet but UNCONFIGURED. In this state, it is usable by Neutron, so the user shouldn't have to detach the interface from a subnet for this scenario to work.

tags: added: eda
Changed in juju:
assignee: Richard Harding (rharding) → Andrew McDermott (frobware)
status: Triaged → In Progress
Andrew McDermott (frobware) wrote :

WIP branches:

  https://github.com/frobware/juju/tree/master-lp1627037
  https://github.com/dimitern/juju/tree/maas-bridge-some

We have run into an issue with aliases and are looking to resolve it now. Once that's done, we'll put together a branch/build that combines both of these trees.

Andrew McDermott (frobware) wrote :
Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
Ante Karamatić (ivoks) wrote :

In my tests, RC2's behaviour now breaks LXD containers, because it creates bridges only for interfaces that have an IP address configured in MAAS. Interfaces connected to a fabric and a subnet, but without a configured IP, are not converted to a bridge.

I thought the idea was to bridge everything that had a fabric configured (and to leave everything without a configured fabric unbridged), and that the other part of the work was for the charms team to implement 'unbridging' of the neutron-gateway interface.

David Britton (dpb) wrote :

Confirmed this behavior *works* for neutron-gateway and the autopilot, at least in the case where the second interface is 'Unconfigured' (as agreed).

But I went further and also confirmed @Ante's concern: if the second NIC is set to 'Auto assign', Juju does *not* bridge that interface.

http://paste.ubuntu.com/23271254/

If you look closely at the /e/n/i there, you will see that eth1 is set to 'manual' with no IP address at all. I went a step further and deployed with MAAS alone. This is the result:

http://paste.ubuntu.com/23271312/

You'll note that both interfaces have IPs assigned from maas -- 'Auto assign' in the UI -- as expected.

Richard Harding (rharding) wrote :

After additional discussions, we came to the conclusion that Juju should follow the same definitions that MAAS does. In MAAS, a NIC is not "configured" until it has an IP address. Setting a fabric, or even a subnet, does not make it "configured" in MAAS; it is still unconfigured.

Juju will bridge only "configured" interfaces. If we want to question the definition of configured, we need to address it across our complete stack, so that we're consistent and have a reasonable explanation of what users can expect as they combine our tools into a final solution.

The behavior seen in RC2 is expected and, we admit, not ideal for all cases, but we decided that it was better to be consistent and predictable, and to work on getting layer 2 support into Juju so that it can properly handle the additional cases folks want to put into play.
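
The difference between the rc1 and RC2 behaviour can be reduced to which NICs get a `br-` bridge. In the hypothetical `nics_to_bridge` below, each NIC is described only by the IP address MAAS assigned to it (or `None`); "rc1" bridges everything, as the bug title reports, and "rc2" bridges only NICs MAAS considers configured, i.e. those with an IP address. Fabric and subnet are deliberately left out of the model, because under the RC2 rule they do not change the outcome.

```python
from typing import Dict, List, Optional


def nics_to_bridge(nics: Dict[str, Optional[str]], policy: str) -> List[str]:
    """Return the NICs that would get a br-<nic> bridge under a given policy.

    nics maps each NIC name to the IP address MAAS assigned it, or None.
    """
    if policy == "rc1":
        # rc1, as reported in this bug: every NIC is bridged.
        return sorted(nics)
    if policy == "rc2":
        # RC2: only "configured" NICs, meaning NICs with an IP address.
        return sorted(name for name, ip in nics.items() if ip is not None)
    raise ValueError(f"unknown policy: {policy!r}")


# The node from the bug description: eno1 has a static IP, enx0 has none.
node = {"eno1": "10.2.0.3/16", "enx0": None}
print(nics_to_bridge(node, "rc1"))  # ['eno1', 'enx0']
print(nics_to_bridge(node, "rc2"))  # ['eno1']
```

Under "rc2", enx0 stays unbridged, which is what neutron-gateway needs; the flip side is that a NIC intended only for containers must now also carry a host IP to be bridged.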

Ante Karamatić (ivoks) wrote : Re: [Bug 1627037] Re: rc1 bridges all nics, breaks neutron-gateway

I agree there needs to be a vertical agreement on what is what. I do think we have opposite views on what 'configured NIC' means.

For instance, I'd argue that a NIC is configured even without an IP. I would argue that a NIC is just a device that can have, but doesn't have to have, attached layer 2 properties (MTU settings, VLAN ID, etc.). It can then also have layer 3 properties: an IP address. I would also argue that MAAS doesn't say that a NIC without an IP is an unconfigured NIC. Not only is such a NIC available on the system, but it also has layer 2 properties: VLAN ID, MTU. MAAS does some work to make this happen; it configures it. IMHO, an unconfigured NIC would be a NIC without an IP, but also without MTU settings and a VLAN ID, and also a NIC that is 'DOWN' (no link layer). Such a NIC would be unusable by any charm unless the charm went to the extent of managing the NIC itself.

Fabrics and spaces fit well with this: a NIC connected to a fabric has layer 2 properties, and a NIC connected to a space has layer 3 properties. The problem with both Landscape and LXDs with multiple interfaces is that they both require some layer 3 properties to figure out the layer 2 device. In the case of LXD, this is not a hard requirement, but rather a convenient way for Juju to figure out which IP to assign to the container. The neutron-gateway charm is a bit stupid about it; it just uses whatever you give it. However, in an ideal world, one should be able to say 'put neutron-gateway on a machine that has a NIC attached to that fabric'. The question we then have is: is any NIC attached to that fabric a candidate, or only those NICs that have only layer 2 properties from that fabric, i.e. no subnet declaration?

I think Juju should be aware that a NIC is exposed only with layer 2 properties, but it should also know which layer 3 properties are possible on it. This is why in RC1, attaching a subnet to a NIC gave Juju the opportunity to figure out which layer 3 properties are available on the NIC. It used that to create a bridge, connect a container to it, and assign that container an IP from that subnet. With RC2 we bring in a hard requirement of having an IP on the host to create a bridge for containers.

Björn Tillenius (bjornt) wrote :

I think that Ante has some really good points, but I think it's too late to do anything about it. This is a complicated issue, and shouldn't be addressed in an RC.

Ante, I agree that using fabrics is a nice abstraction. But you need to get MAAS fixed first. Currently MAAS puts everything on the same fabric by default, even NICs that aren't connected to anything. So if you simply ask for a NIC on a fabric, things will most likely break.

We (Landscape) would gladly use fabrics to figure out which NICs we could use, but it's not possible with the way MAAS works today (unless we required the user to manually reconfigure every node, which isn't feasible).

I don't think that Juju can solve this on its own. MAAS, OpenStack and Landscape need to be involved in the discussion as well.

Ante Karamatić (ivoks)
tags: added: 4010