Autogenerated interface name prevents creating a bridge over a VLAN

Bug #1804018 reported by Peter Penchev
This bug affects 1 person
Affects          Status        Importance  Assigned to      Milestone
Canonical Juju   Fix Released  High        Witold Krecicki
  2.4            Fix Released  High        John A Meinel
vlan (Ubuntu)    Won't Fix     Low         Dan Streetman
  Trusty         Won't Fix     Low         Dan Streetman
  Xenial         Won't Fix     Low         Dan Streetman
  Bionic         Won't Fix     Low         Dan Streetman
  Cosmic         Won't Fix     Low         Dan Streetman
  Disco          Won't Fix     Low         Dan Streetman

Bug Description

Hi,

First of all, thanks for all your work on creating and maintaining Juju and the charms ecosystem!

I believe I have stumbled onto a bug in autogenerating the name for the bridge interface when one needs to be created to grant a container access to a host's network interface. This bug is currently blocking a MAAS/Juju/OpenStack deployment where traffic is separated into VLANs. I have successfully reproduced it on 2.4.6 and 2.5-beta1 installations, although I believe that it has been present since at least 2.2.0, if not earlier.

Currently the name of the bridge interface is, if possible, generated by prepending "br-" to the host interface name; however, this is problematic with VLAN interfaces. If the host interface is called e.g. "enp3s0f0.503", this would create a bridge named "br-enp3s0f0.503"; however, if there is *also* a bridge on the "enp3s0f0" interface (without the VLAN ID), this would cause the Debian/Ubuntu ifupdown scripts to consider "br-enp3s0f0.503" to be VLAN 503 on the "br-enp3s0f0" interface and, consequently, fail to bring it up correctly the next time the node is rebooted.
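
To illustrate the collision, here is a minimal Python sketch. It is not Juju or ifupdown code; both function names are made up for illustration. It assumes the plain "prepend br-" rule described above and a simplified model of the ifupdown convention in which a trailing ".NNN" means "VLAN NNN on the interface named before the dot".

```python
def naive_bridge_name(device: str) -> str:
    """Hypothetical stand-in for the 'prepend br-' rule described above."""
    return "br-" + device


def split_vlan_suffix(name: str):
    """Simplified model of the ifupdown naming convention.

    A trailing '.NNN' is read as VLAN NNN on the interface named before
    the dot. Returns (raw_device, vlan_id), or None if the name has no
    such suffix.
    """
    raw, dot, suffix = name.rpartition(".")
    if dot and raw and suffix.isascii() and suffix.isdigit():
        return raw, int(suffix)
    return None


bridge = naive_bridge_name("enp3s0f0.503")
print(bridge)                     # br-enp3s0f0.503
print(split_vlan_suffix(bridge))  # ('br-enp3s0f0', 503)
```

Under this model the generated bridge name is indistinguishable from "VLAN 503 on br-enp3s0f0", which is the misinterpretation described above.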

Steps to reproduce:

1. Define a node with an Ethernet interface (let's call it "enp3s0f0") and a network space (let's call it "mgmt")

2. Define a VLAN over that interface (let's call it "enp3s0f0.503") and a network space for the VLAN (let's call it "storage")

3. Deploy a charm on that node so that Juju knows about the enp3s0f0 interface in the mgmt space and the enp3s0f0.503 interface in the storage space

4. Deploy a charm in a container, specifying `--constraints spaces=storage`; this will lead to Juju autogenerating a bridge interface and calling it "br-enp3s0f0.503"

5. Deploy a charm in a container, specifying `--constraints spaces=mgmt`; this will lead to Juju autogenerating another bridge interface and calling it "br-enp3s0f0"

6. Reboot the server, then log into it

The br-enp3s0f0 bridge may be brought up correctly, but the br-enp3s0f0.503 interface, although it will exist, will have been created as a VLAN interface, not a bridge interface, and so any attempts to add any interfaces to it will have failed; consequently, the container will also have failed to start up.

A naive fix for newly-bootstrapped environments would be to replace any dots with e.g. dashes in the `BridgeNameForDevice()` function in the `network/containerizer/bridgepolicy.go` file - this will lead to creating the new interface with a name that will not be interpreted as a VLAN over an existing interface. However, I think that a proper fix would have to include some sort of migration path for existing deployments, e.g. generating both the old and new names and possibly migrating network interfaces from the old bridge to the new one.
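
A rough sketch of that idea, written in Python for brevity (Juju itself is written in Go; this is not the actual `BridgeNameForDevice()` code, and all three helper names are hypothetical). It ignores interface name length limits and only shows the dot-to-dash substitution, plus returning the old name next to the new one so that a migration step could look for bridges created under the old scheme.

```python
def legacy_bridge_name(device: str) -> str:
    """Old scheme as described above: just prepend 'br-'."""
    return "br-" + device


def safe_bridge_name(device: str) -> str:
    """Proposed scheme: also replace dots, so the result carries no
    '.NNN' suffix that could be read as a VLAN ID."""
    return "br-" + device.replace(".", "-")


def bridge_name_candidates(device: str) -> list:
    """New name first, then the legacy name if it differs (hypothetical
    helper for a migration path; not part of Juju)."""
    new, old = safe_bridge_name(device), legacy_bridge_name(device)
    return [new] if new == old else [new, old]


print(bridge_name_candidates("enp3s0f0.503"))  # ['br-enp3s0f0-503', 'br-enp3s0f0.503']
print(bridge_name_candidates("enp3s0f0"))      # ['br-enp3s0f0']
```

One thing a real implementation would have to consider is that the substitution can make two distinct device names map to the same bridge name (e.g. a device literally called "eth0-1" next to a VLAN "eth0.1").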

Please let me know if there is any more information that I can provide for a hopefully speedy resolution of this problem.

Thanks in advance for your consideration, and keep up the great work!

Best regards,
Peter

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Interesting, I would have thought you'd hit the 15-byte limit on interface names, but you didn't, as it's exactly 15 bytes.
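
For reference, a quick check of that length as a small Python sketch. The 15-byte figure comes from the Linux IFNAMSIZ of 16, which includes the terminating NUL; the helper name is made up, and "br-enp4s0f0.2730" is a hypothetical name used only for comparison.

```python
MAX_IFNAME_BYTES = 15  # Linux IFNAMSIZ is 16, including the trailing NUL


def fits_ifname_limit(name: str) -> bool:
    """True if the name fits into a Linux interface name buffer."""
    return len(name.encode()) <= MAX_IFNAME_BYTES


for name in ("br-enp3s0f0.503", "br-enp4s0f0.2730", "b-enp4s0f0.2730"):
    print(name, len(name.encode()), fits_ifname_limit(name))
# br-enp3s0f0.503 15 True
# br-enp4s0f0.2730 16 False
# b-enp4s0f0.2730 15 True
```

A "br-" prefix on a 13-character device name would go one byte over the limit, which may be why the bridges in my test environment below carry the shorter "b-" prefix.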

Looking at a manpage, bridge_ports should prevent this interface from being treated as a VLAN interface:
http://manpages.ubuntu.com/manpages/xenial/man5/interfaces.5.html#vlan%20and%20bridge%20interfaces
"To ease the configuration of VLAN interfaces, interfaces having . (full stop character) in the name are configured as 802.1q tagged virtual LAN interface. For example, interface eth0.1 is a virtual interface having eth0 as physical link, with VLAN ID 1. For compatibility with bridge-utils package, if bridge_ports option is specified, VLAN interface configuration is not performed."

I would have thought I'd find this code in this file
dpkg -S /etc/network/if-pre-up.d/vlan
vlan: /etc/network/if-pre-up.d/vlan

but I did not, apparently because of this commit:
https://git.launchpad.net/ubuntu/+source/vlan/commit/?id=4c88eab61549ec4c6c8ed65e9610fca712ed98f4

I have this in /etc/network/interfaces in a test env before reboot:

auto br-enp4s0f0
iface br-enp4s0f0 inet static
    address 10.232.6.131/21
    dns-nameservers 10.232.0.2
    gateway 10.232.0.1
    bridge_ports enp4s0f0

auto b-enp4s0f0.2730
iface b-enp4s0f0.2730 inet static
    address 10.232.45.226/21
    bridge_ports enp4s0f0.2730

auto b-enp4s0f0.2731
iface b-enp4s0f0.2731 inet static
    address 10.232.24.3/21
    bridge_ports enp4s0f0.2731

brctl show
bridge name      bridge id          STP enabled  interfaces
b-enp4s0f0.2730  8000.78e7d124e458  no           enp4s0f0.2730
                                                 veth9RV4DA
                                                 vethO1TEE1
b-enp4s0f0.2731  8000.78e7d124e458  no           enp4s0f0.2731
                                                 veth04DDDF
                                                 vethV8F3YP
br-enp4s0f0      8000.78e7d124e458  no           enp4s0f0
                                                 veth52C4IX
                                                 veth8L5MG5
                                                 veth8XWOE9
                                                 vethOGIXIJ
                                                 vethXSH47J

Could you paste what you have too?

Revision history for this message
Peter Penchev (openstack-dev-s) wrote :

Hi,

Thanks for the quick reaction!

Ahh, so I guess I should have read the interfaces(5) manual page a bit more carefully; I found the passage that said "anything with a dot is a VLAN interface" and I did not read to the end of the paragraph, so I completely missed the part about the bridge_ports integration...

OK, so it looks like this is a problem caused by Ubuntu's vlan package deviating from Debian's vlan package - indeed, Debian's vlan-1.9-3.2 and later do handle interfaces with bridge_ports specified - see e.g. https://salsa.debian.org/debian/vlan/blob/debian/debian/network/if-pre-up.d/vlan#L21 - the check for BRIDGE_PORTS was put in with commit https://salsa.debian.org/debian/vlan/commit/7517228145310a29b7cb7f4bddccb1b70df347eb and has not been removed since. And, yes, it was skipped in the Ubuntu sync commit that you pointed to. Thanks for the sleuthing work and pointing me in the right direction!

Of course, it may turn out that there were good reasons for this back then - it seems that the way interfaces are brought up differed between Ubuntu and Debian. Argh :( So what does this mean - if the same reasons hold true today, does this mean that under Ubuntu, a bridge interface over VLANs is not supported? That would be... a bit weird, to be honest. Please note that I'm not trying to be hostile in any way, just wondering how our setup may be configured.

So what now - do I reassign this bug to Ubuntu's vlan package and see what the maintainers have to say about it?

For the record, here's the configuration on our server before rebooting - it's virtually the same as yours modulo the VLAN IDs:

auto enp3s0f0
iface enp3s0f0 inet manual
    mtu 9000

auto enp3s0f0.503
iface enp3s0f0.503 inet manual
    mtu 9000
    vlan-raw-device enp3s0f0
    vlan_id 503

auto br-enp3s0f0
iface br-enp3s0f0 inet static
    address 172.30.0.124/24
    gateway 172.30.0.254
    bridge_ports enp3s0f0

auto br-enp3s0f0.503
iface br-enp3s0f0.503 inet static
    address 172.22.110.2/24
    bridge_ports enp3s0f0.503

And here's the "brctl show" output:

bridge name bridge id STP enabled interfaces
br-enp3s0f0 8000.a0369f8cc570 no enp3s0f0 ...


Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I think adding the vlan package to get a comment is the right call.

There's also this patch by wupeka:
https://github.com/juju/juju/pull/9482

summary: - Autogenerated nterface name prevents creating a bridge over a VLAN
+ Autogenerated interface name prevents creating a bridge over a VLAN
Revision history for this message
Richard Harding (rharding) wrote :

Assigning to jam to shepherd through. The PR isn't yet complete but discussion is going on in that PR towards a fix.

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.5.0
assignee: nobody → John A Meinel (jameinel)
John A Meinel (jameinel)
Changed in juju:
assignee: John A Meinel (jameinel) → Witold Krecicki (wpk)
status: Triaged → In Progress
Revision history for this message
Anastasia (anastasia-macmood) wrote :

I am pretty sure that Witold is not working on this for Juju 2.5.
However, since John has addressed the issue on 2.4.7, we would have merged the solution forward. I am re-assigning this to John and updating the status to Fix Committed.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Or maybe Witold is since the last update was less than 2 weeks ago :)

Revision history for this message
Dan Streetman (ddstreet) wrote :

hi, here are some clarifications on this bug:

> https://salsa.debian.org/debian/vlan/blob/debian/debian/network/if-pre-up.d/vlan#L21
> https://salsa.debian.org/debian/vlan/commit/7517228145310a29b7cb7f4bddccb1b70df347eb

these have nothing to do with this bug. those changes are in case sections that don't match the naming of your interface - yours starts with "br-*" while those changes are in sections that only match actual interface naming (i.e., not starting with "br-").

Additionally, I'm pretty sure that check was incorrectly written and has never worked the way they thought it was working - i.e. it should check for -n "$IF_BRIDGE_PORTS", but it doesn't, it checks for -z, so it *won't* exit when called for a device that includes a 'bridge_ports' param - it will exit for a normally-named interface with vlan extension *without* vlan-raw-device or bridge_ports defined, which may be why it was removed from ubuntu long ago (looks like it was re-introduced to ubuntu in cosmic, so i should probably go and fix that too).

> https://git.launchpad.net/ubuntu/+source/vlan/commit/?id=4c88eab61549ec4c6c8ed65e9610fca712ed98f4

getting closer with this one, but not quite there yet :)

this is the change that introduces this regression:

https://git.launchpad.net/ubuntu/+source/vlan/commit/?h=applied/ubuntu/bionic-devel&id=117d183fa16e2d5e43da4cfe03e7e8685d27c6d1

yep it was me, sorry.

I want to say that I absolutely *HATE* the debian ifupdown api that allows this kind of 'automagic' vlan creation just by appending ".NNN" to an interface's name. Is it *really* so hard to add a simple "vlan-raw-device" param to the config? The 'automagic'-ness of it makes things *much* harder - especially when there are special cases like this, a bridge on top of a vlan.

In any case, we can fix this regression by special-casing any configured interface that includes a "bridge_ports" param, which (basically) should get us back to how things worked before.
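
Roughly, the special-casing would amount to the following rule. This is a Python sketch of the decision only, not the vlan package's shell code; the function name is hypothetical and `options` is just a dict standing in for the parameters of an ifupdown stanza.

```python
def should_create_vlan_link(name: str, options: dict) -> bool:
    """Sketch of the special case: a '.NNN' suffix (or an explicit
    vlan-raw-device) means 'create a VLAN link', unless the stanza
    declares bridge_ports, in which case it is a bridge and is left alone."""
    if "bridge_ports" in options:
        return False
    if "vlan-raw-device" in options:
        return True
    _, dot, suffix = name.rpartition(".")
    return bool(dot) and suffix.isascii() and suffix.isdigit()


print(should_create_vlan_link("enp3s0f0.503", {"vlan-raw-device": "enp3s0f0"}))  # True
print(should_create_vlan_link("br-enp3s0f0.503", {"bridge_ports": "enp3s0f0.503"}))  # False
```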

However, I personally think it's super, super confusing for juju to name a bridge device with vlan suffix, and I highly support the juju patch from comment 3 that changes the '.' to '-' (in addition to fixing this regression in vlan).

Changed in vlan (Ubuntu):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Dan Streetman (ddstreet)
Changed in vlan (Ubuntu Cosmic):
status: New → In Progress
Changed in vlan (Ubuntu Bionic):
status: New → In Progress
Changed in vlan (Ubuntu Xenial):
status: New → In Progress
Changed in vlan (Ubuntu Trusty):
status: New → In Progress
Changed in vlan (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in vlan (Ubuntu Bionic):
importance: Undecided → Medium
Changed in vlan (Ubuntu Xenial):
importance: Undecided → Medium
Changed in vlan (Ubuntu Trusty):
importance: Undecided → Medium
Changed in vlan (Ubuntu Cosmic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in vlan (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in vlan (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Changed in vlan (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
Revision history for this message
Dan Streetman (ddstreet) wrote :

What a mess this is.

First, in trusty/xenial, my patch (as discussed above in comment 7) works correctly and I have a ppa at https://launchpad.net/~ddstreet/+archive/ubuntu/lp1804018

However, I need to first fix disco and sru back from there. Unfortunately, in later releases, things get quite unpleasant.

Unfortunately, it seems like ALL three of the packages that used to coordinate now think they own vlan creation. In the past (i.e. trusty/xenial), the vlan package was solely responsible for the creation of any vlan link, from the if-pre-up.d/vlan script (which was called either during ifup of the vlan interface, or from udev when the vlan's raw-device was detected).

Now, in bionic and later:

ifupdown thinks it owns vlan link creation, in the link.defn file. It used to special-case interfaces with bridge_ports defined, but that was explicitly removed, and now ifupdown will create a vlan link for ALL interfaces that end with .NNN, even if they contain a bridge_ports parameter (note that 'special case' has been removed from the man page in bionic and later, http://manpages.ubuntu.com/manpages/bionic/man5/interfaces.5.html).

bridge-utils ALSO thinks it owns vlan link creation, in the create_vlan_port() function from /lib/bridge-utils/bridge_utils.sh, which is called from /lib/udev/bridge-network-interface and from /lib/bridge-utils/ifupdown.sh

Finally, vlan still thinks (quite correctly, IMHO) that it owns vlan link creation.

Unfortunately, all this mess results in it being quite difficult to have these 'special case' ifupdown interface names, with a .NNN vlan suffix but also 'bridge_ports' parameter that supposedly indicates it's a bridge, not a vlan.

I'm tempted to simply fix xenial/trusty and not touch bionic or later, but that will result in a regression for anyone upgrading from xenial to bionic (since the upgrade will keep ifupdown). However, if the ifupdown "api" really has changed - meaning in t/x you *can* have bridge interface names ending in .NNN, while in bionic and later you *cannot* - then "fixing" only t/x is the correct thing to do, and simply let people upgrade from x->b and hit the behavior change and then debug and have to fix their ifupdown config.

Also I should note that in bionic and later, during normal boot, this "vlan-named-bridge" interface actually *does* work correctly, and creates it as a bridge, not vlan. However, ifupdown remains confused about it and thinks the vlan-named-bridge never was brought fully up. So you can't ifdown it. Also, manually trying to ifup the interface fails, so it's only a timing coincidence that it "works" during boot.

I'll take some time to think about this. Since juju has already been patched to fix this (by changing the bridge name to replace '.' with '-'), this should be less urgent.

Changed in vlan (Ubuntu Trusty):
importance: Medium → Low
Changed in vlan (Ubuntu Xenial):
importance: Medium → Low
Changed in vlan (Ubuntu Bionic):
importance: Medium → Low
Changed in vlan (Ubuntu Cosmic):
importance: Medium → Low
Changed in vlan (Ubuntu Disco):
importance: Medium → Low
Revision history for this message
Dan Streetman (ddstreet) wrote :

After some thought about this, I believe this 'special case' of vlan-named-bridge interfaces should be a Won't Fix for trusty and xenial. The 'special case' has been removed in bionic and later, and 'fixing' this for t/x would only result in later regressions during release upgrade.

The original issue - juju creating these vlan-named-bridge interface names - has been fixed, so I believe it's ok to mark this as won't fix for vlan for all releases; if anyone has manually (or programmatically) created such special case interface names, they can fix their manual naming (or fix their script/application). The special case was a really bad idea in the first place, anyway.
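
For anyone who wants to check an existing config for such names, here is a minimal Python sketch. The function name is hypothetical and the parsing is deliberately simplistic: it only looks at `iface` stanzas and the `bridge_ports` spelling used in this bug, and does not follow `source` lines.

```python
import re


def vlan_named_bridges(interfaces_text: str) -> list:
    """Return names of iface stanzas that end in '.NNN' and set bridge_ports."""
    found = []
    current = None  # name of the iface stanza being read, if any
    for line in interfaces_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "iface" and len(words) > 1:
            current = words[1]
        elif words[0] in ("auto", "mapping", "source") or words[0].startswith("allow-"):
            current = None  # these keywords end the previous stanza
        elif words[0] == "bridge_ports" and current:
            if re.search(r"\.\d+$", current) and current not in found:
                found.append(current)
    return found


sample = """\
auto br-enp3s0f0
iface br-enp3s0f0 inet static
    address 172.30.0.124/24
    bridge_ports enp3s0f0

auto br-enp3s0f0.503
iface br-enp3s0f0.503 inet static
    address 172.22.110.2/24
    bridge_ports enp3s0f0.503
"""
print(vlan_named_bridges(sample))  # ['br-enp3s0f0.503']
```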

Changed in vlan (Ubuntu Disco):
status: In Progress → Won't Fix
Changed in vlan (Ubuntu Cosmic):
status: In Progress → Won't Fix
Changed in vlan (Ubuntu Bionic):
status: In Progress → Won't Fix
Changed in vlan (Ubuntu Xenial):
status: In Progress → Won't Fix
Changed in vlan (Ubuntu Trusty):
status: In Progress → Won't Fix
Changed in juju:
milestone: 2.5.0 → 2.5.1
Revision history for this message
Richard Harding (rharding) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Marking the main task as "wontfix" too, as disco was.

Changed in vlan (Ubuntu):
status: In Progress → Won't Fix