juju 1.25 misconfigures juju-br0 when using MAAS 1.9 bonded interface

Bug #1516891 reported by Ante Karamatić
This bug affects 3 people
Affects         Status        Importance  Assigned to       Milestone
MAAS            Invalid       Undecided   Unassigned
juju-core       Fix Released  High        Andrew McDermott
juju-core 1.25  Fix Released  High        Andrew McDermott

Bug Description

MAAS 1.9 introduced support for bonding of network interfaces. When juju deploys a machine with such an interface, it creates juju-br0 on top of bond0. However, it doesn't do this properly.

Instead of creating juju-br0 and taking ownership of layer3 properties (IP address, gateway), juju-br0 fully replaces bond0. This leads to a configuration in /etc/network/interfaces that looks something like this:

auto juju-br0
iface juju-br0 inet static
   address a.b.c.d/24
   gateway a.b.c.f
   bridge-port bond0
   bond-mode active-backup
   <other bonding properties>

While this isn't terribly wrong (it should still work), the problem is that juju-br0 *replaces* bond0: bond0 is marked as the bridge port, but bond0 is not defined anywhere in /etc/network/interfaces. The end result is that juju-br0 stays DOWN, since there is no underlying physical or logical interface.
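The failure mode described above (a bridge port naming an interface that has no stanza of its own) can be checked mechanically. What follows is a minimal Python sketch, not part of juju: the function name `undefined_bridge_ports` is hypothetical, and it assumes a simplified reading of the file in which a port counts as defined only if some `iface` line names it.

```python
def undefined_bridge_ports(interfaces_text):
    """Return bridge ports that have no 'iface' stanza of their own."""
    defined = set()
    ports = set()
    for raw in interfaces_text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "iface" and len(words) > 1:
            defined.add(words[1])
        # Accept the spelling used in the report as well as bridge_ports.
        # Special values such as "all" or "regex" are not handled here.
        elif words[0] in ("bridge_ports", "bridge-ports", "bridge-port"):
            ports.update(w for w in words[1:] if w != "none")
    return sorted(ports - defined)


# The stanza from the bug description, placeholders included.
BROKEN = """\
auto juju-br0
iface juju-br0 inet static
    address a.b.c.d/24
    gateway a.b.c.f
    bridge-port bond0
    bond-mode active-backup
"""

print(undefined_bridge_ports(BROKEN))  # ['bond0']
```

An empty result would mean every bridge port has at least a stanza; it says nothing about whether that stanza is itself correct.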

I can easily reproduce this with MAAS 1.9 by creating interfaces that use the same bond but different VLANs (bond0 and bond0.10, for example). Unfortunately, I managed to delete /etc/network/interfaces from one of these deployments, so I can't provide an exact example right now. I will update the bug later with an example of such an interfaces file.

Mike Pontillo (mpontillo) wrote :

This does not seem to be a MAAS bug, since MAAS simply calls curtin to create the proper /etc/network/interfaces. So I'm marking this 'Invalid' for MAAS. Please adjust as necessary if it turns out a MAAS change is needed.

Changed in maas:
status: New → Invalid
Andres Rodriguez (andreserl) wrote :

Following up on the bond0 issue: this happens not only when creating bonds but also with any other interface.

Andrew McDermott (frobware) wrote :

I am currently working on updating the script that re-renders /etc/network/interfaces with the juju bridge.

https://bugs.launchpad.net/juju-core/+bug/1516891

Cheryl Jennings (cherylj) wrote :

I believe Andy meant to link to this bug: https://bugs.launchpad.net/juju-core/+bug/1512371

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.1
Andrew McDermott (frobware) wrote :

I did indeed; apologies for the confusion.

Andrew McDermott (frobware) wrote :

The rework of the script mentioned in comment #4 will not fix this issue. Also, an existing bug caught my eye while debugging this:

  https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873

I manually set up my /etc/network/interfaces to mimic what juju would do (during bootstrap):

ubuntu@bubbly-dog:~$ cat /etc/network/interfaces
auto eth0
iface eth0 inet manual
    bond-lacp_rate slow
    bond-xmit_hash_policy layer2
    bond-miimon 100
    bond-master bond0
    mtu 1500
    bond-mode active-backup

auto eth1
iface eth1 inet manual
    bond-lacp_rate slow
    bond-xmit_hash_policy layer2
    bond-miimon 100
    bond-master bond0
    mtu 1500
    bond-mode active-backup

iface bond0 inet manual

auto juju-br0
iface juju-br0 inet dhcp
    bridge_ports bond0
    bond-lacp_rate slow
    bond-xmit_hash_policy layer2
    bond-miimon 100
    mtu 1500
    bond-mode active-backup
    hwaddress 52:54:00:de:0e:71
    bond-slaves none

dns-nameservers 10.17.20.200
dns-search maas19

However, juju-br0 is not brought up properly when the machine reboots.

ubuntu@bubbly-dog:~$ ifconfig -a
bond0 Link encap:Ethernet HWaddr d2:87:1a:55:d1:c8
          UP BROADCAST MASTER MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0 Link encap:Ethernet HWaddr 52:54:00:de:0e:71
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth1 Link encap:Ethernet HWaddr 52:54:00:2e:5e:2c
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

juju-br0 Link encap:Ethernet HWaddr 52:54:00:de:0e:71
          inet6 addr: fe80::d087:1aff:fe55:d1c8/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:180 (180.0 B)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4240 (4.2 KB) TX bytes:4240 (4.2 KB)

Changed in juju-core:
milestone: 1.25.1 → 1.25.2
Changed in juju-core:
assignee: nobody → Andrew McDermott (frobware)
Changed in juju-core:
milestone: 1.25.2 → 1.26-beta1
Andrew McDermott (frobware) wrote :

juju renders /etc/network/interfaces incorrectly for the bond.

A correct rendering of /e/n/i should be like the following.

Note: currently updating the juju script to make it bond aware.

ubuntu@node2:~$ cat /etc/network/interfaces
auto eth0
iface eth0 inet manual
    bond-lacp_rate slow
    bond-xmit_hash_policy layer2
    bond-miimon 100
    bond-master bond0
    mtu 1500
    bond-mode active-backup

auto eth1
iface eth1 inet manual
    bond-lacp_rate slow
    bond-xmit_hash_policy layer2
    bond-miimon 100
    bond-master bond0
    mtu 1500
    bond-mode active-backup

iface bond0 inet manual

auto bond0
iface bond0 inet manual
    bond-lacp_rate slow
    bond-xmit_hash_policy layer2
    bond-miimon 100
    mtu 1500
    bond-mode active-backup
    hwaddress 52:54:00:1c:f1:5b
    bond-slaves none

auto br0
iface br0 inet dhcp
 bridge_ports bond0

dns-nameservers 10.17.20.200
dns-search maas19

Andrew McDermott (frobware) wrote :

For 1.25 there's a WIP branch here:

  https://github.com/frobware/juju/tree/1.25-lp1516891-wip

but I am currently verifying that it won't break arm64.

Andrew McDermott (frobware) wrote :

An update: I found some issues with adding the bridge to the bond0 interface on vivid, so I am retesting there and on wily. Sometimes the newly added bridge would fail to come up and bootstrap would fail.

Andrew McDermott (frobware) wrote :
Changed in juju-core:
status: Triaged → In Progress
Andrew McDermott (frobware) wrote :
Changed in juju-core:
status: In Progress → Fix Committed
Andrew McDermott (frobware) wrote :
Changed in juju-core:
milestone: 1.26-beta1 → 2.0-alpha1
Paul Gear (paulgear) wrote :

Does the fixed script that rewrites the network interfaces file correctly handle the case where the bonded interface has a static address? I've been unable to get LACP bonding to work with DHCP (presumably due to something similar to the issue described at https://andrewpeng.net/posts/2011/08/031345-linux-network-interface-channel-bonding-and-dhcp.html), so my MAAS postinstall script sets up a working interfaces file with static addressing. Then juju comes along and turns it into something that could never possibly work: https://pastebin.canonical.com/145908/

So I thought, "I'll just pre-set up the interface so that juju doesn't have to do anything and gets delivered a working bridge interface". That resulted in a network interface setup that was even less working, if that were possible: https://pastebin.canonical.com/145986/

Can anyone suggest a workaround? Or does this warrant a new bug?

Andrew McDermott (frobware) wrote :

I updated https://pastebin.canonical.com/145986/ to show before and after - though my "before" has no mention of juju-br0 and should represent what the deployed node would have during deployment.

Andrew McDermott (frobware) wrote :

Not sure where my paste went: adding 'before' and 'after' examples.

Andrew McDermott (frobware) wrote :

After the add-juju-bridge.py script runs.

Andrew McDermott (frobware) wrote :

After the add-juju-bridge.py script runs.

Paul Gear (paulgear) wrote :

Per discussion with Andrew in IRC, I got the system working by using "disable-network-management: true" in environments.yaml.

Here's what the current /etc/network/interfaces looks like: https://pastebin.canonical.com/146015/

If we could get MAAS to set up the DHCP client to work appropriately with LACP, here's what the preferred version would look like: https://pastebin.canonical.com/146016/

And if that doesn't happen, here's the preferred version: https://pastebin.canonical.com/146017/

In all cases, I believe juju should transfer only the relevant settings (anything about IP addressing and DNS) to its bridge interface, and leave all of the bond settings (anything beginning with "bond-") where they are.
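The split proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of add-juju-bridge.py: the helper names `split_stanza` and `render`, the list of layer-3 option prefixes, and the sample address and gateway values are all hypothetical.

```python
# Option keys treated as layer-3 or DNS settings (an assumed, partial list).
L3_PREFIXES = ("address", "netmask", "gateway", "broadcast", "network", "dns-")


def split_stanza(options):
    """Partition non-empty option lines into (stay on bond, move to bridge)."""
    stay, move = [], []
    for opt in options:
        key = opt.split()[0]
        (move if key.startswith(L3_PREFIXES) else stay).append(opt)
    return stay, move


def render(bond, bridge, method, options):
    """Render a bond stanza plus a bridge stanza layered on top of it."""
    stay, move = split_stanza(options)
    lines = ["auto " + bond, "iface %s inet manual" % bond]
    lines += ["    " + o for o in stay]
    lines += ["", "auto " + bridge, "iface %s inet %s" % (bridge, method)]
    lines += ["    " + o for o in move]
    lines.append("    bridge_ports " + bond)
    return "\n".join(lines)


# Hypothetical static configuration for a bonded interface.
OPTIONS = [
    "address 10.17.20.211/24",
    "gateway 10.17.20.1",
    "bond-mode active-backup",
    "bond-miimon 100",
    "mtu 1500",
    "dns-nameservers 10.17.20.200",
]

print(render("bond0", "juju-br0", "static", OPTIONS))
```

Anything that is not recognised as a layer-3 or DNS option (including `mtu` and `hwaddress`) stays on the bond in this sketch, which matches where those options appear in the hand-written rendering earlier in this report.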

Changed in juju-core:
milestone: 2.0-alpha1 → 1.26-alpha3
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Mike Deats (mikedeats) wrote :

This still seems to be an issue when using MAAS 1.9+Juju 1.25.2 with VLANs off the bonded interfaces. It appears that juju will create a bridge interface for every entry in /e/n/i that contains "bond0", resulting in duplicate "juju-br0" lines.

I deployed the mysql charm to one of my physical MAAS nodes, and after the install the network config looked fine and everything was up on all my VLAN interfaces. But after the Juju agent tried to create the bridge interface, it ended up adding juju-br0 for every VLAN interface attached to bond0, AND to bond0 itself. (See the attached file maas19_juju1.25.2_bond_vlan_after_juju.txt) There are a total of 5 entries for juju-br0 in my /e/n/i file at that point. This makes all the networks off of bond0 stop working.

Fortunately Juju seems to leave bond1 alone, so I can still SSH into the node through one of those interfaces and see what kind of mess it made.

Note that bond0 is set to the untagged VLAN for use by MAAS PXE boot, which is why it has a static IP assigned. All network configurations were done via MAAS, and all addresses are set to "Auto-Assign".
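Repeated stanzas like the five juju-br0 entries mentioned above are easy to spot with a short script. The Python sketch below is an assumption-laden illustration: `duplicate_ifaces` is a hypothetical helper, and the sample text is made up for the example rather than taken from the attached file.

```python
from collections import Counter


def duplicate_ifaces(interfaces_text):
    """Map (name, address family) to its count for stanzas defined twice or more."""
    counts = Counter()
    for raw in interfaces_text.splitlines():
        words = raw.split()
        # An iface line has the form: iface <name> <family> <method>
        if len(words) >= 3 and words[0] == "iface":
            counts[(words[1], words[2])] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample: the same bridge name rendered for bond0 and for a VLAN.
SAMPLE = """\
auto juju-br0
iface juju-br0 inet static
    bridge_ports bond0

auto juju-br0
iface juju-br0 inet manual
    bridge_ports bond0.10
"""

print(duplicate_ifaces(SAMPLE))
```

Keying on the address family as well as the name avoids flagging a legitimate pair of `inet` and `inet6` stanzas for the same interface.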

Mike Deats (mikedeats) wrote :

A couple more data points: I did confirm that everything works perfectly if you just have bond ports with no VLANs.

But if you have VLANs with no bond ports, Juju still creates a bridge for each VLAN interface, only this time it seems to replace the VLAN interface with the bridge. This doesn't seem to work very well either.

Maybe this needs to be a new bug?

Andrew McDermott (frobware) wrote :

@Mike - there is a separate bug for the VLAN issue. https://bugs.launchpad.net/juju-core/+bug/1532167

Mike Deats (mikedeats) wrote :

@Andrew - Cool, thanks. I didn't see that before, and this was the closest one I could find.
