802.3ad bonding not configured correctly

Bug #889423 reported by Albert Chin
This bug affects 7 people
Affects                 Status        Importance  Assigned to      Milestone
bridge-utils (Ubuntu)   Fix Released  Undecided   Unassigned
  Oneiric               Fix Released  Undecided   Stéphane Graber
ifenslave-2.6 (Ubuntu)  Fix Released  Undecided   Unassigned
  Oneiric               Fix Released  Undecided   Stéphane Graber
vlan (Ubuntu)           Fix Released  Undecided   Unassigned
  Oneiric               Fix Released  Undecided   Stéphane Graber

Bug Description

SRU instructions (from comment 41 and 46):
=== bridge-utils ===
So there are two things to test with that new bridge-utils:
 1) Bridge interface with bridge-ports set instead of bridge_ports works too
 2) Bridging a non-existing vlan interface will now create it

These two are in the udev hooks, so need to be tested by creating a network interface, like a tap device (using uml-utilities to create it).

Test for 1)
 - Make sure uml-utilities and bridge-utils are both installed
 - Add the following entry to /etc/network/interfaces:
auto br0
iface br0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    bridge-ports eth9
 - Create the tap device: tunctl -t eth9
 - Check that the bridge has been created and the interface added to it (bridge shouldn't have an IP configuration at this point):
root@castiana:~# brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.1a6bffdb2551 no eth9

The previous release wouldn't do anything unless you were using bridge_ports.

Test for 2)
 - Make sure uml-utilities, bridge-utils and vlan are all installed
 - Add the following entry to /etc/network/interfaces:
auto br0
iface br0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    bridge-ports eth9.1010
 - Create the tap device: tunctl -t eth9
 - Check that the bridge has been created and the interface added to it (bridge shouldn't have an IP configuration at this point):
root@castiana:~# brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.06c2192d61ab no eth9.1010

The previous release would create the bridge but not add the port to it, as the tagged interface wouldn't exist.

Between each test, cleanup with:
 - tunctl -d eth9
 - ifconfig br0 down
 - brctl delbr br0

The use of eth9 instead of tap0 is done on purpose as the vlan script explicitly checks for interfaces with eth, bond or wlan in their name.
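
For anyone who wants to script the two checks above rather than read the brctl output by eye, here is a minimal Python sketch. The function name parse_brctl_show is made up for illustration, and it assumes the usual "brctl show" layout: a header row, one row per bridge, and continuation rows that carry only an extra port name.

```python
def parse_brctl_show(output):
    """Map each bridge name to the list of ports shown by `brctl show`."""
    bridges = {}
    current = None
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[:2] == ["bridge", "name"]:
            continue  # skip blank lines and the header row
        if len(fields) >= 3:
            # bridge name, bridge id, STP flag, then an optional first port
            current = fields[0]
            bridges[current] = fields[3:]
        elif current is not None:
            # continuation row: extra port(s) of the previous bridge
            bridges[current].extend(fields)
    return bridges


if __name__ == "__main__":
    sample = (
        "bridge name bridge id STP enabled interfaces\n"
        "br0 8000.06c2192d61ab no eth9.1010\n"
    )
    print(parse_brctl_show(sample))
```

Test 1 passes if "eth9" is listed under "br0", and test 2 passes if "eth9.1010" is.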

=== vlan ===
Here's a quick example of how to test the new vlan package:

 - Make sure uml-utilities and vlan are installed
 - Add the following entry to /etc/network/interfaces:
auto eth9.1010
iface eth9.1010 inet static
    address 192.168.1.1
    netmask 255.255.255.0
 - Create the tap device: tunctl -t eth9
 - Check that the vlan interface has been created and configured correctly: ifconfig eth9.1010
eth9.1010 Link encap:Ethernet HWaddr ce:51:62:98:16:78
          inet addr:192.168.1.1 Bcast:0.0.0.0 Mask:255.255.255.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Prior to this update, vlan interface creation was racy: it depended on the catch-all networking.conf job to initialise eth9.1010, and that job could be triggered before eth9 actually existed.

=== ifenslave-2.6 ===
TODO: Using setup from original description before/after should work but I'll comment with a simplified testcase when I have a minute.

Albert Chin (bugs-ubuntu-vendor) wrote :

This is against Ubuntu Server 11.10.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifupdown (Ubuntu):
status: New → Confirmed
Tom Ellis (tellis) wrote :

I'm also seeing this and have carried out some tests.

The following interfaces config works across Lucid, Maverick & Natty but fails on Oneiric:
==
auto bond0
iface bond0 inet static
        bond-slaves none
        bond-mode 802.3ad
        bond-miimon 100
        address 10.153.107.22
        netmask 255.255.255.0
        gateway 10.153.107.1

auto eth0
iface eth0 inet manual
        bond-master bond0
        bond-primary eth0 eth1

auto eth1
iface eth1 inet manual
        bond-master bond0
        bond-primary eth0 eth1
==

That config is based on the example from the ifenslave docs (/usr/share/doc/ifenslave-2.6/examples/two_hotplug_ethernet), with the mode swapped out.

During those tests I also tried bonding mode 1 (active-backup), which works fine in every release tested, including Oneiric.

I can confirm Albert's finding that bringing up the bond manually works around the problem in ifupdown/ifenslave.

tags: added: pse
Albert Chin (bugs-ubuntu-vendor) wrote :

Tom, can you also confirm the dropped packets I'm seeing on bond0?

Albert Chin (bugs-ubuntu-vendor) wrote :

 Looking at my original post and the ifconfig output of bond0:
  # ifconfig -a
  bond0 Link encap:Ethernet HWaddr 00:1b:21:b7:21:ea
            inet addr:10.191.62.2 Bcast:10.191.62.255 Mask:255.255.255.0
            inet6 addr: fe80::21b:21ff:feb7:21ea/64 Scope:Link
            UP BROADCAST MASTER MULTICAST MTU:1500 Metric:1
            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Note bond0 isn't "RUNNING", like eth2 and eth3. Whereas, after I manually bring up the bond0 interface, bond0 is "RUNNING".

Tom Ellis (tellis) wrote :

I do not see any dropped packets.

In syslog I have a lot of these:
[10292.612017] bonding: bond0: Warning: Found an uninitialized port

Attaching my /proc/net/bonding/bond0 & ifconfig outputs

description: updated
Tom Ellis (tellis) wrote :
Albert Chin (bugs-ubuntu-vendor) wrote :

Consider the following entries in /etc/network/interfaces:
#auto bond0
iface bond0 inet static
  address 10.191.62.2
  netmask 255.255.255.0
  broadcast 10.191.62.255
  bond-slaves none
  bond-primary eth2 eth3
  bond-mode 802.3ad
  bond-lacp_rate fast
  bond-miimon 100

#auto eth2
iface eth2 inet manual
  bond-master bond0
  bond-primary eth2 eth3
  bond-mode 802.3ad
  bond-lacp_rate fast
  bond-miimon 100

#auto eth3
iface eth3 inet manual
  bond-master bond0
  bond-primary eth2 eth3
  bond-mode 802.3ad
  bond-lacp_rate fast
  bond-miimon 100

If I reboot the system and manually start the bond:
  # ifup bond0
  # ifup eth2
  # ifup eth3
  # ifconfig bond0
  bond0 Link encap:Ethernet HWaddr 00:1b:21:b7:21:ea
            inet addr:10.191.62.2 Bcast:10.191.62.255 Mask:255.255.255.0
            inet6 addr: fe80::21b:21ff:feb7:21ea/64 Scope:Link
            UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
            RX packets:31 errors:0 dropped:21 overruns:0 frame:0
            TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:3204 (3.2 KB) TX bytes:3940 (3.9 KB)

But, if instead I reverse the order:
  # ifup eth2
  # ifup eth3
  # ifup bond0
  # ifconfig bond0
  bond0 Link encap:Ethernet HWaddr 00:1b:21:b7:21:ea
            inet addr:10.191.62.2 Bcast:10.191.62.255 Mask:255.255.255.0
            inet6 addr: fe80::21b:21ff:feb7:21ea/64 Scope:Link
            UP BROADCAST MASTER MULTICAST MTU:1500 Metric:1
            RX packets:126 errors:0 dropped:126 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:15624 (15.6 KB) TX bytes:0 (0.0 B)

So, things work if the bond0 interface is ifup'd first, not last.
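
The difference between the two runs is the RUNNING flag on bond0. A rough Python sketch for pulling the flags out of this kind of ifconfig output follows. The function name interface_flags is hypothetical, and it assumes the old net-tools format shown in this report, where the flags share a line with "MTU:".

```python
def interface_flags(ifconfig_output):
    """Return the set of flag words from old-style `ifconfig <iface>` output."""
    for line in ifconfig_output.splitlines():
        words = line.split()
        # The flags line is the one that also carries the MTU, e.g.
        # "UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1"
        if any(word.startswith("MTU:") for word in words):
            return {word for word in words if ":" not in word}
    return set()


if __name__ == "__main__":
    bond_first = "UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1"
    bond_last = "UP BROADCAST MASTER MULTICAST MTU:1500 Metric:1"
    print("RUNNING" in interface_flags(bond_first))
    print("RUNNING" in interface_flags(bond_last))
```

Checking for "RUNNING" in the returned set distinguishes the working order (bond0 first) from the failing one (bond0 last).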

Albert Chin (bugs-ubuntu-vendor) wrote :

Another datapoint. I installed the 2.6.39-3.10 kernel from https://launchpad.net/ubuntu/oneiric/+source/linux/2.6.39-3.10. I configured bonding in this version of the kernel the same way that I configured it for the 3.0.0-12-server kernel. For the 2.6.39-3.10 kernel, there are *NO* dropped packets. Should I create a separate bug report for the issue of dropped packets?

Albert Chin (bugs-ubuntu-vendor) wrote :

With lacp_rate=fast, the number of dropped packets increases faster than with lacp_rate=slow (the default).

Tom Ellis (tellis) wrote :

I don't see the dropped packets at all; maybe it's driver related. So yeah, perhaps it'll be better to open another issue for that.

I tested ifupdown 0.7~beta2 from Debian experimental, which uses sysvinit scripts instead of the Ubuntu upstart delta, and the problem is still present.

Albert Chin (bugs-ubuntu-vendor) wrote :

I created bug #890475 for the dropped packets. Tom, I'm using the igb driver for my Intel E1G42ET Gigabit ET Dual Port Server Adapter. I've also tried using the latest igb driver from Intel with no difference. However, considering that I get no dropped packets on either RHEL6 or 11.10 with a 2.6.39 kernel, I think the problem lies elsewhere.

Tom Ellis (tellis) wrote :

I've been testing an experimental patched ifupdown by stgraber https://launchpad.net/~stgraber/+archive/experimental/+build/2931666 which should get around any race conditions.
This installs fine onto oneiric too even though the target is precise.

I've had more success with this but am still having issues restarting interfaces; they don't seem to go down cleanly for me.

Albert Chin (bugs-ubuntu-vendor) wrote :

I tried this version of ifenslave and didn't see any change:
  $ cat /etc/network/interfaces
  ...
  auto bond0
  iface bond0 inet manual
    bond-slaves none
    bond-primary eth4 eth5
    bond-mode 802.3ad
    bond-lacp_rate fast
    bond-miimon 100
    bond-updelay 200

  auto eth4
  iface eth4 inet manual
    bond-master bond0
    bond-primary eth4 eth5
    bond-mode 802.3ad
    bond-lacp_rate fast
    bond-miimon 100
    bond-updelay 200

  auto eth5
  iface eth5 inet manual
    bond-master bond0
    bond-primary eth4 eth5
    bond-mode 802.3ad
    bond-lacp_rate fast
    bond-miimon 100
    bond-updelay 200

  # ifconfig bond0
  bond0 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
            BROADCAST MASTER MULTICAST MTU:1500 Metric:1
            RX packets:383 errors:0 dropped:383 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:47492 (47.4 KB) TX bytes:0 (0.0 B)

  # ifconfig eth4
  eth4 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
            UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
            RX packets:193 errors:0 dropped:193 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:23932 (23.9 KB) TX bytes:0 (0.0 B)
            Memory:b1a80000-b1b00000

  # ifconfig eth5
  eth5 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
            UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
            RX packets:196 errors:0 dropped:196 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:24304 (24.3 KB) TX bytes:0 (0.0 B)
            Memory:b1a00000-b1a80000

  # cat /proc/net/bonding/bond0
  Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

  Bonding Mode: IEEE 802.3ad Dynamic link aggregation
  Transmit Hash Policy: layer2 (0)
  MII Status: down
  MII Polling Interval (ms): 100
  Up Delay (ms): 200
  Down Delay (ms): 0

  802.3ad info
  LACP rate: fast
  Aggregator selection policy (ad_select): stable
  bond bond0 has no active aggregator

  Slave Interface: eth5
  MII Status: down
  Speed: 1000 Mbps
  Duplex: full
  Link Failure Count: 0
  Permanent HW addr: 00:1b:21:d3:f6:0b
  Aggregator ID: N/A
  Slave queue ID: 0

  Slave Interface: eth4
  MII Status: down
  Speed: 1000 Mbps
  Duplex: full
  Link Failure Count: 0
  Permanent HW addr: 00:1b:21:d3:f6:0a
  Aggregator ID: N/A
  Slave queue ID: 0

Albert Chin (bugs-ubuntu-vendor) wrote :

What's more, if I comment out "auto bond0", "auto eth4", and "auto eth5" and manually start up the interfaces with:
  # ifup bond0
  # ifup eth4
  # ifup eth5
the bond0 interface does not come up correctly. With the previous ifenslave-2.6, this worked to bring up bond0. So, the latest ifenslave-2.6 you're testing is a step back.

Tom Ellis (tellis) wrote :

Oh, odd. It comes up fine on a cold boot for me.

Out of interest, how is your switch configured? Is LACP in active or passive mode?

Stéphane Graber (stgraber) wrote :

I'm also going to copy/paste an answer I posted somewhere else:

Also, as noted on IRC, bond-primary seems to be limited to active-backup only.
For reference: http://www.kernel.org/doc/Documentation/networking/bonding.txt
Here's a minimal 802.3ad config:

----
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# Link Aggregation bonding
auto bond0
iface bond0 inet static
        bond-slaves none
        bond-mode 802.3ad
        bond-miimon 100
        address 172.22.15.11
        netmask 255.255.255.0
        gateway 172.22.15.1

auto eth0
iface eth0 inet manual
        bond-master bond0

auto eth1
iface eth1 inet manual
        bond-master bond0

----

Please try with a config similar to that one as I'd expect kernel errors when trying to change options that don't exist in 802.3ad mode.
Looking at "dmesg | grep bond" may also give some clues as to what's wrong.

That's with my proposed ifenslave package, the one in Oneiric and in former Ubuntu releases is known to have a race condition so at this point I'm only interested in feedback on the one that's in my PPA as that's the one I'll push to Precise and use for SRUs.
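
To spot leftover bond-primary lines in an 802.3ad setup like the ones posted earlier in this report, a small Python sketch along these lines may help. Both function names are hypothetical, the parser only understands the simple stanza layout used in this thread, and it treats bond_primary and bond-primary as the same option.

```python
def parse_iface_stanzas(text):
    """Return {iface_name: {option: value}} for each `iface` stanza."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        words = line.split(None, 1)
        keyword = words[0]
        rest = words[1].strip() if len(words) > 1 else ""
        if keyword == "iface":
            name = rest.split()
            current = stanzas.setdefault(name[0], {}) if name else None
        elif keyword in ("auto", "mapping", "source") or keyword.startswith("allow-"):
            current = None  # these lines end the current iface stanza
        elif current is not None:
            current[keyword.replace("_", "-")] = rest
    return stanzas


def primary_in_8023ad(text):
    """Names of stanzas that set bond-primary while the bond runs in 802.3ad mode."""
    stanzas = parse_iface_stanzas(text)
    flagged = []
    for name, opts in stanzas.items():
        if "bond-primary" not in opts:
            continue
        # A slave stanza inherits the mode from its bond-master stanza.
        master = stanzas.get(opts.get("bond-master", ""), {})
        mode = opts.get("bond-mode") or master.get("bond-mode")
        if mode in ("802.3ad", "4"):
            flagged.append(name)
    return sorted(flagged)
```

Running primary_in_8023ad on the contents of /etc/network/interfaces returns the stanzas worth cleaning up; an empty list means nothing was flagged.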

Albert Chin (bugs-ubuntu-vendor) wrote :

I now have the following in /etc/network/interfaces:
auto bond0
iface bond0 inet manual
  bond-slaves none
  bond-mode 802.3ad
  bond-lacp_rate slow
  bond-miimon 100
  bond-updelay 200

auto eth2
iface eth2 inet manual
  bond-master bond0

auto eth3
iface eth3 inet manual
  bond-master bond0

When I reboot the system, I have:
% ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
          BROADCAST MASTER MULTICAST MTU:1500 Metric:1
          RX packets:318 errors:0 dropped:318 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:39432 (39.4 KB) TX bytes:0 (0.0 B)

% ifconfig eth2
eth2 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:161 errors:0 dropped:161 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19964 (19.9 KB) TX bytes:0 (0.0 B)
          Memory:b1a80000-b1b00000

% ifconfig eth3
eth3 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:163 errors:0 dropped:163 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20212 (20.2 KB) TX bytes:0 (0.0 B)
          Memory:b1a00000-b1a80000

% cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
bond bond0 has no active aggregator

Slave Interface: eth3
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:d3:f6:0b
Aggregator ID: N/A
Slave queue ID: 0

Slave Interface: eth2
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:d3:f6:0a
Aggregator ID: N/A
Slave queue ID: 0

Stéphane Graber (stgraber) wrote :

Ok, that looks good, the slaves are down because the master itself is down.
Bringing the master up and giving it an IP address should bring the slaves up and hopefully negotiate LACP with your switch.

Albert Chin (bugs-ubuntu-vendor) wrote :

This bonded interface will not have an IP. It will be added to a bridge for KVM. My expectation is that even without an IP, bond0 should be up after the system boots with the above config.

Stéphane Graber (stgraber) wrote :

I'm not sure whether ifupdown guarantees that an interface marked as manual is brought up, but that's an interesting thing to test.

For now, can you try adding a:
 post-up ip link set dev bond0 up

And see if that indeed brings up the slaves and negotiates LACP?

Albert Chin (bugs-ubuntu-vendor) wrote :

Added "post-up ip link set dev bond0 up" and that appears to work:

$ ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
          inet6 addr: fe80::21b:21ff:fed3:f60b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:15 errors:0 dropped:9 overruns:0 frame:0
          TX packets:78 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1486 (1.4 KB) TX bytes:9430 (9.4 KB)

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 17
        Partner Key: 24
        Partner Mac Address: 00:04:96:18:54:d5

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:d3:f6:0b
Aggregator ID: 2
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:d3:f6:0a
Aggregator ID: 2
Slave queue ID: 0
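
For comparing the before/after state without reading the whole dump, here is a minimal Python sketch that boils /proc/net/bonding/bond0 down to the fields that changed. The function name summarize_bond and the layout of the returned dict are made up for illustration; only the "MII Status", "Aggregator ID" and "Slave Interface" lines shown above are read.

```python
def summarize_bond(proc_text):
    """Extract bond-level and per-slave status from /proc/net/bonding/<bond> text."""
    summary = {"mii_status": None, "active_aggregator": None, "slaves": {}}
    current = None  # dict for the slave section being read, if any
    for line in proc_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        value = value.strip()
        if key == "Slave Interface":
            current = summary["slaves"].setdefault(value, {})
        elif key == "MII Status":
            if current is None:
                summary["mii_status"] = value  # bond-level status
            else:
                current["mii_status"] = value
        elif key == "Aggregator ID":
            if current is None:
                summary["active_aggregator"] = value
            else:
                current["aggregator_id"] = value
    return summary


if __name__ == "__main__":
    with open("/proc/net/bonding/bond0") as handle:
        print(summarize_bond(handle.read()))
```

In the failing case every "mii_status" is "down" and "active_aggregator" stays None; in the output above they read "up" and "2".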

Stéphane Graber (stgraber) wrote :

Yep, that looks good.
I'll make sure we won't get any weird bug by setting the link up by default and if I don't see anything exploding, I'll change that in my branch.

Thanks for the tests.

Albert Chin (bugs-ubuntu-vendor) wrote :

Tried adding the bond0 interface to a bridge, but the bridge (br0) is not in the "RUNNING" state:
  # cat /etc/network/interfaces
  ...
  auto bond0
  iface bond0 inet manual
    bond-mode 802.3ad
    bond-lacp_rate slow
    bond-miimon 100
    bond-updelay 200
    bond-slaves none
    post-up ip link set dev bond0 up
    post-down ip link set dev bond0 down

  auto eth2
  iface eth2 inet manual
    bond-master bond0

  auto eth3
  iface eth3 inet manual
    bond-master bond0

  auto br0
  iface br0 inet manual
    bridge_ports bond0
    bridge_stp off

  # ifconfig bond0
  bond0 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0b
            UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
            RX packets:231 errors:0 dropped:30 overruns:0 frame:0
            TX packets:772 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:17484 (17.4 KB) TX bytes:95288 (95.2 KB)

  # ifconfig br0
  br0 Link encap:Ethernet HWaddr 92:ee:88:d7:12:40
            UP BROADCAST MULTICAST MTU:1500 Metric:1
            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Oddly, bond0 does not seem to be a part of br0:
  # brctl show
  bridge name bridge id STP enabled interfaces
  br0 8000.000000000000 no
  virbr0 8000.000000000000 yes

Do I need to add "up brctl addif br0 bond0" in my iface definition for br0? That would defeat the purpose of bridge_ports in the br0 iface definition.

Stéphane Graber (stgraber) wrote :

You shouldn't have to, but it may be a case where devices aren't brought up in the right order.

On your system, can you try to do:
 - ifdown -a
 - ifup lo
 - ifup eth2
 - ifup eth3
 - ifup bond0
 - ifup br0

And see if that brings everything online properly?

With that test ifenslave in my PPA, I added the logic to setup the bond when the slaves are being brought up before the master.
I don't know what the bridge scripts do in a similar scenario where br0 is brought up before bond0. As bond0 in that case wouldn't even exist (not just be in a non-configured state), I guess it'd simply fail to add it, giving you that empty bridge.

Albert Chin (bugs-ubuntu-vendor) wrote :

After "ifdown -a", neither bond0 nor br0 is shown in "ifconfig -a". However, after "ifup eth2", bond0 and br0 are brought up:
# ifdown -a
# ifup lo
# ifup eth2
...
# ifconfig -a
...
bond0 Link encap:Ethernet HWaddr 00:1b:21:d3:f6:0a
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:18 errors:0 dropped:5 overruns:0 frame:0
          TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1702 (1.7 KB) TX bytes:8060 (8.0 KB)

br0 Link encap:Ethernet HWaddr 0e:e5:c5:ea:5e:bc
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

# brctl show
root@trunks:~# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.000000000000 no
virbr0 8000.000000000000 yes

# ifup eth3
# ifup bond0
ifup: interface bond0 already configured
# ifup br0
ifup: interface br0 already configured

Stéphane Graber (stgraber) wrote :

A new test ifenslave is available in my ppa: https://launchpad.net/~stgraber/+archive/experimental/+packages

The package is built for Precise but I'd expect it to work just as well on Oneiric (without the need for rebuild).

Some other users are testing it now and if I don't get negative feedback, I plan on uploading it to Precise early next week, then look at SRUing it to lucid and above.

For the bridging and vlan issues, it's a separate problem that the new ifenslave won't fix.
I described my observations and some ideas here: http://paste.ubuntu.com/754234

Changed in ifupdown (Ubuntu):
assignee: nobody → Stéphane Graber (stgraber)
Stéphane Graber (stgraber) wrote :

And one more version pushed to my PPA.

This one should also fix some race conditions when using the bond interface in a bridge, with vlans or doing dhcp on it.

Albert Chin (bugs-ubuntu-vendor) wrote :

I've updated to ifenslave-2.6_1.1.0-19ubuntu2~ppa3_amd64.deb and no longer need the following in "auto bond0":
  post-up ip link set dev bond0 up

For "auto br0", I still need:
  up brctl addif br0 bond0

Stéphane Graber (stgraber) wrote :

Hmm, ok, I'm a bit surprised that bond0 still doesn't get added to the bridge.

What's supposed to happen with that new ifenslave at boot time is:
1) One of the network cards gets detected and sends a udev event
2) Upstart picks up the event and brings up the interface
3) As the interface is part of a bond, bond0 is brought up first
4) Usually around that time, the second interface is detected and also triggers upstart
5) This card is put on hold waiting for the bond to be ready
6) bond0 is done initialising
7) The first network card is added as a slave
8) bond0 is now up and ready with one slave and can be bridged from this point on
9) The second network card is added as a slave
10) All of eth0, eth1 and bond0 are now up and working
11) At some point later in the boot sequence, "ifup -a" is called by the networking.conf upstart job, creating br0.

So it's not completely impossible that 11) actually happens before 8), but that'd most likely be caused by something taking a lot longer than it should when initialising the bond or the interfaces.

Did you happen to see brctl's error message?
It could be two things:
1) Complaining that bond0 doesn't exist
2) Saying Operation not permitted, meaning that bond0 exists but has no slaves and so no mac address at this point

Albert Chin (bugs-ubuntu-vendor) wrote :

Looking at the boot sequence:
  Begin: Running /[ 40.126206] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
  scripts/local-bottom ... done.
  done.
  Begin: Running /scripts/init-bottom ... done.
  [about 1 minute delay]
  fsck from util-linux 2.19.1
  fsck from util-linux 2.19.1
  /dev/mapper/trunks-lv_root: clean, 356330/1954064 files, 2538052/7812096 blocks
  /dev/sdb1 has been mounted 27 times without being checked, check forced.
  /dev/sdb1: 237/62248 files (0.8% non-contiguous), 73465/248832 blocks
    * Starting configure network device security [ OK ]
    * Starting configure network device [ OK ]
  ...
  Ubuntu 11.10 trunks ttyS0

  trunks login:

When I log in:
  # brctl show
  bridge name bridge id STP enabled interfaces
  br0 8000.000000000000 no
  virbr0 8000.000000000000 yes

I don't see any noticeable errors on boot. I see a few of the following entries:
   * Starting configure network device [ OK ]

I sometimes also see:
  Waiting for network configuration...
  Waiting up to 60 more seconds for network configuration...

Looking at kern.log, I see:
  Dec 2 02:50:52 trunks kernel: [ 45.113211] device bond0 entered promiscuous mode
  Dec 2 02:50:52 trunks kernel: [ 45.113221] device bond0 left promiscuous mode
  Dec 2 02:50:52 trunks kernel: [ 45.113224] device bond0 entered promiscuous mode
  Dec 2 02:50:52 trunks kernel: [ 45.113226] device bond0 left promiscuous mode
  Dec 2 02:50:52 trunks kernel: [ 45.115598] ADDRCONF(NETDEV_UP): br0: link is not ready
  ...
  Dec 2 02:50:52 trunks kernel: [ 45.235355] FS-Cache: Loaded
  Dec 2 02:50:52 trunks kernel: [ 45.258321] FS-Cache: Netfs 'nfs' registered for caching
  Dec 2 02:50:52 trunks kernel: [ 45.378114] bonding: bond0: Setting MII monitoring interval to 100.
  Dec 2 02:50:52 trunks kernel: [ 45.378143] bonding: bond0: Setting up delay to 200.
  Dec 2 02:50:52 trunks kernel: [ 45.379966] bonding: bond0: setting mode to 802.3ad (4).
  Dec 2 02:50:52 trunks kernel: [ 45.380734] bonding: bond0: Setting LACP rate to slow (0).
  Dec 2 02:50:52 trunks kernel: [ 45.382597] ADDRCONF(NETDEV_UP): bond0: link is not ready
  Dec 2 02:50:52 trunks kernel: [ 45.408253] bonding: bond0: Adding slave eth3.
  Dec 2 02:50:52 trunks kernel: [ 45.490835] bonding: bond0: enslaving eth3 as a backup interface with a down link.
  Dec 2 02:50:52 trunks kernel: [ 45.491929] bonding: bond0: Adding slave eth2.
  Dec 2 02:50:52 trunks kernel: [ 45.493123] init: udev-fallback-graphics main process (1634) terminated with status 1
  Dec 2 02:50:52 trunks kernel: [ 45.574310] bonding: bond0: enslaving eth2 as a backup interface with a down link.
  Dec 2 02:50:54 trunks kernel: [ 47.293867] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
  Dec 2 02:50:54 trunks kernel: [ 47.294899] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
  Dec 2 02:50:55 trunks kernel: [ 47.944385] igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
  Dec 2 02:50:55 ...


Albert Chin (bugs-ubuntu-vendor) wrote :
Stéphane Graber (stgraber) wrote :

Wow, that's quite a lot of things happening on that system :)
So indeed looking at the number of CPUs, network cards and disks showing up, it's enough to flood udev and upstart and likely make things start a bit slower than usual and so out of order.

Essentially, the fallback networking script starts before the network cards actually got setup and announced by udev.
That's the one case where you indeed end up trying to add something to the bond just before the bond actually gets created (by not even a second apparently).

Just for testing's sake can you add:
pre-up sleep 2

To your bridge to confirm that it's indeed a race condition happening there?

What it shows at least is that we definitely can't rely on the fallback networking job as running after all the kernel events have been processed. I guess the easiest way out of that problem will be to add the same hack to bridge-utils that I added to ifenslave, essentially waiting for up to a minute for the slaves/members to appear before giving up and continuing without them.

In your case, that'd wait for around 200ms, then find bond0, move it into the bridge and continue.
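
The "wait for up to a minute, then carry on" idea can be sketched roughly as below. This is an illustration in Python, not the actual shell code in ifenslave or bridge-utils. The function name, its parameters and the 100 ms polling interval are all assumptions; the only real interface used is the /sys/class/net/<name> directory, which exists once the kernel has registered a device.

```python
import os
import time


def wait_for_interfaces(names, timeout=60.0, interval=0.1,
                        exists=None, sleep=time.sleep, clock=time.monotonic):
    """Poll until every named interface exists or the timeout expires.

    Returns the names that are still missing (an empty list on success).
    """
    if exists is None:
        # Assumption: a device is "there" once its sysfs directory shows up.
        def exists(name):
            return os.path.exists(os.path.join("/sys/class/net", name))
    deadline = clock() + timeout
    missing = list(names)
    while True:
        missing = [name for name in missing if not exists(name)]
        if not missing or clock() >= deadline:
            return missing  # give up and let the caller continue without them
        sleep(interval)


if __name__ == "__main__":
    still_missing = wait_for_interfaces(["bond0"])
    print("missing:", still_missing)
```

The caller gets back whatever never appeared, so it can add the members that did show up to the bridge and skip the rest instead of blocking the boot.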

At least it looks like the proposed ifenslave isn't at fault, it's just an extra change that'll need to happen to bridge-utils.

Thanks for the tests, good to have someone with that kind of hardware around for testing :)

Albert Chin (bugs-ubuntu-vendor) wrote :

The "pre-up sleep 2" didn't work. But, "pre-up sleep 5" did.

Stéphane Graber (stgraber) wrote :

Hi Albert, I just uploaded one more ifenslave to the PPA.
This one should fix the bridging issues by calling the bridging udev hook once the bond is ready.

Please note that I haven't tested this particular change at all as I don't have my test hardware around at the moment.

Let me know if that works for you.

Albert Chin (bugs-ubuntu-vendor) wrote :

Just finished testing the new ifenslave. Rebooted three times and bond0, br0, and bond1 came up successfully every time!

Stéphane Graber (stgraber) wrote :

Great to hear!

I'll push that into Precise and then look at the SRU. Some of the changes I had to make alter the behaviour of bonding quite a lot, so it may take a while to discuss and prepare the SRU for lucid and above.

affects: ifupdown (Ubuntu) → ifenslave-2.6 (Ubuntu)
Changed in ifenslave-2.6 (Ubuntu):
status: Confirmed → Fix Released
Stéphane Graber (stgraber) wrote :

Now that everything seems to be working in Precise, I'll start uploading the SRUs for Oneiric.
The affected packages are vlan, bridge-utils and ifenslave-2.6.

I added a task for each of them and will be uploading packages to oneiric-proposed shortly.

Changed in bridge-utils (Ubuntu Oneiric):
status: New → In Progress
Changed in ifenslave-2.6 (Ubuntu Oneiric):
status: New → In Progress
Changed in vlan (Ubuntu Oneiric):
status: New → In Progress
assignee: nobody → Stéphane Graber (stgraber)
Changed in bridge-utils (Ubuntu Oneiric):
assignee: nobody → Stéphane Graber (stgraber)
Changed in ifenslave-2.6 (Ubuntu):
assignee: Stéphane Graber (stgraber) → nobody
Changed in ifenslave-2.6 (Ubuntu Oneiric):
assignee: nobody → Stéphane Graber (stgraber)
Changed in vlan (Ubuntu):
status: New → Fix Released
Changed in bridge-utils (Ubuntu):
status: New → Fix Released
Changed in vlan (Ubuntu Oneiric):
status: In Progress → Fix Committed
Changed in bridge-utils (Ubuntu Oneiric):
status: In Progress → Fix Committed
Stéphane Graber (stgraber) wrote :

Just a quick note for the SRU team :)

The ifenslave-2.6 upload makes a pretty big diff, including a documentation update.
I could have stripped it down to the absolutely necessary changes but that'd still have been all of debian/pre-up which is by far the biggest change in the package. As the other changes are basically /var/run => /run transition and a documentation update (for the changes done to debian/pre-up), I assumed we'd prefer to have a package that's identical between Oneiric and Precise rather than stripping these out (and confusing our users in the process due to inaccurate documentation).

Changed in ifenslave-2.6 (Ubuntu Oneiric):
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote : Please test proposed package

Hello Albert, or anyone else affected,

Accepted bridge-utils into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Stéphane Graber (stgraber) wrote :

So there are two things to test with that new bridge-utils:
 1) Bridge interface with bridge-ports set instead of bridge_ports works too
 2) Bridging a non-existing vlan interface will now create it

These two are in the udev hooks, so need to be tested by creating a network interface, like a tap device (using uml-utilities to create it).

Test for 1)
 - Make sure uml-utilities and bridge-utils are both installed
 - Add the following entry to /etc/network/interfaces:
auto br0
iface br0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    bridge-ports eth9
 - Create the tap device: tunctl -t eth9
 - Check that the bridge has been created and the interface added to it (bridge shouldn't have an IP configuration at this point):
root@castiana:~# brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.1a6bffdb2551 no eth9

The previous release wouldn't do anything unless you were using bridge_ports.

Test for 2)
 - Make sure uml-utilities, bridge-utils and vlan are all installed
 - Add the following entry to /etc/network/interfaces:
auto br0
iface br0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    bridge-ports eth9.1010
 - Create the tap device: tunctl -t eth9
 - Check that the bridge has been created and the interface added to it (bridge shouldn't have an IP configuration at this point):
root@castiana:~# brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.06c2192d61ab no eth9.1010

The previous release would create the bridge but not add the port to it, as the tagged interface wouldn't exist.

Between each test, clean up with:
 - tunctl -d eth9
 - ifconfig br0 down
 - brctl delbr br0

The use of eth9 instead of tap0 is done on purpose as the vlan script explicitly checks for interfaces with eth, bond or wlan in their name.
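
To illustrate the naming constraint above, here is a minimal sketch of the two checks involved: whether a name contains eth, bond or wlan, and how a tagged name such as eth9.1010 splits into parent and VLAN ID. The function names are hypothetical and this is not copied from the vlan package's scripts; it only mirrors the behaviour described in this comment.

```shell
#!/bin/sh
# Hypothetical helpers, NOT the actual vlan package code.

# Succeeds when the name contains eth, bond or wlan,
# which is the check described in the comment above.
is_vlan_capable_name() {
    case "$1" in
        *eth*|*bond*|*wlan*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints "parent vid" for a dotted name such as eth9.1010.
# Fails for names without a numeric suffix after the last dot.
split_vlan_name() {
    case "$1" in
        *.*) ;;
        *) return 1 ;;
    esac
    parent=${1%.*}   # everything before the last dot
    vid=${1##*.}     # everything after the last dot
    [ -n "$parent" ] || return 1
    case "$vid" in
        ''|*[!0-9]*) return 1 ;;   # VLAN ID must be digits only
    esac
    printf '%s %s\n' "$parent" "$vid"
}
```

With these definitions, "is_vlan_capable_name tap0" fails while "is_vlan_capable_name eth9" succeeds, which is why the test uses eth9 for the tap device.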

Chris Halse Rogers (raof) wrote :

Hello Albert, or anyone else affected,

Accepted vlan into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Chris Halse Rogers (raof) wrote :

@Stéphane: I'm not sure that /var/run → /run is a good idea as an SRU, and it doesn't look like it would be terribly difficult to revert that change. What documentation points to /run rather than /var/run?

Chris Halse Rogers (raof) wrote :

Hello Albert, or anyone else affected,

Accepted ifenslave-2.6 into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Stéphane Graber (stgraber) wrote :

03:52 < stgraber> RAOF: hey there
03:53 < stgraber> RAOF: thanks for accepting the vlan package
03:54 < stgraber> RAOF: as for the extra delta in the ifenslave-2.6 SRU (one line change for /run + documentation), the idea was that reverting these just for the sake of having
                  a smaller delta isn't probably worth it as indeed the /run change isn't needed (but won't make any change as /run is a symlink /var/run in Oneiric already) and
                  the documentation would just be inaccurate if reverted
03:55 < stgraber> RAOF: keeping these two changes gives us the advantage of having an identical package in Oneiric and Precise, making diffing the two quite a bit easier (if we
                  start getting more changes in Precise) and so making debugging easier
03:55 < RAOF> Fair enough.

Stéphane Graber (stgraber) wrote :

Here's a quick example of how to test the new vlan package:

 - Make sure uml-utilities and vlan are installed
 - Add the following entry to /etc/network/interfaces:
auto eth9.1010
iface eth9.1010 inet static
    address 192.168.1.1
    netmask 255.255.255.0
 - Create the tap device: tunctl -t eth9
 - Check that the vlan interface has been created and configured correctly: ifconfig eth9.1010
eth9.1010 Link encap:Ethernet HWaddr ce:51:62:98:16:78
          inet addr:192.168.1.1 Bcast:0.0.0.0 Mask:255.255.255.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Prior to this update, vlan interface creation was racy: it depended on the catch-all networking.conf job to initialise eth9.1010, and that job could be triggered before eth9 actually existed.
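
For anyone scripting the check above, a small polling helper avoids looking for eth9.1010 before it has had a chance to appear. This is only a test aid under stated assumptions: the wait_for_iface name, the 10-second default and the optional sysfs directory argument are all made up here. The package fix itself relies on a udev trigger, not on polling.

```shell
#!/bin/sh
# Hypothetical test helper, not part of the vlan package.
# Usage: wait_for_iface NAME [TIMEOUT_SECONDS] [SYSFS_DIR]
# Returns 0 as soon as SYSFS_DIR/NAME exists, 1 once the timeout expires.
wait_for_iface() {
    name=$1
    timeout=${2:-10}
    sysdir=${3:-/sys/class/net}
    waited=0
    while [ ! -e "$sysdir/$name" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A test run could then do "tunctl -t eth9" followed by "wait_for_iface eth9.1010 && ifconfig eth9.1010".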

description: updated
Tom Ellis (tellis) wrote :

The updated vlan, bridge-utils and ifenslave packages from oneiric-proposed have fixed the issues I had with 802.3ad. Thanks!

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bridge-utils - 1.5-2ubuntu1.1

---------------
bridge-utils (1.5-2ubuntu1.1) oneiric-proposed; urgency=low

  * debian/bridge-network-interface.sh: If the interface doesn't exist,
    then call the vlan hook and check if the interface appears then.
    (LP: #889423)
  * Update debian/bridge-network-interface.sh to also work with
    bridge-ports (instead of just bridge_ports).
 -- Stephane Graber <email address hidden> Fri, 20 Jan 2012 16:08:34 -0500
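
As an illustration of the second changelog entry, the sketch below reads an interfaces file and lists the ports of a bridge, accepting both the bridge_ports and bridge-ports spellings. It is a simplified stand-in, not the code in debian/bridge-network-interface.sh; the bridge_ports_of name and the stanza handling (only "iface" lines start a new stanza) are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual bridge-utils hook.
# Usage: bridge_ports_of BRIDGE [INTERFACES_FILE]
# Prints one port per line for the given bridge stanza.
bridge_ports_of() {
    bridge=$1
    file=${2:-/etc/network/interfaces}
    awk -v br="$bridge" '
        # Track which iface stanza we are currently inside.
        $1 == "iface" { current = $2; next }
        # Accept both spellings of the option name.
        current == br && ($1 == "bridge_ports" || $1 == "bridge-ports") {
            for (i = 2; i <= NF; i++) print $i
        }
    ' "$file"
}
```

Run against the test entries quoted earlier in this bug, "bridge_ports_of br0" would print eth9 for the first test and eth9.1010 for the second.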

Changed in bridge-utils (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3ubuntu3.1

---------------
vlan (1.9-3ubuntu3.1) oneiric-proposed; urgency=low

  * Add a udev trigger similar to bridge-utils' so vlan interfaces are
    created when the parent appears (this will then trigger upstart and
    ifupdown to configure the newly created vlan interface) (LP: #889423)
 -- Stephane Graber <email address hidden> Fri, 20 Jan 2012 16:08:51 -0500

Changed in vlan (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifenslave-2.6 - 1.1.0-19ubuntu1.1

---------------
ifenslave-2.6 (1.1.0-19ubuntu1.1) oneiric-proposed; urgency=low

  * Update ifenslave scripts to work with event based boot (LP: #889423):
    - Create the master interface whenever a slave comes online
    - Make sure we wait for the master to be completely ready before doing
      any work on the slaves
    - Call post-enslaving code every time a slave is added to the master
    - Properly destroy the master when it goes down
    - Always bring the bond interface up once initialized
    - Wait up to a minute for a slave to join the master
    - If the bond is tagged, run the udev vlan hook to create
      the vlan interfaces once it's ready (has a MAC address)
    - If the bond is part of a bridge, run the udev bridge hook to join
      it to the bridge once it's ready (has a MAC address)
    - Update examples, moving bond0 at the end of /etc/network/interfaces
      so that running "ifup -a" won't wait for a minute.
  * Change path to /run
 -- Stephane Graber <email address hidden> Fri, 20 Jan 2012 17:00:07 -0500
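
The "ready (has a MAC address)" condition in the changelog above can be checked by hand with something like the following sketch. It assumes that "no MAC address yet" shows up as an empty or all-zero value in /sys/class/net/<iface>/address; that interpretation, the has_mac_address name and the optional directory argument are mine, not taken from the ifenslave scripts.

```shell
#!/bin/sh
# Hypothetical check, NOT the actual ifenslave pre-up logic.
# Usage: has_mac_address IFACE [SYSFS_DIR]
# Succeeds when the interface reports a non-zero hardware address.
has_mac_address() {
    iface=$1
    sysdir=${2:-/sys/class/net}
    [ -r "$sysdir/$iface/address" ] || return 1
    mac=$(cat "$sysdir/$iface/address")
    case "$mac" in
        ''|00:00:00:00:00:00) return 1 ;;   # assumed to mean "not ready"
        *) return 0 ;;
    esac
}
```

For example, "has_mac_address bond0 && echo bond0 looks ready" can be run right after boot to see whether the bond got as far as picking up an address from a slave.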

Changed in ifenslave-2.6 (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Jean-Daniel Bussy (silversurfer972) wrote :

On a fully updated Ubuntu 12.04
I have dropped packets on the bonded interface:

bond0 Link encap:Ethernet HWaddr 00:10:18:e0:5e:a4
          inet6 addr: fe80::210:18ff:fee0:5ea4/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:315 errors:0 dropped:185 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:31834 (31.8 KB) TX bytes:1772 (1.7 KB)

My network settings are:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

# Network: instances
# Bonding
auto eth2
iface eth2 inet manual
    bond-master bond0
auto eth4
iface eth4 inet manual
    bond-master bond0

auto bond0
iface bond0 inet manual
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    bond-slaves none

My ifenslave package version :
Package: ifenslave-2.6
State: installed
Automatically installed: no
Version: 1.1.0-19ubuntu5

Did I miss something in my configuration?

arjarj (arjarj) wrote :

Hello,

After an upgrade from 10.04 to 12.04, our interfaces files stopped working for the 802.3ad interfaces. After converting the interfaces file to the new "format", the configuration seems to be correct, but I seem to be running into a timing issue or race condition again. I implemented most hints on this page, and the ones on https://bugs.launchpad.net/ubuntu/+source/ifenslave-2.6/+bug/1015199 , but when the system starts up, the bond interfaces don't get their slaves (I've seen them come up with 0, 1 or 2 slave interfaces).
I'm trying to create bond1 and bond0, both with 2 physical interfaces. As suggested, the physical interfaces are at the top, and the bonds themselves are defined at the bottom of the file.

It's a Dell PowerEdge R610, configured with 12 network interfaces: 4x Broadcom bnx2 onboard and 2x4x Intel igb on quad NIC boards. The physical interfaces in use (eth0 through eth3) are mapped to the first 2 interfaces on each of the quad boards. I've tried playing with the pre-up sleep option mentioned in comment 33 (https://bugs.launchpad.net/ubuntu/+source/ifenslave-2.6/+bug/889423/comments/33), but to no avail. Could it be another timing issue, maybe due to the large number of physical interfaces? If I run an "ifdown -a; ifup -a" from rc.local, all the interfaces come up just fine. The machine isn't in production yet, so I might be able to test or provide more logging if necessary.

auto lo
iface lo inet loopback
auto eth0
iface eth0 inet manual
    bond-master bond1
auto eth2
iface eth2 inet manual
    bond-master bond1
auto eth1
iface eth1 inet manual
    bond-master bond0
auto eth3
iface eth3 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address REDACTED
    netmask 255.255.255.224
    bond_mode 802.3ad # LACP
    bond_miimon 100
    bond_lacp_rate 1
    #xmit_hash_policy layer2+3
    bond_slaves none
iface bond0 inet6 static
    address REDACTED
    netmask 64
    #gateway REDACTED

auto bond1
iface bond1 inet static
    address REDACTED
    netmask 255.255.255.248
    gateway REDACTED
    bond_mode 802.3ad # LACP
    bond_miimon 100
    bond_lacp_rate 1
    #xmit_hash_policy layer2+3
    bond_slaves none

iface bond1 inet6 static
    address REDACTED
    netmask 64
    gateway REDACTED
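
To see how many slaves each bond actually picked up after a boot, the kernel's per-bond status file can be read directly. A minimal sketch, assuming the usual "Slave Interface: ethN" lines in /proc/net/bonding/<bond>; the bond_slaves name and the optional directory argument are made up for this example.

```shell
#!/bin/sh
# Hypothetical diagnostic helper.
# Usage: bond_slaves BOND [PROC_DIR]
# Prints the name of each slave currently enslaved to BOND, one per line.
bond_slaves() {
    bond=$1
    procdir=${2:-/proc/net/bonding}
    # Keep only the "Slave Interface:" lines and strip the label.
    sed -n 's/^Slave Interface: *//p' "$procdir/$bond"
}
```

With the configuration above, "bond_slaves bond0" should list eth1 and eth3 and "bond_slaves bond1" should list eth0 and eth2; fewer lines than that right after boot would match the 0, 1 or 2 slaves symptom described here.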
