Bonded network device is not correctly detected during boot-up.

Bug #1056792 reported by annunaki2k2
This bug affects 1 person
Affects: ifenslave-2.6 (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have an x86_64 Intel server running Ubuntu 12.04.1, connected using its two onboard 1G network interfaces in an LACP bond. The configuration works fine, but for some very annoying reason, when the machine boots, the start-up scripts hang for two minutes waiting for the connection to come up - yet the connection is actually already up (and pingable remotely).

Here is my interfaces configuration file:
russell@pm1 ~ $ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# Slave Definition for bond0
auto eth0
iface eth0 inet manual
 bond-master bond0

auto eth1
iface eth1 inet manual
 bond-master bond0

# The primary network interface
auto bond0
iface bond0 inet static
 address 10.0.1.151
 netmask 255.255.254.0
 broadcast 10.0.1.255
 network 10.0.0.0
 gateway 10.0.0.1
 dns-nameservers 10.0.0.120 10.0.1.120
 dns-search mps.lan wilts.mps.lan
 dns-domain mps.lan
 bond-mode 802.3ad
 bond-miimon 100
 bond-lacp_rate 1
 bond-slaves none
# bond-use_carrier 1
 post-up /usr/local/sbin/check-bond.sh $IFACE
 pre-down /usr/local/sbin/check-bond.sh stop $IFACE

And (once the machine times out and continues its boot), here is the resultant configuration:
russell@pm1 ~ $ ifconfig
bond0 Link encap:Ethernet HWaddr 00:1e:67:44:58:88
          inet addr:10.0.1.151 Bcast:10.0.1.255 Mask:255.255.254.0
          inet6 addr: fe80::21e:67ff:fe44:5888/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:2644 errors:0 dropped:827 overruns:0 frame:0
          TX packets:1575 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:282832 (282.8 KB) TX bytes:261199 (261.1 KB)

eth0 Link encap:Ethernet HWaddr 00:1e:67:44:58:88
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:803 errors:0 dropped:803 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:70241 (70.2 KB) TX bytes:992 (992.0 B)
          Memory:d0b20000-d0b40000

eth1 Link encap:Ethernet HWaddr 00:1e:67:44:58:88
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:1841 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1567 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212591 (212.5 KB) TX bytes:260207 (260.2 KB)
          Memory:d0b00000-d0b20000

russell@pm1 ~ $ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
 Aggregator ID: 1
 Number of ports: 1
 Actor Key: 17
 Partner Key: 1
 Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1e:67:44:58:88
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1e:67:44:58:87
Aggregator ID: 2
Slave queue ID: 0

As you can see, it has actually booted with the correct configuration - it just decided to waste two minutes because it failed to detect correctly that the network is actually configured and ready.

Here are the relevant lines from the syslog relating to the bonding interface:
russell@pm1 ~ $ sudo cat /var/log/syslog | grep -i bond | grep kernel | grep "Sep 26 12:06"
Sep 26 12:06:38 pm1 kernel: [ 6.069287] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Sep 26 12:06:38 pm1 kernel: [ 6.077144] bonding: bond0: Setting MII monitoring interval to 100.
Sep 26 12:06:38 pm1 kernel: [ 6.084404] bonding: bond0: setting mode to 802.3ad (4).
Sep 26 12:06:38 pm1 kernel: [ 6.086176] bonding: bond0: Setting LACP rate to fast (1).
Sep 26 12:06:38 pm1 kernel: [ 6.088046] ADDRCONF(NETDEV_UP): bond0: link is not ready
Sep 26 12:06:38 pm1 kernel: [ 6.213700] bonding: bond0: Adding slave eth1.
Sep 26 12:06:38 pm1 kernel: [ 6.296412] bonding: bond0: enslaving eth1 as a backup interface with a down link.
Sep 26 12:06:38 pm1 kernel: [ 7.083578] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
Sep 26 12:06:38 pm1 kernel: [ 7.084460] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
Sep 26 12:06:38 pm1 kernel: [ 7.270717] bonding: bond0: Adding slave eth0.
Sep 26 12:06:38 pm1 kernel: [ 7.354304] bonding: bond0: enslaving eth0 as a backup interface with an up link.
Sep 26 12:06:38 pm1 kernel: [ 7.594951] bonding: bond0: Setting MII monitoring interval to 100.
Sep 26 12:06:38 pm1 kernel: [ 7.595780] bonding: unable to update mode of bond0 because interface is up.
Sep 26 12:06:38 pm1 kernel: [ 7.596696] bonding: bond0: Unable to update LACP rate because interface is up.
Sep 26 12:06:46 pm1 kernel: [ 17.418840] bond0: no IPv6 routers present

It appears that the ifenslave script is trying to modify the bond network device after it is brought up - even though it had already brought it up correctly beforehand - perhaps this is the reason for the failed detection? The relevant lines are:
Sep 26 12:06:38 pm1 kernel: [ 7.595780] bonding: unable to update mode of bond0 because interface is up.
Sep 26 12:06:38 pm1 kernel: [ 7.596696] bonding: bond0: Unable to update LACP rate because interface is up.

And in fact, you see these lines on boot-up just before the big wait happens (please see the attached screenshot taken using the Remote Management Module at boot time).

Revision history for this message
annunaki2k2 (russell-knighton) wrote :

ifenslave-2.6 information:
russell@pm1 ~ $ aptitude show ifenslave-2.6
Package: ifenslave-2.6
State: installed
Automatically installed: no
Version: 1.1.0-19ubuntu5
Priority: optional
Section: net
Maintainer: Ubuntu Developers <email address hidden>
Architecture: amd64
Uncompressed Size: 103 k
Depends: libc6 (>= 2.4), iproute
Recommends: net-tools
Conflicts: ifenslave (< 2), ifenslave (< 2), ifenslave-2.4 (<= 0.07+2.5.15-6), ifenslave-2.4 (<= 0.07+2.5.15-6), ifenslave-2.6
Provides: ifenslave
Description: Attach and detach slave interfaces to a bonding device

affects: linux (Ubuntu) → ifenslave-2.6 (Ubuntu)
Revision history for this message
Stéphane Graber (stgraber) wrote :

Please attach a tarball of /var/log/upstart/

Changed in ifenslave-2.6 (Ubuntu):
status: New → Incomplete
Changed in ifenslave-2.6 (Ubuntu):
importance: Undecided → Medium
importance: Medium → Undecided
Revision history for this message
annunaki2k2 (russell-knighton) wrote :

As requested, an attached tarball file of the upstart logs.

Changed in ifenslave-2.6 (Ubuntu):
status: Incomplete → New
Revision history for this message
annunaki2k2 (russell-knighton) wrote :

Has anyone taken a look at the logs or had any thoughts on this? Can it be assigned to the maintainer of the ifenslave-2.6 package?

Revision history for this message
Stéphane Graber (stgraber) wrote :

Based on the errors in /var/log/upstart, it looks like your interface is already getting configured by something prior to ifupdown, which makes ifupdown fail to bring up the interface and causes the hang.

Can you please attach a tarball containing all of /etc/network/ and your /usr/local/sbin/check-bond.sh script?

Thanks

Changed in ifenslave-2.6 (Ubuntu):
status: New → Incomplete
Revision history for this message
annunaki2k2 (russell-knighton) wrote :

Attached is my post-up file. It is simply a bash script that monitors the status of the bonded device in /proc - so it should, of course, in no way influence the actual process of bringing up the device (it's a post-up command, after all).

That said, you got me thinking - so I tried commenting out those lines, and when I did, the system booted without error!

I have been using this script (or something very similar) since 8.04, and it was definitely working great in 10.04, so obviously somewhere between 10.04 and 12.04 something has changed in the way "post-up" is handled on bonded devices.
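
(For readers without the attachment: the script itself isn't shown in this report, but based on the description above, a minimal sketch of such a post-up/pre-down monitoring hook might look roughly like the following. The paths, logging choices, and structure here are assumptions, not the actual check-bond.sh.)

#!/bin/bash
# Hypothetical sketch only, NOT the attached check-bond.sh.
# Called as "check-bond.sh $IFACE" from post-up and "check-bond.sh stop $IFACE" from pre-down.

if [ "$1" = "stop" ]; then
    # pre-down: nothing left to monitor, just succeed.
    exit 0
fi

IFACE="$1"
STATUS="/proc/net/bonding/$IFACE"

# post-up: record the bond's aggregation state if the proc file is readable.
if [ -r "$STATUS" ]; then
    grep -E 'MII Status|Slave Interface|Aggregator ID' "$STATUS" | logger -t check-bond
fi

# Always return success so ifupdown never treats the interface as failed.
exit 0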

Revision history for this message
annunaki2k2 (russell-knighton) wrote :

As requested, complete tar of /etc/network attached to this bug.

Changed in ifenslave-2.6 (Ubuntu):
status: Incomplete → New
Revision history for this message
Stéphane Graber (stgraber) wrote :

Can you also attach your /etc/fstab?

I'm wondering if perhaps the network is coming up early enough that some filesystems aren't mounted yet, causing the script to fail.
A failure of a post-up script on bond0 will likely hold up eth0 and eth1, causing the delay. ifupdown only considers an interface fully up if all of its post-up scripts returned 0.
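
As an illustration of that rule (an assumed pattern, not taken from the attached script): a post-up hook written defensively, so that a missing mount or a failed command never propagates a non-zero exit status back to ifupdown, would avoid holding the interface:

#!/bin/bash
# Assumed example of a defensive post-up hook; the real monitoring logic would go where noted.
LOGDIR=/var/log/bond    # hypothetical location that may live on a not-yet-mounted /var

if [ ! -d "$LOGDIR" ]; then
    # /var (or another dependency) isn't available this early in boot; skip rather than fail.
    exit 0
fi

# ... real monitoring work here ...
cat "/proc/net/bonding/$1" >> "$LOGDIR/$1.status" 2>/dev/null

# Return 0 unconditionally so ifupdown considers the interface fully up.
exit 0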

Revision history for this message
annunaki2k2 (russell-knighton) wrote :

I'm very grateful for your help.

Interesting idea, that - my fstab is attached.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Right, so /var may be mounted after your script is triggered; since your script depends on /var, that may explain the failure.

Can you maybe make the script "exit 0" right at the beginning, to confirm that calling the script isn't the problem and that it's indeed something done within the script that's making ifupdown fail to bring up the interface?
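
In other words (assuming the script begins with an ordinary shebang), the test is just:

#!/bin/bash
exit 0  # temporary: short-circuit the whole script so ifupdown sees an immediate success
# ... the rest of check-bond.sh stays below, unreached ...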

Revision history for this message
annunaki2k2 (russell-knighton) wrote :

I have added an exit 0 at the top of the script, and it boots correctly.

The strange thing is that this is happening on other machines built with 12.04, even where there is no separate /var partition. Do you have any further ideas why this would happen in 12.04 but didn't occur in 10.04? Has the boot priority/order been changed? Is it possible that even "/proc" is now unavailable at the time networking is started?

Revision history for this message
Stéphane Graber (stgraber) wrote :

/proc and /sys are guaranteed to be mounted by either the initramfs or by init.

I'm going to mark this bug invalid as it's not caused by anything in ifenslave-2.6 itself and probably not by Ubuntu.

Changed in ifenslave-2.6 (Ubuntu):
status: New → Invalid