bonded interface doesn't reliably come up after reboot

Bug #1078387 reported by Mark Fox on 2012-11-13
This bug affects 1 person
Affects: ifenslave-2.6 (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

Ubuntu 12.04 LTS (release 12.04 according to lsb_release -rd). Machine has four NICs.

Set up a four-interface bond using the method outlined at http://www.stgraber.org/2012/01/04/networking-in-ubuntu-12-04-lts/. Restarting networking shows everything working as one would expect. Testing shows performance and behaviour to be what you would expect of a bonded interface.

After a reboot things get interesting: Sometimes none of the slave interfaces come up and the bond therefore fails. Sometimes one or two of the interfaces come up, along with the bond. Leaving the machine for 15 minutes or so does not make for any changes. The interfaces that didn't come up at boot remain down. Unplugging, then replugging, cables makes no difference either. Restarting networking gets things back in their proper state.
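For anyone trying to reproduce this, one way to record which slaves actually joined the bond after each reboot is to read the bonding driver's status file. The following is only a minimal sketch: it assumes the bond is named bond0 (so the file is /proc/net/bonding/bond0) and that the slaves are eth0 through eth3, neither of which is stated in this report. The function names are hypothetical.

```python
from typing import Dict, List


def parse_bond_slaves(text: str) -> Dict[str, str]:
    """Map each slave interface name to its reported MII status.

    `text` is the content of a bonding status file such as
    /proc/net/bonding/bond0 (the bond name is an assumption).
    """
    slaves: Dict[str, str] = {}
    current = None  # stays None until the first "Slave Interface:" line
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond's own "MII Status" line comes before any slave
            # section, so it is skipped while `current` is still None.
            slaves[current] = value
    return slaves


def down_slaves(text: str, expected: List[str]) -> List[str]:
    """Return the expected slaves that are missing or not reported as up."""
    status = parse_bond_slaves(text)
    return [name for name in expected if status.get(name) != "up"]


# Example use on a live system (interface names are assumptions):
#   with open("/proc/net/bonding/bond0") as fh:
#       print(down_slaves(fh.read(), ["eth0", "eth1", "eth2", "eth3"]))
```

Running this once after each reboot and again after restarting networking would give a simple log of which interfaces failed to enslave.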

Mark Fox (mark-fox-ecacs16) wrote :

Note that I tried setting this up under Debian 6.0.6 on an identical machine and on the same switch. It brings the interfaces up after a reboot, but doesn't pass any traffic. Arg!

Mark Fox (mark-fox-ecacs16) wrote :

The only other detail I can think of that might affect this is that I'm running LVM on top of a hardware RAID array, and have separate partitions/volumes for /, /usr, /var, /tmp, and /home. We've been experiencing these problems on machines that were running LVM on top of Linux's md RAID as well. The partitioning scheme would be close to identical on all machines.

Mark Fox (mark-fox-ecacs16) wrote :

On the off chance that it is a problem with the switch, set up Arch on an identical machine. That helped eliminate some configuration issues with the switch. Bonding works fine on the Arch machine even after a reboot. Swapping cables to use the same ports as the Ubuntu 12.04 machine, the Arch box works fine there too.

Mark Fox (mark-fox-ecacs16) wrote :

After discovering an issue with the switch's configuration under Arch, I decided to try Debian again. Under Debian, I have no issues. The bonded link comes up fine after a reboot and behaves as one would expect. This is on an identical system to the Ubuntu Server box.

This seems related to Launchpad Bug #881379. Actually, they seem like the same bug. And from the comments, it doesn't seem like folk feel it has been fixed.

Mark Fox (mark-fox-ecacs16) wrote :

Just did another Ubuntu 12.04 LTS install from scratch on one of the problem machines. Installed emacs and ifenslave, then set up the four-interface bond. Rebooted and all four interfaces came up. Rebooted again and they all came up again. Switched to four different ports that should be identically configured on the switch and they came up fine both initially and after a reboot.

I'm not sure what I've done differently, but I'll be carefully comparing my configuration to what I've uploaded here and trying to duplicate my success on an identical machine.

Mark Fox (mark-fox-ecacs16) wrote :

Compared the relevant configuration files between the server with the issue and the one that works. Except for IP address, the files are identical.

There are only two differences between the machines. The first is that the one that works has the default layout for partitions as set up by the Ubuntu installer (using LVM). The problem machine has additional LVM volumes for /var, /tmp, /home, and /usr. My guess is that /usr is the issue. I've broken out everything but /usr on the machine I need for production and it works fine. I'll set up the other machine similarly and try putting /usr on a separate partition to see if that causes anything funny.

The other difference is that I set some of the options for bonding (mode=802.3ad miimon=100 xmit_hash_policy=layer3+4) in /etc/modules. I've played with adding and removing those options and it doesn't seem to make a difference. Bonding comes up reliably either way.
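To check the partition-layout difference on a given machine, it may help to list which of the directories in question are mounted as their own filesystems. This is a minimal sketch that assumes the information is read from /proc/mounts, where the second whitespace-separated field of each line is the mount point. The function name is hypothetical.

```python
from typing import Iterable, List


def separate_mounts(mounts_text: str, paths: Iterable[str]) -> List[str]:
    """Return the entries of `paths` that appear as mount points.

    `mounts_text` is the content of /proc/mounts; each line has the form
    "<device> <mount point> <fstype> <options> <dump> <pass>".
    """
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])
    return [path for path in paths if path in mounted]


# Example use on a live system:
#   with open("/proc/mounts") as fh:
#       print(separate_mounts(fh.read(), ["/usr", "/var", "/tmp", "/home"]))
```

If /usr shows up in the result on the failing machine but not on the working one, that would be consistent with the guess that a separate /usr is involved.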

Stéphane Graber (stgraber) wrote :

Did you have any chance to try with another machine where /usr is split out?

Changed in ifenslave-2.6 (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Mark Fox (mark-fox-ecacs16) wrote :

I haven't. However, I have set up the two machines that were giving me problems with /usr on the root partition. I've had no issues with bonding since doing that.

I may have a chance to try a two-port bond with /usr in a separate LVM partition in the near future. I'll update here when/if that happens.
