MAC address of bonding interface is randomly picked

Bug #1288196 reported by Tore Anderson
This bug affects 28 people
Affects: ifenslave (Ubuntu)
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

The new style of bonding configuration (using "iface bond0 [...] \ bond_slaves none" for the master interface plus "iface ethX inet manual \ bond_master bond0" for each slave interface) results in the MAC address of the bond0 interface being randomly picked from one of the slaves.

This causes problems for auto-configuration methods such as IPv4 DHCP and IPv6 SLAAC, as DHCPv4 leases and IPv6 Interface Identifiers are directly based on the interface's MAC address. This means that if the MAC address changes unexpectedly, the IP address(es) may change as well, which can be a big problem if the machine in question is a server or similar that has just rebooted.
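To illustrate why a different bond MAC yields a different SLAAC address, here is a minimal sketch of the modified EUI-64 derivation described in RFC 4291, Appendix A. It assumes the host actually uses EUI-64 interface identifiers (systems configured for privacy or stable-privacy addresses behave differently). The function names and the 2001:db8::/64 documentation prefix are illustrative only and do not come from this report.

```python
import ipaddress


def eui64_interface_id(mac: str) -> str:
    """Return the modified EUI-64 interface identifier for a 48-bit MAC.

    The MAC is expected in colon-separated hex form, e.g. "00:11:22:33:44:55".
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets[0] ^= 0x02  # invert the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    groups = [(eui[i] << 8) | eui[i + 1] for i in range(0, 8, 2)]
    return ":".join(f"{group:x}" for group in groups)


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 identifier derived from mac."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 identifiers needs a /64 prefix")
    interface_id = int(ipaddress.IPv6Address("::" + eui64_interface_id(mac)))
    return network.network_address + interface_id


if __name__ == "__main__":
    # Two hypothetical slave MACs: whichever one the bond inherits
    # determines the address the server ends up with.
    for mac in ("00:11:22:33:44:55", "00:11:22:33:44:66"):
        print(mac, "->", slaac_address("2001:db8::/64", mac))
```

With two slaves, the bond can therefore come up with either of two addresses after a reboot, depending on which slave was enslaved first.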

Unexpected MAC address changes may also cause problems for statically configured addresses, as the upstream router will likely have cached the IPv4 ARP and/or the IPv6 Neighbour entry pointing to the old MAC address. This results in the server not having any network connectivity until the upstream router has timed out its cache entry and probes anew. These timeouts may well be up to 20 minutes.

The old configuration style ("iface bond0 [...] \ bond_slaves eth0 eth1 [...]") did not have this problem, as the MAC address used for bond0 would always be that of the first listed interface (eth0). While I have no particular objection to the syntax change in itself, the choice of MAC address should be deterministic. It is probably possible to manually set the MAC address with the "hwaddr" option, but this is not ideal because it necessarily means every node must have a unique configuration file, which is problematic for large automated server deployments, for example.

Tore

Stéphane Graber (stgraber) wrote :

Yeah, this is one of the unfortunate side effects of having event based network bring up in Ubuntu. As devices are added to the bond in the order in which they're initialized by the kernel and as that initialization happens in parallel, the final order tends to be pretty random...

Setting hwaddr is the obvious workaround, however as you said, that's problematic on deployments where you'd like to have an identical interfaces file on all machines...

I'm really not sure of what to do about this... I've been thinking about a few possibilities but they each come with serious problems:
 - Add a new field which lets you specify what interface to pick the MAC address from. The problem is that this won't work until the interface actually exists. If we only switch the MAC once the interface exists, then that won't solve your IPv6 case, as the link-local and EUI-64 addresses are not updated when the MAC address changes.
 - Reuse the bond-master field and have all bond actions held until the bond-master appears (with a 5min timeout), thereby guaranteeing it'll always be joined first. The obvious problem there is that if the bond-master doesn't appear, your boot will hang for 5 minutes and you'll still get the wrong MAC (and therefore the wrong IP).

I think the bond-master idea is the least wrong of the two but I'd need to get back into the code quite a bit to figure out how that'd work exactly and decide if we should reuse the name or find a new one for this.
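The "hold bond actions until the chosen interface appears, with a timeout" idea can be sketched as a simple polling loop. This is only an illustration under stated assumptions, not how ifenslave or ifupdown implement anything: the function name, the polling interval and the use of Python are hypothetical (the real hook scripts are shell), and the only system fact relied on is that Linux exposes each network interface as a directory under /sys/class/net.

```python
import time
from pathlib import Path


def wait_for_interface(
    name: str,
    timeout: float = 300.0,  # the 5 minute timeout mentioned above
    poll_interval: float = 0.5,  # hypothetical polling interval
    sysfs_root: str = "/sys/class/net",
) -> bool:
    """Block until interface `name` exists, or until `timeout` seconds pass.

    Returns True if the interface appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if (Path(sysfs_root) / name).exists():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_interval, remaining))


if __name__ == "__main__":
    # Short timeout for demonstration; "eth0" is just an example name.
    if wait_for_interface("eth0", timeout=5.0):
        print("eth0 is present; it can be enslaved first")
    else:
        print("eth0 never appeared; the bond may get another slave's MAC")
```

The False branch is exactly the drawback described above: the boot is delayed for the full timeout and the bond can still end up with the wrong MAC address.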

Changed in ifenslave-2.6 (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Stéphane Graber (stgraber)
Tore Anderson (toreanderson) wrote :

For what it's worth, we never had any problem with the old style "bond_master eth0 eth1" syntax. On a server, typically all the slaves will become available pretty much at the same time during the boot process; devices hot-plugged at a later time are generally not the use case you'd need to optimise for. So waiting until the primary slave appears before setting up the bonding interface seems to me to be a perfectly adequate way to handle this.

Tore

Tore Anderson (toreanderson) wrote :

Sorry, that should be "bond_slaves eth0 eth1" of course.

Stéphane Graber (stgraber) wrote :

While that's indeed correct for most standard servers, it's unfortunately not the case for some massive blade setups where the hardware can literally take minutes to show up (due to rather slow enumeration by the kernel caused by the massive number of entries). Also, some recent hardware now has flexible network configurations where new interfaces may appear on demand.

bond_slaves really isn't appropriate with our way of setting up networking, however supporting a "bond_master" field on the bond interface itself may be reasonable (as I described earlier on).

Tore Anderson (toreanderson) wrote :

Don't get me wrong, I meant to indicate that this seems completely fine by me; my point was simply that I was happy to wait until the primary slave was available with the old style of configuration, so I will be happy to wait in a similar manner in the future too (even though the semantics change according to what you described).

Tore

affects: ifenslave-2.6 (Ubuntu) → ifenslave (Ubuntu)
Changed in ifenslave (Ubuntu):
importance: Medium → High
Alex Gottschalk (alex-gottschalk) wrote :

For what it's worth, adding a "pre-up sleep 5" to the secondary interfaces is a pretty decent workaround for smaller systems.

Wido den Hollander (wido) wrote :

I'm seeing the same behavior on my 12.04 and 14.04 systems.

I wanted to deploy a fairly large (50 servers) IPv6-only setup using SLAAC, but I had to revert to static IPv6 configuration due to this issue with Ubuntu.

Configuring static MAC addresses isn't something I want either, so for now I'm sticking with a static IPv6 configuration, but that's not something I want to keep.

I would like to see this resolved.

lesar (leonardo-saracini) wrote :

I want to run a mail and backup server that serves just 2 PCs at my home office.
The server has 2 add-in NICs plus the motherboard NIC.
When I run a backup from both PCs at the same time, everything works very fast.

To keep it simple I use the desktop edition of Ubuntu 14.04, 64-bit,

and I have set up ifenslave to work with NetworkManager,
and I too have this bug:

I want the server to get its IP address on my LAN from the router's DHCP,
with the DHCP server set to bind the assigned IP to the MAC address.

I shut the server down every night and wake it up in the morning.
Often the MAC address changes, so my server comes up with the wrong IP.

The workaround is to reboot until the IP is right, but that is not a very good solution.

I would like to be able to configure bond0 to use the MAC address of a specific card.

Best regards

csmcd5 (geek-n) wrote :

This problem is pretty insidious. I couldn't ping my servers after reboot 50% of the time. It took most of a day to trace it to the bond MAC address flopping around between slave MAC addresses.

The ARP entries in upstream switches don't get updated; the switches keep delivering traffic to the old MAC address. Traffic comes into the server, visible on tcpdump, but is dropped for having the wrong MAC address.

Thanks to Alex Gottschalk, whose solution above worked around this problem in my case.

I think it would be worth having some extra code, even if it's somewhat messy or limited, that attempts to retrieve the MAC address of the primary slave (first in interfaces?) and uses it whether or not that slave ever comes up. Alternatively, as long as the slaves come first in the interfaces file, things should be far enough along to get the MAC.
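A rough sketch of that suggestion: read the MAC address of the first listed slave from sysfs, so that the choice depends on the configured order rather than on which slave is enslaved first. The helper name is hypothetical, and the sketch assumes it runs before the slave is enslaved, because /sys/class/net/<iface>/address typically reports the bond's MAC once the interface has joined a bond.

```python
from pathlib import Path
from typing import Optional, Sequence


def primary_slave_mac(
    slaves: Sequence[str],
    sysfs_root: str = "/sys/class/net",
) -> Optional[str]:
    """Return the MAC address of the first listed slave, or None.

    `slaves` is the ordered list of slave interface names, e.g. as they
    appear in the interfaces file. None is returned when the list is empty
    or the first slave's address cannot be read (device not present yet).
    """
    if not slaves:
        return None
    address_file = Path(sysfs_root) / slaves[0] / "address"
    try:
        return address_file.read_text().strip().lower()
    except OSError:
        return None


if __name__ == "__main__":
    # Example slave names only; the result depends on the local machine.
    mac = primary_slave_mac(["eth0", "eth1"])
    print(mac if mac else "primary slave not available yet")
```

The returned value could then be applied to the bond before any slave joins, which keeps the interfaces file identical across machines because no MAC address is hard-coded.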

Andy Foster (andy-foster) wrote :

For me, the pre-up sleep 5 solution was working just fine on 14.10.

I have now upgraded to 15.04 and am seeing this issue again. Upgrading on 16 nodes, I have seen about a 50/50 random MAC address choice. As my network uses MAC addresses to assign IPs through DHCP, it's a real problem!

Changed in ifenslave (Ubuntu):
assignee: Stéphane Graber (stgraber) → nobody
Vishal (vishal-ktpl) wrote :

I'm facing a problem with interface selection when receiving data. I have configured two bonded interfaces, bond0 with IP 144 and bond1 with IP 122. When I scp a file to this system using either of the IPs, it always selects one bonded interface (say bond0). When I reboot the system, it sometimes switches to the other bonded interface (say bond1). But I want the IPs to always refer to their respective bonded interfaces. Is that possible? If yes, how do I do that?

Dan Keys (dan31415) wrote :

Oops, I am a brand new user to this Ubuntu bug tracking and accidentally changed the status to "Fixed" from the state of "Triaged". I will message Stephane to get someone to fix it since I cannot seem to change it back. So sorry.

Changed in ifenslave (Ubuntu):
status: Triaged → Fix Released
Colin Watson (cjwatson)
Changed in ifenslave (Ubuntu):
status: Fix Released → Triaged
Changed in ifenslave (Ubuntu):
status: Triaged → Fix Released
Colin Watson (cjwatson)
Changed in ifenslave (Ubuntu):
status: Fix Released → Triaged
Wido den Hollander (wido) wrote :

This bug is still affecting me and others that I know.

When using IPv6 with SLAAC you randomly get a new address since you can't be sure which interface is chosen on boot.

A solution to this problem would be very welcome!

Gregory Orange (gregoryo2017) wrote :

We too are working around this bug with a sleep. Agreed, a solution would be welcome.

Nivedita Singhvi (niveditasinghvi) wrote :

Stephane,

Any ideas on this, and how to push forward with a
permanent solution?
