Bug #482419 “802.3ad interface bonding fails if started too early” : Bugs : ifenslave-2.6 package : Ubuntu

Bryan McLellan (btm) on 2009-11-13

description:	updated
description:	updated
affects:	ubuntu → linux (Ubuntu)

Richard Huddleston (rhuddusa) wrote on 2009-11-14:

#1

I'm also seeing this with the e1000e driver on an Intel S5000PSL motherboard.

iface bond0 inet dhcp
        slaves eth0 eth1
        bond-mode 4
        bond-miimon 10
        bond-lacp-rate fast
        bond-xmit_hash_policy layer2+3

After boot, /proc/net/bonding/bond0 shows:
MII Status: down

I can bring the bond up with:
sudo ifdown bond0; sudo ifup bond0

I tried adjusting the bonding updelay, just for kicks, but it made no difference.

04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Intel Corporation Device 3476
        Flags: bus master, fast devsel, latency 0, IRQ 57
        Memory at b8820000 (32-bit, non-prefetchable) [size=128K]
        Memory at b8400000 (32-bit, non-prefetchable) [size=4M]
        I/O ports at 3020 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting <?>
        Kernel driver in use: e1000e
        Kernel modules: e1000e

04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Intel Corporation Device 3476
        Flags: bus master, fast devsel, latency 0, IRQ 58
        Memory at b8800000 (32-bit, non-prefetchable) [size=128K]
        Memory at b8000000 (32-bit, non-prefetchable) [size=4M]
        I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting <?>
        Kernel driver in use: e1000e
        Kernel modules: e1000e
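The ifdown/ifup workaround above can be made conditional, so it only runs when the bond actually came up down. A minimal shell sketch, assuming the /proc/net/bonding layout quoted above, where the bond's own "MII Status:" line comes before the per-slave sections. The function names bond_mii_status and bounce_if_down are hypothetical, and bounce_if_down needs root and ifupdown:

```shell
#!/bin/sh
# Hypothetical helpers, not part of ifenslave-2.6.

# Print the bond-level MII status. In /proc/net/bonding/<bond> the master's
# "MII Status:" line comes before the per-slave sections, so take the first.
bond_mii_status() {
    awk -F': *' '/^MII Status:/ { print $2; exit }' "${1:-/proc/net/bonding/bond0}"
}

# Re-run ifdown/ifup only if the bond reports "down" (needs root and ifupdown).
bounce_if_down() {
    local bond="${1:-bond0}"
    if [ "$(bond_mii_status "/proc/net/bonding/$bond")" = "down" ]; then
        ifdown "$bond" && ifup "$bond"
    fi
}
```

Calling bounce_if_down bond0 late in boot (for example from /etc/rc.local) would avoid restarting networking when the bond is already up; it does not address the underlying race.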

Bryan McLellan (btm) wrote on 2009-11-20:

#2

Marking Medium. While there is a functional workaround, it requires console access after startup. Since this configuration is most often seen in server environments, the workaround is less practical, which makes the bug more severe.

Changed in linux (Ubuntu):
importance:	Undecided → Medium
status:	New → Confirmed

Tessa (unit3) wrote on 2009-11-26:

#3

Seeing this on a Supermicro server system, motherboard X8STi, with dual e1000e NICs.

Andy Whitcroft (apw) on 2009-11-30

tags:

added: kernel-series-unknown

Chad Netzer (chad-netzer) wrote on 2009-12-01:

#4

I'm also seeing this on a Penguin Relion 2612 using the forcedeth gigabit Ethernet driver, bonding eth0 and eth1 with bond-mode 4. I'm also running a VLAN setup on top of the bond.

I can generally get the setup to work by hand, although seemingly not with a mere "/etc/init.d/networking restart" or "ifdown bond0; ifup bond0". On initial startup, only one of the two slaves (either eth0 or eth1) is listed by "ifconfig", not both. If I take all devices down with "ifconfig eth0 down; ifconfig bond0 down; ifconfig lo down", etc., then run "ifdown bond0" and restart networking, it ends up working after some fiddling. In that case bond0, both eth0 and eth1, the VLAN, and lo are all listed by ifconfig.

This is on a fresh Karmic install from the alternate install CD, by the way. I reinstalled over a system running Hardy, where this worked fine, although I had to update the /etc/network/interfaces configuration (I was using post-up/down ifenslave lines before).

Tessa (unit3) wrote on 2009-12-04:

#5

Just switched my Karmic config to "mode 1", and I'm still seeing this issue, so it isn't limited to strict 802.3ad mode (4).

Bryan McLellan (btm) wrote on 2009-12-04: Re: [Bug 482419] Re: 802.3ad interface bonding fails if started too early

#6

Are you sure the system is using bonding mode 1? What does 'cat
/sys/class/net/bond0/link_mode' display?

The "Warning: Found an uninitialized port" error message comes from
bond_3ad.c in the kernel source. As far as I can tell, only the code
supporting 802.3ad lives in that file, so if you are getting this
error the system is using 802.3ad.

Tessa (unit3) wrote on 2009-12-04:

#7

Oh, you're right, this might be a different issue. /sys/class/net/bond0/link_mode shows "0", even though the device in /etc/network/interfaces is configured with "bond-mode 1". I'm not seeing the "uninitialized port" errors anymore, but the bond still fails to come up when initially started; I have to run "service networking restart" before it comes up.

Bryan McLellan (btm) wrote on 2009-12-04:

#8

The lack of the error message aside, out of curiosity, what does
'/sys/class/net/bond0/bonding/mode' show? I think I was wrong to point
to the earlier file.

Tessa (unit3) wrote on 2009-12-04:

#9

Ah yes, that shows "active-backup 1", which is what I'd expect when configuring mode 1.
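As #6 to #9 show, link_mode is not the bonding mode; /sys/class/net/bond0/bonding/mode reports the mode as a name followed by its number. A small sketch that checks the running mode number against the configured one (the function bond_mode_is is hypothetical, and the file format is assumed from the "active-backup 1" output above):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the running bonding mode number equals $1.
# The sysfs file holds "<name> <number>", e.g. "active-backup 1" or "802.3ad 4".
bond_mode_is() {
    local expected="$1" file="${2:-/sys/class/net/bond0/bonding/mode}" name num
    read -r name num < "$file" || return 1
    if [ "$num" = "$expected" ]; then
        return 0
    fi
    echo "bonding mode is $name ($num), expected $expected" >&2
    return 1
}
```

For example, bond_mode_is 4 should succeed only if the kernel is actually running 802.3ad on bond0.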

^_Pepe_^ (jose-angel-fernandez-freire) on 2009-12-14

tags:

added: karmic
removed: kernel-series-unknown

cfreak (cfreak) wrote on 2010-01-07:

#10

I'm seeing this on Karmic with 2.6.31-16-server and Broadcom Corporation NetXtreme BCM5721 cards. It is a blocker for me; I can't restart networking on each server after booting.

Scott Call (scott-call-viz) wrote on 2010-01-26:

#11

Is this possibly the same as, or related to, Debian bug 558097 against ifenslave-2.6?
http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/9f05eea3b27339a6

cfreak (cfreak) wrote on 2010-01-26:

#12

That debug output looks the same, so I'd say yes.

2010/1/26 Scott Call <email address hidden>:
> Is this possible same/related as Debian bug 588097 related to ifenslave-2.6?
> http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/9f05eea3b27339a6
>
> --
> 802.3ad interface bonding fails if started too early
> https://bugs.launchpad.net/bugs/482419
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> 802.3ad bonding configurations that formerly worked on jaunty are now failing on startup under karmic. After the system has started, restarting networking will bring the bond up correctly. This only applies to bond_mode 4 / 802.3ad, I've tested that switching to bond_mode 0 corrects the issue, and other users experiencing this bug all were using bond_mode 4 as well.
>
> dmesg output fills with "bonding: bond0: Warning: Found an uninitialized port", even after the system starts up and the port should be "initialized"
>
> It appears to occur on multiple drivers (bnx2, e1000 confirmed).
>
> One initially wants to blame the startup ordering due to the switch to upstart, but I believe it is an edge case that hasn't been seen before because we haven't been starting up so quickly that the hardware hasn't had time to fully initialize.
>
> Configuration and output from multiple users is in this thread:
> http://ubuntuforums.org/showthread.php?p=8311572
>
>
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/482419/+subscribe
>

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-01-28:

#13

Direct link to Debian ifenslave bug
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=558097

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-01-28:

#14

Some links about ifenslave in Lucid vs. Karmic vs. Debian Sid:

http://changelogs.ubuntu.com/changelogs/pool/main/i/ifenslave-2.6/ifenslave-2.6_1.1.0-14ubuntu1/changelog

Current package versions:
Karmic      1.1.0-13ubuntu1  08-Jun-2009
Lucid       1.1.0-14ubuntu1  06-Nov-2009
Debian Sid  1.1.0-15         18-Dec-2009

Bhavani Shankar (bhavi) wrote on 2010-01-28:

#15

Thanks, I'll prepare a diff against the latest version in Debian.

Regards

Changed in linux (Ubuntu):
status:	Confirmed → In Progress
assignee:	nobody → Bhavani Shankar (bhavi)

Bhavani Shankar (bhavi) wrote on 2010-01-28:

#16

debian > ubuntu diff (5.0 KiB, text/plain)

Here are the built packages:

https://edge.launchpad.net/~bhavi/+archive/crickinfo/+build/1472211/+files/ifenslave-2.6_1.1.0-15ubuntu1_amd64.deb

https://edge.launchpad.net/~bhavi/+archive/crickinfo/+build/1472212/+files/ifenslave-2.6_1.1.0-15ubuntu1_i386.deb

And attached is the diff.

Changed in linux (Ubuntu):
status:	In Progress → Confirmed
assignee:	Bhavani Shankar (bhavi) → nobody

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-01-28:

#17

ifenslave-2.6_1.1.0-15ubuntu1_amd64.deb is working for me on this box:

$ sudo dmidecode | grep Prod
Product Name: SUN FIRE X4450

$ sudo dmidecode | grep Desc.*LAN
        Description: LAN 1 of Gilgal (Intel 82563EB)
        Description: LAN 2 of Gilgal (Intel 82563EB)
        Description: LAN 3 of Gilgal (Intel 82571EB)
        Description: LAN 4 of Gilgal (Intel 82571EB)

:)

Bryn Hughes (linux-nashira) wrote on 2010-01-28:

#18

This package fixes things for me too. I've got two e1000s and two bnx2s, and they all work properly with this package.

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-02-04:

#19

Is this bug/fix eligible for an SRU?

https://wiki.ubuntu.com/StableReleaseUpdates

Alexander Usyskin (sanniu) wrote on 2010-03-04:

#20

Affected by this bug too.

With bhavi's package the problem went away. Please do an SRU!

Thank you.

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-03-30:

#21

Maybe it's about the language. I'm testing to see if it makes a difference if it's worded like LP: #552017

Michael Rolli (mrolli) wrote on 2010-05-19:

#22

Same issue here on 10.04 LTS.

Installed Bhavani's package (ifenslave-2.6_1.1.0-15ubuntu1_amd64.deb).

Bonding of 4 NICs (802.3ad) now works like a charm, even after a restart. +1 for SRU!

Neil Wilson (neil-aldur) wrote on 2010-06-04:

#23

This is confirmed as a failure within 10.04 LTS on boot.

I feel this is a critical bug that shouldn't be in an LTS and the ifenslave package should be updated with the patch above.

I've added the ifenslave package above to my release archive at ppa:neil-aldur/ppa

Please SRU Lucid, and look at SRU for Karmic as well.

Steve Langasek (vorlon) on 2010-06-04

affects:	linux (Ubuntu) → ifenslave-2.6 (Ubuntu)
Changed in ifenslave-2.6 (Ubuntu):
status:	Confirmed → Fix Released
Changed in ifenslave-2.6 (Ubuntu Lucid):
status:	New → Triaged
importance:	Undecided → Medium

Steve Langasek (vorlon) wrote on 2010-06-04:

#24

For an SRU we need a targeted backport of just the bugfix, without taking changes that entirely redo the packaging. Bhavani, can you prepare this or should I?

Benjamin Drung (bdrung) wrote on 2010-06-29:

#25

Unsubscribing ubuntu-sponsors. Please resubscribe the team once you have a debdiff prepared.

Bhavani Shankar (bhavi) wrote on 2010-07-14:

#26

Yeah Steve, on it now.

Bhavani Shankar (bhavi) wrote on 2010-07-14:

#27

the lucid SRU debdiff (3.3 KiB, text/plain)

Hello Steve, ubuntu-sru and others,

Attached is the Lucid SRU diff.

Please test the patch.

regards

Steve Langasek (vorlon) wrote on 2010-07-14:

#28

Hi Bhavani,

What dpkg trigger error are you trying to fix? The changes to the maintainer scripts should have no effect on dpkg triggers. In fact, they should have no effect in Ubuntu at all because the change only applies to upgrades from version 1.1.0-6 or before, which is the version that was in dapper. (If a user *does* try to upgrade directly from dapper, then this won't work either, because you use the wrong conffile names in your patch to the preinst). If you believe this is an important change to make in spite of this, please provide a separate bug reference in the changelog; there should be a separate bug report in Launchpad for each issue being fixed in SRU.

As for the pre-up change: why is this function being split into two? The two halves of the function are still being called, in the same order, with nothing else between them, so that seems unnecessary for an SRU? The change appears to be equivalent to this much shorter patch:

@@ -128,2 +131,2 @@
-enslave_slaves
 setup_master
+enslave_slaves

Have I overlooked some reason that we want to split the function?
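Steve's shortened patch only swaps the two calls, so the master is configured before any slave is attached. The sketch below illustrates that ordering through the bonding sysfs files; it is not the ifenslave pre-up script, and its function bodies are simplified and hypothetical. It assumes that options such as the bonding mode can only be changed while the bond is down and has no slaves, which would explain why the order matters, and it does not bring interfaces up or down. SYSFS can point at a scratch directory for a dry run:

```shell
#!/bin/sh
# Hypothetical sketch of the corrected order: setup_master, then enslave_slaves.
# On a real system SYSFS is /sys/class/net and this needs root; point SYSFS at
# a scratch directory to dry-run it.
SYSFS="${SYSFS:-/sys/class/net}"

# Configure the master first, while no slaves are attached.
setup_master() {          # $1 = bond, $2 = mode, $3 = miimon
    local dir="$SYSFS/$1/bonding"
    echo "$2" > "$dir/mode" || return 1
    echo "$3" > "$dir/miimon" || return 1
    echo "$1: mode $2 miimon $3"
}

# Only then attach each slave ("+ifname" adds it to the bond).
enslave_slaves() {        # $1 = bond, remaining arguments = slave interfaces
    local bond="$1" s
    shift
    for s in "$@"; do
        echo "+$s" > "$SYSFS/$bond/bonding/slaves" || return 1
        echo "$bond: enslaved $s"
    done
}

configure_bond() {        # $1 = bond, $2 = mode, $3 = miimon, then slaves
    local b="$1" m="$2" i="$3"
    shift 3
    setup_master "$b" "$m" "$i" && enslave_slaves "$b" "$@"
}
```

Running configure_bond bond0 802.3ad 100 eth0 eth1 writes the mode and miimon before either slave; with the original order the slaves would already be attached when the mode is written.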

Bhavani Shankar (bhavi) wrote on 2010-07-15:

#29

Okay Steve!

Shortening the patch!

regards

--
Bhavani Shankar.R
https://launchpad.net/~bhavi, a proud ubuntu community member.
What matters in life is application of mind!,
It makes great sense to have some common sense..!

Bhavani Shankar (bhavi) wrote on 2010-07-15:

#30

the lucid sru debdiff (768 bytes, text/plain)

here is the shortened patch

regards

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-08-14:

#31

Any chance we'll see this as part of the 10.04.1 iso?

Ryan Tandy (rtandy) wrote on 2010-09-05:

#32

The patch supplied by Steve in comment #28 worked for me in a virtual machine and I will test it on a physical one when I get the chance. Is there anything holding up the release of this fixed package?

Neil Wilson (neil-aldur) wrote on 2010-09-05: Re: [Bug 482419] Re: 802.3ad interface bonding fails if started too early

#33

Somebody to execute the SRU process, I'd imagine.

--
Neil Wilson

Chad Netzer (chad-netzer) wrote on 2010-10-04:

#34

I was hit by this bug, on multiple machines and architectures (x86, x86_64). I installed the provisional version 1.1.0-15ubuntu1, from comment #16, and the problems went away.

Mackenzie Morgan (maco.m) on 2010-10-09

description:

updated

Martin Pitt (pitti) wrote on 2010-10-10:

#35

Bhavani,

this patch looks a bit weird. It not only changes the order of the function calls but also introduces a new call to a function "setup_slaves", without any further changes. In the original Debian patch (still visible at http://launchpadlibrarian.net/49642996/ifenslave-2.6_1.1.0-14ubuntu2_1.1.0-15ubuntu1.diff.gz), however, that new function is introduced by splitting the original one. So can you please confirm that our version already has the new function, but not the call to it?

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2010-10-11:

#36

Any chance we'll see this fix as part of the January 10.04.2 iso?

(Any chance Ubuntu LTS will have this fixed before CentOS 6 is out?) (*ducks*)

Neil Wilson (neil-aldur) on 2010-11-14

Changed in ifenslave-2.6 (Ubuntu Lucid):
assignee:	nobody → Neil Wilson (neil-aldur)

Neil Wilson (neil-aldur) on 2010-11-14

Changed in ifenslave-2.6 (Ubuntu Lucid):
assignee:	Neil Wilson (neil-aldur) → nobody
status:	Triaged → Incomplete

Neil Wilson (neil-aldur) wrote on 2010-11-14:

#37

I'm no longer sure this is a bug in the package; it may instead be a duplicate of #559090. I cannot reproduce the fault when I use the correct hotplug configuration for bonding on a fresh install of Lucid with the vanilla Lucid ifenslave-2.6 package.

All the posted configs seem to have slaves defined in the bond master interface rather than using 'bond-master' in the slaves.

If you believe you are affected by this bug, please reconfirm it after following the guide I've put together here:

http://www.3spoken.co.uk/2010/11/how-to-do-ethernet-bonding-on-ubuntu.html
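The difference Neil describes is whether the bond stanza lists its slaves ("slaves" or "bond-slaves eth0 eth1") or each slave names its master with "bond-master". A rough heuristic sketch for checking an existing file; it only looks at the first two words of each line, so it is not a real interfaces(5) parser, and the function name and its output labels are hypothetical:

```shell
#!/bin/sh
# Hypothetical heuristic: report which enslavement style an interfaces file uses.
#   hotplug - slaves name their master with "bond-master"
#   master  - the bond stanza lists its slaves ("slaves"/"bond-slaves", not "none")
#   mixed   - both appear; none - neither appears
# Commented-out lines start with "#", so their first word never matches.
interfaces_style() {
    awk '
        $1 == "bond-master" { hot = 1 }
        ($1 == "slaves" || $1 == "bond-slaves") && $2 != "none" { mst = 1 }
        END {
            if (hot && mst) { print "mixed" }
            else if (hot)   { print "hotplug" }
            else if (mst)   { print "master" }
            else            { print "none" }
        }
    ' "${1:-/etc/network/interfaces}"
}
```

For example, the configuration in #1 would be reported as "master" and the one in #47 as "hotplug".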

Steve Langasek (vorlon) wrote on 2010-11-15:

#38

I am sure this is a bug in the package. That's why it was marked triaged.

Changed in ifenslave-2.6 (Ubuntu Lucid):
status:	Incomplete → Triaged

Neil Wilson (neil-aldur) wrote on 2010-11-15:

#39

I had the problem. I was going to sort the SRU.

Changing the configuration to the hotplug configuration fixed the
problem using the standard package.

Can you tell me what status I should use to get that checked by everybody else?

Martin Pitt (pitti) wrote on 2010-11-26:

#40

Setting to incomplete, see comment 35.

Changed in ifenslave-2.6 (Ubuntu Lucid):
status:	Triaged → Incomplete

Dave Walker (davewalker) wrote on 2010-12-03:

#41

As per the current status, unsubscribing the Ubuntu sponsors team. Please feel free to re-subscribe the team when the patch is ready for review and upload. Thanks!

Steve Langasek (vorlon) wrote on 2010-12-03:

#42

The proposed patch from Bhavani is incorrect, but the bug as originally described is real, is present even when using the hotplug configuration, and should be backported to lucid. Setting this as 'triaged' again.

Changed in ifenslave-2.6 (Ubuntu Lucid):
status:	Incomplete → Triaged
assignee:	nobody → Steve Langasek (vorlon)

Neil Wilson (neil-aldur) wrote on 2010-12-04:

#43

Can you demonstrate a configuration that makes it fail, then?

Michal Kleczek (michal-kleczek) wrote on 2010-12-06:

#44

My configuration does not work. Note that there is also a bridge set up on two VLAN interfaces.

# The loopback network interface
auto lo
iface lo inet loopback

auto bond0
iface bond0 inet manual
        bond-slaves none
        bond-mode 4
        bond-miimon 100

# Internet
auto bond0.1501
iface bond0.1501 inet dhcp

# Wireless
auto bond0.1502
iface bond0.1502 inet manual

# Wired
auto bond0.1503
iface bond0.1503 inet manual

# Wireless-Wired bridge
auto br0
iface br0 inet static
address 10.15.0.1
network 10.15.0.0
netmask 255.255.0.0
bridge_ports bond0.1502 bond0.1503

# Printer
auto bond0.1504
iface bond0.1504 inet static
address 10.16.0.1
netmask 255.255.255.252

auto eth0
iface eth0 inet manual
bond-master bond0
bond-primary eth0 eth1

auto eth1
iface eth1 inet manual
bond-master bond0
bond-primary eth0 eth1

Malcolm Scott (malcscott) wrote on 2010-12-06:

#45

re #44, Michal, you're bridging together two VLANs -- beware, this will cause most VLAN-aware switches to break down (as VLANs share one forwarding database, and a MAC address can't appear on two ports; in my experience this leads to intermittent very high packet loss as the forwarding database starts flapping). Are you sure that that isn't the problem you're seeing? What are your symptoms?

Michal Kleczek (michal-kleczek) wrote on 2010-12-07:

#46

My symptoms are that interfaces are not brought up at boot. I have to manually:
1. stop networking
2. rmmod bonding
3. modprobe bonding
4. start networking
5. bring up all vlans one by one (ifup bond0.150x)

Once interfaces are up I do not have any problems with the network.

It didn't work after a fresh Lucid installation. Then, after one of the updates, it started working, and now it is broken again (so I'm not sure it is actually an ifenslave issue). Sorry, but I don't remember which updates they were.
I've tried different versions of ifenslave and various kernels. To me it looks like it has something to do with the VLANs being started too early.
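Michal's five manual recovery steps, wrapped in a shell function. This is only a sketch: it assumes Lucid's upstart stop/start commands for the networking job, exactly as written in the steps, and it needs root; setting DRY_RUN=1 prints the commands instead of running them. The names run and recover_bond are hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper for the manual steps; needs root unless DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

recover_bond() {          # arguments: VLAN interfaces to bring up afterwards
    run stop networking || return 1    # 1. stop networking (upstart job)
    run rmmod bonding || return 1      # 2. unload the bonding driver
    run modprobe bonding || return 1   # 3. load it again
    run start networking || return 1   # 4. start networking
    for vlan in "$@"; do               # 5. bring up each VLAN
        run ifup "$vlan" || return 1
    done
}
```

For Michal's configuration this would be recover_bond bond0.1501 bond0.1502 bond0.1503 bond0.1504.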

BenLake (me-benlake) wrote on 2011-03-07:

#47

Just confirming Neil Wilson's findings. I had the "no bond connection upon reboot" issue and was using the non-hotplug configuration method. I switched to hotplug and the problem was sorted.

Description: Ubuntu 10.04.2 LTS

--- current config ----

# bond/trunk eth0-1
auto bond0
iface bond0 inet static
    address 192.168.1.94
    netmask 255.255.255.0
    gateway 192.168.1.1
# bond-slaves eth0 eth1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100

# enslave eth0-1
auto eth0
iface eth0 inet manual
bond-master bond0

auto eth1
iface eth1 inet manual
bond-master bond0
--------------

The previous config had "bond-slaves eth0 eth1", as the comment shows, and no bond-master lines on eth0 or eth1.

Mark Favas (mark-favas) wrote on 2011-04-23:

#48

I can confirm that this bug still occurs. Although I've been using this configuration for ~6 months, it bit me today. This is on a box running Ubuntu server, with two bonded interfaces. bond0 failed to come up after a reboot, while bond1 did come up. After I got onto the box via bond1, a reboot worked: both bond0 and bond1 came up. However, /var/log/messages still contains many lines like the one below:

Apr 23 11:02:43 server1 kernel: [104080.510023] bonding: bond0: Warning: Found an uninitialized port

and many lines of:

Apr 23 11:05:14 server1 kernel: [   16.985610] bonding: bond0: doing slave updates when interface is down.

Details:

root@server1:/etc/network# cat /etc/issue
Ubuntu 10.04.2 LTS \n \l

root@server1:/etc/network#

root@server1:/etc/network# uname -a
Linux server1 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux
root@server1:/etc/network#

root@server1:/etc/network# apt-cache search ifenslave
ifenslave-2.6 - Attach and detach slave interfaces to a bonding device
root@server1:/etc/network#

root@server1:/etc/network# grep -r -i bond /etc/modprobe.d/
root@server1:/etc/network#

root@server1:/etc/network# cat interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
# Bonded interfaces
auto bond0
iface bond0 inet static
	address a.b.c.d
	netmask a.b.c.d
	network a.b.c.d
	broadcast a.b.c.d
	metric 99
	gateway a.b.c.d
	bond-slaves none
	bond-mode 4
	bond-miimon 100
	bond-xmit_hash_policy layer2+3

auto eth0
iface eth0 inet manual
	bond-master bond0

auto eth2
iface eth2 inet manual
	bond-master bond0

auto bond1
iface bond1 inet static
	address w.x.y.z
	netmask w.x.y.z
	network w.x.y.z
	broadcast w.x.y.z
	bond-slaves none
	bond-mode 4
	bond-miimon 100
	bond-xmit_hash_policy layer2+3

auto eth1
iface eth1 inet manual
	bond-master bond1

auto eth3
iface eth3 inet manual
	bond-master bond1

root@server1:/etc/network#

I can confirm that this bug still occurs. Although I've been using this configuration for ~6 months, it bit me today. This is on a box running Ubuntu server, with two bonded interfaces. Bond0 failed to come up after a reboot, while bond1 did come up. After I got onto the box via bond1, a reboot worked - both bond0 and bond 1 came up. However, /var/log/messages still contains many lines of the below:

Apr 23 11:02:43 server1 kernel: [104080.510023] bonding: bond0: Warning: Found an uninitialized port

and many lines of:

Apr 23 11:05:14 server1 kernel: [   16.985610] bonding: bond0: doing slave updates when interface is down.

Details:

root@server1:/etc/network# cat /etc/issue
Ubuntu 10.04.2 LTS \n \l

root@server1:/etc/network#

root@server1e:/etc/network# uname -a
Linux server1 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux
root@server1:/etc/network#

root@servers:/etc/network# apt-cache search ifenslave
ifenslave-2.6 - Attach and detach slave interfaces to a bonding device
root@server1:/etc/network#

root@server1:/etc/network# grep -r -i bond /etc/modprobe.d/
root@server1:/etc/network#

root@server1:/etc/network# cat interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
# Bonded interfaces
auto bond0
iface bond0 inet static
	address a.b.c.d
	netmask a.b.c.d
	network a.b.c.d
	broadcast a.b.c.d
	metric 99
	gateway a.b.c.d
	bond-slaves none
	bond-mode 4
	bond-miimon 100
	bond-xmit_hash_policy layer2+3

auto eth0
iface eth0 inet manual
	bond-master bond0

auto eth2
iface eth2 inet manual
	bond-master bond0

auto bond1
iface bond1 inet static
	address w.x.y.z
	netmask w.x.y.z
	network w.x.y.z
	broadcast w.x.y.z
	bond-slaves none
	bond-mode 4
	bond-miimon 100
	bond-xmit_hash_policy layer2+3

auto eth1
iface eth1 inet manual
	bond-master bond1

auto eth3
iface eth3 inet manual
	bond-master bond1

root@server1:/etc/network#
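To see how often the two bonding warnings quoted above recur, and for which bond, a small helper can tally them from a syslog-style file. This is only a sketch: the function name `count_bond_warnings` is hypothetical, and it assumes the kernel messages have exactly the wording shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper: count, per bond, the two bonding warnings quoted above
# in a syslog-style file (default: /var/log/messages).
# Assumption: the message text matches the lines shown in this comment.
count_bond_warnings() {
    log=${1:-/var/log/messages}
    awk '
        /bonding: bond[0-9]+: Warning: Found an uninitialized port/ ||
        /bonding: bond[0-9]+: doing slave updates when interface is down/ {
            # The bond name is the field right after "bonding:", e.g. "bond0:".
            for (i = 1; i < NF; i++)
                if ($i == "bonding:") { name = $(i + 1); sub(/:$/, "", name); n[name]++; break }
        }
        # Print "<bond> <count>" for every bond that logged a warning.
        END { for (b in n) print b, n[b] }
    ' "$log" | sort
}
```

Running `count_bond_warnings /var/log/messages` prints one `bondN count` line per affected bond, sorted by bond name.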

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-04-23:

#49

@Mark Favas - dude, it's not about fixing things, it's about declaring them fixed. Fiat bug fixes are what Canonical and the other Ubuntunistas do best.

Steve Langasek (vorlon) wrote on 2011-04-23:

#50

On Sat, Apr 23, 2011 at 01:23:30PM -0000, nutznboltz wrote:
> @Mark Favas - dude, it's not about fixing things, it's about declaring
> them fixed. Fiat bug fixes are what Canonical and the other
> Ubuntunistas do best.

Such false and slanderous comments have no place in bug reports. Take it
somewhere else.

Pete Ashdown (pashdown-xmission) wrote on 2011-05-26:

#51

Is there a workaround for this on 10.04.2? I'd rather that the bond came up the first time so the subsequent network dependent daemons could run properly rather than having to do a "/etc/init.d/networking restart" in rc.local as suggested by the forum thread.

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-05-26:

#52

@Pete Ashdown yes, there is a workaround.

First, one of the things that goes wrong is that Ethernet media autonegotiation takes time, so the network interfaces can take long enough to come up that this may interfere with the bonding configuration. You can hardcode the media speed/duplex/etc. on the switch and in /etc/network/interfaces:

auto eth0
iface eth0 inet manual
media 1000baseTx-FD

for each Ethernet device to be bonded.

Next, you can get bonding to work by using pre-up, post-up, etc. in /etc/network/interfaces, like this:

auto bond0
iface bond0 inet manual
    pre-up modprobe bonding mode=802.3ad ad_select=bandwidth downdelay=400 miimon=100 lacp_rate=0 max_bonds=2 ; ifconfig bond0 up ; ifconfig eth0 up ; ifconfig eth1 up
    post-up ifenslave bond0 eth0 eth1
    pre-down ifenslave -d bond0 eth0 eth1
    post-down ifconfig eth0 down ; ifconfig eth1 down ; ifconfig bond0 down
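Whichever workaround is used, it helps to confirm that the bond actually negotiated LACP with the switch, not just that the interfaces are up. The sketch below reads the bonding status file and treats an all-zero partner MAC as "no LACP partner yet". It assumes the 2.6.32-era `/proc/net/bonding/<bond>` layout, where the "Active Aggregator Info:" section contains a "Partner Mac Address:" line; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether an 802.3ad bond has found an LACP partner.
# Assumption: the status file has a "Partner Mac Address:" line under
# "Active Aggregator Info:"; an all-zero MAC typically means no LACPDUs
# from the switch have been accepted yet.

# Print the partner MAC of the active aggregator from a bonding status file.
lacp_partner_mac() {
    awk -F': ' '/Partner Mac Address:/ { print $2; exit }' "$1"
}

# Succeed only if a non-zero partner MAC is present; a missing file fails.
lacp_has_partner() {
    mac=$(lacp_partner_mac "$1" 2>/dev/null)
    [ -n "$mac" ] && [ "$mac" != "00:00:00:00:00:00" ]
}
```

For example, `lacp_has_partner /proc/net/bonding/bond0 || echo "bond0: no LACP partner"` prints a warning when the aggregation did not form.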

scott (scott.phelps) wrote on 2011-05-26:

#53

Speed/duplex auto-negotiation was not the source of the problem in my experience (though it could potentially cause problems of its own). Relying on auto-negotiation is not a good idea when using EtherChannel.

What I found was that the problem originates in the udev (hotplug) Ethernet system, which should load the bonding module before the bond interface is brought up. For some reason this does not happen; finding out why would require detailed inspection of the uevents triggered when the interfaces file is parsed.

The workaround, which I posted on 5/17/11 on related bug 574456 [ https://bugs.launchpad.net/ubuntu/+source/ifenslave-2.6/+bug/574456/comments/5 ], is to use pre-up scripts to explicitly state the _proper_ order in which things should happen. My solution works across reboots and is udev-agnostic.

Pete Ashdown (pashdown-xmission) wrote on 2011-05-27:

#54

Unfortunately this workaround isn't as reliable as adding (the sloppier) "/etc/init.d/networking restart" to my /etc/rc.local.
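A slightly less blunt variant of the rc.local approach is to restart networking only when the bond is actually down. This is a sketch under two assumptions: the first "MII Status:" line in `/proc/net/bonding/<bond>` is the bond's own status (the later ones belong to individual slaves), and `bond0` is the bond to check. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of a less blunt rc.local fragment (hypothetical helper names).
# Assumption: in /proc/net/bonding/<bond>, the first "MII Status:" line is
# the bond's own status; later ones belong to the individual slaves.

# Print the bond-level MII status ("up" or "down") from a bonding status file.
bond_mii_status() {
    awk -F': ' '/^MII Status:/ { sub(/[ \t]+$/, "", $2); print $2; exit }' "$1"
}

# Succeed only if the bond-level MII status is "up"; a missing file counts as down.
bond_is_up() {
    [ "$(bond_mii_status "$1" 2>/dev/null)" = "up" ]
}
```

In /etc/rc.local these functions would be followed (before `exit 0`) by a line such as `bond_is_up /proc/net/bonding/bond0 || /etc/init.d/networking restart`, so networking is restarted only when the bond did not come up at boot.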

Pete Ashdown (pashdown-xmission) wrote on 2011-06-06:

#55

A short follow-up to my previous comment #54: the workaround is now functioning well for me. I think I had a typo in my first round of testing.

Steve Langasek (vorlon) on 2011-06-06

Changed in ifenslave-2.6 (Ubuntu Lucid):
assignee:	Steve Langasek (vorlon) → Stéphane Graber (stgraber)

Stéphane Graber (stgraber) wrote on 2011-06-07:

#56

I'm having a look at this bug now.

I used a spare DELL PowerEdge 750 with two Intel Gigabit NICs connected to a DELL managed switch with LACP set up.
My /etc/network/interfaces looks like:
auto bond0
iface bond0 inet static
     address 10.145.15.20
     netmask 255.255.255.0
     gateway 10.145.15.1

     slaves eth0 eth1
     bond-mode 4
     bond-miimon 10
     bond-lacp-rate fast
     bond-xmit_hash_policy layer2+3

The switch configuration for the two ports is:
  port-channel load-balance layer-2-3-4
  interface range ethernet g(11-12)
  channel-group 4 mode auto
  exit

Stéphane Graber (stgraber) wrote on 2011-06-07:

#57

With the current package, I get the issue described in this bug report, with the following output:

Switch:
sw06# show lacp port-channel 4
Port-Channel ch4
       Port Type Gigabit Ethernet
       Attached Lag id:
       Actor
               System Priority:1
               MAC Address: 00:14:22:66:25:10
               Admin Key: 28
               Oper Key: 28
       Partner
               System Priority:0
               MAC Address: 00:00:00:00:00:00
               Oper Key: 0

sw06# show lacp ethernet g11
g11 LACP parameters:
      Actor
              system priority:
              system mac addr:        00:14:22:66:25:10
              port Admin key:         28
              port Oper key:          28
              port Oper number:       11
              port Admin priority:    1
              port Oper priority:     1
              port Admin timeout:     LONG
              port Oper timeout:      LONG
              LACP Activity:          ACTIVE
              Aggregation:            AGGREGATABLE
              synchronization:        FALSE
              collecting:             FALSE
              distributing:           FALSE
              expired:                FALSE
      Partner
              system priority:        65535
              system mac addr:        00:c0:9f:3f:3b:2c
              port Admin key:         0
              port Oper key:          17
              port Oper number:       2
              port Admin priority:    0
              port Oper priority:     255
              port Oper timeout:      SHORT
              LACP Activity:          ACTIVE
              Aggregation:            INDIVIDUAL
              synchronization:        FALSE
              collecting:             TRUE
              distributing:           TRUE
              expired:                FALSE
g11 LACP statistics:
      LACP Pdus sent: 252
      LACP Pdus received: 27
g11 LACP Protocol State:
      LACP State Machines:
              Receive FSM:            Port Disabled State
              Mux FSM:                Detached State
              Periodic Tx FSM:        No Periodic State
      Control Variables:
              BEGIN:                  FALSE
              LACP_Enabled:           TRUE
              Ready_N:                FALSE
              Selected:               UNSELECTED
              Port_moved:             FALSE
              NNT:                    FALSE
              Port_enabled:           FALSE
      Timer counters:
              periodic tx timer:      0
              current while timer:    0
              wait while timer:       0
sw06# show lacp ethernet g12
g12 LACP parameters:
      Actor
              system priority:        1
              system mac addr:        00:14:22:66:25:10
              port Admin key:         28
              port Oper key:          28
              port Oper number:       12
              port Admin priority:    1
              port Oper priority:     1
              port Admin timeout:     LONG
              port Oper timeout:      LONG
              LACP Activity:          ACTIVE
              Aggregation:            AGGREGATABLE
              synchronization:        TRUE

Ubuntu
ifenslave-2.6 package

802.3ad interface bonding fails if started too early


Affects		Status	Importance	Assigned to
	ifenslave-2.6 (Debian)	Fix Released	Unknown	debbugs #558097
	ifenslave-2.6 (Ubuntu)	Fix Released	Medium	Unassigned
Declined for Karmic by Steve Langasek
	Lucid	Fix Released	Medium	Stéphane Graber

Changed in ifenslave-2.6 (Ubuntu Lucid):
status:	Triaged → Fix Committed

Changed in ifenslave-2.6 (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Changed in ifenslave-2.6 (Debian):
status:	Unknown → Fix Released
