Comment 33 for bug 889423

Stéphane Graber (stgraber) wrote :

Wow, that's quite a lot of things happening on that system :)
So indeed, looking at the number of CPUs, network cards and disks showing up, that's enough to flood udev and upstart, which likely makes things start a bit slower than usual and therefore out of order.

Essentially, the fallback networking script starts before the network cards have actually been set up and announced by udev.
That's the one case where you end up trying to add something to the bond just before the bond actually gets created (less than a second before, apparently).

Just for testing's sake, can you add:
pre-up sleep 2

To your bridge to confirm that it's indeed a race condition happening there?

What it shows, at least, is that we definitely can't rely on the fallback networking job running only after all the kernel events have been processed. I guess the easiest way out of that problem will be to add the same hack to bridge-utils that I added to ifenslave: wait for up to a minute for the slaves/members to appear before giving up and continuing without them.

In your case, that'd wait for around 200ms, then find bond0, move it into the bridge and continue.
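
To make the idea concrete, here's a rough sketch of that kind of wait loop. It's not the actual ifenslave or bridge-utils code (those are shell hooks); it's just an illustration in Python, and the function name, the parameters and the 100ms polling interval are all made up for the example. It assumes an interface counts as "appeared" once it has an entry under /sys/class/net.

```python
import os
import time


def wait_for_interfaces(names, timeout=60.0, interval=0.1,
                        sysfs_net="/sys/class/net"):
    """Poll until every interface in `names` exists or `timeout` seconds pass.

    Hypothetical helper: returns the list of interfaces still missing when
    it stops waiting (an empty list means they all showed up in time).
    """
    deadline = time.monotonic() + timeout
    while True:
        # An interface is considered present once its sysfs entry exists.
        missing = [name for name in names
                   if not os.path.exists(os.path.join(sysfs_net, name))]
        # Stop as soon as everything is there, or give up at the deadline
        # and let the caller continue without the missing members.
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)
```

Calling wait_for_interfaces(["bond0"]) before adding the bridge members would return as soon as bond0 shows up, and only block for the full minute if it never appears.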

At least it looks like the proposed ifenslave isn't at fault; this is just an extra change that'll need to happen in bridge-utils.

Thanks for the tests, good to have someone with that kind of hardware around for testing :)