ifupdown initialization problems caused by race condition
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| ifenslave (Ubuntu) | Fix Released | Medium | Unassigned | |
| ifenslave (Ubuntu Precise) | Won't Fix | Medium | Unassigned | |
| ifenslave (Ubuntu Trusty) | Fix Released | Medium | Unassigned | |
| ifenslave (Ubuntu Vivid) | Fix Released | Medium | Unassigned | |
| ifenslave (Ubuntu Wily) | Fix Released | Medium | Unassigned | |
| ifupdown (Debian) | Fix Released | Unknown | | |
| ifupdown (Ubuntu) | Fix Released | Medium | Dariusz Gadomski | |
| ifupdown (Ubuntu Precise) | Won't Fix | Medium | Unassigned | |
| ifupdown (Ubuntu Trusty) | Fix Released | Medium | Unassigned | |
| ifupdown (Ubuntu Vivid) | Won't Fix | Medium | Unassigned | |
| ifupdown (Ubuntu Wily) | Fix Released | Medium | Unassigned | |
Bug Description
[Impact]
* Lack of proper synchronization in ifupdown causes a race condition that occasionally leaves network interfaces incorrectly initialized (e.g. with bonding: wrong bonding settings, or the network unavailable because the slave and master interfaces were initialized in the wrong order).
* This is a serious problem in large deployments (e.g. when bringing up 1000 machines it is almost guaranteed that at least a few of them will end up with the network down).
* It has been fixed by introducing a hierarchical, per-interface locking mechanism that ensures the right initialization order (along with the correct order in the /etc/network/interfaces file).
[Test Case]
1. Create a VM with bonding configured with at least 2 slave interfaces.
2. Reboot.
3. If all interfaces are up - go to 2.
[Regression Potential]
* This change has been introduced upstream in Debian.
* It does not require any config changes to existing installations.
[Other Info]
Original bug description:
* Please note that my bonding examples use eth1 and eth2 as slave
interfaces.
ifupdown has some race conditions, explained below. ifenslave does not
behave well when the sysv networking and upstart network-interface scripts
are running together.
!!!!
case 1)
(a) ifup eth0 (b) ifup -a for eth0
-------
1-1. Lock ifstate.lock file.
1-2. Read ifstate file to check the target NIC.
1-3. close(=release) ifstate.lock file.
1-4. Judge that the target NIC isn't processed.
2. Lock and update ifstate file. Release the lock.
!!!
to be explained
!!!
case 2)
(a) ifenslave of eth0 (b) ifenslave of eth0
-------
3. Execute ifenslave of eth0. 3. Execute ifenslave of eth0.
4. Link down the target NIC.
5. Write NIC id to
/sys/
/slaves then NIC gets up
!!!
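Case 1 is a check-then-act race: the lock is released between reading the ifstate file and updating it, so two callers can both decide the NIC is unprocessed. A minimal sketch of the alternative, holding one lock across both the check and the update, is shown here. It assumes a `mkdir`-based lock for portability; `STATE_DIR`, `LOCK_DIR` and `claim_iface` are hypothetical names, and this is not how ifupdown itself is implemented.

```shell
#!/bin/sh
# Sketch only: hold ONE lock across both "is it configured?" and "mark it configured".
# STATE_DIR, LOCK_DIR and claim_iface are illustrative assumptions,
# not ifupdown's actual implementation.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"
STATE_FILE="$STATE_DIR/ifstate"
LOCK_DIR="$STATE_DIR/ifstate.lock.d"

# claim_iface IFACE: returns 0 if this caller claimed IFACE, 1 if it was already claimed.
claim_iface() {
    iface="$1"
    # mkdir is atomic: exactly one concurrent caller succeeds.
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        sleep 0.1
    done
    touch "$STATE_FILE"
    if grep -qxF "$iface=$iface" "$STATE_FILE"; then
        rc=1                                   # another caller already processed it
    else
        echo "$iface=$iface" >> "$STATE_FILE"  # record it before releasing the lock
        rc=0
    fi
    rmdir "$LOCK_DIR"
    return "$rc"
}
```

With this shape, a second `claim_iface eth0` returns non-zero instead of configuring the NIC twice. The sketch has no stale-lock handling, which a real implementation would need.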
#######
#### My setup:
root@provisioned:~# cat /etc/modprobe.
alias bond0 bonding options bonding mode=1 arp_interval=2000
Both, /etc/init.
enabled.
#### Beginning:
root@provisioned:~# cat /etc/network/
# /etc/network/
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
I'm able to boot with both scripts (networking and network-interface
enabled) with no problem. I can also boot with only "networking"
script enabled:
---
root@provisioned:~# initctl list | grep network
network-interface stop/waiting
networking start/running
---
OR only the script "network-interface" enabled:
---
root@provisioned:~# initctl list | grep network
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
---
#### Enabling bonding:
Following ifenslave configuration example (/usr/share/
examples/
look like this:
---
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
---
Having both scripts running does not make any difference since we
are missing "bond-slaves" keyword on slave interfaces, for ifenslave
to work, and they are set to "manual".
Ifenslave code:
"""
for slave in $BOND_SLAVES ; do
    ...
    # Ensure $slave is down.
    ip link set "$slave" down 2>/dev/null
    if ! sysfs_add slaves "$slave" 2>/dev/null ; then
        echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
    else
        # Bring up slave if it is the target of an allow-bondX stanza.
        # This is usefull to bring up slaves that need extra setup.
        if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
            ifup $v --allow "$BOND_MASTER" "$slave"
        fi
"""
Without the keyword "bond-slaves" on the master interface declaration,
ifenslave will NOT bring any slave interface up on the "master"
interface ifup invocation.
*********** Part 1
So, having networking sysv init script AND upstart network-interface
script running together... the following example works:
---
root@provisioned:~# cat /etc/network/
# /etc/network/
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1
bond-slaves eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
---
Ifenslave script sets link down to all slave interfaces, declared by
"bond-slaves" keyword, and assigns them to correct bonding. Ifenslave
script ONLY tries to make a reentrant call to ifupdown if the slave
interfaces have "allow-bondX" stanza (not our case).
So this should not work, since when the master bonding interface
(bond0) is called, ifenslave does not configure slaves without
"allow-bondX" stanza. What is happening, why is it working ?
If we disable the upstart "network-interface" script, our bonding stops
working at boot. This is because upstart was the one bringing
the slave interfaces up (with the configuration above), not the
sysv networking scripts.
It is clear that ifenslave from sysv script invocation can set the
slave interface down anytime (even during upstart script execution)
so it might work and might not:
"""
ip link set "$slave" down 2>/dev/null
"""
root@provisioned:~# initctl list | grep network-interface
network-interface (eth2) start/running
network-interface (lo) start/running
network-interface (bond0) start/running
network-interface (eth0) start/running
network-interface (eth1) start/running
Since an interface must be down before it can be enslaved,
running both scripts together (upstart and sysv) can create a
situation where upstart brings the slave interface up but ifenslave,
called from the sysv script, takes it down and never brings it up again
(because it does not have an "allow-bondX" stanza).
*********** Part 2
What if I disable upstart "network-
script but introduce the "allow-bondX" stanza to slave interfaces ?
The funny part begins... without upstart, the ifupdown tool calls
ifenslave, for bond0 interface, and ifenslave calls this line:
"""
for slave in $BOND_SLAVES ; do
    ...
    if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
        ifup $v --allow "$BOND_MASTER" "$slave"
    fi
"""
But ifenslave waits forever for the bond0 interface to come
online. We now have a chicken-and-egg situation:
* ifupdown tries to bring the bond0 interface online.
* we are not running the upstart network-interface script.
* ifupdown for bond0 calls ifenslave.
* ifenslave tries to find interfaces with an "allow-bondX" stanza
* ifenslave tries to ifup slave interfaces with that stanza
* slave interfaces wait forever for the master
* master is waiting for the slave interface
* slave interface is waiting for the master interface
... :D
And we have an infinite loop for ifenslave:
"""
# Wait for the master to be ready
[ ! -f /run/network/
echo "Waiting for bond master $BOND_MASTER to be ready"
while :; do
if [ -f /run/network/
break
fi
sleep 0.1
done
"""
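The loop above never gives up, which is what turns a missing master into a hang at boot. As a hedged illustration only (not the fix that was eventually shipped), the same polling can be bounded by a retry count. `wait_for_marker` and its arguments are hypothetical names, and the marker path is left to the caller.

```shell
#!/bin/sh
# Sketch only: wait for a marker file, but give up after a bounded number of
# polls instead of spinning forever. wait_for_marker is an illustrative name.

# wait_for_marker FILE TRIES: poll FILE every 0.1 s, at most TRIES times.
# Returns 0 once FILE exists, 1 on timeout.
wait_for_marker() {
    marker="$1"
    tries="$2"
    while [ "$tries" -gt 0 ]; do
        [ -f "$marker" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    [ -f "$marker" ]   # final check; its exit status becomes the return value
}
```

A caller could then report the failure, for example `wait_for_marker "$marker_file" 50 || echo "bond master not ready" >&2`, rather than blocking every other ifup process behind it.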
*********** Conclusion
These failures can be triggered under the right conditions (like the ones I just
showed). Without parallel ifupdown executions (sysv and upstart,
for example), an infinite loop can happen during boot.
With parallel ifupdown executions, race conditions can be triggered
between:
1) ifupdown itself (case 1 in the bug description).
2) ifupdown and the ifenslave script (case 2 in the bug description).
| Changed in ifupdown (Ubuntu): | |
| status: | New → In Progress |
| assignee: | nobody → Rafael David Tinoco (inaddy) |
| Rafael David Tinoco (inaddy) wrote : | #1 |
| summary: |
- bonding initialization problems caused by race condition
+ Precise, Trusty, Utopic - bonding initialization problems caused by race condition |
| summary: |
- Precise, Trusty, Utopic - bonding initialization problems caused by race condition
+ Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition |
The attachment "ifupdown_
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
| tags: | added: patch |
| Changed in ifupdown (Debian): | |
| status: | Unknown → New |
| description: | updated |
| description: | updated |
| description: | updated |
| Rafael David Tinoco (inaddy) wrote : | #5 |
CORRECT WAY OF SETTING INTERFACES FILE FOR BONDING:
1) This model has race conditions.
2) YOU HAVE to have both scripts running (networking and network-interfaces)
# /etc/network/
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1
bond-slaves eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
You can expect this to fail from time to time but it works... working on a fix for this.
| description: | updated |
| Rafael David Tinoco (inaddy) wrote : | #7 |
I have introduced one big lock for ifupdown. The ifup, ifdown and ifquery commands can no longer run simultaneously.
Since SEVERAL ifupdown pre/post scripts need to make reentrant calls to these commands, I created an environment variable that disables the locking when reentrant calls are made from these scripts. This way the sysv and upstart networking scripts will never step on each other's toes.
Attaching fix for ifupdown.
PS: This breaks even more ifenslave buggy behavior.. wait for next comments.
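The idea of a single lock that reentrant calls bypass through an environment variable can be sketched as follows. This is an illustration under assumptions, not the attached patch: `IFUPDOWN_LOCK_HELD`, `LOCK_DIR` and `with_big_lock` are hypothetical names, and the lock is a `mkdir`-based stand-in.

```shell
#!/bin/sh
# Sketch only: one global lock, skipped for re-entrant calls made from hook
# scripts. IFUPDOWN_LOCK_HELD, LOCK_DIR and with_big_lock are hypothetical names.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/ifupdown-sketch.lock.d}"

# with_big_lock CMD [ARGS...]: run CMD under the global lock, unless a parent
# invocation already holds it (signalled through the environment).
with_big_lock() {
    if [ -n "${IFUPDOWN_LOCK_HELD:-}" ]; then
        "$@"                      # re-entrant call: the parent holds the lock
        return $?
    fi
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        sleep 0.1
    done
    rc=0
    # Pass the marker in the command's environment so hook scripts inherit it.
    IFUPDOWN_LOCK_HELD=1 "$@" || rc=$?
    rmdir "$LOCK_DIR"
    return "$rc"
}
```

The weakness discussed in the following comments is visible here too: if the command run under the lock waits for something that can only be done by another, non-child process needing the same lock, nothing ever releases it.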
| Rafael David Tinoco (inaddy) wrote : | #8 |
Let's try everything again from the beginning, but now with a fixed
ifupdown version (no race conditions between the upstart and sysv
scripts). My interfaces file will be exactly the same as the one proposed in the
ifenslave examples:
---
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
---
We do have bond0 created but still no bonding configured:
---
root@provisioned:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 62:64:29:45:df:ef
BROADCAST MASTER MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@provisioned:~# cat /proc/net/
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: load balancing (round-robin)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
---
| Rafael David Tinoco (inaddy) wrote : | #9 |
Let's try adding "bond-slaves" to the master interface and fixing
the "bond-primary" keyword:
---
root@provisioned:~# cat /etc/network/
# /etc/network/
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet manual
bond-master bond0
auto eth2
iface eth2 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
bond-mode 1
bond-miimon 100
bond-primary eth1
bond-slaves eth1 eth2
address 192.168.169.1
netmask 255.255.255.0
broadcast 192.168.169.255
---
Still nothing...
---
root@provisioned:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 62:64:29:45:df:ef
BROADCAST MASTER MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@provisioned:~# cat /proc/net/
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: load balancing (round-robin)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
---
| Rafael David Tinoco (inaddy) wrote : | #10 |
And you can check that upstart got deadlocked:
---
root@provisioned:~# ps -ef | grep ifup
root 618 1 0 10:21 ? 00:00:00 ifup --allow auto eth2
root 619 1 0 10:21 ? 00:00:00 ifup --allow auto eth1
root 620 1 0 10:21 ? 00:00:00 ifup --allow auto lo
root 621 1 0 10:21 ? 00:00:00 ifup --allow auto eth0
root 726 1 0 10:21 ? 00:00:00 ifup --allow auto bond0
root 739 733 0 10:21 ? 00:00:00 ifup -a
root@provisioned:~# for i in `ps -ef | grep ifup | grep -v grep | awk '{print $2}'`; do echo $i; cat /proc/$i/environ; done
618
...UPSTART_
...UPSTART_
...UPSTART_
...UPSTART_
...INSTANCE=
---
As I said before, the sysv and upstart scripts depended on
running in parallel with each other (unfortunately with race conditions)
to configure bonding. We can see here that one of the upstart networking
processes (networking or network-interface) got the lock and is
in an infinite loop waiting for another instance, which in turn is waiting
for the lock.
---
root@provisioned:~# ps -ef | grep ifenslave
root 647 641 0 10:21 ? 00:00:00 /bin/sh /etc/network/
---
| Rafael David Tinoco (inaddy) wrote : | #11 |
YES!
---
root@provisioned:~# pstree -a
...
├─ifup --allow auto eth2
│ └─sh -c run-parts /etc/network/
│ └─run-parts /etc/network/
│ └─ifenslave /etc/network/
│ └─sleep 0.1
---
One slave interface, eth2 in this case, got the ifupdown lock and is
running an infinite loop waiting for the master bonding interface, which
will never run without the lock.
Summarizing:
Bonding needed both networking scripts running (network-interface
and networking) to work, AND having both scripts running
would sometimes cause race conditions. Disabling one of the scripts
would also cause a race condition if the right triggers are set (like I
showed in this example). Fixing the ifupdown race conditions led me to
realize that ifenslave is making wrong decisions and can cause deadlocks.
Ifenslave must be fixed together with ifupdown...
* wait for next comments.
| Rafael David Tinoco (inaddy) wrote : | #12 |
Checking Ubuntu bzr tree...
---
<email address hidden>
Cloning into 'ifenslave'...
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Most recent Ubuntu version: 3
Packaging branch version: 2.5ubuntu1
Packaging branch status: OUT-OF-DATE
Checking connectivity... done.
---
---
<email address hidden>
2.4
2.4ubuntu1
2.5
2.5ubuntu1
---
---
<email address hidden>
Previous HEAD position was 64392a5... Re-apply Ubuntu delta to new source.
HEAD is now at 1d22c9b... Added "ifenslave-
<email address hidden>
Previous HEAD position was 1d22c9b... Added "ifenslave-
HEAD is now at 64392a5... Re-apply Ubuntu delta to new source.
---
---
<email address hidden>
Previous HEAD position was 64392a5... Re-apply Ubuntu delta to new source.
HEAD is now at 1701e16... * "ifupdown (>= 0.7.46)" compatibility update (Closes: #742410). Thanks to Andrew Shadura. * Added versioned Depends on "ifupdown (>= 0.7.46)".
<email address hidden>
Previous HEAD position was 1701e16... * "ifupdown (>= 0.7.46)" compatibility update (Closes: #742410). Thanks to Andrew Shadura. * Added versioned Depends on "ifupdown (>= 0.7.46)".
HEAD is now at e47d568... * Merge from Debian unstable. Remaining changes: - Upstart event based bond bringup: + Drop ethernet+wifi example + Drop two_ethernet example + Update ethernet+
---
I could see that we diverged from the upstream (Debian) code in favor of some other modifications.
An Ubuntu fix would be different from a possible upstream fix.
We have nowadays:
<email address hidden>
ifenslave | 2.4ubuntu1 | trusty | source, all
ifenslave | 2.5ubuntu1 | utopic | source, all
Both need fixes for this particular case.
* wait for next comments.
| Rafael David Tinoco (inaddy) wrote : | #13 |
After talking to Stéphane Graber, from Ubuntu Core Foundations Team, we decided that I should implement independent locking for every interface (like I have already proposed to Debian upstream project) and to implement locking mechanisms for dependent interfaces inside the hooks.
So:
1) ifupdown would lock every given interface (or all if "-a" is given).
2) locking for child interfaces (slaves for bonding, attached to bridges, ...) is going to be done inside hooks. Today most important hooks for ifupdown are: bridging, vlan and bonding. I have to guarantee those 3 are ok with any change made to ifupdown tool.
* wait for next comments/
| Iain Lane (laney) wrote : | #14 |
(unsubscribing ~ubuntu-sponsors per comment #13, please re-subscribe when patches are ready)
| Rafael David Tinoco (inaddy) wrote : | #15 |
Discussing this with Foundations, we concluded that ifupdown should not only lock on a per-interface basis, but should also have a way of creating a hierarchy of interfaces (in which locking the master implies locking all of its slaves as well - for vlans, aliases, bridging, etc.), so that in a parallel execution ifupdown would obey those restrictions and configure interfaces in the proper order, with locking guaranteed.
I'm preparing those changes and I'll suggest them upstream. If they get accepted I'll provide SRUs for precise and trusty. If the SRUs or the upstream proposal are not accepted, I may create a parallel ifupdown package, maintained by me, to address those issues.
Thank you.. Coming back to this soon.
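To make the hierarchy idea concrete, here is a rough sketch in which locking a master also locks its children, always in the same order (master first, then slaves as listed). It is only an illustration of the ordering argument, not the code that later landed in ifupdown; `LOCK_ROOT`, `lock_iface`, `unlock_iface` and `with_hierarchy` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: take the master's lock first, then its children in a fixed
# order, so two parallel runs can never hold the locks in opposite orders.
# LOCK_ROOT, lock_iface, unlock_iface and with_hierarchy are hypothetical names.
LOCK_ROOT="${LOCK_ROOT:-$(mktemp -d)}"

lock_iface() {
    # mkdir is atomic, so it doubles as a simple per-interface lock.
    until mkdir "$LOCK_ROOT/$1.lock" 2>/dev/null; do
        sleep 0.1
    done
}

unlock_iface() {
    rmdir "$LOCK_ROOT/$1.lock"
}

# with_hierarchy MASTER "SLAVE1 SLAVE2 ..." CMD [ARGS...]
with_hierarchy() {
    master="$1"; slaves="$2"; shift 2
    lock_iface "$master"
    for s in $slaves; do lock_iface "$s"; done
    rc=0
    "$@" || rc=$?
    # Release the children before the master.
    for s in $slaves; do unlock_iface "$s"; done
    unlock_iface "$master"
    return "$rc"
}
```

For the bonding example in this bug, a call would look like `with_hierarchy bond0 "eth1 eth2" some_command`, where `some_command` stands for whatever configures the group.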
| tags: | added: cts |
| Rafael David Tinoco (inaddy) wrote : | #16 |
I'm getting back to this after some time. After the discussion was brought upstream we did not get feedback on the proposed changes, but further investigation makes it clear that ifupdown suffers from race conditions that cannot be solved simply by creating:
1) a big lock - since ifup/ifdown/ifquery are reentrant*
2) a big lock - does not address interface order/priority for parallel executions**
3) a fine-grained lock - does not address interface order/priority for parallel executions**
* could be solved by up/down scripts setting an ENV variable so that children are not locked.
** groups of interfaces such as a bridge and all interfaces connected to it, or an interface and all vlans connected to it
Final approach here will be to guarantee:
1) interfaces should be locked independently on executions
2) locks have to respect interface hierarchy (locking group of inter-connected interfaces such as bridges/interfaces, interfaces/vlans)
3) all up/down scripts have to be reviewed after any locking mechanism change (deadlock by reentrant calls)
IMO
1) stanzas should be created to "group" interfaces to be locked (for parallel executions) respecting hierarchy/order between them
2) locking/state have to be together and independent
FINALLY
The change to guarantee all that will involve code AND interfaces file change (for adding special stanzas to make sure appropriate order and locking is done during interfaces activation). It is not clear if this change will be smooth enough for a "stable release update". If not I'll try to provide a PPA to address any needed code-change for those who suffer from this issue.
FOR NOW
The only way to guarantee interface activation order (without suffering from intermittent race conditions like the one explained in this bug) is to activate interfaces one by one outside the sysv/upstart scripts OR to use "pre/post" commands with reentrant calls to ifupdown in the desired order.
Any comments here are much appreciated.
Thank you
Rafael Tinoco
| Changed in ifupdown (Ubuntu): | |
| assignee: | Rafael David Tinoco (inaddy) → nobody |
| Changed in ifupdown (Ubuntu): | |
| assignee: | nobody → Dariusz Gadomski (dgadomski) |
| Launchpad Janitor (janitor) wrote : | #17 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in ifupdown (Ubuntu Precise): | |
| status: | New → Confirmed |
| Changed in ifupdown (Ubuntu Trusty): | |
| status: | New → Confirmed |
| Changed in ifupdown (Ubuntu Vivid): | |
| status: | New → Confirmed |
| Dariusz Gadomski (dgadomski) wrote : | #21 |
Adding SRU proposal for Vivid.
| Dariusz Gadomski (dgadomski) wrote : | #22 |
Adding SRU proposal for Trusty.
| Dariusz Gadomski (dgadomski) wrote : | #23 |
Adding SRU proposal for Trusty (to make ifenslave compatible with ifupdown changes).
| description: | updated |
| Changed in ifupdown (Ubuntu): | |
| importance: | Undecided → Medium |
| Changed in ifupdown (Ubuntu Precise): | |
| importance: | Undecided → Medium |
| Changed in ifupdown (Ubuntu Trusty): | |
| importance: | Undecided → Medium |
| Changed in ifupdown (Ubuntu Vivid): | |
| importance: | Undecided → Medium |
| Dariusz Gadomski (dgadomski) wrote : | #24 |
Adding SRU proposal for Xenial.
| Martin Pitt (pitti) wrote : | #25 |
Sponsored the patch for xenial. Let's give this some maturing there first.
| Changed in ifupdown (Ubuntu): | |
| status: | In Progress → Fix Committed |
| Sebastien Bacher (seb128) wrote : | #26 |
(unsubscribing sponsors for now then, please subscribe them back after getting some feedback from the xenial update)
| Launchpad Janitor (janitor) wrote : | #27 |
This bug was fixed in the package ifupdown - 0.7.54ubuntu2
---------------
ifupdown (0.7.54ubuntu2) xenial; urgency=medium
* Per-interface hierarchical locking. Backported from Debian git head.
(LP: #1337873, Closes: #753755)
-- Dariusz Gadomski <email address hidden> Thu, 10 Nov 2015 11:30:14 +0200
| Changed in ifupdown (Ubuntu): | |
| status: | Fix Committed → Fix Released |
| Martin Pitt (pitti) wrote : | #28 |
I sponsored the trusty and wily patches.
| Changed in ifupdown (Ubuntu Vivid): | |
| status: | Confirmed → Won't Fix |
| Changed in ifupdown (Ubuntu Wily): | |
| status: | New → In Progress |
| Changed in ifupdown (Ubuntu Trusty): | |
| status: | Confirmed → In Progress |
| Changed in ifenslave (Ubuntu Wily): | |
| status: | New → Fix Released |
| Changed in ifenslave (Ubuntu Vivid): | |
| status: | New → Fix Released |
| Martin Pitt (pitti) wrote : | #29 |
Setting precise tasks to "wontfix", this is too complex to backport and the bug is not nearly important enough to risk regressions due to too invasive backports.
| Changed in ifenslave (Ubuntu Trusty): | |
| status: | New → In Progress |
| Changed in ifenslave (Ubuntu Precise): | |
| status: | New → Won't Fix |
| Changed in ifupdown (Ubuntu Precise): | |
| status: | Confirmed → Won't Fix |
| Changed in ifenslave (Ubuntu): | |
| status: | New → Fix Released |
Hello Rafael, or anyone else affected,
Accepted ifupdown into trusty-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in ifupdown (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| tags: | added: verification-needed |
| Brian Murray (brian-murray) wrote : | #31 |
Hello Rafael, or anyone else affected,
Accepted ifenslave into trusty-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in ifenslave (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| Brian Murray (brian-murray) wrote : | #32 |
Hello Rafael, or anyone else affected,
Accepted ifupdown into wily-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in ifupdown (Ubuntu Wily): | |
| status: | In Progress → Fix Committed |
| Changed in ifupdown (Ubuntu Wily): | |
| importance: | Undecided → Medium |
| Changed in ifenslave (Ubuntu): | |
| importance: | Undecided → Medium |
| Changed in ifenslave (Ubuntu Precise): | |
| importance: | Undecided → Medium |
| Changed in ifenslave (Ubuntu Trusty): | |
| importance: | Undecided → Medium |
| Changed in ifenslave (Ubuntu Vivid): | |
| importance: | Undecided → Medium |
| Changed in ifenslave (Ubuntu Wily): | |
| importance: | Undecided → Medium |
| Dariusz Gadomski (dgadomski) wrote : Re: Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition | #33 |
I have verified both Trusty and Wily. The verification was an automated cyclic reboot of a VM with 3 NICs, 2 of which were bonded in active-backup mode. Before the fix was implemented, this test failed with some interfaces uninitialized or with the wrong bonding mode (the default round-robin was set).
This time, with the -proposed versions, after over 48 hours of the test none of the symptoms occurred.
Thus, tagging as verified.
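A per-boot check of the two symptoms mentioned (wrong bonding mode, bond not up) might look like the sketch below. It is not the script used for the verification. It assumes the bonding status file has the `Key: value` layout shown earlier in this bug and is typically found at `/proc/net/bonding/<bond>`; `bond_field` and `check_bond` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: check one boot's result by parsing a bonding status file such as
# /proc/net/bonding/bond0. bond_field and check_bond are hypothetical helpers.

# bond_field FILE KEY: print the value of the first "KEY: value" line.
bond_field() {
    sed -n "s/^$2: //p" "$1" | head -n 1
}

# check_bond FILE EXPECTED_MODE: succeed only if the reported mode starts with
# EXPECTED_MODE and the bond's own MII status is "up".
check_bond() {
    mode=$(bond_field "$1" "Bonding Mode")
    mii=$(bond_field "$1" "MII Status")
    case "$mode" in
        "$2"*) ;;                      # e.g. "fault-tolerance (active-backup)"
        *) echo "wrong bonding mode: $mode" >&2; return 1 ;;
    esac
    [ "$mii" = "up" ] || { echo "bond MII status: $mii" >&2; return 1; }
}
```

Run after each reboot, a failing `check_bond` would stop the loop and leave the machine in the broken state for inspection.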
| tags: |
added: sts verification-done removed: cts verification-needed |
| Martin Pitt (pitti) wrote : | #34 |
Since per-interface locking landed in Xenial, we've been getting crashes, see bug 1532722. Until this is fixed, I'm marking this as v-failed. We'll then need to update the SRU with this fix as well.
| tags: |
added: verification-failed removed: verification-done |
| Dariusz Gadomski (dgadomski) wrote : | #35 |
SRU proposal for Trusty (extended with fix to bug #1532722)
| Dariusz Gadomski (dgadomski) wrote : | #36 |
SRU proposal for Wily (extended with fix to bug #1532722)
| summary: |
- Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition
+ ifupdown initialization problems caused by race condition |
| Dariusz Gadomski (dgadomski) wrote : | #37 |
Updated SRU proposal for Trusty (fix to bug #1532722)
| Dariusz Gadomski (dgadomski) wrote : | #38 |
New SRU proposal for Wily (with fix to bug #1532722)
| Martin Pitt (pitti) wrote : | #39 |
I sponsored the updated trusty/wily patches, thanks!
| Changed in ifupdown (Ubuntu Trusty): | |
| status: | Fix Committed → In Progress |
| Changed in ifupdown (Ubuntu Wily): | |
| status: | Fix Committed → In Progress |
| Changed in ifupdown (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| Changed in ifupdown (Ubuntu Wily): | |
| status: | In Progress → Fix Committed |
| tags: |
added: verification-needed removed: verification-failed |
| Dariusz Gadomski (dgadomski) wrote : | #40 |
Since I've added the fix to bug 1532722 and after several days of testing I did not observe any other issues on Trusty and Wily I'm tagging this as verified.
| tags: |
added: verification-done removed: verification-needed |
The verification of the Stable Release Update for ifenslave has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
| Launchpad Janitor (janitor) wrote : | #42 |
This bug was fixed in the package ifenslave - 2.4ubuntu1.2
---------------
ifenslave (2.4ubuntu1.2) trusty; urgency=medium
* Don't depend on /run/network/
-- Dariusz Gadomski <email address hidden> Thu, 01 Oct 2015 11:30:24 +0200
| Changed in ifenslave (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |
| Launchpad Janitor (janitor) wrote : | #43 |
This bug was fixed in the package ifupdown - 0.7.47.2ubuntu4.3
---------------
ifupdown (0.7.47.2ubuntu4.3) trusty; urgency=medium
[ Martin Pitt ]
* Fix ifquery crash if interface state file does not exist yet.
(Closes: #810779, LP: #1532722)
-- Dariusz Gadomski <email address hidden> Tue, 12 Jan 2016 11:05:16 +0100
| Changed in ifupdown (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |
| Changed in ifupdown (Ubuntu Wily): | |
| status: | Fix Committed → Fix Released |
| Max Krasilnikov (pseudo) wrote : | #44 |
Hello!
I am running Ubuntu 14.04.3 LTS.
This update introduces problem in my setup, adapted to old behavior:
auto eth2
iface eth2 inet manual
bond-master bond0
up ip link set $IFACE txqueuelen 10000
auto eth3
iface eth3 inet manual
bond-master bond0
up ip link set $IFACE txqueuelen 10000
auto bond0
iface bond0 inet static
address 10.0.66.3
netmask 255.255.255.0
bond-mode 802.3ad
bond-slaves none
pre-ip ifup eth2
pre-up ifup eth3
up ip link set $IFACE txqueuelen 10000
Interface bond0 is not becoming up:
root@storage003:~# ps axu |grep ifup
root 780 0.0 0.0 4392 1448 ? Ss 00:03 0:00 ifup --allow auto eth3
root 783 0.0 0.0 4392 1460 ? Ss 00:03 0:00 ifup --allow auto eth2
root 1067 0.0 0.0 4392 1516 ? Ss 00:03 0:00 ifup --allow auto bond0
root 1087 0.0 0.0 4448 668 ? S 00:03 0:00 /bin/sh -c ifup eth3
root 1088 0.0 0.0 4388 1344 ? S 00:03 0:00 ifup eth3
root 2150 0.0 0.0 4388 1548 ? S 00:03 0:00 ifup -a
root@storage003:~# ps axu |grep ifenslave
root 816 0.1 0.0 4448 1436 ? S 00:03 0:48 /bin/sh /etc/network/
root 817 0.1 0.0 4448 1504 ? S 00:03 0:48 /bin/sh /etc/network/
root 1792323 0.0 0.0 15124 2136 pts/0 S+ 11:39 0:00 grep --color=auto ifenslave
root@storage003:~# for i in `ps -ef | grep ifup | grep -v grep | awk '{print $2}'`; do echo $i; cat /proc/$i/environ; done
780
ID_BUS=
ID_BUS=
UPSTART_
| Dariusz Gadomski (dgadomski) wrote : | #45 |
Hello Max,
My guess (supported by a test I made in a test environment) is that the problem is caused by these lines under iface bond0:
pre-ip ifup eth2
pre-up ifup eth3
Those are most probably causing a deadlock, since the new release aims to fix the race condition causing the original issue (described above).
Removing those lines (and hence following the convention described in /usr/share/
In your case ifupdown will be responsible for bringing up the eth2 and eth3 devices while setting up bond0, so you don't need to take any additional actions in the bond0 section - please rely on this.
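For anyone auditing an existing configuration for this pattern, a small helper can list hook lines that call ifup or ifdown themselves. This is only a sketch: `find_reentrant_hooks` is a hypothetical name, it matches the six standard hook keywords of interfaces(5), and a line whose keyword is not one of those six is not reported.

```shell
#!/bin/sh
# Sketch only: list hook lines in an interfaces(5) file that call ifup/ifdown
# themselves. find_reentrant_hooks is a hypothetical helper.

# find_reentrant_hooks FILE: print "LINENO:LINE" for every matching hook line.
# Exit status is grep's: 0 if at least one line matched, 1 if none did.
find_reentrant_hooks() {
    grep -nE '^[[:space:]]*(pre-up|up|post-up|pre-down|down|post-down)[[:space:]]+(ifup|ifdown)([[:space:]]|$)' "$1"
}
```

Any line it prints is a candidate for removal in favour of the bond-master/bond-slaves convention.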
| Max Krasilnikov (pseudo) wrote : | #46 |
Thanx a lot, all is working now. My bad.
| sirswa (sirswa) wrote : | #47 |
Hi all
I upgraded ifenslave and ifupdown to 2.4ubuntu1.2 and 0.7.47.2ubuntu4.3 respectively. After reboot, the bonding did not come up correctly: the MTU was set wrongly (defaulting to 1500), the default gateway was not set, and the nameserver information was not set in /etc/resolv.conf.
After downgrading ifupdown to 0.7.47.2ubuntu4 and rebooting the server, everything came up fine again.
root@rcstodc1r2
ifenslave:
Installed: 2.4ubuntu1.2
Candidate: 2.4ubuntu1.2
root@rcstodc1r2
ifupdown:
Installed: 0.7.47.2ubuntu4
Candidate: 0.7.47.2ubuntu4.3
root@rcstodc1r2
Linux rcstodc1r24-01-ac 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@rcstodc1r2
DISTRIB_ID=Ubuntu
DISTRIB_
DISTRIB_
DISTRIB_
/etc/network/
<snip>
auto p5p1
iface p5p1 inet manual
mtu 9000
auto p5p2
iface p5p2 inet manual
mtu 9000
auto p5p1.104
iface p5p1.104 inet manual
bond-master bond104
bond-primary p5p1.104
mtu 9000
auto p5p2.104
iface p5p2.104 inet manual
bond-master bond104
mtu 9000
auto bond104
iface bond104 inet static
address X.X.X.X
netmask 255.255.248.0
network X.X.X.X
broadcast X.X.X.X
gateway X.X.X.X
dns-nameservers X.X.X.X
dns-search erc.monash.edu.au
bond-miimon 100
bond-mode 1
mtu 9000
bond-primary p5p1.104
bond-slaves none
Since the server is part of a storage cluster server pool, I could not hold on to it for long. I downgraded the ifupdown package and the server joined the production pool.
| Dariusz Gadomski (dgadomski) wrote : | #48 |
@sirswa I tried to reproduce this in my environment - unsuccessfully.
I created a similar config - please take a look at it for reference: http://
Maybe you could spot a difference that I overlooked.
This could mean that this change is interfering with something in your system that we did not take into consideration.
Could you provide the output of:
service --status-all
and
find /etc/network
(to see which if-*.d scripts you are running).
Thanks!
| sirswa (sirswa) wrote : | #49 |
Hi Dariusz
Thanks for looking into it. The config looks like mine.
The p3p1 interfaces are Mellanox CX PCI cards; could it be that the modules are not loaded yet when bonding starts?
I have upgraded ifupdown to 0.7.47.2ubuntu4.3 version to get more logs. I have attached
service --status-all output
apt-cache for ifenslave and ifupdown
dmesg
| Dariusz Gadomski (dgadomski) wrote : | #50 |
Thank you for the report sirswa.
I have analyzed your config and came to some conclusions. You may want to consider using one of the approaches below:
* giving up on bonding and replacing it with bridging in STP mode (please consult the man pages at http://
* implementing your VLANs on top of the bonding interfaces instead of physical interfaces (i.e. defining bond104.104, bond944.944 and bond945.945 instead of p5p1.104, p5p1.944 etc.). The configuration you were using places the VLAN layer below the bonding layer and could produce unexpected behaviour. Please use the approach described here as reference: https:/
https:/
I am aware that the configuration you are using was working before, but despite this fact it was never supported. The latest changes made to ifupdown just exposed that fact.
| Changed in ifupdown (Debian): | |
| status: | New → Fix Released |


Attaching script to reproduce described problem.