networking services restart hangs due to missing /etc/network/run/ifenslave.* files

Bug #1269921 reported by Keegan Holley
This bug affects 8 people
Affects: ifenslave (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Ubuntu 12.04 LTS
ifenslave-2.6

ifup hangs when bonds are configured and networking is restarted via '/etc/init.d/networking restart'. The script in '/etc/network/if-pre-up.d' does not recognize the bond master and hangs because the '/etc/network/run/ifenslave.*' files are not created.

##/etc/network/if-pre-up.d/ifenslave

#!/bin/sh

[ "$VERBOSITY" = 1 ] && set -x

IFSTATE=/run/network/ifstate

add_master()
{
 # Return if $BOND_MASTER is already a bonding interface.
 [ -f "/sys/class/net/$BOND_MASTER/bonding/slaves" ] && return

 # If the bonding module is not yet loaded, load it.
 if [ ! -r /sys/class/net/bonding_masters ]; then
  modprobe -q bonding

  FAILED=1
  echo "Waiting for bonding kernel module to be ready (will timeout after 5s)"
  for i in $(seq 50); do
   if [ -r /sys/class/net/bonding_masters ]; then
    FAILED=0
    break
   fi
   sleep 0.1
  done

  if [ "$FAILED" = "1" ]; then
   echo "/sys/class/net/bonding_masters doesn't exist. Unable to create $BOND_MASTER"
   exit 1
  fi

 fi

 # Create the master interface.
 if ! grep -sq "\\<$BOND_MASTER\\>" /sys/class/net/bonding_masters; then
  echo "+$BOND_MASTER" > /sys/class/net/bonding_masters
 fi
}

sysfs_change_down()
{
 # Called with :
 # $1 = basename of the file in bonding/ to write to.
 # $2 = value to write. Won't write if $2 is empty.
 if [ "$2" ] ; then
  # If the value we plan to write is different from the current one...
  if ! grep -sq "\\<$2\\>" "/sys/class/net/$BOND_MASTER/bonding/$1" ; then
   # ...and the master is up...
   if ip link show "$BOND_MASTER" | grep -sq '[<,]UP[,>]' ; then
    # ...bring the master down.
    ip link set dev "$BOND_MASTER" down
   fi
  fi
  sysfs "$1" "$2"
 fi
}

sysfs()
{
 # Called with :
 # $1 = basename of the file in bonding/ to write to.
 # $2 = value to write. Won't write if $2 is empty.
 if [ "$2" ] ; then
  echo "$2" > "/sys/class/net/$BOND_MASTER/bonding/$1"
  return $?
 fi
 return 0
}

sysfs_add()
{
 # Called with :
 # $1 = target filename.
 # $2 = values to write.
 for value in $2; do
  # Do not add $2 to $1 if already present.
  if ! grep -sq "\\<$value\\>" /sys/class/net/$BOND_MASTER/bonding/$1
  then
      sysfs "$1" "+$value"
  fi
 done
}

# early_setup_master is the place where we do master setup that need to be done before enslavement.
early_setup_master()
{
 # Warning: the order in which we write into the sysfs files is important.
 # Double check in drivers/net/bonding/bond_sysfs.c in linux kernel source tree
 # before changing anything here.

 # fail_over_mac must be set before enslavement of any slaves.
 sysfs fail_over_mac "$IF_BOND_FAIL_OVER_MAC"
}

# late_setup_master runs actions that need to happen after enslavement
late_setup_master()
{
 # primary must be set after mode (because only supported in some modes) and after enslavement.
 # The first slave in bond-primary found in current slaves becomes the primary.
 # If no slave in bond-primary is found, then primary does not change.
 for slave in $IF_BOND_PRIMARY ; do
  if grep -sq "\\<$slave\\>" "/sys/class/net/$BOND_MASTER/bonding/slaves" ; then
   sysfs primary "$slave"
   break
  fi
 done

 # primary_reselect should be set after mode (because only supported in some modes), after enslavement
 # and after primary. This is currently (2.6.35-rc1) not enforced by the bonding driver, but it is
 # probably safer to do it in that order.
 sysfs primary_reselect "$IF_BOND_PRIMARY_RESELECT"

 # queue_id must be set after enslavement.
 for iface_queue_id in $IF_BOND_QUEUE_ID
 do
  sysfs iface_queue_id $iface_queue_id
 done

 # active_slave must be set after mode and after enslavement.
 # The slave must be up and the underlying link must be up too.
 # FIXME: We should have a way to write an empty string to active_slave, to set the active_slave to none.
 if [ "$IF_BOND_ACTIVE_SLAVE" ] ; then
  # Need to force interface up before. Bonding will refuse to activate a down interface.
  ip link set "$IF_BOND_ACTIVE_SLAVE" up
  sysfs active_slave "$IF_BOND_ACTIVE_SLAVE"
 fi
}

enslave_slaves()
{
 case "$BOND_SLAVES" in
  none)
   BOND_SLAVES=""
   ;;
  all)
   BOND_SLAVES=`sed -ne 's/ *\(eth[^:]*\):.*/\1/p' /proc/net/dev`
   AUTOIF="yes"
   ;;
 esac

 [ "$VERBOSITY" = 1 ] && v=-v
 for slave in $BOND_SLAVES ; do
  if ( [ "$AUTOIF" ] && grep -q "^$slave=" $IFSTATE ) ; then
   echo "Not enslaving interface $slave since it is already configured"
  else
   # Ensure $slave is down.
   ip link set "$slave" down 2>/dev/null
   if ! sysfs_add slaves "$slave" 2>/dev/null ; then
    echo "Failed to enslave $slave to $BOND_MASTER. Is $BOND_MASTER ready and a bonding interface ?" >&2
   else
    # Bring up slave if it is the target of an allow-bondX stanza.
    # This is useful to bring up slaves that need extra setup.
    if [ -z "$(which ifquery)" ] || ifquery --allow \"$BOND_MASTER\" --list | grep -q $slave; then
     ifup $v --allow "$BOND_MASTER" "$slave"
    fi
   fi
  fi
 done
}

setup_master()
{
 # Warning: the order in which we write into the sysfs files is important.
 # Double check in drivers/net/bonding/bond_sysfs.c in linux kernel source tree
 # before changing anything here.

 # use_carrier can be set anytime.
 sysfs use_carrier "$IF_BOND_USE_CARRIER"
 # num_grat_arp can be set anytime.
 sysfs num_grat_arp "$IF_BOND_NUM_GRAT_ARP"
 # num_unsol_na can be set anytime.
 sysfs num_unsol_na "$IF_BOND_NUM_UNSOL_NA"

 # xmit_hash_policy can be set anytime.
 # Changing xmit_hash_policy requires $BOND_MASTER to be down.
 sysfs_change_down xmit_hash_policy "$IF_BOND_XMIT_HASH_POLICY"

 # arp_ip_target must be set before arp_interval.
 sysfs_add arp_ip_target "$IF_BOND_ARP_IP_TARGET"
 sysfs arp_interval "$IF_BOND_ARP_INTERVAL"

 # miimon must be set before updelay and downdelay.
 sysfs miimon "$IF_BOND_MIIMON"
 sysfs downdelay "$IF_BOND_DOWNDELAY"
 sysfs updelay "$IF_BOND_UPDELAY"

 # Changing ad_select requires $BOND_MASTER to be down.
 sysfs_change_down ad_select "$IF_BOND_AD_SELECT"

 # Changing mode requires $BOND_MASTER to be down.
 # Mode should be set after miimon or arp_interval, to avoid a warning in syslog.
 sysfs_change_down mode "$IF_BOND_MODE"

 # arp_validate must be after mode (because mode must be active-backup).
 sysfs arp_validate "$IF_BOND_ARP_VALIDATE"

 # lacp_rate must be set after mode (because mode must be 802.3ad).
 # Changing lacp_rate requires $BOND_MASTER to be down.
 sysfs_change_down lacp_rate "$IF_BOND_LACP_RATE"

 # Finally bring the bond up, note that without a slave it won't be usable though
 ip link set dev $BOND_MASTER up
}

# Option slaves deprecated, replaced by bond-slaves, but still supported for backward compatibility.
IF_BOND_SLAVES=${IF_BOND_SLAVES:-$IF_SLAVES}

if [ "$IF_BOND_MASTER" ] ; then
 BOND_MASTER="$IF_BOND_MASTER"
 BOND_SLAVES="$IFACE"
else
 if [ "$IF_BOND_SLAVES" ] ; then
  BOND_MASTER="$IFACE"
  BOND_SLAVES="$IF_BOND_SLAVES"
 fi
fi

# Exit if nothing to do...
[ -z "$BOND_MASTER$BOND_SLAVES" ] && exit

# Always try to create the master, returns if already exists
add_master

if [ "$BOND_MASTER" = "$IFACE" ]; then
 # Setup the master interface
 early_setup_master
 setup_master

 # Indicate that we're done setting up the master
 # this is required as ifstate is modified at the beginning
 # of the interface setup, not at the end
 touch /run/network/ifenslave.$IFACE

 # Wait for a slave to join, continuing without a slave
 # would make dhclient, vconfig or brctl fail, so better wait
 # Timeout after a minute
 FAILED=1
 echo "Waiting for a slave to join $BOND_MASTER (will timeout after 60s)"
 for i in $(seq 600); do
  if [ -n "$(cat /sys/class/net/$BOND_MASTER/bonding/slaves)" ]; then
   FAILED=0
   break
  fi
  sleep 0.1
 done
 if [ "$FAILED" = "1" ]; then
  echo "No slave joined $BOND_MASTER, continuing anyway"
 else
  # Trigger the udev bridging hook to bridge the bond if needed
  if [ -x /lib/udev/bridge-network-interface ]; then
   INTERFACE=$BOND_MASTER /lib/udev/bridge-network-interface
  fi

  # Trigger the udev bridging hook to tag the bond if needed
  if [ -x /lib/udev/vlan-network-interface ]; then
   INTERFACE=$BOND_MASTER /lib/udev/vlan-network-interface
  fi
 fi
else
 # Wait for the master to be ready
 [ ! -f /run/network/ifenslave.$BOND_MASTER ] && echo "Waiting for bond master $BOND_MASTER to be ready"
 while :; do
  if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
   break
  fi
  sleep 0.1
 done

 # Only setup one slave at once
 BOND_SLAVES=$IFACE enslave_slaves

 # Call late_setup_master every time we add a slave as we don't have a way to know
 # when all the slaves are up
 BOND_SLAVES=$IFACE late_setup_master
fi
exit 0

## /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how they will be activated. For more information, see interfaces(5).
# chef managed!

auto eth1
iface eth1 inet manual
  bond-master bond0
  bond-primary eth1

# default bridge interface
auto backend
iface backend inet manual
  bridge_fd 0
  bridge_ports vlan10
  bridge_stp off

# failover nic team
auto bond0
iface bond0 inet manual
  bond-miimon 100
  bond-mode active-backup
  bond-primary eth1
  bond-slaves none

# The VM network interface
auto eth0
iface eth0 inet static
  address 10.0.2.15
  broadcast 10.0.2.255
  gateway 10.0.2.0
  netmask 255.255.255.0

# The loopback interface
auto lo
iface lo inet loopback

# default vlan interface
auto vlan10
iface vlan10 inet manual
  vlan_raw_device bond0

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: ifenslave-2.6 1.1.0-19ubuntu5
ProcVersionSignature: Ubuntu 3.2.0-32.51-generic 3.2.30
Uname: Linux 3.2.0-32-generic x86_64
ApportVersion: 2.0.1-0ubuntu17.1
Architecture: amd64
Date: Thu Jan 16 17:34:48 2014
InstallationMedia: Ubuntu-Server 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120817.3)
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ifenslave-2.6
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Keegan Holley (keeganh) wrote :

The root cause is in /etc/network/if-pre-up.d/ifenslave, starting at line 260.

If the /run/network/ifenslave.<BOND> file doesn't exist, ifup enters a while loop that never terminates, since nothing inside the loop creates the file.

 [ ! -f /run/network/ifenslave.$BOND_MASTER ] && echo "Waiting for bond master $BOND_MASTER to be ready"
 while :; do
  if [ -f /run/network/ifenslave.$BOND_MASTER ]; then
   break
  fi
  sleep 0.1
 done
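
One way to keep the slave side from blocking forever would be to bound the wait, much as the master side of the script already gives up after 600 polls. The following is only a minimal sketch under that assumption, not the package's fix; the helper name wait_for_flag and its poll-count argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of ifenslave): wait for a flag file, but
# give up after a bounded number of polls instead of looping forever.
# $1 = path of the flag file the master creates when it is ready
# $2 = maximum number of 0.1s polls (default 600, roughly 60s)
wait_for_flag()
{
	flag="$1"
	tries="${2:-600}"
	while [ "$tries" -gt 0 ]; do
		[ -f "$flag" ] && return 0
		sleep 0.1
		tries=$((tries - 1))
	done
	# Final check so a file created during the last sleep is not missed.
	[ -f "$flag" ]
}

# Possible use in the slave branch of the hook:
#   wait_for_flag "/run/network/ifenslave.$BOND_MASTER" || {
#       echo "Bond master $BOND_MASTER never became ready" >&2
#       exit 1
#   }
```

With a bound in place, ifup would report a failure for the slave rather than hang; whether failing or continuing is the better behaviour is a separate design question.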

affects: ifenslave-2.6 (Ubuntu) → ifenslave (Ubuntu)
Stéphane Graber (stgraber) wrote :

"/etc/init.d/networking restart" and "sudo restart networking" aren't supported ways of restarting the network. In fact, we don't support any way of bouncing the whole network config, as it's configured through event-based bring-up.

Rather than copy/paste half the code of ifenslave, could you instead post your current /etc/network/interfaces as well as the output of "ifdown -a -v" and "ifup -a -v"?

As a reminder, ifupdown is entirely sequential, so the ordering is required to be roughly:
 - loopback
 - primary physical interface (bond member)
 - any other physical interface (bond member)
 - bond interface
 - any virtual interface (vlan, bridge, ...) based on top of the bond

If the interfaces aren't listed in precisely that order, "ifup -a" will hang (though the boot itself won't as it's using events to bring up individual interfaces rather than using ifup -a).
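
The slave-before-master part of the ordering stated above can be checked mechanically. The sketch below is an illustration only: check_bond_order is a hypothetical helper, it looks only at "iface" stanzas and "bond-master" options, and it does not verify the position of loopback, vlan or bridge stanzas.

```shell
#!/bin/sh
# Hypothetical helper: report bond masters whose iface stanza appears
# before the stanza of one of their slaves.
# $1 = path to an interfaces file
# Exit status: 0 if every slave stanza precedes its master, 1 otherwise.
check_bond_order()
{
	awk '
		# Remember the position of each iface stanza.
		$1 == "iface" { cur = $2; pos[cur] = ++n }
		# Record which master the current stanza is enslaved to.
		$1 == "bond-master" && cur != "" { master[cur] = $2 }
		END {
			bad = 0
			for (s in master) {
				m = master[s]
				if ((m in pos) && pos[m] < pos[s]) {
					printf "%s is defined before its slave %s\n", m, s
					bad = 1
				}
			}
			exit bad
		}
	' "$1"
}

# Example: check_bond_order /etc/network/interfaces
```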

Changed in ifenslave (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for ifenslave (Ubuntu) because there has been no activity for 60 days.]

Changed in ifenslave (Ubuntu):
status: Incomplete → Expired
Stefan Andres (s-andres) wrote :

I have the same problem. I've a box with 4 NICs.

2x1 GigE Intel cards (eth1, eth3)
2x10 GigE Broadcom cards (eth0, eth2)

I'm booting up with the first Intel card, which gets a DHCP IP and works fine:

https://gist.github.com/stefanandres/ba24835c3b36795f94c3#file-interfaces-without-bonding

Then I swap that interfaces file (in the end by puppet, but now manually for debugging):

https://gist.github.com/stefanandres/ba24835c3b36795f94c3#file-interfaces-with-minimal-bonding

Then I run ifdown -a, which brings all interfaces down, followed by ifup -a -v. At that point ifup -a hangs forever in the [ -f /run/network/ifenslave.bond0 ] loop:

https://gist.github.com/stefanandres/ba24835c3b36795f94c3#file-ifup-verbose-output

Changed in ifenslave (Ubuntu):
status: Expired → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifenslave (Ubuntu):
status: New → Confirmed
Johannes Grassler (jgr-launchpad) wrote :

We have now fixed this locally by hacking up a less flexible but working alternative to ifenslave:

https://gist.github.com/jgrassler/7283f812bf69985f4c84

Cory Wright (corywright) wrote :

This bug is also affecting our 12.04 LTS systems. It seems to have something to do with whether or not the bonding module is already loaded.

Sven Hoexter (sven-timegate) wrote :

Instead of replacing the whole (broken) ifenslave script, it seems to work to just bring up the bond yourself via pre-up.
It's ugly and somewhat works against upstart, but it works for the moment[tm] and you can stick to your own habits of using ifup/ifdown. :(

auto eth1 eth2 bond0
iface eth1 inet manual
  bond-master bond0

iface eth2 inet manual
  bond-master bond0

iface bond0 inet static
  address 192.168.1.1
  netmask 255.255.255.0
  bond-primary eth1
  bond-slaves eth1 eth2
  pre-up echo +bond0 > /sys/class/net/bonding_masters
  pre-up ifenslave bond0 eth1 eth2

Sven Hoexter (sven-timegate) wrote :

small but important mistake in the bond0 stanza:
- pre-up echo +bond0 > /sys/class/net/bonding_masters
+ pre-up echo +bond0 > /sys/class/net/bonding_masters || true
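
The "|| true" presumably matters because the write fails when bond0 already exists, and a failing pre-up command makes ifup abort the stanza. An alternative, sketched here as an assumption rather than a tested fix, is to check before writing, the same way add_master does in the ifenslave script. The helper name ensure_bond_master and its optional second argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a bond master only if it is not listed yet.
# $1 = bond name (e.g. bond0)
# $2 = bonding_masters file (defaults to the sysfs path; overridable so the
#      logic can be tried against an ordinary file)
ensure_bond_master()
{
	bond="$1"
	masters="${2:-/sys/class/net/bonding_masters}"
	# Whole-word match, same idea as the grep in add_master.
	if grep -qsw -- "$bond" "$masters"; then
		return 0
	fi
	echo "+$bond" > "$masters"
}

# Example: ensure_bond_master bond0
```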

Andrew McDermott (frobware) wrote :

I was trying to do something very similar for Juju - when the machine boots we rewrite /etc/network/interfaces to add bridges for active interfaces to support container networking.

Our sequence is:

  $ ifdown -a
  $ transform --in-place /etc/network/interfaces
  $ ifup -a

and with bonds configured using 802.3ad this would always hang on the ifup stage.

If I add a:

  $ sleep 0.5

after the transform stage then the sequence of ifdown/up seems to work very reliably.
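
Put together, the sequence with the added pause could look like the sketch below. The function name bounce_network and the IFDOWN, IFUP and SETTLE variables are hypothetical and exist only so the sequence can be exercised with stand-in commands; the rewrite step is passed in as arguments because the "transform" script itself is not shown in this report.

```shell
#!/bin/sh
# Commands and pause length are overridable; the defaults are the real
# tools and the 0.5s pause reported above.
IFDOWN="${IFDOWN:-ifdown}"
IFUP="${IFUP:-ifup}"
SETTLE="${SETTLE:-0.5}"

# Hypothetical wrapper around the ifdown / rewrite / ifup sequence.
# "$@" = command that rewrites /etc/network/interfaces
bounce_network()
{
	$IFDOWN -a
	"$@"
	# Pause between the rewrite and ifup, as described above.
	sleep "$SETTLE"
	$IFUP -a
}

# Example:
#   bounce_network transform --in-place /etc/network/interfaces
```

The report does not say why the pause helps, so the 0.5s value should be treated as an observation from one setup rather than a guaranteed fix.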

Andrew McDermott (frobware) wrote :

To clarify, my "transform" script just bridges active interfaces.

Cory Wright (corywright) wrote :

This bug has persisted through three LTS releases (12.04, 14.04, 16.04) and has bitten me on each one.

Are there plans to fix this?
