static-network-up event does not wait for interfaces to have an address

Bug #838968 reported by Scott Moser
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
Fix Released
Medium
Unassigned
ifupdown (Ubuntu)
Fix Released
High
Scott Moser
isc-dhcp (Ubuntu)
Fix Released
Medium
Steve Langasek

Bug Description

cloud-init's /etc/init/cloud-init-nonet.conf is configured to block running of cloud-init.conf until "static-network-up" is emitted.

The point of that event was to wait until interfaces were all configured, and thus the network available.

In practice, what I'm seeing is that for dhcp interfaces configured like:
   # The primary network interface
   auto eth0
   iface eth0 inet dhcp

ifupdown runs /etc/network/if-up.d/upstart immediately upon forking off dhclient.

$ ls -ld --full-time /var/run/network/static-network-up-emitted
drwxr-xr-x 2 root root 40 2011-09-01 12:21:26.778833999 -0400 /var/run/network/static-network-up-emitted

Then, in /var/log/syslog, the first non-kernel messages for this boot start at 12:21:28. relevant dhcp messages are:
Sep 1 12:21:29 mabolo dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
Sep 1 12:21:30 mabolo kernel: [ 10.885045] bnx2 0000:02:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Sep 1 12:21:30 mabolo kernel: [ 10.886232] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sep 1 12:21:36 mabolo dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
...
Sep 1 12:22:13 mabolo dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 14
Sep 1 12:22:13 mabolo dhclient: DHCPOFFER of 10.55.55.5 from 10.55.55.7
Sep 1 12:22:13 mabolo dhclient: DHCPREQUEST of 10.55.55.5 on eth0 to 255.255.255.255 port 67
Sep 1 12:22:14 mabolo dhclient: DHCPACK of 10.55.55.5 from 10.55.55.7
Sep 1 12:22:14 mabolo dhclient: can't create /var/lib/dhcp3/dhclient.eth0.leases: No such file or directory
Sep 1 12:22:14 mabolo dhclient: bound to 10.55.55.5 -- renewal in 1399 seconds.

So it would surely appear that the event is happening immediately.
However, to test this theory, I put the following into /etc/network/if-up.d/myfootest, and set up 'eth1' in /etc/network/interfaces.. (I knew that there would be no response on eth1's interface.)
#!/bin/sh

(
echo "===== $(date -R) ====="
echo "IFACE=$IFACE LOGICAL=$LOGICAL ADDRFAM=$ADDRFAM METHOD=$METHOD MODE=$MODE PHASE=$PHASE"
ifconfig $IFACE
) | tee -a /var/log/myfoo.log

exit 0

Then, i ran 'sudo ifup eth1' and that took almost exactly 1 minute before writing the log and to the screen. One difference between the two is that eth1 is not attached to a device per ethtool.

Related Bugs:
 * bug 974284: invoking dhclient3 with -1 causes issue if no dhcp server available

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: ifupdown 0.7~alpha5.1ubuntu3
ProcVersionSignature: User Name 3.0.0-9.15-virtual 3.0.3
Uname: Linux 3.0.0-9-virtual i686
Architecture: i386
Date: Thu Sep 1 17:02:21 2011
Ec2AMI: ami-00000070
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.small
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ifupdown
UpgradeStatus: No upgrade log present (probably fresh install)

Related branches

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

Just to be clear, static-network-up should not fire until interfaces at least have an ipv4 (or ipv6) address and it appears to be firing immediately upon their coming up.

Revision history for this message
Scott Moser (smoser) wrote :

We could fairly easily write a helper program that would get run from /etc/network/if-up.d/upstart and take care of emitting the event.
Something like the following at the end of that script, replacing our 'all_interfaces_up' stanza.

if mkdir /var/run/network/static-network-up-daemon 2>/dev/null; then
   wait-for-interfaces-up --markdir=/var/run/network/static-network-up-emitted \
      --pid=/var/run/network/static-network-up-daemon/pid \
      --emit="static-network-up" \
      $(get_auto_interfaces)
fi

That program would just have to sit and wait until all the interfaces it was told were up, and then emit the signal.

Revision history for this message
Scott Moser (smoser) wrote :

I'm not sure how to avoid polling, but surely there is some way.

Revision history for this message
Steve Langasek (vorlon) wrote :

Discussed on IRC, the problem here is that ifupdown invokes dhclient in a way that it still exits with success when it has *failed* to acquire an IP address, causing ifupdown to emit the interface-up event.

Passing -1 to dhclient3 gives the correct behavior: daemonize when an address is acquired, and if the timeout is reached without acquiring an address, exit non-zero instead of daemonizing.

There's a question of whether we should also raise the timeout in dhclient to accomodate systems like the one where this was first encountered, where 60 seconds isn't long enough to get the interface up and get an address via dhcp; but at least the ifupdown change is a per-se-correct one-line fix.

Changed in ifupdown (Ubuntu):
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-11.10-beta-2
assignee: nobody → Scott Moser (smoser)
Revision history for this message
Scott Moser (smoser) wrote :

cloud-init (0.6.1-0ubuntu16) oneiric; urgency=low

  * catch up with trunk at revision 439
  * warn on failure to set hostname (LP: #832175)
  * properly wait for all static interfaces to be up before
    cloud-init runs (depends on fix in LP:# 838968).
  * in DataSources NoCloud and OVF, do not set hostname to the
    static value 'ubuntuhost' if local-hostname is not in metadata
    (LP: #838280)
  * improve the way ssh_authorized_keys is updated, so that the
    values given will be used. (LP: #434076, LP: #833499)
  * cloud-init-notnet.conf: minor changes to config
 -- Scott Moser <email address hidden> Thu, 01 Sep 2011 21:14:09 -0400

Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
status: New → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

I just uploaded, but wanted to put some evidence of this being fixed here.
On the system that was taking 45 seconds for dhcp (where we saw this bug) we have:

$ ls -altr --full-time /run/network
total 4
-rw-r--r-- 1 root root 0 2011-09-02 15:46:56.877861983 -0400 ifup.lo
drwxr-xr-x 2 root root 40 2011-09-02 15:47:34.157861476 -0400 static-network-up-emitted
-rw-r--r-- 1 root root 0 2011-09-02 15:47:34.157861476 -0400 ifup.eth0
-rw-r--r-- 1 root root 16 2011-09-02 15:47:34.157861476 -0400 ifstate

Then, in /var/log/cloud-init.log:
2011-09-02 15:46:57,745 - cloud-init[INFO]: cloud-init start-local running: Fri, 02 Sep 2011 19:46:57 +0000. up 7.91 seconds
2011-09-02 15:46:57,755 - __init__.py[DEBUG]: searching for data source in ['DataSourceNoCloud']
2011-09-02 15:46:57,764 - __init__.py[DEBUG]: Did not find data source. searched classes: ['DataSourceNoCloud']
2011-09-02 15:47:34,300 - cloud-init[INFO]: cloud-init start running: Fri, 02 Sep 2011 19:47:34 +0000. up 44.51 seconds

Note, 'ifup.lo' was created at 46:56, and that cloud-init local ran at (utc) 46:57.
ifup.eth0 and static-network-up-emitted were created at 47:34 (38 seconds later) and cloud-init started at 47:34 (immediately following).

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7~alpha5.1ubuntu4

---------------
ifupdown (0.7~alpha5.1ubuntu4) oneiric; urgency=low

  * invoke dhclient3 with '-1', meaning it should exit failure if it does
    not receive a response in 60 seconds. (LP: #838968)
  * fix the broken static-network-up-emitted event. This now keeps
    state of which interfaces have been brought up by marker files
    in /run/network named ifup.IFACE and a directory named
    static-network-up-emitted indicating static-network-up has been
    emitted.
 -- Scott Moser <email address hidden> Fri, 02 Sep 2011 15:50:47 -0400

Changed in ifupdown (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote :

We also need to protect against ever actually *hitting* the dhclient timeout during normal operation.

netcfg and network-manager already do their own dhcp timeout handling, so only ifupdown is still affected. We should raise the default timeout in dhclient, otherwise we just have interfaces that never come up instead of ones that say they're up when they aren't.

Changed in isc-dhcp (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Steve Langasek (vorlon)
milestone: none → ubuntu-11.10-beta-2
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.1.1-P1-17ubuntu8

---------------
isc-dhcp (4.1.1-P1-17ubuntu8) oneiric; urgency=low

  * debian/patches/dhclient-safer-timeout.dpatch: Increase default
    timeout to ensure we don't hit it. Closes LP: #838968.
 -- Steve Langasek <email address hidden> Fri, 02 Sep 2011 14:34:52 -0700

Changed in isc-dhcp (Ubuntu):
status: Triaged → Fix Released
Scott Moser (smoser)
description: updated
Revision history for this message
Christoph Martin (martin-uni-mainz) wrote :

I don't think this bug is really fixed. At least not in the versions which are in precise: isc-hdcp-client=4.1.ESV-R4-0ubuntu5.2, ifupdown=0.7~beta2ubuntu8.

The problem occurs, if a dhcp server is not availlable for some time. Then dhclient will send one DHCP-REQUEST on ifup or lease renewal and might never get an answer inside the timeout interval because this one DHCP-REQUEST package might get lost. Then dhclient will terminate and the interface will get never configured, thus making the host unavaillable from the network.

Revision history for this message
Stéphane Graber (stgraber) wrote :

That's another bug and yes the intended fix didn't work in 12.04.
There's a test isc-dhcp in precise-proposed with the fix, we're expecting this to land for 12.04.1.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers