TrustyTestNetwork boots with cloud-init-nonet timeout

Bug #1524452 reported by Scott Moser
This bug affects 1 person
Affects             Status     Importance  Assigned to  Milestone
ifupdown (Ubuntu)   Confirmed  Low         Unassigned

Bug Description

$ grep -ri "gave up waiting" --include "*-serial.log" output.success1/
output.success1/TrustyTestNetwork/logs/boot-serial.log:cloud-init-nonet[133.62]: gave up waiting for a network device.

This reproduces with:
./tools/jenkins-runner -vv tests/vmtests/test_network.py:TrustyTestNetwork

Basically, something is preventing networking from coming up automatically there.

Scott Moser (smoser)
Changed in curtin:
status: New → Confirmed
importance: Undecided → Medium
Ryan Harper (raharper) wrote :

In curtin/examples/test/basic_network.yaml
You can see we define an eth2 that is present, but is not configured.

This results in the following eni:

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
    gateway 10.0.2.1
    address 10.0.2.100/24
    mtu 1492

auto eth1:1
iface eth1:1 inet static
    dns-nameservers 8.8.8.8
    dns-search barley.maas
    address 10.0.2.200/24
    mtu 1492

auto eth2
iface eth2 inet manual

The auto line then causes the ifwait service to wait on eth2, but the interface will never come up.

What is the expected behavior for curtin when we see a 'type: physical' entry in the config?

        - type: physical
          name: eth2
          mac_address: "52:54:00:12:34:04"

It feels like we should emit the udev rules to ensure we do the iface to mac_address mapping but, in the absence of
a subnet, skip including it in the eni.

Will need to see if this affects things like bonds/bridges which expect an iface entry for the device.

For test-case speed, we can certainly apply a subnet configuration here, which will ensure the iface comes up and saves the 120 seconds of execution time, but we will need to see whether we should skip emitting the auto XXX if no subnet is attached.

Ryan Harper (raharper) wrote :

Lovely, so for bonding and bridging, the underlying interface should have an auto $iface and iface xxx config, to ensure that any attributes for the underlying interface are applied prior to bringing up bond0 or br0.

However, for an interface that doesn't have a network (nor is part of any virtual network device) there is no need for the interface to be brought up...
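
The rule discussed in the two comments above could be sketched roughly as follows. This is only an illustration in shell, not curtin's actual code (curtin is written in Python); the function names needs_auto and render_stanza and their arguments are hypothetical. It assumes the udev rules for the MAC mapping are emitted separately.

```shell
#!/bin/sh
# Hypothetical sketch: emit an "auto"/"iface" stanza only when the interface
# has a subnet or is a member of a bond or bridge.

needs_auto() {
    # $1 = number of subnets configured on the interface
    # $2 = "yes" if the interface belongs to a bond or bridge, else "no"
    local subnets="$1" member="$2"
    [ "$subnets" -gt 0 ] && return 0
    [ "$member" = "yes" ] && return 0
    return 1
}

render_stanza() {
    # $1 = interface name, $2 = subnet count, $3 = bond/bridge member, $4 = method
    local name="$1" subnets="$2" member="$3" method="$4"
    if needs_auto "$subnets" "$member"; then
        printf 'auto %s\niface %s inet %s\n' "$name" "$name" "$method"
    fi
}

render_stanza eth0 1 no dhcp      # configured: stanza is printed
render_stanza eth2 0 no manual    # no subnet, not a member: nothing is printed
render_stanza eth3 0 yes manual   # bond/bridge member: stanza is printed
```

With a rule like this, the unconfigured eth2 from basic_network.yaml would not get an auto line, so nothing would wait on it at boot.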

Ryan Harper (raharper) wrote :

Further debugging traces this down to the eth1:1 part of the eni.

It's not yet clear why the eth1:1 stanza holds up cloud-init.

Ryan Harper (raharper) wrote :

Oi, this feels like it's been discussed before, but since there is no kernel object for eth1:1, udev won't emit a net-interface-add event, which is what /etc/init/network-interface.conf handles (it calls ifup on the device).

/etc/init/networking.conf is responsible for the others and calls ifup -a (to hit the virtual interfaces). However, /etc/network/if-up.d/upstart includes a query of which interfaces should already be up (i.e., udev should have brought them up).

This script includes the following:

all_interfaces_up() {
        # return true if all interfaces listed in /etc/network/interfaces as 'auto'
        # are up. if no interfaces are found there, then "all [given] were up"
        local prefix="$1" iface=""
        for iface in $(get_auto_interfaces); do
                # if cur interface does is not up, then all have not been brought up
                [ -f "${prefix}${iface}" ] || return 1
        done
        return 0
}

and calls this:

get_auto_interfaces() {
        # write to stdout a list of interfaces configured as 'auto' in interfaces(5)
        local found=""
        # stderr redirected as it outputs things like:
        # Ignoring unknown interface eth0=eth0.
        found=$(ifquery --list --allow auto 2>/dev/null) || return
        set -- ${found}
        echo "$@"
}

ifquery --list --allow auto returns:
eth0
eth1
eth1:1
lo

However, /run/network/ifup.eth1:1 is not yet written since we're currently in the middle of an ifup for eth1:1.
It doesn't (to me) make sense to expect udev to bring up the virtual interfaces, thus the following filter applied to ifquery
resolves the blocking wait...

found=$(ifquery --list --allow auto 2>/dev/null | grep -v :) || return

With eth1:1 dropped from that list, it all boots fast and fine. /run/network has the correct files, and eth1 and eth1:1 are up and assigned correctly.
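
For anyone who wants to see the effect without rebooting a VM, here is a small simulation of the marker-file check. It is a sketch under assumptions: ifquery is replaced by a stub that prints the list shown above, a temporary directory stands in for /run/network, and the names get_auto_interfaces_stub and all_up are made up for this example.

```shell
#!/bin/sh
# Simulation only: no real interfaces or ifupdown state are touched.

get_auto_interfaces_stub() {
    # stands in for: ifquery --list --allow auto
    printf '%s\n' eth0 eth1 eth1:1 lo
}

all_up() {
    # $1 = marker-file prefix, remaining args = interfaces to check
    local prefix="$1" iface=""
    shift
    for iface in "$@"; do
        [ -f "${prefix}${iface}" ] || return 1
    done
    return 0
}

rundir=$(mktemp -d)
# Markers that exist at this point; ifup.eth1:1 has not been written yet
touch "$rundir/ifup.eth0" "$rundir/ifup.eth1" "$rundir/ifup.lo"

if all_up "$rundir/ifup." $(get_auto_interfaces_stub); then
    unfiltered=up
else
    unfiltered=waiting
fi

if all_up "$rundir/ifup." $(get_auto_interfaces_stub | grep -v :); then
    filtered=up
else
    filtered=waiting
fi

echo "unfiltered=$unfiltered filtered=$filtered"
rm -rf "$rundir"
```

This prints unfiltered=waiting filtered=up: the unfiltered list keeps the check failing on the missing eth1:1 marker, while the filtered list passes.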

Ryan Harper (raharper) wrote :

Possibly related to ifupdown since:

$ dpkg -S /etc/network/if-up.d/upstart
ifupdown: /etc/network/if-up.d/upstart

affects: curtin → ifupdown (Ubuntu)
Ryan Harper (raharper) wrote :

If anyone wants to reproduce:

1. uvt-simplestreams-libvirt sync release=trusty arch=amd64
2. uvt-kvm create --memory 1024 --cpu 2 --disk 10 t1 release=trusty
3. uvt-kvm wait --insecure t1
4. uvt-kvm ssh --insecure t1
# inside t1 vm
5. edit /etc/network/interfaces to look like:
auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
    gateway 10.0.2.1
    address 10.0.2.100/24
    mtu 1492

auto eth1:1
iface eth1:1 inet static
    dns-nameservers 8.8.8.8
    address 10.0.2.200/24
    mtu 1492

6. sudo init 0 # shutdown vm

# back on host
7. virsh edit t1

Add another interface by injecting into the <devices> section

    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
    </interface>

8. virsh start --console t1

You'll see the 120-second block during boot.

Stéphane Graber (stgraber) wrote :

So unless ifup/ifdown brings the aliases up/down itself, I don't think filtering those out is the right fix.
Take this scenario as an example:
 - System boots quickly
 - networking.conf kicks in, fails to bring up eth0 because it hasn't shown up yet (happens reasonably often with complex blade systems)
 - network-wait kicks in
 - eth0 does show up
 - interface is brought up but not the eth0:X ones

The interface labeling thing isn't as cool as it used to be, and most people just add multiple IPs on the same interface nowadays. It used to be that ifconfig and other tools were confused by that, but they're not anymore.

Anyway, I guess people are still using that feature and so it's probably worth fixing. The fix I'd suggest is to do something similar to the bridge-utils and vlan udev hooks, that is, have a udev hook which triggers on new interfaces, have the hook use ifquery to check for downstream interfaces (INTERFACE:X) and then bring those up.
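
A rough sketch of what such a hook might look like, for discussion only. It is not taken from the bridge-utils or vlan hooks; the function names are hypothetical, and it assumes udev passes the new device name in the INTERFACE environment variable.

```shell
#!/bin/sh
# Hypothetical udev-triggered hook: when a new interface appears, bring up
# any "auto" aliases (INTERFACE:X) that are configured on top of it.

list_auto_interfaces() {
    ifquery --list --allow auto 2>/dev/null
}

aliases_of() {
    # Reads interface names on stdin, prints those of the form "<parent>:<label>"
    local parent="$1" iface=""
    while read -r iface; do
        case "$iface" in
            "$parent":*) echo "$iface" ;;
        esac
    done
}

bring_up_aliases() {
    local alias=""
    list_auto_interfaces | aliases_of "$1" | while read -r alias; do
        ifup "$alias"
    done
}

# Only act when invoked with a device name (assumed to be set by udev)
if [ -n "${INTERFACE:-}" ]; then
    bring_up_aliases "$INTERFACE"
fi
```

The matching is on the full parent name followed by a colon, so a hook run for eth1 would pick up eth1:1 but not eth10:1.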

Ryan Harper (raharper) wrote :

It's definitely not the fix we want. Here's what's happening (after discussing with smoser).
Each of the physical devices is brought up via ifup, and everything waits on semaphore files in /run/network/, looking for ifup.eth0, ifup.eth1 and ifup.eth1:1.

The first two files are written when the /etc/init/network-interface.conf job is run, triggered by a udev device add event (and spawned via the upstart-udev-bridge).

The alias interface won't come up until the /etc/init/networking.conf job is run, which is blocked until local-filesystem is emitted. That signal isn't emitted until the semaphore (/etc/network/if-up.d/upstart, which writes and emits the signal) is released. The semaphore won't be released until all interfaces are up (eth1:1 is the only one not up yet), but of course it *won't* be up until ifup -a is called inside networking.conf.

This is the blocker/race.

I'd like to play with some way of allowing /etc/init/networking.conf to run after local-filesystem, but not have the static-network-up signal be emitted until ifup -a has completed.

I'll go take a look at bridge-utils and vlan.

Ryan Harper (raharper) wrote :

After picking this apart: this is a known issue and does not break anything.

Changed in ifupdown (Ubuntu):
importance: Medium → Low