ifup does not block for interface to be up with static addresses

Bug #1588915 reported by jwestfall
38
This bug affects 6 people
Affects Status Importance Assigned to Milestone
ifupdown (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

Description: Ubuntu 16.04 LTS (server)
Release: 16.04

systemd:
  Installed: 229-4ubuntu6
  Candidate: 229-4ubuntu6
  Version table:
 *** 229-4ubuntu6 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     229-4ubuntu4 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

This issue is more about systemd integration then an issue with systemd package itself. The general problem is the distinction between the network being configured and the network being up and what that means for services that depend on the network.

upstream systemd has network.target (network is configured) and network-online.target (network is up), however xenial server (dunno about desktop) doesn't seem to use network-online.target. The result of which is, as soon as the network is configured both network.target and network-online.target are considered complete. At which point services dependent on either of those will get started. However this can lead to problems if the services in question are actually dependent on the network (ie dns lookups) when they are starting up.

Here's an example of what I mean. If you install openntpd and add the -s option there is a race on boot if it will or wont be able to do the initial sync.

Jun 02 16:58:02 xenial64 systemd[1]: Reached target Network.
Jun 02 16:58:02 xenial64 systemd[1]: Starting OpenNTPd Network Time Protocol...
Jun 02 16:58:02 xenial64 systemd[1]: Reached target Network is Online.
Jun 02 16:58:02 xenial64 ntpd[920]: constraint certificate verification turned off
Jun 02 16:58:02 xenial64 ntpd[928]: ntp engine ready
Jun 02 16:58:02 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Link is down
Jun 02 16:58:05 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Link is up at 1000 Mbps, full duplex
Jun 02 16:58:05 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Flow control is on for TX and on for RX
Jun 02 16:58:05 xenial64 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
Jun 02 16:58:17 xenial64 ntpd[920]: no reply received in time, skipping initial time setting

openntpd was started 3 seconds before the interface even got link.

I believe this issue is only going to effect physical servers with static ips. vm's interfaces will instantly have link when they are brought up and dhcp, the interface will already be up or else they wouldn't have been able to get an ip.

thanks
jim

Revision history for this message
Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

openntpd.service defines "After=network.target". Based on your description, should it be "network-online.target" instead? Does that help, or would we need to coordinate that change with some other change we ship in Ubuntu?

tags: added: systemd-boot
Revision history for this message
Martin Pitt (pitti) wrote :

> upstream systemd has network.target (network is configured)

This is *not* what network.target means. This is mostly being used for a correct shutdown order, at bootup it has very little meaning. (see man systemd.special(7)).

> Jun 02 16:58:02 xenial64 systemd[1]: Reached target Network.
> Jun 02 16:58:02 xenial64 systemd[1]: Starting OpenNTPd Network Time Protocol...

This indeed sounds like openntp's service should have an After=network-online.target.

Why is that reported against openvpn? Did you mean openntpd here?

Revision history for this message
Michael Biebl (mbiebl) wrote :

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ has more information on this topic.

Ideally openntpd would handle networks coming and going more gracefully.
A dependency on network-online.target is only a band-aid.

Revision history for this message
jwestfall (jwestfall) wrote :

Hi

openntpd was just one example of the issue. I tried setting After=network-online.target before opening this bug, it has no effect and is why I mentioned "xenial server (dunno about desktop) doesn't seem to use network-online.target".

ubuntu@xenial64:~$ systemctl list-dependencies network-online.target
network-online.target
● └─networking.service

network-online.target will be complete as soon as networking.service (ifup) has finished configuring interfaces. You can see this in my log output.

Jun 02 16:58:02 xenial64 systemd[1]: Reached target Network.
Jun 02 16:58:02 xenial64 systemd[1]: Starting OpenNTPd Network Time Protocol...
Jun 02 16:58:02 xenial64 systemd[1]: Reached target Network is Online. <----- network-online.target is complete
...
Jun 02 16:58:02 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Link is down
Jun 02 16:58:05 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Link is up at 1000 Mbps, full duplex
Jun 02 16:58:05 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Flow control is on for TX and on for RX
Jun 02 16:58:05 xenial64 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready

Revision history for this message
Robie Basak (racb) wrote :

> Jun 02 16:58:02 xenial64 kernel: tg3 0000:02:00.0 enp2s0: Link is down

This seems odd. Is this a peculiarity of the tg3 driver or of your environment? I don't see why anything the system does would cause tg3 to drop the link.

Revision history for this message
Martin Pitt (pitti) wrote :

Not sure why your link is flaky, i. e. why the link goes down (and back up 3 seocnds later) after bringing up the interface. But that's nothing that individual userspace packages can control, I'm afraid.

Revision history for this message
jwestfall (jwestfall) wrote :

Hi

The logged output is typical of tg3 a physical interface. The first time you bring the interface up the kernel will log that the link is down, which it is.

boot server with no interfaces configured

root@xenial64:~# ifconfig
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

root@xenial64:~# ifconfig enp2s0 up

The kernel logs

[ 66.397413] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 66.649067] tg3 0000:02:00.0 enp2s0: Link is down

Then a few seconds later

[ 70.648878] tg3 0000:02:00.0 enp2s0: Link is up at 1000 Mbps, full duplex
[ 70.648884] tg3 0000:02:00.0 enp2s0: Flow control is on for TX and on for RX
[ 70.648901] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready

Here is another example of the issue on a box with e1000e nic.

Jun 07 10:57:50 ubuntu64a kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
Jun 07 10:57:50 ubuntu64a ifup[692]: /sbin/ifup: waiting for lock on /run/network/ifstate.eno1
Jun 07 10:57:50 ubuntu64a systemd[1]: Started Raise network interfaces.
Jun 07 10:57:50 ubuntu64a systemd[1]: Reached target Network.
Jun 07 10:57:50 ubuntu64a systemd[1]: Reached target Network is Online. <- network-online.target is complete
... (bunch of services that were dependent on network.target start)
Jun 07 10:57:52 ubuntu64a kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jun 07 10:57:52 ubuntu64a kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready

eno1 gets link 2 seconds after network-online.target was completed. This is the issue. network-online.target shouldn't be completing until all 'auto' interfaces have link or some timeout is hit.

Revision history for this message
Martin Pitt (pitti) wrote :

> network-online.target shouldn't be completing until all 'auto' interfaces have link or some timeout is hit.

That already is the case -- networking.service (which is ifupdown, aka "Started Raise network interfaces") is Before=network-online.target, and the log proves that this works.

You only pasted a small log excerpt, where e. g. dhclient and ifupdown messages aren't visible. But ifupdown clearly thinks that your enp2s0/eno1 links *are* up, assuming that you actually configure them in /etc/network/interfaces or interfaces.d/*. What are the corresponding entries there? If they are "auto", then networking.service ought to wait. If they are "allow-hotplug", they they won't wait. You wrote "auto" above, but let's make sure.

summary: - network dependent services starting before network is up
+ networking.service does not wait for interfaces to be up
affects: openvpn (Ubuntu) → ifupdown (Ubuntu)
Changed in ifupdown (Ubuntu):
status: New → Incomplete
Revision history for this message
jwestfall (jwestfall) wrote : Re: networking.service does not wait for interfaces to be up
Download full text (81.2 KiB)

As I mention in the beginning the issue is likely specific to a physical box configured with a static ip. If a box is configured for dhcp, networking.service cant complete the configuration of the interface until its gotten an dhcp address. The act of getting dhcp address would require the interface be configured up, got link, and packets sent/recv'd from the dhcp server. This will mean the interface already has link when networking.service has completed its configuration of the interface. This however is not the case for a static ip.

ubuntu@ubuntu64a:~$ ls -la /etc/network/interfaces.d/
total 8
drwxr-xr-x 2 root root 4096 Jan 24 11:08 .
drwxr-xr-x 7 root root 4096 Jun 7 10:51 ..

ubuntu@ubuntu64a:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet static
  address 10.0.0.28
  netmask 255.255.255.0
  gateway 10.0.0.1

ubuntu@ubuntu64a:~$ sudo journalctl -b --no-pager
-- Logs begin at Wed 2016-06-08 08:14:24 PDT, end at Wed 2016-06-08 08:22:04 PDT. --
Jun 08 08:14:24 ubuntu64a systemd-journald[396]: Runtime journal (/run/log/journal/) is 8.0M, max 79.1M, 71.1M free.
Jun 08 08:14:24 ubuntu64a kernel: Initializing cgroup subsys cpuset
Jun 08 08:14:24 ubuntu64a kernel: Initializing cgroup subsys cpu
Jun 08 08:14:24 ubuntu64a kernel: Initializing cgroup subsys cpuacct
Jun 08 08:14:24 ubuntu64a kernel: Linux version 4.4.0-22-generic (buildd@lgw01-41) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 (Ubuntu 4.4.0-22.40-generic 4.4.8)
Jun 08 08:14:24 ubuntu64a kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-22-generic root=UUID=9d883025-5129-4977-bc1b-627be950eb79 ro
Jun 08 08:14:24 ubuntu64a kernel: KERNEL supported cpus:
Jun 08 08:14:24 ubuntu64a kernel: Intel GenuineIntel
Jun 08 08:14:24 ubuntu64a kernel: AMD AuthenticAMD
Jun 08 08:14:24 ubuntu64a kernel: Centaur CentaurHauls
Jun 08 08:14:24 ubuntu64a kernel: x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Jun 08 08:14:24 ubuntu64a kernel: x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
Jun 08 08:14:24 ubuntu64a kernel: x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
Jun 08 08:14:24 ubuntu64a kernel: x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
Jun 08 08:14:24 ubuntu64a kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Jun 08 08:14:24 ubuntu64a kernel: x86/fpu: Using 'eager' FPU context switches.
Jun 08 08:14:24 ubuntu64a kernel: e820: BIOS-provided physical RAM map:
Jun 08 08:14:24 ubuntu64a kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
Jun 08 08:14:24 ubuntu64a kernel: BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
Jun 08 08:14:24 ubuntu64a kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Jun 08 08:14:24 ubuntu64a kernel: BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
Jun 08 08:14:24 ...

Revision history for this message
Martin Pitt (pitti) wrote :

Ah, I missed the fact that it has a static address, sorry. I haven't tried to reproduce yet, but this at least seems clear enough. Adjusting bug title accordingly

summary: - networking.service does not wait for interfaces to be up
+ ifup does not block for interface to be up with static addresses
Changed in ifupdown (Ubuntu):
status: Incomplete → New
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifupdown (Ubuntu):
status: New → Confirmed
Revision history for this message
Maciej Puzio (maciej-puzio) wrote :

I am also affected by this issue. In my case both network.target and network-online.target are reached too early, which results in sssd and autofs to start before network interfaces are up; autofs then fails to obtain mount maps and does not attempt to retry; it is necessary to restart autofs manually after network has started fully.

Similarly as the bug reporter, I run Ubuntu 16.04 server, and I use static addresses only, configured in /etc/network/interfaces.

The root cause is ifup exiting with success before network interfaces start operating. However, ifup behaved the same way in Ubuntu 14.04, and there it did not cause the problem. Apparently Upstart and its network-related units were able to deal with this issue correctly. In contrast, systemd configuration files packaged in ifupdown Xenial package are causing the "async" behavior of ifup to surface and create problems.

During system boot there are two attempts to start physical network interfaces by ifup: in ifup@.service (one service per interface) and in networking.service. All these services start in parallel and call ifup in parallel, but ifup uses a lock to prevent its concurrent execution (per interface), effectively serializing ifup@.service(s) and networking.service. In my tests, ifup@ services always win this race. This actually works fine if all interfaces are configured successfully. If not, networking.service performs a second attempt after ifup@.service failed, which may be problematic (see below).

Targets network.target and network-online.target are reached when networking.service finishes. This happens after interfaces are configured by ifup. Interfaces became operable significantly later.

The problem may be worked around in a particular configuration in several ways:

1. We can make ifup wait for the interface to come up, for example by including a loop in post-up command in /etc/network/interfaces, checking /sys/class/net/$INTERFACE/operstate. This would require using some timeout, after which the command would fail, to deal with situations such as a disconnected cable. In such a case, the delay would be twice as long as the specified timeout, because that timeout will have to be reached in both ifup@.service and networking service. If we do not configure any virtual interfaces, we can deal with this by disabling networking.service.

2. We can modify ifup@.service to wait for the interface to come up, similarly as above. It will also be necessary to change the type of this service from "simple" to "oneshot", to make it wait for its ExecStart command before exiting.

3. We can also modify networking.service to wait in a similar manner.

4. We can introduce an additional service which will wait for all or some interfaces to come up, and will delay network.target and network-online.target until it is done. It would be advantageous to wait on all interfaces in parallel, and I am not sure if that would be easy to implement in one service.

5. An active waiting in a loop is obviously not the most beautiful solution, so perhaps a cleaner way could be found?

Revision history for this message
Iskren Ivov Chernev (iskren-chernev) wrote :

I'm also affected by this issue.

So network-online.target and network.target are a good catch all that should work, but also there should be a way to wait on a particular (especially statically configured) interface to be up. I would think <email address hidden> would finish only after XXX is up, but that is not the case.

Revision history for this message
lakmilis (lakmilis) wrote :

I would chime in a bit on this too. 14.04 _did_ have such issues. In fact, I had a physical box with static IP's (2 nics), if a cable was unpligged, it would hang forever and also become inconsistent. I had these discussions a year ago or so. Anyway, I had ot abandon ubuntu server (along with some nft issues at the time) and run archlinux as my xen host.

But I ran zfs as my xen backend, so on arch, I sometimes* could have issues ~ once a year where it best be I had physical access to the box to be able to fix any potential issues.

I am now going to move abroad and so I thought , with archlinux unpriv. lxc containers out, to leave xen (as I don't have the expected network output from the servers) to lxc. With arch and benig abroad, I think ubuntu 16.04 now with systemd would have solved this (my server with arch on xen has always gracefully sorted it out: ~1.5-2 years).

Seeing this bug and seein git has to do with ifup makes me think it's partly a remnance from the older logic from sysv/upstart.

This probably doesn't help much butjust saying the ifup is a bit archaic now that systemd is in use and is not evebn present in archlinux and creates seemingly less problems.

I have only a brief moment before leaving so seeing this now worries me.

Revision history for this message
vishal yadav (vishalcdac07) wrote :
Download full text (3.6 KiB)

Facing the same issue but with dhcp and on ubuntu 16.04 VM.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

Just after initial launch of VM, it's not accessible. /run/network/ifstate files for interfaces are locked by ifup which only gets released and dhcp probe begins when networking service is restarted. After this VM becomes accessible.

$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto ens3
iface ens3 inet dhcp

auto ens4
iface ens4 inet dhcp

auto ens5
iface ens5 inet dhcp

After initial launch (VM not accessible)
-----------------------------------------
$ grep -nri "ifup" /var/log/syslog
20691:Oct 26 03:10:43 IansUbunturequest systemd[1]: Started ifup for ens5.
20692:Oct 26 03:10:43 IansUbunturequest systemd[1]: Started ifup for ens3.
20693:Oct 26 03:10:43 IansUbunturequest systemd[1]: Started ifup for ens4.
20990:Oct 26 03:10:43 IansUbunturequest ifup[1076]: /sbin/ifup: waiting for lock on /run/network/ifstate.ens
3
21001:Oct 26 03:10:43 IansUbunturequest ifup[1076]: /sbin/ifup: waiting for lock on /run/network/ifstate.ens
4
21002:Oct 26 03:10:43 IansUbunturequest ifup[1076]: /sbin/ifup: waiting for lock on /run/network/ifstate.ens
5
21004:Oct 26 03:10:43 IansUbunturequest systemd[1]: Started Raise network interfaces.
21005:Oct 26 03:10:43 IansUbunturequest systemd[1]: Reached target Network.
21015:Oct 26 03:10:43 IansUbunturequest systemd[1]: Reached target Network is Online.

After network restart (VM becomes accessible)
---------------------------------------------
$ less /var/log/syslog
Oct 26 03:34:34 IansUbunturequest systemd[1]: Stopping Raise network interfaces...
Oct 26 03:34:34 IansUbunturequest dhclient[2125]: Killed old client process
Oct 26 03:34:34 IansUbunturequest ifdown[2104]: Killed old client process
Oct 26 03:34:36 IansUbunturequest dhclient[2166]: DHCPRELEASE on ens3 to 10.64.247.13 port 67 (xid=0x6a1fe99f)
Oct 26 03:34:36 IansUbunturequest ifdown[2104]: DHCPRELEASE on ens3 to 10.64.247.13 port 67 (xid=0x6a1fe99f)
Oct 26 03:34:37 IansUbunturequest systemd[1]: Stopped Raise network interfaces.
Oct 26 03:34:37 IansUbunturequest systemd[1]: Starting Raise network interfaces...
Oct 26 03:34:37 IansUbunturequest ifup[2249]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 3 (xid=0x9fd17b39)
Oct 26 03:34:37 IansUbunturequest dhclient[2263]: Sending on LPF/ens3/fa:16:3e:5e:13:fb
Oct 26 03:34:37 IansUbunturequest dhclient[2263]: Sending on Socket/fallback
Oct 26 03:34:37 IansUbunturequest ifup[2249]: DHCPREQUEST of 10.64.247.24 on ens3 to 255.255.255.255 port 67 (xid=0x397bd19f)
Oct 26 03:34:37 IansUbunturequest ifup[2249]: DHCPOFFER of 10.64.247.24 from 10.64.247.13
Oct 26 03:34:37 IansUbunturequest dhclient[2263]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 3 (xid=0x9fd17b39)
Oct 26 03:34:37 IansUbunturequest dhclient[2263]: DHCPREQUEST of ...

Read more...

tags: added: network-online-ordering
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.