(on trusty) version 1.9-3ubuntu10.4 regression blocking boot completion

Bug #1701023 reported by Sven Mueller on 2017-06-28
This bug affects 4 people
Affects             Status        Importance  Assigned to    Milestone
ifupdown (Debian)   Fix Released  Unknown
ifupdown (Ubuntu)                 Medium      Dan Streetman
  Trusty                          Medium      Dan Streetman
  Xenial                          Medium      Dan Streetman
  Artful                          Medium      Dan Streetman
  Bionic                          Medium      Dan Streetman
vlan (Debian)       New           Unknown
vlan (Ubuntu)                     High        Dan Streetman
  Trusty                          High        Dan Streetman
  Xenial                          High        Dan Streetman
  Artful                          High        Dan Streetman
  Bionic                          High        Dan Streetman

Bug Description

[impact]

in bug 1573272, the vlan pkg was changed to perform a full ifup inside its if-pre-up.d/vlan script. This allowed correct ordering of ifup for a vlan and its raw-device, as previously there was a race condition between them (see that bug for details).

However, this causes hangs during ifup with certain specific configs. The reasons are given starting in comment 13.

The result is a regression for those using the specific ifupdown configs; when they try to reboot and/or ifup -a, it will hang trying to bring up their network, preventing boot from finishing (or hanging before the network is fully configured).

[test case]

upgrade to the latest vlan package and configure the system with an affected ifupdown config, then reboot. The reboot will hang while trying to bring the network up.

see the original description below for an example ifupdown config to reproduce this, although there are other possible configs that will/may trigger this regression.

[regression potential]

The fix for this moves the creation of the vlan(s) corresponding to a physical raw-device 'hotplug' event out of the udev processing path for the raw-device, and into an ifup post script for the raw-device ifup. If this is not done correctly, then any interfaces that are hotplugged, and have vlans configured on them, may fail to correctly create/configure their vlan(s).

This change does remove the direct call to ifup from the if-pre-up.d (or if-up.d) scripts, so there should not be any regression potential for more ifup deadlocks.

[other info]

this required both ifupdown and vlan to be patched. vlan was patched to remove the problematic call to ifup from the vlan pre-up script, and add a call to create the vlan interface(s) from a new post-up script, as well as adding a parameter to vlan-network-interface script to handle the call from udev itself differently than a call from elsewhere (such as the if-up.d/vlan script). this works for bootup and ifup/ifup -a, but fails for device hotplug because of a bug in ifupdown that prevents calling ifquery from an ifup script; that has been patched upstream already, and is the only ifupdown change needed here.

[original description]

When upgrading from version 1.9-3ubuntu10.1, a previously working machine can't successfully reboot completely.

ifup is hanging indefinitely, with this process structure (from "pstree -a 1299"):

ifup,1299 -a
  └─run-parts,1501 /etc/network/if-pre-up.d
      └─bridge,1502 /etc/network/if-pre-up.d/bridge
          └─bridge,1508 /etc/network/if-pre-up.d/bridge
              └─vlan,1511 /etc/network/if-pre-up.d/vlan
                  └─ifup,1532 eth0

<begin content of /etc/network/interfaces>
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
  address 192.168.10.65
  netmask 255.255.255.192
  gateway 192.168.10.66

auto eth0.11
  address 192.168.11.1
  netmask 255.255.255.0

auto br1134
iface br1134 inet manual
  bridge_ports eth0.1134
  bridge_stp off
  bridge_fd 0
<end content of /etc/network/interfaces>

The underlying interface eth0.1134 is not explicitly defined, but was previously auto-created during "ifup -a" execution. This apparently fails now.
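
For illustration, this implicit creation relies on the interface name itself encoding the raw device and the VLAN ID. Below is a minimal sketch of that name parsing, assuming only the dotted `rawdev.vid` form. The function name `split_vlan_name` is hypothetical, and the real if-pre-up.d/vlan script is not reproduced here:

```sh
#!/bin/sh
# Hypothetical helper: split a dotted VLAN interface name such as
# "eth0.1134" into its raw device and VLAN ID.
# Prints "<raw-device> <vlan-id>" and returns 0 on success; returns 1 when
# the name is not of the form <raw-device>.<number>.
split_vlan_name() {
    name=$1
    case $name in
        *.*) ;;
        *) return 1 ;;
    esac
    raw=${name%.*}    # everything before the last dot
    vid=${name##*.}   # everything after the last dot
    case $vid in
        ''|*[!0-9]*) return 1 ;;   # the VLAN ID must be all digits
    esac
    [ -n "$raw" ] || return 1
    printf '%s %s\n' "$raw" "$vid"
}

split_vlan_name eth0.1134   # prints "eth0 1134"
split_vlan_name br1134 || echo "br1134 is not a dotted VLAN name"
```

With the config above, only the bridge port eth0.1134 matches this pattern, which is why nothing else in the file needs to mention it.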

Reverting back to the 10.1 version re-establishes old behavior.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in vlan (Ubuntu):
status: New → Confirmed
Sven Mueller (smu-u) wrote :

Note: This behavior is in direct violation of the Debian documentation: https://wiki.debian.org/NetworkConfiguration#Bringing_up_an_interface_without_an_IP_address
It specifies:
Note: If you create the VLAN interface only to put it into a bridge, there is no need to define the VLAN interface manually. Just configure the bridge, and the VLAN interface will be created automatically when creating the bridge (see below).

Sven Mueller (smu-u) wrote :

Can we get someone to fix this policy violation, please?

Zach Heilman (organman91) wrote :

This diff resulted in the boot breaking again: http://launchpadlibrarian.net/324174495/vlan_1.9-3ubuntu10.1_1.9-3ubuntu10.4.diff.gz - reverting the change fixed the issue.

Zach Heilman (organman91) wrote :

Apologies, I had my versions screwed up. Disregard my previous comment.

Joshua Powers (powersj) on 2018-01-22
Changed in vlan (Ubuntu):
importance: Undecided → High
Changed in vlan (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → High
Andreas Hasenack (ahasenack) wrote :

The changelog since 10.1 mentions #1573272 three times, looks like that was a tough nut to crack. I'm subscribing ddstreet for consideration.

Dan Streetman (ddstreet) wrote :

Well, I can't say I'm too surprised that something broke; the ifupdown<->vlan<->udev love triangle of creating and configuring vlans is a bit of a mess. I'll take a look at this as soon as I have a chance.

Dan Streetman (ddstreet) wrote :

I reproduced this hang on trusty and I'm investigating.

Changed in vlan (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
status: Confirmed → In Progress
Dan Streetman (ddstreet) on 2018-04-11
tags: added: regression-update sts
Dan Streetman (ddstreet) wrote :

@smu-u can you please test with the vlan pkg from this ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1701023

it fixes this specific issue in my test, but I'd like to make sure it works for you also.

I want to do some more testing with the change as well because ifupdown is so fragile in this area. This may also affect newer releases too, so I need to test those as well.

Changed in vlan (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet) wrote :

It appears this ifupdown locking issue doesn't happen in Xenial, but I'll do more testing to confirm that.

Dan Streetman (ddstreet) wrote :

marking this as affecting T/X/A/B, although I need to specifically check each.

Changed in vlan (Ubuntu Xenial):
status: New → In Progress
Changed in vlan (Ubuntu Artful):
status: New → In Progress
assignee: nobody → Dan Streetman (ddstreet)
Changed in vlan (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → High
Changed in vlan (Ubuntu Artful):
importance: Undecided → High
Dan Streetman (ddstreet) wrote :

from bug 1759573, looks like this does affect X as well as T.

Dan Streetman (ddstreet) wrote :

This bug is caused in two different ways, both not directly from ifupdown or vlan, but instead because of 'helpful' actions in the ifenslave and bridge-utils if-pre-up scripts.

First, as reported in this bug, the bridge-utils package adds an if-pre-up.d/bridge ifupdown script, and in that script it does this:

bridge_parse_ports $INTERFACES | while read i
do
  for port in $i
  do
    # We attach and configure each port of the bridge
    if [ "$MODE" = "start" ] && [ ! -d /sys/class/net/$IFACE/brif/$port ]; then
      if [ -x /etc/network/if-pre-up.d/vlan ]; then
        env IFACE=$port /etc/network/if-pre-up.d/vlan
      fi

As pointed out in comment 2, this is done because debian decided ifupdown should handle the case of a bridge configuration with port(s) using vlan(s) that are not configured/created anywhere else in the ifupdown configuration. So, bridge-utils must assume it needs to create any new vlan(s) for its port(s).

Unfortunately, now that the if-pre-up.d/vlan tries to ifup its raw-device interface (due to bug 1573272) this can lead to a hang, as shown in this bug's description.

Second, as reported in bug 1759573, the ifenslave package adds an if-pre-up.d/ifenslave script, and that script does this:

    # Trigger the udev bridging hook to bridge the bond if needed
    if [ -x /lib/udev/bridge-network-interface ]; then
        INTERFACE=$BOND_MASTER /lib/udev/bridge-network-interface
    fi

    # Trigger the udev bridging hook to tag the bond if needed
    if [ -x /lib/udev/vlan-network-interface ]; then
        INTERFACE=$BOND_MASTER /lib/udev/vlan-network-interface
    fi

This is done for the reasons commented there. Unfortunately, BOTH the /lib/udev/bridge-network-interface as well as the /lib/udev/vlan-network-interface call back to the if-pre-up.d/vlan script, which then tries to ifup the vlan's raw-device, again leading to a hang.
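
Both call chains share the same shape: a hook run by one ifup ends up starting a second ifup while the first is still in progress. Here is a rough sketch of why that can hang, assuming the hang comes from ifupdown serializing runs with a lock. The `mkdir`-based lock and the names `try_lock`, `unlock`, and `nested_ifup` are stand-ins for illustration, not ifupdown's real mechanism:

```sh
#!/bin/sh
# Toy model of a nested ifup. The outer "ifup -a" holds a lock for the whole
# run; a hook it executes then starts a second "ifup" needing the same lock.
LOCKDIR=$(mktemp -d)

try_lock() { mkdir "$LOCKDIR/$1.lock" 2>/dev/null; }   # atomic: fails if held
unlock()   { rmdir "$LOCKDIR/$1.lock"; }

nested_ifup() {
    # A blocking lock would wait here forever, because the holder (our own
    # ancestor) cannot finish until we return. A try-lock lets us report it.
    if try_lock ifstate; then
        echo "nested ifup $1: got lock"
        unlock ifstate
    else
        echo "nested ifup $1: lock held by parent, would block"
        return 1
    fi
}

try_lock ifstate                        # outer "ifup -a" takes the lock
result=$(nested_ifup eth0) || true      # hook -> if-pre-up.d/vlan -> ifup eth0
echo "$result"
unlock ifstate
rmdir "$LOCKDIR"
```

The point of the sketch is only that the inner call cannot make progress until the outer one finishes, and the outer one is waiting on the inner one.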

Dan Streetman (ddstreet) wrote :

The reason the if-pre-up.d/vlan script was changed to call ifup for its raw-device, was for bug 1573272, where the problem was a race condition between the VLAN and its raw-device; if the VLAN interface was processed through if-pre-up.d/vlan before the raw-device was fully configured, then in some situations (such as if the raw-device is a bond device) the raw-device is brought down during its ifup processing (usually in order to set some device params that can't be set while the interface is up). When the raw-device is taken down, the VLAN also goes down, and any routing or other associated configuration is lost, such as a default gateway. When the raw-device comes back up, the VLAN comes up as well, but the (e.g.) default gateway is not restored, since the VLAN interface does not go through ifup again.

The reason there is a possibility for a race between a raw-device and its vlan(s) is because of the /lib/udev/vlan-network-interface script. That script is run for every interface that udev detects, and the script checks each 'auto' device that ifquery lists as having configuration. Any device that lists its 'vlan-raw-device' as matching the physical interface that udev is currently processing triggers the script to call /etc/network/if-pre-up.d/vlan to create the vlan interface. After udev is finished processing the vlan's raw-device, it passes it to systemd (or upstart) which then calls ifup for the interface.
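
The matching step described above can be sketched as follows. This is a simplified stand-in that scans an interfaces-style file directly instead of calling ifquery as the real script does. The function name `vlans_for_raw_device` and the sample config are assumptions for illustration:

```sh
#!/bin/sh
# Print every "auto" interface whose vlan-raw-device equals the given device.
# $1 = raw device reported by udev, $2 = interfaces-style config file.
vlans_for_raw_device() {
    awk -v raw="$1" '
        $1 == "auto"  { for (i = 2; i <= NF; i++) is_auto[$i] = 1 }
        $1 == "iface" { cur = $2 }
        $1 == "vlan-raw-device" && $2 == raw { found[++n] = cur }
        END { for (i = 1; i <= n; i++) if (is_auto[found[i]]) print found[i] }
    ' "$2"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
auto bond0
iface bond0 inet manual

auto vlan100
iface vlan100 inet static
  address 192.0.2.1
  netmask 255.255.255.0
  vlan-raw-device bond0

iface vlan200 inet manual
  vlan-raw-device bond0
EOF

vlans_for_raw_device bond0 "$conf"   # prints "vlan100"; vlan200 is not "auto"
rm -f "$conf"
```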

The script's creation of the vlan interface is also, in parallel, detected by udev, which passes it to systemd/upstart also, and ifup is called for it.

Since the raw-device and the vlan(s) are ifup'ed in parallel, they race with each other. In most situations, since the raw-device has a 'headstart' in coming up, it will complete before any of the vlans, but if the raw-device's configuration steps delay it, then the vlan(s) may finish their ifup before it, possibly causing the above-mentioned problem.

The addition of the 'ifup' call to if-pre-up.d/vlan script 'fixed' this by forcing an ifup of the vlan's raw-device during (before) the creation of the vlan interface. That works when if-pre-up.d/vlan is being called from the /lib/udev/vlan-network-interface script, from the udev processing thread; however when the if-pre-up.d/vlan script is called from elsewhere, such as during a different device's ifup, it can hang ifupdown, as shown in this bug.

Dan Streetman (ddstreet) wrote :

The reason that udev calls /lib/udev/vlan-network-interface is because of hotplugging.

During boot up, as network interfaces are detected by the kernel, they are reported to udev, and udev passes each to vlan-network-interface. If the vlan-network-interface script detects that a new interface is configured in ifupdown as a 'vlan-raw-device', then it calls /etc/network/if-pre-up.d/vlan which creates the vlan interface.

Udev also, after all other processing, calls systemd (or upstart) with the interface, and that in turn calls ifup for the interface. Thus, during boot up, all physical interfaces are ifup'ed directly, and all vlans are created and also ifup'ed manually. After all that completes, then systemd/upstart finally calls ifup -a to bring up any remaining configured interfaces.

The bootup process does not require vlan interfaces to be created and configured, as the final 'ifup -a' will create and configure them. However, after bootup, any physical devices that are hotplugged will be detected by udev, and passed to systemd/upstart which calls ifup for the hotplugged interface - but, without udev calling vlan-network-interface, the hotplugging of a physical interface will not cause any of its associated vlan interfaces to be created or configured.

So what this means is vlan interfaces do need to be created when their raw-device interface is detected and/or configured; however, that does not need to happen (and should not happen) during udev processing of the raw-device interface. That is what causes the race condition between the raw device interface and its vlan(s), as described in bug 1573272.

Instead, a physical device's vlan interfaces should be created during ifup of that physical device. That way, the vlan interface's raw-device is guaranteed to be already up and configured, but also vlan interfaces will be created and configured when a physical device is hotplugged.
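
In rough outline, the idea is a post-up step for the raw device that creates its vlans once the device itself is configured. Here is a dry-run sketch, assuming the list of vlan names has already been looked up. `create_vlans_post_up` is a hypothetical name, and the function only prints `ip link` commands rather than running them:

```sh
#!/bin/sh
# Dry run of a raw-device post-up step: for each dotted VLAN name that sits
# on top of the raw device, print the command that would create it.
# $1 = raw device that just finished ifup, remaining args = VLAN names.
create_vlans_post_up() {
    iface=$1
    shift
    for vlan in "$@"; do
        case $vlan in
            "$iface".*)
                vid=${vlan##*.}
                echo "ip link add link $iface name $vlan type vlan id $vid"
                ;;
            *)
                echo "skip $vlan: not on $iface" >&2
                ;;
        esac
    done
}

create_vlans_post_up eth0 eth0.11 eth0.1134 eth1.5
# stdout:
#   ip link add link eth0 name eth0.11 type vlan id 11
#   ip link add link eth0 name eth0.1134 type vlan id 1134
# stderr:
#   skip eth1.5: not on eth0
```

Because this runs after the raw device's own ifup, there is no need to call ifup from inside the hook.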

I'll investigate updating ifupdown and/or the vlan scripts to change to this method instead of the current design.

Dan Streetman (ddstreet) wrote :

Anyone experiencing this bug, can you please test with the vlan package from this test ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1701023

I still need to do more testing with it as well.

Dan Streetman (ddstreet) wrote :

There are 3 test cases here to verify this:

1) the original bug 1573272. Test case is in that case's description. Note this test case needs to be (using trusty version numbers):
  a) FAIL with vlan 1.9-3ubuntu10.1
  b) PASS with vlan 1.9-3ubuntu10.5 (current version)
  c) PASS with the proposed fix for this bug

2) bug 1759573, which is the same regression as this bug, using bonds. Test case is in bug 1759573 comment 8.

3) this bug, which is the regression using bridges. Test case is in the description.

Dan Streetman (ddstreet) wrote :

Also, for each of those test cases, this fix needs to be tested for:

a) normal bootup
b) ifup -a
c) physical device hotplug (e.g. insmod driver, or hotplug removable nic)

all cases should bring up and configure the associated vlans correctly.

Dan Streetman (ddstreet) wrote :

Test case #1 (bug 1573272), part (a) and (b) only:

I'm unable to reproduce test case #1 failure on trusty, because upstart is used instead of systemd. I'm not sure if that means the race condition doesn't affect trusty, or if I'm just not able to properly introduce an upstart delay to bond0 to easily reproduce the problem. I did verify that the vlan pkg from my test ppa does not cause any regression for testcase 1(a) or 1(b) on trusty.

On Xenial, I verified vlan pkg 1.9-3.2ubuntu1.16.04.1 did fail, 1.9-3.2ubuntu1.16.04.4 did not fail, and my test pkg did not fail. I tested reboot (test 1a) and ifup -a (test 1b).

On Artful, I verified with 1.9-3.2ubuntu2 (failed as expected), 1.9-3.2ubuntu5 (worked as expected), and my ppa version (worked as expected). Again tested with reboot and ifup -a.

On Bionic, the vlan pkgs are the same as Artful; the same pkg versions behaved the same as Artful, as expected.

For test 1(c), hotplugging the bond slave interfaces - which I used module rmmod/modprobe to test - my change introduces failure for all releases. This is because I move the call to /lib/udev/vlan-network-interface into the if-up.d post-up scripts, and the ifquery call inside vlan-network-interface fails during that time because it claims it is a "recursive" call:

Apr 20 17:34:53 lp1701023-b sh[2588]: ifquery: recursion detected for interface bond0 in post-up phase
Apr 20 17:34:53 lp1701023-b sh[2588]: ifquery: recursion detected for parent interface bond0 in post-up phase

ifupdown seems broken in this manner; ifquery --list should not involve any "recursion" or locking, as it is simply reading the config files and listing all configured interfaces. It should not need to lock any interfaces, as it's not actually changing anything. So, I plan to continue with my design change and fix ifupdown as well.

Dan Streetman (ddstreet) wrote :

For testcase 2, from bug 1759573, I can't reproduce the reported hang from that bug.

@tom-verdaat, can you provide simpler/clearer reproduction steps, or can you test with the vlan pkg from this ppa to verify it fixes the problem you're seeing:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1701023

Without a way for me to reproduce that reported problem, I'll skip that testcase.

Dan Streetman (ddstreet) wrote :

For testcase 3, from this bug, I'm able to reproduce it on trusty using vlan 1.9-3ubuntu10.5, and using the pkg from my ppa it is fixed.

I'm not able to reproduce testcase 3 in xenial or later, so there is nothing to verify re: testcase 3 (this bug's reported error) in X/A/B. However, my redesign should still go into those, as calling ifup from the if-pre-up.d/vlan script is not good - there are too many places that call that script (not just a direct call from ifupdown, unfortunately), and calling ifup from there will cause more problems later. Creating the vlan interface (if needed) in the raw-device post-up is the correct time to do it.

I'm going to open a bug for the ifupdown ifquery 'recursive' error described in comment 19, create a patch for that, and then start SRUing this change.

Dan Streetman (ddstreet) wrote :

@smu-u, please do let me know if you are seeing this issue in X/A/B. Otherwise, I'll assume you only see this issue in trusty, which matches what I found.

Dan Streetman (ddstreet) wrote :

Also re: testcase 3, I could only reproduce it on reboot and ifup -a; it did not fail when I tried hotplugging the interface (modprobing its driver).

Dan Streetman (ddstreet) on 2018-04-20
Changed in ifupdown (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Changed in ifupdown (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Changed in ifupdown (Ubuntu Artful):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Changed in ifupdown (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Dan Streetman (ddstreet) wrote :

Test pkgs for both ifupdown and vlan available in this ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1701023

Changed in ifupdown (Debian):
status: Unknown → New
Changed in vlan (Debian):
status: Unknown → New
Tom Verdaat (tom-verdaat) wrote :

@ddstreet tested with the latest version of ifupdown and vlan from your PPA with my 4 test scenarios and can confirm it works as expected. Interfaces come up correctly both when doing an "ifup -a" and during boot.

One small thing I've noticed is a variation in the number of "Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config" messages in the ifup output depending on the scenario, even though the number of vlans is the same in all 4 tests. With 2 vlans, the 2 scenarios with bonding generate 1 message, the ones without bonding generate 4. It doesn't hurt, just something I noticed. Please check the attached log for more details on my 4 tests.

Dan Streetman (ddstreet) wrote :

> @ddstreet tested with the latest version of ifupdown and vlan from your PPA with
> my 4 test scenarios and can confirm it works as expected. Interfaces come up
> correctly both when doing an "ifup -a" and during boot.

Excellent, thanks.

> One small thing I've noticed is a variation in the number of "Set name-type for
> VLAN subsystem. Should be visible in /proc/net/vlan/config" messages in the
> ifup output depending on the different scenario, even though the number of
> vlans is the same in all 4 tests

That's because in your config tests with vlans on bonds, your call to ifup -a doesn't actually process the bond or its vlans. Your ifup -a starts processing one of the bond slaves, and that causes the bond interface to be created. The bond creation is detected by udev, and udev tells systemd, and systemd then invokes ifup directly on the bond interface. By the time your ifup -a has gotten to trying to bring up the bond (or its vlans), they are already up because of udev detection which invokes the 'hotplug' ifup that logs to systemd logs, not your stdout.

Changed in ifupdown (Debian):
status: New → Fix Released
Dan Streetman (ddstreet) wrote :

Debian's included a variation of my patch to ifupdown, so I'll update my ppa to use that patch instead and do some testing, then start sru'ing both ifupdown and vlan.

Tom Verdaat (tom-verdaat) wrote :

We've been doing a lot more testing and debugging and I'd like to share our findings:

1) Unfortunately it turns out this change does not fix the issue of interfaces not coming up correctly for a bond with a (static) network configuration. The race condition seems to be removed so at least there are no more hangs between bonds and their vlan children. All the interfaces also say they are UP both when running ifup and after reboot. However:
- Running "ifup <slavename>" does bring up the bond (and its vlans) in a working state.
- Running "ifup -a" or rebooting doesn't actually work, causing "network not available" errors and "Destination Host Unreachable" when pinging other machines. Executing "ifdown -a; ifup -a" shows that ifupdown tries to bring up the bond BEFORE the slaves instead of the other way around. Even though after the 60s timeout the bond and its slaves say they are UP, they don't actually function.
- We're not seeing any issues with bonds that do not have a network configuration of their own

2) The networking script stack / concept seems fundamentally flawed in three areas:

2.A) bonds relying on slaves having "bond-master" and being started by bringing up the slaves, but not supporting the master having "bond-slaves" and being able to start a bond by just bringing up the bond directly.

2.B) bringing a specific interface up automatically brings up its child vlans. This does not make a lot of sense. The other way around does - e.g. in order to bring up a vlan we need to bring up its raw device - but why would the ifupdown scripts assume that I want to bring up all of its vlans when I bring up an interface that (also) serves as a raw device? In that case I would probably run "ifup -a"!

2.C) a vlan running on top of a bond cannot be brought up directly due to /sys/class/net/<bondname>/ not existing. This results in the following:
> # ifup bo-adm.2
> Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
> cat: /sys/class/net/bo-adm/mtu: No such file or directory
> Device "bo-adm" does not exist.
> bo-adm does not exist, unable to create bo-adm.2
> run-parts: /etc/network/if-pre-up.d/vlan exited with return code 1
> Failed to bring up bo-adm.2.

3) Our new workaround for boot has become this very intrusive systemd service:
> [Unit]
> Wants=network-online.target
> After=network-online.target
>
> [Install]
> WantedBy=multi-user.target
>
> [Service]
> Type=oneshot
> ExecStartPre=/sbin/ifdown bo-adm
> ExecStart=/sbin/ifup enp0s3
> ExecStart=/sbin/ifup enp0s10
> ExecStop=/sbin/ifdown bo-adm
> RemainAfterExit=yes
> TimeoutStartSec=5min

Dan Streetman (ddstreet) wrote :

> Unfortunately it turns out this change does not fix the issue of interfaces not
> coming up correctly for a bond with a (static) network configuration

Please keep in mind this bug is *specifically* about the regression caused by the change from bug 1573272, as described in this bug description. My test criteria for the patches for this bug are listed in comment 17 and comment 18. This bug is *not* about any issue that has never worked. If your configuration worked using (assuming Xenial) vlan version 1.9-3.2ubuntu1.16.04.1, but does not work now, then please let me know the details; however if your config doesn't work with (assuming Xenial) vlan version 1.9-3.2ubuntu1.16.04.1 then it's unrelated to this bug and you should open a new bug (or un-dup bug 1759573, if appropriate).

> Executing "ifdown -a; ifup -a" shows that ifupdown tries to bring up the bond BEFORE the slaves

what's your EXACT e/n/i configuration? single file or multiple files? ifup -a brings them up in the order that they are listed in e/n/i

> bonds relying on slaves having "bond-master"...

this is nothing new and has nothing to do with this bug. If this is an issue for you, please open a new bug. I agree with you in principle that this should be better, but that's no guarantee it will actually get fixed.

> bringing a specific interface up automatically brings up its child vlans...

this also is nothing new. this is how things have worked for a long time, and it has nothing to do with this bug. If this is a problem for you, please open a new bug and discuss there. Note I do not think this will change without some significant justification (provided in a new bug, not here please) for why it's a problem.

> a vlan running on top of a bond cannot be brought up directly...

also...nothing new. unrelated to this bug. please open a new bug if this is a problem for you. Also, doubt this will change without specific justification (not provided here, please) for why this is a problem.

I can understand your frustration with the delicate nature of ifupdown; its configuration is more 'delicate' than most users would like, and calling ifup directly for specific interfaces doesn't always work the way you would like, for more complicated configurations. However, it's been like this, and for the most part you just have to learn its limitations and live with them. Opening bugs for ifupdown limitations is fine, but some things just won't be changed.

In the future, netplan and/or systemd-networkd will take over interface configuration, and I think you may find them much more reliable and robust, although maybe more complicated to configure and/or less "flexible" than ifupdown in some ways.

Dan Streetman (ddstreet) wrote :

I updated the ifupdown pkg builds in my test ppa using the upstream (debian) patch, which is a variation of my previous patch. Anyone affected by this regression please test with the latest ifupdown and vlan pkgs from my ppa.

Tom Verdaat (tom-verdaat) wrote :

Hi @ddstreet. Completely understand your need to limit the scope of this bug. Just shared our findings, but feel free to ignore the stuff in #28 under item 2. We did a lot of extensive testing over the weekend with the latest version of your PPA package and here are our main findings:

1) We migrated from separate files in /etc/network/interfaces.d to just declaring everything in the single /etc/network/interfaces file. This overcomes a lot of issues with regards to bringing interfaces up in the proper order and "ifup -a" now works perfectly again. Some lessons learned for future reference: (a) to have bonds come up correctly you absolutely have to define slaves before the bond master and the primary slave before secondary slaves in the configuration file, and (b) to have a vlan come up correctly define its raw device before the vlan device.
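
These ordering lessons could be checked mechanically. Here is a small sketch, assuming a single interfaces-style file and looking only at the `bond-master` and `vlan-raw-device` options. `check_iface_order` is a hypothetical helper, not part of ifupdown, and it does not check primary versus secondary slaves:

```sh
#!/bin/sh
# Report stanzas that appear in the wrong order in an interfaces-style file.
# Exit status is 1 if any problem is found, 0 otherwise.
check_iface_order() {
    awk '
        $1 == "iface" { cur = $2; pos[cur] = ++n }
        # A VLAN needs its raw device defined earlier in the file.
        $1 == "vlan-raw-device" { earlier[++d] = $2; later[d] = cur }
        # A bond slave needs to be defined before its bond master.
        $1 == "bond-master" { earlier[++d] = cur; later[d] = $2 }
        END {
            bad = 0
            for (i = 1; i <= d; i++) {
                a = earlier[i]; b = later[i]
                # Only flag pairs where both stanzas exist but are reversed.
                if ((a in pos) && (b in pos) && pos[a] > pos[b]) {
                    print a " should be defined before " b
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
iface bond0 inet manual
iface eth1 inet manual
  bond-master bond0
iface bond0.10 inet manual
  vlan-raw-device bond0
EOF
check_iface_order "$conf" || echo "ordering problems found"
# prints:
#   eth1 should be defined before bond0
#   ordering problems found
rm -f "$conf"
```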

2) Even though "ifup -a" now works again, bringing bonds up correctly at boot does not. Pretty sure this has to do with the raw interfaces being detected by the kernel and brought up by systemd in a different order at boot. As said under (1) the order really really matters. Bringing up a secondary slave before the primary slave seems to break the bond (looks like due to using the wrong MAC address) and it looks like this is what sometimes happens at boot. Our workaround mentioned in #28 under (3) mitigates this, but it's not very elegant at all.

3) There is a problem when running a bond on top of vlans. Running ifup with verbose enabled shows run-parts executing the scripts in (what seems like) alphabetical order, but to enslave a vlan interface, /etc/network/if-pre-up.d/vlan should be executed before /etc/network/if-pre-up.d/ifenslave. For now we added "pre-up export IFACE=<name> IF_VLAN_RAW_DEVICE=<raw device name>; /etc/network/if-pre-up.d/vlan" to all vlan slaves as a workaround, but it would be better to fix this in the ifupdown package itself.
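
The ordering can be previewed without touching the system. run-parts executes scripts in lexical order under the C/POSIX locale, so sorting the hook names the same way shows the sequence. The three names below are the if-pre-up.d hooks mentioned in this bug, and `hook_order` is just an illustrative helper:

```sh
#!/bin/sh
# Print hook names in the order run-parts would execute them
# (lexical sort, C/POSIX collation).
hook_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

hook_order vlan ifenslave bridge
# prints:
#   bridge
#   ifenslave
#   vlan
```

So ifenslave runs before vlan, which matches the behavior described above.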

Dan Streetman (ddstreet) wrote :

> We migrated from separate files in /etc/network/interfaces.d to just declaring
> everything in the single /etc/network/interfaces file. This overcomes a lot of
> issues with regards to bringing interfaces up in the proper order and "ifup -a"
> now works perfectly again

great.

> (a) to have bonds come up correctly you absolutely have to define slaves before the
> bond master and the primary slave before secondary slaves in the configuration file,
> and (b) to have a vlan come up correctly define its raw device before the vlan device.

yep, I'm pretty sure this is how ifupdown has always behaved. However, please do feel free to open a bug, as it should work better and not require specific ordering like this (I just can't promise it will get fixed soon). Also, if it ever worked in the past that you know of, it's more likely to be fixed, so do note that if you open a new bug.

> bringing bonds up correctly at boot does not

it should - can you please downgrade to the vlan version 1.9-3.2ubuntu1.16.04.1 (assuming you're on Xenial), and if it still fails there, please open a new bug. If it doesn't fail there, then let me know in this bug, as it's a regression from my changes.

You can download the older vlan package from here:
https://launchpad.net/ubuntu/+source/vlan/1.9-3.2ubuntu1.16.04.1

click on the arch you're using under 'Builds' to get to the arch-specific page with binary deb links, e.g. for amd64:
http://launchpadlibrarian.net/289303685/vlan_1.9-3.2ubuntu1.16.04.1_amd64.deb

> There is a problem when running a bond on top of vlans

also please test this with above-mentioned vlan package version, and open a new bug if it fails there as well, or post here if it works with that older vlan version.

Tom Verdaat (tom-verdaat) wrote :

Neither issue is fixed by the downgrade. As said, neither seems to have to do with vlan but with ifupdown.

Dan Streetman (ddstreet) wrote :

> Neither issue is fixed by the downgrade

ok, thanks, so those are ifupdown issues unrelated to this vlan regression - please feel free to open bug(s) for them. I'll start an sru for this.

Łukasz Zemczak (sil2100) wrote :

@ddstreet - could we get the SRU template information included in this bug?

Dan Streetman (ddstreet) wrote :

> could we get the SRU template information included in this bug?

doh. sorry, been awfully busy lately. added.

description: updated

Hello Sven, or anyone else affected,

Accepted vlan into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3.2ubuntu6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in vlan (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Changed in ifupdown (Ubuntu Bionic):
status: In Progress → Fix Committed
Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted ifupdown into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.8.17ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Seeing that the previous change caused regressions, I would feel better if a wider call-for-testing could be done on these packages in bionic-proposed, performing tests on various different configurations.

Dan Streetman (ddstreet) wrote :

> I would feel better if a wider call-for-testing could be done on these packages
> in bionic-proposed

yes i agree; the mailing list that comes to mind is ubuntu-server. do you have any other ML suggestions?

Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted vlan into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3.2ubuntu5.17.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in vlan (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed-artful
Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted ifupdown into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.8.16ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in vlan (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted vlan into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3.2ubuntu1.16.04.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Xenial):
status: In Progress → Fix Committed
Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted ifupdown into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.8.10ubuntu1.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in vlan (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed-trusty
Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted vlan into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3ubuntu10.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Hello Sven, or anyone else affected,

Accepted ifupdown into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.7.47.2ubuntu4.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Trusty):
status: In Progress → Fix Committed
Dan Streetman (ddstreet) wrote :

I sent an email to the ubuntu-server list requesting anyone with a vlan ifupdown config to give these -proposed pkgs a test.

I'll also perform verification for them when I have a chance.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3.2ubuntu6

---------------
vlan (1.9-3.2ubuntu6) bionic; urgency=medium

  * Revert change for lp1573272; instead fix by redesigning when vlan
    interfaces are created; after raw-device ifup, not during raw-device
    udev processing. (LP: #1701023)

 -- Dan Streetman <email address hidden> Thu, 19 Apr 2018 18:10:17 -0400

Changed in vlan (Ubuntu):
status: In Progress → Fix Released
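
As a rough illustration of the redesign described in the changelog above (vlans are created after the raw-device's ifup, not during its udev processing), a hook run after ifup could pick out the configured interfaces that sit on top of the interface that just came up. This is only a sketch, not the script shipped in the vlan package. The function name vlans_on_raw_device and the assumption that vlans are named <raw-device>.<numeric id> are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a raw device ($1) and a list of candidate
# interface names, print those named "<raw>.<numeric vlan id>".
vlans_on_raw_device() {
    raw=$1
    shift
    for iface in "$@"; do
        case "$iface" in
            "$raw".*[!0-9]*) ;;           # suffix contains a non-digit: skip
            "$raw".?*) echo "$iface" ;;   # e.g. eth0.10 on top of eth0
        esac
    done
}

# ifupdown exports $IFACE to scripts in /etc/network/if-up.d; here the raw
# device and the candidate list are hard-coded for illustration only.
vlans_on_raw_device eth0 eth0.10 eth1.20 eth0.11 bond0
```

A real hook would then bring up each printed interface once the raw device is configured, so the raw-device ifup never has to wait on a nested ifup.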
Dan Streetman (ddstreet) wrote :

tested bionic per comment 19 with:

root@lp1701023-b:~# dpkg -l | grep ifupdown
ii ifupdown 0.8.17ubuntu1 amd64 high level tools to configure network interfaces
root@lp1701023-b:~# dpkg -l | grep vlan
ii vlan 1.9-3.2ubuntu5 amd64 user mode programs to enable VLANs on your ethernet devices

and with:

root@lp1701023-b:~# dpkg -l | grep ifupdown
ii ifupdown 0.8.17ubuntu1.1 amd64 high level tools to configure network interfaces
root@lp1701023-b:~# dpkg -l | grep vlan
ii vlan 1.9-3.2ubuntu6 amd64 user mode programs to enable VLANs on your ethernet devices

all tests from comment 19 do work with both package versions.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Dan Streetman (ddstreet) wrote :

tested on artful with:

root@lp1701023-a:~# dpkg -l | grep ifupdown
ii ifupdown 0.8.16ubuntu2.1 amd64 high level tools to configure network interfaces
root@lp1701023-a:~# dpkg -l | grep vlan
ii vlan 1.9-3.2ubuntu5.17.10.1 amd64 user mode programs to enable VLANs on your ethernet devices

all tests from comment 19 work. as the previous versions also all worked, no need to test them again.

tags: added: verification-done-artful
removed: verification-needed-artful
Dan Streetman (ddstreet) wrote :

xenial:

root@lp1701023-x:/etc/network/interfaces.d# dpkg -l |grep ifupdown
ii ifupdown 0.8.10ubuntu1.4 amd64 high level tools to configure network interfaces
root@lp1701023-x:/etc/network/interfaces.d# dpkg -l |grep vlan
ii vlan 1.9-3.2ubuntu1.16.04.5 amd64 user mode programs to enable VLANs on your ethernet devices

all comment 19 tests work.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Dan Streetman (ddstreet) wrote :

trusty:

root@lp1701023:~# dpkg -l | grep ifupdown
ii ifupdown 0.7.47.2ubuntu4.4 amd64 high level tools to configure network interfaces
root@lp1701023:~# dpkg -l | grep vlan
ii vlan 1.9-3ubuntu10.5 amd64 user mode programs to enable VLANs on your ethernet devices
root@lp1701023:~#

I verified comment 19 tests all passed (ifup -a, reboot, hotplug) as expected for these pkg versions.

I also verified that the e/n/i config from this bug's description does cause the hang failure as expected (tested with ifup -a).

root@lp1701023:/etc/network/interfaces.d# dpkg -l | grep ifupdown
ii ifupdown 0.7.47.2ubuntu4.5 amd64 high level tools to configure network interfaces
root@lp1701023:/etc/network/interfaces.d# dpkg -l | grep vlan
ii vlan 1.9-3ubuntu10.6 amd64 user mode programs to enable VLANs on your ethernet devices

With the updated -proposed pkgs, I verified it passes all comment 19 tests (ifup -a, reboot, hotplug).

I also verified that the e/n/i config from this bug's description does not hang, and properly brings up all configured interfaces with the right config (for ifup -a, reboot, and hotplug).

tags: added: verification-done verification-done-trusty
removed: verification-needed verification-needed-trusty
Łukasz Zemczak (sil2100) wrote :

I think this aged long enough in -proposed - releasing.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3.2ubuntu6

---------------
vlan (1.9-3.2ubuntu6) bionic; urgency=medium

  * Revert change for lp1573272; instead fix by redesigning when vlan
    interfaces are created; after raw-device ifup, not during raw-device
    udev processing. (LP: #1701023)

 -- Dan Streetman <email address hidden> Thu, 19 Apr 2018 18:10:17 -0400

Changed in vlan (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for vlan has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.8.17ubuntu1.1

---------------
ifupdown (0.8.17ubuntu1.1) bionic; urgency=medium

  * We are not even reading the contents of the per-interface state files
    when running ifquery, so there is no need to lock them. Not locking
    will allow ifquery to be called recursively from ifup and ifdown.
    (LP: #1701023)

 -- Dan Streetman <email address hidden> Fri, 27 Apr 2018 08:00:07 -0400

Changed in ifupdown (Ubuntu Bionic):
status: Fix Committed → Fix Released
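
The changelog above says that dropping the lock lets ifquery be called recursively from ifup and ifdown. The sketch below shows why a nested call that takes a lock its parent already holds cannot make progress. It uses flock(1) from util-linux and a temporary file purely as stand-ins, which is an assumption for illustration; it does not reproduce ifupdown's own state-file locking code.

```shell
#!/bin/sh
# Stand-in for a per-interface state file (assumption, not ifupdown's path).
lockfile=$(mktemp)

result=$(
  (
    # "Parent" (think: ifup) takes an exclusive lock via file descriptor 9.
    flock -x 9
    # "Child" (think: a nested ifquery) opens the same file separately and
    # tries a non-blocking exclusive lock, so the conflict is reported
    # instead of hanging forever.
    if flock -n -x "$lockfile" -c true; then
      echo "child got lock"
    else
      echo "child would block"
    fi
  ) 9>"$lockfile"
)

echo "$result"
rm -f "$lockfile"
```

Without the -n flag the child would wait for a lock that the parent only releases after the child exits, which is the shape of hang this bug describes.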
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3.2ubuntu5.17.10.1

---------------
vlan (1.9-3.2ubuntu5.17.10.1) artful; urgency=medium

  * Revert change for lp1573272; instead fix by redesigning when vlan
    interfaces are created; after raw-device ifup, not during raw-device
    udev processing. (LP: #1701023)

 -- Dan Streetman <email address hidden> Thu, 19 Apr 2018 18:10:17 -0400

Changed in vlan (Ubuntu Artful):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.8.16ubuntu2.1

---------------
ifupdown (0.8.16ubuntu2.1) artful; urgency=medium

  * We are not even reading the contents of the per-interface state files
    when running ifquery, so there is no need to lock them. Not locking
    will allow ifquery to be called recursively from ifup and ifdown.
    (LP: #1701023)

 -- Dan Streetman <email address hidden> Fri, 27 Apr 2018 08:53:27 -0400

Changed in ifupdown (Ubuntu Artful):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3.2ubuntu1.16.04.5

---------------
vlan (1.9-3.2ubuntu1.16.04.5) xenial; urgency=medium

  * Revert change for lp1573272; instead fix by redesigning when vlan
    interfaces are created; after raw-device ifup, not during raw-device
    udev processing. (LP: #1701023)

 -- Dan Streetman <email address hidden> Thu, 19 Apr 2018 18:10:17 -0400

Changed in vlan (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.8.10ubuntu1.4

---------------
ifupdown (0.8.10ubuntu1.4) xenial; urgency=medium

  * We are not even reading the contents of the per-interface state files
    when running ifquery, so there is no need to lock them. Not locking
    will allow ifquery to be called recursively from ifup and ifdown.
    (LP: #1701023)

 -- Dan Streetman <email address hidden> Fri, 27 Apr 2018 08:56:12 -0400

Changed in ifupdown (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3ubuntu10.6

---------------
vlan (1.9-3ubuntu10.6) trusty; urgency=medium

  * Revert change for lp1573272; instead fix by redesigning when vlan
    interfaces are created; after raw-device ifup, not during raw-device
    udev processing. (LP: #1701023)

 -- Dan Streetman <email address hidden> Thu, 19 Apr 2018 18:10:17 -0400

Changed in vlan (Ubuntu Trusty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.47.2ubuntu4.5

---------------
ifupdown (0.7.47.2ubuntu4.5) trusty; urgency=medium

  * We are not even reading the contents of the per-interface state files
    when running ifquery, so there is no need to lock them. Not locking
    will allow ifquery to be called recursively from ifup and ifdown.
    (LP: #1701023)

 -- Dan Streetman <email address hidden> Fri, 27 Apr 2018 09:03:18 -0400

Changed in ifupdown (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in ifupdown (Ubuntu):
status: In Progress → Fix Released