mtu not always set properly on bond/vlan interface

Bug #1224007 reported by Chris J Arges on 2013-09-11
This bug affects 13 people
Affects          Status  Importance  Assigned to  Milestone
vlan (Debian)    New     Unknown
vlan (Ubuntu)            Medium      Unassigned
  Precise                Medium      Unassigned
  Trusty                 Medium      Unassigned
  Xenial                 Medium      Unassigned

Bug Description

* Description

When configuring a network with bonding+vlan and setting the MTU,
occasionally the MTU doesn't get set properly on the vlan interface.

In addition, whenever there is a failure, /var/log/upstart/networking.log contains the following message:
SIOCSIFMTU: Numerical result out of range

I've tested the latest ifupdown package (0.7.44) and the problem still exists.
Both multi-CPU and single-CPU configurations exhibit the issue.

* Versions
This affects the latest ifupdown and Ubuntu Precise/Quantal/Raring/Saucy (p/q/r/s).

* Test Case

# Create a p/q/r/s server vm with two network interfaces
# This is reproducible on real hardware as well

# Install the following
sudo apt-get install vlan ifenslave-2.6 bridge-utils
sudo modprobe -a bonding 8021q

# Edit the interfaces file
/etc/network/interfaces:

auto bond0
iface bond0 inet manual
  bond-mode 802.3ad
  bond-miimon 100
  bond-lacp-rate 1
  bond-slaves eth0 eth1
  post-up ifconfig bond0 mtu 9000

auto eth0
iface eth0 inet manual
  bond-master bond0
  post-up ifconfig eth0 mtu 9000

auto eth1
iface eth1 inet manual
  bond-master bond0
  post-up ifconfig eth1 mtu 9000

auto bond0.123
iface bond0.123 inet static
  address 192.168.122.68
  netmask 255.255.255.0
  gateway 192.168.122.1
  post-up ifconfig bond0.123 mtu 9000

# edit rc.local (or another startup script) so we reboot until we hit the error
/etc/rc.local:

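#!/bin/sh
# Stop the reboot loop as soon as any device lacks the 9000-byte MTU
DEVS="eth0 eth1 bond0 bond0.123"
for d in $DEVS; do
        mtu=$(cat "/sys/class/net/$d/mtu")
        if [ "$mtu" != 9000 ]; then
                echo "FAIL: $d has MTU $mtu"
                exit 1
        fi
done

# All MTUs are correct: reboot and try again
reboot
exit 0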

# Now reboot the machine; within 10 minutes or so you should be at the login prompt.
# If you run ifconfig | grep MTU you will see that some of the interfaces did not get
# the proper MTU, and the test failed.
# Essentially we want to ensure that all MTUs (except lo) were set to 9000.
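The rc.local check above lists its devices by hand. A sketch that instead applies the "all MTUs except lo" rule literally could look like the following. It is not part of the original test case: the function name check_mtus and the SYSFS_NET override (handy for trying the logic against a scratch directory) are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report every interface (except lo) whose MTU differs
# from the expected value. SYSFS_NET defaults to the real sysfs location.
check_mtus() {
    expected=$1
    root=${SYSFS_NET:-/sys/class/net}
    status=0
    for path in "$root"/*; do
        dev=$(basename "$path")
        [ "$dev" = lo ] && continue
        # Skip entries without an mtu file, e.g. the bonding_masters file
        # that appears in /sys/class/net once the bonding module is loaded.
        [ -r "$path/mtu" ] || continue
        mtu=$(cat "$path/mtu")
        if [ "$mtu" -ne "$expected" ]; then
            echo "FAIL: $dev has MTU $mtu, expected $expected"
            status=1
        fi
    done
    return $status
}
```

In rc.local, a line such as check_mtus 9000 || exit 1 would take the place of the hand-written loop.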

* Regression Potential

- The biggest regression potential is an unknown corner case between MTU and VLANs that could cause the interface to fail to come up (because this fix changes the order in which the devices are configured).
- Some users might not want the raw device's MTU increased even though the VLAN's MTU should be.

* Workaround

Change the bond0.123 post-up command to:
  post-up sleep 2 && ifconfig bond0.123 mtu 9000

Now, when rebooting, all the interfaces are brought up with the proper MTU.

Changed in ifupdown (Debian):
status: Unknown → New
Chris J Arges (arges) wrote :

In addition, I've been able to set the MTU on the tagged bond using the following:

auto bond0.123
iface bond0.123 inet static
  address 192.168.122.68
  netmask 255.255.255.0
  gateway 192.168.122.1
  mtu 9000

As the documentation mentions (for both ipv4/ipv6):
   The static Method
       Options
              mtu size
                     MTU size

I think this would be the preferred way of setting it; however, if I use this method I get
"Waiting on network configuration" messages, which delay the boot by almost 2 minutes.

Chris J Arges (arges) wrote :

The "Waiting on network configuration" message is most likely caused by bug 1065077.
With the documented method, I think that bug is the real problem, so I'll mark this one as a duplicate.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifupdown (Ubuntu):
status: New → Confirmed
Tomasz Głuch (tomekg) wrote :

This is definitely not a duplicate of bug #1065077. It is amazing that an LACP+VLAN+jumbo-frames configuration hasn't worked since 2013 on two consecutive LTS server editions, so I investigated the problem.
It was probably introduced when Upstart started to manage interfaces.

A race condition occurs during the ifup phase, because configuring a descendant interface (especially setting its MTU) depends on the parent interface having been configured successfully.

In this particular case, bond0.X's MTU cannot be set to a value greater than bond0's MTU, so ifup for bond0.X fails with
"RTNETLINK answers: Numerical result out of range" and exit status 2. As a result, the interface is left in a half-configured state.

The direct cause of the problem is a lack of synchronization between the networking tasks started for the main interface and for its subinterfaces.
It is especially an issue for bonding with LACP, because LACP negotiation takes some time.
The MTU is set by the ifupdown binary itself, and on bond0 this happens with some delay (1-2 s in my case).
Unfortunately, before that finishes, the task for bond0.X is also fired and fails immediately.
I measured the timings: I needed to wait about 0.6 s in bond0.X's pre-up for bond0's MTU to be set properly.

I attached a poor man's synchronization script, which solves the problem by sleeping until the parent interface has the correct MTU.

I'm unsure whether it's possible to enforce the correct order in Upstart; maybe an Upstart master reading this could confirm?

It is very likely that a similar problem occurs in VLAN+Bridge+MTU configurations.
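The attached synchronization script is not reproduced in this report. As a hedged illustration of the same idea only (poll the parent's MTU instead of sleeping a fixed 2 seconds as in the workaround), a helper could look like the sketch below. The name wait_for_parent_mtu, the SYSFS_NET override and the 0.1-second polling interval are assumptions; fractional sleep arguments work with GNU coreutils but are not required by POSIX.

```sh
#!/bin/sh
# Hypothetical sketch (not the attached script): wait until the parent
# interface's MTU is at least the wanted value, giving up after MAX_TRIES
# polls spaced about 0.1 s apart.
# Usage: wait_for_parent_mtu PARENT WANTED_MTU MAX_TRIES
wait_for_parent_mtu() {
    parent=$1 wanted=$2 tries=$3
    file=${SYSFS_NET:-/sys/class/net}/$parent/mtu
    while [ "$tries" -gt 0 ]; do
        if [ -r "$file" ] && [ "$(cat "$file")" -ge "$wanted" ]; then
            return 0
        fi
        sleep 0.1   # GNU sleep accepts fractions; a strict POSIX sleep may not
        tries=$((tries - 1))
    done
    echo "timed out waiting for $parent MTU >= $wanted" >&2
    return 1
}
```

To be used from /etc/network/interfaces it would have to be saved as a standalone script (ending with a call such as wait_for_parent_mtu bond0 9000 50, about 5 seconds) and referenced from a pre-up line in the bond0.123 stanza; that wiring is likewise an assumption.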

Changed in ifupdown (Debian):
status: New → Fix Released
Dan Streetman (ddstreet) wrote :

I should note that this is marked as Fix Released in Debian, but from the Debian bug comments it looks like Chris simply asked for that bug to be closed as a configuration issue (which this is not).

This is actually only a VLAN issue with higher-than-default mtu. The bond does not have anything to do with the problem. The problem is the vlan package (required for ifupdown vlan support) installs a udev rule that triggers for each new interface; it checks the ifupdown configuration, and creates any corresponding vlan interfaces. The real interface and all its vlan interfaces then race to finish udev processing and notify upstart, which then calls ifup directly on each interface. If upstart calls ifup for the vlan before the actual interface (and the vlan has a high mtu), it will fail while being brought up.

I think this can be fixed in a similar way to bug 1609367, where the problem is higher-than-default mtu on ipv6 (i.e. inet6 section); although in an easier way, since ifupdown already requires the 'vlan' package for vlan support, and that package provides an if-pre-up script already.

If we edit the /etc/network/if-pre-up.d/vlan script (from the vlan package) as below (this is for Trusty; the patches for Xenial and Yakkety are similar), it will increase the raw device's MTU if needed.

--- vlan.orig 2016-09-08 02:12:55.901172000 +0000
+++ vlan 2016-09-08 02:10:49.213172000 +0000
@@ -51,7 +51,14 @@
         echo "$IF_VLAN_RAW_DEVICE does not exist, unable to create $IFACE"
         exit 1
     fi
-    ip link set up dev $IF_VLAN_RAW_DEVICE
+    if [ -n "$IF_MTU" ]; then
+        CUR_DEV_MTU=`cat /sys/class/net/$IF_VLAN_RAW_DEVICE/mtu`
+        # increase the vlan raw device mtu if needed
+        if [ -n "$CUR_DEV_MTU" ] && [ $CUR_DEV_MTU -lt $IF_MTU ]; then
+            MTU_PARAM="mtu $IF_MTU"
+        fi
+    fi
+    ip link set up dev $IF_VLAN_RAW_DEVICE $MTU_PARAM
     vconfig add $IF_VLAN_RAW_DEVICE $VLANID
 fi

Dan Streetman (ddstreet) wrote :

Please ignore the patch from the above comment; I have a simpler one that I'll attach.

The attachment "patch to yakkety debian/network/if-pre-up.d/vlan script" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Changed in ifupdown (Debian):
status: Fix Released → New
tags: added: rls-y-incoming
Dan Streetman (ddstreet) wrote :

Sorry, the first patch had an issue: it didn't raise the device MTU if the VLAN already existed. v2 checks the device MTU and increases it if needed, even if the VLAN already exists.
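The v2 debdiff is attached to the bug rather than shown here. As a dry-run sketch of the behaviour described in this comment only, the key change is that the MTU comparison no longer lives inside the branch that creates the VLAN. The function vlan_pre_up and its argument order are hypothetical, and it prints the commands it would run instead of running them.

```sh
#!/bin/sh
# Hypothetical dry-run sketch of the v2 ordering; prints commands only.
# Usage: vlan_pre_up RAW_DEV VLAN_ID WANTED_MTU CURRENT_RAW_MTU VLAN_EXISTS(yes|no)
vlan_pre_up() {
    raw=$1 vlanid=$2 want=$3 have=$4 exists=$5
    # v2: compare MTUs on every run, even when the VLAN already exists
    if [ -n "$want" ] && [ -n "$have" ] && [ "$have" -lt "$want" ]; then
        echo "ip link set dev $raw mtu $want"
    fi
    # Creating the VLAN is still conditional, as in the original script
    if [ "$exists" != yes ]; then
        echo "ip link set up dev $raw"
        echo "vconfig add $raw $vlanid"
    fi
}
```

For example, vlan_pre_up bond0 123 9000 1500 yes prints only ip link set dev bond0 mtu 9000, the step the first patch skipped because the VLAN already existed.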

Robie Basak (racb) wrote :

Subscribing ~ubuntu-sponsors

affects: ifupdown (Debian) → vlan (Debian)
no longer affects: ifupdown (Ubuntu)
Changed in vlan (Ubuntu):
importance: Undecided → Medium
sakishrist (sakishrist) wrote :

Hi Dan,

On the first line there needs to be a space just before the closing ']'. Otherwise the shell treats the bracket as part of the preceding string, and '[' complains that it cannot find its closing ']'.
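A minimal demonstration of that failure mode (IF_MTU is used only because the patch uses it; the functions broken and fixed are illustrative): without the space, the shell hands 9000] to '[' as a single word, so '[' finds no closing bracket, prints an error and returns a non-zero status. Inside an if statement, that simply makes the condition false, so the MTU increase is skipped.

```sh
#!/bin/sh
IF_MTU=9000

broken() {
    # "$IF_MTU"] expands to the single word 9000], so [ has no separate
    # closing ] argument: it reports an error and returns non-zero.
    [ -n "$IF_MTU"]
}

fixed() {
    # With the space, ] is its own argument and the test succeeds.
    [ -n "$IF_MTU" ]
}
```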

Other than that, I just applied your patch to a few machines and it seems to work as expected.

Dan Streetman (ddstreet) wrote :
Dan Streetman (ddstreet) wrote :
Dan Streetman (ddstreet) wrote :

@sakishrist, thanks, I updated all the debdiffs to add the missing space.

tags: added: sts-sru
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in vlan (Ubuntu):
status: New → Confirmed
Dr. Jens Harbott (j-harbott) wrote :

The patch for Xenial solves this issue for me, too. What does the timeline for getting fixed packages look like?

Robie Basak (racb) on 2016-10-05
Changed in vlan (Ubuntu):
status: Confirmed → Triaged
Changed in vlan (Ubuntu Precise):
status: New → Triaged
Changed in vlan (Ubuntu Trusty):
status: New → Triaged
Changed in vlan (Ubuntu Xenial):
status: New → Triaged
Changed in vlan (Ubuntu Precise):
importance: Undecided → Medium
Changed in vlan (Ubuntu Trusty):
importance: Undecided → Medium
Changed in vlan (Ubuntu Xenial):
importance: Undecided → Medium
Robie Basak (racb) wrote :

The fix looks great. Uploaded to Yakkety, thank you for your efforts.

This bug is missing full SRU information, so I can't upload the SRUs right now. This would be best done by the person most familiar with the bug and fix to minimise risk to users. Please can you complete these, eg. the "Regression Potential" section? See https://wiki.ubuntu.com/StableReleaseUpdates#Procedure for details.

For the SRU version numbers, I'd prefer to see the scheme used that is documented at https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging. This reduces the potential for mistakes and makes it clear that the version is an SRU. For example, your version number for Precise is outright wrong as version 1.9-3ubuntu7 has already been published in Quantal; using the documented scheme would have avoided this error.

Please complete the SRU information, and I'll sponsor the SRUs for you. Thanks!

Changed in vlan (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3.2ubuntu2

---------------
vlan (1.9-3.2ubuntu2) yakkety; urgency=medium

  * If VLAN is configured with higher MTU than raw device MTU, which can
    happen if VLAN is ifup'ed before raw device, then increase raw device
    MTU first so the VLAN ifup does not fail. (LP: #1224007)

 -- Dan Streetman <email address hidden> Thu, 08 Sep 2016 12:47:31 -0400

Changed in vlan (Ubuntu):
status: Fix Committed → Fix Released
Bryan Quigley (bryanquigley) wrote :

Added two possible regressions potentials.

description: updated
Dan Streetman (ddstreet) wrote :
Dan Streetman (ddstreet) wrote :
Dan Streetman (ddstreet) wrote :

@racb I changed the version numbers in the debdiffs for Precise and Trusty; I think the Xenial debdiff number is already OK. Thanks!

Robie Basak (racb) wrote :

Uploaded, thanks! I appreciate the high quality changelog entry.

I modified the Xenial version to 1.9-3.2ubuntu1.16.04.1 since 1.9-3.2ubuntu1 also exists in Vivid and Wily. Otherwise we'd have to work around not having something less than 1.9-3.2ubuntu1.1 available in case of an SRU to Vivid or to Wily by using tildes or something. Granted, Vivid and Wily are EOL, but Vivid still appears active, presumably on the phone. And vlan is unlikely to be needed to be SRU'd there, but I prefer to be consistent given that I can see a potential issue even if it is unlikely.

Now awaiting review from the SRU team.

Changed in vlan (Ubuntu Precise):
status: Triaged → In Progress
Changed in vlan (Ubuntu Trusty):
status: Triaged → In Progress
Changed in vlan (Ubuntu Xenial):
status: Triaged → In Progress

Hello Chris, or anyone else affected,

Accepted vlan into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3.2ubuntu1.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in vlan (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in vlan (Ubuntu Trusty):
status: In Progress → Fix Committed
Chris J Arges (arges) wrote :

Hello Chris, or anyone else affected,

Accepted vlan into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3ubuntu10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in vlan (Ubuntu Precise):
status: In Progress → Fix Committed
Chris J Arges (arges) wrote :

Hello Chris, or anyone else affected,

Accepted vlan into precise-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/vlan/1.9-3ubuntu6.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Bryan Quigley (bryanquigley) wrote :

We've verified it in the VLAN case (without bridges) for 12.04, 14.04 and 16.04.

tags: added: verification-done
removed: verification-needed

The verification of the Stable Release Update for vlan has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3ubuntu6.1

---------------
vlan (1.9-3ubuntu6.1) precise; urgency=medium

  * If VLAN is configured with higher MTU than raw device MTU, which can
    happen if VLAN is ifup'ed before raw device, then increase raw device
    MTU first so the VLAN ifup does not fail. (LP: #1224007)

 -- Dan Streetman <email address hidden> Thu, 08 Sep 2016 12:47:31 -0400

Changed in vlan (Ubuntu Precise):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3.2ubuntu1.16.04.1

---------------
vlan (1.9-3.2ubuntu1.16.04.1) xenial; urgency=medium

  * If VLAN is configured with higher MTU than raw device MTU, which can
    happen if VLAN is ifup'ed before raw device, then increase raw device
    MTU first so the VLAN ifup does not fail. (LP: #1224007)

 -- Dan Streetman <email address hidden> Thu, 08 Sep 2016 12:47:31 -0400

Changed in vlan (Ubuntu Xenial):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote :

This was also released in trusty:
Proposed: 1.9-3ubuntu10.1
Release: 1.9-3ubuntu10
Copied to trusty-updates

Chris J Arges (arges) wrote :

Also, thanks to Dan et al. for bringing this bug back up! I was wrong about the duplicate.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vlan - 1.9-3ubuntu10.1

---------------
vlan (1.9-3ubuntu10.1) trusty; urgency=medium

  * If VLAN is configured with higher MTU than raw device MTU, which can
    happen if VLAN is ifup'ed before raw device, then increase raw device
    MTU first so the VLAN ifup does not fail. (LP: #1224007)

 -- Dan Streetman <email address hidden> Thu, 08 Sep 2016 12:47:31 -0400

Changed in vlan (Ubuntu Trusty):
status: Fix Committed → Fix Released
Louis Bouchard (louis) on 2016-11-09
tags: removed: sts-sru