gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x

Bug #1742505 reported by David Ames
This bug affects 2 people
Affects                 Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive    Fix Released  High        James Page
  Pike                  Fix Released  Critical    James Page
  Queens                Fix Released  High        James Page
neutron                 Invalid       Undecided   Unassigned
linux (Ubuntu)          Confirmed     Undecided   Unassigned
  Artful                Won't Fix     Undecided   Unassigned
  Bionic                Confirmed     Undecided   Unassigned
openvswitch (Ubuntu)    Fix Released  High        James Page
  Artful                Fix Released  Critical    James Page
  Bionic                Fix Released  High        James Page

Bug Description

[Impact]
OpenStack clouds using GRE overlay tunnels with MTUs greater than 1500 will see packet fragmentation and other networking issues for traffic in overlay networks.

[Test Case]
Deploy OpenStack Pike (xenial + pike UCA or artful)
Create tenant networks using GRE segmentation
Boot instances
Instance networking will be broken/slow

gre_sys devices will be set to mtu=1472 on hypervisor hosts.

[Regression Potential]
Minimal; the fix to OVS works around an issue with GRE tunnel port setup via rtnetlink by performing a second request, once the GRE device is set up, to set the MTU to a high value (65000).
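
The two-step approach described above can be pictured with a toy model. This is an illustration only: `FakeKernel` and `create_gre_port` are hypothetical names, not OVS or kernel APIs, and 1472 is simply the value observed in this bug.

```python
TUNNEL_MTU = 65000        # high value the OVS fix requests
OBSERVED_DEFAULT = 1472   # MTU gre_sys ends up with in this bug


class FakeKernel:
    """Toy stand-in for the kernel (hypothetical, for illustration only)."""

    def __init__(self):
        self.mtu = {}

    def create_gre(self, name, requested_mtu):
        # Models the reported behaviour: the MTU passed at creation
        # time is ignored and the device gets the default instead.
        self.mtu[name] = OBSERVED_DEFAULT

    def set_mtu(self, name, mtu):
        # A separate request against an existing device is honoured.
        self.mtu[name] = mtu


def create_gre_port(kernel, name, workaround=True):
    """Create a GRE device, optionally following up with a second MTU request."""
    kernel.create_gre(name, TUNNEL_MTU)
    if workaround:
        kernel.set_mtu(name, TUNNEL_MTU)
    return kernel.mtu[name]


if __name__ == "__main__":
    print(create_gre_port(FakeKernel(), "gre_sys", workaround=False))
    print(create_gre_port(FakeKernel(), "gre_sys"))
```

Without the follow-up request the modelled device keeps the default; with it, the device ends up at the requested value, which is why the change carries little regression risk for other tunnel types.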

[Original Bug Report]
Setup:
Pike neutron 11.0.2-0ubuntu1.1~cloud0
OVS 2.8.0
Jumbo frames settings per: https://docs.openstack.org/mitaka/networking-guide/config-mtu.html
global_physnet_mtu = 9000
path_mtu = 9000

Symptoms:
gre_sys MTU is 1472
Instances with MTUs > 1500 fail to communicate across GRE
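
A rough way to see why these symptoms follow from the settings above. The overhead figures are my assumption about how neutron derives the tenant network MTU for GRE over IPv4 (20 bytes of outer IP header plus 22 bytes of GRE encapsulation); they are not stated in this report, and the function names are hypothetical.

```python
IPV4_HEADER = 20    # assumed outer IPv4 header size, in bytes
GRE_OVERHEAD = 22   # assumed GRE encapsulation overhead applied by neutron


def gre_tenant_mtu(global_physnet_mtu, path_mtu):
    """Estimate the MTU advertised to instances on a GRE tenant network.

    Assumes both settings are positive.
    """
    return min(global_physnet_mtu, path_mtu) - IPV4_HEADER - GRE_OVERHEAD


def exceeds_tunnel_device(instance_mtu, gre_sys_mtu):
    """True when full-size instance packets are larger than gre_sys allows."""
    return instance_mtu > gre_sys_mtu


if __name__ == "__main__":
    instance_mtu = gre_tenant_mtu(9000, 9000)
    print(instance_mtu, exceeds_tunnel_device(instance_mtu, 1472))
```

Under these assumptions, a 9000-byte path yields an instance MTU far above 1472, so full-size packets cannot pass through gre_sys, while a 1500-byte path yields an instance MTU that still fits.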

Temporary Workaround:
ifconfig gre_sys mtu 9000
Note: When OVS rebuilds tunnels, such as on a restart, the gre_sys MTU is reset to the default of 1472.
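
Because the workaround has to be re-applied after every tunnel rebuild, it may help to script it. The sketch below uses the iproute2 equivalent of the ifconfig command (`ip link set dev ... mtu ...`); the helper names are hypothetical, and running the command for real requires root.

```python
import subprocess


def mtu_command(device, mtu):
    """Build the iproute2 command that sets a device's MTU."""
    if mtu <= 0:
        raise ValueError(f"MTU must be positive: {mtu}")
    return ["ip", "link", "set", "dev", device, "mtu", str(mtu)]


def apply_mtu(device="gre_sys", mtu=9000, dry_run=True):
    """Print the command, and run it unless dry_run is set (needs root)."""
    cmd = mtu_command(device, mtu)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `apply_mtu(dry_run=False)` after an OVS restart would restore the larger MTU until the next rebuild; it remains a stopgap, not a fix.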

Note: downgrading from OVS 2.8.0 to 2.6.1 resolves the issue.

Previous behavior:
With Ocata or Pike and OVS 2.6.x
gre_sys MTU defaults to 65490
It remains at 65490 through restarts.

This may be related to some combination of the following changes in OVS which seem to imply MTUs must be set in the ovs database for tunnel interfaces and patches:
https://github.com/openvswitch/ovs/commit/8c319e8b73032e06c7dd1832b3b31f8a1189dcd1
https://github.com/openvswitch/ovs/commit/3a414a0a4f1901ba015ec80b917b9fb206f3c74f
https://github.com/openvswitch/ovs/blob/6355db7f447c8e83efbd4971cca9265f5e0c8531/datapath/vport-internal_dev.c#L186

Ryan Beisner (1chb1n)
tags: added: serverstack uosci upgrade
David Ames (thedac)
description: updated
tags: added: ovs
James Page (james-page) wrote :

Pinged the upstream openvswitch dev mailing list for feedback.

James Page (james-page) wrote :

Eric Garver:

"Most likely you are seeing this bug:

  https://bugzilla.redhat.com/show_bug.cgi?id=1488484

It's actually an issue with the kernel GRE driver ignoring IFLA_MTU when
the device is created."

Creating test package with proposed fix:

https://patchwork.ozlabs.org/patch/860192/

Changed in openvswitch (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
Changed in neutron:
status: New → Invalid
James Page (james-page)
Changed in openvswitch (Ubuntu Artful):
assignee: nobody → James Page (james-page)
Changed in openvswitch (Ubuntu Bionic):
assignee: nobody → James Page (james-page)
Changed in openvswitch (Ubuntu Artful):
importance: Undecided → Critical
status: New → In Progress
Changed in openvswitch (Ubuntu Bionic):
status: Confirmed → In Progress
importance: Critical → High
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1742505

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Artful):
status: New → Incomplete
James Page (james-page) wrote :

Mirroring some of the thread on the ovs dev ML:

From Christina

"Thanks Eric, that matches my findings, glad that there seems to be an
accepted fix already.
But it is fairly recent and only in since 4.15-rc8 levels afaik.

But OTOH its description at [1] reads pretty much like my notes so far.

@James - do you think you could test a super-recent mainline kernel
build from [2] in regard to this issue?

[1]: https://github.com/torvalds/linux/commit/cfddd4c33c254954927942599d299b3865743146
[2]: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc8/"

This kernel fix resolves a secondary issue I saw with the 4.13 kernel where the workaround we used for 4.4 did not work - the kernel was applying a 1500 max limit to all GRE devices.

This was fixed in [1] which I've confirmed on my test but the original OVS bug still exists - gre_sys is still configured with MTU 1472, implying the IFLA_MTU value provided via netlink is being ignored by the kernel.
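
For readers unfamiliar with IFLA_MTU: it is an rtnetlink link attribute carrying the requested MTU as a 32-bit value. The sketch below shows only how such an attribute is laid out (a 4-byte `struct rtattr` header followed by the payload, in host byte order); it is not OVS code, it does not build a complete netlink message, and the helper names are hypothetical.

```python
import struct

IFLA_MTU = 4         # attribute type from linux/if_link.h
RTA_HEADER_LEN = 4   # struct rtattr: u16 length + u16 type


def encode_ifla_mtu(mtu):
    """Encode IFLA_MTU as a netlink attribute (host byte order)."""
    payload = struct.pack("=I", mtu)
    header = struct.pack("=HH", RTA_HEADER_LEN + len(payload), IFLA_MTU)
    return header + payload


def decode_ifla_mtu(data):
    """Decode an attribute produced by encode_ifla_mtu."""
    length, attr_type, mtu = struct.unpack("=HHI", data[:8])
    if attr_type != IFLA_MTU or length != 8:
        raise ValueError("not an IFLA_MTU attribute")
    return mtu


if __name__ == "__main__":
    print(encode_ifla_mtu(65000).hex())
```

The bug discussed here is that an attribute like this, when included in the request that creates the GRE device, has no effect on the resulting MTU.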

Changed in linux (Ubuntu Artful):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
James Page (james-page) wrote :

Marking kernel tasks as confirmed as we have at least one related bug which we have confirmed is fixed under 4.15 (and looks like an easy pick for 4.13).

James Page (james-page) wrote :

Simple reproducer:

sudo add-apt-repository cloud-archive:pike
sudo apt update
sudo apt install openvswitch-switch
sudo ovs-vsctl add-br br-tun
sudo ovs-vsctl add-port br-tun gre0 -- set interface gre0 type=gre options:remote_ip=10.100.1.1

This causes OVS to create the gre_sys device in the kernel; the remote_ip can be any address, since the tunnel does not need to be functional for the test.
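
After running the reproducer, the resulting MTU can be read back from sysfs. A minimal sketch, assuming a Linux host where the device appears under /sys/class/net; `read_mtu` is a hypothetical helper name.

```python
from pathlib import Path


def read_mtu(device="gre_sys", sysfs_root="/sys/class/net"):
    """Return the MTU the kernel reports for a device, or None if it is absent."""
    mtu_file = Path(sysfs_root) / device / "mtu"
    if not mtu_file.exists():
        return None
    return int(mtu_file.read_text().strip())
```

On an affected host `read_mtu()` would be expected to return 1472; `None` means the gre_sys device has not been created.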

James Page (james-page) wrote :

Reference bug 1743746 for MTU hardware limitation in 4.13 kernel

James Page (james-page) wrote :

I've re-tested with the proposed changes upstream; MTU is correctly set to 65000 for all tunnel types including GRE on 4.4 kernels; the kernel fix for bug 1743746 is required for support on >= Artful or if the HWE edge kernel is in use on Xenial.

James Page (james-page) wrote :

Test packages for Artful and Xenial in:

  ppa:james-page/pike

James Page (james-page) wrote :

Leaving the kernel bug tasks open for now; there is an underlying issue in the kernel specific to GRE tunnel ports where the MTU passed during new device creation is not being used; the fixes to OVS work around this issue rather than actually resolving it.

description: updated
James Page (james-page) wrote :

I've uploaded the fixes for this issue to bionic (currently in proposed awaiting builds across all archs) and for artful (stacked on top of the current 2.8.1 stable release in proposed).

I'd like to do this fix on top of the 2.8.1 release, rather than have end-users deal with two sets of updates which are disruptive to the data plane in an OpenStack cloud.

Jason Hobbs (jason-hobbs) wrote :

@james-page When will the 2.8.1 release be?

Brian Murray (brian-murray) wrote : Please test proposed package

Hello David, or anyone else affected,

Accepted openvswitch into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openvswitch/2.8.1-0ubuntu0.17.10.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in openvswitch (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-artful
James Page (james-page) wrote :

Hello David, or anyone else affected,

Accepted openvswitch into pike-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:pike-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-pike-needed to verification-pike-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-pike-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-pike-needed
James Page (james-page) wrote :

Tested OK on Xenial with Pike UCA:

16: gre_sys@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc pfifo_fast master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 56:5c:6e:19:f5:4b brd ff:ff:ff:ff:ff:ff
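
When verifying from captured output rather than on the host itself, the MTU can be pulled out of `ip link` text like the sample above. A small sketch; the regular expression is my own and only assumes the `N: name[@peer]: <flags> mtu VALUE ...` layout shown in that output.

```python
import re

# Matches lines such as "16: gre_sys@NONE: <...> mtu 65000 qdisc ..."
LINK_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s.*?\bmtu (?P<mtu>\d+)"
)


def parse_link_mtus(text):
    """Map device names to MTUs found in `ip link` output."""
    result = {}
    for line in text.splitlines():
        match = LINK_RE.match(line)
        if match:
            result[match.group("name")] = int(match.group("mtu"))
    return result
```

Continuation lines such as `link/ether ...` do not match the pattern and are skipped.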

However, testing on artful is currently blocked pending resolution of bug 1743746, which prevents an increase in the MTU of the GRE devices.

tags: added: verification-pike-done
removed: verification-pike-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.8.1-0ubuntu3

---------------
openvswitch (2.8.1-0ubuntu3) bionic; urgency=medium

  * Updates to systemd configuration:
    - Move to distinct units for ovsdb-server and ovs-vswitchd.
  * Drop obsolete upstart configuration file.
  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * d/control: Bump minimum debhelper version to 10, drop BD on
    dh-systemd.
  * d/p/dpif-kernel-gre-mtu-workaround.patch,
    d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch:
    Cherry pick in-flight fixes for workaround to correctly set MTU
    of GRE devices via netlink (LP: #1742505).

 -- James Page <email address hidden> Thu, 18 Jan 2018 15:26:41 +0200

Changed in openvswitch (Ubuntu Bionic):
status: In Progress → Fix Released
James Page (james-page) wrote :

Hello David, or anyone else affected,

Accepted openvswitch into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
James Page (james-page) wrote :

Successfully tested with the kernel fix for bug 1743746.

tags: added: verification-done-artful verification-queens-done
removed: verification-needed-artful verification-queens-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.8.1-0ubuntu0.17.10.2

---------------
openvswitch (2.8.1-0ubuntu0.17.10.2) artful; urgency=medium

  * d/p/dpif-kernel-gre-mtu-workaround.patch,
    d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch:
    Cherry pick in-flight fixes for workaround to correctly set MTU
    of GRE devices via netlink (LP: #1742505).

openvswitch (2.8.1-0ubuntu0.17.10.1) artful; urgency=medium

  * New upstream stable release (LP: #1724622).

 -- James Page <email address hidden> Sat, 20 Jan 2018 10:22:31 +0000

Changed in openvswitch (Ubuntu Artful):
status: Fix Committed → Fix Released
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.8.1-0ubuntu0.17.10.2~cloud0
---------------

 openvswitch (2.8.1-0ubuntu0.17.10.2~cloud0) xenial-pike; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 openvswitch (2.8.1-0ubuntu0.17.10.2) artful; urgency=medium
 .
   * d/p/dpif-kernel-gre-mtu-workaround.patch,
     d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch:
     Cherry pick in-flight fixes for workaround to correctly set MTU
     of GRE devices via netlink (LP: #1742505).

Brian Murray (brian-murray) wrote :

Ubuntu 17.10 (Artful Aardvark) has reached end of life, so this bug will not be fixed for that specific release.

Changed in linux (Ubuntu Artful):
status: Confirmed → Won't Fix