MTU concerns for the Linux bridge agent

Bug #1542108 reported by Matt Kassawara
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   High         Kevin Benton

Bug Description

I ran some experiments with the Linux bridge agent [1] to determine the source of MTU problems and offer a potential solution. The environment for these experiments contains the following items:

1) A physical (underlying) network supporting MTU of 1500 or 9000 bytes.
2) One controller node running the neutron server, Linux bridge agent, L3 agent, DHCP agent, and metadata agent.
3) One compute node running the Linux bridge agent.
4) A neutron provider/public network.
5) A neutron self-service/private network.
6) A neutron router between the provider and self-service networks.
7) The self-service network uses the VXLAN protocol with IPv4 endpoints, which adds 50 bytes of overhead.
8) An instance on the self-service network with a floating IP address from an allocation pool on the provider network.

Background:

1. For tunnel interfaces, Linux automatically subtracts protocol overhead from the parent interface MTU. For example, if eth0 has a 1500 MTU, a VXLAN interface using it as a parent device has a 1450 MTU.

2. For bridge devices, Linux assumes a 1500 MTU and changes the MTU to the lowest MTU of any port on the bridge. For example, a bridge without ports has a 1500 MTU. If eth0 has a 9000 MTU and you add it as a port on the bridge, the bridge changes to a 9000 MTU. If eth1 has a 1500 MTU and you add it as a port on the bridge, the bridge changes to a 1500 MTU.

3. Only devices that operate at layer-3 can participate in path MTU discovery (PMTUD). Therefore, a change of MTU in a layer-2 device such as a bridge or veth pair causes that device to discard packets larger than the smallest MTU.
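
As a rough illustration of background items 1 and 2, the following Python sketch models the arithmetic. The function names are hypothetical, and the 50-byte overhead is the VXLAN-over-IPv4 figure from the environment list above. It is only a model of the behavior described here, not kernel or neutron code.

```python
VXLAN_IPV4_OVERHEAD = 50  # bytes, from environment item 7


def tunnel_mtu(parent_mtu, overhead=VXLAN_IPV4_OVERHEAD):
    """MTU of a tunnel interface: parent MTU minus protocol overhead."""
    return parent_mtu - overhead


def bridge_mtu(port_mtus, default=1500):
    """MTU of a bridge: 1500 without ports, else the lowest port MTU."""
    return min(port_mtus) if port_mtus else default


print(tunnel_mtu(1500))          # VXLAN interface on a 1500 MTU parent
print(bridge_mtu([]))            # bridge without ports
print(bridge_mtu([9000]))        # after adding eth0 (9000)
print(bridge_mtu([9000, 1500]))  # after also adding eth1 (1500)
```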

Observations:

1. For any physical network MTU, instances must use an MTU value that accounts for overlay protocol overhead. Neutron currently offers a way to provide a correct value via DHCP. However, it only addresses packets outbound from instances. The next two items address packets inbound to instances.

2. For any physical network MTU, each end of the veth pair between the self-service network router interface (qr) in the router namespace (qrouter) and the self-service network bridge on the controller node (qbr) contains a different MTU. The qr end has a 1500 MTU, the default value, and the qbr end has a 1450 MTU because the bridge contains a VXLAN interface with a 1450 MTU. Thus, the veth pair discards packets with a payload larger than 1450 bytes.

3. For a physical network MTU larger than 1500, each end of the veth pair between the provider network router gateway (qg) in the router namespace (qrouter) and the provider network bridge on the controller node (qbr) contains a different MTU. The qg end has a 1500 MTU, the default value, and the qbr end inherits the larger MTU of the physical network interface. Thus, the veth pair discards packets with a payload larger than 1500 bytes.
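
Observations 2 and 3 can be modeled with a short Python sketch. It assumes, as described above, that a veth pair discards anything larger than the smaller of its two MTUs. The class and field names are made up for illustration.

```python
from typing import NamedTuple


class VethPair(NamedTuple):
    """Hypothetical model of a veth pair between a namespace and a bridge."""
    name: str
    namespace_end_mtu: int
    bridge_end_mtu: int

    def usable_mtu(self) -> int:
        # Packets larger than the smaller end are discarded.
        return min(self.namespace_end_mtu, self.bridge_end_mtu)

    def is_mismatched(self) -> bool:
        return self.namespace_end_mtu != self.bridge_end_mtu


pairs = [
    VethPair("qr, self-service network", 1500, 1450),            # observation 2
    VethPair("qg, provider network on 9000 MTU", 1500, 9000),    # observation 3
]

for pair in pairs:
    if pair.is_mismatched():
        print(f"{pair.name}: discards packets larger than "
              f"{pair.usable_mtu()} bytes")
```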

Potential solution:

As per background item (3), MTU disparities must occur in a device that operates at layer-3, such as a router namespace that contains interfaces with IP addresses. We can accomplish this in neutron by always using the same MTU on both ends of a veth pair. In observation item (2), both ends of the veth pair should use 1450, the self-service network MTU. In observation item (3), both ends of the veth pair should use 9000, the provider network MTU. If a packet from the provider network to the instance has a payload larger than 1450 bytes, the router can send an ICMP message to the source telling it to use a 1450 MTU.
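
To make the intended behavior concrete, here is a simplified Python model of the decision a layer-3 device makes once the MTU change sits in the router namespace. The function name and return values are hypothetical, and the model ignores everything about real forwarding except packet size and the IPv4 "don't fragment" flag.

```python
def route_packet(packet_size, egress_mtu, dont_fragment=True):
    """Return (action, next_hop_mtu) for a packet leaving via an interface.

    Simplified model: a packet that fits is forwarded; an oversized packet
    triggers an ICMP "fragmentation needed" reply carrying the egress MTU
    when the don't-fragment flag is set, and is fragmented otherwise.
    """
    if packet_size <= egress_mtu:
        return ("forward", None)
    if dont_fragment:
        return ("icmp_fragmentation_needed", egress_mtu)
    return ("fragment", egress_mtu)


# A 1500-byte packet from the provider network heading to the
# self-service network, whose qr interface uses a 1450 MTU.
print(route_packet(1500, 1450))
print(route_packet(1400, 1450))
```

With matching MTUs on both ends of each veth pair, the only place a size limit is hit is the router interface, which is able to report it to the sender.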

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-January/084241.html

Revision history for this message
Nate Johnston (nate-johnston) wrote :

Matt,

Thanks for the thorough discussion and analysis. It is much appreciated. But I am not sure what specific things you would like changed that go above and beyond the change in https://review.openstack.org/#/c/276411

Marking this 'Incomplete' for now, because I can't define specifically what actions would result in this bug being satisfied.

Thanks!

Changed in neutron:
status: New → Incomplete
Revision history for this message
Matt Kassawara (ionosphere80) wrote :

Changing the 'path_mtu' default to 1500 only addresses the instance side of the MTU problem... and only for physical (underlying) network MTU values of 1500 or less.

Changed in neutron:
status: Incomplete → Confirmed
assignee: nobody → Sean M. Collins (scollins)
Changed in neutron:
importance: Undecided → High
tags: added: ops
tags: added: usability
Changed in neutron:
milestone: none → mitaka-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/276411
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7a4633a9ca213ee03cbcd45027189e0b3e7df2f5
Submitter: Jenkins
Branch: master

commit 7a4633a9ca213ee03cbcd45027189e0b3e7df2f5
Author: Sean M. Collins <email address hidden>
Date: Thu Feb 4 14:03:06 2016 -0500

    ML2: Configure path_mtu to default to 1500 bytes

    I8287677c7ad0f13fa9f5cb194f9372d04b78cb61 changed the behavior of
    DevStack, so that path_mtu is being set to 1500 by default. Let's take
    the next step and set path_mtu to 1500 by default, instead of relying on
    deployment tools like DevStack to set it.

    When using the ML2 plugin and setting path_mtu to default to 1500,
    the second order effect is that all Neutron
    Network objects will have the mtu attribute populated. For tenant
    network types like VXLAN, the MTU will be set as path_mtu less the
    protocol overhead. So for example, a Neutron Network backed by VXLAN
    will have the mtu attribute set to 1450.

    Related-Bug: #1542108
    Co-Authored-By: Matt Kassawara <email address hidden>
    Change-Id: I4096a3e7704032fa4aa5c3aa8bcaec4e38d0d06d

Changed in neutron:
assignee: Sean M. Collins (scollins) → Matt Kassawara (ionosphere80)
status: Confirmed → In Progress
Changed in neutron:
assignee: Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton)
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Matt Kassawara (ionosphere80)
Changed in neutron:
assignee: Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton)
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Matt Kassawara (ionosphere80)
Changed in neutron:
assignee: Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/283798
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=12ef585f91d01e1a8bf450e66d1cdbb79bf86f99
Submitter: Jenkins
Branch: master

commit 12ef585f91d01e1a8bf450e66d1cdbb79bf86f99
Author: Kevin Benton <email address hidden>
Date: Mon Feb 22 16:56:02 2016 -0800

    Deprecate network_device_mtu

    All interface plugging MTUs should be calculated based on the
    MTU of the network object that the port is being created on.

    Partial-Bug: #1542108
    Partial-Bug: #1542475

    Change-Id: Idf6221fee2c7da86123b330ad3c235ecc6868242

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/283790
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4df8d9a7016ab20fce235833d792b89309ec98a7
Submitter: Jenkins
Branch: master

commit 4df8d9a7016ab20fce235833d792b89309ec98a7
Author: Kevin Benton <email address hidden>
Date: Mon Feb 22 16:41:45 2016 -0800

    Make agent interface plugging utilize network MTU

    This changes the 'plug' and 'plug_new' interfaces of the
    LinuxInterfaceDriver to accept an MTU argument. It then
    updates the dhcp agent and l3 agent to pass the MTU that
    is set on the network that the port belongs to. This allows
    it to take into account the overhead calculations that are
    done for encapsulation types.

    It's necessary for the L3 agent to have the MTU because it
    must recognize when fragmentation is needed so it can fragment
    or generate an ICMP error.

    It's necessary for the DHCP agent to have the MTU so it doesn't
    interfere when it plugs into a bridge with a larger than 1500
    MTU (the bridge would reduce its MTU to match the agent).

    If an operator sets 'network_device_mtu', the value of that
    will be used instead to preserve previous behavior.

    Closes-Bug: #1549470
    Closes-Bug: #1542108
    Closes-Bug: #1542475
    DocImpact: Neutron agents now support arbitrary MTU
               configurations on each network (including
               jumbo frames). This is accomplished by checking
               the MTU value defined for each network on which
               it is wiring VIFs.
    Co-Authored-By: Matt Kassawara <email address hidden>
    Change-Id: Ic091fa78dfd133179c71cbc847bf955a06cb248a
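
The precedence rule described in this commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the change: the function name is hypothetical, and an unset 'network_device_mtu' is represented here as None.

```python
def effective_interface_mtu(network_mtu, network_device_mtu=None):
    """Pick the MTU to apply when an agent plugs an interface.

    Assumption for this sketch: if the operator set 'network_device_mtu',
    it wins to preserve previous behavior; otherwise the MTU of the
    network that the port belongs to is used.
    """
    if network_device_mtu:
        return network_device_mtu
    return network_mtu


print(effective_interface_mtu(1450))        # VXLAN network, option unset
print(effective_interface_mtu(1450, 9000))  # operator override
```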

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b3

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

The issue is indeed fixed, but a variable to be deprecated was left over.

Changed in neutron:
milestone: mitaka-3 → mitaka-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/284814
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ae45cd5732b8b75a5b2d35df555f46e11145eb85
Submitter: Jenkins
Branch: master

commit ae45cd5732b8b75a5b2d35df555f46e11145eb85
Author: Kevin Benton <email address hidden>
Date: Tue Feb 23 13:11:40 2016 -0800

    Add global_physnet_mtu and deprecate segment_mtu

    Introduce the neutron-wide 'global_physnet_mtu' option that
    references the underlying physical network MTU. This also
    introduces a method in plugin.common.utils that all plugins
    should use to retrieve it. This value should be used to
    calculate the proper MTU for virtual network components.

    This patch also deprecates the 'segment_mtu' option specific
    to the ML2 plug-in and makes ML2 reference this new option.

    Closes-Bug: #1542475
    Closes-Bug: #1542108
    Change-Id: I6ffc8973c9b8f46cc19922ff04fdd2d23646b878
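
One way to picture how 'global_physnet_mtu' feeds into a virtual network MTU is the Python sketch below. The combination rule is an assumption made for illustration (take the smaller of the configured values, treating 0 as unset, then subtract the encapsulation overhead); the function name is hypothetical and the 50-byte default is the VXLAN-over-IPv4 overhead from the bug description.

```python
def tenant_network_mtu(global_physnet_mtu, path_mtu=0, overhead=50):
    """Estimate the MTU of an overlay network (hypothetical helper).

    Assumption: values of 0 or less mean "not configured", and the
    smaller configured value bounds the underlying network.
    """
    candidates = [m for m in (global_physnet_mtu, path_mtu) if m > 0]
    if not candidates:
        return 0  # nothing configured, so no MTU can be calculated
    return min(candidates) - overhead


print(tenant_network_mtu(1500))        # standard physical network
print(tenant_network_mtu(9000))        # jumbo frames end to end
print(tenant_network_mtu(9000, 1500))  # tunnel path limited to 1500
```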

tags: added: deprecation
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0rc1

This issue was fixed in the openstack/neutron 8.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/305782

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/305782
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c55aba1dba31c6730c80db0118286f1f9e84cd9b
Submitter: Jenkins
Branch: stable/liberty

commit c55aba1dba31c6730c80db0118286f1f9e84cd9b
Author: Kevin Benton <email address hidden>
Date: Mon Feb 22 16:41:45 2016 -0800

    Make agent interface plugging utilize network MTU

    This changes the 'plug' and 'plug_new' interfaces of the
    LinuxInterfaceDriver to accept an MTU argument. It then
    updates the dhcp agent and l3 agent to pass the MTU that
    is set on the network that the port belongs to. This allows
    it to take into account the overhead calculations that are
    done for encapsulation types.

    It's necessary for the L3 agent to have the MTU because it
    must recognize when fragmentation is needed so it can fragment
    or generate an ICMP error.

    It's necessary for the DHCP agent to have the MTU so it doesn't
    interfere when it plugs into a bridge with a larger than 1500
    MTU (the bridge would reduce its MTU to match the agent).

    If an operator sets 'network_device_mtu', the value of that
    will be used instead to preserve previous behavior.

    Conflicts:
     neutron/agent/l3/dvr_edge_ha_router.py
     neutron/agent/l3/dvr_edge_router.py
     neutron/agent/l3/ha_router.py
     neutron/agent/linux/interface.py
     neutron/tests/functional/agent/l3/test_dvr_router.py
     neutron/tests/functional/agent/test_dhcp_agent.py

    Additional modifications for Liberty:
    - test_dvr_router_lifecycle_ha_with_snat_with_fips_nmtu renamed into
      test_dvr_router_lifecycle_without_ha_with_snat_with_fips_nmtu,
    - the test validates DVR without HA.

    Reason for the change: Liberty does not support DVR + HA routers (the
    test raises DvrHaRouterNotSupported without those modifications).

    Closes-Bug: #1549470
    Closes-Bug: #1542108
    Closes-Bug: #1542475
    DocImpact: Neutron agents now support arbitrary MTU
               configurations on each network (including
               jumbo frames). This is accomplished by checking
               the MTU value defined for each network on which
               it is wiring VIFs.
    Co-Authored-By: Matt Kassawara <email address hidden>
    (cherry picked from commit 4df8d9a7016ab20fce235833d792b89309ec98a7)

    ===

    Also squashing in the following fix to pass unit tests for midonet
    interface driver:

    Support interface drivers that don't support mtu parameter for plug_new

    The method signature before Mitaka did not have the mtu= parameter. We
    should continue supporting the old signature, since it can be used in
    out of tree interface drivers. The class is part of public neutron API,
    so we should make an effort to not break out of tree code.

    Local modifications:
    - don't issue a deprecation warning in minor release update.

    Change-Id: I8e0c07c76fd0b4c55b66c20ebe29cdb7c07d6f27
    Closes-Bug: #1570392
    (cherry picked from commit 8a86ba1d014a5e758c0569aaf16cfe92492cc7f1)

    ===

    Change-Id: Ic091fa78dfd133179c71cbc847bf955a06cb248a
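
The backward-compatibility concern in the squashed fix can be illustrated with the Python sketch below. It is one possible approach under stated assumptions, not the code from the change: the helper name, the driver classes, and the shortened argument lists are all hypothetical.

```python
import inspect


def call_plug_new(driver, *args, mtu=None):
    """Call driver.plug_new, passing mtu only if the driver accepts it."""
    params = inspect.signature(driver.plug_new).parameters
    accepts_mtu = "mtu" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_mtu:
        return driver.plug_new(*args, mtu=mtu)
    # Old-style, out-of-tree driver: fall back to the previous signature.
    return driver.plug_new(*args)


class OldDriver:
    """Stand-in for a driver written before the mtu parameter existed."""
    def plug_new(self, network_id, port_id, device_name):
        return ("old", device_name, None)


class NewDriver:
    """Stand-in for a driver that accepts the mtu parameter."""
    def plug_new(self, network_id, port_id, device_name, mtu=None):
        return ("new", device_name, mtu)


print(call_plug_new(OldDriver(), "net", "port", "tap0", mtu=1450))
print(call_plug_new(NewDriver(), "net", "port", "tap0", mtu=1450))
```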

tags: added: in-stable-liberty
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/342958

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/342958
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a9133b7255355942a386756ef98df0bee6f0c33b
Submitter: Jenkins
Branch: master

commit a9133b7255355942a386756ef98df0bee6f0c33b
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Jul 15 18:14:12 2016 +0200

    Remove deprecated network_device_mtu option

    The right way to configure Neutron to work with infrastructure MTU is by
    using plugin agnostic global_physnet_mtu and ml2 specific
    path_mtu/physical_network_mtus options. The deprecated option is error
    prone and does not allow different MTUs per network.

    Closes-Bug: #1603493
    Related-Bug: #1549470
    Related-Bug: #1542108
    Related-Bug: #1542475

    DocImpact Remove all references to network_device_mtu option from
              Neutron documentation. Note that Nova has a deprecated option
              with the same name that will need a separate patch to be removed.

    Depends-On: I8e6cc99fe70d0c41a705431fb3160e8fccacff10
    Depends-On: I337b284076a794027fbd63796119d56bd1923cf2
    Change-Id: I7287db9df25a78a59b2dfa28acfde7fe69d17f40
