MTU concerns for the Open vSwitch agent

Bug #1542475 reported by Matt Kassawara on 2016-02-05
This bug affects 6 people
Affects: neutron
Importance: High
Assigned to: Kevin Benton

Bug Description

I ran some experiments with the Open vSwitch (OVS) agent [1] to determine the source of MTU problems and offer a potential solution. The environment for these experiments contains the following items:

1) A physical (underlying) network supporting MTU of 1500 or 9000 bytes.
2) One controller node running the neutron server, OVS agent, L3 agent, DHCP agent, metadata agent, and OVS provider network bridge br-ex.
3) One compute node running the Open vSwitch agent.
4) A neutron provider/public network.
5) A neutron self-service/private network.
6) A neutron router between the provider and self-service networks.
7) The self-service network uses the VXLAN protocol with IPv4 endpoints which adds 50 bytes of overhead.
8) An instance on the self-service network with a floating IP address from an allocation pool on the provider network.
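The 50-byte figure in item 7 can be checked by adding up header sizes. This is only an illustrative sketch. It assumes an untagged inner Ethernet frame and an outer IPv4 header without options, and it counts only the headers that consume the underlay IP MTU (the outer Ethernet header does not). The function names are made up for this example.

```python
import struct

# Header layouts in network byte order; "!" disables padding, so
# struct.calcsize() returns each header's length on the wire.
ETHERNET = "!6s6sH"     # inner frame: dst MAC, src MAC, EtherType (no 802.1Q tag)
IPV4 = "!BBHHHBBH4s4s"  # outer IPv4 header without options (20 bytes)
IPV6 = "!IHBB16s16s"    # outer IPv6 fixed header (40 bytes)
UDP = "!HHHH"           # src port, dst port, length, checksum
VXLAN = "!B3s3sB"       # flags, reserved, VNI, reserved


def vxlan_overhead(outer_ip_version: int = 4) -> int:
    """Bytes that VXLAN encapsulation adds against the underlay IP MTU.

    The inner Ethernet header rides inside the VXLAN payload, so it is
    counted together with the outer IP, UDP, and VXLAN headers.
    """
    if outer_ip_version not in (4, 6):
        raise ValueError("outer_ip_version must be 4 or 6")
    outer_ip = IPV4 if outer_ip_version == 4 else IPV6
    return sum(struct.calcsize(f) for f in (outer_ip, UDP, VXLAN, ETHERNET))


def vxlan_network_mtu(physical_mtu: int, outer_ip_version: int = 4) -> int:
    """Largest inner IP packet that fits in a single underlay packet."""
    return physical_mtu - vxlan_overhead(outer_ip_version)


for mtu in (1500, 9000):
    print(f"physical {mtu} -> VXLAN network {vxlan_network_mtu(mtu)}")
```

With IPv4 endpoints this yields 1450 and 8950. IPv6 endpoints would add another 20 bytes (70 in total).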

Background:

1. Interfaces (or ports) on OVS bridges, such as those for overlay network tunnels, appear to use an arbitrarily large MTU. Thus, OVS bridges and tunnel interfaces effectively inherit the MTU of the physical network interfaces. For example, if OVS uses the IP address of eth0 for a tunnel overlay network endpoint and eth0 has a 1500 MTU, the tunnel interface can only send packets with a payload of up to 1500 bytes, including the overlay protocol overhead.

2. OVS creates interfaces (ports) in the host namespace and moves them to the appropriate namespace(s) rather than creating veth pairs between namespaces.

3. For Linux bridge devices such as those on the compute node that implement security groups, Linux assumes a 1500 MTU and changes the MTU to the lowest MTU of any port on the bridge. For example, a bridge without ports has a 1500 MTU. If eth0 has a 9000 MTU and you add it as a port on the bridge, the bridge changes to a 9000 MTU. If eth1 has a 1500 MTU and you add it as a port on the bridge, the bridge changes to a 1500 MTU.
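The bridge rule above can be modeled in a few lines. This is a simplified sketch of the behavior as described in this report only. Kernel behavior has changed over time (newer kernels may keep an MTU set explicitly on the bridge), which this model ignores. `bridge_mtu` is a hypothetical helper.

```python
from typing import List

DEFAULT_BRIDGE_MTU = 1500


def bridge_mtu(port_mtus: List[int]) -> int:
    """Bridge MTU under the rule above: 1500 with no ports, else the minimum."""
    return min(port_mtus) if port_mtus else DEFAULT_BRIDGE_MTU


ports: List[int] = []
print(bridge_mtu(ports))  # 1500: no ports yet
ports.append(9000)        # add eth0
print(bridge_mtu(ports))  # 9000
ports.append(1500)        # add eth1
print(bridge_mtu(ports))  # 1500
```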

4. Only devices that operate at layer-3 can participate in path MTU discovery (PMTUD). Therefore, an MTU disparity in a layer-2 device such as a bridge or veth pair causes that device to silently discard packets larger than the smallest MTU instead of notifying the sender.
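The difference between a layer-3 hop and a layer-2 hop can be illustrated with a toy path walk. The sketch assumes that `packet_size` is the IP packet length, that the DF bit is set, and that each hop is described only by its MTU and whether it routes. `Hop` and `send` are hypothetical names. The example path is also hypothetical: the router port allows 9000 but the veth pair and tap stay at 1500 (compare observation 4 below).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    name: str
    mtu: int
    layer3: bool  # True for a router port, False for a bridge, veth, or tap


def send(packet_size: int, path: List[Hop]) -> str:
    """Follow a packet with the DF bit set along `path`."""
    for hop in path:
        if packet_size <= hop.mtu:
            continue
        if hop.layer3:
            # A router can reply with ICMP "fragmentation needed" and the
            # usable MTU, which is what lets the sender perform PMTUD.
            return f"ICMP from {hop.name}: use MTU {hop.mtu}"
        # A layer-2 device cannot reply, so the packet just disappears.
        return f"silently dropped at {hop.name}"
    return "delivered"


path = [
    Hop("qrouter qr port", 9000, True),
    Hop("veth to security-group bridge", 1500, False),
    Hop("tap", 1500, False),
]
print(send(1400, path))  # delivered
print(send(8000, path))  # silently dropped at veth to security-group bridge
```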

Observations:

1. For any physical network MTU, the port for the self-service network router interface (qr) in the router namespace (qrouter) has a 1500 MTU. Background item (2) prevents an MTU disparity at layer-2 between the router namespace and OVS bridge br-int. If a packet from the provider network to the instance has a payload larger than 1500 bytes, the router can send an ICMP message to the source telling it to use a 1500 MTU. However, the correct MTU for a private network using the VXLAN overlay protocol should account for 50 bytes of overhead. Thus, OVS fragments the packet over the tunnel and reassembles it on the compute node containing the instance.

2. For a physical network MTU larger than 1500, the port for the provider network router gateway (qg) in the router namespace (qrouter) has a 1500 MTU. Background item (2) prevents an MTU disparity between the router namespace and OVS provider network bridge br-ex. If a packet from the provider network to the instance has a payload larger than 1500 bytes, the router can send an ICMP message to the source telling it to use a 1500 MTU regardless of the private network overlay protocol. Thus, the agent cannot realize a physical network MTU larger than 1500.

3. If a provider or private network uses DHCP, the port in the DHCP namespace has a 1500 MTU for any physical network MTU.

4. The Linux bridge that implements security groups on the compute node lacks any ports on physical network interfaces. Background item (3) causes the bridge to assume a 1500 MTU. Nova actually manages this bridge and creates a veth pair between it and the Open vSwitch bridge br-int. Both ends of the veth pair have a 1500 MTU. Background item (1) indicates that the OVS bridge br-int could have a larger MTU. Thus, OVS discards packets inbound to instances with a payload larger than 1500 bytes.

5. Instances must use an MTU value that accounts for overlay protocol overhead. Neutron currently offers a way to provide the correct value via DHCP. However, considering observation item (4), providing an MTU value larger than 1500 causes a disparity at layer-2 between the VM and the tap interface port on the Linux bridge that implements security groups on the compute node. Thus, the bridge discards packets outbound from instances with a payload larger than 1500 bytes.

6. The nova 'network_device_mtu' option controls the MTU of all devices that it manages in observation items (4) and (5). For example, using a value of 9000 causes the bridge, veth pair, and tap to have a 9000 MTU. Combining this option with providing the correct value to instances via DHCP essentially resolves MTU problems on compute nodes.

Potential solution:

1. The port for the self-service network router interface (qr) in the router namespace (qrouter) must use the MTU of the physical network accounting for any overlay protocol overhead. For example, if the physical network has a 9000 MTU and the private network uses the VXLAN overlay protocol, the port must have an 8950 MTU.

2. The port for the provider network router gateway (qg) in the router namespace (qrouter) must use the MTU of the physical network. For example, if the physical network has a 9000 MTU, the port must have a 9000 MTU. If the provider network uses an overlay protocol, the MTU of the port must also account for any overhead.

3. For networks using DHCP, the port in the DHCP namespace (qdhcp) should use the MTU of the network on which it provides services accounting for any overlay protocol overhead.

4. The Linux bridge that implements security groups on the compute node and all ports on it must use the MTU of the physical network accounting for any overlay protocol overhead.
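The four items above reduce to simple arithmetic on the physical MTU. The following sketch computes the target MTU of each port for one self-service network and one provider network. The function name and the dictionary keys are invented for illustration, and item 4 is applied to an instance on the self-service network.

```python
from typing import Dict


def expected_port_mtus(physical_mtu: int, selfservice_overhead: int,
                       provider_overhead: int = 0) -> Dict[str, int]:
    """Target MTUs for the ports named in the potential solution."""
    selfservice = physical_mtu - selfservice_overhead
    provider = physical_mtu - provider_overhead
    return {
        "qr": selfservice,                 # item 1: router interface
        "qg": provider,                    # item 2: router gateway
        "qdhcp-selfservice": selfservice,  # item 3
        "qdhcp-provider": provider,        # item 3
        "sg-bridge": selfservice,          # item 4: bridge, veth pair, and tap
    }


for port, mtu in expected_port_mtus(9000, 50).items():
    print(f"{port}: {mtu}")
```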

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-January/084241.html

Nate Johnston (nate-johnston) wrote :

Close cousin to the Linux Bridge bug https://bugs.launchpad.net/neutron/+bug/1542108

Changed in neutron:
importance: Undecided → High
status: New → Confirmed
tags: added: ops
tags: added: usability
Changed in neutron:
milestone: none → mitaka-3
Changed in neutron:
assignee: nobody → Sean M. Collins (scollins)
Changed in neutron:
assignee: Sean M. Collins (scollins) → Matt Kassawara (ionosphere80)
status: Confirmed → In Progress
Changed in neutron:
assignee: Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton)
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Matt Kassawara (ionosphere80)
Changed in neutron:
assignee: Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton)
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Matt Kassawara (ionosphere80)
Changed in neutron:
assignee: Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton)

Reviewed: https://review.openstack.org/283798
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=12ef585f91d01e1a8bf450e66d1cdbb79bf86f99
Submitter: Jenkins
Branch: master

commit 12ef585f91d01e1a8bf450e66d1cdbb79bf86f99
Author: Kevin Benton <email address hidden>
Date: Mon Feb 22 16:56:02 2016 -0800

    Deprecate network_device_mtu

    All interface plugging MTUs should be calculated based on the
    MTU of the network object that the port is being created on.

    Partial-Bug: #1542108
    Partial-Bug: #1542475

    Change-Id: Idf6221fee2c7da86123b330ad3c235ecc6868242

Reviewed: https://review.openstack.org/283790
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4df8d9a7016ab20fce235833d792b89309ec98a7
Submitter: Jenkins
Branch: master

commit 4df8d9a7016ab20fce235833d792b89309ec98a7
Author: Kevin Benton <email address hidden>
Date: Mon Feb 22 16:41:45 2016 -0800

    Make agent interface plugging utilize network MTU

    This changes the 'plug' and 'plug_new' interfaces of the
    LinuxInterfaceDriver to accept an MTU argument. It then
    updates the dhcp agent and l3 agent to pass the MTU that
    is set on the network that the port belongs to. This allows
    it to take into account the overhead calculations that are
    done for encapsulation types.

    It's necessary for the L3 agent to have the MTU because it
    must recognize when fragmentation is needed so it can fragment
    or generate an ICMP error.

    It's necessary for the DHCP agent to have the MTU so it doesn't
    interfere when it plugs into a bridge with a larger than 1500
    MTU (the bridge would reduce its MTU to match the agent).

    If an operator sets 'network_device_mtu', the value of that
    will be used instead to preserve previous behavior.

    Closes-Bug: #1549470
    Closes-Bug: #1542108
    Closes-Bug: #1542475
    DocImpact: Neutron agents now support arbitrary MTU
               configurations on each network (including
               jumbo frames). This is accomplished by checking
               the MTU value defined for each network on which
               it is wiring VIFs.
    Co-Authored-By: Matt Kassawara <email address hidden>
    Change-Id: Ic091fa78dfd133179c71cbc847bf955a06cb248a
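The selection rule in this commit message can be sketched as follows. This covers the rule only and does not reproduce the LinuxInterfaceDriver code. The network is assumed to be a plain dict with an "mtu" key, and `plug_mtu` is a hypothetical helper.

```python
from typing import Optional


def plug_mtu(network: dict, network_device_mtu: Optional[int] = None) -> Optional[int]:
    """Pick the MTU to apply when an agent plugs a port into `network`.

    A set (deprecated) network_device_mtu wins, to preserve previous
    behavior; otherwise the network's own MTU is used. None means
    "leave the device default alone".
    """
    if network_device_mtu:
        return network_device_mtu
    return network.get("mtu")


vxlan_net = {"id": "selfservice", "mtu": 1450}
print(plug_mtu(vxlan_net))        # 1450
print(plug_mtu(vxlan_net, 9000))  # 9000
```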

Changed in neutron:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/286449
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2c9530c91705f60a41ccdac2889507e4cb9edb01
Submitter: Jenkins
Branch: master

commit 2c9530c91705f60a41ccdac2889507e4cb9edb01
Author: Kevin Benton <email address hidden>
Date: Mon Feb 29 22:43:23 2016 -0800

    Set veth_mtu default to 9000

    Unfortunately we may have to continue to support veth connections
    in the OVS agent for QoS use-cases. Related discussion:
    https://bugs.launchpad.net/bugs/1550501

    For the particular veth connections that reference the 'veth_mtu'
    setting, they are constructed long before we know the MTUs of the
    networks that will be going over them. So this patch changes their
    default to be 9000 to try to ensure they won't be silently dropping
    frames in jumbo MTU deployments.

    Change-Id: I6859ebdde1f7e3a8163b49d705620e522ada606a
    Related-bug: #1542475
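The reasoning behind a fixed 9000 default can be shown by checking a veth's MTU against the networks that might cross it. In the sketch below, the network names and MTUs are invented, and `networks_at_risk` is a hypothetical helper.

```python
from typing import Dict, List


def networks_at_risk(veth_mtu: int, network_mtus: Dict[str, int]) -> List[str]:
    """Networks whose MTU exceeds a veth's fixed MTU.

    Frames larger than veth_mtu from these networks would be dropped by
    the veth, a layer-2 device that cannot signal the sender.
    """
    return sorted(name for name, mtu in network_mtus.items() if mtu > veth_mtu)


nets = {"provider-flat": 9000, "selfservice-vxlan": 8950, "legacy": 1500}
print(networks_at_risk(1500, nets))  # ['provider-flat', 'selfservice-vxlan']
print(networks_at_risk(9000, nets))  # []
```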

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.

The issue is indeed fixed, but a variable that should have been deprecated was left over.

Changed in neutron:
milestone: mitaka-3 → mitaka-rc1

Reviewed: https://review.openstack.org/284814
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ae45cd5732b8b75a5b2d35df555f46e11145eb85
Submitter: Jenkins
Branch: master

commit ae45cd5732b8b75a5b2d35df555f46e11145eb85
Author: Kevin Benton <email address hidden>
Date: Tue Feb 23 13:11:40 2016 -0800

    Add global_physnet_mtu and deprecate segment_mtu

    Introduce the neutron-wide 'global_physnet_mtu' option that
    references the underlying physical network MTU. This also
    introduces a method in plugin.common.utils that all plugins
    should use to retrieve it. This value should be used to
    calculate the proper MTU for virtual network components.

    This patch also deprecates the 'segment_mtu' option specific
    to the ML2 plug-in and makes ML2 reference this new option.

    Closes-Bug: #1542475
    Closes-Bug: #1542108
    Change-Id: I6ffc8973c9b8f46cc19922ff04fdd2d23646b878
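This sketch shows one way to derive a virtual network's MTU from a single global_physnet_mtu value. It assumes IPv4 underlay overheads. VXLAN uses 50 bytes, as in the bug description. GRE uses 42 bytes: 20 for outer IPv4, 4 for the base GRE header, 4 for the key, and 14 for the inner Ethernet header. Flat and VLAN networks subtract nothing. `network_mtu` and `ENCAP_OVERHEAD` are illustrative names, not the helper this commit adds.

```python
# Assumed overheads for an IPv4 underlay. VLAN adds nothing here because
# the 802.1Q tag lives in the Ethernet header, outside the IP MTU.
ENCAP_OVERHEAD = {"flat": 0, "vlan": 0, "gre": 42, "vxlan": 50}


def network_mtu(network_type: str, global_physnet_mtu: int = 1500) -> int:
    """Derive a virtual network's MTU from the physical network MTU."""
    try:
        overhead = ENCAP_OVERHEAD[network_type]
    except KeyError:
        raise ValueError(f"unknown network type: {network_type!r}") from None
    return global_physnet_mtu - overhead


for net_type in ("flat", "vlan", "gre", "vxlan"):
    print(net_type, network_mtu(net_type, global_physnet_mtu=9000))
```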

tags: added: deprecation

This issue was fixed in the openstack/neutron 8.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/305782
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c55aba1dba31c6730c80db0118286f1f9e84cd9b
Submitter: Jenkins
Branch: stable/liberty

commit c55aba1dba31c6730c80db0118286f1f9e84cd9b
Author: Kevin Benton <email address hidden>
Date: Mon Feb 22 16:41:45 2016 -0800

    Make agent interface plugging utilize network MTU

    This changes the 'plug' and 'plug_new' interfaces of the
    LinuxInterfaceDriver to accept an MTU argument. It then
    updates the dhcp agent and l3 agent to pass the MTU that
    is set on the network that the port belongs to. This allows
    it to take into account the overhead calculations that are
    done for encapsulation types.

    It's necessary for the L3 agent to have the MTU because it
    must recognize when fragmentation is needed so it can fragment
    or generate an ICMP error.

    It's necessary for the DHCP agent to have the MTU so it doesn't
    interfere when it plugs into a bridge with a larger than 1500
    MTU (the bridge would reduce its MTU to match the agent).

    If an operator sets 'network_device_mtu', the value of that
    will be used instead to preserve previous behavior.

    Conflicts:
     neutron/agent/l3/dvr_edge_ha_router.py
     neutron/agent/l3/dvr_edge_router.py
     neutron/agent/l3/ha_router.py
     neutron/agent/linux/interface.py
     neutron/tests/functional/agent/l3/test_dvr_router.py
     neutron/tests/functional/agent/test_dhcp_agent.py

    Additional modifications for Liberty:
    - test_dvr_router_lifecycle_ha_with_snat_with_fips_nmtu renamed into
      test_dvr_router_lifecycle_without_ha_with_snat_with_fips_nmtu,
    - the test validates DVR without HA.

    Reason for the change: Liberty does not support DVR + HA routers (the
    test raises DvrHaRouterNotSupported without those modifications).

    Closes-Bug: #1549470
    Closes-Bug: #1542108
    Closes-Bug: #1542475
    DocImpact: Neutron agents now support arbitrary MTU
               configurations on each network (including
               jumbo frames). This is accomplished by checking
               the MTU value defined for each network on which
               it is wiring VIFs.
    Co-Authored-By: Matt Kassawara <email address hidden>
    (cherry picked from commit 4df8d9a7016ab20fce235833d792b89309ec98a7)

    ===

    Also squashing in the following fix to pass unit tests for midonet
    interface driver:

    Support interface drivers that don't support mtu parameter for plug_new

    The method signature before Mitaka did not have the mtu= parameter. We
    should continue supporting the old signature, since it can be used in
    out of tree interface drivers. The class is part of public neutron API,
    so we should make an effort to not break out of tree code.

    Local modifications:
    - don't issue a deprecation warning in minor release update.

    Change-Id: I8e0c07c76fd0b4c55b66c20ebe29cdb7c07d6f27
    Closes-Bug: #1570392
    (cherry picked from commit 8a86ba1d014a5e758c0569aaf16cfe92492cc7f1)

    ===

    Change-Id: Ic091fa78dfd133179c71cbc847bf955a06cb248a
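The squashed compatibility fix keeps out-of-tree drivers working when their plug_new lacks the mtu parameter. One way to handle this is to inspect the method signature before calling it. The sketch below does that with `inspect.signature`, using simplified, hypothetical driver classes (the real plug_new takes more arguments). It is not necessarily how neutron implemented the check.

```python
import inspect
from typing import Optional


def call_plug_new(driver, device_name: str, mtu: Optional[int] = None) -> None:
    """Call driver.plug_new, passing mtu only if the driver accepts it."""
    params = inspect.signature(driver.plug_new).parameters.values()
    accepts_mtu = any(
        p.name == "mtu" or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )
    if accepts_mtu:
        driver.plug_new(device_name, mtu=mtu)
    else:
        # Pre-Mitaka signature: plug without an MTU, keeping old behavior.
        driver.plug_new(device_name)


class LegacyDriver:  # hypothetical out-of-tree driver, old signature
    def __init__(self):
        self.calls = []

    def plug_new(self, device_name):
        self.calls.append((device_name, None))


class MtuAwareDriver:  # hypothetical driver with the new signature
    def __init__(self):
        self.calls = []

    def plug_new(self, device_name, mtu=None):
        self.calls.append((device_name, mtu))


legacy, aware = LegacyDriver(), MtuAwareDriver()
call_plug_new(legacy, "tap0", mtu=1450)
call_plug_new(aware, "tap1", mtu=1450)
print(legacy.calls, aware.calls)
```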

tags: added: in-stable-liberty

This issue was fixed in the openstack/neutron 7.1.0 release.

Reviewed: https://review.openstack.org/342958
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a9133b7255355942a386756ef98df0bee6f0c33b
Submitter: Jenkins
Branch: master

commit a9133b7255355942a386756ef98df0bee6f0c33b
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Jul 15 18:14:12 2016 +0200

    Remove deprecated network_device_mtu option

    The right way to configure Neutron to work with infrastructure MTU is by
    using plugin agnostic global_physnet_mtu and ml2 specific
    path_mtu/physical_network_mtus options. The deprecated option is error
    prone and does not allow to use different MTUs per network.

    Closes-Bug: #1603493
    Related-Bug: #1549470
    Related-Bug: #1542108
    Related-Bug: #1542475

    DocImpact Remove all references to network_device_mtu option from
              Neutron documentation. Note that Nova has a deprecated option
              with the same name that will need a separate patch to be removed.

    Depends-On: I8e6cc99fe70d0c41a705431fb3160e8fccacff10
    Depends-On: I337b284076a794027fbd63796119d56bd1923cf2
    Change-Id: I7287db9df25a78a59b2dfa28acfde7fe69d17f40
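This is a simplified sketch of how global_physnet_mtu, path_mtu, and physical_network_mtus can combine into a segment MTU. It takes the smallest configured (non-zero) value, then subtracts the tunnel overhead for overlay networks. The overheads are assumed for an IPv4 underlay, 0 is treated as "not configured", and `segment_mtu` is a hypothetical helper, not ML2's actual code.

```python
from typing import Dict, Optional

# Assumed IPv4-underlay overheads for overlay network types.
TUNNEL_OVERHEAD = {"gre": 42, "vxlan": 50}


def _min_configured(*values: int) -> int:
    """Smallest value greater than zero; 0 means 'not configured'."""
    configured = [v for v in values if v > 0]
    return min(configured) if configured else 0


def segment_mtu(network_type: str,
                physical_network: Optional[str] = None,
                global_physnet_mtu: int = 1500,
                path_mtu: int = 0,
                physical_network_mtus: Optional[Dict[str, int]] = None) -> int:
    """MTU for a network segment; 0 means it could not be determined."""
    if network_type in TUNNEL_OVERHEAD:
        # Overlays: the underlay path is bounded by both options.
        underlay = _min_configured(global_physnet_mtu, path_mtu)
        return underlay - TUNNEL_OVERHEAD[network_type] if underlay else 0
    # Flat/VLAN: a per-physnet value can only narrow the global one.
    per_physnet = (physical_network_mtus or {}).get(physical_network, 0)
    return _min_configured(global_physnet_mtu, per_physnet)


print(segment_mtu("vxlan", global_physnet_mtu=9000, path_mtu=1500))
print(segment_mtu("vlan", "physnet1", 9000, physical_network_mtus={"physnet1": 1500}))
```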
