MTU concerns for the Linux bridge agent
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
neutron | Fix Released | High | Kevin Benton | |
Bug Description
I ran some experiments with the Linux bridge agent [1] to determine the source of MTU problems and offer a potential solution. The environment for these experiments contains the following items:
1) A physical (underlying) network supporting MTU of 1500 or 9000 bytes.
2) One controller node running the neutron server, Linux bridge agent, L3 agent, DHCP agent, and metadata agent.
3) One compute node running the Linux bridge agent.
4) A neutron provider/public network.
5) A neutron self-service/
6) A neutron router between the provider and self-service networks.
7) The self-service network uses the VXLAN protocol with IPv4 endpoints, which adds 50 bytes of overhead.
8) An instance on the self-service network with a floating IP address from an allocation pool on the provider network.
Background:
1. For tunnel interfaces, Linux automatically subtracts protocol overhead from the parent interface MTU. For example, if eth0 has a 1500 MTU, a VXLAN interface using it as a parent device has a 1450 MTU.
2. For bridge devices, Linux assumes a 1500 MTU and changes the MTU to the lowest MTU of any port on the bridge. For example, a bridge without ports has a 1500 MTU. If eth0 has a 9000 MTU and you add it as a port on the bridge, the bridge changes to a 9000 MTU. If eth1 has a 1500 MTU and you add it as a port on the bridge, the bridge changes to a 1500 MTU.
3. Only devices that operate at layer-3 can participate in path MTU discovery (PMTUD). Therefore, a change of MTU in a layer-2 device such as a bridge or veth pair causes that device to discard packets larger than the smallest MTU.
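The arithmetic in background items (1) and (2) can be sketched in a few lines of Python. This is only a model of the behavior described above, not kernel or neutron code; the function names `tunnel_mtu` and `bridge_mtu` are hypothetical, and the 50-byte overhead is the VXLAN-over-IPv4 figure from the environment list.

```python
from typing import Iterable

VXLAN_IPV4_OVERHEAD = 50  # bytes, as stated for VXLAN with IPv4 endpoints
DEFAULT_BRIDGE_MTU = 1500  # MTU of a bridge that has no ports


def tunnel_mtu(parent_mtu: int, overhead: int = VXLAN_IPV4_OVERHEAD) -> int:
    """MTU of a tunnel interface: parent MTU minus protocol overhead."""
    return parent_mtu - overhead


def bridge_mtu(port_mtus: Iterable[int]) -> int:
    """MTU of a bridge: lowest MTU of any port, or the default if empty."""
    ports = list(port_mtus)
    if not ports:
        return DEFAULT_BRIDGE_MTU
    return min(ports)


# Examples taken from the background items.
vxlan_on_eth0 = tunnel_mtu(1500)          # VXLAN interface on a 1500 parent
empty_bridge = bridge_mtu([])             # bridge without ports
jumbo_bridge = bridge_mtu([9000])         # after adding eth0 (9000)
mixed_bridge = bridge_mtu([9000, 1500])   # after also adding eth1 (1500)
```

With these definitions, a bridge holding only a VXLAN interface on a 1500-byte parent would come out at `bridge_mtu([tunnel_mtu(1500)])`, which is the 1450 value that appears in the observations.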
Observations:
1. For any physical network MTU, instances must use an MTU value that accounts for overlay protocol overhead. Neutron currently offers a way to provide a correct value via DHCP. However, that mechanism only addresses packets outbound from instances. The next two items address packets inbound to instances.
2. For any physical network MTU, each end of the veth pair between the self-service network router interface (qr) in the router namespace (qrouter) and the self-service network bridge on the controller node (qbr) contains a different MTU. The qr end has a 1500 MTU, the default value, and the qbr end has a 1450 MTU because the bridge contains a VXLAN interface with a 1450 MTU. Thus, the veth pair discards packets with a payload larger than 1450 bytes.
3. For a physical network MTU larger than 1500, each end of the veth pair between the provider network router gateway (qg) in the router namespace (qrouter) and the provider network bridge on the controller node (qbr) contains a different MTU. The qg end has a 1500 MTU, the default value, and the qbr end inherits the larger MTU of the physical network interface. Thus, the veth pair discards packets with a payload larger than 1500 bytes.
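Observations (2) and (3) both come down to a veth pair whose two ends disagree. The following Python sketch models that, assuming the pair passes only packets that fit the smaller of its two MTUs; the `VethPair` type and its field names are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class VethPair(NamedTuple):
    name: str
    namespace_end_mtu: int  # end inside the router namespace (qr or qg)
    bridge_end_mtu: int     # end attached to the bridge (qbr)


def is_mismatched(pair: VethPair) -> bool:
    """True when the two ends of the pair carry different MTUs."""
    return pair.namespace_end_mtu != pair.bridge_end_mtu


def largest_passing_size(pair: VethPair) -> int:
    """Largest size the pair passes; anything bigger is discarded."""
    return min(pair.namespace_end_mtu, pair.bridge_end_mtu)


# Observation (2): qr end at the 1500 default, bridge end at 1450 (VXLAN).
self_service = VethPair("qr", namespace_end_mtu=1500, bridge_end_mtu=1450)

# Observation (3): qg end at the 1500 default, bridge end at 9000.
provider_jumbo = VethPair("qg", namespace_end_mtu=1500, bridge_end_mtu=9000)

for pair in (self_service, provider_jumbo):
    print(pair.name, is_mismatched(pair), largest_passing_size(pair))
```

Both pairs are reported as mismatched, with limits of 1450 and 1500 bytes respectively, matching the discard thresholds in the two observations.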
Potential solution:
As per background item (3), MTU disparities must occur in a device that operates at layer-3, such as a router namespace that contains interfaces with IP addresses. We can accomplish this task in neutron by always using the same MTU on both ends of a veth pair. In observation item (2), both ends of the veth pair should use 1450, the self-service network MTU. In observation item (3), both ends of the veth pair should use 9000, the provider network MTU. If a packet from the provider network to the instance has a payload larger than 1450 bytes, the router can send an ICMP message to the source telling it to use a 1450 MTU.
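To illustrate why moving the MTU change into the router matters, here is a simplified Python model contrasting the two cases. It assumes IPv4 packets that may not be fragmented, and the function names and return values are hypothetical; it is not how neutron or the kernel implement forwarding.

```python
from typing import Optional, Tuple


def layer2_forward(packet_size: int, egress_mtu: int) -> str:
    """A bridge or veth pair: oversized packets are silently discarded."""
    return "forward" if packet_size <= egress_mtu else "drop"


def layer3_forward(packet_size: int, egress_mtu: int) -> Tuple[str, Optional[int]]:
    """A router: oversized packets trigger an ICMP reply carrying the
    egress MTU, so the source can retry with smaller packets (PMTUD)."""
    if packet_size <= egress_mtu:
        return ("forward", None)
    return ("icmp_fragmentation_needed", egress_mtu)


# A 1500-byte packet from the provider network heading to the instance.
packet_size = 1500

# Today: the MTU drops from 1500 to 1450 inside the veth pair (layer-2).
current = layer2_forward(packet_size, egress_mtu=1450)

# Proposed: both veth ends use 1450, so the drop from the provider-side
# MTU to 1450 happens inside the router namespace (layer-3).
proposed = layer3_forward(packet_size, egress_mtu=1450)

print(current)   # the source never learns why the packet vanished
print(proposed)  # the source is told to use a 1450 MTU
```

In the layer-2 case the sender gets no feedback, while in the layer-3 case the reply carries the 1450 value that the sender needs.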
[1] http://
Changed in neutron: | |
status: | Incomplete → Confirmed |
assignee: | nobody → Sean M. Collins (scollins) |
Changed in neutron: | |
importance: | Undecided → High |
tags: | added: ops |
tags: | added: usability |
Changed in neutron: | |
milestone: | none → mitaka-3 |
Changed in neutron: | |
assignee: | Sean M. Collins (scollins) → Matt Kassawara (ionosphere80) |
status: | Confirmed → In Progress |
Changed in neutron: | |
assignee: | Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton) |
Changed in neutron: | |
assignee: | Kevin Benton (kevinbenton) → Matt Kassawara (ionosphere80) |
Changed in neutron: | |
assignee: | Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton) |
Changed in neutron: | |
assignee: | Kevin Benton (kevinbenton) → Matt Kassawara (ionosphere80) |
Changed in neutron: | |
assignee: | Matt Kassawara (ionosphere80) → Kevin Benton (kevinbenton) |
tags: | added: deprecation |
Matt,
Thanks for the thorough discussion and analysis. It is much appreciated. But I am not sure what specific things you would like changed that go above and beyond the change in https://review.openstack.org/#/c/276411
Marking this 'Incomplete' for now, because I can't define specifically what actions would result in this bug being satisfied.
Thanks!