[OVN] packet drops when provider network MTU exceeds tenant network MTU

Bug #2032817 reported by Mohammed Naser
Affects: neutron
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

At the moment, we have a provider network running 1500 MTU and then some tenant networks running 1450 MTU.

We've noticed that southbound traffic from the provider network to the tenant network will result in packets being dropped since the MTU is smaller.

So, the VM would not be able to receive any traffic with a packet size > 1450, since the provider network operates at 1500 and hands 1500-byte packets to the tap interface, where they get dropped by the kernel:

[Wed Aug 23 14:33:19 2023] tapdf9341b0-6d: dropped over-mtu packet: 1456 > 1450
...

It seems that OVN should fragment packets when it notices that the MTU of the NAT target network is smaller than its own; otherwise the traffic goes nowhere.
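To illustrate the arithmetic behind the kernel log line above, a small sketch (the 1428-byte ping payload is an assumption chosen to reproduce the 1456-byte packet in the log; headers are plain IPv4):

```python
# Sketch: why a ping payload of 1428 bytes produces the 1456-byte
# packet seen in the kernel log (1456 > 1450).
IP_HEADER = 20    # IPv4 header, no options
ICMP_HEADER = 8   # ICMP echo header

def ip_packet_size(icmp_payload: int) -> int:
    """Total on-wire IP packet size for an ICMP echo with this payload."""
    return icmp_payload + ICMP_HEADER + IP_HEADER

tenant_mtu = 1450
payload = 1428            # e.g. `ping -s 1428`
size = ip_packet_size(payload)
print(size, size > tenant_mtu)  # 1456 True -> "dropped over-mtu packet: 1456 > 1450"
```

Any payload above 1422 bytes pushes the IP packet over the 1450-byte tenant MTU while still fitting the 1500-byte provider MTU.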

OVN: 23.03.0

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Note: the external (provider) network type is VLAN (as commented in IRC)

Changed in neutron:
importance: Undecided → High
Revision history for this message
Brian Haley (brian-haley) wrote :

I would have thought an ICMP "packet too big" would have been sent to the source?

Also, is this UDP or TCP traffic?

Revision history for this message
Mohammed Naser (mnaser) wrote :

I did not see that packet, even when running pings larger than 1450 bytes without DF set. So I observed this with UDP initially, then with ICMP.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

I can reproduce this by simply doing a large ping to an instance in devstack, geneve tenant network, so not related to vlan. Also no special provider network setup, default public network set up by devstack.

The OVN todo contains "MTU handling (fragmentation on output)" in https://github.com/ovn-org/ovn/blob/main/TODO.rst, so very likely something not implemented yet?

Changed in neutron:
status: New → Confirmed
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Further findings:

OVN also doesn't handle the DF bit, where OVS would generate an ICMP error response; it just tries to forward the packet, which then gets dutifully dropped by the kernel.

East-west traffic is affected as well, if two tenant networks with different MTUs are connected via a router, the same issue happens.

And just to double check I verified that in an OVS based deployment, both fragmentation and ICMP responses work as expected (and as mandated by the relevant RFCs).
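For reference, the ICMP error an RFC-compliant router sends for a DF-marked packet that exceeds the next-hop MTU is "Destination Unreachable / Fragmentation Needed" (type 3, code 4, RFC 1191). A minimal sketch of constructing that message body, just to document the format, not OVN or Neutron code:

```python
import struct

def icmp_checksum(data: bytes) -> int:
    # Standard Internet checksum (RFC 1071): fold the one's-complement
    # sum of 16-bit words, then complement.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def frag_needed(next_hop_mtu: int, original_datagram: bytes) -> bytes:
    """ICMP "Fragmentation Needed" message per RFC 1191: type 3, code 4,
    16 unused bits, next-hop MTU, then the original IP header plus the
    first 8 payload bytes of the dropped datagram."""
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    payload = original_datagram[:28]  # IP header (20) + 8 bytes
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu) + payload
```

The sender uses the next-hop MTU field (1450 in this bug's scenario) to lower its path MTU estimate, which is exactly the PMTUD step that never happens here.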

Revision history for this message
Brian Haley (brian-haley) wrote :

So the difference seems to be OVN doesn't handle the DF bit, something else for the gap document.

Jens - in the OVS case, did it generate a "packet too big" message, install a route, then stop fragmenting? Or did the OVS router actually fragment? Just wanted to make sure we document it right.

The reason I asked about the protocol is that UDP (and IP) isn't guaranteed to work when the MTU changes like this; it's technically up to the application to do path MTU discovery, detect the failure and back off [0]. Not saying we shouldn't try to address the issue...

[0] https://datatracker.ietf.org/doc/html/rfc5405#section-3.2

Revision history for this message
Mohammed Naser (mnaser) wrote :

I feel like in the ML2/OVS world, since we had network namespaces, the Linux kernel would fragment packets as part of handing them from the tap interface connected to the external network to the one connected to the internal network.

This logic is missing from OVN, I think.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

According to [1], OVN is honouring the DF bit. From the testing report:
"""
All works as expected according the following guidelines:
DF=0 --> the device will learn the real MTU and after first pkt, the traffic flows
DF=1 --> the router discard the traffic but the sender does not perform fragmentation so traffic does not flow
"""

Having said that, that test is done in the opposite direction: from the VM to an external network with smaller MTU. But that should work in both directions, N/S and E/W traffic.

Can you trace the packets and the DF flag reaching the router GW port?

Regards.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1833813#c26
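For anyone tracing as requested above: the DF bit lives in the IPv4 flags field. A small sketch to check it on captured header bytes (e.g. hex output from tcpdump at the router gateway port); the sample header bytes are assumptions for illustration:

```python
import struct

def df_set(ip_header: bytes) -> bool:
    """Return True if the Don't Fragment bit is set in an IPv4 header.
    DF is bit 0x4000 of the flags/fragment-offset word at offset 6."""
    flags_frag = struct.unpack("!H", ip_header[6:8])[0]
    return bool(flags_frag & 0x4000)

# Example: 20-byte header with total length 1456 and DF set.
hdr = bytes([0x45, 0x00, 0x05, 0xB0, 0x12, 0x34]) + b"\x40\x00" + b"\x00" * 12
print(df_set(hdr))  # True
```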

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

Did you enable "ovn_emit_need_to_frag" option? This is "False" by default.

Regards.

[1] https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/plugins/ml2/drivers/ovn/ovn_conf.py#L196
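For reference, enabling the option looks like this in the ML2 plugin configuration (the exact config file path varies per deployment):

```ini
[ovn]
ovn_emit_need_to_frag = True
```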

Revision history for this message
Mohammed Naser (mnaser) wrote :

Hi Rodolfo:

I enabled it, but it seems like it did not set the `gateway_mtu` to any of the existing networks on restart (or even on neutron-ovn-db-sync).

Did I miss something somewhere?

Thanks
Mohammed

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

For what it is worth, I hit a similar issue attempting to set up an OVN job for ironic. Except the packet size on the far side of OVN's internals seemed to end up being 1430 bytes, as opposed to the configured interface sizes and MTUs of 1400 bytes.

Aug 21 15:44:55 np0035005087 kernel: ovs-node-0i1: dropped over-mtu packet: 1430 > 1400

Keep in mind, this was:

Devstack-host http service -> host kernel -> ip route to OVN router (also, local)-> OVN router -> OVN tenant networking -> OVS tap interface bridge and bridges connecting to static vtep (so we can packet capture) (all configured at 1400 bytes, afaik).

I'm assuming it was silently dropped because the kernel had no IP interface to emit the packet on based on its origin, which actually makes sense, albeit annoying, and the packet ended up getting dropped after the OVN bits in the kernel networking handed off to the VM. I guess that is not *really* OVN's direct fault, but there does seem to be a lack of awareness of the destination MTU.

With that, it also looks like the back side of the OVN code plugged into br-ex is at 1500 bytes in my case, so it might be reasonable for it to assume it could send 1500 bytes, not being directly on the same network. Hope that makes sense and provides another mental data point.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Mohammed:

Unfortunately, this option only applies to new router ports. Enabling or disabling the option will not trigger any LRP update.

If needed, we can implement a new maintenance method that would be executed only once (when the Neutron API is restarted and the config loaded). This method could loop over the LRPs and apply or remove the mentioned "gateway_mtu" value depending on the config option.

Let me know if that method is something valuable and I'll work on it.

Regards.
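A rough sketch of what such a maintenance method might look like (all names here — the LRP representation, the option key handling, the function itself — are hypothetical illustrations, not actual Neutron code):

```python
# Hypothetical sketch: walk existing logical router ports once at
# startup and apply/remove "gateway_mtu" based on the config option.
def sync_gateway_mtu(lrps, emit_need_to_frag: bool, gateway_mtu: int = 1500):
    """Reconcile the gateway_mtu option on every LRP with the
    ovn_emit_need_to_frag setting. LRPs are modeled as dicts here."""
    for lrp in lrps:
        options = lrp.setdefault("options", {})
        if emit_need_to_frag:
            options["gateway_mtu"] = str(gateway_mtu)
        else:
            options.pop("gateway_mtu", None)
    return lrps

ports = [
    {"name": "lrp-1", "options": {}},
    {"name": "lrp-2", "options": {"gateway_mtu": "1500"}},
]
sync_gateway_mtu(ports, emit_need_to_frag=False)
# gateway_mtu removed from both ports
```

Running it once at API startup would cover pre-existing routers that the per-port hook misses.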

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/892839

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote : Re: OVN: Distributed FIP packet drops when provider network MTU exceeds tenant network MTU

> in the OVS case did it generate a packet too big, install a route, then stop fragmenting? Or did the OVS router actually fragment? Just wanted to make sure we document it right.

OVS does both, depending on whether the DF bit is set. OVN does neither in all tests that I have performed, I also haven't seen any change when enabling "ovn_emit_need_to_frag".

I have created https://review.opendev.org/c/openstack/neutron/+/892839 in order to at least document these deficiencies.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

NOTE: this is happening both in N/S and E/W traffic.

summary: - OVN: Distributed FIP packet drops when provider network MTU exceeds
- tenant network MTU
+ [OVN] packet drops when provider network MTU exceeds tenant network MTU
Revision history for this message
Mohammed Naser (mnaser) wrote :

Shouldn't we be raising the importance of this? I feel like this limitation makes OVN very broken.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

For the E/W use case I've opened a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2238494. I'll keep you updated if there is any private message that should be shared.

Regards.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/894620

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

I've opened another BZ for the case of N/S traffic when MTU external > MTU internal [1].

Regards.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2238969

Revision history for this message
Brian Haley (brian-haley) wrote :

I wonder if this other OVN issue is coming into play as well:

https://issues.redhat.com/browse/FDP-39

Wrong source IP address in ICMP "need fragmentation" message causes it to get dropped.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Brian Haley <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/894620
Reason: Use https://review.opendev.org/c/openstack/neutron/+/892839 instead

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/892839
Committed: https://opendev.org/openstack/neutron/commit/e4542bca8019b4f4e096ec33b296e1d2f59d2479
Submitter: "Zuul (22348)"
Branch: master

commit e4542bca8019b4f4e096ec33b296e1d2f59d2479
Author: Dr. Jens Harbott <email address hidden>
Date: Sun Aug 27 15:25:06 2023 +0200

    ovn: Document fragmentation / pmtud gaps

    OVN does not correctly fragment packets or send ICMP
    "packet too big" responses that would allow pmtud to work.

    Related-Bug: #2032817
    Change-Id: Ibc19ec6a9625124fb19e33c3bd6af40266aa5003
