SNAT/DNAT - Traffic sent to LRP port recirculate until TTL=0 (drop recirc action)

Bug #1976285 reported by Roberto Bartzen Acosta
This bug affects 1 person
Affects         Status   Importance   Assigned to   Milestone
openvswitch     New      Unknown
ovn (Ubuntu)    New      Undecided    Unassigned

Bug Description

Hey there,

I have been going through the docs quite extensively for references on how the SNAT and DNAT flows work, trying to understand the problem reported in the links below:

https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718
https://mail.openvswitch.org/pipermail/ovs-dev/2021-August/386720.html

I can see these same log messages "kernel: openvswitch: ovs-system: deferred action limit reached, drop recirc action" on gateway nodes in my OpenStack installation.
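
To gauge how often this happens on a gateway node, something like the following could be used to count the messages per datapath. This is only a minimal sketch: the function name `count_recirc_drops` is my own, and the regular expression assumes the log lines look exactly like the message quoted above.

```python
import re
from collections import Counter

# Assumption: the kernel log line has the same shape as the message quoted
# in this report, i.e. "openvswitch: <datapath>: deferred action limit ...".
PATTERN = re.compile(
    r"openvswitch: (?P<dp>\S+): deferred action limit reached, drop recirc action"
)


def count_recirc_drops(lines):
    """Count 'drop recirc action' messages per datapath name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("dp")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Reads kernel log text from stdin and prints one line per datapath.
    for datapath, hits in count_recirc_drops(sys.stdin).items():
        print(f"{datapath}: {hits}")
```

It reads from stdin, so the kernel log (for example the output of `dmesg` or `journalctl -k` saved to a file) can be piped into it.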

The main problem is that TCP/UDP traffic sent to the address of an LRP port, and not part of any SNAT/DNAT conversation, keeps recirculating in the OVS data plane until its TTL reaches 0.

The message appears in the kernel log because of the size of the FIFO (DEFERRED_ACTION_FIFO_SIZE), but this is only a consequence of the packets not matching the flow tables of the datapath. See net/openvswitch/actions.c in the kernel source.
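
To illustrate how I understand the relationship between the TTL loop and the FIFO limit, here is a toy model. It is not the kernel logic: the function name, the FIFO size of 10 and the rule that every recirculation takes one slot that is not freed while the same packet is still being processed are all assumptions made for the sake of the example.

```python
from typing import Tuple


def simulate_recirculation(initial_ttl: int, fifo_size: int) -> Tuple[int, str]:
    """Toy model of one looping packet (hypothetical, not the kernel code).

    Each pass through the router decrements the TTL and, by assumption,
    consumes one deferred-action slot that is not released until the
    packet is done. Returns (passes completed, reason the loop stopped).
    """
    used_slots = 0
    ttl = initial_ttl
    passes = 0
    while ttl > 0:
        if used_slots >= fifo_size:
            # Corresponds to the "drop recirc action" case in this model.
            return passes, "fifo_full"
        used_slots += 1
        ttl -= 1
        passes += 1
    return passes, "ttl_expired"


if __name__ == "__main__":
    # Assumed values: TTL of 64 and a FIFO of 10 slots.
    print(simulate_recirculation(initial_ttl=64, fifo_size=10))
    # A packet with a small TTL expires before the FIFO fills up.
    print(simulate_recirculation(initial_ttl=5, fifo_size=10))
```

In this model a packet with a large TTL always hits the FIFO limit first, which is why increasing the FIFO size would only hide the symptom instead of fixing the loop.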

I can reproduce this on a local OVN/OVS installation built from the OVN main/master branch with the OVS submodule (GitHub projects). The problem also occurs in all the latest released tags of OVN and OVS for Ubuntu 20.04 LTS.

Basically, it only happens when there is a SNAT rule that translates an entire network (masquerade) and the return traffic does not have an open port. If a DNAT is used for a specific host (even if the ports have not been mapped, as long as there is a 'host' to redirect the DNAT to), the traffic is forwarded and is not sent via netlink through the slow path until it is dropped.

The patch proposed by Krzysztof Klimonda aims to modify the flow table via OVN by inserting a drop rule for the traffic related to this issue. The patch was not accepted in the project, but it made me curious about how to solve this problem (I can't just increase the kernel DEFERRED_ACTION_FIFO_SIZE). The proposed patch is very old and does not apply to the current code structure. I tried to adapt it from ovn-northd.c to the new northd/northd.c layout and applied it to upstream, but the problem still occurs. ovn_upstream.txt[https://github.com/openvswitch/ovs-issues/files/8798982/ovn_upstream.txt]

I believe the patch does not solve the problem because I keep seeing messages in the log.

Do you have any ideas on how to solve this problem?

I am adding a reproducer for this issue in the attached file.
issue_reproducer.txt[https://github.com/openvswitch/ovs-issues/files/8798161/issue_reproducer.txt]

Kind regards,
Roberto

Changed in openvswitch:
status: Unknown → New