On high loaded systems conntrack table could be filled with ovn connection

Bug #1978806 reported by Nikita Koltsov
This bug affects 2 people
Affects                              | Status        | Importance | Assigned to         | Milestone
OpenStack Neutron Open vSwitch Charm | Fix Committed | Undecided  | Marcin Wilk         |
charm-ovn-chassis                    | Fix Released  | Undecided  | Mustafa Kemal Gilor |
charm-ovn-dedicated-chassis          | Fix Released  | Undecided  | Mustafa Kemal Gilor |

Bug Description

The conntrack kernel module, when loaded, tracks the state of all connections on the node.
For heavily loaded OVN networks this can fill the conntrack table and lead to packet loss.

It would be good to add an option to stop tracking ovn connections.
This could be achieved by adding the following iptables rules:
iptables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK
iptables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK
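
As a rough sketch of how a charm or deployment script might build these rules, the Python below generates the same two command lines for a list of UDP ports. The function name `notrack_rules` and its parameters are hypothetical and not taken from any charm; the snippet only builds argument lists and does not run iptables.

```python
import shlex

GENEVE_PORT = 6081  # UDP port used in the rules above


def notrack_rules(ports, chains=("PREROUTING", "OUTPUT")):
    """Return iptables argv lists that exempt UDP traffic to `ports` from conntrack."""
    rules = []
    for port in ports:
        for chain in chains:
            rules.append([
                "iptables", "-t", "raw", "-A", chain,
                "-p", "udp", "--dport", str(port),
                "-j", "NOTRACK",
            ])
    return rules


if __name__ == "__main__":
    # Print the commands instead of executing them (running them needs root).
    for argv in notrack_rules([GENEVE_PORT]):
        print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps the caller free to pass them to `subprocess.run` without quoting issues. On current iptables, `-j NOTRACK` is documented as equivalent to `-j CT --notrack`.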

Trent Lloyd (lathiat) wrote :

The problem here is that because OVN loads the nf_conntrack module - it starts tracking connections for all interfaces on the host even though there are 0 iptables/nftables rules. These connections thus take up space in the tables that is needed by OVN.
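
To see how much of the table is in use on a given host, something like the following could be used. It is a minimal sketch that assumes the standard `nf_conntrack_count` and `nf_conntrack_max` sysctl files under `/proc/sys/net/netfilter`; the helper name `conntrack_usage` is made up for illustration.

```python
from pathlib import Path

PROC_NETFILTER = Path("/proc/sys/net/netfilter")


def conntrack_usage(base=PROC_NETFILTER):
    """Return (count, maximum, fraction_used) read from the nf_conntrack sysctls."""
    count = int((base / "nf_conntrack_count").read_text().strip())
    maximum = int((base / "nf_conntrack_max").read_text().strip())
    return count, maximum, count / maximum


if __name__ == "__main__":
    try:
        count, maximum, used = conntrack_usage()
    except OSError:
        # The files are absent when nf_conntrack is not loaded.
        print("nf_conntrack sysctls are not readable on this host")
    else:
        print(f"{count}/{maximum} entries ({used:.1%} of nf_conntrack_max)")
```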

For ordinary host connections this is not normally a big problem. However, both GENEVE and VXLAN overlay packets have their SOURCE port set to varying ports in the range 1-65534.

A static port is assigned to each "tunneled flow" by hashing the tunneled packet's (src_ip, dst_ip, src_port, dst_port) tuple. The intent is that GENEVE/VXLAN packets get distributed over the multiple links of an LACP or bonded link, while any one tunneled flow always has the same port and thus takes the same path, preserving network packet ordering. If they used the same source/destination port, all traffic between the same 2 HVs would not get distributed over LACP; if the port were random, packet order would not be preserved, as packets of the same connection could go over multiple links.
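
The idea can be illustrated with a toy hash. This is only a sketch: real datapaths use their own flow hash, and the CRC32 choice, the function name `tunnel_source_port` and the sample addresses are assumptions made for the example. The port range is the one quoted above.

```python
import zlib

PORT_MIN, PORT_MAX = 1, 65534  # range quoted in the comment above


def tunnel_source_port(src_ip, dst_ip, src_port, dst_port):
    """Map an inner flow's tuple to a stable outer UDP source port."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    span = PORT_MAX - PORT_MIN + 1
    return PORT_MIN + zlib.crc32(key) % span


if __name__ == "__main__":
    flow_a = ("10.1.0.5", "10.1.0.9", 40000, 443)
    flow_b = ("10.1.0.5", "10.1.0.9", 40001, 443)
    # The same inner flow always yields the same outer source port.
    print(tunnel_source_port(*flow_a), tunnel_source_port(*flow_a))
    # A different inner flow usually lands on a different port.
    print(tunnel_source_port(*flow_b))
```

From conntrack's point of view each distinct outer source port is a separate UDP "connection", which is why many inner flows between the same two hosts turn into many table entries.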

Reference: https://datatracker.ietf.org/doc/html/draft-gross-geneve-01/#section-3.3

Thus on a very busy cloud each hypervisor can potentially hold (NUMBER_OF_HYPERVISORS - 1) * 65534 conntrack entries for tunnel traffic alone, i.e. up to 65534 source ports for each peer hypervisor. For 40 hypervisors that is about 2.6 million entries. The default nf_conntrack_max in ovn-chassis is 1,000,000.
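
A back-of-the-envelope check of the worst case for a single hypervisor, assuming one conntrack entry per (peer hypervisor, source port) pair and a full mesh of tunnels. The function name and constants are illustrative; the port count and the nf_conntrack_max value are the ones quoted above.

```python
PORTS_PER_PEER = 65534        # source ports 1-65534, as stated above
NF_CONNTRACK_MAX = 1_000_000  # ovn-chassis default quoted above


def worst_case_entries(hypervisors, ports_per_peer=PORTS_PER_PEER):
    """Upper bound on tunnel conntrack entries seen by one hypervisor."""
    peers = hypervisors - 1  # a hypervisor does not tunnel to itself
    return peers * ports_per_peer


if __name__ == "__main__":
    bound = worst_case_entries(40)
    # prints "2,555,826 entries, 2.56x nf_conntrack_max"
    print(f"{bound:,} entries, {bound / NF_CONNTRACK_MAX:.2f}x nf_conntrack_max")
```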

In practice in production we saw hundreds of thousands of conntrack entries normally and in excess of 1 million in some cases leading to dropped connections as the nf_conntrack_max was exceeded. This is notably worse in a public internet facing application which speaks to many different remote Internet IPs as well.
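
To check how many entries are tunnel traffic on a given node, one could count entries per destination port. The sketch below assumes input lines in the format of `/proc/net/nf_conntrack` (the output of `conntrack -L` is similar); the sample lines are made up and the helper name `count_by_dport` is hypothetical.

```python
import re
from collections import Counter

DPORT_RE = re.compile(r"\bdport=(\d+)")


def count_by_dport(lines):
    """Count conntrack entries by the destination port of the original direction."""
    counts = Counter()
    for line in lines:
        match = DPORT_RE.search(line)  # first dport= belongs to the original tuple
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample: two GENEVE entries (dport 6081) and one SSH connection.
SAMPLE = """\
ipv4     2 udp      17 25 src=10.0.0.1 dst=10.0.0.2 sport=41822 dport=6081 [UNREPLIED] src=10.0.0.2 dst=10.0.0.1 sport=6081 dport=41822 mark=0 zone=0 use=2
ipv4     2 udp      17 28 src=10.0.0.1 dst=10.0.0.2 sport=50311 dport=6081 [UNREPLIED] src=10.0.0.2 dst=10.0.0.1 sport=6081 dport=50311 mark=0 zone=0 use=2
ipv4     2 tcp      6 431999 ESTABLISHED src=10.0.0.1 dst=10.0.0.3 sport=51514 dport=22 src=10.0.0.3 dst=10.0.0.1 sport=22 dport=51514 [ASSURED] mark=0 zone=0 use=2
"""

if __name__ == "__main__":
    print(count_by_dport(SAMPLE.splitlines()).most_common())  # [(6081, 2), (22, 1)]
```

A large count for port 6081 (or the VXLAN port in non-OVN deployments) relative to everything else would suggest that overlay traffic is what is filling the table.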

There is precedent for this change in other projects:
https://bugs.launchpad.net/tripleo/+bug/1885551
https://github.com/antrea-io/antrea/issues/1133
https://bugzilla.redhat.com/show_bug.cgi?id=1985336

Changed in charm-ovn-chassis:
status: New → Confirmed
tags: added: sts
Changed in charm-ovn-chassis:
assignee: nobody → Mustafa Kemal Gilor (mustafakemalgilor)
Changed in charm-ovn-chassis:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/854607

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ovn-chassis (master)

Change abandoned by "Mustafa Kemal Gilor <email address hidden>" on branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/854607
Reason: Will re-target the changes to base charm & submit a simpler patch.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Mustafa Kemal Gilor <email address hidden>" on branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/854607
Reason: Superseded, see: https://github.com/openstack-charmers/charm-layer-ovn/pull/67

Changed in charm-ovn-dedicated-chassis:
status: New → In Progress
assignee: nobody → Mustafa Kemal Gilor (mustafakemalgilor)
Frode Nordahl (fnordahl)
Changed in charm-ovn-chassis:
status: In Progress → Fix Committed
Changed in charm-ovn-dedicated-chassis:
status: In Progress → Fix Committed
Frode Nordahl (fnordahl) wrote :

The fix is now available in the 22.03/stable channel.

Changed in charm-ovn-chassis:
status: Fix Committed → Fix Released
Changed in charm-ovn-dedicated-chassis:
status: Fix Committed → Fix Released
Edward Hope-Morley (hopem) wrote :

Also affects neutron-openvswitch charm for non-ovn deployments (vxlan/gre)

Brian Haley (brian-haley) wrote :

Just wanted to put a link to the change that was merged to fix this for OVN.

https://github.com/openstack-charmers/charm-layer-ovn/pull/67

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (master)
Changed in charm-neutron-openvswitch:
status: New → In Progress
Marcin Wilk (wilkmarcin)
Changed in charm-neutron-openvswitch:
assignee: nobody → Marcin Wilk (wilkmarcin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/929821
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/d6d657097373edd5e4b981849e362bdc23386cce
Submitter: "Zuul (22348)"
Branch: master

commit d6d657097373edd5e4b981849e362bdc23386cce
Author: Marcin Wilk <email address hidden>
Date: Wed Sep 18 17:51:22 2024 +0200

    iptables rules for not tracking GRE/VXLAN traffic

    On some busy cloud deployments, it has been reported that the nodes
    hosting the neutron-openvswitch are getting their nf-conntrack tables
    full and starting to drop connections. The reason is that GRE/VXLAN
    use source port randomization, and the number of "unique" flows is high
    from the nf-conntrack's perspective. This is no surprise since flows
    are usually identified with their 5-tuple (srcip/[srcport]/dstip/
    dstport/tproto) by network elements, and GRE/VXLAN are leveraging this
    fact to have an even distribution in load-balancing systems in between
    [1][2]. The randomization causes the nf_conntrack table to be filled
    with many GRE/VXLAN-related flows, eventually leading to connection
    drops in a busy environment. As there is no particular reason and
    benefit to track these flows at the moment, the solution is to exclude
    GRE/VXLAN traffic from nf-conntrack tracking. This can be done by
    putting rules with `-j NOTRACK` jump into relevant iptables chains,
    which many people already use as a solution to this problem.

    This change incorporates the relevant rules to the charm code, so the
    rules become present by default.

    [1] https://www.rfc-editor.org/rfc/rfc8086.html#section-3.2
    [2] https://www.rfc-editor.org/rfc/rfc7348.html#section-5

    Closes-bug: #1978806
    Change-Id: I9f6c7ca5207a3d587cc9cc2995d9938921ad88f1
    Signed-off-by: Marcin Wilk <email address hidden>

Changed in charm-neutron-openvswitch:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/2024.1)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/2023.2)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/2023.1)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/zed)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/948505
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/f02fa8ddd8ad2be3cea170f5d3a09834f2de0f65
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit f02fa8ddd8ad2be3cea170f5d3a09834f2de0f65
Author: Marcin Wilk <email address hidden>
Date: Wed Sep 18 17:51:22 2024 +0200

    iptables rules for not tracking GRE/VXLAN traffic

    (cherry picked from commit d6d657097373edd5e4b981849e362bdc23386cce)

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/948506
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/85dc6ea58daf890e699a07f0c5cd7b5e19a4df9c
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 85dc6ea58daf890e699a07f0c5cd7b5e19a4df9c
Author: Marcin Wilk <email address hidden>
Date: Wed Sep 18 17:51:22 2024 +0200

    iptables rules for not tracking GRE/VXLAN traffic

    (cherry picked from commit d6d657097373edd5e4b981849e362bdc23386cce)

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/948507
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/95dd343e3d43a6929e8f7b6285ecd28197894bd0
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 95dd343e3d43a6929e8f7b6285ecd28197894bd0
Author: Marcin Wilk <email address hidden>
Date: Wed Sep 18 17:51:22 2024 +0200

    iptables rules for not tracking GRE/VXLAN traffic

    (cherry picked from commit d6d657097373edd5e4b981849e362bdc23386cce)

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/948508
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/460be5d420ea800950e249e631a5bebf9c3f249b
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 460be5d420ea800950e249e631a5bebf9c3f249b
Author: Marcin Wilk <email address hidden>
Date: Wed Sep 18 17:51:22 2024 +0200

    iptables rules for not tracking GRE/VXLAN traffic

    (cherry picked from commit d6d657097373edd5e4b981849e362bdc23386cce)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/ussuri)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/953855
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/2b0fa9befa9619b165040580730751b3451baf7e
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 2b0fa9befa9619b165040580730751b3451baf7e
Author: Marcin Wilk <email address hidden>
Date: Wed Sep 18 17:51:22 2024 +0200

    iptables rules for not tracking GRE/VXLAN traffic


tags: added: in-stable-yoga