[ML2/OVN] DGP/Floating IP issue - no flows for chassis gateway port

Bug #2035281 reported by Roberto Bartzen Acosta
This bug affects 5 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Roberto Bartzen Acosta

Bug Description

Hello everyone.

I noticed a problem with the DGP (Distributed Gateway Port) feature when it is configured by OpenStack Neutron on an external (provider) network with multiple subnets.

For example, the OpenStack external provider network has multiple subnets, such as:

subnet1: 172.16.10.0/24
subnet2: 172.16.20.0/24

When the Logical Router attaches its external gateway port to this network, only one subnet is configured on the port (statically or dynamically), e.g. IP address = 172.16.10.1/24.

If the Floating IP assigned to a VM uses the same subnet range as the router's external IP, the dnat_and_snat rule is created correctly and inbound/outbound traffic works. However, when the Floating IP uses the other subnet (not the one configured on the router's external port), the dnat_and_snat flows are not created and a warning like the one below shows up in the log:

2023-09-08T13:29:40.721Z|00202|northd|WARN|Unable to determine gateway_port for NAT with external_ip: 172.16.20.157 configured on logical router: neutron-477cf920-21e3-46e5-8c8f-7b8caef7f549 with multiple distributed gateway ports

This problem occurs because Neutron does not set the "gateway_port" column in the OVN NAT rule. In that case, ovn-northd [1] tries to determine the gateway port automatically from the NAT rule's external IP and the external networks configured on the OVN logical router; when the external IP does not belong to a subnet configured on the gateway port, the lookup fails and the NAT flows are skipped. This behaviour was introduced by commits [1][2] and affects the ML2/OVN backend since OVN version 21.09.

This problem was discussed on the ovs-discuss mailing list [3], but technically it seems to me that a change in the CMS is required to guarantee the creation of FIP flows without having to rely on OVN to automatically discover the gateway port.

If the router is using a Distributed Gateway Port, the FIP created by Neutron will not work due to the lack of OpenFlow flows for the gateway port:

Before setting the gateway_port:

ovn-nbctl lr-nat-list 078fd69b-f4c7-4469-a900-918d0a229bd1
TYPE GATEWAY_PORT EXTERNAL_IP EXTERNAL_PORT LOGICAL_IP EXTERNAL_MAC LOGICAL_PORT
dnat_and_snat 172.16.20.10 10.0.0.232
snat 172.16.10.41 10.0.0.0/24

ovn-sbctl lflow-list | grep 172.16.20.10
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "admin-rt1-tenant1"; output; }; outport = "_MC_flood_l2"; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "1cda494c-4e86-4941-9680-b949341b12a5"; output; }; outport = "_MC_flood_l2"; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "bdf0ad70-8677-4340-b5ec-f26af6575e5e"; output; }; outport = "_MC_flood_l2"; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "e77e522c-5170-4566-a7b5-1b6ef9f88000"; output; }; outport = "_MC_flood_l2"; output;)
  table=3 (lr_in_ip_input ), priority=90 , match=(arp.op == 1 && arp.tpa == 172.16.20.10), action=(eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "bb89ed8d-a60d-4a9d-8210-205770490180"; output; }; outport = "_MC_flood_l2"; output;)
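
For reference, the gateway_port column was filled in manually for the second test using ovn-nbctl's generic database commands. This is only a sketch, assuming an OVN version whose NB schema has the NAT gateway_port column; <nat-rule-uuid> and <lrp-uuid> are placeholders for the UUIDs returned by the two find commands:

# Look up the NAT record for the FIP and the distributed gateway port row.
ovn-nbctl --bare --columns=_uuid find NAT external_ip=172.16.20.10
ovn-nbctl --bare --columns=_uuid find Logical_Router_Port name=lrp-bb89ed8d-a60d-4a9d-8210-205770490180
# Point the NAT rule at the gateway port so northd no longer has to guess it.
ovn-nbctl set NAT <nat-rule-uuid> gateway_port=<lrp-uuid>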

After setting the gateway_port:
ovn-nbctl lr-nat-list 078fd69b-f4c7-4469-a900-918d0a229bd1
TYPE GATEWAY_PORT EXTERNAL_IP EXTERNAL_PORT LOGICAL_IP EXTERNAL_MAC LOGICAL_PORT
dnat_and_snat lrp-bb89ed8d-a60d- 172.16.20.10 10.0.0.232
snat 172.16.10.41 10.0.0.0/24

ovn-sbctl lflow-list | grep 172.16.20.10
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "admin-rt1-tenant1"; output; }; outport = "_MC_flood_l2"; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "1cda494c-4e86-4941-9680-b949341b12a5"; output; }; outport = "_MC_flood_l2"; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "bdf0ad70-8677-4340-b5ec-f26af6575e5e"; output; }; outport = "_MC_flood_l2"; output;)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "e77e522c-5170-4566-a7b5-1b6ef9f88000"; output; }; outport = "_MC_flood_l2"; output;)
  table=3 (lr_in_ip_input ), priority=92 , match=(inport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && arp.op == 1 && arp.tpa == 172.16.20.10 && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;)
  table=3 (lr_in_ip_input ), priority=91 , match=(inport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && arp.op == 1 && arp.tpa == 172.16.20.10), action=(drop;)
  table=3 (lr_in_ip_input ), priority=90 , match=(arp.op == 1 && arp.tpa == 172.16.20.10), action=(eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;)
  table=4 (lr_in_unsnat ), priority=100 , match=(ip && ip4.dst == 172.16.20.10 && inport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && flags.loopback == 0 && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(ct_snat_in_czone;)
  table=4 (lr_in_unsnat ), priority=100 , match=(ip && ip4.dst == 172.16.20.10 && inport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && flags.loopback == 1 && flags.use_snat_zone == 1 && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(ct_snat;)
  table=7 (lr_in_dnat ), priority=100 , match=(ip && ip4.dst == 172.16.20.10 && inport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(ct_dnat_in_czone(10.0.0.232);)
  table=17(lr_in_arp_resolve ), priority=150 , match=(inport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && outport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && ip4.dst == 172.16.20.10), action=(drop;)
  table=17(lr_in_arp_resolve ), priority=100 , match=(outport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && reg0 == 172.16.20.10), action=(eth.dst = fa:16:3e:dc:fb:47; next;)
  table=0 (lr_out_chk_dnat_local), priority=50 , match=(ip && ip4.dst == 172.16.20.10 && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(reg9[4] = 1; next;)
  table=3 (lr_out_snat ), priority=162 , match=(ip && ip4.src == 10.0.0.232 && outport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180") && reg9[4] == 1), action=(reg9[4] = 0; ct_snat(172.16.20.10);)
  table=3 (lr_out_snat ), priority=161 , match=(ip && ip4.src == 10.0.0.232 && outport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(ct_snat_in_czone(172.16.20.10);)
  table=5 (lr_out_egr_loop ), priority=100 , match=(ip4.dst == 172.16.20.10 && outport == "lrp-bb89ed8d-a60d-4a9d-8210-205770490180" && is_chassis_resident("cr-lrp-bb89ed8d-a60d-4a9d-8210-205770490180")), action=(clone { ct_clear; inport = outport; outport = ""; eth.dst <-> eth.src; flags = 0; flags.loopback = 1; flags.use_snat_zone = reg9[4]; reg0 = 0; reg1 = 0; reg2 = 0; reg3 = 0; reg4 = 0; reg5 = 0; reg6 = 0; reg7 = 0; reg8 = 0; reg9 = 0; reg9[0] = 1; next(pipeline=ingress, table=0); };)
  table=25(ls_in_l2_lkup ), priority=80 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == 172.16.20.10), action=(clone {outport = "bb89ed8d-a60d-4a9d-8210-205770490180"; output; }; outport = "_MC_flood_l2"; output;)

Basically, after change [2], OVN only supports operating without the NAT rule's gateway_port set in very restricted use cases, so Neutron ends up being exposed to different OVN architectures (using DGP, for example). If Neutron set the gateway_port parameter when creating the NAT rule it would work; the issue is that Neutron has never set it before, and the automatic gateway port discovery may break FIP functionality.

My suggestion is that Neutron set this column when creating/updating FIP rules (if the OVN backend supports the column in the Northbound DB schema), and that the previously existing dnat_and_snat entries (FIPs) be updated to set the gateway_port in the maintenance task (run once).
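
On the Neutron side, something along these lines could work. This is a rough sketch only, not the actual patch: set_fip_gateway_port and nb_idl are illustrative names, and it assumes ovsdbapp's generic lookup/db_set commands plus a schema check on the NAT table:

def set_fip_gateway_port(nb_idl, nat_uuid, gw_lrp_name):
    # Skip on older NB schemas that do not expose the NAT gateway_port column.
    if 'gateway_port' not in nb_idl.tables['NAT'].columns:
        return
    # Resolve the router's distributed gateway port and reference it from the
    # dnat_and_snat rule so northd does not have to discover it automatically.
    gw_lrp = nb_idl.lookup('Logical_Router_Port', gw_lrp_name)
    nb_idl.db_set('NAT', nat_uuid,
                  ('gateway_port', gw_lrp.uuid)).execute(check_error=True)

The one-time maintenance task could then iterate over existing dnat_and_snat rows with an empty gateway_port and apply the same update.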

[1] https://github.com/ovn-org/ovn/commit/15348b7b806f7a9680606c3e9348708980129949
[2] https://github.com/ovn-org/ovn/commit/2d942be7db1799f2778492331513ae2b5a556b92
[3] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-September/052655.html

Tags: ovn
Changed in neutron:
assignee: nobody → Roberto Bartzen Acosta (rbartzen)
Miro Tomaska (mtomaska)
tags: added: ovn
Revision history for this message
Miro Tomaska (mtomaska) wrote :

Thank you for the bug, Roberto. I would categorize this as a feature request for Neutron. Does the OVN change break any existing functionality?

Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Roberto Bartzen Acosta (rbartzen) wrote :

Hi Miro, including the gateway_port column in the OVN NAT rule does not break any functionality.
Thanks

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/895260

Changed in neutron:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/895260
Committed: https://opendev.org/openstack/neutron/commit/78b5fe2ff417e176a3ccc356d9428287a0ee3c64
Submitter: "Zuul (22348)"
Branch: master

commit 78b5fe2ff417e176a3ccc356d9428287a0ee3c64
Author: Roberto Bartzen Acosta <email address hidden>
Date: Fri Sep 15 08:44:52 2023 -0300

    [ML2/OVN] Add gateway_port support for FIP

    OVN changed its support for NAT rules, adding a new gateway_port column and auto-discovery logic (which may not work in some cases) [1][2].
    If the OVN backend supports this column in the Northbound DB schema, set the gateway port UUID on every floating IP NAT rule to prevent North/South traffic issues for floating IPs.

    This patch updates the method that creates FIP NAT rules in the OVN backend and updates previously created FIP rules to include the gateway_port reference. This NAT rule update runs only once, during the maintenance task, and if all entries are already configured no action is performed.

    [1] https://github.com/ovn-org/ovn/commit/15348b7b806f7a9680606c3e9348708980129949
    [2] https://github.com/ovn-org/ovn/commit/2d942be7db1799f2778492331513ae2b5a556b92

    Closes-Bug: 2035281
    Change-Id: I802b6bd8c281cb6dacdee2e9c15285f069d4e04c

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0b1

This issue was fixed in the openstack/neutron 24.0.0.0b1 development milestone.

Revision history for this message
alisafari (alisafar1212) wrote :

Is it possible to backport this to Antelope?
