RFE: add possibility of advertising entire tenant subnet prefixes instead of single-host routes

Bug #2017886 reported by Luis Tomas Bolivar
Affects: ovn-bgp-agent
Status: In Progress
Importance: High
Assigned to: Unassigned

Bug Description

ovn-bgp-agent currently advertises VM addresses on tenant networks in BGP as single-host routes (/32 for IPv4 and /128 for IPv6).

This is the case both for the BGP driver (if expose_tenant_networks=True) and for the EVPN driver (if the router's port on the tenant networks has the neutron_bgpvpn:{as,vni} annotations).

This is rather inefficient, as it creates a large number of advertisements (e.g., 192.0.2.1/32, .2/32, .3/32 and so on) that could easily be aggregated into a single advertisement (192.0.2.0/24). This ought to work equally well, considering that all the traffic to the tenant networks needs to pass through the cr-lrp port (gateway chassis) anyway.
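The aggregation described above can be illustrated with Python's ipaddress module (a minimal sketch; this is not the agent's actual code, just the arithmetic it would rely on):

```python
import ipaddress

# 256 single-host routes, one per address in 192.0.2.0/24, as the agent
# would advertise them today
host_routes = [ipaddress.ip_network(f"192.0.2.{i}/32") for i in range(256)]

# collapse_addresses merges contiguous prefixes into the fewest covering networks
aggregated = list(ipaddress.collapse_addresses(host_routes))
print(aggregated)  # [IPv4Network('192.0.2.0/24')]
```

256 separate announcements collapse into a single covering prefix, which is exactly the reduction the RFE asks for.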

If the feature [1] is implemented, it is obviously necessary to advertise single-host routes directly from the compute nodes in order to bypass the gateway chassis. That said, having the ability to advertise the subnet prefixes would still be valuable in certain situations:

One example of such a situation would be two VMs (say 192.0.2.10 and 192.0.2.11) maintaining a failover address (192.0.2.234) using Keepalived. Since the Keepalived failover address is not known to Neutron (beyond possibly being present in the VM port's allowed_address_pairs), it cannot be advertised directly from the compute nodes. But even though 192.0.2.10 and .11 are advertised directly from the compute nodes hosting the VMs in question, bypassing the gateway chassis, the failover address 192.0.2.234 could in theory still be reached through the less specific route 192.0.2.0/24 via the cr-lrp port, if that route were also injected into BGP.
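The longest-prefix-match behaviour this relies on can be sketched as follows (the routing table contents and next-hop names here are hypothetical, mirroring the example addresses above):

```python
import ipaddress

# Hypothetical RIB: host routes from the compute nodes plus the subnet
# prefix advertised from the gateway chassis (cr-lrp)
rib = {
    ipaddress.ip_network("192.0.2.10/32"): "compute-1",
    ipaddress.ip_network("192.0.2.11/32"): "compute-2",
    ipaddress.ip_network("192.0.2.0/24"): "cr-lrp",
}

def lookup(dst):
    """Longest-prefix match: the most specific route covering dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in rib if addr in net]
    return rib[max(matches, key=lambda n: n.prefixlen)] if matches else None

print(lookup("192.0.2.10"))   # host route wins -> compute-1
print(lookup("192.0.2.234"))  # only the /24 covers it -> cr-lrp
```

The failover address has no host route of its own, so traffic to it falls back to the less specific /24 via the gateway chassis, while traffic to the VMs' own addresses keeps taking the direct compute-node path.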

[1] https://bugs.launchpad.net/ovn-bgp-agent/+bug/2017885

Changed in ovn-bgp-agent:
importance: Undecided → Medium
importance: Medium → Wishlist
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ovn-bgp-agent (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/907057

Revision history for this message
Freerk-Ole Zakfeld (su-freerk) wrote :

I also think this is crucial for IPv6. In a traditional deployment, the neutron-dynamic-routing BGP speaker announces the tenant's GUA prefix as a whole /64. Floating IPs are somewhat limited in number, but individual IPv6 addresses are everywhere (one per instance), which would result in a lot of specific routes across the fabric.
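The scale argument is easy to quantify (a sketch with a hypothetical documentation prefix and instance count):

```python
import ipaddress

# Hypothetical tenant GUA prefix and per-instance addresses
prefix = ipaddress.ip_network("2001:db8:1:2::/64")
instances = [ipaddress.ip_network(f"2001:db8:1:2::{i:x}/128") for i in range(1, 101)]

# Every per-instance /128 host route is already covered by the single /64,
# so one aggregate announcement replaces all of the specific ones
assert all(host.subnet_of(prefix) for host in instances)
print(f"{len(instances)} host routes vs 1 aggregate: {prefix}")
```

With one /128 per instance the route count grows linearly with the number of VMs, while the /64 announcement stays constant regardless of how many instances the tenant launches.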

Luis Tomas (luis5tb)
Changed in ovn-bgp-agent:
status: New → Confirmed
Changed in ovn-bgp-agent:
importance: Wishlist → High
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ovn-bgp-agent (master)

Reviewed: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/907057
Committed: https://opendev.org/openstack/ovn-bgp-agent/commit/5da36a2638499d1fcb222385019ae4304b18e218
Submitter: "Zuul (22348)"
Branch: master

commit 5da36a2638499d1fcb222385019ae4304b18e218
Author: Michel Nederlof <email address hidden>
Date: Wed Jan 24 11:56:37 2024 +0100

    Disable exposing remote_ips, when only the lrp prefix is sufficient

    This also requires to use redist kernel in FRR, so there is a change
    here which allows to define the default redistribute options in the FRR
    template.

    Since now this method is now available, the separate KERNEL_LEAK template
    can be removed, as the only difference was the redist kernel, instead of
    redist connected.

    Related-Bug: #2017886
    Change-Id: I570d8c482f3d17d63d66699e402c84dc61787638

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ovn-bgp-agent (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/910303

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ovn-bgp-agent (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/910303
Committed: https://opendev.org/openstack/ovn-bgp-agent/commit/590880838c582e08e1f088eaed541295b7ba65c3
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 590880838c582e08e1f088eaed541295b7ba65c3
Author: Michel Nederlof <email address hidden>
Date: Wed Jan 24 11:56:37 2024 +0100

    Disable exposing remote_ips, when only the lrp prefix is sufficient

    This also requires to use redist kernel in FRR, so there is a change
    here which allows to define the default redistribute options in the FRR
    template.

    Since now this method is now available, the separate KERNEL_LEAK template
    can be removed, as the only difference was the redist kernel, instead of
    redist connected.

    Related-Bug: #2017886
    Change-Id: I570d8c482f3d17d63d66699e402c84dc61787638
    (cherry picked from commit 5da36a2638499d1fcb222385019ae4304b18e218)

Revision history for this message
Jay Jahns (jayjahns) wrote :

I don't think this is working. When I configure a router with SNAT disabled, the subnet connected southbound of the router is not being exposed correctly.

Here is what I observe:

A route is added to br-ex connecting my subnet (192.168.0.0/24) via the router gateway (10.0.0.10). Redistribute kernel is enabled, yet no routes in the FRR routing table are being announced.

I believe that with SNAT disabled, we need to wire up the subnet's port (192.168.0.1/24) so that FRR sees a directly connected route for 192.168.0.0/24.

In the code, I see that we detect 192.168.0.1/24 and convert it to 192.168.0.0/24 so it can have a route added with a next hop being the router gateway. For disabled SNAT I believe this to be wrong, as the purpose of a disabled SNAT is to eliminate that next hop.

What I believe we should do is make the subnet's gateway (192.168.0.1/24) appear as directly connected. That would allow the 192.168.0.0/24 route to be announced via BGP.
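The .1-versus-.0 distinction being debated here is the difference between an interface address and its covering network (addresses taken from the report; this sketch only shows the derivation, not the agent's code):

```python
import ipaddress

# The router port address as seen by the agent
port = ipaddress.ip_interface("192.168.0.1/24")

# What the code derives today: the subnet prefix, exposed as a route via a next hop
subnet_route = port.network
print(subnet_route)        # 192.168.0.0/24

# The commenter's proposal: keep the port address itself, so the prefix would
# show up as directly connected (and be picked up by redistribute connected)
connected = port.with_prefixlen
print(connected)           # 192.168.0.1/24
```

Both forms imply the same /24, but FRR treats them differently: a kernel route via a next hop needs redistribute kernel, whereas an address configured on an interface surfaces through redistribute connected.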

Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

With redistribute connected we used to add .1 instead of .0, and for the EVPN work Michel changed it to .0 for redistribute kernel. Perhaps we need to add both? I think he was having problems with the resync action due to a mismatch between the BGP route and the kernel-learned one. Perhaps we should fall back to .1 and handle the mismatch in the resync instead.

Revision history for this message
Jay Jahns (jayjahns) wrote :

Even with redistribute kernel, it's not being announced. Either way, with SNAT turned off, we're still adding a route for that network (192.168.0.0/24) with the logical router port as next hop.

I believe we need to expose the port/IP connecting the tenant subnet to the logical router in a disabled-SNAT configuration. Also, I wonder whether having SNAT enabled in an expose-tenant-networks configuration is a good idea at all, because leveraging NAT kind of defeats the purpose.

Revision history for this message
Jay Jahns (jayjahns) wrote :

So, moving back down to the ovn_bgp_driver, I am seeing that we can reach no-NAT VMs just fine; however, as in the original report, IPs shared by two VMs (i.e. keepalived/haproxy) still fail. The only difference between ovn_bgp_driver and nb_ovn_bgp_driver is that with the former we can simply add an IP to bgp-nic covering the entire subnet, and suddenly we have full access to allowed address pairs.

I think this requires higher severity.

Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

Regarding SNAT, yep, it makes sense to have it disabled for tenant networks, as otherwise the outgoing traffic from the VMs is not going to work as intended. This new flag was added for that: require_snat_disabled_for_tenant_networks

When you say IPs shared by 2 VMs, you mean allowed_address_pairs, relying on OVN to associate that IP with one VM or the other over time, right? That is the same use case as the Amphora load balancer, and that used to work just fine (when exposing individual IPs, not subnets). Maybe you are hitting something similar to what Michel hit on live migration with EVPN, where he needed to add the option "anycast_evpn_gateway_mode" in https://review.opendev.org/c/openstack/ovn-bgp-agent/+/906505 to ensure the same MAC everywhere.

Revision history for this message
Jay Jahns (jayjahns) wrote :

I tried both without setting advertisement_method_tenant_networks and with it explicitly set to host. Here is the log sequence from attaching my router to the external gateway (--disable-snat).

2024-05-14 02:52:48.558 7 DEBUG ovn_bgp_agent.drivers.openstack.nb_ovn_bgp_driver [-] Adding BGP route for logical port with ip ['10.196.3.74'] _expose_ip /var/lib/kolla/venv/lib64/python3.9/site-packages/ovn_bgp_agent/drivers/openstack/nb_ovn_bgp_driver.py:410
2024-05-14 02:52:48.569 7 DEBUG ovn_bgp_agent.utils.linux_net [-] Creating route at table 53: {'dst': '10.196.3.74', 'dst_len': 32, 'oif': 12, 'table': 53, 'proto': 3, 'scope': 253} add_ip_route /var/lib/kolla/venv/lib64/python3.9/site-packages/ovn_bgp_agent/utils/linux_net.py:696
2024-05-14 02:52:48.570 7 DEBUG ovn_bgp_agent.utils.linux_net [-] Route created at table 53: {'dst': '10.196.3.74', 'dst_len': 32, 'oif': 12, 'table': 53, 'proto': 3, 'scope': 253} add_ip_route /var/lib/kolla/venv/lib64/python3.9/site-packages/ovn_bgp_agent/utils/linux_net.py:698
2024-05-14 02:52:48.609 7 DEBUG ovn_bgp_agent.drivers.openstack.nb_ovn_bgp_driver [-] Added BGP route for logical port with ip ['10.196.3.74'] _expose_ip /var/lib/kolla/venv/lib64/python3.9/site-packages/ovn_bgp_agent/drivers/openstack/nb_ovn_bgp_driver.py:443

After this, I captured the output from attaching a subnet to the router.

2024-05-14 02:54:33.677 7 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: LogicalSwitchPortSubnetAttachEvent(events=('update',), table='Logical_Switch_Port', conditions=None, old_conditions=None), priority=20 to row=Logical_Switch_Port(port_security=[], addresses=['router'], type=router, dhcpv4_options=[], name=4ffd7337-0593-42c3-b0f9-b40e75864e76, up=[True], options={'router-port': 'lrp-4ffd7337-0593-42c3-b0f9-b40e75864e76'}, ha_chassis_group=[], external_ids={'neutron:cidrs': '10.197.0.1/25', 'neutron:device_id': '120eb8f3-9575-4454-b975-aa9e416729aa', 'neutron:device_owner': 'network:router_interface', 'neutron:mtu': '', 'neutron:network_name': 'neutron-59b9bdf5-0bcc-40e4-8f29-31eb8db4aa39', 'neutron:port_capabilities': '', 'neutron:port_name': '', 'neutron:project_id': '61bb60cb823e4a7987b48dc29ba70cd4', 'neutron:revision_number': '1', 'neutron:security_group_ids': '', 'neutron:subnet_pool_addr_scope4': 'eb5af6f5-2b92-4c89-87ca-7c7964ca70bd', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, dynamic_addresses=[], tag=[], parent_name=[], mirror_rules=[], tag_request=[], enabled=[True], dhcpv6_options=[]) old=Logical_Switch_Port(up=[False]) matches /var/lib/kolla/venv/lib64/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43
2024-05-14 02:54:33.678 7 DEBUG oslo_concurrency.lockutils [-] Acquiring lock "nbbgp" by "ovn_bgp_agent.drivers.openstack.nb_ovn_bgp_driver.NBOVNBGPDriver.expose_subnet" inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:404
2024-05-14 02:54:33.678 7 DEBUG oslo_concurrency.lockutils [-] Lock "nbbgp" acquired by "ovn_bgp_agent.drivers.openstack.nb_ovn_bgp_driver.NBOVNBGPDriver.expose_subnet" :: waited 0.000s inner /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py:409
2024-05-14 02:54:33.679 7...


Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

I tried locally and it works for me, though I did not use address_scopes. But that should only block the exposing, and it seems the agent actually performed the steps to expose the IP. Perhaps something is wrong in the FRR base template? Though if the ping is from the local node where the cr-lrp port is, the traffic should work regardless. Perhaps you have the wrong "ip rules" and the routing table is not being hit? Could you also paste that output? Based on your example you should have something like:

$ ip rule
0: from all lookup local
1000: from all lookup [l3mdev-table]
32000: from all to 10.196.3.74 lookup br-ex
32000: from all to 10.197.0.0/25 lookup br-ex

Revision history for this message
Jay Jahns (jayjahns) wrote :

I am going to remove my address scope and see what kind of impact that has. I do have the correct routes and such, but FRR is not seeing the route.

Revision history for this message
Jay Jahns (jayjahns) wrote :

Okay - address scope removed. This did not help. When I added the subnet to my router, I see the following:

# ip rule show
0: from all lookup local
304: from all iif br-ex lookup 10000 proto zebra
1000: from all lookup [l3mdev-table]
32000: from all to 10.196.3.74 lookup 55
32000: from all to 10.197.0.0/25 lookup 55
32766: from all lookup main
32767: from all lookup default

In ip route for br-ex:

# ip route show table 55
default dev br-ex scope link
10.196.3.74 dev br-ex scope link
10.197.0.0/25 via 10.196.3.74 dev br-ex

Revision history for this message
Jay Jahns (jayjahns) wrote :

I identified a bug in kolla-ansible where the routing and rule tables would change at container restart, polluting the routing table on the network node.

Now I can see a little farther into the stack. On the network node, I can reach everything on 10.197.0.0/25 just fine; however, there is no route announced to FRR.

As a workaround, I added an address of 10.197.0.1/25 to bgp-nic.

Once I did that, the network was announced and I could reach everything. However after a sync, this goes away.

The issue appears to be that we are missing that particular piece to complete this.

Revision history for this message
Jay Jahns (jayjahns) wrote :

Further on this: if I want to make the change persistent, I can simply add the static route that's in the br-ex routing table to the default routing table, and FRR will see it in the kernel routing table. It will not distribute it until I tell FRR to redistribute kernel, but then everything works.

Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

I was going to say that you have an extra ip rule that could be interfering with the ovn-bgp-agent ones:
304: from all iif br-ex lookup 10000 proto zebra

But it seems you have passed that point already.

Yep, if host IPs are exposed, then IPs are added to bgp-nic; but if subnets are exposed, no IP gets added to bgp-nic, and instead redistribute kernel plus routes in the br-ex routing table are used.

In the first case (host IPs), the network is not advertised, so if you have a virtual IP it will only be advertised from the node/VM that happens to be the chassis of the virtual port. There was a patch in neutron to update the information about those virtual IPs; perhaps you are missing it? This is the ovn-bgp-agent side: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/883187, and this is the patch on neutron:
https://review.opendev.org/c/openstack/neutron/+/882705
