[BGP DVR] Incorrect /32 Fixed IP Route Advertisement Logic
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Expired | Undecided | Unassigned |
Bug Description
The current logic in neutron_
The code correctly checks for matching hosts and address scopes between the fixed IP port and the potential floatingip_
Because of this problem, we cannot migrate from DVR to centralized routers. Even after the network is removed from the DVR router, the /32 routes are still advertised.
For a /32 fixed IP route to be advertised correctly via BGP in a DVR setup, all of the following conditions must hold:
1. A Fixed IP exists on a port (e.g., compute port) located on a specific compute host.
2. A floatingip_
3. The subnet containing the Fixed IP and the subnet containing the Agent Gateway IP must belong to the same address scope. (Existing check is correct).
4. The network containing the Fixed IP must have an interface attached to a router.
5. That router must be distributed (router.
6. That router must have an external gateway port configured (necessary for external connectivity).

The critical missing link in the original logic is that it does not enforce conditions #4 and #5 together; it relies solely on the host and scope match (#1, #2, #3) for the final join.
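The conditions above can be sketched as a single predicate. This is only an illustration in plain Python, not the neutron-dynamic-routing implementation: the class names, field names, and `should_advertise_host_route` are hypothetical, and the real code expresses these checks as a database query rather than over in-memory objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Router:
    """Hypothetical stand-in for the router a tenant network is attached to."""
    distributed: bool
    has_external_gateway: bool


@dataclass
class FixedIpPort:
    """Hypothetical view of a tenant port that holds a fixed IP."""
    ip_address: str
    host: str
    address_scope_id: str
    router: Optional[Router]  # None if the network has no router interface


@dataclass
class AgentGatewayPort:
    """Hypothetical view of a per-host floatingip_agent_gateway port."""
    ip_address: str
    host: str
    address_scope_id: str


def should_advertise_host_route(port: FixedIpPort, gateway: AgentGatewayPort) -> bool:
    """Return True only if the host, scope, and router conditions all hold."""
    same_host = port.host == gateway.host
    same_scope = port.address_scope_id == gateway.address_scope_id
    router = port.router
    # The part the report says is missing: the network must be attached to
    # a router that is distributed and has an external gateway.
    router_ok = (
        router is not None
        and router.distributed
        and router.has_external_gateway
    )
    return same_host and same_scope and router_ok


gw = AgentGatewayPort("203.0.113.10", "compute-1", "scope-a")
dvr = Router(distributed=True, has_external_gateway=True)
centralized = Router(distributed=False, has_external_gateway=True)

on_dvr = FixedIpPort("10.0.0.5", "compute-1", "scope-a", dvr)
detached = FixedIpPort("10.0.0.5", "compute-1", "scope-a", None)
on_centralized = FixedIpPort("10.0.0.5", "compute-1", "scope-a", centralized)

print(should_advertise_host_route(on_dvr, gw))          # True
print(should_advertise_host_route(detached, gw))        # False
print(should_advertise_host_route(on_centralized, gw))  # False
```

With a check of this shape, detaching the network from its router or re-attaching it to a centralized router would stop the /32 advertisement, even though the host and address scope still match.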
# Environment Details
OpenStack Version: Zed (cluster installed via Kolla-Ansible)
OS Version: Ubuntu 22.04.4 LTS Hosts (Kernel: 5.15.0-117-generic)
Neutron Version: 21.1.3.dev24
Services: neutron-server, neutron-dhcp-agent, neutron-
Controller & Network Nodes: 5 nodes
Networking Backend: OpenvSwitch (DVR mode)
Router HA: Disabled (l3_ha = false)
BGP Dynamic Routing: neutron-bgp-dragent used to announce unique tenant networks.
Tenant Network Type: VXLAN
External Network Type: VLAN
We noticed this while trying to migrate from DVR to centralized routers (because of https://bugs.launchpad.net/neutron/+bug/2107634).
Our migration details can be summarized as follows.
The code responsible for advertising /32 IP routes was introduced 7 years ago via the merge request titled "Implement DVR-aware fixed IP lookups" (https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/581098).
This code seems to have been written somewhat superficially.
Fundamentally, the _get_dvr_fixed_ip_routes_by_bgp_speaker function handles this task. However, unlike other route calculation functions, this one performs very little filtering. It directly proceeds to advertise a /32 route if the Tenant network and the Provider network share the same address scope, and if the floatingip_agent_gateway port and the tenant network's port (implied fixed IP port) reside on the same compute host. (Note: floatingip_agent_gateway refers to the IP address assigned from the Provider network to each compute host).
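The behavior described above can be modeled as a two-condition join. The following is a simplified sketch and not the actual query from neutron-dynamic-routing: the function name `current_dvr_fixed_ip_routes` and the dictionary keys on the input ports are assumptions made for illustration.

```python
def current_dvr_fixed_ip_routes(fixed_ip_ports, agent_gateway_ports):
    """Model of the reported behavior: join on host and address scope only."""
    routes = []
    for port in fixed_ip_ports:
        for gw in agent_gateway_ports:
            if (port["host"] == gw["host"]
                    and port["address_scope_id"] == gw["address_scope_id"]):
                # Router attachment is never consulted here, so the /32 is
                # advertised regardless of the router's type or presence.
                routes.append({
                    "destination": port["ip_address"] + "/32",
                    "next_hop": gw["ip_address"],
                })
    return routes


agent_gateways = [
    {"host": "compute-1", "address_scope_id": "scope-a",
     "ip_address": "203.0.113.10"},
]
fixed_ip_ports = [
    # Attached to a distributed router with an external gateway.
    {"host": "compute-1", "address_scope_id": "scope-a",
     "ip_address": "10.0.0.5", "router": "dvr"},
    # Network already detached from any router.
    {"host": "compute-1", "address_scope_id": "scope-a",
     "ip_address": "10.0.1.7", "router": None},
]

routes = current_dvr_fixed_ip_routes(fixed_ip_ports, agent_gateways)
for route in routes:
    print(route["destination"], "via", route["next_hop"])
```

In this model both ports produce a route, including the one whose network has no router interface, which mirrors the stale advertisements seen during the migration.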
However, in addition to these conditions (same host, same scope), the logic should have also verified that the fixed IP's network has an interface attached to a router, that this specific router is distributed, and that it has an external gateway configured. This ensures the fixed IP is correctly associated with the distributed routing context before matching host and scope for next-hop determination.
If the code had operated correctly with these checks, removing the network interface from the old router would have caused the associated /32 routes to be completely withdrawn. Subsequently, attaching the network to a new router wouldn't result in advertisements if that new router was not configured as distributed (DVR).
This lack of proper checking is specific to the /32 fixed IP route logic. For other types of routes (e.g., tenant network prefixes), the system already correctly checks the router association, and advertisements cease when a network interface or the external gateway is removed from the relevant router.
Setting up a secondary BGP instance or adding a second BGP provider network is not a viable workaround for this issue. As long as the original provider network remains associated with the original address scope, the incorrect /32 route advertisements will persist based on the flawed logic. Furthermore, introducing a second provider network into the same address scope could potentially lead to unpredictable behavior or further issues within the code.
Outside of applying a direct code fix, a potential (but highly impractical) workaround might involve consolidating all VM instances from the affected network onto a single compute host temporarily. Once the underlying network/router changes are complete, the instances could then be redistributed. However, the operational cost and disruption associated with such a procedure would likely be prohibitive.