[BGP DVR] Incorrect /32 Fixed IP Route Advertisement Logic

Bug #2108985 reported by Yusuf Güngör
This bug affects 2 people
Affects: neutron
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

The current logic in neutron_dynamic_routing/db/bgp_db.py (specifically within the _get_dvr_fixed_ip_routes_by_bgp_speaker function and its helpers) for advertising /32 fixed IP host routes associated with Distributed Virtual Routers (DVR) is incomplete.

The code correctly checks for matching hosts and address scopes between the fixed IP port and the potential floatingip_agent_gateway next-hop port. However, it does not sufficiently verify that the fixed IP's network is actually attached as an interface to the specific distributed router whose scope matches the agent gateway's scope. This could lead to scenarios where a fixed IP route is advertised via an agent gateway, even if that fixed IP belongs to a network not directly routed by the distributed router relevant to that agent gateway's external network scope.

Because of this problem, we are not able to migrate from DVR to centralized routers. Even after removing the network from the DVR router, the /32 routes are still advertised.

To correctly advertise a /32 fixed IP route in a DVR setup via BGP:

1. A fixed IP exists on a port (e.g., a compute port) located on a specific compute host.
2. A floatingip_agent_gateway port exists on the same compute host.
3. The subnet containing the fixed IP and the subnet containing the agent gateway IP must belong to the same address scope. (The existing check is correct.)
4. The network containing the fixed IP must have an interface attached to a router.
5. That router must be distributed (router.distributed=True).
6. That router must have an external gateway port configured (necessary for external connectivity).

The critical missing link in the original logic is that conditions #4 and #5 are not strongly enforced together; the final join relies solely on the host and scope match (#1, #2, #3).
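
The conditions listed above can be expressed as a single predicate. The following is only a minimal sketch on plain Python objects, not the actual neutron-dynamic-routing code: the `Port` and `Router` classes, their fields, and `should_advertise_host_route` are hypothetical names introduced for illustration, and the address scope is simplified to one attribute per port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    # Hypothetical, simplified view of a Neutron port.
    host: str
    network_id: str
    address_scope_id: str
    ip_address: str


@dataclass(frozen=True)
class Router:
    # Hypothetical, simplified view of a Neutron router.
    distributed: bool
    has_external_gateway: bool
    interface_network_ids: frozenset


def should_advertise_host_route(fixed_port, gateway_port, routers):
    """Return True only if every listed condition holds."""
    # Conditions 1 and 2: both ports live on the same compute host.
    if fixed_port.host != gateway_port.host:
        return False
    # Condition 3: both subnets belong to the same address scope.
    if fixed_port.address_scope_id != gateway_port.address_scope_id:
        return False
    # Conditions 4, 5 and 6: the fixed IP's network is attached to a
    # router that is distributed and has an external gateway.
    return any(
        fixed_port.network_id in router.interface_network_ids
        and router.distributed
        and router.has_external_gateway
        for router in routers
    )


fixed = Port("compute-1", "tenant-net", "scope-a", "10.0.0.5")
gateway = Port("compute-1", "provider-net", "scope-a", "203.0.113.10")
dvr = Router(True, True, frozenset({"tenant-net"}))
centralized = Router(False, True, frozenset({"tenant-net"}))
```

With these sample objects, the predicate accepts the route when the network is attached to `dvr` and rejects it when the only attached router is `centralized` or when no router is attached at all.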

# Environment Details

OpenStack Version: Zed (cluster installed via Kolla-Ansible)
OS Version: Ubuntu 22.04.4 LTS Hosts (Kernel: 5.15.0-117-generic)
Neutron Version: 21.1.3.dev24
Services: neutron-server, neutron-dhcp-agent, neutron-openvswitch-agent, neutron-l3-agent, neutron-bgp-dragent, neutron-metadata-agent
Controller & Network Nodes: 5 nodes
Networking Backend: OpenvSwitch (DVR mode)
Router HA: Disabled (l3_ha = false)
BGP Dynamic Routing: neutron-bgp-dragent used to announce unique tenant networks.
Tenant Network Type: VXLAN
External Network Type: VLAN

Tags: l3-bgp
Yusuf Güngör (yusuf2) wrote :

We noticed this while trying to migrate from DVR to centralized routers. (Because of https://bugs.launchpad.net/neutron/+bug/2107634)

Our findings from the migration can be summarized as follows.

The code responsible for advertising /32 IP routes was introduced 7 years ago via the Merge Request titled "Implement DVR-aware fixed IP lookups". (https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/581098)

This code seems to have been written somewhat superficially.

Fundamentally, the _get_dvr_fixed_ip_routes_by_bgp_speaker function handles this task. However, unlike other route calculation functions, this one performs very little filtering. It directly proceeds to advertise a /32 route if the Tenant network and the Provider network share the same address scope, and if the floatingip_agent_gateway port and the tenant network's port (implied fixed IP port) reside on the same compute host. (Note: floatingip_agent_gateway refers to the IP address assigned from the Provider network to each compute host).
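
To make the described behavior concrete, here is a rough model of the matching as this report characterizes it: ports are paired purely on host and address scope. It is an illustrative sketch, not the code in bgp_db.py; the `Port` tuple and `loose_host_routes` are hypothetical names, and the route dictionary layout is assumed for the example.

```python
from typing import NamedTuple


class Port(NamedTuple):
    # Hypothetical, simplified port record.
    host: str
    address_scope_id: str
    ip_address: str


def loose_host_routes(fixed_ports, gateway_ports):
    """Pair ports on host and address scope only, with no router check."""
    routes = []
    for fixed in fixed_ports:
        for gw in gateway_ports:
            if (fixed.host == gw.host
                    and fixed.address_scope_id == gw.address_scope_id):
                routes.append({
                    "destination": f"{fixed.ip_address}/32",
                    "next_hop": gw.ip_address,
                })
    return routes


fixed_ports = [
    Port("compute-1", "scope-a", "10.0.0.5"),
    Port("compute-2", "scope-a", "10.0.0.6"),
]
gateway_ports = [Port("compute-1", "scope-a", "203.0.113.10")]
routes = loose_host_routes(fixed_ports, gateway_ports)
```

Nothing in this model asks whether the fixed IP's network is attached to any router, which is why detaching the network from the DVR router does not change the result.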

However, in addition to these conditions (same host, same scope), the logic should have also verified that the fixed IP's network has an interface attached to a router, that this specific router is distributed, and that it has an external gateway configured. This ensures the fixed IP is correctly associated with the distributed routing context before matching host and scope for next-hop determination.
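
One way to picture the stricter join is with a toy relational model. The sketch below uses an in-memory SQLite database with a deliberately simplified, hypothetical schema (`ports`, `routers`, `router_interfaces`); it does not reflect the real Neutron tables or the SQLAlchemy query in bgp_db.py, and only shows how adding the router constraints to the join filters out networks that are not behind a distributed router with a gateway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for illustration only.
conn.executescript("""
CREATE TABLE ports (ip TEXT, host TEXT, network_id TEXT,
                    scope_id TEXT, device_owner TEXT);
CREATE TABLE routers (id TEXT, distributed INTEGER, has_gateway INTEGER);
CREATE TABLE router_interfaces (router_id TEXT, network_id TEXT);
""")
conn.executemany("INSERT INTO ports VALUES (?, ?, ?, ?, ?)", [
    ("10.0.0.5", "compute-1", "net-dvr", "scope-a", "compute:nova"),
    ("10.0.1.5", "compute-1", "net-legacy", "scope-a", "compute:nova"),
    ("203.0.113.10", "compute-1", "net-ext", "scope-a",
     "network:floatingip_agent_gateway"),
])
conn.executemany("INSERT INTO routers VALUES (?, ?, ?)", [
    ("r-dvr", 1, 1),
    ("r-legacy", 0, 1),
])
conn.executemany("INSERT INTO router_interfaces VALUES (?, ?)", [
    ("r-dvr", "net-dvr"),
    ("r-legacy", "net-legacy"),
])

QUERY = """
SELECT DISTINCT fixed.ip || '/32' AS destination, gw.ip AS next_hop
FROM ports AS fixed
JOIN ports AS gw
  ON gw.host = fixed.host AND gw.scope_id = fixed.scope_id
-- The additional joins: the fixed IP's network must be a router interface,
-- and that router must be distributed and have an external gateway.
JOIN router_interfaces AS ri ON ri.network_id = fixed.network_id
JOIN routers AS r ON r.id = ri.router_id
WHERE gw.device_owner = 'network:floatingip_agent_gateway'
  AND fixed.device_owner LIKE 'compute:%'
  AND r.distributed = 1
  AND r.has_gateway = 1
ORDER BY destination
"""
rows = conn.execute(QUERY).fetchall()
```

In this toy data set both tenant ports share a host and scope with the agent gateway, but only the port on `net-dvr` survives the router joins.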

If the code had operated correctly with these checks, removing the network interface from the old router would have caused the associated /32 routes to be completely withdrawn. Subsequently, attaching the network to a new router wouldn't result in advertisements if that new router was not configured as distributed (DVR).
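
The expected migration behavior can be walked through step by step. This is a small simulation under stated assumptions, not Neutron code: routers are plain dictionaries, `advertised_prefixes` is a hypothetical helper, and the host and address scope matching is assumed to succeed so that only the router checks are exercised.

```python
def advertised_prefixes(fixed_ips_by_network, routers):
    """Return /32 prefixes for networks attached to a distributed
    router that also has an external gateway."""
    prefixes = set()
    for router in routers.values():
        if not (router["distributed"] and router["external_gateway"]):
            continue
        for network_id in router["networks"]:
            for ip in fixed_ips_by_network.get(network_id, ()):
                prefixes.add(f"{ip}/32")
    return prefixes


fixed_ips_by_network = {"tenant-net": ["10.0.0.5", "10.0.0.6"]}
routers = {
    "old-dvr": {
        "distributed": True,
        "external_gateway": True,
        "networks": {"tenant-net"},
    },
    "new-centralized": {
        "distributed": False,
        "external_gateway": True,
        "networks": set(),
    },
}

# Step 1: network still attached to the DVR router.
before = advertised_prefixes(fixed_ips_by_network, routers)

# Step 2: detach the network from the old DVR router.
routers["old-dvr"]["networks"].discard("tenant-net")
after_detach = advertised_prefixes(fixed_ips_by_network, routers)

# Step 3: attach the network to the new centralized router.
routers["new-centralized"]["networks"].add("tenant-net")
after_attach = advertised_prefixes(fixed_ips_by_network, routers)
```

Under these checks the two /32 prefixes are present only in `before`; both later steps yield an empty set, which matches the withdrawal behavior described above.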

This lack of proper checking is specific to the /32 fixed IP route logic. For other types of routes (e.g., tenant network prefixes), the system already correctly checks the router association, and advertisements cease when a network interface or the external gateway is removed from the relevant router.

Setting up a secondary BGP instance or adding a second BGP provider network is not a viable workaround for this issue. As long as the original provider network remains associated with the original address scope, the incorrect /32 route advertisements will persist based on the flawed logic. Furthermore, introducing a second provider network into the same address scope could potentially lead to unpredictable behavior or further issues within the code.

Outside of applying a direct code fix, a potential (but highly impractical) workaround might involve consolidating all VM instances from the affected network onto a single compute host temporarily. Once the underlying network/router changes are complete, the instances could then be redistributed. However, the operational cost and disruption associated with such a procedure would likely be prohibitive.

yatin (yatinkarel)
tags: added: l3-bgp
yatin (yatinkarel) wrote :

Hi Yusuf Güngör, Zed is quite an old and unmaintained release. Can you check whether this issue is also seen on the master branch?
Also raised over IRC https://meetings.opendev.org/irclogs/%23openstack-neutron/%23openstack-neutron.2025-04-25.log.html#t2025-04-25T07:48:45

Changed in neutron:
status: New → Incomplete
Yusuf Güngör (yusuf2) wrote :

Hi Yatin, we are planning an upgrade in a month. We will check and update this issue, thanks

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired