[NB DB] Announcements are not withdrawn from bgp-nic in timely manner

Bug #2057962 reported by Dmitriy Rabotyagov
This bug affects 1 person
Affects        Status  Importance  Assigned to  Milestone
ovn-bgp-agent  New     Undecided   Unassigned

Bug Description

In a non-DVR scenario, when a router is moved from one gateway node to another, the router IP announcements remain on the bgp-nic interface of the previous node for around 5 minutes after the router has failed over to the other network node.

Outgoing traffic is stuck during this time:

# ovn-nbctl list Logical_router_port db79bd3a-1e5c-447e-ad6f-844ab9c8f98e
_uuid : db79bd3a-1e5c-447e-ad6f-844ab9c8f98e
enabled : []
external_ids : {"neutron:is_ext_gw"=True, "neutron:network_name"=neutron-2cf0fb33-1a7a-4f83-84fe-be02ce9a23b1, "neutron:revision_number"="210", "neutron:router_name"="76384dc7-7b37-43be-b9d7-98112058a6f7", "neutron:subnet_ids"="db24e22a-1452-49ee-ab5a-11b72a515ea8"}
gateway_chassis : [2f89e011-d50f-489a-8130-bcf9c1466c9b, b229f46f-dbcb-42f5-83d1-d59f1ec260ff]
ha_chassis_group : []
ipv6_prefix : []
ipv6_ra_configs : {}
mac : "fa:16:3e:ce:f7:a3"
name : lrp-16555e74-fbef-4ecb-918c-2fb76bf5d42d
networks : ["203.0.113.54/28"]
options : {reside-on-redirect-chassis="true"}
peer : []
status : {hosting-chassis="67b47de0-ff8a-4bb5-94f7-792757484bee"}
#
# openstack network agent show 67b47de0-ff8a-4bb5-94f7-792757484bee -c host -c configuration -c agent_type -c alive
+---------------+-------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------+-------------------------------------------------------------------------------------------------+
| agent_type | OVN Controller Gateway agent |
| alive | :-) |
| configuration | {'chassis_name': '67b47de0-ff8a-4bb5-94f7-792757484bee', 'bridge-mappings': 'vlan:br-provider'} |
| host | os-net01-az1 |
+---------------+-------------------------------------------------------------------------------------------------+

root@os-net02-az1:/home/dr5005# ip a sh bgp-nic
38: bgp-nic: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bgp-external state UNKNOWN group default qlen 1000
    link/ether 2e:82:37:d0:1e:8d brd ff:ff:ff:ff:ff:ff
    inet 203.0.113.60/32 scope global bgp-nic
       valid_lft forever preferred_lft forever
    inet 203.0.113.54/32 scope global bgp-nic
       valid_lft forever preferred_lft forever
    inet6 fe80::2c82:37ff:fed0:1e8d/64 scope link
       valid_lft forever preferred_lft forever
root@os-net02-az1:/home/dr5005#

root@os-net01-az1# ip a sh bgp-nic
23: bgp-nic: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bgp-external state UNKNOWN group default qlen 1000
    link/ether 2e:ba:cb:41:49:20 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2cba:cbff:fe41:4920/64 scope link
       valid_lft forever preferred_lft forever
root@os-net01-az1#

root@os-net01-az1:/home/dr5005# date
Thu Mar 14 18:25:29 UTC 2024
root@os-net01-az1:/home/dr5005# tcpdump -nn -c 1000 -i bond0.3114 dst host 203.0.113.54
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bond0.3114, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:28:12.118309 IP 203.0.113.54 > 8.8.8.8: ICMP echo request, id 5264, seq 772, length 64
18:28:13.142381 IP 203.0.113.54 > 8.8.8.8: ICMP echo request, id 5264, seq 773, length 64
18:28:14.166532 IP 203.0.113.54 > 8.8.8.8: ICMP echo request, id 5264, seq 774, length 64
18:28:15.190529 IP 203.0.113.54 > 8.8.8.8: ICMP echo request, id 5264, seq 775, length 64
18:28:16.214656 IP 203.0.113.54 > 8.8.8.8: ICMP echo request, id 5264, seq 776, length 64
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel
root@os-net01-az1:/home/dr5005#

This issue does not happen with the SB DB driver, where announcements are synced immediately after the change in the NB DB.
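
The symptom above (the router IP still configured on bgp-nic of a node that is no longer the hosting chassis) can be checked mechanically. The following is a minimal sketch, not part of ovn-bgp-agent: the function names are hypothetical, and it assumes the addresses were collected on each node with `ip -j addr show dev bgp-nic` and that the hosting chassis has already been resolved to a host name, as done with `openstack network agent show` above. The sample JSON is hand-written to mirror the os-net02-az1 output in this report.

```python
import ipaddress
import json


def addresses_on_nic(ip_json):
    """Return the global-scope IPs found in `ip -j addr show dev <nic>` JSON."""
    found = set()
    for link in json.loads(ip_json):
        for addr in link.get("addr_info", []):
            # Skip link-local addresses such as the fe80:: one on bgp-nic.
            if addr.get("scope") == "global":
                found.add(addr["local"])
    return found


def stale_announcements(lrp_networks, hosting_host, nic_ips_by_host):
    """Map each non-hosting host to the router IPs it still carries.

    lrp_networks    -- the `networks` column of the Logical_Router_Port
    hosting_host    -- host name of the chassis in `status:hosting-chassis`
    nic_ips_by_host -- {host name: set of IPs configured on bgp-nic}
    """
    router_ips = {str(ipaddress.ip_interface(n).ip) for n in lrp_networks}
    stale = {}
    for host, ips in nic_ips_by_host.items():
        if host == hosting_host:
            continue
        leftover = router_ips & set(ips)
        if leftover:
            stale[host] = sorted(leftover)
    return stale


# Hand-written sample mirroring `ip a sh bgp-nic` on os-net02-az1 (assumption).
NET02_JSON = json.dumps([{
    "ifname": "bgp-nic",
    "addr_info": [
        {"family": "inet", "local": "203.0.113.60", "prefixlen": 32, "scope": "global"},
        {"family": "inet", "local": "203.0.113.54", "prefixlen": 32, "scope": "global"},
        {"family": "inet6", "local": "fe80::2c82:37ff:fed0:1e8d", "prefixlen": 64, "scope": "link"},
    ],
}])

nic_ips = {
    "os-net01-az1": set(),  # only a link-local address in the report
    "os-net02-az1": addresses_on_nic(NET02_JSON),
}
stale = stale_announcements(["203.0.113.54/28"], "os-net01-az1", nic_ips)
print(stale)  # {'os-net02-az1': ['203.0.113.54']}
```

Running such a check in a loop right after a failover would give a rough measurement of how long the stale address lingers with each driver.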

summary: - [NB DB] Announcements are not withdrawn from bgp-nic
+ [NB DB] Announcements are not withdrawn from bgp-nic in timely manner
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Actually, after checking more on that, I assume that this report is invalid. It's just a behaviour difference between the NB and SB drivers that makes one think there's something wrong with the driver.

While the SB DB driver does move IP announcements right away, hosts still remain unreachable for a significant amount of time, since the routes are not withdrawn by FRR.

At the same time, the NB DB driver just moves the IPs together with FRR reconciliation, from what I understand.

So feel free to mark this as invalid (or maybe it's just something worth documenting).

Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, I guess this is actually an issue in the SB DB driver instead, because the NB driver keeps IPs on the bgp-nic to reduce potential downtime and waits for FRR to withdraw them.

The SB driver, in contrast, moves them right away, which results in about 3 minutes of downtime...

Or I don't really know... Looking into even more detail, it is highly dependent on the use case. With the SB DB driver, I see downtime for both the FIP and the router IP.

With the NB DB driver, I can still reach the router, but source NAT does not work anyway...

summary: - [NB DB] Announcements are not withdrawn from bgp-nic in timely manner
+ [SB DB] Announcements are withdrawn from bgp-nic prematurely causing
+ downtimes
summary: - [SB DB] Announcements are withdrawn from bgp-nic prematurely causing
- downtimes
+ [NB DB] Announcements are not withdrawn from bgp-nic in timely manner