[ndr] neutron-bgp-dragent is racy when a service restart is made just before a speaker is added
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | New | Undecided | Unassigned |
neutron-dynamic-routing (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Hit a race with the Antelope (22.0.0) version of NDR in one of our functional test runs:
1) neutron-bgp-dragent got restarted right before creating a speaker and adding an external network and tenant network to it;
2) As can be seen in the service log below, just after neutron-bgp-dragent started, it tried to advertise a route (00:03:21.766) before a speaker got added to it (00:03:22.251) - which it failed to do with the `BgpSpeakerNotA
https:/
3) As a result, the peer (FRR in our case) only got a floating IP route (/32) during the test; the tenant network route (/24) was never advertised.
Test steps (downstream) that generated the log lines: https:/
The service restart is done prior to calling the test code above (notably, it was done as a workaround for something else but inadvertently helped to trigger this edge case):
https:/
The lack of a route at the peer side can be seen at 2023-06-19 00:03:32 here:
https:/
2023-06-19 00:03:32.346994 | focal-medium |
2023-06-19 00:03:32.347012 | focal-medium | B>* 100.64.0.144/32 [20/0] via 172.16.27.207, ens3, weight 1, 00:00:07
2023-06-19 00:03:32.347045 | focal-medium |
Summary: It looks like neutron-bgp-dragent may try to advertise routes it reads from the DB before a speaker has been added to it. It should make sure a speaker is present before trying to advertise routes. If no speaker has been scheduled to it yet, it should advertise the routes as soon as one is present on it.
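To make the suggested behaviour concrete, here is a minimal sketch of "defer until a speaker is present, then flush". It is not code from neutron-dynamic-routing: the class `DeferredAdvertiser`, its methods, and the tenant CIDR `192.0.2.0/24` are all hypothetical names and values chosen for illustration only.

```python
from collections import defaultdict


class DeferredAdvertiser:
    """Hypothetical helper, NOT part of neutron-dynamic-routing."""

    def __init__(self):
        self.speakers = set()               # speaker IDs present on this agent
        self.advertised = defaultdict(set)  # speaker ID -> advertised CIDRs
        self.pending = defaultdict(set)     # speaker ID -> CIDRs waiting for a speaker

    def advertise_route(self, speaker_id, cidr):
        """Advertise now if the speaker exists, otherwise queue the route."""
        if speaker_id not in self.speakers:
            self.pending[speaker_id].add(cidr)
            return False  # deferred instead of raising an error
        self.advertised[speaker_id].add(cidr)
        return True

    def add_speaker(self, speaker_id):
        """Register the speaker and flush any routes queued for it."""
        self.speakers.add(speaker_id)
        for cidr in sorted(self.pending.pop(speaker_id, set())):
            self.advertise_route(speaker_id, cidr)


agent = DeferredAdvertiser()
# The route arrives before the speaker, as in the race described above.
deferred = agent.advertise_route("bgp-speaker", "192.0.2.0/24")  # placeholder CIDR
agent.add_speaker("bgp-speaker")
agent.advertise_route("bgp-speaker", "100.64.0.144/32")
```

With this ordering the early /24 is queued rather than dropped, so both routes end up advertised once the speaker is added.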
-------
Functional test log:
2023-06-19 00:03:19.709430 | focal-medium | 2023-06-19 00:03:19 [INFO] Setting up BGP speaker
2023-06-19 00:03:20.307141 | focal-medium | 2023-06-19 00:03:20 [INFO] Creating BGP Speaker
2023-06-19 00:03:20.434428 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising BGP routes
2023-06-19 00:03:20.678231 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising ext_net network on BGP Speaker bgp-speaker
2023-06-19 00:03:20.919232 | focal-medium | 2023-06-19 00:03:20 [INFO] Advertising private network on BGP Speaker bgp-speaker
2023-06-19 00:03:21.155337 | focal-medium | 2023-06-19 00:03:21 [INFO] Setting up BGP peer
2023-06-19 00:03:22.099859 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating BGP Peer
2023-06-19 00:03:22.142524 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding BGP peer to BGP speaker
2023-06-19 00:03:22.143374 | focal-medium | 2023-06-19 00:03:22 [INFO] Adding peer osci-frr on BGP Speaker bgp-speaker
2023-06-19 00:03:22.208265 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating floating IP to advertise
2023-06-19 00:03:22.301280 | focal-medium | 2023-06-19 00:03:22 [INFO] Creating port: NDR_TEST_FIP
2023-06-19 00:03:23.599942 | focal-medium | 2023-06-19 00:03:23 [INFO] Creating floatingip
2023-06-19 00:03:26.351808 | focal-medium | 2023-06-19 00:03:26 [INFO] Advertised floating IP: 100.64.0.144
neutron-
2023-06-19 00:03:20.751 26428 INFO neutron.
2023-06-19 00:03:20.751 26428 INFO neutron.
2023-06-19 00:03:21.533 26428 INFO neutron_
2023-06-19 00:03:21.533 26428 INFO neutron_
2023-06-19 00:03:21.578 26428 INFO neutron_
2023-06-19 00:03:21.748 26428 INFO bgpspeaker.api.base [None req-3e563ce5-
2023-06-19 00:03:21.766 26428 ERROR neutron_
2023-06-19 00:03:21.768 26428 INFO neutron_
2023-06-19 00:03:22.249 26428 INFO bgpspeaker.api.base [None req-eac9e066-
2023-06-19 00:03:22.251 26428 INFO neutron_
2023-06-19 00:03:23.261 26428 INFO bgpspeaker.peer [-] Connection to peer: 172.16.0.66 established
2023-06-19 00:03:23.263 26428 INFO neutron_
2023-06-19 00:03:24.370 26428 INFO neutron_
2023-06-19 00:03:25.450 26428 INFO bgpspeaker.api.base [None req-af11f291-
2023-06-19 00:03:25.451 26428 INFO neutron_
2023-06-19 00:03:25.507 26428 ERROR bgpspeaker.peer [-] AS_PATH on UPDATE message has loops. Ignoring this message: BGPUpdate(
description: | updated |
no longer affects: | neutron (Ubuntu) |
Not sure whether this is related, but the final error "ERROR bgpspeaker.peer [-] AS_PATH on UPDATE message has loops." indicates that you have a broken BGP peer configuration that sends prefixes back to the bgp-agent, which shouldn't happen. In another recent bug, fixing that configuration also fixed other issues, so I would start by repairing that and then check whether you can still reproduce the issue.
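For context, the loop check behind that message generally amounts to a receiver finding its own AS number in the AS_PATH of an incoming UPDATE. A rough sketch of the idea follows; the function name and the AS numbers are made up for illustration and are not taken from the logs or from the speaker library.

```python
def as_path_has_loop(as_path, local_as):
    """Return True if the receiver's own AS already appears in the AS_PATH."""
    return local_as in as_path


LOCAL_AS = 64512  # hypothetical private AS number for the agent
PEER_AS = 64513   # hypothetical private AS number for the peer

# A route the peer originated itself: no loop.
clean = as_path_has_loop([PEER_AS], LOCAL_AS)

# A route the agent advertised and the peer sent back: the agent's AS is in the path.
looped = as_path_has_loop([PEER_AS, LOCAL_AS], LOCAL_AS)
```

The second case is what a peer that re-advertises the agent's own prefixes back to it would produce.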