Automatic cleanup of BGP speakers is too aggressive

Bug #1943725 reported by Andrew Bonney
This bug affects 1 person
Affects: neutron
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have seen regular issues with the neutron-bgp-dragent service when one or more network nodes fail or are undergoing maintenance.

In the most problematic case, we have a deployment with four network nodes. Each of these runs a neutron-bgp-dragent process, and each agent is associated with the same BGP speaker. When one of these network nodes goes down, a cleanup process runs a short time later, but rather than removing only the speaker association of the absent network node, it removes the associations from all but one of the nodes.
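To make the expected behaviour concrete, here is a minimal sketch. It is not the neutron-dynamic-routing scheduler code; the function name `remove_dead_bindings`, the node and speaker names, and the representation of bindings as (speaker, agent) pairs are all assumptions made for illustration.

```python
# Illustrative model only: bindings are (speaker_id, agent_host) pairs.
# None of these names come from neutron-dynamic-routing.

def remove_dead_bindings(bindings, alive_agents):
    """Return only the bindings whose agent is still alive."""
    return {(speaker, agent) for (speaker, agent) in bindings
            if agent in alive_agents}


# Four network nodes, all bound to the same speaker.
bindings = {("speaker-1", f"node-{i}") for i in range(1, 5)}

# node-4 has gone down; the other three agents are still reporting.
alive_agents = {"node-1", "node-2", "node-3"}

remaining = remove_dead_bindings(bindings, alive_agents)
print(sorted(agent for _, agent in remaining))
# prints ['node-1', 'node-2', 'node-3']
```

In this model three associations survive the loss of one node. What we observe instead is that only one association is left.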

During this process, the running neutron-bgp-dragent processes report errors such as the following (observed using the latest neutron-dynamic-routing code from stable/victoria):

"Unable to sync BGP speaker state.: RuntimeError: dictionary changed size during iteration"

or

Sep 15 13:36:26 neutron-bgp-dragent[1308396]: 2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server [req-094dad10-b4da-4c50-8e32-f7814d446705 - - - - -] Exception during message handling: TypeError: unhashable type: 'dict'
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 185, in bgp_speaker_create_end
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.add_bgp_speaker_helper(bgp_speaker_id)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 249, in add_bgp_speaker_helper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.add_bgp_speaker_on_dragent(bgp_speaker)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 344, in add_bgp_speaker_on_dragent
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.cache.put_bgp_speaker(bgp_speaker)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 582, in put_bgp_speaker
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.remove_bgp_speaker_by_id(self.cache[bgp_speaker['id']])
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 600, in remove_bgp_speaker_by_id
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server if bgp_speaker_id in self.cache:
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server TypeError: unhashable type: 'dict'
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server
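Both errors can be reproduced in isolation. The sketch below is a simplified stand-in, not the real agent code: the class name `SpeakerCache`, the `_buggy`/`_fixed` method suffixes, the cached value layout, and the `if ... in self.cache` guard in `put_bgp_speaker` are assumptions. Only the two calls shown in the traceback (`self.remove_bgp_speaker_by_id(self.cache[bgp_speaker['id']])` and `if bgp_speaker_id in self.cache`) are taken from it. The traceback suggests that the cached entry (a dict) is passed where a speaker id is expected. Passing the id instead is one possible correction, not a reviewed fix. The report does not show where the RuntimeError is raised, so the second part only demonstrates the general Python mechanism.

```python
class SpeakerCache:
    """Hypothetical stand-in for the agent's BGP speaker cache."""

    def __init__(self):
        self.cache = {}  # speaker id -> cached speaker info (a dict)

    def remove_bgp_speaker_by_id(self, bgp_speaker_id):
        # The membership test hashes its argument, so a dict raises TypeError.
        if bgp_speaker_id in self.cache:
            del self.cache[bgp_speaker_id]

    def put_bgp_speaker_buggy(self, bgp_speaker):
        if bgp_speaker['id'] in self.cache:  # guard is an assumption
            # Call shape from the traceback: the cached dict is passed as the id.
            self.remove_bgp_speaker_by_id(self.cache[bgp_speaker['id']])
        self.cache[bgp_speaker['id']] = {'bgp_speaker': bgp_speaker}

    def put_bgp_speaker_fixed(self, bgp_speaker):
        if bgp_speaker['id'] in self.cache:
            # Possible correction: pass the id itself.
            self.remove_bgp_speaker_by_id(bgp_speaker['id'])
        self.cache[bgp_speaker['id']] = {'bgp_speaker': bgp_speaker}


def clear_unsafe(cache):
    # Deleting while iterating over the same dict raises
    # "RuntimeError: dictionary changed size during iteration".
    for key in cache:
        del cache[key]


def clear_safe(cache):
    # Iterating over a snapshot of the keys avoids the error.
    for key in list(cache):
        del cache[key]
```

In this model the first `put_bgp_speaker_buggy` call for a speaker succeeds and the second raises the TypeError, which would fit the error appearing when a speaker is added again to an agent that already has it cached.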

This issue appears to match a comment here: https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675/3#message-c409a4fb83a44216e03a041921c7067f44eb70d0

We will test out https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675 as in our case the automatic behaviour appears mostly unnecessary, but a fix for the underlying issue would still be appreciated.

Tags: l3-bgp
Dr. Jens Harbott (j-harbott) wrote:

This seems to be essentially the same as https://bugs.launchpad.net/neutron/+bug/1920065, which is the bug related to the above patch. I am not sure I would call it a duplicate, but it is certainly worth further investigation.

Changed in neutron:
status: New → Confirmed