We debugged this further; the findings are below.
When the issue reproduces for a server/port:
- PortBindingUpdateUpEvent is received and put into the queue; at this point the self.notifications queue is already large (250+ entries seen).
- The queue fills up with PortBindingChassisEvent entries for a chassisredirect port within just 2-3 seconds.
- All of the PortBindingChassisEvent entries are for the same port, which just keeps switching chassis [1]. The log is only a snippet; there were 274 entries in total in this particular case, and 350+ in some other cases.
- The same can be seen in the ovn-controller log [2]. Again only a snippet is included; there were 134 entries in total on one controller and 135 on the other.
[2] Changing chassis for lport cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155 from 2dd2e070-e65d-47b0-a458-49fb7eb3e0eb to 30a04401-a973-4ddd-a087-fd45b12116b7 on one controller
Changing chassis for lport cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155 from 30a04401-a973-4ddd-a087-fd45b12116b7 to 2dd2e070-e65d-47b0-a458-49fb7eb3e0eb on other controller
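To put a number on the flapping, the ovn-controller log can be scanned for these messages. The following is a minimal sketch, not part of the original investigation: it assumes only the message text quoted in [2], ignores whatever timestamp or module prefix the real log line carries, and the function name count_chassis_changes is made up for illustration.

```python
import re
from collections import Counter

# Matches only the message text quoted in this report. Any timestamp or
# module prefix that ovn-controller writes before it is ignored.
CHASSIS_CHANGE = re.compile(
    r"Changing chassis for lport (?P<lport>\S+) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def count_chassis_changes(lines):
    """Return a Counter mapping (lport, old, new) to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = CHASSIS_CHANGE.search(line)
        if match:
            # Tolerate a trailing period in case the real log line ends with one.
            new = match["new"].rstrip(".")
            counts[(match["lport"], match["old"], new)] += 1
    return counts


LPORT = "cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155"
CHASSIS_A = "2dd2e070-e65d-47b0-a458-49fb7eb3e0eb"
CHASSIS_B = "30a04401-a973-4ddd-a087-fd45b12116b7"

# The two messages from [2] plus one unrelated line.
sample = [
    f"Changing chassis for lport {LPORT} from {CHASSIS_A} to {CHASSIS_B}",
    f"Changing chassis for lport {LPORT} from {CHASSIS_B} to {CHASSIS_A}",
    "some unrelated log line",
]

counts = count_chassis_changes(sample)
for (lport, old, new), n in counts.items():
    print(f"{lport}: {old} -> {new} ({n} times)")
```

Run against a full log (for example by passing an open file object instead of the sample list), a count in the hundreds for a single cr-lrp port within a few seconds would match what is described above.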
Pushed a revert to neutron master: https://review.opendev.org/c/openstack/neutron/+/843426. It will also be backported to other branches.
Pasting investigation results from https://bugzilla.redhat.com/show_bug.cgi?id=2081631#c6 here for reference:
We debugged this further; the findings are below.
When the issue reproduces for a server/port:
- PortBindingUpdateUpEvent is received and put into the queue; at this point the self.notifications queue is already large (250+ entries seen).
- The queue fills up with PortBindingChassisEvent entries for a chassisredirect port within just 2-3 seconds.
- All of the PortBindingChassisEvent entries are for the same port, which just keeps switching chassis [1]. The log is only a snippet; there were 274 entries in total in this particular case, and 350+ in some other cases.
- The same can be seen in the ovn-controller log [2]. Again only a snippet is included; there were 134 entries in total on one controller and 135 on the other.
This led us to a known, old, still-unfixed OVN bug: https://bugzilla.redhat.com/show_bug.cgi?id=1974898. Until that is fixed, it seems we need to revert https://review.opendev.org/c/openstack/networking-ovn/+/823279, which is likely making the issue show up more often because it switched the monitoring from the NB DB to the SB DB. The NB and SB queues are separate, so NB events are not affected by a large SB event queue.
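The effect of sharing a queue with the flood can be illustrated with a toy model. This is only a sketch under simplifying assumptions: plain FIFO deques stand in for the NB and SB notification queues, the strings stand in for the real event objects, and "nb_port_up" is a hypothetical placeholder for whatever NB-side event reports the port as up. It is not the neutron or ovsdbapp implementation.

```python
from collections import deque


def events_ahead(queue, wanted):
    """Return how many events a FIFO consumer must handle before `wanted`."""
    for index, event in enumerate(queue):
        if event == wanted:
            return index
    return None


# One flapping chassisredirect port produced 274 SB events in the case above.
flood = ["PortBindingChassisEvent"] * 274

# Monitoring on the SB DB: the port-up event waits behind the flood.
sb_queue_shared = deque(flood + ["PortBindingUpdateUpEvent"])

# Monitoring on the NB DB: the flood stays in the SB queue, and the
# port-up event sits in a separate NB queue ("nb_port_up" is a placeholder).
sb_queue_only_flood = deque(flood)
nb_queue = deque(["nb_port_up"])

shared_wait = events_ahead(sb_queue_shared, "PortBindingUpdateUpEvent")
separate_wait = events_ahead(nb_queue, "nb_port_up")

print(f"SB monitoring: {shared_wait} events ahead of the port-up event")
print(f"NB monitoring: {separate_wait} events ahead of the port-up event")
```

In this model the port-up event has 274 events ahead of it when it shares the SB queue and none when it arrives on the NB queue, which is the reasoning behind the revert.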
[1] 2022-05-25 09:11:04.511 15 DEBUG networking_ovn.ovsdb.ovsdb_monitor [-] Hash Ring: Node a3570719-1079-4d61-a0c8-f3171fb07f85 (host: controller-2.redhat.local) handling event "update" for row 3831cbcf-fc7c-4b55-8af4-12e3a3dc21c2 (table: Port_Binding) notify /usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py:742
2022-05-25 09:11:04.513 15 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(parent_port=[], chassis=[<ovs.db.idl.Row object at 0x7fb4a760e710>], mac=['fa:16:3e:70:a1:12 10.0.0.220/24 2620:52:0:13b8::1000:21/64'], options={'always-redirect': 'true', 'distributed-port': 'lrp-b0858034-b5e1-475e-a59e-f19ce3191155'}, ha_chassis_group=[], type=chassisredirect, tag=[], requested_chassis=[], tunnel_key=2, up=[True], logical_port=cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155, gateway_chassis=[], encap=[], external_ids={}, virtual_parent=[], nat_addresses=[], datapath=75657e9e-7e7d-4cb5-95bc-97f0e3a37d9a) old=Port_Binding(chassis=[], up=[False]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2022-05-25 09:11:04.554 15 DEBUG networking_ovn.ovsdb.ovsdb_monitor [-] Hash Ring: Node a3570719-1079-4d61-a0c8-f3171fb07f85 (host: controller-2.redhat.local) handling event "update" for row 3831cbcf-fc7c-4b55-8af4-12e3a3dc21c2 (table: Port_Binding) notify /usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py:742
2022-05-25 09:11:04.557 15 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(parent_port=[], chassis=[<ovs.db.idl.Row object at 0x7fb4a75b2198>], mac=['fa:16:3e:70:a1:12 10.0.0.220/24 2620:52:0:13b8::1000:21/64'], options={'always-redirect': 'true', 'distributed-port': 'lrp-b0858034-b5e1-475e-a59e-f19ce3191155'}, ha_chassis_group=[], type=chassisredirect, tag=[], requested_chassis=[], tunnel_key=2, up=[True], logical_port=cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155, gateway_chassis=[], encap=[], external_ids={}, virtual_parent=[], nat_addresses=[], datapath=75657e9e-7e7d-4cb5-95bc-97f0e3a37d9a) old=Port_Binding(chassis=[<ovs.db.idl.Row object at 0x7fb4a760e710>]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2022-05-25 09:11:04.560 15 DEBUG networking_ovn.ovsdb.ovsdb_monitor [-] Hash Ring: Node a3570719-1079-4d61-a0c8-f3171fb07f85 (host: controller-2.redhat.local) handling event "update" for row 3831cbcf-fc7c-4b55-8af4-12e3a3dc21c2 (table: Port_Binding) notify /usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py:742
2022-05-25 09:11:04.563 15 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(parent_port=[], chassis=[<ovs.db.idl.Row object at 0x7fb4a760e710>], mac=['fa:16:3e:70:a1:12 10.0.0.220/24 2620:52:0:13b8::1000:21/64'], options={'always-redirect': 'true', 'distributed-port': 'lrp-b0858034-b5e1-475e-a59e-f19ce3191155'}, ha_chassis_group=[], type=chassisredirect, tag=[], requested_chassis=[], tunnel_key=2, up=[True], logical_port=cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155, gateway_chassis=[], encap=[], external_ids={}, virtual_parent=[], nat_addresses=[], datapath=75657e9e-7e7d-4cb5-95bc-97f0e3a37d9a) old=Port_Binding(chassis=[<ovs.db.idl.Row object at 0x7fb4a75b2198>]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2022-05-25 09:11:04.567 15 DEBUG networking_ovn.ovsdb.ovsdb_monitor [-] Hash Ring: Node a3570719-1079-4d61-a0c8-f3171fb07f85 (host: controller-2.redhat.local) handling event "update" for row 3831cbcf-fc7c-4b55-8af4-12e3a3dc21c2 (table: Port_Binding) notify /usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py:742
2022-05-25 09:11:04.569 15 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(parent_port=[], chassis=[<ovs.db.idl.Row object at 0x7fb4a75b2198>], mac=['fa:16:3e:70:a1:12 10.0.0.220/24 2620:52:0:13b8::1000:21/64'], options={'always-redirect': 'true', 'distributed-port': 'lrp-b0858034-b5e1-475e-a59e-f19ce3191155'}, ha_chassis_group=[], type=chassisredirect, tag=[], requested_chassis=[], tunnel_key=2, up=[True], logical_port=cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155, gateway_chassis=[], encap=[], external_ids={}, virtual_parent=[], nat_addresses=[], datapath=75657e9e-7e7d-4cb5-95bc-97f0e3a37d9a) old=Port_Binding(chassis=[<ovs.db.idl.Row object at 0x7fb4a760e710>]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
[2] Changing chassis for lport cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155 from 2dd2e070-e65d-47b0-a458-49fb7eb3e0eb to 30a04401-a973-4ddd-a087-fd45b12116b7 on one controller
Changing chassis for lport cr-lrp-b0858034-b5e1-475e-a59e-f19ce3191155 from 30a04401-a973-4ddd-a087-fd45b12116b7 to 2dd2e070-e65d-47b0-a458-49fb7eb3e0eb on other controller