I inspected the logs and it seems there are problems with the transaction order:
What happens is that during router creation, we create one transaction and add commands to create logical router and then to create logical router port for the gateway, in this order. However, in NB DB, the logical router port is created before:
2021-03-08T22:51:15.603Z|01041|jsonrpc|DBG|unix#5: received request, method="transact", params=["OVN_Northbound",{"uuid-name":"rowc664e9f4_477a_42b9_94e1_1cfd75e37232","table":"Logical_Switch_Port","op":"insert","row":{"name":"488b3887-1770-4a72-86cb-7306e78c954a","addresses":["set",["12:34:56:78:84:c3 30.0.0.35","unknown"]],"options":["map",[["requested-chassis",""]]],"external_ids":["map",[["neutron:cidrs","30.0.0.35/24"],["neutron:device_id","0eff9eae-4642-4abe-9c06-67df13921296"],["neutron:device_owner","network:router_gateway"],["neutron:network_name","neutron-a0ce1add-b6ec-41df-bf85-44bac81781ed"],["neutron:port_name",""],["neutron:project_id",""],["neutron:revision_number","1"],["neutron:security_group_ids",""]]],"type":"","enabled":true}},{"where":[["_uuid","==",["uuid","6317396c-75ed-457b-a75b-09812e134099"]]],"mutations":[["ports","insert",["set",[["named-uuid","rowc664e9f4_477a_42b9_94e1_1cfd75e37232"]]]]],"table":"Logical_Switch","op":"mutate"}], id=1958
2021-03-08T22:51:17.278Z|01069|jsonrpc|DBG|unix#5: received request, method="transact", params=["OVN_Northbound",{"uuid-name":"rowd215a94b_9275_4fd0_b73c_c2c7e864379a","table":"Logical_Router","op":"insert","row":{"name":"neutron-0eff9eae-4642-4abe-9c06-67df13921296","external_ids":["map",[["neutron:availability_zone_hints",""],["neutron:gw_port_id","488b3887-1770-4a72-86cb-7306e78c954a"],["neutron:revision_number","2"],["neutron:router_name","router14"]]],"enabled":true}},{"where":[["_uuid","==",["named-uuid","rowd215a94b_9275_4fd0_b73c_c2c7e864379a"]]],"mutations":[["ports","insert",["set",[["named-uuid","row34af8860_ab43_4091_ac45_d832a818fa95"]]]],["static_routes","insert",["set",[["named-uuid","row9d1df734_4604_4cd5_a431_855104e1c2b7"]]]]],"table":"Logical_Router","op":"mutate"},{"uuid-name":"row34af8860_ab43_4091_ac45_d832a818fa95","table":"Logical_Router_Port","op":"insert","row":{"name":"lrp-488b3887-1770-4a72-86cb-7306e78c954a","mac":"12:34:56:78:84:c3","external_ids":["map",[["neutron:network_name","neutron-a0ce1add-b6ec-41df-bf85-44bac81781ed"],["neutron:revision_number","1"],["neutron:router_name","0eff9eae-4642-4abe-9c06-67df13921296"],["neutron:subnet_ids","4f8bbafc-ef7c-41bf-917d-cfd52b105586"]]],"gateway_chassis":["named-uuid","row759e3421_f7a6_4d0b_9d43_ffd4ce28158d"],"networks":"30.0.0.35/24"}},{"uuid-name":"row759e3421_f7a6_4d0b_9d43_ffd4ce28158d","table":"Gateway_Chassis","op":"insert","row":{"name":"lrp-488b3887-1770-4a72-86cb-7306e78c954a_92ca516b-a624-42e1-8460-5dcf6b57b98c","priority":1,"chassis_name":"92ca516b-a624-42e1-8460-5dcf6b57b98c"}},{"where":[["_uuid","==",["uuid","7145cd6b-c2a5-4cb7-93fa-af8e1e3e009e"]]],"table":"Logical_Switch_Port","op":"update","row":{"addresses":"router","options":["map",[["nat-addresses","router"],["router-port","lrp-488b3887-1770-4a72-86cb-7306e78c954a"]]],"type":"router"}},{"uuid-name":"row9d1df734_4604_4cd5_a431_855104e1c2b7","table":"Logical_Router_Static_Route","op":"insert","row":{"ip_prefix":"0.0.0.0/0","nexthop":"30.0.0.1","external_ids":["map",[["neutron:is_ext_gw","true"],["neutron:subnet_id","4f8bbafc-ef7c-41bf-917d-cfd52b105586"]]]}}], id=1961
The insertion of logical router port emits and OVSDB event that looks for the logical router, which is not there and is created 2 seconds later, causing failures in Neutron:
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command [req-7659b27f-4a72-42fe-8cbb-11312eda462c - tenant1 - - -] Error executing command (LrGetCommand): ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Router with name=neutron-0eff9eae-4642-4abe-9c06-67df13921296
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 39, in execute
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command self.run_idl(None)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 329, in run_idl
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command self.result = self.api.lookup(self.table, self.record)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 177, in lookup
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command return self._lookup(table, record)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 224, in _lookup
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command row = idlutils.row_by_value(self, rl.table, rl.column, record)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 114, in row_by_value
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command raise RowNotFound(table=table, col=column, match=match)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Router with name=neutron-0eff9eae-4642-4abe-9c06-67df13921296
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command
We don't have northd logs (patch sent: https://review.opendev.org/c/openstack/neutron/+/782295), northd is I think responsible for creating the port binding that we failed at waiting for (the reason for the test failure).
However, in SB DB logs we can see the port binding being actually created:
2021-03-08T22:51:15.611Z|00540|jsonrpc|DBG|unix#1: send notification, method="update2", params=["a8b17ec0-8060-11eb-8da7-e30ae6074bb5",{"Port_Binding":{"558dbef9-3ea8-4e78-990b-bb5b4c92dd82":{"insert":{"tunnel_key":16,"datapath":["uuid","57484c3a-1d46-4bcc-9a5b-40678ea4bde3"],"mac":["set",["12:34:56:78:84:c3 30.0.0.35","unknown"]],"logical_port":"488b3887-1770-4a72-86cb-7306e78c954a","external_ids":["map",[["neutron:cidrs","30.0.0.35/24"],["neutron:device_id","0eff9eae-4642-4abe-9c06-67df13921296"],["neutron:device_owner","network:router_gateway"],["neutron:network_name","neutron-a0ce1add-b6ec-41df-bf85-44bac81781ed"],["neutron:port_name",""],["neutron:project_id",""],["neutron:revision_number","1"],["neutron:security_group_ids",""]]],"options":["map",[["requested-chassis",""]]]}}}}]
And Neutron reacting on the port binding:
2021-03-08 22:51:15.645 60832 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-10b17c5e-f649-4ab4-846f-4b93733eec79 - - - - -] Hash Ring: Node 9def8e11-aa3f-4e51-95e8-d6a8d665a7a9 (host: ubuntu-focal-ovh-bhs1-0023379399) handling event "create" for row 558dbef9-3ea8-4e78-990b-bb5b4c92dd82 (table: Port_Binding) notify /home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:609
We lack some debug logs there to see what happened. I didn't manage to reproduce neither on Centos nor Ubuntu machines and I saw in the last week, this test failed only once. I may need to send some patch with debug messages and wait for the reproducer in the gate.
I inspected the logs and it seems there are problems with the transaction order:
What happens is that during router creation, we create one transaction and add commands to create logical router and then to create logical router port for the gateway, in this order. However, in NB DB, the logical router port is created before:
2021-03- 08T22:51: 15.603Z| 01041|jsonrpc| DBG|unix# 5: received request, method="transact", params= ["OVN_Northboun d",{"uuid- name":" rowc664e9f4_ 477a_42b9_ 94e1_1cfd75e372 32","table" :"Logical_ Switch_ Port"," op":"insert" ,"row": {"name" :"488b3887- 1770-4a72- 86cb-7306e78c95 4a","addresses" :["set" ,["12:34: 56:78:84: c3 30.0.0. 35","unknown" ]],"options" :["map" ,[["requested- chassis" ,""]]], "external_ ids":[" map",[[ "neutron: cidrs", "30.0.0. 35/24"] ,["neutron: device_ id","0eff9eae- 4642-4abe- 9c06-67df139212 96"],[" neutron: device_ owner", "network: router_ gateway" ],["neutron: network_ name"," neutron- a0ce1add- b6ec-41df- bf85-44bac81781 ed"],[" neutron: port_name" ,""],[" neutron: project_ id",""] ,["neutron: revision_ number" ,"1"],[ "neutron: security_ group_ids" ,""]]], "type": "","enabled" :true}} ,{"where" :[["_uuid" ,"==",[ "uuid", "6317396c- 75ed-457b- a75b-09812e1340 99"]]], "mutations" :[["ports" ,"insert" ,["set" ,[["named- uuid"," rowc664e9f4_ 477a_42b9_ 94e1_1cfd75e372 32"]]]] ],"table" :"Logical_ Switch" ,"op":" mutate" }], id=1958
2021-03- 08T22:51: 17.278Z| 01069|jsonrpc| DBG|unix# 5: received request, method="transact", params= ["OVN_Northboun d",{"uuid- name":" rowd215a94b_ 9275_4fd0_ b73c_c2c7e86437 9a","table" :"Logical_ Router" ,"op":" insert" ,"row": {"name" :"neutron- 0eff9eae- 4642-4abe- 9c06-67df139212 96","external_ ids":[" map",[[ "neutron: availability_ zone_hints" ,""],[" neutron: gw_port_ id","488b3887- 1770-4a72- 86cb-7306e78c95 4a"],[" neutron: revision_ number" ,"2"],[ "neutron: router_ name"," router14" ]]],"enabled" :true}} ,{"where" :[["_uuid" ,"==",[ "named- uuid"," rowd215a94b_ 9275_4fd0_ b73c_c2c7e86437 9a"]]], "mutations" :[["ports" ,"insert" ,["set" ,[["named- uuid"," row34af8860_ ab43_4091_ ac45_d832a818fa 95"]]]] ,["static_ routes" ,"insert" ,["set" ,[["named- uuid"," row9d1df734_ 4604_4cd5_ a431_855104e1c2 b7"]]]] ],"table" :"Logical_ Router" ,"op":" mutate" },{"uuid- name":" row34af8860_ ab43_4091_ ac45_d832a818fa 95","table" :"Logical_ Router_ Port"," op":"insert" ,"row": {"name" :"lrp-488b3887- 1770-4a72- 86cb-7306e78c95 4a","mac" :"12:34: 56:78:84: c3","external_ ids":[" map",[[ "neutron: network_ name"," neutron- a0ce1add- b6ec-41df- bf85-44bac81781 ed"],[" neutron: revision_ number" ,"1"],[ "neutron: router_ name"," 0eff9eae- 4642-4abe- 9c06-67df139212 96"],[" neutron: subnet_ ids","4f8bbafc- ef7c-41bf- 917d-cfd52b1055 86"]]], "gateway_ chassis" :["named- uuid"," row759e3421_ f7a6_4d0b_ 9d43_ffd4ce2815 8d"],"networks" :"30.0. 0.35/24" }},{"uuid- name":" row759e3421_ f7a6_4d0b_ 9d43_ffd4ce2815 8d","table" :"Gateway_ Chassis" ,"op":" insert" ,"row": {"name" :"lrp-488b3887- 1770-4a72- 86cb-7306e78c95 4a_92ca516b- a624-42e1- 8460-5dcf6b57b9 8c","priority" :1,"chassis_ name":" 92ca516b- a624-42e1- 8460-5dcf6b57b9 8c"}},{ "where" :[["_uuid" ,"==",[ "uuid", "7145cd6b- c2a5-4cb7- 93fa-af8e1e3e00 9e"]]], "table" :"Logical_ Switch_ Port"," op":"update" ,"row": {"addresses" :"router" ,"options" :["map" ,[["nat- addresses" ,"router" ],["router- port"," lrp-488b3887- 1770-4a72- 86cb-7306e78c95 4a"]]], "type": "router" }},{"uuid- name":" row9d1df734_ 4604_4cd5_ a431_855104e1c2 b7","table" :"Logical_ Router_ Static_ Route", "op":"insert" ,"row": {"ip_prefix" :"0.0.0. 0/0","nexthop" :"30.0. 0.1","external_ ids":[" map",[[ "neutron: is_ext_ gw","true" ],["neutron: subnet_ id","4f8bbafc- ef7c-41bf- 917d-cfd52b1055 86"]]]} }], id=1961
The insertion of logical router port emits and OVSDB event that looks for the logical router, which is not there and is created 2 seconds later, causing failures in Neutron: backend. ovs_idl. command [req-7659b27f- 4a72-42fe- 8cbb-11312eda46 2c - tenant1 - - -] Error executing command (LrGetCommand): ovsdbapp. backend. ovs_idl. idlutils. RowNotFound: Cannot find Logical_Router with name=neutron- 0eff9eae- 4642-4abe- 9c06-67df139212 96 backend. ovs_idl. command Traceback (most recent call last): backend. ovs_idl. command File "/home/ zuul/src/ opendev. org/openstack/ neutron/ .tox/dsvm- functional/ lib/python3. 8/site- packages/ ovsdbapp/ backend/ ovs_idl/ command. py", line 39, in execute backend. ovs_idl. command self.run_idl(None) backend. ovs_idl. command File "/home/ zuul/src/ opendev. org/openstack/ neutron/ .tox/dsvm- functional/ lib/python3. 8/site- packages/ ovsdbapp/ backend/ ovs_idl/ command. py", line 329, in run_idl backend. ovs_idl. command self.result = self.api. lookup( self.table, self.record) backend. ovs_idl. command File "/home/ zuul/src/ opendev. org/openstack/ neutron/ .tox/dsvm- functional/ lib/python3. 8/site- packages/ ovsdbapp/ backend/ ovs_idl/ __init_ _.py", line 177, in lookup backend. ovs_idl. command return self._lookup(table, record) backend. ovs_idl. command File "/home/ zuul/src/ opendev. org/openstack/ neutron/ .tox/dsvm- functional/ lib/python3. 8/site- packages/ ovsdbapp/ backend/ ovs_idl/ __init_ _.py", line 224, in _lookup backend. ovs_idl. command row = idlutils. row_by_ value(self, rl.table, rl.column, record) backend. ovs_idl. command File "/home/ zuul/src/ opendev. org/openstack/ neutron/ .tox/dsvm- functional/ lib/python3. 8/site- packages/ ovsdbapp/ backend/ ovs_idl/ idlutils. py", line 114, in row_by_value backend. ovs_idl. command raise RowNotFound( table=table, col=column, match=match) backend. ovs_idl. command ovsdbapp. backend. ovs_idl. idlutils. RowNotFound: Cannot find Logical_Router with name=neutron- 0eff9eae- 4642-4abe- 9c06-67df139212 96 backend. ovs_idl. command
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.
We don't have northd logs (patch sent: https:/ /review. opendev. org/c/openstack /neutron/ +/782295), northd is I think responsible for creating the port binding that we failed at waiting for (the reason for the test failure).
However, in SB DB logs we can see the port binding being actually created: 08T22:51: 15.611Z| 00540|jsonrpc| DBG|unix# 1: send notification, method="update2", params= ["a8b17ec0- 8060-11eb- 8da7-e30ae6074b b5",{"Port_ Binding" :{"558dbef9- 3ea8-4e78- 990b-bb5b4c92dd 82":{"insert" :{"tunnel_ key":16, "datapath" :["uuid" ,"57484c3a- 1d46-4bcc- 9a5b-40678ea4bd e3"],"mac" :["set" ,["12:34: 56:78:84: c3 30.0.0. 35","unknown" ]],"logical_ port":" 488b3887- 1770-4a72- 86cb-7306e78c95 4a","external_ ids":[" map",[[ "neutron: cidrs", "30.0.0. 35/24"] ,["neutron: device_ id","0eff9eae- 4642-4abe- 9c06-67df139212 96"],[" neutron: device_ owner", "network: router_ gateway" ],["neutron: network_ name"," neutron- a0ce1add- b6ec-41df- bf85-44bac81781 ed"],[" neutron: port_name" ,""],[" neutron: project_ id",""] ,["neutron: revision_ number" ,"1"],[ "neutron: security_ group_ids" ,""]]], "options" :["map" ,[["requested- chassis" ,""]]]} }}}]
2021-03-
And Neutron reacting on the port binding: plugins. ml2.drivers. ovn.mech_ driver. ovsdb.ovsdb_ monitor [req-10b17c5e- f649-4ab4- 846f-4b93733eec 79 - - - - -] Hash Ring: Node 9def8e11- aa3f-4e51- 95e8-d6a8d665a7 a9 (host: ubuntu- focal-ovh- bhs1-0023379399 ) handling event "create" for row 558dbef9- 3ea8-4e78- 990b-bb5b4c92dd 82 (table: Port_Binding) notify /home/zuul/ src/opendev. org/openstack/ neutron/ neutron/ plugins/ ml2/drivers/ ovn/mech_ driver/ ovsdb/ovsdb_ monitor. py:609
2021-03-08 22:51:15.645 60832 DEBUG neutron.
We lack some debug logs there to see what happened. I didn't manage to reproduce neither on Centos nor Ubuntu machines and I saw in the last week, this test failed only once. I may need to send some patch with debug messages and wait for the reproducer in the gate.