Comment 1 for bug 1918266

Revision history for this message
Jakub Libosvar (libosvar) wrote :

I inspected the logs and it seems there are problems with the transaction order:

What happens is that during router creation, we create one transaction and add commands to create logical router and then to create logical router port for the gateway, in this order. However, in NB DB, the logical router port is created before:

2021-03-08T22:51:15.603Z|01041|jsonrpc|DBG|unix#5: received request, method="transact", params=["OVN_Northbound",{"uuid-name":"rowc664e9f4_477a_42b9_94e1_1cfd75e37232","table":"Logical_Switch_Port","op":"insert","row":{"name":"488b3887-1770-4a72-86cb-7306e78c954a","addresses":["set",["12:34:56:78:84:c3 30.0.0.35","unknown"]],"options":["map",[["requested-chassis",""]]],"external_ids":["map",[["neutron:cidrs","30.0.0.35/24"],["neutron:device_id","0eff9eae-4642-4abe-9c06-67df13921296"],["neutron:device_owner","network:router_gateway"],["neutron:network_name","neutron-a0ce1add-b6ec-41df-bf85-44bac81781ed"],["neutron:port_name",""],["neutron:project_id",""],["neutron:revision_number","1"],["neutron:security_group_ids",""]]],"type":"","enabled":true}},{"where":[["_uuid","==",["uuid","6317396c-75ed-457b-a75b-09812e134099"]]],"mutations":[["ports","insert",["set",[["named-uuid","rowc664e9f4_477a_42b9_94e1_1cfd75e37232"]]]]],"table":"Logical_Switch","op":"mutate"}], id=1958

2021-03-08T22:51:17.278Z|01069|jsonrpc|DBG|unix#5: received request, method="transact", params=["OVN_Northbound",{"uuid-name":"rowd215a94b_9275_4fd0_b73c_c2c7e864379a","table":"Logical_Router","op":"insert","row":{"name":"neutron-0eff9eae-4642-4abe-9c06-67df13921296","external_ids":["map",[["neutron:availability_zone_hints",""],["neutron:gw_port_id","488b3887-1770-4a72-86cb-7306e78c954a"],["neutron:revision_number","2"],["neutron:router_name","router14"]]],"enabled":true}},{"where":[["_uuid","==",["named-uuid","rowd215a94b_9275_4fd0_b73c_c2c7e864379a"]]],"mutations":[["ports","insert",["set",[["named-uuid","row34af8860_ab43_4091_ac45_d832a818fa95"]]]],["static_routes","insert",["set",[["named-uuid","row9d1df734_4604_4cd5_a431_855104e1c2b7"]]]]],"table":"Logical_Router","op":"mutate"},{"uuid-name":"row34af8860_ab43_4091_ac45_d832a818fa95","table":"Logical_Router_Port","op":"insert","row":{"name":"lrp-488b3887-1770-4a72-86cb-7306e78c954a","mac":"12:34:56:78:84:c3","external_ids":["map",[["neutron:network_name","neutron-a0ce1add-b6ec-41df-bf85-44bac81781ed"],["neutron:revision_number","1"],["neutron:router_name","0eff9eae-4642-4abe-9c06-67df13921296"],["neutron:subnet_ids","4f8bbafc-ef7c-41bf-917d-cfd52b105586"]]],"gateway_chassis":["named-uuid","row759e3421_f7a6_4d0b_9d43_ffd4ce28158d"],"networks":"30.0.0.35/24"}},{"uuid-name":"row759e3421_f7a6_4d0b_9d43_ffd4ce28158d","table":"Gateway_Chassis","op":"insert","row":{"name":"lrp-488b3887-1770-4a72-86cb-7306e78c954a_92ca516b-a624-42e1-8460-5dcf6b57b98c","priority":1,"chassis_name":"92ca516b-a624-42e1-8460-5dcf6b57b98c"}},{"where":[["_uuid","==",["uuid","7145cd6b-c2a5-4cb7-93fa-af8e1e3e009e"]]],"table":"Logical_Switch_Port","op":"update","row":{"addresses":"router","options":["map",[["nat-addresses","router"],["router-port","lrp-488b3887-1770-4a72-86cb-7306e78c954a"]]],"type":"router"}},{"uuid-name":"row9d1df734_4604_4cd5_a431_855104e1c2b7","table":"Logical_Router_Static_Route","op":"insert","row":{"ip_prefix":"0.0.0.0/0","nexthop":"30.0.0.1","external_ids":["map",[["neutron:is_ext_gw","true"],["neutron:subnet_id","4f8bbafc-ef7c-41bf-917d-cfd52b105586"]]]}}], id=1961

The insertion of logical router port emits and OVSDB event that looks for the logical router, which is not there and is created 2 seconds later, causing failures in Neutron:
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command [req-7659b27f-4a72-42fe-8cbb-11312eda462c - tenant1 - - -] Error executing command (LrGetCommand): ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Router with name=neutron-0eff9eae-4642-4abe-9c06-67df13921296
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 39, in execute
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command self.run_idl(None)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 329, in run_idl
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command self.result = self.api.lookup(self.table, self.record)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 177, in lookup
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command return self._lookup(table, record)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 224, in _lookup
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command row = idlutils.row_by_value(self, rl.table, rl.column, record)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 114, in row_by_value
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command raise RowNotFound(table=table, col=column, match=match)
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Router with name=neutron-0eff9eae-4642-4abe-9c06-67df13921296
2021-03-08 22:51:16.216 60832 ERROR ovsdbapp.backend.ovs_idl.command

We don't have northd logs (patch sent: https://review.opendev.org/c/openstack/neutron/+/782295), northd is I think responsible for creating the port binding that we failed at waiting for (the reason for the test failure).

However, in SB DB logs we can see the port binding being actually created:
2021-03-08T22:51:15.611Z|00540|jsonrpc|DBG|unix#1: send notification, method="update2", params=["a8b17ec0-8060-11eb-8da7-e30ae6074bb5",{"Port_Binding":{"558dbef9-3ea8-4e78-990b-bb5b4c92dd82":{"insert":{"tunnel_key":16,"datapath":["uuid","57484c3a-1d46-4bcc-9a5b-40678ea4bde3"],"mac":["set",["12:34:56:78:84:c3 30.0.0.35","unknown"]],"logical_port":"488b3887-1770-4a72-86cb-7306e78c954a","external_ids":["map",[["neutron:cidrs","30.0.0.35/24"],["neutron:device_id","0eff9eae-4642-4abe-9c06-67df13921296"],["neutron:device_owner","network:router_gateway"],["neutron:network_name","neutron-a0ce1add-b6ec-41df-bf85-44bac81781ed"],["neutron:port_name",""],["neutron:project_id",""],["neutron:revision_number","1"],["neutron:security_group_ids",""]]],"options":["map",[["requested-chassis",""]]]}}}}]

And Neutron reacting on the port binding:
2021-03-08 22:51:15.645 60832 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-10b17c5e-f649-4ab4-846f-4b93733eec79 - - - - -] Hash Ring: Node 9def8e11-aa3f-4e51-95e8-d6a8d665a7a9 (host: ubuntu-focal-ovh-bhs1-0023379399) handling event "create" for row 558dbef9-3ea8-4e78-990b-bb5b4c92dd82 (table: Port_Binding) notify /home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:609

We lack some debug logs there to see what happened. I didn't manage to reproduce neither on Centos nor Ubuntu machines and I saw in the last week, this test failed only once. I may need to send some patch with debug messages and wait for the reproducer in the gate.