Activity log for bug #1605089

Date Who What changed Old value New value Message
2016-07-21 07:02:15 Han Zhou bug added bug
2016-07-21 07:03:20 Han Zhou description
Old value:
In the ML2 mechanism driver, OVN updates are performed in post-commit, i.e. after the Neutron DB transaction. With multiple workers/nodes, the order of OVN updates is not guaranteed: when workers/nodes race, the OVN updates can be applied in a different order from the one in which the DB transactions committed. This results in inconsistent data between OVN and the Neutron DB.

I ran a simple test with the patch below, which makes the problem easy to reproduce (testing port_update here).

diff --git a/networking_ovn/ml2/mech_driver.py b/networking_ovn/ml2/mech_driver.py
index e3e5050..277f487 100644
--- a/networking_ovn/ml2/mech_driver.py
+++ b/networking_ovn/ml2/mech_driver.py
@@ -11,7 +11,8 @@
 # License for the specific language governing permissions and limitations
 # under the License.
 #
-
+from time import sleep
+from random import randint
 import collections
 
 from neutron_lib.api import validators
@@ -568,6 +569,7 @@ class OVNMechanismDriver(driver_api.MechanismDriver):
         self._update_port_in_ovn(original_port, port, ovn_port_info)
 
     def _update_port_in_ovn(self, original_port, port, ovn_port_info):
+        sleep(randint(1, 3))
         external_ids = {
             ovn_const.OVN_PORT_NAME_EXT_ID_KEY: port['name']}
         admin_context = n_context.get_admin_context()

With this change I can easily get inconsistency between Neutron and OVN. Below is an example test in which 2 clients update the IPv4 address of the same port at the same time.
$ neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.8 & neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.10 &
[1] 29415
[2] 29416
$ Updated port: 182165b7-a634-430a-bbf0-f96783bca112
[2]+ Done neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.10
$ Updated port: 182165b7-a634-430a-bbf0-f96783bca112
[1]+ Done neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.8
$ neutron port-show 182165b7-a634-430a-bbf0-f96783bca112
+-----------------------+-------------------------------------------+
| Field                 | Value                                     |
+-----------------------+-------------------------------------------+
| admin_state_up        | True                                      |
| allowed_address_pairs |                                           |
| binding:vnic_type     | normal                                    |
| created_at            | 2016-07-08T22:04:24                       |
| description           |                                           |
| device_id             |                                           |
| device_owner          |                                           |
| extra_dhcp_opts       |                                           |
| fixed_ips             | {"subnet_id": "d34a3b2f-cb0e-             |
|                       | 417c-8125-5dfeb2d191a9", "ip_address":    |
|                       | "10.0.0.8"}                               |
|                       | {"subnet_id": "1be4ecac-                  |
|                       | 2df5-4b18-a9b0-9ae1e191e721",             |
|                       | "ip_address":                             |
|                       | "fd2a:66ba:3d0d:0:f816:3eff:fe8d:65"}     |
| id                    | 182165b7-a634-430a-bbf0-f96783bca112      |
| mac_address           | fa:16:3e:8d:00:65                         |
| name                  |                                           |
| network_id            | 8367ac53-8681-41df-bdcb-199b7438aff4      |
| port_security_enabled | True                                      |
| security_groups       | 36263eeb-603b-4fe7-b931-8dcbb9bb332b      |
| status                | DOWN                                      |
| tenant_id             | 05179830bd974e468120406696324866          |
| updated_at            | 2016-07-08T22:04:24                       |
+-----------------------+-------------------------------------------+
$ ovn-nbctl lsp-get-addresses 182165b7-a634-430a-bbf0-f96783bca112
fa:16:3e:8d:00:65 10.0.0.10 fd2a:66ba:3d0d:0:f816:3eff:fe8d:65

We can see that on the Neutron side the final IP of this port is "10.0.0.8", but in the OVN northbound DB it is "10.0.0.10".
This race condition exists for most update operations that require changes on the OVN side. It seems to be a general problem of the Neutron ML2 plugin (although the same problem also existed in the old networking-ovn monolithic plugin).

New value:
In the ML2 mechanism driver, OVN updates are performed in post-commit, i.e. after the Neutron DB transaction. With multiple workers/nodes, the order of OVN updates is not guaranteed: when workers/nodes race, the OVN updates can be applied in a different order from the one in which the DB transactions committed. This results in inconsistent data between OVN and the Neutron DB.

I ran a simple test with the patch below, which makes the problem easy to reproduce (testing port_update here).

diff --git a/networking_ovn/ml2/mech_driver.py b/networking_ovn/ml2/mech_driver.py
index e3e5050..277f487 100644
--- a/networking_ovn/ml2/mech_driver.py
+++ b/networking_ovn/ml2/mech_driver.py
@@ -11,7 +11,8 @@
 # License for the specific language governing permissions and limitations
 # under the License.
 #
-
+from time import sleep
+from random import randint
 import collections
 
 from neutron_lib.api import validators
@@ -568,6 +569,7 @@ class OVNMechanismDriver(driver_api.MechanismDriver):
         self._update_port_in_ovn(original_port, port, ovn_port_info)
 
     def _update_port_in_ovn(self, original_port, port, ovn_port_info):
+        sleep(randint(1, 3))
         external_ids = {
             ovn_const.OVN_PORT_NAME_EXT_ID_KEY: port['name']}
         admin_context = n_context.get_admin_context()

With this change I can easily get inconsistency between Neutron and OVN. Below is an example test in which 2 clients update the IPv4 address of the same port at the same time.
$ neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.8 & neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.10 &
[1] 29415
[2] 29416
$ Updated port: 182165b7-a634-430a-bbf0-f96783bca112
[2]+ Done neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.10
$ Updated port: 182165b7-a634-430a-bbf0-f96783bca112
[1]+ Done neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.8
$ neutron port-show 182165b7-a634-430a-bbf0-f96783bca112
+-----------------------+-------------------------------------------+
| Field                 | Value                                     |
+-----------------------+-------------------------------------------+
...
| fixed_ips             | {"subnet_id": "d34a3b2f-cb0e-             |
|                       | 417c-8125-5dfeb2d191a9", "ip_address":    |
|                       | "10.0.0.8"}                               |
|                       | {"subnet_id": "1be4ecac-                  |
|                       | 2df5-4b18-a9b0-9ae1e191e721",             |
|                       | "ip_address":                             |
|                       | "fd2a:66ba:3d0d:0:f816:3eff:fe8d:65"}     |
| id                    | 182165b7-a634-430a-bbf0-f96783bca112      |
...
+-----------------------+-------------------------------------------+
$ ovn-nbctl lsp-get-addresses 182165b7-a634-430a-bbf0-f96783bca112
fa:16:3e:8d:00:65 10.0.0.10 fd2a:66ba:3d0d:0:f816:3eff:fe8d:65

We can see that on the Neutron side the final IP of this port is "10.0.0.8", but in the OVN northbound DB it is "10.0.0.10".

This race condition exists for most update operations that require changes on the OVN side. It seems to be a general problem of the Neutron ML2 plugin (although the same problem also existed in the old networking-ovn monolithic plugin).
2016-07-21 07:04:52 Han Zhou networking-ovn: status New Confirmed
2016-07-21 07:05:06 Han Zhou networking-ovn: importance Undecided Medium
2016-07-21 11:15:00 Richard Theis bug added subscriber Richard Theis
2016-07-29 06:35:35 Han Zhou networking-ovn: assignee Han Zhou (zhouhan)
2016-08-22 07:28:05 OpenStack Infra networking-ovn: status Confirmed In Progress
2017-04-28 16:52:59 Lucas Alvares Gomes networking-ovn: assignee Han Zhou (zhouhan) Lucas Alvares Gomes (lucasagomes)
2018-01-29 10:32:40 Daniel Alvarez networking-ovn: milestone 2015.1.1
2018-01-29 10:33:16 Daniel Alvarez networking-ovn: milestone 2015.1.1
2019-04-29 15:09:53 Lucas Alvares Gomes networking-ovn: status In Progress Fix Released
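The ordering problem described in this bug can be modelled without Neutron or OVN. The following is a minimal, hypothetical Python sketch, not networking-ovn code: `FakeNeutronDB`, `FakeOVNPort` and `run_race` are invented names. Two threads synchronized with `threading.Event` force the interleaving seen in the reproduction: the 10.0.0.10 update commits to the DB first and the 10.0.0.8 update commits second, but the 10.0.0.8 post-commit reaches OVN first. With `check_revision=True` it also illustrates one possible mitigation, assumed here purely for illustration: each DB commit yields a monotonically increasing revision, and the OVN-side write is skipped when its revision is not newer than the one already stored. The locks stand in for whatever atomic compare-and-write a real backend would have to provide.

```python
"""Hypothetical model of the post-commit ordering race (not networking-ovn code)."""
import threading


class FakeNeutronDB:
    """Stands in for the Neutron DB: commits are serialized and each bumps a revision."""

    def __init__(self):
        self._lock = threading.Lock()
        self.ip = None
        self.revision = 0

    def commit(self, ip):
        with self._lock:
            self.revision += 1
            self.ip = ip
            return self.revision


class FakeOVNPort:
    """Stands in for the port's row in the OVN NB DB (hypothetical)."""

    def __init__(self, check_revision):
        self._lock = threading.Lock()  # makes compare-and-write atomic in this model
        self.check_revision = check_revision
        self.ip = None
        self.revision = 0

    def update(self, ip, revision):
        with self._lock:
            if self.check_revision and revision <= self.revision:
                return False  # stale update: a newer commit was already applied
            self.ip = ip
            self.revision = revision
            return True


def run_race(check_revision):
    """Commit order 10.0.0.10 -> 10.0.0.8, but OVN apply order 10.0.0.8 -> 10.0.0.10."""
    db = FakeNeutronDB()
    ovn = FakeOVNPort(check_revision)
    first_committed = threading.Event()
    second_applied = threading.Event()

    def first_worker():
        rev = db.commit("10.0.0.10")
        first_committed.set()
        second_applied.wait()  # post-commit delayed, like the sleep() in the test patch
        ovn.update("10.0.0.10", rev)

    def second_worker():
        first_committed.wait()  # commits only after the first worker has committed
        rev = db.commit("10.0.0.8")
        ovn.update("10.0.0.8", rev)
        second_applied.set()

    threads = [threading.Thread(target=first_worker),
               threading.Thread(target=second_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.ip, ovn.ip


if __name__ == "__main__":
    print(run_race(check_revision=False))  # ('10.0.0.8', '10.0.0.10'): DB and OVN disagree
    print(run_race(check_revision=True))   # ('10.0.0.8', '10.0.0.8'): stale write rejected
```

Without the check, the final state matches the report (Neutron 10.0.0.8, OVN 10.0.0.10); with it, the late write carrying the older revision is dropped. In a real deployment the revision would have to be stored with the OVN row and compared atomically with the write; how that is done is outside the scope of this sketch.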