RuntimeError: OVSDB Error: is seen on an ovn worker which doesn't have the 'neutron_ovn_event_lock'

Bug #1762933 reported by Numan Siddique
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
networking-ovn | Fix Released | High | Lucas Alvares Gomes |

Bug Description

The exceptions below are seen on neutron-server during startup if the neutron-server's ovn-worker has not acquired the lock 'neutron_ovn_event_lock'. This happens on a TripleO containerized setup with 3 controllers when I restart the neutron-server Docker container on any node.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/queue.py", line 118, in switch
    self.greenlet.switch(value)
  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovn_db_sync.py", line 956, in do_sync
    self.sync_hostname_and_physical_networks(ctx)
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovn_db_sync.py", line 962, in sync_hostname_and_physical_networks
    host_phynets_map = self.ovn_api.get_chassis_hostname_and_physnets()
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 673, in get_chassis_hostname_and_physnets
    for ch in self.chassis_list().execute(check_error=True):
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 35, in execute
    txn.add(self)
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
    self._nested_txn = None
  File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
    self.result = self.commit()
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
    raise result.ex
RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/queue.py", line 118, in switch
    self.greenlet.switch(value)
  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovn_db_sync.py", line 90, in do_sync
    self.sync_networks_ports_and_dhcp_opts(ctx)
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovn_db_sync.py", line 896, in sync_networks_ports_and_dhcp_opts
    lport_info['port'])['uuid']))
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 158, in transaction
    yield t
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
    self._nested_txn = None
  File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
    self.result = self.commit()
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
    raise result.ex
RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it

Changed in networking-ovn:
assignee: nobody → Numan Siddique (numansiddique)
importance: Undecided → Medium
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/560295

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Numan Siddique (<email address hidden>) on branch: master
Review: https://review.openstack.org/560295
Reason: My analysis for the bug in this patch is wrong. The patch https://review.openstack.org/#/c/561995/ fixes the bug.

Numan Siddique (numansiddique) wrote :

This issue is seen not only during startup, but any time an OVN worker that is handling update events from the IDL holds the lock on the Northbound DB but not on the Southbound DB. The patch https://review.openstack.org/#/c/561995/ fixes this issue.

Changed in networking-ovn:
importance: Medium → High
assignee: Numan Siddique (numansiddique) → nobody
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/562141

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/561995
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=974d0176dac4ac68edf9af848aadf04d86b2da65
Submitter: Zuul
Branch: master

commit 974d0176dac4ac68edf9af848aadf04d86b2da65
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Apr 17 16:50:57 2018 +0100

    Do not use a transaction for get_logical_port_chassis_and_datapath()

    The OVSDB monitor for the Northbound database is invoking
    set_port_status_up(), this method will invoke another function
    called _wait_for_metadata_provisioned_if_needed() which
    will then invoke get_logical_port_chassis_and_datapath(). The
    get_logical_port_chassis_and_datapath() starts a transaction in the OVSDB
    Southbound database but, if the IDL for the Southbound database doesn't
    have a valid lock [0] it will raise an exception which will prevent the
    port status to be marked as UP.

    Since the get_logical_port_chassis_and_datapath() is just reading from
    the Southbound database we do not need to use a transaction for it and
    just look for the row we want directly from the in-memory replica of the
    Southbound database.

    [0] https://github.com/openstack/networking-ovn/blob/df880435d3a7e98aaa8a0ba3a0a9b81286129586/networking_ovn/ovsdb/ovsdb_monitor.py#L285

    Closes-bug: #1762933
    Change-Id: I650c19164d619a9318b25f48dc113774f8ba67ed
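
The idea described in the commit message above can be illustrated with a minimal, self-contained sketch. It is not the networking-ovn or ovsdbapp code: `FakeIdl`, `read_with_transaction`, `read_from_replica`, and the row layout are hypothetical names made up for illustration. It assumes only what the thread states: a transaction on an IDL that requires a lock fails when the lock is not held, while a plain read of the in-memory replica does not need the lock.

```python
class FakeIdl(object):
    """Toy stand-in for an OVSDB IDL (hypothetical, not the real class)."""

    def __init__(self, rows, has_lock):
        self.rows = rows          # hypothetical replica: {port_name: row dict}
        self.has_lock = has_lock  # True only on the worker holding the lock


def read_with_transaction(idl, port_name):
    # Transaction path: the commit is rejected without the lock,
    # even though the operation only reads data.
    if not idl.has_lock:
        raise RuntimeError("OVSDB Error: IDL requires a database lock "
                           "but does not hold it")
    return idl.rows.get(port_name)


def read_from_replica(idl, port_name):
    # Direct path: look the row up in the local replica; no lock involved.
    row = idl.rows.get(port_name)
    if row is None:
        return None, None
    return row["chassis"], row["datapath"]


# A Southbound IDL on a worker that has not acquired the lock.
sb_idl = FakeIdl({"port-1": {"chassis": "chassis-a", "datapath": "dp-1"}},
                 has_lock=False)

try:
    read_with_transaction(sb_idl, "port-1")
    txn_failed = False
except RuntimeError:
    txn_failed = True

result = read_from_replica(sb_idl, "port-1")
print(txn_failed, result)
```

In this sketch the transaction-style read fails on the lock-less worker while the replica lookup still returns the chassis and datapath, which is the behavior the fix relies on.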

Changed in networking-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.0.0.0b1

This issue was fixed in the openstack/networking-ovn 5.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.openstack.org/562141
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=6c939e6b544cbb9f28317ebe2229d849443f3cb9
Submitter: Zuul
Branch: stable/queens

commit 6c939e6b544cbb9f28317ebe2229d849443f3cb9
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Apr 17 16:50:57 2018 +0100

    Do not use a transaction for get_logical_port_chassis_and_datapath()

    The OVSDB monitor for the Northbound database is invoking
    set_port_status_up(), this method will invoke another function
    called _wait_for_metadata_provisioned_if_needed() which
    will then invoke get_logical_port_chassis_and_datapath(). The
    get_logical_port_chassis_and_datapath() starts a transaction in the OVSDB
    Southbound database but, if the IDL for the Southbound database doesn't
    have a valid lock [0] it will raise an exception which will prevent the
    port status to be marked as UP.

    Since the get_logical_port_chassis_and_datapath() is just reading from
    the Southbound database we do not need to use a transaction for it and
    just look for the row we want directly from the in-memory replica of the
    Southbound database.

    [0] https://github.com/openstack/networking-ovn/blob/df880435d3a7e98aaa8a0ba3a0a9b81286129586/networking_ovn/ovsdb/ovsdb_monitor.py#L285

    Closes-bug: #1762933
    Change-Id: I650c19164d619a9318b25f48dc113774f8ba67ed
    (cherry picked from commit 974d0176dac4ac68edf9af848aadf04d86b2da65)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.2

This issue was fixed in the openstack/networking-ovn 4.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.3

This issue was fixed in the openstack/networking-ovn 4.0.3 release.
