When multiple neutron servers are restarted, the sync from the neutron MariaDB database to the OVN Northbound database is triggered; the neutron servers contend for the lock, and those that do not hold it report an error

Bug #1935888 reported by zhangtongjian
This bug affects 1 person
Affects    Status        Importance   Assigned to   Milestone
neutron    In Progress   Medium       ZhouHeng

Bug Description

Multiple neutron servers are restarted, which triggers a sync from the neutron MariaDB database to the OVN Northbound database. Because this work runs asynchronously, one neutron server obtains the OVSDB IDL lock (ovn_db_inconsistencies_periodics) while the other neutron servers cannot get it. Those servers then update the database directly, without checking whether they hold the lock, which causes this error:

  "The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it"

neutron ml2 config:
   neutron_sync_mode = repair

python ovs version:
   2.13.0
   'leader_only' is set to True (the default), so the IDL only
   monitors and transacts with the leader of the cluster
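To make the failure mode concrete, here is a minimal, self-contained Python simulation. It does not use the real `ovs` or Neutron code: `FakeLockServer`, `FakeIdl`, and the `SUCCESS`/`NOT_LOCKED` constants are hypothetical stand-ins that model how a named OVSDB lock (here `ovn_db_inconsistencies_periodics`) is granted to one client at a time, and how a transaction from an IDL that requires the lock but does not hold it is rejected.

```python
# Hypothetical simulation: not the real ovs or Neutron API.
SUCCESS = "success"
NOT_LOCKED = "not locked"  # models the NOT_LOCKED transaction status


class FakeLockServer:
    """Grants a named lock to the first client that requests it."""

    def __init__(self):
        self.owners = {}

    def request(self, lock_name, client):
        # setdefault keeps the first requester as the lock owner.
        return self.owners.setdefault(lock_name, client) is client


class FakeIdl:
    """One IDL per neutron server, configured to require a named lock."""

    def __init__(self, name, lock_server, lock_name):
        self.name = name
        self.lock_name = lock_name
        self.has_lock = lock_server.request(lock_name, self)

    def commit(self, table, updates):
        # An IDL that requires a lock refuses to write without holding it.
        if not self.has_lock:
            return NOT_LOCKED
        table.update(updates)
        return SUCCESS


nb_db = {}
lock_server = FakeLockServer()
servers = [FakeIdl(f"neutron-server-{i}", lock_server,
                   "ovn_db_inconsistencies_periodics") for i in range(3)]

# Every server runs the sync after restart without checking has_lock.
results = {idl.name: idl.commit(nb_db, {"lsp-1": "port"}) for idl in servers}
print(results)
# {'neutron-server-0': 'success', 'neutron-server-1': 'not locked', 'neutron-server-2': 'not locked'}
```

Only the first server writes; the others fail with the lock error, which matches the reported symptom of repeated error logs without any data breakage.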

zhangtongjian (zhangtongjian) wrote :
Terry Wilson (otherwiseguy) wrote :

Usually this error just shows that only the single server we want to take an action is doing so, and the others fail as they are supposed to.

Are you seeing something break, or is this just a logging issue?

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi zhangtongjian:

I had the same question as Terry. Reviewing the links you provided, I think what you are saying is that the maintenance worker, spawned once in every server, creates one single OVN client with one unique OVN NB IDL instance.

In "DBInconsistenciesPeriodics", we set a lock named "ovn_db_inconsistencies_periodics". The methods of this class check the lock status, but the "OvnNbSynchronizer" class, which shares the same NB IDL, does not. Because of that, you see the txn.NOT_LOCKED errors in some servers.

Is that what you are reporting?

Regards.

zhangtongjian (zhangtongjian) wrote :

Hi Rodolfo:
  Yes, as you described. When the other neutron servers restart without getting the lock, they still run OvnNbSynchronizer, with no check on whether the lock has been acquired. The error logs keep printing, but nothing breaks.
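The mismatch described above (the periodic tasks check the lock, the synchronizer does not) suggests a guard of the following shape. This is a minimal sketch, assuming the synchronizer can read the same `has_lock` flag as the periodic tasks; `SharedIdl` and `GuardedNbSync` are hypothetical names for illustration, not Neutron's actual `OvnNbSynchronizer` code and not the code from the linked reviews.

```python
# Hypothetical sketch: not Neutron's OvnNbSynchronizer implementation.
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("nb_sync_sketch")


class SharedIdl:
    """Stand-in for the NB IDL shared by the periodic tasks and the sync."""

    def __init__(self, has_lock):
        self.has_lock = has_lock
        self.committed = []

    def commit(self, rows):
        if not self.has_lock:
            # Models the NOT_LOCKED failure from the bug report.
            raise RuntimeError("NOT_LOCKED")
        self.committed.extend(rows)


class GuardedNbSync:
    """Runs the repair sync only on the server that holds the lock."""

    def __init__(self, idl):
        self.idl = idl

    def do_sync(self, neutron_rows):
        if not self.idl.has_lock:
            # Another neutron server owns the lock and will do the sync.
            LOG.info("Lock not held, skipping OVN NB sync")
            return False
        self.idl.commit(neutron_rows)
        return True


leader = GuardedNbSync(SharedIdl(has_lock=True))
follower = GuardedNbSync(SharedIdl(has_lock=False))
leader_synced = leader.do_sync(["net-1", "port-1"])
follower_synced = follower.do_sync(["net-1", "port-1"])
print(leader_synced, follower_synced)  # True False
```

Skipping on the servers without the lock, rather than attempting a transaction that is bound to fail, keeps a single writer while removing the repeated error logs.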

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/800751

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
assignee: nobody → zhangtongjian (zhangtongjian)
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: zhangtongjian (zhangtongjian) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/800751
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: New → In Progress
ZhouHeng (zhouhenglc)
Changed in neutron:
assignee: nobody → ZhouHeng (zhouhenglc)
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/834429
Reason: Please, feel free to restore the patch, address the comments and rebase the patch.
