Race condition in ZK node locking

Bug #1556063 reported by Proskurin Kirill
This bug affects 1 person
Affects: kolla-mesos
Status: Fix Released
Importance: High
Assigned to: Proskurin Kirill
Milestone: (none)

Bug Description

Right now we first check whether the ZK node exists and, if it does not, we lock the node and do the work.
In a multinode deployment, several nodes do this at the same time and all of them try to get the lock. One node gets the lock and runs, for example, bootstrap, while the others wait for the lock. When the first node is done with bootstrap, it releases the lock; the second node then gets it and starts doing the same thing, because it did not re-check whether the node flag is already ".done". This leads to multiple runs of "run_once" commands.
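
The sequence described above can be reproduced with a small simulation. This is only a sketch, not the kolla-mesos code: a dict stands in for the ZK node flag, a `threading.Lock` stands in for the ZK lock, and the names `simulate_check_then_lock`, `flags` and `worker` are made up for illustration. A barrier forces every node to finish its check before any of them takes the lock, so the bad interleaving happens on every run.

```python
import threading


def simulate_check_then_lock(node_count):
    """Return the names of the nodes that ran the run_once command."""
    flags = {}                  # hypothetical stand-in for the ZK node flag
    lock = threading.Lock()     # hypothetical stand-in for the ZK lock
    all_checked = threading.Barrier(node_count)
    ran = []

    def worker(name):
        # Step 1: check the flag BEFORE taking the lock.
        needs_run = flags.get("bootstrap") != "done"
        # Wait until every node has done its check, so the race is
        # reproduced deterministically.
        all_checked.wait()
        if not needs_run:
            return
        # Step 2: take the lock and do the work without re-checking.
        with lock:
            ran.append(name)
            flags["bootstrap"] = "done"

    threads = [
        threading.Thread(target=worker, args=(f"node{i}",))
        for i in range(node_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ran


print(len(simulate_check_then_lock(3)))  # prints 3: every node ran bootstrap
```

The lock still serializes the runs, but it does not prevent them, because each node decided to run before it held the lock.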

Changed in kolla-mesos:
assignee: nobody → Proskurin Kirill (kproskurin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-mesos (master)

Fix proposed to branch: master
Review: https://review.openstack.org/291693

Changed in kolla-mesos:
status: New → In Progress
Angus Salkeld (asalkeld)
Changed in kolla-mesos:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-mesos (master)

Reviewed: https://review.openstack.org/291693
Committed: https://git.openstack.org/cgit/openstack/kolla-mesos/commit/?id=d4e186cb66e7a867e40ac1441d8bcd7b7768738d
Submitter: Jenkins
Branch: master

commit d4e186cb66e7a867e40ac1441d8bcd7b7768738d
Author: Proskurin Kirill <email address hidden>
Date: Tue Mar 15 13:26:09 2016 +0000

    run_once commands locking rework

    - lock first, then check the state
    - only check the state of the global node
    - Make sure global and local node states are kept in sync

    Change-Id: Icc3a0a8902f811482d8f9ff1b10663dd09b66696
    Closes-Bug: 1556063
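
The ordering named in the commit message (lock first, then check the global state, then keep the local state in sync) might look roughly like the sketch below. It is an illustration under assumptions, not the merged change: plain dicts stand in for the global and per-node ZK flags, a `threading.Lock` stands in for the ZK lock, and `simulate_lock_then_check`, `global_flag` and `local_flags` are hypothetical names. How kolla-mesos actually syncs the two states is not shown in this report; copying the global value into the local flag is a guess.

```python
import threading


def simulate_lock_then_check(node_count):
    """Return (nodes that ran the command, per-node local flags)."""
    global_flag = {}            # hypothetical stand-in for the global node state
    local_flags = {}            # hypothetical stand-in for per-node state
    lock = threading.Lock()     # hypothetical stand-in for the ZK lock
    ran = []

    def worker(name):
        # Take the lock first, then check only the global state.
        with lock:
            if global_flag.get("bootstrap") != "done":
                ran.append(name)
                global_flag["bootstrap"] = "done"
            # Assumed sync step: mirror the global state locally while
            # still holding the lock.
            local_flags[name] = global_flag["bootstrap"]

    threads = [
        threading.Thread(target=worker, args=(f"node{i}",))
        for i in range(node_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ran, local_flags


ran, local_flags = simulate_lock_then_check(3)
print(len(ran))  # prints 1: only the first lock holder ran bootstrap
```

Because the check happens while the lock is held, a node that acquires the lock later sees the flag already set and skips the command.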

Changed in kolla-mesos:
status: In Progress → Fix Released