Convergence doubles up on DB writes for resource locks

Bug #1662585 reported by Zane Bitter
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
High
Crag Wolfe

Bug Description

Every resource operation requires two database writes - one when the Resource status is set to IN_PROGRESS, and another when it is set to COMPLETE (or FAILED). Convergence introduced the concept of a lock on a Resource, which was designed to be stored in the Resource table (rather than the StackLock table or something equivalent) explicitly so that the lock/unlock could piggy-back on these two writes and avoid increasing the volume of DB writes required. (Convergence is already pretty hard on the database due to the number of reads.)

Unfortunately, the actual implementation does not take advantage of the architecture, and instead does separate writes to lock, set IN_PROGRESS, set COMPLETE/FAILED, and unlock. (See the Resource.lock() context manager.) Thus we are doing twice as many writes to the DB as we need.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/431347

Changed in heat:
assignee: nobody → Crag Wolfe (cwolfe)
status: New → In Progress
Revision history for this message
Zane Bitter (zaneb) wrote :

Just to make it trickier: according to bug 1568685 we shouldn't release the lock until after the SyncPoint is written. I'm not sure that that really matters though.

tags: added: convergence-bugs
Rico Lin (rico-lin)
Changed in heat:
milestone: pike-1 → pike-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/431347
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=a7376f7494b310e9367ebe5dcb43b432a4053023
Submitter: Jenkins
Branch: master

commit a7376f7494b310e9367ebe5dcb43b432a4053023
Author: Crag Wolfe <email address hidden>
Date: Thu Feb 9 06:21:37 2017 +0000

    Consolidate resource locking with state changes

    Change-Id: I261b2f0968e16d35b7d5d791a3edb4b265a4f1d1
    Closes-Bug: #1662585

Changed in heat:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 9.0.0.0b2

This issue was fixed in the openstack/heat 9.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/486265

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/486265
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=1da0790e0de43ea58233033355936b3eba5cc114
Submitter: Jenkins
Branch: master

commit 1da0790e0de43ea58233033355936b3eba5cc114
Author: Zane Bitter <email address hidden>
Date: Fri Jul 21 21:32:45 2017 -0400

    Respect locks when changing FAILED resource to COMPLETE

    If a resource is in a FAILED state, but the plugin overrides
    _needs_update() to not raise UpdateReplace, and also no actual update is
    required, we simply change the resource's state to UPDATE_COMPLETE.
    However, since the locking/unlocking steps were merged with the resource
    state updates in a7376f7494b310e9367ebe5dcb43b432a4053023, in doing so we
    were ignoring the lock (since at this stage the resource is still unlocked,
    and we don't need to take a lock). If another update traversal had acquired
    the lock in the meantime, we would still write to the locked resource.

    This change adds a new LOCK_RESPECT lock mode to Resource.store(). In this
    mode the lock is not acquired or released, but an error will be raised if
    the resource is found to be locked.

    Change-Id: I8b5cd1e05b88dd13abc13899a73f23810f7e6135
    Closes-Bug: #1705794
    Related-Bug: #1662585

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/514476

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/514476
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=1d24d47e7e4b6295fa12b6ec59efa0db5b2d79c7
Submitter: Zuul
Branch: master

commit 1d24d47e7e4b6295fa12b6ec59efa0db5b2d79c7
Author: Zane Bitter <email address hidden>
Date: Tue Oct 24 13:34:36 2017 -0400

    Don't try resource-level locks when stack locked

    The actions SUSPEND, RESUME, SNAPSHOT, CHECK, and also DELETE when
    abandoning rather than deleting a stack, have not (yet) been converted
    to the convergence workflow and still do a legacy-style in-memory
    traversal with the Stack lock held, even on stacks with convergence
    enabled. Since a7376f7494b310e9367ebe5dcb43b432a4053023 we have been
    attempting to get a lock on the Resource to do these operations, even
    though there is no engine ID set (which caused a WARNING-level log). For
    these operations, respect existing Resource locks when convergence is
    enabled, but don't try to acquire the lock.

    Change-Id: I6f232380398a7caf9664717debfe39e3422e70d8
    Related-Bug: #1727142
    Related-Bug: #1662585

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.