OpenStack Heat

Convergence doubles up on DB writes for resource locks

Bug #1662585 reported by Zane Bitter on 2017-02-07

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Heat	Fix Released	High	Crag Wolfe	OpenStack Heat pike-2

Bug Description

Every resource operation requires two database writes - one when the Resource status is set to IN_PROGRESS, and another when it is set to COMPLETE (or FAILED). Convergence introduced the concept of a lock on a Resource, which was designed to be stored in the Resource table (rather than the StackLock table or something equivalent) explicitly so that the lock/unlock could piggy-back on these two writes and avoid increasing the volume of DB writes required. (Convergence is already pretty hard on the database due to the number of reads.)

Unfortunately, the actual implementation does not take advantage of the architecture, and instead does separate writes to lock, set IN_PROGRESS, set COMPLETE/FAILED, and unlock. (See the Resource.lock() context manager.) Thus we are doing twice as many writes to the DB as we need.

Tags:

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-02-09: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/431347

Changed in heat:
assignee:	nobody → Crag Wolfe (cwolfe)
status:	New → In Progress

Revision history for this message

Zane Bitter (zaneb) wrote on 2017-02-09:

Just to make it trickier: according to bug 1568685 we shouldn't release the lock until after the SyncPoint is written. I'm not sure that that really matters though.

tags:

added: convergence-bugs

Rico Lin (rico-lin) on 2017-04-11

Changed in heat:
milestone:	pike-1 → pike-2

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-04-25: Fix merged to heat (master)

Reviewed: https://review.openstack.org/431347
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=a7376f7494b310e9367ebe5dcb43b432a4053023
Submitter: Jenkins
Branch: master

commit a7376f7494b310e9367ebe5dcb43b432a4053023
Author: Crag Wolfe <email address hidden>
Date: Thu Feb 9 06:21:37 2017 +0000

Consolidate resource locking with state changes

Change-Id: I261b2f0968e16d35b7d5d791a3edb4b265a4f1d1
Closes-Bug: #1662585

Changed in heat:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-06-07: Fix included in openstack/heat 9.0.0.0b2

This issue was fixed in the openstack/heat 9.0.0.0b2 development milestone.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-07-22: Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/486265

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-08-12: Related fix merged to heat (master)

Reviewed: https://review.openstack.org/486265
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=1da0790e0de43ea58233033355936b3eba5cc114
Submitter: Jenkins
Branch: master

commit 1da0790e0de43ea58233033355936b3eba5cc114
Author: Zane Bitter <email address hidden>
Date: Fri Jul 21 21:32:45 2017 -0400

Respect locks when changing FAILED resource to COMPLETE

    If a resource is in a FAILED state, but the plugin overrides
    _needs_update() to not raise UpdateReplace, and also no actual update is
    required, we simply change the resource's state to UPDATE_COMPLETE.
    However, since the locking/unlocking steps were merged with the resource
    state updates in a7376f7494b310e9367ebe5dcb43b432a4053023, in doing so we
    were ignoring the lock (since at this stage the resource is still unlocked,
    and we don't need to take a lock). If another update traversal had acquired
    the lock in the meantime, we would still write to the locked resource.

    This change adds a new LOCK_RESPECT lock mode to Resource.store(). In this
    mode the lock is not acquired or released, but an error will be raised if
    the resource is found to be locked.

    Change-Id: I8b5cd1e05b88dd13abc13899a73f23810f7e6135
    Closes-Bug: #1705794
    Related-Bug: #1662585

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-10-23: Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/514476

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-01-04: Related fix merged to heat (master)

Reviewed: https://review.openstack.org/514476
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=1d24d47e7e4b6295fa12b6ec59efa0db5b2d79c7
Submitter: Zuul
Branch: master

commit 1d24d47e7e4b6295fa12b6ec59efa0db5b2d79c7
Author: Zane Bitter <email address hidden>
Date: Tue Oct 24 13:34:36 2017 -0400

Don't try resource-level locks when stack locked

    The actions SUSPEND, RESUME, SNAPSHOT, CHECK, and also DELETE when
    abandoning rather than deleting a stack, have not (yet) been converted
    to the convergence workflow and still do a legacy-style in-memory
    traversal with the Stack lock held, even on stacks with convergence
    enabled. Since a7376f7494b310e9367ebe5dcb43b432a4053023 we have been
    attempting to get a lock on the Resource to do these operations, even
    though there is no engine ID set (which caused a WARNING-level log). For
    these operations, respect existing Resource locks when convergence is
    enabled, but don't try to acquire the lock.

    Change-Id: I6f232380398a7caf9664717debfe39e3422e70d8
    Related-Bug: #1727142
    Related-Bug: #1662585

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.