Race condition when two updates are acting on a resource

Bug #1722371 reported by Zane Bitter
Affects: OpenStack Heat
Status: Fix Released
Importance: Medium
Assigned to: Zane Bitter

Bug Description

In convergence, if we fail to acquire the lock on a resource because another traversal is still acting on it, we nevertheless go on to attempt to update the resource with the current traversal ID and any new requirements. This opens a race: if the previous update finishes just after we try to get the lock but before this call, the call will succeed, which would be bad.
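
For illustration, here is a minimal, self-contained sketch of that window (the class and attribute names are invented for this example and are not Heat's actual code): a row whose engine_id field acts as the resource lock, and a follow-up write of the traversal ID that only checks the lock at the moment it runs.

import threading


class FakeResourceRow:
    """Stand-in for the resource's DB row; all names here are illustrative."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.engine_id = None           # holder of the resource lock
        self.current_traversal = 'T1'   # traversal recorded on the row

    def try_acquire(self, engine_id):
        # Models an atomic compare-and-swap (UPDATE ... WHERE engine_id IS NULL).
        with self._mutex:
            if self.engine_id is None:
                self.engine_id = engine_id
                return True
            return False

    def release(self):
        with self._mutex:
            self.engine_id = None

    def store_traversal(self, traversal_id):
        # The follow-up write: it only checks the lock at the moment it runs.
        with self._mutex:
            if self.engine_id is not None:
                raise RuntimeError('UpdateInProgress')   # the usual outcome
            self.current_traversal = traversal_id        # the racy outcome


row = FakeResourceRow()
row.try_acquire('engine-A')                 # traversal T1 still holds the lock

assert not row.try_acquire('engine-B')      # traversal T2 fails to get the lock

row.release()                               # ...but T1 finishes right here...
row.store_traversal('T2')                   # ...so T2's write succeeds anyway
print(row.current_traversal)                # 'T2', although T2 never held the lock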

In the usual case, where the resource is still locked, the call fails with an UpdateInProgress exception, which is what we want anyway, but an ERROR-level message is logged:

Oct 09 16:18:58.527198 ubuntu-xenial-ovh-bhs1-11290899 heat-engine[8340]: ERROR root [None req-743503a9-1b58-4706-a575-0eaea5fc72a0 demo None] Original exception being dropped: ['Traceback (most recent call last):\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 1384, in update_convergence\n runner(timeout=timeout, progress_callback=progress_callback)\n', ' File "/opt/stack/new/heat/heat/engine/scheduler.py", line 168, in __call__\n progress_callback=progress_callback):\n', ' File "/opt/stack/new/heat/heat/engine/scheduler.py", line 244, in as_task\n self.start(timeout=timeout)\n', ' File "/opt/stack/new/heat/heat/engine/scheduler.py", line 190, in start\n self.step()\n', ' File "/opt/stack/new/heat/heat/engine/scheduler.py", line 217, in step\n poll_period = next(self._runner)\n', ' File "/opt/stack/new/heat/heat/engine/scheduler.py", line 366, in wrapper\n subtask = next(parent)\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 1556, in update\n with self._action_recorder(action, UpdateReplace):\n', ' File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__\n return self.gen.next()\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 843, in _action_recorder\n LOG.info(\'Update in progress for %s\', self.name)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 839, in _action_recorder\n set_in_progress()\n', ' File "/usr/local/lib/python2.7/dist-packages/tenacity/__init__.py", line 171, in wrapped_f\n return self.call(f, *args, **kw)\n', ' File "/usr/local/lib/python2.7/dist-packages/tenacity/__init__.py", line 248, in call\n start_time=start_time)\n', ' File "/usr/local/lib/python2.7/dist-packages/tenacity/__init__.py", line 216, in iter\n raise RetryError(
Oct 09 16:18:58.527838 ubuntu-xenial-ovh-bhs1-11290899 heat-engine[8340]: fut).reraise()\n', ' File "/usr/local/lib/python2.7/dist-packages/tenacity/__init__.py", line 297, in reraise\n raise self.last_attempt.result()\n', ' File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 422, in result\n return self.__get_result()\n', ' File "/usr/local/lib/python2.7/dist-packages/tenacity/__init__.py", line 251, in call\n result = fn(*args, **kwargs)\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 836, in set_in_progress\n self.state_set(action, self.IN_PROGRESS, lock=lock_acquire)\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 2206, in state_set\n self.store(set_metadata, lock=lock)\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 2001, in store\n self._store_with_lock(rs, lock)\n', ' File "/opt/stack/new/heat/heat/engine/resource.py", line 2032, in _store_with_lock\n raise exception.UpdateInProgress(self.name)\n', 'UpdateInProgress: The resource test2 is already being updated.\n']: UpdateInProgress: The resource test2 is already being updated.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/510674

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/510674
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=79cc0cc7b93b98ebb1e7e0d7f838fc1186d7bc47
Submitter: Zuul
Branch: master

commit 79cc0cc7b93b98ebb1e7e0d7f838fc1186d7bc47
Author: Zane Bitter <email address hidden>
Date: Wed Oct 18 16:46:39 2017 -0400

    Don't attempt to update tmpl ID when resource in progress

    If we attempt to do a convergence update on a resource and find it already
    locked by another traversal, don't try to update the resource's current
    template ID or requirements data. Doing so will usually fail with the same
    exception, but it is unnecessary and leaves ERROR-level messages in the
    log. However, there is a race which could result in the call succeeding
    (i.e. if the other task releases the lock just after we fail to get it),
    and that could result in the resource not being updated at all.

    Change-Id: I6bde1f9359cd52c99cca092e8abc660bac8b3065
    Closes-Bug: #1722371
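
The shape of the change is roughly the following, shown as a hedged sketch with invented helper names rather than the actual heat/engine/resource.py code: once the lock acquisition fails, raise UpdateInProgress straight away and never fall through to the template-ID/requirements write, so the noisy retry path and the racy success are both avoided.

class UpdateInProgress(Exception):
    pass


class ResourceStub:
    """Illustrative stand-in for a Heat resource and its DB row."""

    def __init__(self, name):
        self.name = name
        self._owner = None              # engine currently holding the lock
        self.current_template_id = None
        self.requires = set()

    def try_acquire_lock(self, engine_id):
        if self._owner is None:
            self._owner = engine_id
            return True
        return False

    def release_lock(self, engine_id):
        if self._owner == engine_id:
            self._owner = None

    def store_current_template(self, template_id, requires):
        self.current_template_id = template_id
        self.requires = set(requires)


def update_convergence(resource, template_id, requires, engine_id):
    if not resource.try_acquire_lock(engine_id):
        # The fix: bail out here. Previously the code fell through and still
        # attempted store_current_template(), which usually failed again
        # (producing the ERROR log above) but could race and succeed,
        # recording the new template without the resource being updated.
        raise UpdateInProgress(resource.name)
    try:
        resource.store_current_template(template_id, requires)
        # ... the actual update of the physical resource would happen here ...
    finally:
        resource.release_lock(engine_id)


res = ResourceStub('test2')
update_convergence(res, template_id='tmpl-2', requires={'other'}, engine_id='engine-B')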

Changed in heat:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 10.0.0.0b1

This issue was fixed in the openstack/heat 10.0.0.0b1 development milestone.
