*_COMPLETE/FAILED status is set too early

Bug #1450314 reported by Sirushti Murugesan
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
High
Angus Salkeld

Bug Description

I run into this fairly often while running heat_integrationtests.

heat_integrationtests.functional.test_template_resource.TemplateResourceUpdateTest.test_template_resource_update_template_schema(main_tmpl_change)
--------------------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/opt/stack/new/heat/heat_integrationtests/functional/test_template_resource.py", line 347, in test_template_resource_update_template_schema
        files={'the.yaml': self.provider})
      File "/opt/stack/new/heat/heat_integrationtests/common/test.py", line 317, in update_stack
        environment=env
      File "/opt/stack/new/heat/.tox/integration/local/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 142, in update
        data=kwargs, headers=headers)
      File "/opt/stack/new/heat/.tox/integration/local/lib/python2.7/site-packages/heatclient/common/http.py", line 265, in json_request
        resp = self._http_request(url, method, **kwargs)
      File "/opt/stack/new/heat/.tox/integration/local/lib/python2.7/site-packages/heatclient/common/http.py", line 220, in _http_request
        raise exc.from_response(resp)
    heatclient.exc.HTTPConflict: ERROR: Stack TemplateResourceUpdateTest-730796453 already has an action (CREATE) in progress.

--------------------------------------------------------------------------

Sometimes, there's considerable time between setting the state of a stack to *_COMPLETE/FAILED and releasing the StackLock associated with it. Since we check for the stack status only, we run into the above error^ when trying to act on the stack.

Here's the log for h-eng:

2015-04-14 02:39:03.310 2107 DEBUG heat.engine.stack_lock [req-1f86b47a-6fe9-4989-9d1d-106590c2a3e6 demo demo] Lock on stack 4b6b0ecb-00ba-4c41-bcbc-6df90243a5f9 is owned by engine 3f33a6d2-d73f-4d5a-91e1-dd9c121eff68 acquire /opt/stack/new/heat/heat/engine/stack_lock.py:87

2015-04-14 02:39:03.310 2107 DEBUG oslo_messaging.rpc.dispatcher [req-1f86b47a-6fe9-4989-9d1d-106590c2a3e6 demo demo] Expected exception during message handling (Stack TemplateResourceUpdateTest-730796453 already has an action (CREATE) in progress.) _dispatch_and_reply /usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py:145

--------------------------------------------------------------------------

http://logs.openstack.org/39/172339/6/check/check-heat-dsvm-functional-mysql/989ed6b/console.html.gz

Revision history for this message
Angus Salkeld (asalkeld) wrote :

Somewhat related to this bug: https://bugs.launchpad.net/heat/+bug/1433340

Changed in heat:
status: New → Triaged
importance: Undecided → High
milestone: none → liberty-1
Revision history for this message
Sergey Kraynev (skraynev) wrote :

another log from another test looks similar:

2015-04-30 09:03:51.570 | 2015-04-30 09:03:51.550 | Captured traceback:
2015-04-30 09:03:51.572 | 2015-04-30 09:03:51.552 | ~~~~~~~~~~~~~~~~~~~
2015-04-30 09:03:51.573 | 2015-04-30 09:03:51.553 | Traceback (most recent call last):
2015-04-30 09:03:51.575 | 2015-04-30 09:03:51.555 | File "/opt/stack/new/heat/heat_integrationtests/functional/test_template_resource.py", line 399, in test_update_on_failed_create
2015-04-30 09:03:51.606 | 2015-04-30 09:03:51.556 | files={'server_fail.yaml': self.nested_templ})
2015-04-30 09:03:51.607 | 2015-04-30 09:03:51.558 | File "/opt/stack/new/heat/heat_integrationtests/common/test.py", line 317, in update_stack
2015-04-30 09:03:51.607 | 2015-04-30 09:03:51.559 | self._wait_for_stack_status(**kwargs)
2015-04-30 09:03:51.607 | 2015-04-30 09:03:51.561 | File "/opt/stack/new/heat/heat_integrationtests/common/test.py", line 275, in _wait_for_stack_status
2015-04-30 09:03:51.607 | 2015-04-30 09:03:51.562 | stack_status_reason=stack.stack_status_reason)
2015-04-30 09:03:51.607 | 2015-04-30 09:03:51.564 | heat_integrationtests.common.exceptions.StackBuildErrorException: Stack TemplateResourceUpdateFailedTest-1508413043/861a0ffb-f79f-4d47-ba6d-e1f54dd2f3ca is in UPDATE_FAILED status due to 'ActionInProgress: Stack TemplateResourceUpdateFailedTest-1508413043-server-zcn4zjtjjfry already has an action (CREATE) in progress.'

Also logs from heat-engine:

2015-04-30 09:03:47.269 5485 INFO heat.engine.stack [-] Stack CREATE COMPLETE (TemplateResourceUpdateFailedTest-1508413043-server-zcn4zjtjjfry): Stack CREATE completed successfully

2015-04-30 09:03:47.298 5485 DEBUG oslo_messaging.rpc.dispatcher [req-4da843af-91c9-41d9-b4cb-01acd8bebb1d demo demo] Expected exception during message handling (Stack TemplateResourceUpdateFailedTest-1508413043-server-zcn4zjtjjfry already has an action (CREATE) in progress.) _dispatch_and_reply /usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py:145

So it really looks like issue with our bad handling of update via rpc.

description: updated
Changed in heat:
assignee: nobody → Sirushti Murugesan (sirushtim)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179325

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/179325
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=8f31346c1c306f0a5735466e0ca8dcced8e4ed18
Submitter: Jenkins
Branch: master

commit 8f31346c1c306f0a5735466e0ca8dcced8e4ed18
Author: Sirushti Murugesan <email address hidden>
Date: Fri May 1 09:47:09 2015 +0530

    Wait for the stack lock to be released

    Sometimes, when trying to update a stack in a *_COMPLETE/FAILED status,
    it can fail since the stack lock is not released yet. As a short-term
    workaround, I've put in place an ugly hack that will simply retry the
    update if the operation failed because of the stack lock. The downside of
    this is that we can't now add functional tests which will test updates on
    Stack Lock itself since we now simply just ignore the HTTPConflict Exception
    on updates.

    I couldn't think of another solution that doesn't include a giant refactoring
    that will solve this problem.

    PS: Convergence doesn't have the notion of a Stack-Lock, so that should
    take care of this by default.

    Partially-Closes Bug: #1450314

    Change-Id: Ib1a9d5c425285ebcffcb9ff8a362a56fd8f3574a

Changed in heat:
assignee: Sirushti Murugesan (sirushtim) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/183374

Changed in heat:
assignee: nobody → Angus Salkeld (asalkeld)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/183374
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=309f674f5cc69354397bbf1f4ab55e61d6ed13b3
Submitter: Jenkins
Branch: master

commit 309f674f5cc69354397bbf1f4ab55e61d6ed13b3
Author: Angus Salkeld <email address hidden>
Date: Fri May 15 12:03:15 2015 +1000

    Update the stack state as late as possible

    The event and notifications do not look the stack up in
    the DB so we can rely on the in memory status.

    Change-Id: Iceb9bf6fa86f7f5ada7d3a8672788f09a48b1f43
    Partial-Bug: #1450314

Angus Salkeld (asalkeld)
Changed in heat:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: liberty-1 → 5.0.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/238576

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/238576
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=def34d7a0525e2b7f0a611efee2d10142c2e6005
Submitter: Jenkins
Branch: master

commit def34d7a0525e2b7f0a611efee2d10142c2e6005
Author: Thomas Herve <email address hidden>
Date: Thu Oct 22 17:29:21 2015 +0200

    Expand test in-progress workaround

    When working around bug #1450314, we changed integration tests to ignore
    the lock when doing an update. Unfortunately, this issue happens for
    actions other than update like delete and suspend, so let's apply the
    same fix for those actions too.

    Change-Id: I04c5f35a1851163bc849f5c8be0f6e865ee0088e
    Closes-Bug: #1508990
    Related-Bug: #1450314

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.