ActionInProgress_Remote: Stack TemplateResourceUpdateFailedTest already has an action (CREATE) in progress.

Bug #1498495 reported by Rabi Mishra
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
High
Steve Baker

Bug Description

Updating nested stack with parent in 'CREATE_FAILED' state fails at times with "ActionInProgress_Remote: Stack TemplateResourceUpdateFailedTest-1535330590-server-7pudt7u7gs7q already has an action (CREATE) in progress" error.

This same codepath is also used by RG for create (StackResource.update_with_template), where CREATE_FAILED flow is used create an empty stack and then update it.

Below are the gate failures in the last 7 days.

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOidBY3Rpb25JblByb2dyZXNzX1JlbW90ZSciLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0NDI5MDU2Mjg3MDl9.

I did notice this error locally as well, with following backtrace.

2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource ActionInProgress_Remote: Stack ResourceGroupTestNullParams-1449185262-random_group-ziqjlb7esvwh already has an action (CREATE) in progress.
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource Traceback (most recent call last):
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource File "/opt/stack/heat/heat/common/context.py", line 305, in wrapped
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource return func(self, ctx, *args, **kwargs)
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource File "/opt/stack/heat/heat/engine/service.py", line 855, in update_stack
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource event=event)
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource File "/opt/stack/heat/heat/engine/service.py", line 146, in start_with_lock
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource with lock.thread_lock():
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource return self.gen.next()
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource File "/opt/stack/heat/heat/engine/stack_lock.py", line 134, in thread_lock
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource self.acquire()
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource File "/opt/stack/heat/heat/engine/stack_lock.py", line 80, in acquire
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource action=stack.action)
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource ActionInProgress: Stack ResourceGroupTestNullParams-1449185262-random_group-ziqjlb7esvwh already has an action (CREATE) in progress.
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.340 64860 ERROR heat.engine.resources.stack_resource
2015-09-22 08:23:23.403 64859 INFO heat.engine.environment [req-313bd318-a976-42d4-993a-313efd0b79d3 3835ca7a282a4b59b7a6e4184ab8ba9a 453419721f9e40b38795bd06ab3ca28c] Registering My::RandomString -> my_random.yaml
2015-09-22 08:23:23.401 64860 INFO heat.engine.resource [-] CREATE: ResourceGroup "random_group" [fc525c91-390b-46f4-91fd-d09e3021b76e] Stack "ResourceGroupTestNullParams-1449185262" [e930df2b-f1d4-477c-aad2-0a4f76d3263e]
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource Traceback (most recent call last):
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 607, in _action_recorder
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource yield
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 678, in _do_action
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource yield self.action_handler_task(action, args=handler_args)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 303, in wrapper
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource step = next(subtask)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 649, in action_handler_task
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource handler_data = handler(*args)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resources/openstack/heat/resource_group.py", line 373, in handle_create
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource checkers[0].start()
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 203, in start
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource self.step()
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 226, in step
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource next(self._runner)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resources/openstack/heat/resource_group.py", line 388, in _run_to_completion
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource timeout)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resources/stack_resource.py", line 438, in update_with_template
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource self.raise_local_exception(ex)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource File "/opt/stack/heat/heat/engine/resources/stack_resource.py", line 329, in raise_local_exception
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource raise exception.ResourceFailure(message, self, action=self.action)
2015-09-22 08:23:23.401 64860 ERROR heat.engine.resource ResourceFailure: resources.random_group: Stack ResourceGroupTestNullParams-1449185262-random_gro

Rabi Mishra (rabi)
Changed in heat:
assignee: nobody → Rabi Mishra (rabi)
summary: - ActionInProgress_Remote: Stack
- TemplateResourceUpdateFailedTest-1535330590-server-7pudt7u7gs7q already
+ ActionInProgress_Remote: Stack TemplateResourceUpdateFailedTest already
has an action (CREATE) in progress.
Changed in heat:
status: New → Triaged
importance: Undecided → High
tags: added: liberty-rc-potential
Changed in heat:
milestone: none → mitaka-1
Revision history for this message
Rabi Mishra (rabi) wrote :

From the initial investigation, it looks like an issue of stack locking. As we are creating an empty stack and then updating it immediately, it seems the nested stack lock is not released by the thread doing 'stack_create' before 'stack_update' starts.

As we're checking for stack create_complete by checking the db[1], it is possible that the create_stack thread may not finish when the check detects that the the status(COMPLETE) is updated in db.

I've pushed a workaround for this issue[2]. Working on a more elegant solution which would be difficult to implement though:)

[1] https://github.com/openstack/heat/blob/master/heat/engine/resources/stack_resource.py#L335
[2] https://review.openstack.org/#/c/227156/

Changed in heat:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/227156
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=e2294da861c24d538cafe1c54d038e2b5d4ba594
Submitter: Jenkins
Branch: master

commit e2294da861c24d538cafe1c54d038e2b5d4ba594
Author: Rabi Mishra <email address hidden>
Date: Thu Sep 24 12:10:43 2015 +0530

    Fix action (CREATE) in progress error for TemplateResource/RG

    When StackResource 'update_with_template' is called and there
    is no existing nested_stack, we call create_stack with an empty
    template(TemplateResource CREATE_FAILED->UPDATE flow). We check
    db for CREATE_COMPLETE before calling update_stack.There is a
    possibility that the create_stack green thread has not finished
    and released the stack lock. By persisting the stack state for
    COMPLETE/FAILED in the same db session as releasing the lock,
    when green thread finishes it's run, we can avoid this race
    condition.

    Change-Id: I1442a7bcedb4fa340ee4f0e6ebc3705544c2a9dd
    Closes-Bug: #1498495

Changed in heat:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/231275

Thierry Carrez (ttx)
no longer affects: heat/mitaka
no longer affects: heat/liberty
Changed in heat:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/liberty)

Reviewed: https://review.openstack.org/231275
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=49976fb7bea770b230539b7b23492972666077a3
Submitter: Jenkins
Branch: stable/liberty

commit 49976fb7bea770b230539b7b23492972666077a3
Author: Rabi Mishra <email address hidden>
Date: Thu Sep 24 12:10:43 2015 +0530

    Fix action (CREATE) in progress error for TemplateResource/RG

    When StackResource 'update_with_template' is called and there
    is no existing nested_stack, we call create_stack with an empty
    template(TemplateResource CREATE_FAILED->UPDATE flow). We check
    db for CREATE_COMPLETE before calling update_stack.There is a
    possibility that the create_stack green thread has not finished
    and released the stack lock. By persisting the stack state for
    COMPLETE/FAILED in the same db session as releasing the lock,
    when green thread finishes it's run, we can avoid this race
    condition.

    Change-Id: I1442a7bcedb4fa340ee4f0e6ebc3705544c2a9dd
    Closes-Bug: #1498495
    (cherry picked from commit e2294da861c24d538cafe1c54d038e2b5d4ba594)

tags: added: in-stable-liberty
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: liberty-rc2 → 5.0.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/235237

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)
Download full text (9.5 KiB)

Reviewed: https://review.openstack.org/235237
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=4bf298763005b4b8e0c9c8d64ad7ad0863405b50
Submitter: Jenkins
Branch: master

commit 4348b7ababd1048e17ec354ca38c903c908cd74e
Author: Steve Baker <email address hidden>
Date: Tue Oct 6 12:54:24 2015 +1300

    Do no net/subnet checks during validate

    Currently validation fails if the network or the subnet are defined in
    the same template as the server because they don't exist during
    validate.

    _validate_belonging_subnet_to_net is called during _build_nics so this
    check still happens when the required resources actually exist.

    Also, test method validate_internal_port_subnet_not_this_network was not
    running because its name was not prefixed with test_. Fixing and
    updating this test revealed issues with the validation failed message.

    Closes-Bug: #1503060
    Change-Id: I2e062abf5b9fb7446b979ba6acc2e2b531d957d6

commit 7e911cec083e98f344bfcfeb7e2e843ce2330d5d
Author: Pratik Mallya <email address hidden>
Date: Mon Oct 5 00:34:12 2015 -0500

    Correctly filter events on resource name

    Filtering on resource name can be accomplished by using the db
    filters instead of using a filter method. Furthermore, using a
    custom filter breaks certain assumptions made by other parts of
    the codebase, which lead to pagination being broken.
    Specifically, custom filter for resource name is applied only
    after other filtering is finished. As a result, we try to find
    result from only 1 event and this event belongs to whole
    stack and thus its resource_name is equal to the stack name.
    This returns an event not found error instead of returning
    the first event assosciated with the resource name.

    Change-Id: Ie7d28ffb88a79d6e0f516f0db385be2eed59f7f0
    Closes-Bug: #1502751
    (cherry picked from commit 6322ff6fceabdfa226e5359328bde292d189e703)

commit fecc2bacb51c7dabdb83b87313df48dea527feae
Author: Pratik Mallya <email address hidden>
Date: Thu Oct 1 17:41:09 2015 -0500

    Check if os-interface extension is present before saving ports

    If that extension is not present, then the code to extract and
    save ports created during server create by nova will fail.

    Change-Id: I6008a0da0c16a34c40e343af4e7237520db4ae36
    Closes-Bug: #1499877
    (cherry picked from commit 46d8c276b70e7698438988faa9d997577a7e1750)

commit 6c3fa4e25d3e35adf53ad1e80d5455bf9f7a924a
Author: Peter Razumovsky <email address hidden>
Date: Mon Sep 14 17:43:49 2015 +0300

    Fix translating for props with get_param value

    If some map or list type properties specified with
    json-type or commadelimitedlist parameters, error
    raised in case of wrong properties data parsing.
    Fix this case by adding check if data is GetParam
    instance, resolve it. Other function can be safely
    replaced without resolve.

    Change-Id: I0c9a6af29b56b629cbdad2acb868c3033e38b5ef
    Closes-bug: #1494364
    (cherry picked from commit aea59ecdac6ed6635125eb2064554140e3e645fc)

commit 8810a28a9016b3bfa515fc11efd5f5dfa103b380
Author: OpenStack Propos...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/239659

tags: added: kilo-backport-potential
removed: liberty-rc-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/heat 6.0.0.0b1

This issue was fixed in the openstack/heat 6.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (stable/kilo)

Change abandoned by Rakesh H S (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/239659
Reason: Most of us have already moved to Liberty/Mitaka by now.
We could restore this patch later if required.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/284766

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (stable/kilo)

Change abandoned by Rabi Mishra (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/284766

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.