functional.test_autoscaling.AutoscalingGroupUpdatePolicyTest heat_integrationtests.common.exceptions.StackBuildErrorException

Bug #1503180 reported by Rabi Mishra
This bug affects 3 people
Affects: OpenStack Heat
Status: Fix Released
Importance: Medium
Assigned to: Rabi Mishra

Bug Description

http://logs.openstack.org/43/230843/3/gate/gate-heat-dsvm-functional-orig-mysql/3d164c3/console.html

This looks like a concurrency issue.

heat_integrationtests.functional.test_autoscaling.AutoscalingGroupUpdatePolicyTest.test_instance_group_update_replace_huge_min_in_service
2015-10-06 02:56:12.108 | 2015-10-06 02:56:12.088 | -----------------------------------------------------------------------------------------------------------------------------------------
2015-10-06 02:56:12.109 | 2015-10-06 02:56:12.089 |
2015-10-06 02:56:12.109 | 2015-10-06 02:56:12.090 | Captured traceback:
2015-10-06 02:56:12.109 | 2015-10-06 02:56:12.093 | ~~~~~~~~~~~~~~~~~~~
2015-10-06 02:56:12.109 | 2015-10-06 02:56:12.094 | Traceback (most recent call last):
2015-10-06 02:56:12.110 | 2015-10-06 02:56:12.096 | File "heat_integrationtests/functional/test_autoscaling.py", line 483, in test_instance_group_update_replace_huge_min_in_service
2015-10-06 02:56:12.110 | 2015-10-06 02:56:12.098 | update_replace=True)
2015-10-06 02:56:12.111 | 2015-10-06 02:56:12.099 | File "heat_integrationtests/functional/test_autoscaling.py", line 377, in update_instance_group
2015-10-06 02:56:12.112 | 2015-10-06 02:56:12.101 | environment=env, files=files)
2015-10-06 02:56:12.114 | 2015-10-06 02:56:12.102 | File "heat_integrationtests/common/test.py", line 378, in update_stack
2015-10-06 02:56:12.115 | 2015-10-06 02:56:12.104 | self._wait_for_stack_status(**kwargs)
2015-10-06 02:56:12.143 | 2015-10-06 02:56:12.105 | File "heat_integrationtests/common/test.py", line 313, in _wait_for_stack_status
2015-10-06 02:56:12.143 | 2015-10-06 02:56:12.107 | fail_regexp):
2015-10-06 02:56:12.144 | 2015-10-06 02:56:12.108 | File "heat_integrationtests/common/test.py", line 274, in _verify_status
2015-10-06 02:56:12.144 | 2015-10-06 02:56:12.110 | stack_status_reason=stack.stack_status_reason)
2015-10-06 02:56:12.144 | 2015-10-06 02:56:12.111 | heat_integrationtests.common.exceptions.StackBuildErrorException: Stack AutoscalingGroupUpdatePolicyTest-1680323451/4b38c55d-afb4-4463-818a-13cbfd7abf02 is in UPDATE_FAILED status due to 'resources.JobServerGroup: Stack AutoscalingGroupUpdatePolicyTest-1680323451-JobServerGroup-hl4ptu674vbf already has an action (UPDATE) in progress.'

Tags: gate-failure
Rabi Mishra (rabi) wrote :

Assigned it to me. Please feel free to push a patch if you find the root cause.

Changed in heat:
assignee: nobody → Rabi Mishra (rabi)
Changed in heat:
importance: Undecided → Medium
status: New → Triaged
milestone: none → mitaka-1
status: Triaged → Confirmed
Zane Bitter (zaneb)
tags: added: gate-failure
Zane Bitter (zaneb) wrote :

Logstash query:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22already%20has%20an%20action%20(UPDATE)%20in%20progress%5C%22

I'm not sure how far back our logs go, but it looks like possibly a recent regression, starting around the 9th/10th of November? I don't see any hits before that (although obviously there were some, because this bug was raised on the 6th of October).

I'd have thought the fix for bug 1498495 would have resolved this, so I'm not sure what's going on.

Rabi Mishra (rabi) wrote :

Yeah, the issue has been there for a long time.

The fix for bug 1498495 changed 'state_set' so that it no longer updates the DB itself and instead leaves that to the thread, which updates it while releasing the lock. However, that is not being used in 'update_stack' [1].

[1] https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1256-L1269

I've been looking for a solution, but it seems pretty complicated to handle. Any suggestions?
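
The ordering problem described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not Heat's actual code: ToyStack, start_update, persist_complete, release_lock and release_lock_and_persist are hypothetical names, and the "DB" is a pair of in-memory attributes. It shows why a second update can be rejected with "already has an action (UPDATE) in progress" when the COMPLETE status becomes visible before the lock is released, and why deferring the status write until lock release avoids that.

```python
class ActionInProgress(Exception):
    """Raised when a new action is requested while the stack lock is held."""


class ToyStack:
    """Hypothetical in-memory stand-in for a stack DB row plus its lock."""

    def __init__(self):
        self.db_status = "CREATE_COMPLETE"  # what a polling client sees
        self.lock_holder = None             # id of the engine holding the lock

    def start_update(self, engine_id):
        # A new update is only allowed when nobody holds the lock.
        if self.lock_holder is not None:
            raise ActionInProgress("already has an action (UPDATE) in progress")
        self.lock_holder = engine_id
        self.db_status = "UPDATE_IN_PROGRESS"

    # Ordering A: the status is persisted first, the lock is released later.
    def persist_complete(self):
        self.db_status = "UPDATE_COMPLETE"

    def release_lock(self):
        self.lock_holder = None

    # Ordering B: the status is persisted only as part of releasing the lock,
    # so anyone who observes UPDATE_COMPLETE finds the lock already free.
    def release_lock_and_persist(self):
        self.lock_holder = None
        self.db_status = "UPDATE_COMPLETE"


def racy_sequence():
    """A client sees UPDATE_COMPLETE and retries before the lock is freed."""
    stack = ToyStack()
    stack.start_update("engine-1")
    stack.persist_complete()            # status visible, lock still held
    try:
        stack.start_update("engine-2")  # arrives before release_lock()
    except ActionInProgress:
        return "rejected"
    finally:
        stack.release_lock()
    return "accepted"


def deferred_sequence():
    """The status only becomes visible once the lock has been released."""
    stack = ToyStack()
    stack.start_update("engine-1")
    stack.release_lock_and_persist()
    stack.start_update("engine-2")
    return "accepted"


if __name__ == "__main__":
    print("ordering A:", racy_sequence())
    print("ordering B:", deferred_sequence())
```

In this model, ordering A corresponds to a code path that writes the final status directly, and ordering B to the deferred behaviour introduced for 'state_set'. How the real update path maps onto either ordering is an assumption here and would need to be checked against stack.py.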

Zane Bitter (zaneb) wrote :

Oh, right, we already had to abandon that abstraction for stack_update.

I don't have any great ideas. It's clear that the abstractions we have for threads, locks, and db entries are not working, and we need to integrate them together in some way. It's hard to justify a lot of work on that with convergence just around the corner though.

Changed in heat:
milestone: mitaka-1 → mitaka-2
Changed in heat:
milestone: mitaka-2 → mitaka-3
Changed in heat:
milestone: mitaka-3 → mitaka-rc1
Changed in heat:
milestone: mitaka-rc1 → newton-1
Rabi Mishra (rabi)
Changed in heat:
milestone: newton-1 → newton-2
Steve Baker (steve-stevebaker) wrote :

According to logstash this isn't occurring anymore.

Changed in heat:
status: Confirmed → Fix Released