tempest.api.orchestration.stacks.test_stacks fails An object with that identifier already exists

Bug #1355942 reported by Steve Martinelli
Affects          Status         Importance  Assigned to      Milestone
OpenStack Heat   Fix Released   High        Jason Dunsmore
tempest          Invalid        Undecided   Jason Dunsmore

Bug Description

Noticed this failure in a recent patch: https://review.openstack.org/#/c/113378/

Full console log from jenkins run: http://logs.openstack.org/78/113378/1/gate/gate-tempest-dsvm-postgres-full/d90bf37/console.html

Looks very similar to https://bugs.launchpad.net/tempest/+bug/1344989 (marked as fix-released)

Snippet below:
2014-08-12 16:35:24.766 | ==============================
2014-08-12 16:35:24.766 | Failed 1 tests - output below:
2014-08-12 16:35:24.766 | ==============================
2014-08-12 16:35:24.766 |
2014-08-12 16:35:24.766 | tempest.api.orchestration.stacks.test_stacks.StacksTestJSON.test_stack_crud_no_resources[gate,smoke]
2014-08-12 16:35:24.766 | ----------------------------------------------------------------------------------------------------
2014-08-12 16:35:24.766 |
2014-08-12 16:35:24.766 | Captured traceback:
2014-08-12 16:35:24.767 | ~~~~~~~~~~~~~~~~~~~
2014-08-12 16:35:24.767 | Traceback (most recent call last):
2014-08-12 16:35:24.767 | File "tempest/api/orchestration/stacks/test_stacks.py", line 65, in test_stack_crud_no_resources
2014-08-12 16:35:24.767 | resp = self.client.delete_stack(stack_identifier)
2014-08-12 16:35:24.767 | File "tempest/services/orchestration/json/orchestration_client.py", line 142, in delete_stack
2014-08-12 16:35:24.767 | return self.delete("stacks/%s" % str(stack_identifier))
2014-08-12 16:35:24.767 | File "tempest/common/rest_client.py", line 224, in delete
2014-08-12 16:35:24.767 | return self.request('DELETE', url, extra_headers, headers, body)
2014-08-12 16:35:24.767 | File "tempest/common/rest_client.py", line 430, in request
2014-08-12 16:35:24.767 | resp, resp_body)
2014-08-12 16:35:24.768 | File "tempest/common/rest_client.py", line 484, in _error_checker
2014-08-12 16:35:24.768 | raise exceptions.Conflict(resp_body)
2014-08-12 16:35:24.768 | Conflict: An object with that identifier already exists
2014-08-12 16:35:24.768 | Details: {u'title': u'Conflict', u'explanation': u'There was a conflict when trying to complete your request.', u'code': 409, u'error': {u'message': u'Stack heat-178867617 already has an action (CREATE) in progress.', u'traceback': u'Traceback (most recent call last):\n\n File "/opt/stack/new/heat/heat/engine/service.py", line 63, in wrapped\n return func(self, ctx, *args, **kwargs)\n\n File "/opt/stack/new/heat/heat/engine/service.py", line 781, in delete_stack\n stack.delete)\n\n File "/opt/stack/new/heat/heat/engine/service.py", line 113, in start_with_lock\n with lock.thread_lock(stack.id):\n\n File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__\n return self.gen.next()\n\n File "/opt/stack/new/heat/heat/engine/stack_lock.py", line 136, in thread_lock\n self.release(stack_id)\n\n File "/opt/stack/new/heat/heat/openstack/common/excutils.py", line 82, in __exit__\n six.reraise(self.type_, self.value, self.tb)\n\n File "/opt/stack/new/heat/heat/engine/stack_lock.py", line 132, in thread_lock\n self.acquire()\n\n File "/opt/stack/new/heat/heat/engine/stack_lock.py", line 81, in acquire\n action=self.stack.action)\n\nActionInProgress: Stack heat-178867617 already has an action (CREATE) in progress.\n', u'type': u'ActionInProgress'}}

description: updated
Revision history for this message
Steven Hardy (shardy) wrote :

Hmm, this looks similar to bug #1344989 but isn't, AFAICS - that one happened because we were doing two deletes, whereas this one fails saying we have a *create* in progress when we trigger the delete, despite the fact that we just checked that the stack was CREATE_COMPLETE:

https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/test_stacks.py#L45
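
For context, here is a condensed sketch of the flow the test follows around this failure; delete_stack matches the traceback above, while the other helper names and the body are simplified assumptions rather than the exact tempest source:

    def test_stack_crud_no_resources(self):
        # create the stack and poll until Heat reports CREATE_COMPLETE
        stack_identifier = self.create_stack(self.stack_name, self.template)
        self.client.wait_for_stack_status(stack_identifier, 'CREATE_COMPLETE')
        # ... show/list checks elided ...
        # then delete it straight away - this is the call that raised Conflict
        self.client.delete_stack(stack_identifier)
        self.client.wait_for_stack_status(stack_identifier, 'DELETE_COMPLETE')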

Revision history for this message
Steven Hardy (shardy) wrote :

I'm wondering if this is actually a race: we see the status of the stack change to CREATE_COMPLETE, but there's a short window where create_stack still holds the lock - could we be hitting the delete so fast that it can't get the lock?

https://github.com/openstack/heat/blob/master/heat/engine/service.py#L582
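
Roughly the window being described, as a hypothetical illustration (not the actual engine code): the stack status is persisted as CREATE_COMPLETE before the create thread releases its lock, so a DELETE arriving in that gap cannot acquire the lock and comes back as ActionInProgress.

    def create_stack_thread(stack, lock):
        lock.acquire()                             # engine takes the stack lock
        try:
            stack.create_resources()               # ... stack work ...
            stack.store_status('CREATE_COMPLETE')  # tempest now sees CREATE_COMPLETE
            # window: a DELETE handled here still finds the lock held
        finally:
            lock.release()                         # only now is the lock free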

Revision history for this message
Steven Hardy (shardy) wrote :

After discussion with jasond, this may be a race in the Heat delete code: the thread running the create should have been killed (and the lock released) when the delete started. Marking as affecting Heat while we work out what is going on.

Changed in heat:
importance: Undecided → High
status: New → Triaged
Changed in tempest:
assignee: nobody → Jason Dunsmore (jasondunsmore)
Steven Hardy (shardy)
Changed in heat:
milestone: none → juno-3
Revision history for this message
Jason Dunsmore (jasondunsmore) wrote :

> but this one is failing saying we have a *create* in progress when
> we trigger the delete, despite the fact that we just checked the
> stack was CREATE_COMPLETE

The delete should work even if the create were in-progress. It's even
stranger that we're seeing that error message when the stack is
CREATE_COMPLETE.

Steven, per our IRC conversation, I verified that the linked functions
are indeed called whenever a thread is killed, so apparently that's
not causing this.

This might be a tempest bug. I correlated the timestamps from the
error message in the Heat API log[1] to the ones in the Heat engine
log[2], and found that there weren't any tracebacks in the engine log
like I expected. In addition, the debug message on line 77[3] was
nowhere to be found. That leads me to believe that another engine
process responded with the ActionInProgress error.

Steven, do you know if it's possible that the tempest test was talking
to more than one Heat engine?

[1] http://logs.openstack.org/78/113378/1/gate/gate-tempest-dsvm-postgres-full/d90bf37/logs/screen-h-api.txt.gz#_2014-08-12_16_24_44_429
[2] http://logs.openstack.org/78/113378/1/gate/gate-tempest-dsvm-postgres-full/d90bf37/logs/screen-h-eng.txt.gz?#_2014-08-12_16_24_43_737
[3] https://github.com/openstack/heat/blob/master/heat/engine/stack_lock.py#L77-L79
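
For readers without the source handy, here is a simplified, runnable sketch of the kind of logic being discussed in stack_lock.py (stand-in names and an in-memory table, not the actual Heat code): the lock is keyed by stack id, a debug message is logged when another engine already holds it, and ActionInProgress travels back to the API.

    import logging

    LOG = logging.getLogger(__name__)

    class ActionInProgress(Exception):
        def __init__(self, stack_name, action):
            super(ActionInProgress, self).__init__(
                "Stack %s already has an action (%s) in progress."
                % (stack_name, action))

    _lock_table = {}  # stand-in for the stack_lock database table

    class StackLock(object):
        def __init__(self, stack_id, stack_name, action, engine_id):
            self.stack_id = stack_id
            self.stack_name = stack_name
            self.action = action
            self.engine_id = engine_id

        def acquire(self):
            holder = _lock_table.setdefault(self.stack_id, self.engine_id)
            if holder == self.engine_id:
                return  # lock acquired (or already ours)
            # roughly the debug line that was expected in the engine log
            LOG.debug("Lock on stack %s is held by engine %s",
                      self.stack_id, holder)
            raise ActionInProgress(self.stack_name, self.action)

        def release(self):
            _lock_table.pop(self.stack_id, None)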

Revision history for this message
Steven Hardy (shardy) wrote :

Jason: I don't think the tempest tests run with more than one engine, as one engine is the default in our config atm. I'm not 100% sure if anything modifies num_engine_workers for gate tests, but I don't think so (stevebaker may know more).
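
For reference, the knob in question is a heat.conf option; the snippet below shows a hypothetical devstack-style value, assuming nothing in the gate job overrides it:

    [DEFAULT]
    # number of heat-engine worker processes; a single worker is assumed here
    num_engine_workers = 1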

I've seen another instance of this failure here:

https://review.openstack.org/#/c/106824/12
http://logs.openstack.org/24/106824/12/check/check-tempest-dsvm-postgres-full/1e6e98a/console.html

Revision history for this message
Jason Dunsmore (jasondunsmore) wrote :

Interesting, I did a search for the request ID in the engine and API logs and the logs clearly don't match up. The backtrace embedded in the JSON response from the API log should be in the engine log as well. Still not sure what's going on here.
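
A throwaway helper along the lines of that search (hypothetical; the screen log file names are assumptions) just pulls every line containing the request ID out of both logs so the API and engine sides can be compared:

    import sys

    def grep_request_id(req_id, paths=('screen-h-api.txt', 'screen-h-eng.txt')):
        for path in paths:
            print("==== %s ====" % path)
            with open(path) as log:
                for line in log:
                    if req_id in line:
                        print(line.rstrip())

    if __name__ == '__main__':
        grep_request_id(sys.argv[1])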

Thierry Carrez (ttx)
Changed in heat:
milestone: juno-3 → juno-rc1
Revision history for this message
David Kranz (david-kranz) wrote :

This really does not look like a tempest issue. Please reopen if contrary evidence arises.

Changed in tempest:
status: New → Invalid
Zane Bitter (zaneb)
Changed in heat:
assignee: nobody → Jason Dunsmore (jasondunsmore)
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

I wonder if this change will help with this bug: https://review.openstack.org/#/c/123023/

I'll set up an elastic-recheck search to track this failure

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Most failures are in Havana or Icehouse jobs. The failure occurred in check-tempest-dsvm-neutron-full only 3 times, all on the 16th:

logstash query:

  message:test_stack_crud_no_resources AND build_name:check-tempest-dsvm-neutron-full AND tags:"console"

This is looking like a rare race; probably worth tracking it, though.

Changed in heat:
milestone: juno-rc1 → none
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Marking as fixed for now; it is not showing up in logstash searches.

Changed in heat:
status: Triaged → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: juno-rc1 → 2014.2