Cannot delete overcloud heat stack when the stack creation failed

Bug #1308916 reported by Tom Howley
Affects: tripleo
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Creation of the overcloud heat stack failed, after which I tried to delete the stack:

root@stratus37:~/.cache/tripleo/tripleo-incubator/scripts# heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+---------------+----------------------+
| ea739758-efe3-43ce-a99a-e51b9842bf08 | overcloud | CREATE_FAILED | 2014-04-17T08:14:18Z |
+--------------------------------------+------------+---------------+----------------------+
root@stratus37:~/.cache/tripleo/tripleo-incubator/scripts# heat stack-delete overcloud
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| ea739758-efe3-43ce-a99a-e51b9842bf08 | overcloud | DELETE_IN_PROGRESS | 2014-04-17T08:14:18Z |
+--------------------------------------+------------+--------------------+----------------------+
root@stratus37:~/.cache/tripleo/tripleo-incubator/scripts# heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| ea739758-efe3-43ce-a99a-e51b9842bf08 | overcloud | DELETE_IN_PROGRESS | 2014-04-17T08:14:18Z |
+--------------------------------------+------------+--------------------+----------------------+

- The stack stayed stuck in DELETE_IN_PROGRESS for some time (heat stack-delete is usually pretty quick):

root@stratus37:~/.cache/tripleo/tripleo-incubator/scripts# nova list
+--------------------------------------+-------------------------------------+--------+------------+-------------+---------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------------------+--------+------------+-------------+---------------------+
| 8f5dec9e-898c-44d7-a908-09e29cfd69d5 | overcloud-NovaCompute0-c523uvgahoh6 | BUILD | deleting | NOSTATE | ctlplane=192.0.2.27 |
| 120f9e5c-b74f-4e74-9d70-b659f4ec45bc | overcloud-notcompute1-y2mtsy3wf6ss | BUILD | spawning | NOSTATE | ctlplane=192.0.2.26 |
| c67a13bb-8021-4121-be67-7dfacbe32573 | overcloud-notcompute2-6wvxiw2uobn4 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.28 |
+--------------------------------------+-------------------------------------+--------+------------+-------------+---------------------+

- As a result, I tried to delete the nova instances, which didn't work, and then tried a force-delete, which is not allowed:

root@stratus37:~/.cache/tripleo/tripleo-incubator/scripts# nova force-delete 8f5dec9e-898c-44d7-a908-09e29cfd69d5
ERROR: Cannot 'forceDelete' while instance is in vm_state building (HTTP 409) (Request-ID: req-6b87f27c-8650-48b2-a657-97d2c087dbe6)
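
The 409 above indicates that force-delete is rejected while an instance is still in the building vm_state. As a rough illustration, here is a minimal Python sketch that picks such instances out of captured `nova list` output. It assumes the table layout shown in this report; `parse_cli_table` and `instances_still_building` are hypothetical helper names, not part of any OpenStack client.

```python
def parse_cli_table(text):
    """Turn an ASCII table (as printed in this report) into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and shell prompts
        # Drop the empty strings produced by the leading and trailing "|".
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # first "|" row is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def instances_still_building(nova_list_output):
    """Return rows whose Status column is BUILD (assumed to map to vm_state building)."""
    return [row for row in parse_cli_table(nova_list_output)
            if row["Status"] == "BUILD"]


# Sample input copied from the `nova list` output in this report.
NOVA_LIST = """\
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
| 8f5dec9e-898c-44d7-a908-09e29cfd69d5 | overcloud-NovaCompute0-c523uvgahoh6 | BUILD | deleting | NOSTATE | ctlplane=192.0.2.27 |
| 120f9e5c-b74f-4e74-9d70-b659f4ec45bc | overcloud-notcompute1-y2mtsy3wf6ss | BUILD | spawning | NOSTATE | ctlplane=192.0.2.26 |
| c67a13bb-8021-4121-be67-7dfacbe32573 | overcloud-notcompute2-6wvxiw2uobn4 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.28 |
+----+------+--------+------------+-------------+----------+
"""

for server in instances_still_building(NOVA_LIST):
    print(server["ID"], server["Name"], server["Task State"])
```

All three instances in this report are flagged, which matches the force-delete failure above.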

- At this point (apart from database hacking), I think I have to rerun devtest to bring up my overcloud again. The resource list is stuck in this state:

root@stratus37:~/.cache/tripleo/tripleo-incubator/scripts# heat resource-list overcloud
+---------------------+------------------------------------------+--------------------+----------------------+
| resource_name | resource_type | resource_status | updated_time |
+---------------------+------------------------------------------+--------------------+----------------------+
| AccessPolicy | OS::Heat::AccessPolicy | CREATE_COMPLETE | 2014-04-17T08:14:19Z |
| CompletionHandle | AWS::CloudFormation::WaitConditionHandle | CREATE_COMPLETE | 2014-04-17T08:14:19Z |
| ComputeAccessPolicy | OS::Heat::AccessPolicy | CREATE_COMPLETE | 2014-04-17T08:14:19Z |
| ComputeUser | AWS::IAM::User | CREATE_COMPLETE | 2014-04-17T08:14:19Z |
| User | AWS::IAM::User | CREATE_COMPLETE | 2014-04-17T08:14:19Z |
| ComputeKey | AWS::IAM::AccessKey | CREATE_COMPLETE | 2014-04-17T08:14:20Z |
| Key | AWS::IAM::AccessKey | CREATE_COMPLETE | 2014-04-17T08:14:20Z |
| notcompute1 | OS::Nova::Server | CREATE_FAILED | 2014-04-17T08:14:21Z |
| NovaCompute0 | OS::Nova::Server | DELETE_IN_PROGRESS | 2014-04-17T08:14:22Z |
| notcompute2 | OS::Nova::Server | CREATE_FAILED | 2014-04-17T08:14:25Z |
| notcompute0 | OS::Nova::Server | DELETE_IN_PROGRESS | 2014-04-17T08:14:29Z |
| CompletionCondition | AWS::CloudFormation::WaitCondition | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
| NovaCompute0Config | AWS::AutoScaling::LaunchConfiguration | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
| RabbitCookie | OS::Heat::RandomString | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
| notcompute0Config | AWS::AutoScaling::LaunchConfiguration | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
| notcompute1Config | AWS::AutoScaling::LaunchConfiguration | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
| notcompute2Config | AWS::AutoScaling::LaunchConfiguration | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
+---------------------+------------------------------------------+--------------------+----------------------+
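
To see at a glance which resources the delete is still waiting on, the table above can be grouped by status. The following is a minimal Python sketch under the assumption that the output has the four columns shown here; `resources_by_status` is a hypothetical helper, and the sample input is a subset of the rows above.

```python
from collections import defaultdict


def resources_by_status(resource_list_output):
    """Group resource names by resource_status from `heat resource-list` text."""
    grouped = defaultdict(list)
    for line in resource_list_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        # Skip borders, prompts, and the header row.
        if len(cells) != 4 or cells[0] == "resource_name":
            continue
        name, _resource_type, status, _updated_time = cells
        grouped[status].append(name)
    return dict(grouped)


# Subset of the rows reported above.
RESOURCE_LIST = """\
+---------------+---------------+-----------------+--------------+
| resource_name | resource_type | resource_status | updated_time |
+---------------+---------------+-----------------+--------------+
| Key | AWS::IAM::AccessKey | CREATE_COMPLETE | 2014-04-17T08:14:20Z |
| notcompute1 | OS::Nova::Server | CREATE_FAILED | 2014-04-17T08:14:21Z |
| NovaCompute0 | OS::Nova::Server | DELETE_IN_PROGRESS | 2014-04-17T08:14:22Z |
| notcompute2 | OS::Nova::Server | CREATE_FAILED | 2014-04-17T08:14:25Z |
| notcompute0 | OS::Nova::Server | DELETE_IN_PROGRESS | 2014-04-17T08:14:29Z |
| RabbitCookie | OS::Heat::RandomString | DELETE_COMPLETE | 2014-04-17T08:45:11Z |
+---------------+---------------+-----------------+--------------+
"""

grouped = resources_by_status(RESOURCE_LIST)
# Resources whose deletion has started but not finished.
print(grouped.get("DELETE_IN_PROGRESS", []))
```

In this report the only resources in DELETE_IN_PROGRESS are OS::Nova::Server resources.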

Ben Nemec (bnemec)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
Ben Nemec (bnemec) wrote :

I think this is more a nova problem than a heat problem - it can't delete the instances when they're spawning, so Heat can't delete the stack. I think I might change this to incomplete for now, because I think we'd need to see the nova logs to figure out how it got into this bad state in order to figure out what needs to be fixed (but I agree this should definitely be fixed).

Changed in tripleo:
status: Triaged → Incomplete
Tom Howley (tom-howley) wrote :

Thanks for the feedback, Ben. Yes, I wasn't initially sure which project this should be raised against.

I'll update the bug with nova logs when I come across this problem again.

Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired