Cannot delete the stack after heat-engine goes down in the middle of a stack-update

Bug #1334514 reported by Mitsuru Kanabuchi
Affects: OpenStack Heat
Status: Fix Released
Importance: Medium
Assigned to: Rikimaru Honjo

Bug Description

[Issue]

I was checking the behavior in system-failure cases.
The stack can never be deleted after heat-engine goes down in the middle of a stack-update.
This is probably a similar case to the following bug:
  https://bugs.launchpad.net/heat/+bug/1270775

[How to reproduce]

1) Create a stack with the following template, which uses boot-from-volume

$ cat template
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Resources" : {
    "volume": {
      "Type": "OS::Cinder::Volume",
      "Properties": {
        "name": "volume",
        "size": 1,
        "image": "0dd84e83-c271-4723-97fe-fcc59ecd4f60"
      }
    },
    "instance": {
      "Type": "OS::Nova::Server",
      "Properties": {
        "name": "instance",
        "flavor": "m1.tiny",
        "image": "0dd84e83-c271-4723-97fe-fcc59ecd4f60",
        "block_device_mapping": [
          {
            "device_name": "vdb",
            "volume_id": { "Ref": "volume" }
          }
        ]
      }
    }
  }
}
$ heat stack-create -f template test
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | CREATE_IN_PROGRESS | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+--------------------+----------------------+

2) Wait for CREATE_COMPLETE

$ heat stack-list
+--------------------------------------+------------+-----------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+-----------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | CREATE_COMPLETE | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+-----------------+----------------------+

3) Update the stack with the following template; the volume's name and image ID have been changed

$ cat template-up
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Resources" : {
    "volume": {
      "Type": "OS::Cinder::Volume",
      "Properties": {
        "name": "volume-up",
        "size": 1,
        "image": "8580e3cb-a84c-4943-a20f-f518fec3fc30"
      }
    },
    "instance": {
      "Type": "OS::Nova::Server",
      "Properties": {
        "name": "instance",
        "flavor": "m1.tiny",
        "image": "0dd84e83-c271-4723-97fe-fcc59ecd4f60",
        "block_device_mapping": [
          {
            "device_name": "vdb",
            "volume_id": { "Ref": "volume" }
          }
        ]
      }
    }
  }
}
$ heat stack-update -f template-up test
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | UPDATE_IN_PROGRESS | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+--------------------+----------------------+

4) Kill heat-engine to simulate a system failure

$ ps aux|grep heat-engine|grep -v grep
devstack 27680 0.1 2.3 74068 48544 pts/43 S+ 13:29 0:06 python bin/heat-engine --config-file=/etc/heat/heat.conf
$ sudo kill -SIGKILL 27680

5) Restart heat-engine

$ /usr/bin/python /usr/local/bin/heat-engine --config-file=/etc/heat/heat.conf > /dev/null 2>&1 &
[1] 31877
$ ps aux|grep heat-engine|grep -v grep
devstack 31877 9.5 2.2 71480 46344 pts/11 S 14:44 0:01 /usr/bin/python /usr/local/bin/heat-engine --config-file=/etc/heat/heat.conf

6) The stack status is "UPDATE_IN_PROGRESS"; that by itself is expected

$ heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | UPDATE_IN_PROGRESS | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+--------------------+----------------------+

7) But stack deletion never succeeds

$ heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | UPDATE_IN_PROGRESS | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+--------------------+----------------------+
$ heat stack-delete 1b1bff1b-b5d1-4b7a-b074-3630de6228fa
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | DELETE_IN_PROGRESS | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+--------------------+----------------------+
$ heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+---------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | DELETE_FAILED | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+---------------+----------------------+
$ heat stack-delete 1b1bff1b-b5d1-4b7a-b074-3630de6228fa
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | DELETE_IN_PROGRESS | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+--------------------+----------------------+
$ heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+---------------+----------------------+
| 1b1bff1b-b5d1-4b7a-b074-3630de6228fa | test | DELETE_FAILED | 2014-06-26T05:29:40Z |
+--------------------------------------+------------+---------------+----------------------+

[trace]

* Stack deletion failed with "Volume in use"

2014-06-26 14:51:30.138 ERROR heat.engine.resource [-] Delete CinderVolume "volume" [30936955-c285-4d01-aaa0-dbcd87b210cf] Stack "test*" [20c71486-578f-49dc-b6bd-f68707c9769d]
2014-06-26 14:51:30.138 TRACE heat.engine.resource Traceback (most recent call last):
2014-06-26 14:51:30.138 TRACE heat.engine.resource File "/opt/stack/heat/heat/engine/resource.py", line 671, in delete
2014-06-26 14:51:30.138 TRACE heat.engine.resource handle_data = self.handle_delete()
2014-06-26 14:51:30.138 TRACE heat.engine.resource File "/opt/stack/heat/heat/engine/resources/volume.py", line 179, in handle_delete
2014-06-26 14:51:30.138 TRACE heat.engine.resource delete_task.start()
2014-06-26 14:51:30.138 TRACE heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 162, in start
2014-06-26 14:51:30.138 TRACE heat.engine.resource self.step()
2014-06-26 14:51:30.138 TRACE heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 190, in step
2014-06-26 14:51:30.138 TRACE heat.engine.resource next(self._runner)
2014-06-26 14:51:30.138 TRACE heat.engine.resource File "/opt/stack/heat/heat/engine/scheduler.py", line 251, in wrapper
2014-06-26 14:51:30.138 TRACE heat.engine.resource subtask = next(parent)
2014-06-26 14:51:30.138 TRACE heat.engine.resource File "/opt/stack/heat/heat/engine/resources/volume.py", line 159, in _delete
2014-06-26 14:51:30.138 TRACE heat.engine.resource raise exception.Error(_('Volume in use'))
2014-06-26 14:51:30.138 TRACE heat.engine.resource Error: Volume in use
2014-06-26 14:51:30.138 TRACE heat.engine.resource

* Because the instance that is using the volume is still alive

$ nova list
+--------------------------------------+----------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+----------+--------+------------+-------------+----------+
| 2b0592c5-5354-4888-9296-d1ef68b829b5 | instance | ACTIVE | - | Running | |
+--------------------------------------+----------+--------+------------+-------------+----------+

* This means the deletion order was wrong

* The update method in this case is UpdateReplace. According to the DB, current_stack and backup_stack held the following resources when heat-engine went down:

  current_stack has the instance and the new volume
  backup_stack has the old volume

* stack-delete tries to delete the backup_stack first, but the instance was still attached to the old volume at that point. Eventually the stack went to DELETE_FAILED with "Volume in use"
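
The failure mode above can be sketched in Python. This is a minimal illustrative model (hypothetical names and data structures, not Heat's actual code), assuming a volume counts as "in use" while any resource in any stack references it:

```python
# State left in the DB after heat-engine died mid-UpdateReplace
# (illustrative model, not Heat's real schema):
current_stack = {"instance": {"uses": "old-volume"},   # still attached to old volume
                 "new-volume": {"uses": None}}
backup_stack = {"old-volume": {"uses": None}}

def in_use(volume, stacks):
    """A volume is 'in use' if any resource in any stack references it."""
    return any(res.get("uses") == volume
               for stack in stacks
               for res in stack.values())

def delete_stack(stack, all_stacks):
    """Delete every resource, refusing to delete an attached volume."""
    for name in list(stack):
        if name.endswith("volume") and in_use(name, all_stacks):
            raise RuntimeError("Volume in use")  # the error seen in the trace
        del stack[name]

# Heat deletes the backup stack first; the instance in current_stack is
# still attached to the old volume, so deletion fails every time:
try:
    delete_stack(backup_stack, [current_stack, backup_stack])
except RuntimeError as e:
    print(e)  # prints: Volume in use
```

Because the dependency crosses the stack boundary, retrying stack-delete can never succeed in this model until the instance is deleted first.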

Revision history for this message
Mitsuru Kanabuchi (kanabuchi) wrote :

A recent patch (commit 589b26698d20a113fa42552983c79e2576c26421) changed the volume's update_allowed properties, which affects which update path is taken.
So I have updated the template in the bug description to keep this bug reproducible.

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/102767

Changed in heat:
assignee: nobody → Mitsuru Kanabuchi (kanabuchi)
status: New → In Progress
Revision history for this message
Ethan Lynn (ethanlynn) wrote :

Why is the status stuck in "UPDATE_IN_PROGRESS"?

Revision history for this message
Mitsuru Kanabuchi (kanabuchi) wrote :

This situation is caused by heat-engine going down in the middle of stack processing.
In that case, the stack status is stuck at "UPDATE_IN_PROGRESS".

But I don't think that is a serious problem by itself,
because an IN_PROGRESS stack can normally be cleaned up by stack-delete.

However, the problem I reported is a little different:
the stack cannot be cleaned up because stack-delete fails forever.
I think it should be fixed; please see the patch.

Revision history for this message
Ethan Lynn (ethanlynn) wrote :

What if, instead of modifying parser.py, we just add an option to heatclient to set the stack status to FAILED? Could we delete the stack after setting the broken stack's status to FAILED?

Revision history for this message
Mitsuru Kanabuchi (kanabuchi) wrote :

Hi Ethan, thank you for the comment.
I don't think your idea solves this problem: the stack still cannot be deleted after its status is forcibly changed to "FAILED". Please see the similar bug report #1270775.
The problem is that the stack structure is not consistent after a stack-update is interrupted.
My patch aims to repair the stack structure so that the stack becomes deletable.

Changed in heat:
assignee: Mitsuru Kanabuchi (kanabuchi) → Rikimaru Honjo (honjo-rikimaru-c6)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/102767
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3fca8cb69f5d68c76064d6de7d4494c7a9e0d46e
Submitter: Jenkins
Branch: master

commit 3fca8cb69f5d68c76064d6de7d4494c7a9e0d46e
Author: Rikimaru, Honjo <email address hidden>
Date: Mon Aug 4 16:38:33 2014 +0900

    Restore resource_id from backup_stack

    When an UpdateReplace occurs, the resource is moved to the backup_stack
    and a new one is created. Resources which depend on the updated
    resource are then also moved to the backup_stack.

    If the creation during the update is interrupted for some reason (e.g.
    heat-engine going down), dependencies between the backup_stack and the
    existing stack remain.

    Heat deletes the backup_stack before deleting the existing stack. The
    dependency still exists, so the deletion fails. Thus, add behavior that
    restores resource_id in the backup_stack when the stack is deleted.

    Change-Id: I32cecba17ae160d4631fb2611e34b91bb921dc5e
    Closes-bug: #1334514
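
The idea in the commit message can be illustrated roughly as follows. This is a simplified sketch, not the merged patch: it assumes, for simplicity, that the replacement volume was never created, and all names and structures here are hypothetical. Moving the surviving physical resource ID back into the main stack lets one dependency-ordered deletion clean everything up (instance before its volume):

```python
def restore_from_backup(main_stack, backup_stack):
    """Copy physical resource IDs of backed-up resources into main_stack.

    Illustrative only: the real Heat patch works on the engine's stack
    delete path and handles more resource states than this sketch.
    """
    for name, backup_res in backup_stack.items():
        if backup_res.get("resource_id") is not None:
            # The backup holds the still-existing (old) physical resource;
            # point the main stack's resource at it so ordinary,
            # dependency-ordered deletion of main_stack cleans it up.
            main_stack[name]["resource_id"] = backup_res["resource_id"]
            backup_res["resource_id"] = None

# Simplified post-interruption state: new volume was never created.
main_stack = {"volume": {"resource_id": None},
              "instance": {"resource_id": "srv-123"}}
backup_stack = {"volume": {"resource_id": "vol-old"}}  # old, in-use volume

restore_from_backup(main_stack, backup_stack)
print(main_stack["volume"]["resource_id"])  # prints: vol-old
```

After the restore, the backup stack no longer owns any physical resources, so deleting it is trivially safe, and the main stack's dependency graph deletes the instance before "vol-old".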

Changed in heat:
status: In Progress → Fix Committed
Zane Bitter (zaneb)
Changed in heat:
importance: Undecided → Medium
milestone: none → juno-3
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: juno-3 → 2014.2