Cannot delete the Stack after UpdateReplace was failed.

Bug #1270775 reported by Mitsuru Kanabuchi
This bug affects 2 people
Affects: OpenStack Heat
Status: Fix Released
Importance: High
Assigned to: Mitsuru Kanabuchi

Bug Description

=============================================================

This is an updated bug description, revised following the discussion below.
We found two problems in Heat:

1) When UpdateReplace fails, the Stack cannot be deleted if there is a
    dependency between backup_stack and existing_stack.
2) The backup_stack cannot be accessed via the API.

This bug report covers 1). A new bug report will be created for 2).

=============================================================

* Problem and Reproduction

Deleting a stack that is in UPDATE_FAILED leaves it in DELETE_FAILED with the
reason "Error deleting backup resources". An example follows.

devstack@heat-test-env:~/work_heat/updatereplace$ cat template.before
{
   "AWSTemplateFormatVersion" : "2010-09-09",
   "Resources" : {
     "Net": {
       "Type": "OS::Neutron::Net",
       "Properties": {
         "name": "network"
       }
     },
     "SubNet": {
       "Type": "OS::Neutron::Subnet",
       "Properties": {
         "network_id": {"Ref": "Net"},
         "ip_version": 4,
         "cidr": "10.0.1.0/24",
         "allocation_pools": [{"start": "10.0.1.20", "end": "10.0.1.150"}]
       }
     },
     "Port": {
       "Type": "OS::Neutron::Port",
       "Properties": {
         "network_id": {"Ref": "Net"},
         "fixed_ips": [{"subnet_id": {"Ref": "SubNet"}, "ip_address": "10.0.1.30"}]
       }
     }
   }
}
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-create -f template.before stack1
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 8ea7366e-a870-495b-9854-86b007f4d927 | stack1 | CREATE_IN_PROGRESS | 2014-03-28T05:24:15Z |
+--------------------------------------+------------+--------------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-list
+--------------------------------------+------------+-----------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+-----------------+----------------------+
| 8ea7366e-a870-495b-9854-86b007f4d927 | stack1 | CREATE_COMPLETE | 2014-03-28T05:24:15Z |
+--------------------------------------+------------+-----------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ cat template.after
{
   "AWSTemplateFormatVersion" : "2010-09-09",
   "Resources" : {
     "Net": {
       "Type": "OS::Neutron::Net",
       "Properties": {
         "name": "network"
       }
     },
     "SubNet": {
       "Type": "OS::Neutron::Subnet",
       "Properties": {
         "network_id": {"Ref": "Net"},
         "ip_version": 4,
         "cidr": "10.0.2.0/24",
         "allocation_pools": [{"start": "10.0.2.20", "end": "10.0.2.999"}]
       }
     },
     "Port": {
       "Type": "OS::Neutron::Port",
       "Properties": {
         "network_id": {"Ref": "Net"},
         "fixed_ips": [{"subnet_id": {"Ref": "SubNet"}, "ip_address": "10.0.2.30"}]
       }
     }
   }
}
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-update -f template.after stack1
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 8ea7366e-a870-495b-9854-86b007f4d927 | stack1 | UPDATE_IN_PROGRESS | 2014-03-28T05:24:15Z |
+--------------------------------------+------------+--------------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+---------------+----------------------+
| 8ea7366e-a870-495b-9854-86b007f4d927 | stack1 | UPDATE_FAILED | 2014-03-28T05:24:15Z |
+--------------------------------------+------------+---------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-delete stack1
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 8ea7366e-a870-495b-9854-86b007f4d927 | stack1 | DELETE_IN_PROGRESS | 2014-03-28T05:24:15Z |
+--------------------------------------+------------+--------------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+---------------+----------------------+
| 8ea7366e-a870-495b-9854-86b007f4d927 | stack1 | DELETE_FAILED | 2014-03-28T05:24:15Z |
+--------------------------------------+------------+---------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat stack-show stack1
+----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| capabilities | [] |
| creation_time | 2014-03-28T05:24:15Z |
| description | No description |
| disable_rollback | True |
| id | 8ea7366e-a870-495b-9854-86b007f4d927 |
| links | http://192.168.10.99:8004/v1/c1685cb61c4243efa3660550a0b99627/stacks/stack1/8ea7366e-a870-495b-9854-86b007f4d927 |
| notification_topics | [] |
| parameters | { |
| | "AWS::StackId": "arn:openstack:heat::c1685cb61c4243efa3660550a0b99627:stacks/stack1/8ea7366e-a870-495b-9854-86b007f4d927", |
| | "AWS::Region": "ap-southeast-1", |
| | "AWS::StackName": "stack1" |
| | } |
| stack_name | stack1 |
| stack_status | DELETE_FAILED |
| stack_status_reason | Failed to DELETE : Error deleting backup resources: |
| | Resource DELETE failed: NeutronClientException: |
| | 409-{u'NeutronError': {u'message': u'Unable to complete |
| | operation on subnet b06cdbb0-8898-4ebb-999e- |
| | 5e7145322912. One or more ports have an IP allocation f |
| template_description | No description |
| timeout_mins | 60 |
| updated_time | 2014-03-28T05:25:21Z |
+----------------------+-------------------------------------------------------------------------------------------------------------------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat resource-list stack1
+---------------+---------------------+-----------------+----------------------+
| resource_name | resource_type | resource_status | updated_time |
+---------------+---------------------+-----------------+----------------------+
| Net | OS::Neutron::Net | CREATE_COMPLETE | 2014-03-28T05:24:15Z |
| Port | OS::Neutron::Port | CREATE_COMPLETE | 2014-03-28T05:24:16Z |
| SubNet | OS::Neutron::Subnet | CREATE_FAILED | 2014-03-28T05:25:24Z |
+---------------+---------------------+-----------------+----------------------+
devstack@heat-test-env:~/work_heat/updatereplace$ heat resource-show stack1 SubNet
+------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| description | |
| links | http://192.168.10.99:8004/v1/c1685cb61c4243efa3660550a0b99627/stacks/stack1/8ea7366e-a870-495b-9854-86b007f4d927/resources/SubNet |
| | http://192.168.10.99:8004/v1/c1685cb61c4243efa3660550a0b99627/stacks/stack1/8ea7366e-a870-495b-9854-86b007f4d927 |
| logical_resource_id | SubNet |
| physical_resource_id | |
| required_by | Port |
| resource_name | SubNet |
| resource_status | CREATE_FAILED |
| resource_status_reason | NeutronClientException: Invalid input for allocation_pools. Reason: '10.0.2.999' is not a valid IP address. |
| resource_type | OS::Neutron::Subnet |
| updated_time | 2014-03-28T05:25:24Z |
+------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
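
The update fails because '10.0.2.999' in template.after is not a valid IP address. As a rough sketch only (this is not part of Heat or the heat CLI, and the helper name invalid_pool_addresses is made up for illustration), the allocation pools of a JSON template could be checked locally with Python's standard ipaddress module before running stack-update:

```python
import ipaddress
import json


def invalid_pool_addresses(template_text):
    """Return (resource_name, value) pairs for allocation pool bounds that
    are not valid IP addresses or fall outside the subnet's cidr."""
    template = json.loads(template_text)
    problems = []
    for name, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "OS::Neutron::Subnet":
            continue
        props = resource.get("Properties", {})
        network = ipaddress.ip_network(props["cidr"])
        for pool in props.get("allocation_pools", []):
            for key in ("start", "end"):
                value = pool[key]
                try:
                    address = ipaddress.ip_address(value)
                except ValueError:
                    # Not an IP address at all, e.g. "10.0.2.999".
                    problems.append((name, value))
                    continue
                if address not in network:
                    problems.append((name, value))
    return problems


# Only the SubNet resource from template.after is reproduced here.
TEMPLATE_AFTER = """
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "SubNet": {
      "Type": "OS::Neutron::Subnet",
      "Properties": {
        "network_id": {"Ref": "Net"},
        "ip_version": 4,
        "cidr": "10.0.2.0/24",
        "allocation_pools": [{"start": "10.0.2.20", "end": "10.0.2.999"}]
      }
    }
  }
}
"""

print(invalid_pool_addresses(TEMPLATE_AFTER))
```

A check like this only catches the bad input early; it does not change what happens inside Heat once an UpdateReplace has failed.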

* IMO

When deleting a stack, Heat first attempts to delete the backup_stack if it
exists (see the delete method in parser.py).
Here the backup_stack contains only the old SubNet, while the Port that still
uses it remains in the existing stack. So the SubNet cannot be deleted
(a subnet cannot be deleted before all of its ports have been deleted).

mysql> select id, nova_instance, name, status, stack_id from resource;
+--------------------------------------+--------------------------------------+--------+----------+--------------------------------------+
| id | nova_instance | name | status | stack_id |
+--------------------------------------+--------------------------------------+--------+----------+--------------------------------------+
| 21d8983e-4978-435c-9b71-028d405d4b4a | 3a3c6ecf-de94-4a6e-a64c-d6ae65af4fb4 | Net | COMPLETE | 9972abb2-686e-43b2-828a-f933b65824ce |
| 58d92166-e6aa-4d6d-be0d-778e96be8e52 | e67aebe4-7dfb-4a95-ac2f-d0f2cd5cfbe9 | SubNet | FAILED | 1eb8a9ee-772b-4c98-a364-846fb917e054 |
| 8969e3bb-e9be-4cef-8a0d-12f005769a6c | d328113e-5c28-4d38-9f69-19cc5bc40ad9 | Port | COMPLETE | 9972abb2-686e-43b2-828a-f933b65824ce |
| ed16b409-a875-4c1c-b3b5-986fcbe9d88f | NULL | SubNet | FAILED | 9972abb2-686e-43b2-828a-f933b65824ce |
+--------------------------------------+--------------------------------------+--------+----------+--------------------------------------+
4 rows in set (0.00 sec)
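
The ordering problem described above can be reproduced with a toy model. This is a sketch under simplifying assumptions, not Heat or Neutron code: ToyNeutron, SubnetInUse, delete_stack and try_delete are hypothetical names, and the only rule modelled is that a subnet cannot be deleted while a port still has an IP on it.

```python
class SubnetInUse(Exception):
    """Stands in for Neutron's 409 when ports still have IPs on a subnet."""


class ToyNeutron:
    def __init__(self):
        self.subnets = set()
        self.ports = {}  # port id -> id of the subnet it has an IP on

    def delete_port(self, port_id):
        self.ports.pop(port_id, None)

    def delete_subnet(self, subnet_id):
        if subnet_id in self.ports.values():
            raise SubnetInUse(subnet_id)
        self.subnets.discard(subnet_id)


def delete_stack(neutron, stack):
    """Within one stack, ports are deleted before subnets."""
    for port_id in stack.get("ports", []):
        neutron.delete_port(port_id)
    for subnet_id in stack.get("subnets", []):
        neutron.delete_subnet(subnet_id)


def try_delete(order):
    """Build the failed-update state, then delete stacks in the given order."""
    neutron = ToyNeutron()
    neutron.subnets.add("old-subnet")
    neutron.ports["port"] = "old-subnet"
    stacks = {
        "backup": {"subnets": ["old-subnet"]},  # old SubNet was moved here
        "existing": {"ports": ["port"]},        # Port still uses the old SubNet
    }
    try:
        for stack_name in order:
            delete_stack(neutron, stacks[stack_name])
    except SubnetInUse:
        return "DELETE_FAILED"
    return "DELETE_COMPLETE"


print(try_delete(["backup", "existing"]))  # backup first, as described above
print(try_delete(["existing", "backup"]))  # hypothetical reverse order
```

The reverse order is shown only to make the dependency visible; it is not claimed to be how Heat resolves the problem.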

Ryo Miki (miki-ryo-e)
Changed in heat:
assignee: nobody → Ryo Miki (miki-ryo-e)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/78952

Changed in heat:
status: New → In Progress
Zane Bitter (zaneb) wrote : Re: Heat forget resource_id when UpdateReplace failed

Here's how it's supposed to work:
- The resource to be replaced gets moved to the backup stack
- The replacement resource is created. If this fails then...
- If rollback is enabled, it is deleted and the original resource returned. Everything is fine.
- If rollback is disabled, the failed resource remains in the stack and the old resource in the backup stack.
- When deleting the stack both the failed and the backup resource will be deleted.

It sounds a lot like that is how things are actually working.

Arguably we should delete the backup stack immediately after the update fails. The downside is that this takes out of service what may still be a working stack. We also might want to use these resources in the future to recover on a subsequent update.

What would be nice is if when a stack is left in a state where there are two copies of one resource, we had some way in the API to see it.
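
The intended flow listed in this comment can be sketched as a toy function. This is an illustration only, assuming stacks are plain dicts mapping resource names to physical ids; update_replace and failing_create are hypothetical names, not Heat's implementation.

```python
def update_replace(stack, backup, name, create, rollback):
    """Toy UpdateReplace for one resource.

    stack and backup map resource names to physical ids; create() returns
    the new physical id or raises. Returns True if the replacement succeeded.
    """
    backup[name] = stack.pop(name)  # old resource moves to the backup stack
    try:
        stack[name] = create()      # create the replacement
        return True
    except Exception:
        if rollback:
            # The failed replacement is dropped and the original comes back.
            stack[name] = backup.pop(name)
        else:
            # The failed replacement stays in the stack (no usable id here)
            # and the old resource stays in the backup stack.
            stack[name] = None
        return False


def failing_create():
    raise ValueError("'10.0.2.999' is not a valid IP address")


stack, backup = {"SubNet": "old-subnet-id"}, {}
update_replace(stack, backup, "SubNet", failing_create, rollback=False)
print(stack, backup)
```

With rollback disabled, the stack ends up holding the failed entry while the old id lives only in the backup dict, which is the state the rest of this thread discusses.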

Mitsuru Kanabuchi (kanabuchi) wrote :

Hi Zane, thank you for describing how UpdateReplace works.
I think I now understand the purpose of UpdateReplace. It's a great idea.

I should describe the problem in more detail, so I'll do that below.

[1] What is the problem?

Heat loses the resource's physical_resource_id when UpdateReplace fails.
I think this can be a serious problem.

  - The underlying resource still exists in that situation, but Heat has lost its physical_resource_id.
  - That resource remains after stack-delete, because Heat does not know the physical_resource_id needed to delete it.
  - The user cannot act on it, because the underlying resource's id is never available via Heat.

[2] How to reproduce?

  1) Create a new stack with the template before.template. This template creates a net and a subnet. The operation succeeds.
      $ heat stack-create -f before.template test-stack
  2) Update the stack with the template after.template. The operation fails because the template has an invalid allocation_pools property.
      $ heat stack-update -f after.template test-stack
  3) Check the physical_resource_id of the resource whose update failed. It has been lost.
      $ heat resource-show test-stack subnet

[3] Why do we need to copy resource_id?

This problem occurs when handle_create fails during UpdateReplace.
AFAIK, the UpdateReplace processing order is:

  a) Instantiate the new resource class for the updated resource. The physical_resource_id is None at that time.
  b) Enter UpdateReplace.
  c) Call handle_create for the updated resource.
  d) Set physical_resource_id once handle_create has succeeded.

The physical_resource_id is None between a) and d).
If an error occurs during that period, None is stored in the DB as the physical_resource_id.
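
Steps a) to d) can be sketched with a toy class to show why None ends up being stored. This is a simplified model, not Heat's resource class: ToyResource, to_db_row and bad_handle_create are hypothetical names, and the row layout only mimics the nova_instance column shown in the description.

```python
class ToyResource:
    """Minimal stand-in for a resource going through steps a) to d)."""

    def __init__(self, name):
        self.name = name
        self.resource_id = None   # a) new object, no physical id yet
        self.status = "INIT"

    def create(self, handle_create):
        try:
            physical_id = handle_create()   # c) may raise
        except Exception:
            self.status = "CREATE_FAILED"   # resource_id is still None
            return
        self.resource_id = physical_id      # d) only set on success
        self.status = "CREATE_COMPLETE"


def to_db_row(resource):
    """What would be persisted for this resource in the toy model."""
    return {"name": resource.name,
            "nova_instance": resource.resource_id,
            "status": resource.status}


def bad_handle_create():
    raise ValueError("Invalid input for allocation_pools")


subnet = ToyResource("SubNet")
subnet.create(bad_handle_create)
print(to_db_row(subnet))
```
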

The purpose of Ryo's patch is to prevent None from being stored as the physical_resource_id.

Could you give me advice on fixing it?

Changed in heat:
assignee: Ryo Miki (miki-ryo-e) → Mitsuru Kanabuchi (kanabuchi)
Zane Bitter (zaneb) wrote :

It's not lost, it's moved to the backup stack. See lines 115-120 of update.py. It is, however, not visible via the API.

Your patch doesn't really solve the problem if e.g. creating the resource fails _after_ setting the resource_id (which is common - e.g. you create a Nova server but building it fails in Nova). The fundamental problem is that at some point during an update, two instances of a single resource in the template exist. We need a way for the API to report that.

Mitsuru Kanabuchi (kanabuchi) wrote :

It's true that the patch doesn't solve the case where creating the resource fails after the resource_id has been set. Hmm.
I think we should add a new column that refers to the previous stack (or resource) to solve this problem fundamentally.
It looks like the DB currently has no reference to the previous stack/resource.

Pavlo Shchelokovskyy (pshchelo) wrote :

This looks similar to a problem that I recently encountered in several ways (need to get values of pre-update properties during update). You can try to use db.resource_data for this - see how OS::Neutron::LoadBalancer is implemented...

Pavlo Shchelokovskyy (pshchelo) wrote :

ignore my last comment, resource_data is considered "evil" and should not be used

Mitsuru Kanabuchi (kanabuchi) wrote :

I'm drafting code to resolve this bug with the following approach.

  * Add a new column, resource.ref_nova_instance.
  * When UpdateReplace occurs, store the old resource's nova_instance in ref_nova_instance.
  * If ref_nova_instance isn't NULL when destroy starts, delete the old resource together with the current resource.

I'll post the patch as WIP.
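
The draft approach above might look roughly like this. It is a sketch of the idea in this comment only, not the WIP patch and not the fix that was eventually merged; ResourceRow, replace_row and ids_to_delete are hypothetical names, and ref_nova_instance is the proposed column, which is not confirmed to exist in Heat's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRow:
    name: str
    nova_instance: Optional[str]               # current physical id
    ref_nova_instance: Optional[str] = None    # proposed: previous physical id


def replace_row(old, new_physical_id):
    """On UpdateReplace, remember the old physical id in ref_nova_instance."""
    return ResourceRow(old.name, new_physical_id,
                       ref_nova_instance=old.nova_instance)


def ids_to_delete(row):
    """On destroy, delete the current resource and, if set, the old one."""
    return [pid for pid in (row.nova_instance, row.ref_nova_instance)
            if pid is not None]


old = ResourceRow("SubNet", "old-subnet-id")
failed = replace_row(old, None)   # replacement failed before getting an id
print(ids_to_delete(failed))
```
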

Mitsuru Kanabuchi (kanabuchi) wrote :

I uploaded a new patch based on the approach above.
(Jenkins will probably fail because there is no test code yet.)

Please comment on the new patch.
I confirmed that this patch can delete the previous resource after UPDATE_FAILED occurs.

Ryo Miki (miki-ryo-e) wrote :

Mitsuru shared this problem with me, so I'm trying to figure out the root cause.
Zane, thank you for the explanation. Your comment is correct. I now understand
that the backup stack is deleted together with the stack on stack-delete.

I updated the bug description. I think there are two problems:

1) When UpdateReplace fails, the Stack cannot be deleted if there is a
    dependency between backup_stack and existing_stack.
2) The backup_stack cannot be accessed via the Heat API.

I want to address them as follows:

- submit a brand-new patch for 1).
- report a new bug for 2).

Please let us know what you think.

description: updated
summary: - Heat forget resource_id when UpdateReplace failed
+ Cannot delete the Stack after UpdateReplace was failed.
Changed in heat:
importance: Undecided → High
milestone: none → juno-1
Ryo Miki (miki-ryo-e)
Changed in heat:
assignee: Mitsuru Kanabuchi (kanabuchi) → Ryo Miki (miki-ryo-e)
tags: added: icehouse-rc-potential
Ryo Miki (miki-ryo-e) wrote :

I reported "2) cannot access backup_stack via Heat API" as a new bug:
There are no accessibilities for the backup_stack

https://bugs.launchpad.net/heat/+bug/1301320

Changed in heat:
status: In Progress → Triaged
tags: removed: icehouse-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/86232

Changed in heat:
status: Triaged → In Progress
Mitsuru Kanabuchi (kanabuchi) wrote :

I prepared a document illustrating instances of this problem. Please see the attached file.

  * Case1: Parent in Backup
  * Case2: Orphaned Child

Basically, dependencies that cross the boundary between the two stacks are the cause of this problem.
Ryo's patch may not be a fundamental solution, but it can reduce the number of cases where the bug occurs.

Mitsuru Kanabuchi (kanabuchi) wrote :

I uploaded a document that describes this bug's problem in detail, along with a solution.
This solution was accepted by Zane in Atlanta.

Changed in heat:
assignee: Ryo Miki (miki-ryo-e) → Mitsuru Kanabuchi (kanabuchi)
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/86232
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=5804fccd66678482b980a0ce65013cf2a8496582
Submitter: Jenkins
Branch: master

commit 5804fccd66678482b980a0ce65013cf2a8496582
Author: Mitsuru Kanabuchi <email address hidden>
Date: Wed May 21 17:48:10 2014 +0900

    Restore resource_id from backup_stack when delete

    When UpdateReplace is occured, the resource is moved to backup_stack,
    and create new one. Then, resources which has dependency for updated
    resource will be moved to backup_stack. If the creation of update is
    failed, dependencies between backup_stack and exist_stack will exist.

    Heat deletes backup_stack before deleting exist_stack. The dependency
    still exists, so it will be failed. Thus, add behavior that restores
    resource_id in the backup_stack when the stack is deleted.

    Change-Id: Ib2a39fa20089b7140fc676f54db8cf66a4581c60
    Closes-bug: #1270775
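
The idea in the commit message above, restoring resource_id from backup_stack before deletion, can be illustrated with a toy model. This is a sketch under assumptions and does not reproduce the merged change: stacks are plain dicts, the names restore_from_backup, deletion_plan and failed_update_state are hypothetical, and the swap condition (the current resource is FAILED and the backup has an id) is a simplification chosen for the example.

```python
def restore_from_backup(existing, backup):
    """Swap physical ids so a failed resource in the existing stack takes
    back the old id that was moved to the backup stack."""
    for name, old in backup.items():
        current = existing.get(name)
        if current is not None and current["status"] == "FAILED" and old["id"]:
            current["id"], old["id"] = old["id"], current["id"]


def deletion_plan(existing, backup, order):
    """Physical ids in deletion order: backup stack first, then the existing
    stack in the given dependency order."""
    plan = [res["id"] for res in backup.values() if res["id"]]
    plan += [existing[name]["id"] for name in order if existing[name]["id"]]
    return plan


def failed_update_state():
    existing = {
        "Net": {"id": "net-1", "status": "COMPLETE"},
        "Port": {"id": "port-1", "status": "COMPLETE"},
        "SubNet": {"id": None, "status": "FAILED"},
    }
    backup = {"SubNet": {"id": "subnet-old", "status": "COMPLETE"}}
    return existing, backup


ORDER = ["Port", "SubNet", "Net"]  # dependents are deleted first

existing, backup = failed_update_state()
print(deletion_plan(existing, backup, ORDER))   # subnet-old comes before port-1
restore_from_backup(existing, backup)
print(deletion_plan(existing, backup, ORDER))   # port-1 now comes before subnet-old
```

After the swap, the old subnet is deleted as part of the existing stack, after the port that depends on it, so the cross-stack dependency no longer blocks the delete in this toy model.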

Changed in heat:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: juno-1 → 2014.2