stack-update fails when new resources added to OS::Heat::AccessPolicy

Bug #1286307 reported by Tomas Sedovic
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
High
Tomas Sedovic
tripleo
Fix Released
High
Tomas Sedovic

Bug Description

We've hit this bug in TripleO: https://bugs.launchpad.net/tripleo/+bug/1279011

We start with an AccessPolicy that has two allowed resources (OS::Nova::Server and AWS::AutoScaling::LaunchConfiguration):

  ComputeAccessPolicy:
    Properties:
      AllowedResources:
      - NovaCompute0
      - NovaCompute0Config
    Type: OS::Heat::AccessPolicy

We ask Heat to create the stack. Then we add another pair of resources (NovaCompute1 and NovaCompute1Config identical to the previous ones) and update ComputAccessPolicy's AllowedResources to include these as well:

  ComputeAccessPolicy:
    Properties:
      AllowedResources:
      - NovaCompute0
      - NovaCompute0Config
      - NovaCompute1
      - NovaCompute1Config
    Type: OS::Heat::AccessPolicy

and do a stack-update with this new template.

The operation fails during the ComputeAccessPolicy update.

Heat event log: http://paste.openstack.org/show/64344/

stack trace from heat-engine: http://paste.openstack.org/show/70606/

The reason as far as I can see is this:

1. AccessPolicy validates on creation that the stack has all the AllowedResources:

https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/resources/user.py#L321

2. AccessPolicy is created/updated before OS::Nova::Server resources (at least in our case)
3. This is not a problem because the AllowedResources don't have to be CREATE_COMPLETE or even CREATE_IN_PROGRESS yet -- they just have to be known to the stack
4. Updating AccessPolicy causes a replacement so this is verified during update, too
5. During stack-update, the ComputeAccessPolicy resource starts owned by the new stack (which is aware of the new NovaCompute1 and NovaCompute1Config resources):

https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/update.py#L189

6. However, right before the resource gets created, its internal reference to the owning stack is switched from the new stack to the current stack (which doesn't know about NovaCompute1 and NovaCompute1Config):

https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/update.py#L123

The line above looks completely harmless, but it silently modifies the right-hand side of the assignment:

https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/parser.py#L263

This means that this:

https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/update.py#L124

is now executed within the context of `existing_stack` and causes `AccessPolicy.handle_create` to raise an exception (since all this happens before the AllowedResources are created).

As an aside, however we end up fixing this (I'd like to have a look at it on Monday unless someone picks it up first), I think we should in general clean up any operator overloading that doesn't follow the expected behaviour (in this case assignment modifying the value being assigned). It violates the principle of least astonishment, explicit better than implicit and is just plain hard to understand.

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 1286307] [NEW] stack-update fails when new resources added to OS::Heat::AccessPolicy
Download full text (3.5 KiB)

Excerpts from Tomas Sedovic's message of 2014-02-28 19:49:24 UTC:
> Public bug reported:
>
> We've hit this bug in TripleO:
> https://bugs.launchpad.net/tripleo/+bug/1279011
>
> We start with an AccessPolicy that has two allowed resources
> (OS::Nova::Server and AWS::AutoScaling::LaunchConfiguration):
>
> ComputeAccessPolicy:
> Properties:
> AllowedResources:
> - NovaCompute0
> - NovaCompute0Config
> Type: OS::Heat::AccessPolicy
>
> We ask Heat to create the stack. Then we add another pair of resources
> (NovaCompute1 and NovaCompute1Config identical to the previous ones) and
> update ComputAccessPolicy's AllowedResources to include these as well:
>
> ComputeAccessPolicy:
> Properties:
> AllowedResources:
> - NovaCompute0
> - NovaCompute0Config
> - NovaCompute1
> - NovaCompute1Config
> Type: OS::Heat::AccessPolicy
>
> and do a stack-update with this new template.
>
> The operation fails during the ComputeAccessPolicy update.
>
> Heat event log: http://paste.openstack.org/show/64344/
>
> stack trace from heat-engine: http://paste.openstack.org/show/70606/
>
> The reason as far as I can see is this:
>
> 1. AccessPolicy validates on creation that the stack has all the
> AllowedResources:
>
> https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/resources/user.py#L321
>
> 2. AccessPolicy is created/updated before OS::Nova::Server resources (at least in our case)
> 3. This is not a problem because the AllowedResources don't have to be CREATE_COMPLETE or even CREATE_IN_PROGRESS yet -- they just have to be known to the stack
> 4. Updating AccessPolicy causes a replacement so this is verified during update, too
> 5. During stack-update, the ComputeAccessPolicy resource starts owned by the new stack (which is aware of the new NovaCompute1 and NovaCompute1Config resources):
>
> https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/update.py#L189
>
> 6. However, right before the resource gets created, its internal
> reference to the owning stack is switched from the new stack to the
> current stack (which doesn't know about NovaCompute1 and
> NovaCompute1Config):
>
> https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/update.py#L123
>
> The line above looks completely harmless, but it silently modifies the
> right-hand side of the assignment:
>
> https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/parser.py#L263
>
> This means that this:
>
> https://github.com/openstack/heat/blob/e8eddb87d34aa9fe2a2f5abcb75d9b27f86f1a68/heat/engine/update.py#L124
>
> is now executed within the context of `existing_stack` and causes
> `AccessPolicy.handle_create` to raise an exception (since all this
> happens before the AllowedResources are created).
>
> As an aside, however we end up fixing this (I'd like to have a look at
> it on Monday unless someone picks it up first), I think we should in
> general clean up any operator overloading that doesn't follow the
> expected behaviour (in this case assignment modifying the value bei...

Read more...

Revision history for this message
Tomas Sedovic (tsedovic) wrote :

Ladislav Smola tried that. He put all AllowedResources into DependsOn:

  ComputeAccessPolicy:
    DependsOn:
    - NovaCompute0
    - NovaCompute0Config
    - NovaCompute1
    - NovaCompute1Config
    Properties:
      AllowedResources:
      - NovaCompute0
      - NovaCompute0Config
      - NovaCompute1
      - NovaCompute1Config
    Type: OS::Heat::AccessPolicy

and got a circular dependency error. I'm not too familiar with the deps within TripleO templates. Maybe depending on just the compute nodes (or just the configs) would be enough. If we can do that, it should be a viable workaround.

Tomas Sedovic (tsedovic)
Changed in tripleo:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Tomas Sedovic (tsedovic) wrote :

Here's a testcase exposing the isuse:

http://paste.openstack.org/show/72216/

Changed in heat:
assignee: nobody → Tomas Sedovic (tsedovic)
Steven Hardy (shardy)
Changed in heat:
status: New → Confirmed
importance: Undecided → High
milestone: none → icehouse-rc1
Revision history for this message
Steven Hardy (shardy) wrote :

Thanks for the detailed report Tomas :)

So I've taken a look and confirm what is reported, basically because the AccessPolicy resource takes resource names instead of a Ref to the resources in DependsOn we don't create any implicit dependencies resulting in this situation.

I can see two possible solutions atm:

1. Make the DependsOn list create dependencies so the AccessPolicy is always created after the resources it controls access to
2. Don't create any dependencies but move the check for the resource being in the stack to a validate function, which should be called in the context of the updated stack template, not the partially updated stack as in handle_update

From comment #2 it sounds like option 2 is preferable?

Revision history for this message
Tomas Sedovic (tsedovic) wrote :

Thanks Steve.

I think option 2 seems to be better for the immediate problem TripleO has. And it wouldn't break backwards compatibility.

On the other hand, how common is it for Heat resources to reference other resources by name instead of by Ref or implicit DependsOn? I have a feeling that should be the right behaviour here, but I'm not that familiar with all the Heat resources.

Plus it would be bit trickier to introduce (since it could in fact cause circular dependency errors and such).

Could you please give me some pointers on the implementation you suggested? E.g. where do you define a validate function, where is it called from, etc. I'm not that familiar with all that yet.

When I just tried to move the validation code from AccessPolicy's handle_create to check_create_complete, the test I pasted earlier passes (and it doesn't break existing tests either). I haven't tested it manually yet and I'm not sure about the ordering and concurrency of the create & check actions. So it may be this works just because of a lucky race condition but isn't a complete solution.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/78624

Changed in heat:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/78624
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=b509812ea7bb031b6995be2a5ad5326a6417de50
Submitter: Jenkins
Branch: master

commit b509812ea7bb031b6995be2a5ad5326a6417de50
Author: Tomas Sedovic <email address hidden>
Date: Thu Mar 6 09:23:08 2014 -0500

    Fix AccessPolicy update with added resources

    This adds a test exposing bug #1286307 and moves the check for resources being
    present from AccessPolicy's handle_create to check_create_complete. This defers
    the check to a later point where the referenced resources are present.

    Change-Id: I47573812a3f80f3ae5b858cca486ec5826f606fd
    Closes-Bug: #1286307

Changed in heat:
status: In Progress → Fix Committed
Tomas Sedovic (tsedovic)
Changed in tripleo:
status: Confirmed → Fix Committed
assignee: nobody → Tomas Sedovic (tsedovic)
Changed in tripleo:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: icehouse-rc1 → 2014.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.