Resource type in old stack undefined during update

Bug #1508096 reported by Zane Bitter
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
OpenStack Heat   Fix Released   High         Thomas Herve
Juno             Fix Released   High         Crag Wolfe
Kilo             Fix Released   High         Zane Bitter
Liberty          Fix Released   High         Zane Bitter

Bug Description

I just saw the following backtrace (in Kilo):

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
    timer()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 112, in _start_with_trace
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 865, in update
    updater()
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 174, in __call__
    self.start(timeout=timeout)
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 200, in start
    self.step()
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 223, in step
    next(self._runner)
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 289, in wrapper
    subtask = next(parent)
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 918, in update_task
    updater.start(timeout=self.timeout_secs())
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 200, in start
    self.step()
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 223, in step
    next(self._runner)
  File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 289, in wrapper
    subtask = next(parent)
  File "/usr/lib/python2.7/site-packages/heat/engine/update.py", line 55, in __call__
    self.previous_stack.dependencies,
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 238, in dependencies
    self.resources.itervalues())
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 201, in resources
    self.t.resource_definitions(self).items())
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 200, in <genexpr>
    for (name, data) in
  File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 141, in __new__
    resource_name=name)
  File "/usr/lib/python2.7/site-packages/heat/engine/environment.py", line 416, in get_class
    raise exception.StackValidationFailed(message=msg)
StackValidationFailed: Unknown resource Type : OS::TripleO::AllNodes::Validation

It appears that it is possible to load the previous stack using an environment in which an existing resource type is not defined; it then fails with StackValidationFailed upon trying to actually create the resources.
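
To make the failure mode concrete, here is a minimal sketch of it. This is not Heat's code: ToyEnvironment, load_resources and the local StackValidationFailed class are hypothetical stand-ins, and the only things taken from the report are the resource type name and the error message. It assumes that loading a stack's resources resolves every resource type against the environment, which is what the traceback suggests.

```python
class StackValidationFailed(Exception):
    """Local stand-in for the Heat exception of the same name."""


class ToyEnvironment:
    """Hypothetical, heavily simplified type registry (not Heat's Environment)."""

    def __init__(self, resource_registry):
        self.resource_registry = dict(resource_registry)

    def get_class(self, resource_type):
        # Fail loudly when the type has no mapping in this environment.
        try:
            return self.resource_registry[resource_type]
        except KeyError:
            raise StackValidationFailed(
                "Unknown resource Type : %s" % resource_type)


def load_resources(template_resources, env):
    """Resolve the class of every resource up front.

    A single unmapped type is enough to make the whole load fail.
    """
    return {name: env.get_class(rtype)
            for name, rtype in template_resources.items()}


if __name__ == "__main__":
    # The previous stack still contains a resource of a custom type ...
    previous_stack = {"validation": "OS::TripleO::AllNodes::Validation"}
    # ... but the environment it is loaded with no longer maps that type.
    env = ToyEnvironment({"OS::Heat::TestResource": object})
    try:
        load_resources(previous_stack, env)
    except StackValidationFailed as exc:
        print(exc)
```

In this toy model the stack template itself is fine; only the environment it is paired with is missing a mapping.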

Observing this issue is made harder by bug 1492433 and bug 1492427 - the stack was stuck in UPDATE_IN_PROGRESS and I found the backtrace only in the journal.

Revision history for this message
Jan Provaznik (jan-provaznik) wrote :

I hit this one recently. Unfortunately I don't have exact steps to reproduce it, but the rough flow was:
1) deployed an overcloud (TripleO)
2) ran a stack-update operation (openstack overcloud update stack)
3) heat-engine was killed by the OOM killer during the stack-update
4) I manually updated the state of the IN_PROGRESS stacks/resources in the DB to FAILED
5) restarted heat-engine and ran stack-update again -> at this point the stack got stuck in IN_PROGRESS

Revision history for this message
Zane Bitter (zaneb) wrote :

So it seems likely that the issue here is something along the lines that we're overwriting the old environment too early to be able to recover the stack if an engine dies in mid-update.

Changed in heat:
importance: High → Medium
status: New → Triaged
Revision history for this message
Zane Bitter (zaneb) wrote :

Looks like we have another reproducer not involving killing the engine here:

https://bugzilla.redhat.com/show_bug.cgi?id=1278975

I suspect that this is similar to bug 1477812 but with resource type mappings instead of parameters. #fixedbyconvergence

Changed in heat:
importance: Medium → High
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

This may end up being a duplicate of bug 1447194

Zane Bitter (zaneb)
Changed in heat:
status: Triaged → Incomplete
importance: High → Undecided
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

I believe this will affect master too. The fix for bug 1447194 results in the correct exceptions being raised, but I think Resource needs to fall back to TemplateResource for both TemplateNotFound and ResourceTypeNotFound:

http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/resource.py#n141
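
A rough sketch of the kind of fallback suggested above, for illustration only. The exception and class names mirror the ones mentioned in the comment, but they are defined locally as stand-ins; lookup and resolve_class are hypothetical helpers and do not reflect how Heat's Resource.__new__ is actually written.

```python
class TemplateNotFound(Exception):
    """Local stand-in for the Heat exception of the same name."""


class ResourceTypeNotFound(Exception):
    """Local stand-in for the Heat exception of the same name."""


class TemplateResource:
    """Local stand-in for the class used as the fallback."""


def lookup(registry, resource_type):
    """Hypothetical strict lookup: raise if the type has no mapping."""
    if resource_type not in registry:
        raise ResourceTypeNotFound(resource_type)
    return registry[resource_type]


def resolve_class(registry, resource_type):
    """Return the mapped class, falling back to TemplateResource.

    Both "template not found" and "resource type not found" are treated
    the same way, as the comment proposes.
    """
    try:
        return lookup(registry, resource_type)
    except (TemplateNotFound, ResourceTypeNotFound):
        return TemplateResource


if __name__ == "__main__":
    registry = {"OS::Heat::TestResource": object}
    print(resolve_class(registry, "OS::Heat::TestResource"))
    print(resolve_class(registry, "OS::TripleO::AllNodes::Validation"))
```

With a fallback like this, building the resource object would no longer raise for an unmapped type; whether that is the right behaviour is a separate question from the one this sketch answers.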

Changed in heat:
status: Incomplete → Triaged
importance: Undecided → High
assignee: nobody → Steve Baker (steve-stevebaker)
milestone: none → mitaka-1
tags: added: kilo-backport-potential liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/243354

Changed in heat:
status: Triaged → In Progress
Revision history for this message
Thomas Herve (therve) wrote :

OK! This does indeed affect master. I've found the following steps to reproduce it:

1) Create a simple stack with an OS::Heat::TestResource
2) Update the stack, adding a MyResource referring to some resource in the environment, and updating the TestResource to fail: True
3) Update the stack again without the fail.

I think I've also found the fix, as it sounds fairly similar to bug 1477812.

Revision history for this message
Zane Bitter (zaneb) wrote :

As I noted in the comments of https://review.openstack.org/#/c/244751/ it sounds like https://review.openstack.org/#/c/184026/5 was the cause here, and since that was also backported to Kilo and Juno this likely affects Juno as well.

Changed in heat:
assignee: Steve Baker (steve-stevebaker) → Thomas Herve (therve)
tags: added: juno-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Steve Baker (<email address hidden>) on branch: master
Review: https://review.openstack.org/243354
Reason: superseded by https://review.openstack.org/#/c/244751/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/244751
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=88da460316e2c18b3ceb58f51dcf0bda717a18a3
Submitter: Jenkins
Branch: master

commit 88da460316e2c18b3ceb58f51dcf0bda717a18a3
Author: Thomas Herve <email address hidden>
Date: Thu Nov 12 17:35:07 2015 +0100

    Copy the env to the backup stack in failed update

    When an update fails, we currently update the environment of the stack
    to contain both the new and old parameters and types. Unfortunately we
    don't do that for the backup stack which is reused. This patch adds the
    environment change.

    Change-Id: I4dc5dd35e4aeee498dd8960a1000913e97b924d5
    Closes-Bug: 1508096
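
The idea in this commit message can be sketched with plain dictionaries. This is not the code from the patch: ToyStack, merge_environments and on_update_failed are hypothetical names, environments are modelled as dicts with "parameters" and "resource_registry" sections, and letting new values win on conflict is an assumption of the sketch.

```python
import copy


def merge_environments(old_env, new_env):
    """Return an environment holding both old and new entries.

    Assumption of this sketch: new values win when a key is in both.
    """
    merged = copy.deepcopy(old_env)
    for section in ("parameters", "resource_registry"):
        merged.setdefault(section, {}).update(new_env.get(section, {}))
    return merged


class ToyStack:
    """Hypothetical stand-in for a stack that carries an environment."""

    def __init__(self, name, env):
        self.name = name
        self.env = env


def on_update_failed(stack, backup_stack, new_env):
    """Record the merged environment after a failed update."""
    merged = merge_environments(stack.env, new_env)
    stack.env = merged
    # The step described as missing: the reused backup stack must see
    # the same merged environment, or its resource types cannot be
    # resolved the next time it is loaded.
    backup_stack.env = copy.deepcopy(merged)


if __name__ == "__main__":
    old_env = {"parameters": {"flavor": "small"}, "resource_registry": {}}
    new_env = {"resource_registry": {"My::Resource": "my_resource.yaml"}}
    stack = ToyStack("demo", copy.deepcopy(old_env))
    backup = ToyStack("demo-backup", copy.deepcopy(old_env))
    on_update_failed(stack, backup, new_env)
    print(backup.env)
```

Copying rather than sharing the merged dict keeps the two stacks independent in this toy model; how the real patch stores the environment is not shown here.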

Changed in heat:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/245044

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/246192

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/245044
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=13c68910704844719b2d3de353d2db276377bf11
Submitter: Jenkins
Branch: master

commit 13c68910704844719b2d3de353d2db276377bf11
Author: Thomas Herve <email address hidden>
Date: Fri Nov 13 09:35:55 2015 +0100

    Add a test for environment change in failed update

    Add a functional test which verifies that adding a new resource during
    an update, with a new custom resource type mapping in the environment,
    allows for recovery when the update fails.

    Change-Id: I7e52703b7f45c79a3a1434200d1e49988e78f333
    Related-Bug: 1508096

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/247085

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/247087

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/247248

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (stable/kilo)

Reviewed: https://review.openstack.org/247248
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=4c85c14045d09280556b6f6821c14ce4044f0ae0
Submitter: Jenkins
Branch: stable/kilo

commit 4c85c14045d09280556b6f6821c14ce4044f0ae0
Author: Steve Baker <email address hidden>
Date: Thu Nov 19 14:05:24 2015 +1300

    Backport TestResource to stable/kilo

    The kilo TestResource used in functional tests has limited features and
    hidden bugs which are only apparent when other functional tests are
    backported. This change backports the stable/liberty TestResource to
    stable/kilo.

    Change-Id: Ib3648e21440031b0b1231d81a7a2825414457f72
    Related-Bug: #1508096

tags: added: in-stable-kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/liberty)

Reviewed: https://review.openstack.org/247085
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=fde608d0b513ef6bd64bb15a1b444789ce55e9b9
Submitter: Jenkins
Branch: stable/liberty

commit fde608d0b513ef6bd64bb15a1b444789ce55e9b9
Author: Thomas Herve <email address hidden>
Date: Thu Nov 12 17:35:07 2015 +0100

    Copy the env to the backup stack in failed update

    When an update fails, we currently update the environment of the stack
    to contain both the new and old parameters and types. Unfortunately we
    don't do that for the backup stack which is reused. This patch adds the
    environment change.

    Change-Id: I4dc5dd35e4aeee498dd8960a1000913e97b924d5
    Closes-Bug: 1508096
    (cherry picked from commit 88da460316e2c18b3ceb58f51dcf0bda717a18a3
                           and 13c68910704844719b2d3de353d2db276377bf11)

Alan Pevec (apevec)
tags: removed: in-stable-kilo juno-backport-potential
Alan Pevec (apevec)
tags: removed: liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/kilo)

Reviewed: https://review.openstack.org/247087
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=b94eb8f072cf39e3ac28fdd50dee1af3eea07752
Submitter: Jenkins
Branch: stable/kilo

commit b94eb8f072cf39e3ac28fdd50dee1af3eea07752
Author: Thomas Herve <email address hidden>
Date: Thu Nov 12 17:35:07 2015 +0100

    Copy the env to the backup stack in failed update

    When an update fails, we currently update the environment of the stack
    to contain both the new and old parameters and types. Unfortunately we
    don't do that for the backup stack which is reused. This patch adds the
    environment change.

    Change-Id: I4dc5dd35e4aeee498dd8960a1000913e97b924d5
    Closes-Bug: 1508096
    (cherry picked from commit 88da460316e2c18b3ceb58f51dcf0bda717a18a3
                           and 13c68910704844719b2d3de353d2db276377bf11)

Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/heat 6.0.0.0b1

This issue was fixed in the openstack/heat 6.0.0.0b1 development milestone.

Changed in heat:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (stable/juno)

Change abandoned by Jeremy Stanley (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/246192
Reason: I'm abandoning this change in preparation for deleting the stable/juno branch, which is now at end of life.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/heat 5.0.1

This issue was fixed in the openstack/heat 5.0.1 release.
