When checking a resource, we don't handle exceptions raised by the actual Resource's convergence create/update/delete methods, other than ResourceFailure and a few others used for flow control. We also don't handle any exceptions that occur after that, e.g. when collecting data for the SyncPoint. If such an exception occurs, it prevents SyncPoints from being updated and/or new resource checks from being triggered, but it does not signal that the stack has failed. The stack therefore remains stuck IN_PROGRESS, at least until it times out, though I suspect permanently.
These exceptions are not even logged, because they occur in an RPC 'cast' call - so nothing is listening at the other end.
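The failure mode described above can be reproduced in miniature. This is a hypothetical sketch, not Heat code: `Stack`, `check_resource` and `collect_sync_point_data` are illustrative names, and a `concurrent.futures.ThreadPoolExecutor` stands in for an oslo.messaging RPC 'cast'. The resource action is assumed to succeed, then a bug while collecting SyncPoint data raises; because nobody ever asks for the result, the exception is silently stored and the stack never leaves IN_PROGRESS.

```python
from concurrent.futures import ThreadPoolExecutor


class Stack:
    """Hypothetical, much-simplified stand-in for a stack under convergence."""

    def __init__(self, resources):
        self.status = 'IN_PROGRESS'
        self.pending = set(resources)  # resources whose SyncPoint is not yet updated
        self.sync_data = {}

    def sync_point_done(self, name, data):
        # Record the resource's data; mark the stack complete once all are in.
        self.sync_data[name] = data
        self.pending.discard(name)
        if not self.pending:
            self.status = 'COMPLETE'


def collect_sync_point_data(name):
    # Simulated bug in code that runs after the resource action succeeded.
    raise RuntimeError('unexpected error collecting data for %s' % name)


def check_resource(stack, name):
    # The resource's create/update/delete would run here and succeed.
    data = collect_sync_point_data(name)
    stack.sync_point_done(name, data)  # never reached


stack = Stack(['server'])
with ThreadPoolExecutor(max_workers=1) as pool:
    # Fire and forget, like an RPC 'cast': nobody ever asks for the result.
    future = pool.submit(check_resource, stack, 'server')

print(stack.status)  # IN_PROGRESS
```

Nothing is printed for the RuntimeError: it is held on the `Future` and only surfaces if someone calls `future.exception()` or `future.result()`, which a cast-style caller never does.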
Prior to the fix for bug 1492433, we had a more or less constant stream of bugs where the stack would get stuck IN_PROGRESS. Although the scope for such bugs is smaller in a single check_resource call, we run the same risk as we did then.
Reviewed: https://review.openstack.org/481757
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=33a16aa7a808f3f1a9fc9faf2a8b1017a8bcbbbe
Submitter: Jenkins
Branch: master
commit 33a16aa7a808f3f1a9fc9faf2a8b1017a8bcbbbe
Author: Zane Bitter <email address hidden>
Date: Mon Jul 10 13:48:01 2017 -0400
Log unhandled exceptions in worker
RPC calls to the worker use 'cast', so nothing is listening to find out the
result. If an exception occurs we will never hear about it. This change
logs such unhandled exceptions as errors.
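One common way to log unhandled exceptions in cast-invoked methods is a decorator that logs and then re-raises. The sketch below is a hypothetical illustration of that pattern, not the code from this commit: `log_unhandled_exceptions` and the stand-in `check_resource` are assumed names, and only the standard `logging` and `functools` modules are used. `LOG.exception` records at ERROR level with the traceback attached, so the failure at least shows up in the worker's log.

```python
import functools
import logging

LOG = logging.getLogger(__name__)


def log_unhandled_exceptions(func):
    """Hypothetical decorator: log any exception escaping func, then re-raise.

    With an RPC 'cast' nobody waits for the result, so this log record may be
    the only trace of the failure.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Logs at ERROR level and includes the current traceback.
            LOG.exception('Unhandled exception in %s', func.__name__)
            raise  # keep the original behaviour for any caller that does listen
    return wrapper


@log_unhandled_exceptions
def check_resource(resource_id):
    # Hypothetical worker method standing in for the real one.
    raise RuntimeError('failed while collecting SyncPoint data for %s'
                       % resource_id)
```

Re-raising rather than swallowing the exception keeps the decorator transparent: it adds visibility without changing control flow, so it does not by itself mark the stack as failed.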
Change-Id: I51365a9dee8fd4eff85e77d3e42bf33be814a22c
Partial-Bug: #1703043