No error logging for uncaught exceptions in setup/cleanup of long-running tasks

Bug #1492427 reported by Zane Bitter
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
OpenStack Heat  Fix Released  Medium      Zane Bitter
Kilo            Fix Released  Medium      Steve Baker
Liberty         Fix Released  Medium      Steve Baker

Bug Description

Most exceptions - even unexpected exceptions - in Heat are caught and handled. For example, any exception that occurs while processing a resource operation will cause the resource to be placed in a FAILED state and will generate a ResourceFailure exception that is caught and handled appropriately.

Most uncaught, unhandled exceptions (e.g. exceptions that occur before starting a stack operation) will bubble up to the user - the RPC handler will throw the exception, which will then be reported as the result of the RPC call by oslo.messaging.

There is one exception: code at the start and end of a long-running stack operation that is executed in a separate greenthread (so that exceptions don't bubble up to an RPC response). While any exception that occurs here is a bug, it would be very helpful if we had better error reporting of such bugs when they do occur. Currently the backtrace is printed to stderr, which in many installations means /dev/null (on systems with systemd, it appears in the journald logs, so at least it's not completely lost). We should catch and log these exceptions with a big red ERROR.
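
A minimal sketch of the kind of catch-and-log wrapper being asked for, assuming the standard library's threading module as a stand-in for Heat's greenthreads. The names log_uncaught and stack_operation_setup are hypothetical and are not taken from the Heat code base; this is an illustration of the idea, not the fix that was merged.

```python
import functools
import logging
import threading

LOG = logging.getLogger(__name__)


def log_uncaught(func):
    """Wrap a thread entry point so uncaught exceptions are logged as errors."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # LOG.exception logs at ERROR level and attaches the traceback,
            # so the failure lands in the service log instead of stderr.
            # Nothing is waiting on this thread, so there is no caller to
            # re-raise to; the exception is swallowed after being logged.
            LOG.exception('Unhandled error in asynchronous task %s',
                          func.__name__)
    return wrapper


@log_uncaught
def stack_operation_setup():
    # Hypothetical stand-in for code at the start of a stack operation.
    raise RuntimeError('bug in setup code')


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    worker = threading.Thread(target=stack_operation_setup)
    worker.start()
    worker.join()
```

With the wrapper in place, the traceback goes through the logging configuration (and hence to whatever file the service logs to) rather than depending on where stderr happens to be redirected.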

tags: added: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/242278

Changed in heat:
assignee: nobody → Zane Bitter (zaneb)
status: Triaged → In Progress
tags: added: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/242278
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=03d9aebbc7d5fba048fa9f39813704e70dee9c5d
Submitter: Jenkins
Branch: master

commit 03d9aebbc7d5fba048fa9f39813704e70dee9c5d
Author: Zane Bitter <email address hidden>
Date: Wed Nov 4 18:10:55 2015 -0500

    Log an error on an uncaught exception in a thread

    Exceptions in synchronous tasks get caught by the oslo_messaging code but
    for asynchronous tasks run in threads with nothing wait()ing for them,
    there is no top level error handling, and hence no logging. (Tracebacks
    would be written to stderr, so systemd users would see them only in the
    journal rather than heat-engine.log, and other users would quite likely
    lose them altogether.) This change ensures that any uncaught exceptions are
    logged as errors.

    Change-Id: I9410aa7ffd83391ea4db13c6e8cf49f26d3049fb
    Closes-Bug: #1492427
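
The situation the commit message describes, a task started in a thread with nothing waiting on its result, can be illustrated with the standard library's concurrent.futures. This is only an analogue under stated assumptions: Heat runs its tasks in eventlet greenthreads, and the helper names start and log_exceptions below are hypothetical rather than the functions touched by this change.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)


def log_exceptions(future):
    """Done-callback: log any exception raised by the finished task."""
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        # Passing the exception instance as exc_info attaches its traceback.
        LOG.error('Unhandled error in asynchronous task', exc_info=exc)


def start(executor, func, *args, **kwargs):
    """Submit a fire-and-forget task whose failures are still logged."""
    future = executor.submit(func, *args, **kwargs)
    # Without this callback the exception would sit unnoticed in the future,
    # because no caller ever asks for the result.
    future.add_done_callback(log_exceptions)
    return future


def failing_task():
    # Hypothetical task that hits an unexpected bug.
    raise ValueError('unexpected failure')


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    with ThreadPoolExecutor(max_workers=1) as pool:
        start(pool, failing_task)
```

Attaching the logging at the point where the task is started means every asynchronous task is covered, without each task body needing its own top-level try/except.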

Changed in heat:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/243848

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/243850

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/liberty)

Reviewed: https://review.openstack.org/243848
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=86198c36cad7859fd6f1c02e44dc0b191b7c15bf
Submitter: Jenkins
Branch: stable/liberty

commit 86198c36cad7859fd6f1c02e44dc0b191b7c15bf
Author: Zane Bitter <email address hidden>
Date: Wed Nov 4 18:10:55 2015 -0500

    Log an error on an uncaught exception in a thread

    Exceptions in synchronous tasks get caught by the oslo_messaging code but
    for asynchronous tasks run in threads with nothing wait()ing for them,
    there is no top level error handling, and hence no logging. (Tracebacks
    would be written to stderr, so systemd users would see them only in the
    journal rather than heat-engine.log, and other users would quite likely
    lose them altogether.) This change ensures that any uncaught exceptions are
    logged as errors.

    Change-Id: I9410aa7ffd83391ea4db13c6e8cf49f26d3049fb
    Closes-Bug: #1492427
    (cherry picked from commit 03d9aebbc7d5fba048fa9f39813704e70dee9c5d)

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/kilo)

Reviewed: https://review.openstack.org/243850
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=510345e5adea2bc3a4e4a7c0df65ed6d7c24d0a8
Submitter: Jenkins
Branch: stable/kilo

commit 510345e5adea2bc3a4e4a7c0df65ed6d7c24d0a8
Author: Zane Bitter <email address hidden>
Date: Wed Nov 4 18:10:55 2015 -0500

    Log an error on an uncaught exception in a thread

    Exceptions in synchronous tasks get caught by the oslo_messaging code but
    for asynchronous tasks run in threads with nothing wait()ing for them,
    there is no top level error handling, and hence no logging. (Tracebacks
    would be written to stderr, so systemd users would see them only in the
    journal rather than heat-engine.log, and other users would quite likely
    lose them altogether.) This change ensures that any uncaught exceptions are
    logged as errors.

    Change-Id: I9410aa7ffd83391ea4db13c6e8cf49f26d3049fb
    Closes-Bug: #1492427
    (cherry picked from commit 03d9aebbc7d5fba048fa9f39813704e70dee9c5d)

Thierry Carrez (ttx) wrote : Fix included in openstack/heat 6.0.0.0b1

This issue was fixed in the openstack/heat 6.0.0.0b1 development milestone.

Changed in heat:
status: Fix Committed → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/heat 5.0.1

This issue was fixed in the openstack/heat 5.0.1 release.
