If gnocchi-metricd has import errors during startup, they are swallowed

Bug #1491339 reported by Chris Dent
This bug affects 1 person
Affects: Gnocchi
Status: Fix Released
Importance: High
Assigned to: Chris Dent

Bug Description

(See the history of https://review.openstack.org/#/c/218217/ for more details)

If gnocchi-metricd hits errors (at least import-related errors) after spawning its children, they can be swallowed, making debugging difficult, if not impossible.
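
For illustration only, here is a minimal sketch of how a multiprocessing.Pool can hide a failure in a child. It is not gnocchi's code: `broken_worker` and `run_fire_and_forget` are hypothetical names, and the sketch assumes a POSIX system where the "fork" start method is available. The child raises, but the parent sees nothing unless it explicitly asks for the result.

```python
import multiprocessing

# Assumption: a POSIX system, so the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def broken_worker():
    # Hypothetical stand-in for a child that fails during startup,
    # for example because a module cannot be imported.
    raise ImportError("No module named 'missing_driver'")


def run_fire_and_forget():
    """Submit the worker to a Pool without ever asking for its result."""
    with ctx.Pool(processes=1) as pool:
        result = pool.apply_async(broken_worker)
        result.wait(timeout=30)
        # The exception is stored on the AsyncResult. Nothing is raised or
        # logged in the parent unless result.get() is called.
        return result.ready(), result.successful()


if __name__ == "__main__":
    print(run_fire_and_forget())
```

The task finishes and is marked unsuccessful, yet the parent carries on as if nothing happened. Calling `result.get()` would re-raise the ImportError in the parent.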

Chris Dent (cdent)
Changed in gnocchi:
assignee: nobody → Chris Dent (cdent)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to gnocchi (master)

Fix proposed to branch: master
Review: https://review.openstack.org/221400

Changed in gnocchi:
status: New → In Progress
Julien Danjou (jdanjou)
Changed in gnocchi:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to gnocchi (master)

Reviewed: https://review.openstack.org/221400
Committed: https://git.openstack.org/cgit/openstack/gnocchi/commit/?id=dab971c67fd940d84d533ede232342289d20d416
Submitter: Jenkins
Branch: master

commit dab971c67fd940d84d533ede232342289d20d416
Author: Chris Dent <email address hidden>
Date: Tue Sep 8 16:16:14 2015 +0000

    Make metricd use a collection of Process not Pool

    multiprocessing.Pool is primarily designed to be used for dynamic
    short running workers that return results. As such it has particular
    handling characteristics that make it difficult to effectively
    process exceptions in the worker children and manage a suite of long
    running workers.

    With metricd this causes problems when there are issues during
    child startup, either in inability to import code or inability to
    connect to storage or index services: The related exceptions are
    swallowed and the metricd processes continue running without doing
    any work.

    This change switches to using an explicitly managed collection of
    Process subclasses each of which, when run(), processes some
    metrics, sleeps a bit, and then processes some more.

    This allows a few useful features:

    * children are cleaned up properly by the parent on both
      KeyboardInterrupt and SIGTERM, with a suitable exit code
    * exceptions in startup are heard and cause metricd to exit with a
      suitable log message and exit code
    * asyncio can be removed for the time being
    * retrying is used on the _configure method (during which the
      indexer and storage are configured and connected), with
      exponential backoff up to five minutes

    Change-Id: I21c003b75f46c6af552970a84579d1dc6bf55348
    Related-Bug: #1493060
    Closes-Bug: #1491339
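
The approach described in the commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code that was merged: the `Worker` class, its `fail_on_startup` and `rounds` parameters, and the `supervise` helper are all hypothetical, the retry with backoff is left out, and the "fork" start method (POSIX only) is assumed. The point is that each child is an explicitly managed Process whose exit code the parent can inspect.

```python
import multiprocessing
import time

# Assumption: a POSIX system, so the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


class Worker(_ctx.Process):
    """Hypothetical long-running worker; not gnocchi's actual class."""

    def __init__(self, fail_on_startup=False, rounds=2):
        super().__init__()
        self.fail_on_startup = fail_on_startup
        self.rounds = rounds

    def _configure(self):
        # Stand-in for configuring and connecting the indexer and storage.
        if self.fail_on_startup:
            raise RuntimeError("cannot connect to storage")

    def run(self):
        # An uncaught exception here makes the child print a traceback
        # and exit with code 1, which the parent can observe.
        self._configure()
        for _ in range(self.rounds):
            time.sleep(0.05)  # "process some metrics, sleep a bit"


def supervise(workers, poll_interval=0.05):
    """Start the workers; return the first non-zero exit code, or 0."""
    for worker in workers:
        worker.start()
    try:
        while any(w.is_alive() for w in workers):
            failed = [w for w in workers if w.exitcode not in (None, 0)]
            if failed:
                return failed[0].exitcode
            time.sleep(poll_interval)
        codes = [w.exitcode for w in workers if w.exitcode]
        return codes[0] if codes else 0
    finally:
        # Clean up any children that are still running.
        for worker in workers:
            if worker.is_alive():
                worker.terminate()
            worker.join()
```

A parent could then call `sys.exit(supervise([Worker() for _ in range(4)]))` so that a failure during child startup ends the whole daemon with a non-zero exit code instead of leaving idle processes behind.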

Changed in gnocchi:
status: In Progress → Fix Committed
Julien Danjou (jdanjou)
Changed in gnocchi:
milestone: none → 1.2.0
status: Fix Committed → Fix Released