Comment 4 for bug 1493060

OpenStack Infra (hudson-openstack) wrote : Related fix merged to gnocchi (master)

Reviewed: https://review.openstack.org/221400
Committed: https://git.openstack.org/cgit/openstack/gnocchi/commit/?id=dab971c67fd940d84d533ede232342289d20d416
Submitter: Jenkins
Branch: master

commit dab971c67fd940d84d533ede232342289d20d416
Author: Chris Dent <email address hidden>
Date: Tue Sep 8 16:16:14 2015 +0000

    Make metricd use a collection of Process not Pool

    multiprocessing.Pool is primarily designed for dynamic,
    short-running workers that return results. As such, it has
    particular handling characteristics that make it difficult to
    process exceptions in the worker children effectively and to
    manage a suite of long-running workers.

    With metricd this causes problems when a child fails during
    startup, whether because it cannot import code or because it
    cannot connect to the storage or index services: the related
    exceptions are swallowed and the metricd processes continue
    running without doing any work.

    This change switches to using an explicitly managed collection of
    Process subclasses each of which, when run(), processes some
    metrics, sleeps a bit, and then processes some more.
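
The explicitly managed collection described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the names MetricWorker, process_batch, interval, max_iterations and run_workers are all hypothetical, and max_iterations exists only so the loop can be bounded in a demonstration.

```python
import multiprocessing
import time


class MetricWorker(multiprocessing.Process):
    """Hypothetical long-running worker: process, sleep, repeat."""

    def __init__(self, process_batch, interval=1.0, max_iterations=None):
        super().__init__()
        self.process_batch = process_batch    # callable doing one unit of work
        self.interval = interval              # seconds to sleep between batches
        self.max_iterations = max_iterations  # None means loop forever

    def run(self):
        # Executed in the child process once start() has been called.
        done = 0
        while self.max_iterations is None or done < self.max_iterations:
            self.process_batch()
            done += 1
            time.sleep(self.interval)


def run_workers(workers):
    """Start every worker, wait for all of them, and return their exit codes."""
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return [worker.exitcode for worker in workers]
```

Because the parent holds each Process object directly, it can inspect exitcode after join(). A child whose run() raises an uncaught exception exits with a nonzero code, so the failure is visible to the parent.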

    This allows a few useful features:

    * children are cleaned up properly by the parent on both
      KeyboardInterrupt and SIGTERM, with a suitable exit code
    * exceptions in startup are heard and cause metricd to exit with a
      suitable log message and exit code
    * asyncio can be removed for the time being
    * retrying is used on the _configure method (during which the
      indexer and storage are configured and connected), with
      exponential backoff up to five minutes
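
The retry behaviour in the last bullet can be illustrated with a standard-library sketch. It is a stand-in, not the mechanism the commit uses: retry_with_backoff and its parameters are hypothetical, and it assumes that "up to five minutes" means each individual wait is capped at 300 seconds.

```python
import time


def retry_with_backoff(func, initial=1.0, maximum=300.0,
                       max_attempts=None, sleep=time.sleep):
    """Call func until it succeeds, doubling the wait after each failure.

    Each wait is capped at `maximum` seconds (assumed: five minutes).
    `sleep` is injectable so the schedule can be checked without waiting.
    """
    delay = initial
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except Exception:
            # Give up only if the caller set an attempt limit.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, maximum)
```

With the defaults, the waits grow as 1, 2, 4, ... seconds and then stay at 300 seconds for every later attempt, so a slow-starting storage or index service is polled without being hammered.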

    Change-Id: I21c003b75f46c6af552970a84579d1dc6bf55348
    Related-Bug: #1493060
    Closes-Bug: #1491339