multiprocessing.Pool is primarily designed for dynamic, short-running
workers that return results. As such it has particular handling
characteristics that make it difficult to process exceptions in the
worker children effectively and to manage a suite of long-running
workers.

With metricd this causes problems when there are issues during
child startup, either an inability to import code or an inability to
connect to storage or index services: the related exceptions are
swallowed and the metricd processes continue running without doing
any work.

This change switches to using an explicitly managed collection of
Process subclasses, each of which, when run(), processes some
metrics, sleeps a bit, and then processes some more.
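
The shape of such a collection can be sketched roughly as follows. This is a minimal illustration, not the code from the change itself: the names MetricProcessor, first_failure and supervise, and the interval and poll parameters, are hypothetical. It relies on the standard multiprocessing behaviour that an uncaught exception in run() makes the child exit with code 1, which the parent can read from Process.exitcode.

```python
import multiprocessing
import time


class MetricProcessor(multiprocessing.Process):
    """Hypothetical worker: configure once, then process, sleep, repeat."""

    def __init__(self, interval=30):
        super().__init__()
        self.interval = interval

    def _configure(self):
        # Placeholder: connect to the indexer and storage here.
        pass

    def _process_metrics(self):
        # Placeholder for one batch of metric processing.
        pass

    def run(self):
        # An exception raised here is not swallowed: the child prints a
        # traceback and exits with code 1, which the parent can observe.
        self._configure()
        while True:
            self._process_metrics()
            time.sleep(self.interval)


def first_failure(workers):
    """Return the first abnormal exit code among the workers, else None."""
    for worker in workers:
        if worker.exitcode not in (None, 0):
            return worker.exitcode
    return None


def supervise(workers, poll=1.0):
    """Start the workers; if one dies abnormally, stop the rest.

    Returns the failing child's exit code, or 0 if every worker has
    exited cleanly.
    """
    for worker in workers:
        worker.start()
    while True:
        code = first_failure(workers)
        if code is not None:
            for worker in workers:
                if worker.is_alive():
                    worker.terminate()
            for worker in workers:
                worker.join()
            return code
        if not any(worker.is_alive() for worker in workers):
            return 0
        time.sleep(poll)
```

A parent might then call sys.exit(supervise([MetricProcessor() for _ in range(4)])) under an if __name__ == "__main__" guard, so that a startup failure in any child ends the whole service with a non-zero exit code.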

This allows a few useful features:

* children are cleaned up properly by the parent on both
  KeyboardInterrupt and SIGTERM, with a suitable exit code
* exceptions in startup are heard and cause metricd to exit with a
  suitable log message and exit code
* asyncio can be removed for the time being
* retrying is used on the _configure method (during which the
  indexer and storage are configured and connected), with
  exponential backoff up to five minutes
Reviewed: https://review.openstack.org/221400
Committed: https://git.openstack.org/cgit/openstack/gnocchi/commit/?id=dab971c67fd940d84d533ede232342289d20d416
Submitter: Jenkins
Branch: master
commit dab971c67fd940d84d533ede232342289d20d416
Author: Chris Dent <email address hidden>
Date: Tue Sep 8 16:16:14 2015 +0000
Make metricd use a collection of Process not Pool
Change-Id: I21c003b75f46c6af552970a84579d1dc6bf55348
Related-Bug: #1493060
Closes-Bug: #1491339