If gnocchi-metricd fails to start the storage client, it processes no metrics and does not retry

Bug #1493060 reported by Chris Dent
This bug affects 1 person
Affects: Gnocchi
Status: Fix Released
Importance: High
Assigned to: Chris Dent
Milestone: (none)

Bug Description

When using gnocchi-metricd with Swift, if the Swift server is not available when gnocchi-metricd starts up, metricd does not retry and no metrics are processed. Because the Swift server and the metricd process are likely to start at around the same time, this is a problem.

2015-09-07 13:15:35.899 18427 DEBUG keystoneclient.auth.identity.v2 [-] Making authentication request to http://192.168.2.3:5000/v2.0/tokens get_auth_ref /usr/lib/python2.7/site-packages/keystoneclient/auth/identity/v2.py:86
2015-09-07 13:15:35.901 18427 ERROR swiftclient [-] Authorization Failure. Authorization Failed: Unable to establish connection to http://192.168.2.3:5000/v2.0/tokens
2015-09-07 13:15:35.901 18427 ERROR swiftclient Traceback (most recent call last):
2015-09-07 13:15:35.901 18427 ERROR swiftclient File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1294, in _retry
2015-09-07 13:15:35.901 18427 ERROR swiftclient self.url, self.token = self.get_auth()
2015-09-07 13:15:35.901 18427 ERROR swiftclient File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1267, in get_auth
2015-09-07 13:15:35.901 18427 ERROR swiftclient timeout=self.timeout)
2015-09-07 13:15:35.901 18427 ERROR swiftclient File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 460, in get_auth
2015-09-07 13:15:35.901 18427 ERROR swiftclient auth_version=auth_version)
2015-09-07 13:15:35.901 18427 ERROR swiftclient File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 388, in get_auth_keystone
2015-09-07 13:15:35.901 18427 ERROR swiftclient raise ClientException('Authorization Failure. %s' % err)
2015-09-07 13:15:35.901 18427 ERROR swiftclient ClientException: Authorization Failure. Authorization Failed: Unable to establish connection to http://192.168.2.3:5000/v2.0/tokens
2015-09-07 13:15:35.901 18427 ERROR swiftclient
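
The failure above happens once at startup and is never retried. As a rough illustration of the missing behaviour, here is a minimal sketch of retrying a connection attempt with exponential backoff, using only the standard library. The names `connect_with_backoff`, `connect`, `initial_delay` and `max_delay` are hypothetical and not taken from gnocchi or swiftclient, and the 300-second default cap is an assumption.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def connect_with_backoff(connect, initial_delay=1.0, max_delay=300.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds and return its result.

    After each failure the wait doubles, capped at max_delay seconds.
    `connect` is any zero-argument callable (hypothetical placeholder for
    "build the storage client"); `sleep` is injectable for testing.
    """
    delay = initial_delay
    while True:
        try:
            return connect()
        except Exception as exc:
            # Sketch only: real code should catch the client's specific
            # connection/authorization errors rather than Exception.
            LOG.warning("connection failed (%s); retrying in %.1f s",
                        exc, delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With a service that comes up a few seconds after the daemon, the first attempts fail, the waits grow as 1 s, 2 s, 4 s and so on, and processing starts as soon as one attempt succeeds instead of never starting.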

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to gnocchi (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/221400

OpenStack Infra (hudson-openstack) wrote : Fix proposed to gnocchi (master)

Fix proposed to branch: master
Review: https://review.openstack.org/221430

Changed in gnocchi:
assignee: nobody → Chris Dent (chdent)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on gnocchi (master)

Change abandoned by Chris Dent (<email address hidden>) on branch: master
Review: https://review.openstack.org/221430
Reason: I agree with myself, this is insufficiently generic.

Julien Danjou (jdanjou)
Changed in gnocchi:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Related fix merged to gnocchi (master)

Reviewed: https://review.openstack.org/221400
Committed: https://git.openstack.org/cgit/openstack/gnocchi/commit/?id=dab971c67fd940d84d533ede232342289d20d416
Submitter: Jenkins
Branch: master

commit dab971c67fd940d84d533ede232342289d20d416
Author: Chris Dent <email address hidden>
Date: Tue Sep 8 16:16:14 2015 +0000

    Make metricd use a collection of Process not Pool

    multiprocessing.Pool is primarily designed to be used for dynamic
    short running workers that return results. As such it has particular
    handling characteristics that make it difficult to effectively
    process exceptions in the worker children and manage a suite of long
    running workers.

    With metricd this causes problems when there are issues during
    child startup, either in inability to import code or inability to
    connect to storage or index services: The related exceptions are
    swallowed and the metricd processes continue running without doing
    any work.

    This change switches to using an explicitly managed collection of
    Process subclasses each of which, when run(), processes some
    metrics, sleeps a bit, and then processes some more.

    This allows a few useful features:

    * children are cleaned up properly by the parent on both
      KeyboardInterrupt and SIGTERM, with a suitable exit code
    * exceptions in startup are heard and cause metricd to exit with a
      suitable log message and exit code
    * asyncio can be removed for the time being
    * retrying is used on the _configure method (during which the
      indexer and storage are configured and connected), with
      exponential backoff up to five minutes

    Change-Id: I21c003b75f46c6af552970a84579d1dc6bf55348
    Related-Bug: #1493060
    Closes-Bug: #1491339
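
The approach in the commit message above (explicitly managed `multiprocessing.Process` subclasses instead of a `Pool`) can be sketched roughly as follows. This is not the gnocchi code: `MetricWorker`, `supervise`, `main` and their parameters are hypothetical names, the exit codes are assumptions, and the choice to stop everything as soon as any child stops is one possible policy rather than the one the commit necessarily uses.

```python
import multiprocessing
import signal
import sys
import time


class MetricWorker(multiprocessing.Process):
    """Hypothetical worker: process some metrics, sleep a bit, repeat."""

    def __init__(self, interval=1.0, rounds=None):
        super().__init__()
        self.interval = interval
        self.rounds = rounds  # None means "loop until terminated"

    def process_metrics(self):
        # Placeholder for the real work.
        pass

    def run(self):
        done = 0
        while self.rounds is None or done < self.rounds:
            self.process_metrics()
            time.sleep(self.interval)
            done += 1


def supervise(workers, poll_interval=0.5):
    """Start the workers, watch them, clean up, and return an exit code.

    Assumed policy: return 1 if any child ended with a non-zero exit code,
    otherwise 0 (including when the parent itself is asked to stop).
    """
    for worker in workers:
        worker.start()
    try:
        # Unlike Pool, a dead child is visible here through is_alive().
        while all(worker.is_alive() for worker in workers):
            time.sleep(poll_interval)
        failed = any(w.exitcode not in (None, 0) for w in workers)
        code = 1 if failed else 0
    except (KeyboardInterrupt, SystemExit):
        code = 0
    # Make sure no child outlives the parent.
    for worker in workers:
        worker.terminate()
    for worker in workers:
        worker.join()
    return code


def _raise_system_exit(signum, frame):
    # Turn SIGTERM into SystemExit so it shares the Ctrl-C cleanup path.
    raise SystemExit(0)


def main(worker_count=2):
    signal.signal(signal.SIGTERM, _raise_system_exit)
    sys.exit(supervise([MetricWorker() for _ in range(worker_count)]))
```

To try it, call `main()` under an `if __name__ == "__main__":` guard. Because the parent polls its children directly, a child that dies during startup produces a non-zero exit code for the whole daemon instead of leaving idle processes running.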

Julien Danjou (jdanjou) wrote :

Marking this as released: after some testing, this is no longer a problem.

Changed in gnocchi:
status: In Progress → Fix Released