evaluator sometimes fails to obtain values for gnocchi alarms

Bug #1540298 reported by Yurii Prokulevych
This bug affects 1 person
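+---------+--------------+------------+----------------+-----------+
| Affects | Status       | Importance | Assigned to    | Milestone |
+---------+--------------+------------+----------------+-----------+
| Aodh    | Fix Released | Undecided  | Mehdi Abaakouk |           |
+---------+--------------+------------+----------------+-----------+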

Bug Description

aodh-evaluator sometimes fails to retrieve enough datapoints for evaluation, which causes the alarm to flap between its real state and the 'insufficient data' state.

Excerpt from aodh-evaluator.log:
2016-02-01 09:45:07.648 13650 DEBUG aodh.coordination [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] Members of group: ['fc8089f8-5b44-46fa-8450-1a40205e397f'] extract_my_subset /usr/lib/python2.7/site-packages/aodh/coordination.py:169
2016-02-01 09:45:07.655 13650 DEBUG aodh.coordination [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] My subset: [<aodh.storage.models.Alarm object at 0x2f58550>] extract_my_subset /usr/lib/python2.7/site-packages/aodh/coordination.py:175
2016-02-01 09:45:07.656 13650 INFO aodh.evaluator [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] initiating evaluation cycle on 1 alarms
2016-02-01 09:45:07.656 13650 DEBUG aodh.evaluator [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] evaluating alarm 53f5ed59-2b8a-443f-9f7a-d559fbf60c0f _evaluate_alarm /usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py:218
2016-02-01 09:45:07.657 13650 DEBUG aodh.evaluator.threshold [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] query stats from 2016-02-01 09:41:07.657005 to 2016-02-01 09:45:07.657005 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:80
2016-02-01 09:45:07.657 13650 DEBUG aodh.evaluator.gnocchi [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] stats query http://192.0.2.20:8041/v1/aggregation/resource/generic/metric/MyAlarmMeter1 _statistics /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:84
2016-02-01 09:45:07.774 13650 DEBUG aodh.evaluator.gnocchi [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] YuriiDebug: [["2016-02-01T09:42:00+00:00", 60.0, 3.0], ["2016-02-01T09:43:00+00:00", 60.0, 3.5]] _statistics /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:95
2016-02-01 09:45:07.775 13650 DEBUG aodh.evaluator.gnocchi [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] sanitize stats [[u'2016-02-01T09:42:00+00:00', 60.0, 3.0], [u'2016-02-01T09:43:00+00:00', 60.0, 3.5]] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:109
2016-02-01 09:45:07.775 13650 DEBUG aodh.evaluator.gnocchi [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] pruned statists to 2 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:113
2016-02-01 09:45:07.776 13650 WARNING aodh.evaluator.threshold [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] Expecting 3 datapoints but only get 2
2016-02-01 09:45:07.780 13650 INFO aodh.evaluator [req-47492f4d-0992-441b-98f5-77d177224670 - - - - -] alarm 53f5ed59-2b8a-443f-9f7a-d559fbf60c0f transitioning to insufficient data because 3 datapoints are unknown

2016-02-01 09:46:07.699 13650 DEBUG aodh.coordination [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] Members of group: ['fc8089f8-5b44-46fa-8450-1a40205e397f'] extract_my_subset /usr/lib/python2.7/site-packages/aodh/coordination.py:169
2016-02-01 09:46:07.729 13650 DEBUG aodh.coordination [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] My subset: [<aodh.storage.models.Alarm object at 0x2f71150>] extract_my_subset /usr/lib/python2.7/site-packages/aodh/coordination.py:175
2016-02-01 09:46:07.729 13650 INFO aodh.evaluator [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] initiating evaluation cycle on 1 alarms
2016-02-01 09:46:07.730 13650 DEBUG aodh.evaluator [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] evaluating alarm 53f5ed59-2b8a-443f-9f7a-d559fbf60c0f _evaluate_alarm /usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py:218
2016-02-01 09:46:07.730 13650 DEBUG aodh.evaluator.threshold [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] query stats from 2016-02-01 09:42:07.730899 to 2016-02-01 09:46:07.730899 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:80
2016-02-01 09:46:07.731 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] stats query http://192.0.2.20:8041/v1/aggregation/resource/generic/metric/MyAlarmMeter1 _statistics /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:84
2016-02-01 09:46:09.306 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] YuriiDebug: [["2016-02-01T09:43:00+00:00", 60.0, 3.5]] _statistics /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:95
2016-02-01 09:46:09.307 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] sanitize stats [[u'2016-02-01T09:43:00+00:00', 60.0, 3.5]] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:109
2016-02-01 09:46:09.307 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] pruned statists to 1 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:113
2016-02-01 09:47:07.708 13650 DEBUG aodh.coordination [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] Members of group: ['fc8089f8-5b44-46fa-8450-1a40205e397f'] extract_my_subset /usr/lib/python2.7/site-packages/aodh/coordination.py:169
2016-02-01 09:47:07.714 13650 DEBUG aodh.coordination [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] My subset: [<aodh.storage.models.Alarm object at 0x2f5d390>] extract_my_subset /usr/lib/python2.7/site-packages/aodh/coordination.py:175
2016-02-01 09:47:07.715 13650 INFO aodh.evaluator [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] initiating evaluation cycle on 1 alarms
2016-02-01 09:47:07.716 13650 DEBUG aodh.evaluator [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] evaluating alarm 53f5ed59-2b8a-443f-9f7a-d559fbf60c0f _evaluate_alarm /usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py:218
2016-02-01 09:47:07.716 13650 DEBUG aodh.evaluator.threshold [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] query stats from 2016-02-01 09:43:07.716794 to 2016-02-01 09:47:07.716794 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:80
2016-02-01 09:47:07.717 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] stats query http://192.0.2.20:8041/v1/aggregation/resource/generic/metric/MyAlarmMeter1 _statistics /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:84
2016-02-01 09:47:07.841 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] YuriiDebug: [["2016-02-01T09:45:00+00:00", 900.0, 4.5], ["2016-02-01T09:45:00+00:00", 300.0, 4.5], ["2016-02-01T09:44:00+00:00", 60.0, 7.0], ["2016-02-01T09:45:00+00:00", 60.0, 1.0], ["2016-02-01T09:46:00+00:00", 60.0, 4.5]] _statistics /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:95
2016-02-01 09:47:07.842 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] sanitize stats [[u'2016-02-01T09:45:00+00:00', 900.0, 4.5], [u'2016-02-01T09:45:00+00:00', 300.0, 4.5], [u'2016-02-01T09:44:00+00:00', 60.0, 7.0], [u'2016-02-01T09:45:00+00:00', 60.0, 1.0], [u'2016-02-01T09:46:00+00:00', 60.0, 4.5]] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:109
2016-02-01 09:47:07.842 13650 DEBUG aodh.evaluator.gnocchi [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] pruned statists to 3 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:113
2016-02-01 09:47:07.843 13650 DEBUG aodh.evaluator.threshold [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] comparing value 7.0 against threshold 3.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:216
2016-02-01 09:47:07.843 13650 DEBUG aodh.evaluator.threshold [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] comparing value 1.0 against threshold 3.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:216
2016-02-01 09:47:07.849 13650 DEBUG aodh.evaluator.threshold [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] comparing value 4.5 against threshold 3.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:216
2016-02-01 09:47:07.850 13650 INFO aodh.evaluator [req-3b15136a-736a-4cfb-b96a-1ac95929cc99 - - - - -] alarm 53f5ed59-2b8a-443f-9f7a-d559fbf60c0f transitioning to alarm because Transition to alarm due to 3 samples outside threshold, most recent : 4.5
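
The excerpt above can be replayed with a minimal sketch, assuming the evaluator keeps only datapoints at the alarm's granularity and needs one per evaluation period. This is not aodh's actual code: the names `sanitize` and `has_enough_data` are hypothetical, and the values 60 s and 3 periods are read off the log ("60.0" granularity, "Expecting 3 datapoints"). The threshold comparison is left out because the log does not show the alarm's comparison operator.

```python
# Illustrative sketch only; not taken from aodh's source.
GRANULARITY = 60.0        # seconds, as seen in the logged datapoints
EVALUATION_PERIODS = 3    # the log warns "Expecting 3 datapoints"


def sanitize(stats, granularity=GRANULARITY, periods=EVALUATION_PERIODS):
    """Keep values at the given granularity, then the last `periods` of them.

    Assumes the datapoints of one granularity arrive oldest first,
    as they do in the logged responses.
    """
    values = [value for _ts, gran, value in stats if gran == granularity]
    return values[-periods:]


def has_enough_data(stats, periods=EVALUATION_PERIODS):
    """True when there is one datapoint per evaluation period."""
    return len(sanitize(stats, periods=periods)) >= periods


# Response logged at 09:45:07: the 09:44 datapoint is not there yet.
cycle_0945 = [
    ["2016-02-01T09:42:00+00:00", 60.0, 3.0],
    ["2016-02-01T09:43:00+00:00", 60.0, 3.5],
]

# Response logged at 09:47:07: mixed granularities, three 60 s datapoints.
cycle_0947 = [
    ["2016-02-01T09:45:00+00:00", 900.0, 4.5],
    ["2016-02-01T09:45:00+00:00", 300.0, 4.5],
    ["2016-02-01T09:44:00+00:00", 60.0, 7.0],
    ["2016-02-01T09:45:00+00:00", 60.0, 1.0],
    ["2016-02-01T09:46:00+00:00", 60.0, 4.5],
]

print(has_enough_data(cycle_0945))  # False: only 2 of 3 datapoints
print(sanitize(cycle_0947))         # [7.0, 1.0, 4.5]
```

The second result matches the three values the evaluator compares against the threshold at 09:47:07, while the first corresponds to the transition to 'insufficient data' at 09:45:07.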

Excerpt from alarm's history:
ceilometer alarm-history 53f5ed59-2b8a-443f-9f7a-d559fbf60c0f
+------------------+----------------------------+--------------------------+
| Type             | Timestamp                  | Detail                   |
+------------------+----------------------------+--------------------------+
| state transition | 2016-02-01T09:55:08.671000 | state: alarm             |
| state transition | 2016-02-01T09:52:07.815000 | state: insufficient data |
| state transition | 2016-02-01T09:47:07.865000 | state: alarm             |
| state transition | 2016-02-01T09:45:07.789000 | state: insufficient data |
| state transition | 2016-02-01T09:41:07.812000 | state: ok                |
+------------------+----------------------------+--------------------------+

Packages:
aodh*-1.1.0-1.el7ost.noarch

Regards,
Yurii

Zi Lian Ji (jizilian)
Changed in aodh:
assignee: nobody → Zi Lian Ji (jizilian)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to aodh (master)

Fix proposed to branch: master
Review: https://review.openstack.org/277326

Changed in aodh:
assignee: Zi Lian Ji (jizilian) → Mehdi Abaakouk (sileht)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to aodh (master)

Reviewed: https://review.openstack.org/277326
Committed: https://git.openstack.org/cgit/openstack/aodh/commit/?id=b3874c47f1051d37ed839f4f8fffda2c77641f28
Submitter: Jenkins
Branch: master

commit b3874c47f1051d37ed839f4f8fffda2c77641f28
Author: Mehdi Abaakouk <email address hidden>
Date: Mon Feb 8 08:57:44 2016 +0100

    Allow to extends the evaluator lookback window

    Sometimes the alarm state flaps because the last datapoint is often
    missing. This can be solved by increasing the resources dedicated to the
    metric injection chain or, for less critical scenarios, by allowing a
    bigger lookback window.

    This change allows extending the lookback window with the new
    configuration option 'acceptable_ingestion_lag'.

    Change-Id: If2aca73aea95c0c6d08afa5fbb89b949099507db
    Closes-bug: #1540298
    Closes-bug: #1506911
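
A rough sketch of what the commit describes, assuming the query window is the granularity times the number of evaluation periods plus one extra period, with the lag added on top. The function `window_start` and the constant `LOOK_BACK` are hypothetical, and the exact option name and formula should be checked against the aodh release in use. The one-extra-period assumption is chosen because it reproduces the 4-minute window in the log ("query stats from 09:41:07.657005 to 09:45:07.657005").

```python
from datetime import datetime, timedelta

# Assumption: one extra period of lookback, which matches the logged window.
LOOK_BACK = 1


def window_start(now, granularity, evaluation_periods,
                 acceptable_ingestion_lag=0):
    """Return the start of the stats query window ending at `now`.

    `granularity` and `acceptable_ingestion_lag` are in seconds.
    """
    window = (granularity * (evaluation_periods + LOOK_BACK)
              + acceptable_ingestion_lag)
    return now - timedelta(seconds=window)


now = datetime(2016, 2, 1, 9, 45, 7, 657005)

# Without any lag the window starts where the log says it does.
default_start = window_start(now, 60, 3)
print(default_start)   # 2016-02-01 09:41:07.657005

# A 120 s lag (arbitrary example value) pulls the start two minutes earlier.
extended_start = window_start(now, 60, 3, acceptable_ingestion_lag=120)
print(extended_start)  # 2016-02-01 09:39:07.657005
```

A wider window does not make the late datapoint arrive sooner; it lets the query reach older datapoints, so the evaluator may still find as many as it expects when the newest one has not been ingested yet.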

Changed in aodh:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/aodh 3.0.0.0b3

This issue was fixed in the openstack/aodh 3.0.0.0b3 development milestone.
