Aodh alarm triggers prematurely after evaluation period from insufficient data

Bug #1759687 reported by Supreeth Shivanand
Affects: Aodh
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

An Aodh alarm moves from the insufficient data state to the alarm state based on the trending state (the most recent statistic) once the evaluation-period number of statistics has been collected during the time period.

This happens for a cpu_idle_alarm with alarm type = threshold and any kind of statistic (max, min or avg).
This alarm logic looks flawed: an alarm is expected to be raised only after the ceilometer metric statistic has been above/below the threshold value consecutively for the evaluation number of periods. Analyzing the alarm evaluation logic, we found that the evaluator tries to leave the insufficient data state by validating only the most recent statistic, instead of checking that the consecutive samples are unequivocal (all on the same side of the threshold).

The problem seems to be the way the evaluator sets trending_state and then moves to trending_state when the current state is unknown.
Pointers in the code:
https://github.com/openstack/aodh/blob/master/aodh/evaluator/threshold.py#L130
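To make the difference concrete, here is a minimal sketch of the two behaviours. It is a simplified model written for this report, not the code in threshold.py: the function names, the state constants and the sample values are all hypothetical, and cases such as missing statistics are not modelled.

```python
import operator

# Hypothetical state names for this sketch only.
OK, ALARM, UNKNOWN = "ok", "alarm", "insufficient data"

COMPARATORS = {"gt": operator.gt, "lt": operator.lt,
               "ge": operator.ge, "le": operator.le}


def _compare(values, threshold, comparison):
    """Return one boolean per statistic: True if it breaches the threshold."""
    op = COMPARATORS[comparison]
    return [op(v, threshold) for v in values]


def reported_next_state(current, values, threshold, comparison="gt"):
    """Simplified model of the behaviour described in this report."""
    compared = _compare(values, threshold, comparison)
    if not compared:
        return current  # no statistics: not modelled in this sketch
    if all(compared):
        return ALARM
    if not any(compared):
        return OK
    # Mixed results: only the most recent statistic decides the trend,
    # and an alarm in the unknown state adopts that trend directly.
    trending_state = ALARM if compared[-1] else OK
    return trending_state if current == UNKNOWN else current


def expected_next_state(current, values, threshold, comparison="gt"):
    """Behaviour the reporter expects: change state only when every
    statistic in the evaluation window agrees."""
    compared = _compare(values, threshold, comparison)
    if not compared:
        return current
    if all(compared):
        return ALARM
    if not any(compared):
        return OK
    return current  # mixed results: stay in the current state


# Three evaluation periods of cpu idle %, alarm when idle < 20.
window = [90, 85, 10]
print(reported_next_state(UNKNOWN, window, 20, "lt"))
print(expected_next_state(UNKNOWN, window, 20, "lt"))
```

With only the last of the three statistics below the threshold, the first function leaves the unknown state for alarm, while the second stays in insufficient data until all three periods agree.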

Abhinay (abhinay111)
description: updated