Ceilometer

evaluation periods effectively ignored for threshold alarm

Bug #1380216 reported by Mike Spreitzer on 2014-10-12

This bug affects 3 people

Affects      Status         Importance   Assigned to    Milestone
Ceilometer   Fix Released   Undecided    ZhiQiang Fan   Ceilometer 2015.1.0 "kilo"
Juno         Fix Released   Undecided    Unassigned     Ceilometer 2014.2.4

Bug Description

In the file ceilometer/alarm/evaluator/threshold.py, in the class ThresholdEvaluator, consider this method:

    def _transition(self, alarm, statistics, compared):
        """Transition alarm state if necessary.

        The transition rules are currently hardcoded as:

        - transitioning from a known state requires an unequivocal
          set of datapoints

        - transitioning from unknown is on the basis of the most
          recent datapoint if equivocal

        Ultimately this will be policy-driven.
        """

and the _sufficient method:

    def _sufficient(self, alarm, statistics):
        """Check for the sufficiency of the data for evaluation.

        Ensure there is sufficient data for evaluation, transitioning to
        unknown otherwise.
        """
        sufficient = len(statistics) >= self.quorum
        ...

Note that self.quorum==1, regardless of evaluation_periods.

The current hard-wired policy effectively ignores the evaluation_periods parameter of the alarm.
Every alarm starts in the unknown state, so the first time there are any statistics at all available,
_sufficient() will return true and _transition will set the state based on how that first statistic
compares to the threshold.
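The behaviour described above can be illustrated with a small standalone model. This is only a sketch and not the Ceilometer code: the function `evaluate`, the constant `QUORUM`, and the state strings are hypothetical names chosen for illustration, and the comparison is assumed to be "greater than the threshold".

```python
# Simplified, hypothetical model of the reported behaviour.
# It is NOT the Ceilometer implementation; all names are illustrative.
UNKNOWN, OK, ALARM = "insufficient data", "ok", "alarm"
QUORUM = 1  # hard-wired, independent of evaluation_periods


def evaluate(state, averages, threshold, evaluation_periods):
    """Return the next alarm state for a list of per-period averages."""
    # Only the most recent evaluation_periods values are considered.
    window = averages[-evaluation_periods:]
    if len(window) < QUORUM:
        return UNKNOWN
    compared = [value > threshold for value in window]
    if all(compared):
        return ALARM  # unequivocal: every datapoint breaches
    if not any(compared):
        return OK  # unequivocal: no datapoint breaches
    if state == UNKNOWN:
        # Equivocal data from the unknown state: use the latest datapoint.
        return ALARM if compared[-1] else OK
    return state  # equivocal data from a known state: no transition


# A single datapoint is enough to leave the unknown state,
# even though evaluation_periods is 3.
first_result = evaluate(UNKNOWN, [90.0], threshold=80.0, evaluation_periods=3)
```

In this model, `first_result` is the alarm state after just one statistic, which is the symptom the report describes.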

Dina Belova (dbelova) wrote on 2014-10-13:

Actually, I don't understand how evaluation_periods connects with quorum. Evaluation periods is the number of historical periods over which the threshold is evaluated (that is, the evaluation window); quorum is the minimum number of datapoints within the sliding window needed to avoid the unknown state. I suppose the hardcoded quorum is the only problem here.

OpenStack Infra (hudson-openstack) wrote on 2014-10-13: Related fix proposed to ceilometer (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/127909

Mike Spreitzer (mike-spreitzer) wrote on 2014-10-13:

Dina, the connection is this: the logic in _transition will set the alarm's state to "alarmed" as soon as there are "quorum" data points and the last is alarming. That logic I just outlined pays no attention to "evaluation periods".

Phil Neal (nealph) wrote on 2014-10-13:

Mike, I think if you consider the context of the _sufficient method, which evaluates the number of samples within the result set of _bound_duration, it follows that the quorum setting is applied only against the set that is within the evaluation period.

That is: given a sample set within the bounds of x evaluation periods, determine whether the number of samples meets the criteria of quorum = y, and if so proceed with evaluation.
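As a rough sketch of the point Phil makes, the quorum check can be thought of as counting only the datapoints that fall inside the evaluation window. The helper names below (`window_start`, `in_window`, `meets_quorum`) and the window formula (period times evaluation periods, with no look-back) are assumptions made for illustration; they do not reproduce `_bound_duration`.

```python
from datetime import datetime, timedelta


def window_start(now, period_seconds, evaluation_periods):
    """Start of a hypothetical evaluation window ending at `now`."""
    return now - timedelta(seconds=period_seconds * evaluation_periods)


def in_window(timestamps, now, period_seconds, evaluation_periods):
    """Keep only the timestamps that fall inside the evaluation window."""
    start = window_start(now, period_seconds, evaluation_periods)
    return [t for t in timestamps if start <= t <= now]


def meets_quorum(timestamps, now, period_seconds, evaluation_periods, quorum):
    """Apply the quorum only to datapoints inside the window."""
    bounded = in_window(timestamps, now, period_seconds, evaluation_periods)
    return len(bounded) >= quorum


now = datetime(2014, 10, 13, 12, 0, 0)
# Datapoints 50, 25 and 5 minutes old; the window is 3 x 600 s = 30 minutes.
stamps = [now - timedelta(minutes=m) for m in (50, 25, 5)]
```

With these values, two datapoints fall inside the window, so a quorum of 1 is met while a quorum of 3 is not.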

Mike Spreitzer (mike-spreitzer) wrote on 2014-10-14:

Yes, Phil, that is the problem. Currently we have fixed quorum=1, so the _transition method will be called as soon as _statistics(..) returns any data at all. The first time this happens, the alarm is in the unknown state before _transition is called, so _transition decides to set the alarm state to something definite --- based on exactly 1 datum from _statistics(..). Note that the outline I just gave pays no attention to the evaluation_periods setting.

ZhiQiang Fan (aji-zqfan) on 2014-10-30

Changed in ceilometer:
assignee:	nobody → ZhiQiang Fan (aji-zqfan)

OpenStack Infra (hudson-openstack) wrote on 2014-10-31: Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/132146

Changed in ceilometer:
status:	New → In Progress

Karolyn Chambers (chamberk) on 2015-01-13

tags:

added: juno-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2015-01-26: Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/132146
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=553d8d96e60cf354406568ed7dd4c563e768e4d0
Submitter: Jenkins
Branch: master

commit 553d8d96e60cf354406568ed7dd4c563e768e4d0
Author: ZhiQiang Fan <email address hidden>
Date: Fri Oct 31 03:33:34 2014 +0800

    Use alarm's evaluation periods in sufficient test

    Currently, we use the constant value quorum=1 to check whether there are
    enough datapoints; however, this is not quite right for an alarm rule.

    Imagine evaluation periods is set to, for example, 3 for an alarm on an
    instance's cpu_util being greater than or equal to 80%. Here are the
    cases where the current code may not work as expected:

    1. when the system starts or an instance has just been created, we may
       only get one or two samples for the instance
    2. when the system is broken somewhere, or an instance is restarted
       (after being shut off), samples may fail to be collected for some
       time, so we only get one or two samples in that time range

    We want to avoid a spurious data peak. For example, if instance cpu_util
    is 50%, 50%, 50%, 90%, the alarm will not be triggered, but if instance
    cpu_util is None, None, None, 90%, the current code decides that the
    alarm should be triggered, which is inconsistent and may confuse end
    users.

    This patch puts the alarm into insufficient data when there are fewer
    datapoints than evaluation periods.

    Change-Id: Ie64a537434493a5965c8e9e165cf028d57689da2
    Closes-Bug: #1380216
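The rule stated in the commit message can be sketched as follows. This is a minimal standalone illustration, not the merged patch: `sufficient` and `next_state` are hypothetical names, `None` stands for a period with no sample, and the current alarm state is deliberately left out, so mixed results are reported as "unchanged".

```python
def sufficient(datapoints, evaluation_periods):
    """True when at least evaluation_periods datapoints are present."""
    present = [d for d in datapoints if d is not None]
    return len(present) >= evaluation_periods


def next_state(datapoints, threshold, evaluation_periods):
    """Hypothetical evaluation using the stricter sufficiency rule."""
    if not sufficient(datapoints, evaluation_periods):
        return "insufficient data"
    present = [d for d in datapoints if d is not None]
    recent = present[-evaluation_periods:]
    breaches = [value >= threshold for value in recent]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    return "unchanged"  # mixed results: no unequivocal transition


# The two sequences from the commit message, with evaluation periods = 3.
steady_then_peak = next_state([50, 50, 50, 90], 80, 3)
gap_then_peak = next_state([None, None, None, 90], 80, 3)
```

Under these assumptions neither sequence raises the alarm: the first has mixed results in its last three datapoints, and the second has too few datapoints to evaluate at all.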

Changed in ceilometer:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2015-01-27: Fix proposed to ceilometer (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/150446

Thierry Carrez (ttx) on 2015-02-04

Changed in ceilometer:
milestone:	none → kilo-2
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-04-30

Changed in ceilometer:
milestone:	kilo-2 → 2015.1.0

OpenStack Infra (hudson-openstack) wrote on 2015-07-03: Fix merged to ceilometer (stable/juno)

Reviewed: https://review.openstack.org/150446
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=0639d0d62999f3d8d77d027ce612ebe2498cb1e3
Submitter: Jenkins
Branch: stable/juno

commit 0639d0d62999f3d8d77d027ce612ebe2498cb1e3
Author: ZhiQiang Fan <email address hidden>
Date: Fri Oct 31 03:33:34 2014 +0800

    Use alarm's evaluation periods in sufficient test

    Currently, we use the constant value quorum=1 to check whether there are
    enough datapoints; however, this is not quite right for an alarm rule.

    Imagine evaluation periods is set to, for example, 3 for an alarm on an
    instance's cpu_util being greater than or equal to 80%. Here are the
    cases where the current code may not work as expected:

    This patch puts the alarm into insufficient data when there are fewer
    datapoints than evaluation periods.

    Conflicts:
        ceilometer/alarm/evaluator/threshold.py

    NOTE(mriedem): The conflict is due to the oslo.i18n imports on master;
    oslo.i18n wasn't used in stable/juno, so the _LW usage is removed.

    Change-Id: Ie64a537434493a5965c8e9e165cf028d57689da2
    Closes-Bug: #1380216
    (cherry picked from commit 553d8d96e60cf354406568ed7dd4c563e768e4d0)

tags:

added: in-stable-juno
