Monasca

Frozen alarm state after updating alarm-definition (Python API)

Bug #1671565 reported by jobrs on 2017-03-09

This bug affects 1 person

Affects   Status        Importance   Assigned to   Milestone
Monasca   In Progress   Undecided    Unassigned

Bug Description

We noticed that after updating an alarm expression, the alarm state often freezes and no longer changes.

Example:

Given measurements of 1000.0, 1000.0, ... (one every minute, verified with the metric-stats command)

The alarm rule

avg(kafka.consumer_lag) > 0

fires shortly after the alarm-definition is created.

When the alarm-definition is then changed to

avg(kafka.consumer_lag) > 7000

the alarm state does not change. When the alarm is deleted, a new alarm object is created and remains in state UNDETERMINED.
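
For reference, the state change one would expect here can be sketched in a few lines of Python. This is only an illustration, not the thresholder's code: the function name `evaluate` is made up, and averaging over a plain list of samples is a simplification of how periods are actually handled.

```python
from statistics import mean


def evaluate(measurements, threshold):
    """Return the expected state for an `avg(metric) > threshold` rule.

    Hypothetical helper: no data gives UNDETERMINED, otherwise the
    average of the samples is compared against the threshold.
    """
    if not measurements:
        return "UNDETERMINED"
    return "ALARM" if mean(measurements) > threshold else "OK"


# Samples as described above: a constant 1000.0 every minute.
samples = [1000.0, 1000.0, 1000.0]

print(evaluate(samples, 0))     # rule before the update
print(evaluate(samples, 7000))  # rule after the update
```

With the measurements from the example, the first rule should put the alarm into ALARM and the updated rule should bring it back to OK, which is the transition that never happens.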

jobrs (joachim-barheine) wrote on 2017-03-09:

After looking at the event queue in Kafka, I have a first explanation: the alarm-definition-update messages are invalid. Instead of the actual metric dimensions, the metricDefinition reports a single-element dimension set, {'uname': 'notification'}.

It seems that a typo in the unit test caused an implementation error here. See the attached subalarm-upd-patch.txt.

Now the message on the bus is at least correct, and a few error entries have disappeared from the thresholder logs. It still does not have the desired effect, though: the alarm state does not change.
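
A check along the following lines could be used to spot such messages when reading them off the queue. It is a rough sketch under assumptions: only `metricDefinition` and its dimensions come from the messages described above, while the surrounding event layout (the `subAlarms` key, the sub-alarm ID `sub-1`), the function name, and the expected dimension `hostname` are invented for the example.

```python
def mismatched_sub_alarms(event, expected_dimensions):
    """Return the IDs of sub-alarms whose reported dimensions differ
    from the dimensions the metric is expected to carry."""
    bad = []
    for sub_id, sub in event["subAlarms"].items():
        reported = sub["metricDefinition"]["dimensions"]
        if reported != expected_dimensions:
            bad.append(sub_id)
    return bad


# Hypothetical event, shaped only for this example.
event = {
    "subAlarms": {
        "sub-1": {
            "metricDefinition": {
                "name": "kafka.consumer_lag",
                # The bogus dimension set seen on the bus.
                "dimensions": {"uname": "notification"},
            }
        }
    }
}

# Made-up stand-in for the metric's real dimensions.
expected = {"hostname": "kafka-1"}

print(mismatched_sub_alarms(event, expected))
```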

summary:

- Frozen alarm state after updating alarm-defintion (Python API)
+ Frozen alarm state after updating alarm-definition (Python API)

jobrs (joachim-barheine) wrote on 2017-03-10:

fixed Python API (3.9 KiB, text/plain)

jobrs (joachim-barheine) wrote on 2017-03-10:

Now I have also found the second cause and can provide a fix: the update-alarm-definition events contained "" instead of the ID of the changed sub-alarm. The thresholder thus could not find the sub-alarm to update.
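
To illustrate why an empty ID leaves the alarm frozen, here is a simplified sketch. It assumes the consumer keeps its sub-alarms in a mapping keyed by ID; the names `apply_update` and `sub-1` are hypothetical and not taken from the Monasca code.

```python
def apply_update(sub_alarms, sub_alarm_id, new_expression):
    """Replace the expression of the sub-alarm with the given ID.

    Returns True if the sub-alarm was found and updated, False if the
    ID is unknown, in which case nothing is changed.
    """
    if sub_alarm_id not in sub_alarms:
        return False
    sub_alarms[sub_alarm_id] = new_expression
    return True


sub_alarms = {"sub-1": "avg(kafka.consumer_lag) > 0"}

# Event as sent before the fix: the ID is an empty string, so the
# lookup fails and the old expression stays in place.
found = apply_update(sub_alarms, "", "avg(kafka.consumer_lag) > 7000")
print(found, sub_alarms["sub-1"])

# Event as sent after the fix: the ID matches and the update applies.
found = apply_update(sub_alarms, "sub-1", "avg(kafka.consumer_lag) > 7000")
print(found, sub_alarms["sub-1"])
```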