Coordination for aodh-evaluator is broken

Bug #1514376 reported by Yurii Prokulevych
This bug affects 1 person
Affects  Status   Importance  Assigned to     Milestone
Aodh     Triaged  Undecided   Mehdi Abaakouk

Bug Description

With 2 instances of aodh-evaluator running, strange behavior is observed in the logs:
- some alarms might be skipped by both evaluators
- some alarms might be evaluated by both evaluators
- all alarms are skipped by both evaluators

Coordination backend: Redis.
Each evaluator works as expected when it is the only instance running.
The two evaluators run on different nodes.
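
To make the symptoms above concrete, here is a minimal sketch of hash-based alarm partitioning. It is a simplified model, not aodh's actual partitioning code: the names `owner` and `evaluated_by` and the modulo-hash scheme are assumptions made for illustration. It shows that if the evaluators disagree about group membership, or if a stale member ID stays in the group, alarms can end up evaluated twice or not at all.

```python
import hashlib


def owner(alarm_id, members):
    """Pick the member responsible for alarm_id (hypothetical modulo hashing)."""
    ordered = sorted(members)
    digest = hashlib.md5(alarm_id.encode("utf-8")).hexdigest()
    return ordered[int(digest, 16) % len(ordered)]


def evaluated_by(alarm_id, views):
    """Return the evaluators that claim alarm_id.

    views maps each evaluator ID to the member list that evaluator
    believes is in the partitioning group.
    """
    return sorted(me for me, members in views.items()
                  if owner(alarm_id, members) == me)


if __name__ == "__main__":
    alarms = ["alarm-%d" % i for i in range(10)]

    # Consistent views: every alarm has exactly one evaluator.
    consistent = {"A": ["A", "B"], "B": ["A", "B"]}
    # Each evaluator only sees itself: every alarm is evaluated twice.
    disjoint = {"A": ["A"], "B": ["B"]}
    # A dead member is still listed: its share of alarms is skipped.
    with_stale = {"A": ["A", "B", "dead"], "B": ["A", "B", "dead"]}

    for name, views in [("consistent", consistent),
                        ("disjoint", disjoint),
                        ("with_stale", with_stale)]:
        print(name, {a: evaluated_by(a, views) for a in alarms})
```

Under this model, the "skipped by both" and "evaluated by both" cases both come from the group membership being wrong, not from the evaluation logic itself.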

When both are running, Redis contains the following data:
redis:6379> KEYS *
1) "_tooz_group:central-global"
2) "_tooz_group:central--4615474070299773071"
3) "_tooz_beats:58787292-f5b7-47f1-8db6-985b83d9c3bc"
4) "_tooz_groups"
5) "_tooz_beats:a4b351ca-def2-474e-869b-748d4c37dd84"
6) "_tooz_group:alarm_evaluator"
redis:6379> HKEYS "_tooz_group:alarm_evaluator"
1) "__created__"
2) "58787292-f5b7-47f1-8db6-985b83d9c3bc"
3) "a4b351ca-def2-474e-869b-748d4c37dd84"

But when both are stopped, the ID of the last stopped evaluator still exists:
redis:6379> KEYS *
1) "_tooz_group:central-global"
2) "_tooz_group:central--4615474070299773071"
3) "_tooz_groups"
4) "_tooz_group:alarm_evaluator"
redis:6379> HKEYS "_tooz_group:alarm_evaluator"
1) "__created__"
2) "a4b351ca-def2-474e-869b-748d4c37dd84"
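
The check done by hand above can be sketched as a small helper. It assumes, based only on the listings in this report, that every live member has a `_tooz_beats:<member id>` key and that a group member without one is stale; `stale_members` is a hypothetical name, and the inputs are plain lists of strings as printed by `KEYS *` and `HKEYS`.

```python
BEAT_PREFIX = "_tooz_beats:"


def stale_members(all_keys, group_fields):
    """Return group members that have no matching heartbeat key.

    all_keys: key names as returned by KEYS *
    group_fields: field names as returned by HKEYS on the group hash
    """
    alive = {key[len(BEAT_PREFIX):] for key in all_keys
             if key.startswith(BEAT_PREFIX)}
    # "__created__" is a marker field in the group hash, not a member ID.
    return sorted(member for member in group_fields
                  if member != "__created__" and member not in alive)


if __name__ == "__main__":
    # Data copied from the second listing (both evaluators stopped).
    keys = [
        "_tooz_group:central-global",
        "_tooz_group:central--4615474070299773071",
        "_tooz_groups",
        "_tooz_group:alarm_evaluator",
    ]
    fields = ["__created__", "a4b351ca-def2-474e-869b-748d4c37dd84"]
    print(stale_members(keys, fields))
    # prints ['a4b351ca-def2-474e-869b-748d4c37dd84']
```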

Btw, messages like "Joined partitioning group alarm_evaluator" are logged to evaluator.log,
but there are no messages about leaving the group.
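
For illustration, the join/leave pairing that the missing log messages point to could look like the sketch below. This is not the fix applied in aodh. The `coordinator` argument is assumed to behave like a tooz coordinator (`start`, `join_group`, `leave_group`, `stop`, with the group calls returning an object that has `get()`); `run_evaluator` and `work` are hypothetical names.

```python
def run_evaluator(coordinator, group_id, work):
    """Join group_id, run work(), and always leave the group afterwards.

    coordinator is assumed to expose the tooz coordinator methods
    start(), join_group(), leave_group() and stop().
    """
    coordinator.start()
    try:
        # join_group() is asynchronous; get() waits for it to complete.
        coordinator.join_group(group_id).get()
        try:
            return work()
        finally:
            # Leave explicitly so the member ID does not linger in the
            # group after this process exits.
            coordinator.leave_group(group_id).get()
    finally:
        coordinator.stop()
```

With this shape, the member is removed from the group even when `work()` raises, so a clean shutdown should not leave an ID behind in `_tooz_group:alarm_evaluator`.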

Packages:
openstack-aodh-api-1.0.1-dev38.el7.centos.noarch
openstack-aodh-common-1.0.1-dev38.el7.centos.noarch
openstack-aodh-compat-1.0.1-dev38.el7.centos.noarch
openstack-aodh-evaluator-1.0.1-dev38.el7.centos.noarch
openstack-aodh-expirer-1.0.1-dev38.el7.centos.noarch
openstack-aodh-listener-1.0.1-dev38.el7.centos.noarch
openstack-aodh-notifier-1.0.1-dev38.el7.centos.noarch
python-aodh-1.0.1-dev38.el7.centos.noarch
python-tooz-1.23.0-1.el7.noarch
python-redis-2.10.3-1.el7.noarch
redis-2.8.21-1.el7.x86_64

Regards,
Yurii

Yurii Prokulevych (yprokule) wrote :
summary: - Coordination for aodh-evaluator is not broken
+ Coordination for aodh-evaluator is broken
Julien Danjou (jdanjou)
Changed in aodh:
status: New → Triaged
Zi Lian Ji (jizilian)
Changed in aodh:
assignee: nobody → Zi Lian Ji (jizilian)
Mehdi Abaakouk (sileht)
Changed in aodh:
assignee: Zi Lian Ji (jizilian) → nobody
assignee: nobody → Mehdi Abaakouk (sileht)