Coordination for aodh-evaluator is broken
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Aodh | Triaged | Undecided | Mehdi Abaakouk | | |
Bug Description
With two instances of aodh-evaluator running, strange behavior is observed in the logs:
- some alarms might be skipped by both evaluators
- some alarms might be evaluated by both evaluators
- sometimes all alarms are skipped by both evaluators
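The symptoms listed above are what one would expect if the evaluators do not share a correct view of the group membership. The following is only a minimal sketch of that idea, assuming a simple hash-modulo partitioning scheme; it is not Aodh's or tooz's actual implementation, and the names `owner`, `evaluators_for`, `coverage`, the member names "A", "B", "C" and the alarm ids are all hypothetical.

```python
import hashlib


def owner(alarm_id, members):
    """Return the member that a given membership list maps alarm_id to."""
    ordered = sorted(members)
    digest = hashlib.md5(alarm_id.encode("utf-8")).hexdigest()
    return ordered[int(digest, 16) % len(ordered)]


def evaluators_for(alarm_id, views):
    """List the running members that consider alarm_id their own.

    views maps each running member to the membership list it currently sees.
    """
    return [me for me, seen in views.items() if owner(alarm_id, seen) == me]


def coverage(alarm_ids, views):
    """Map each alarm to the number of running members that would evaluate it."""
    return {a: len(evaluators_for(a, views)) for a in alarm_ids}


ALARMS = ["alarm-%d" % i for i in range(20)]  # hypothetical alarm ids

# Healthy: both running evaluators see the same two members.
healthy = {"A": ["A", "B"], "B": ["A", "B"]}
# Split view: each evaluator only sees itself, so both evaluate everything.
split = {"A": ["A"], "B": ["B"]}
# Stale member: a stopped evaluator "C" never left the group,
# so alarms mapped to "C" are evaluated by nobody.
stale = {"A": ["A", "B", "C"], "B": ["A", "B", "C"]}
# Only a stale id is visible to the running evaluators: nothing is evaluated.
only_stale = {"A": ["C"], "B": ["C"]}

print(set(coverage(ALARMS, healthy).values()))     # {1}
print(set(coverage(ALARMS, split).values()))       # {2}
print(sorted(set(coverage(ALARMS, stale).values())))
print(set(coverage(ALARMS, only_stale).values()))  # {0}
```

In this model, an alarm with a count of 0 is skipped by every evaluator and an alarm with a count of 2 is evaluated twice, which matches the three symptoms in shape. Whether the real evaluators end up with views like these is exactly what the Redis data would need to confirm.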
Coordination backend: Redis.
Each evaluator works as expected when it is the only instance running.
The two evaluators run on different nodes.
When both are running, Redis contains the following data:
redis:6379> KEYS *
1) "_tooz_
2) "_tooz_
3) "_tooz_
4) "_tooz_groups"
5) "_tooz_
6) "_tooz_
redis:6379> HKEYS "_tooz_
1) "__created__"
2) "58787292-
3) "a4b351ca-
But when both are stopped, the ID of the last stopped evaluator still exists:
redis:6379> KEYS *
1) "_tooz_
2) "_tooz_
3) "_tooz_groups"
4) "_tooz_
redis:6379> HKEYS "_tooz_
1) "__created__"
2) "a4b351ca-
By the way, messages like "Joined partitioning group alarm_evaluator" are logged to evaluator.log,
but messages about leaving the group are missing.
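For comparison, here is a minimal sketch of the behavior I would expect on shutdown: the member joins the group on start and always leaves it on exit, so no stale ID stays behind. It uses an in-memory stand-in instead of the real coordination backend; `FakeGroupRegistry`, `group_membership` and the member ID "evaluator-1" are hypothetical names for illustration only, not Aodh or tooz code.

```python
import contextlib


class FakeGroupRegistry:
    """In-memory stand-in for the coordination backend (hypothetical)."""

    def __init__(self):
        self.members = set()

    def join(self, member_id):
        self.members.add(member_id)

    def leave(self, member_id):
        self.members.discard(member_id)


@contextlib.contextmanager
def group_membership(registry, member_id):
    """Join on entry and always leave on exit, even if the body raises."""
    registry.join(member_id)
    try:
        yield
    finally:
        registry.leave(member_id)


registry = FakeGroupRegistry()
try:
    with group_membership(registry, "evaluator-1"):
        # The member is registered while the evaluator is running.
        assert registry.members == {"evaluator-1"}
        raise RuntimeError("simulated evaluator failure")
except RuntimeError:
    pass

print(registry.members)  # set(): no stale member id left behind
```

This only covers an orderly stop or an exception; a process that is killed outright never runs the `finally` block, so the backend would still need some other way to expire dead members.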
Packages:
openstack-
openstack-
openstack-
openstack-
openstack-
openstack-
openstack-
python-
python-
python-
redis-2.
Regards,
Yurii
summary: |
- Coordination for aodh-evaluator is not broken + Coordination for aodh-evaluator is broken |
Changed in aodh: | |
status: | New → Triaged |
Changed in aodh: | |
assignee: | nobody → Zi Lian Ji (jizilian) |
Changed in aodh: | |
assignee: | Zi Lian Ji (jizilian) → nobody |
assignee: | nobody → Mehdi Abaakouk (sileht) |