Build 2665: Alarms are not shown when one of the collectors is down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
Trunk | Fix Committed | High | Anish Mehta |
Bug Description
The system has 3 collectors. When one of them goes down, the alarms that were shown earlier are no longer shown.
Some of the analytics services are also down on one node, and no alarms are shown for those either.
http://
{ }
http://
{ }
http://
{ }
contrail-alarm-gen logs:
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ail-control:0', 'ObjectGenerato
VNTable:
ObjectVNTable:
lytics-api:0', u'ObjectBgpPeer
:nodeh1', 'ObjectGenerato
1:default-
eerInfo:
ult-domain:
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
ERROR:contrail-
:Control:
:0', u'ObjectXmppPee
fig:contrail-
gr:0', 'ObjectXmppPeer
pute:contrail-
rail-discovery:0', u'ObjectXmppPee
ERROR:contrail-
ERROR:contrail-
Pasting the contrail-status output from all 3 collector nodes:
root@nodeg32:~# contrail-status
== Contrail Analytics ==
supervisor-
contrail-alarm-gen active
contrail-
contrail-
contrail-collector active
contrail-
contrail-
contrail-topology active
== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-
contrail-
contrail-
contrail-schema active
contrail-
ifmap active
== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-
== Contrail Database ==
contrail-database: active
supervisor-
contrail-
kafka active
== Contrail Support Services ==
supervisor-
rabbitmq-server active
root@nodeh1:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-
contrail-alarm-gen initializing (Collector connection down)
contrail-
contrail-
contrail-collector inactive
contrail-
contrail-
contrail-topology active
== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-
contrail-
contrail-
contrail-schema backup
contrail-
ifmap active
== Contrail Database ==
contrail-database: active
supervisor-
contrail-
kafka active
== Contrail Support Services ==
supervisor-
rabbitmq-server active
root@nodeh2:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-
contrail-alarm-gen active
contrail-
contrail-
contrail-collector active
contrail-
contrail-
contrail-topology active
== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-
contrail-
contrail-
contrail-schema backup
contrail-
ifmap active
== Contrail Database ==
contrail-database: active
supervisor-
contrail-
kafka active
== Contrail Support Services ==
supervisor-
rabbitmq-server active
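Comparing the three outputs by eye is error-prone, so here is a minimal sketch that scans contrail-status text and lists the services that are not healthy, grouped by section. The function name `non_active_services` is hypothetical, truncated lines (such as `contrail-`) are simply skipped, and treating `backup` as healthy is an assumption based on `contrail-schema` showing `backup` on nodeh1 and nodeh2 while the rest of those nodes look normal.

```python
import re
from typing import Dict, List, Tuple

# Matches "<service>[:] <state>[ (<reason>)]", for example
# "contrail-alarm-gen initializing (Collector connection down)"
# or "supervisor-config: active".
_LINE_RE = re.compile(
    r"^(?P<service>[\w.:-]+?):?\s+(?P<state>\w+)(?:\s+\((?P<reason>.*)\))?\s*$"
)


def non_active_services(status_text: str) -> Dict[str, List[Tuple[str, str, str]]]:
    """Return {section: [(service, state, reason), ...]} for unhealthy services.

    Assumption: 'backup' counts as healthy (standby instance).
    """
    healthy = {"active", "backup"}
    section = "(none)"
    result: Dict[str, List[Tuple[str, str, str]]] = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        # Section headers look like "== Contrail Analytics ==".
        if line.startswith("==") and line.endswith("=="):
            section = line.strip("= ").strip()
            continue
        m = _LINE_RE.match(line)
        if not m:
            continue  # prompt line, truncated line, or free text
        state = m.group("state")
        if state not in healthy:
            result.setdefault(section, []).append(
                (m.group("service"), state, m.group("reason") or "")
            )
    return result
```

Run against the nodeh1 output, it should flag only `contrail-alarm-gen` (initializing, collector connection down) and `contrail-collector` (inactive) under Contrail Analytics.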
Setup:
'all': [host1, host2, host3,host4, host5,host6, host7],
'cfgm': [host1,
'webui': [host1],
'openstack': [host1],
'control': [host2, host3],
'collector': [host1, host2, host3],
'database': [host1, host2, host3],
'compute': [host4, host5,host6, host7],
'all': ['nodeg32', 'nodeh1', 'nodeh2', 'nodeh6', 'nodei4', 'nodei5', 'nodeh7']
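To see which physical node carries each role, the two `'all'` lists can be paired up. This sketch assumes they are positionally aligned (hostN maps to the Nth hostname). That assumption is consistent with the status output above, where nodeg32 shows a Web UI section but no Control section, while nodeh1 and nodeh2 both run Control. The truncated `'cfgm'` list is left out, and `nodes_for` is a hypothetical helper.

```python
# Role lists copied from the setup above ('cfgm' omitted: truncated in the report).
all_hosts = ["host1", "host2", "host3", "host4", "host5", "host6", "host7"]
hostnames = ["nodeg32", "nodeh1", "nodeh2", "nodeh6", "nodei4", "nodei5", "nodeh7"]
roles = {
    "webui": ["host1"],
    "openstack": ["host1"],
    "control": ["host2", "host3"],
    "collector": ["host1", "host2", "host3"],
    "database": ["host1", "host2", "host3"],
    "compute": ["host4", "host5", "host6", "host7"],
}

# Assumption: hostN corresponds to the Nth entry of the hostname list.
host_to_name = dict(zip(all_hosts, hostnames))


def nodes_for(role: str) -> list:
    """Return the hostnames that carry the given role."""
    return [host_to_name[h] for h in roles[role]]


print(nodes_for("collector"))
```

Under that assumption, the three collectors are nodeg32, nodeh1 and nodeh2, which matches the three contrail-status outputs pasted above; nodeh1 is the node whose collector is down.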
Changed in juniperopenstack: importance: Undecided → High
Ankit, I need more information:
- Which alarms did you expect, and on which UVEs? alarm-gens:
  10.84.13.40:5995/Snh_PartitionStatusReq?partition=-1
  10.84.13.40:5995/Snh_UVETableInfoReq?partition=-1
  10.84.13.40:5995/Snh_UVETableAlarmReq?table=all
- Contents of the analytics-node UVEs (I need to know which partitions live on which analytics node).
- Logs for all 3 contrail-alarm-gen instances.
- Introspect output for all 3 contrail-
http://
http://
http://
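Gathering the same three introspect pages from every alarm-gen can be scripted. This is a minimal sketch: the port and request paths are the ones quoted in the first item above, `10.84.13.40` is the only alarm-gen address given in the report, and the function names are hypothetical. It uses only the standard library and records an error string instead of failing when a node (such as the one with the collector down) does not answer.

```python
import urllib.request
from typing import Dict, List

ALARMGEN_PORT = 5995  # introspect port used in the requests above
REQUESTS = [
    "Snh_PartitionStatusReq?partition=-1",
    "Snh_UVETableInfoReq?partition=-1",
    "Snh_UVETableAlarmReq?table=all",
]


def introspect_urls(hosts: List[str], port: int = ALARMGEN_PORT) -> Dict[str, List[str]]:
    """Build the introspect URLs to collect from each alarm-gen host."""
    return {h: [f"http://{h}:{port}/{req}" for req in REQUESTS] for h in hosts}


def fetch_all(hosts: List[str], timeout: float = 5.0) -> Dict[str, str]:
    """Fetch every URL; store the error text for nodes that do not respond."""
    out: Dict[str, str] = {}
    for urls in introspect_urls(hosts).values():
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    out[url] = resp.read().decode("utf-8", errors="replace")
            except OSError as exc:  # URLError and socket timeouts subclass OSError
                out[url] = f"ERROR: {exc}"
    return out
```

The other two alarm-gen addresses are not in the report, so they would have to be filled in by whoever runs this, for example `fetch_all(["10.84.13.40", ...])`.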
Of course, it would be best if you could reproduce the issue and leave the system in the bad state.