[4.1.0.0-8] Alarms not getting raised after any of the contrail processes is stopped
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R4.1 | Incomplete | Low | Ankit Jain |
R5.0 | Incomplete | Medium | Ankit Jain |
Trunk | Incomplete | Medium | Ankit Jain |
Bug Description
All the test cases failed in one of the sanity setups.
Alarms were not being raised. When I stopped a process manually, I saw the same issue.
I am logging this bug to track the problem. Because it was seen on only one setup, it might not be reproducible.
Setup details:
Build : 4.1.0.0-8
CoreLocation : /cs-shared/
cores : {'10.204.216.65': [], '10.204.216.64': [], '10.204.216.150': [], '10.204.216.153': [], '10.204.217.115': [], '10.204.217.76': [], '10.204.217.114': []}
LogsLocation : http://
Report : http://
Topology :
DISTRO : "Ubuntu 14.04.5 LTS"
SKU : mitaka
Config Nodes : [u'nodec7', u'nodec8', u'nodec57']
Control Nodes : [u'nodec7', u'nodec8', u'nodec57']
Compute Nodes : [u'nodei1', u'nodei2', u'nodei3']
Openstack Node : [u'nodec7']
WebUI Node : [u'nodec7', u'nodec8', u'nodec57']
Analytics Nodes : [u'nodec7', u'nodec8', u'nodec57']
Database Nodes : [u'nodec7', u'nodec8', u'nodec57']
Physical Devices : [u'hooper', u"'hooper'"]
LB Nodes : [u'nodeg36']
The following errors were seen in the log file:
contrail-
12/02/2017 06:47:14 PM [contrail-
12/02/2017 06:47:14 PM [contrail-
12/02/2017 06:47:14 PM [contrail-
12/02/2017 06:47:14 PM [contrail-
12/02/2017 06:47:14 PM [contrail-
12/02/2017 06:47:14 PM [contrail-
('Error 111 connecting to 127.0.0.1:6381. Connection refused.',)
12/02/2017 06:47:14 PM [contrail-
12/02/2017 08:14:09 PM [contrail-
12/02/2017 08:25:42 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out after 40000 ms. Closing connection.
12/02/2017 08:31:25 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out after 40000 ms. Closing connection.
12/02/2017 08:32:01 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out after 40000 ms. Closing connection.
12/02/2017 08:32:05 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out after 40000 ms. Closing connection.
12/02/2017 08:32:05 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out after 40000 ms. Closing connection.
12/02/2017 08:35:43 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out after 40000 ms. Closing connection.
12/02/2017 08:42:12 PM [kafka.conn]: <BrokerConnection host=192.168.192.6 port=9092>: Error receiving 4-byte payload header - closing socket
Traceback (most recent call last):
File "/usr/lib/
self.
File "/usr/lib/
return sock.recv(*args)
error: [Errno 104] Connection reset by peer
12/02/2017 08:42:12 PM [kafka.
12/02/2017 08:42:12 PM [kafka.conn]: <BrokerConnection host=192.168.192.6 port=9092>: Error receiving 4-byte payload header - closing socket
Traceback (most recent call last):
File "/usr/lib/
self.
File "/usr/lib/
return sock.recv(*args)
error: [Errno 104] Connection reset by peer
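To see at a glance which brokers are affected, the kafka.conn lines above can be tallied per broker. The following is a minimal sketch, not part of the original report: the function name `tally_broker_errors` and the "timeout"/"other" classification are illustrative assumptions, and the regular expression only targets the line format quoted above.

```python
import re
from collections import Counter

# Matches kafka.conn lines of the form quoted in this report, e.g.
# "... [kafka.conn]: <BrokerConnection host=192.168.192.5 port=9092> timed out ..."
BROKER_RE = re.compile(
    r"\[kafka\.conn\]: <BrokerConnection host=(\S+) port=(\d+)>:? (.*)"
)


def tally_broker_errors(lines):
    """Count kafka.conn error lines per (host, port, kind).

    kind is "timeout" when the message contains "timed out",
    otherwise "other" (a hypothetical classification for illustration).
    """
    counts = Counter()
    for line in lines:
        match = BROKER_RE.search(line)
        if match is None:
            continue  # traceback lines and non-kafka lines are skipped
        host, port, message = match.group(1), int(match.group(2)), match.group(3)
        kind = "timeout" if "timed out" in message else "other"
        counts[(host, port, kind)] += 1
    return counts


# Two lines copied from the log excerpt above.
sample = [
    "12/02/2017 08:25:42 PM [kafka.conn]: <BrokerConnection host=192.168.192.5 "
    "port=9092> timed out after 40000 ms. Closing connection.",
    "12/02/2017 08:42:12 PM [kafka.conn]: <BrokerConnection host=192.168.192.6 "
    "port=9092>: Error receiving 4-byte payload header - closing socket",
]
for key, count in sorted(tally_broker_errors(sample).items()):
    print(key, count)
```

Feeding the full log file through the same function (one line per element) would show whether the failures cluster on one broker or are spread across all of them.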
http://
{
value: [
{
name: "nodec57",
value: {
AlarmgenPartition: {
__T: 1512225809352367,
inst_parts: [
{
instance: "0",
partitions: [
"3",
"5",
"10",
"11",
"15",
"16",
"21",
"26",
"27",
"28",
"29"
]
}
]
}
}
},
{
name: "nodec8",
value: {
AlarmgenPartition: {
__T: 1512225809352367,
inst_parts: [
{
instance: "0",
partitions: [
"0",
"1",
"4",
"6",
"7",
"8",
"17",
"18",
"19",
"20"
]
}
]
}
}
},
{
name: "nodec7",
value: {
AlarmgenPartition: {
__T: 1512233857795934,
inst_parts: [
{
instance: "0",
partitions: [
"2",
"9",
"12",
"13",
"14",
"22",
"23",
"24",
"25"
]
}
]
}
}
Full logs copied here:
Logs and JSON file copied at:
/cs-shared/
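One quick sanity check on the AlarmgenPartition output above is whether every partition is owned by exactly one node. The sketch below is an illustration only: `check_partitions` is a hypothetical helper, and the total of 30 partitions is an assumption inferred from the highest partition id (29) in the output, not something stated in this report.

```python
from collections import Counter


def check_partitions(owners, total):
    """Return (missing, duplicated) partition ids.

    owners maps a node name to the list of partition ids it reports
    (strings or ints); total is the assumed number of partitions.
    """
    seen = Counter(int(p) for parts in owners.values() for p in parts)
    missing = sorted(set(range(total)) - set(seen))
    duplicated = sorted(p for p, n in seen.items() if n > 1)
    return missing, duplicated


# Partition lists copied from the AlarmgenPartition output in this report.
owners = {
    "nodec57": ["3", "5", "10", "11", "15", "16", "21", "26", "27", "28", "29"],
    "nodec8": ["0", "1", "4", "6", "7", "8", "17", "18", "19", "20"],
    "nodec7": ["2", "9", "12", "13", "14", "22", "23", "24", "25"],
}

# total=30 is an assumption; adjust it to the configured partition count.
missing, duplicated = check_partitions(owners, total=30)
print("missing:", missing)
print("duplicated:", duplicated)
```

Under that assumption, both lists come out empty for the values reported here, which suggests that partition ownership itself was consistent across the three analytics nodes at the time of the query.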
Changed in juniperopenstack: | |
assignee: | Anish Mehta (amehta00) → Sundaresan Rajangam (srajanga) |
Changed in juniperopenstack: | |
status: | Incomplete → New |
assignee: | Ankit Jain (ankitja) → Sundaresan Rajangam (srajanga) |
From the logs, it is evident that contrail-alarm-gen was not able to connect to Kafka. Did you check whether there was a network connectivity issue?
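A basic TCP reachability check from the analytics node can help answer the connectivity question. This is a minimal sketch using only the Python standard library; the helper names `can_connect` and `check_endpoints` are hypothetical, and the endpoints listed are the ones that appear in the errors quoted in this bug (127.0.0.1:6381 and the two brokers on port 9092). A successful TCP connect only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to its reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


# Endpoints taken from the error messages in this report.
ENDPOINTS = [
    ("127.0.0.1", 6381),
    ("192.168.192.5", 9092),
    ("192.168.192.6", 9092),
]
```

Calling `check_endpoints(ENDPOINTS)` on the node where contrail-alarm-gen runs returns a dictionary such as `{("127.0.0.1", 6381): False, ...}`, which can be compared against the timestamps of the failures in the log.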
Next time you see the issue, please check whether the NodeStatus UVE gets updated properly after you stop the service.
Also, please check whether contrail-alarm-gen has all the alarm config objects:
http://<analytics-ip>:5995/Snh_AlarmConfigRequest?name=