collector stays in initializing state (with ocata-5.0-314)

Bug #1800407 reported by vimal
Affects              Status        Importance  Assigned to  Milestone
Juniper Openstack    (status tracked in Trunk)
  R5.0               Fix Released  Critical    Abhay Joshi
  Trunk              Fix Released  Critical    Abhay Joshi

Bug Description

collector stays in initializing state (with ocata-5.0-314)

commands
-------------
[root@nodem14 ~]# contrail-status
Pod              Service         Original Name                          State    Status
                 redis           contrail-external-redis                running  Up 9 hours
analytics        alarm-gen       contrail-analytics-alarm-gen           running  Up 8 hours
analytics        api             contrail-analytics-api                 running  Up 8 hours
analytics        collector       contrail-analytics-collector           running  Up 8 hours
analytics        nodemgr         contrail-nodemgr                       running  Up 8 hours
analytics        query-engine    contrail-analytics-query-engine        running  Up 8 hours
analytics        snmp-collector  contrail-analytics-snmp-collector      running  Up 8 hours
analytics        topology        contrail-analytics-topology            running  Up 8 hours
config           api             contrail-controller-config-api         running  Up 4 hours
config           device-manager  contrail-controller-config-devicemgr   running  Up 9 hours
config           nodemgr         contrail-nodemgr                       running  Up 9 hours
config           schema          contrail-controller-config-schema      running  Up 9 hours
config           svc-monitor     contrail-controller-config-svcmonitor  running  Up 9 hours
config-database  cassandra       contrail-external-cassandra            running  Up 9 hours
config-database  nodemgr         contrail-nodemgr                       running  Up 9 hours
config-database  rabbitmq        contrail-external-rabbitmq             running  Up 9 hours
config-database  zookeeper       contrail-external-zookeeper            running  Up 9 hours
control          control         contrail-controller-control-control    running  Up 4 hours
control          dns             contrail-controller-control-dns        running  Up 9 hours
control          named           contrail-controller-control-named      running  Up 9 hours
control          nodemgr         contrail-nodemgr                       running  Up 9 hours
database         cassandra       contrail-external-cassandra            running  Up 9 hours
database         kafka           contrail-external-kafka                running  Up 9 hours
database         nodemgr         contrail-nodemgr                       running  Up 9 hours
database         zookeeper       contrail-external-zookeeper            running  Up 9 hours
webui            job             contrail-controller-webui-job          running  Up 9 hours
webui            web             contrail-controller-webui-web          running  Up 9 hours

WARNING: container with original name 'contrail-external-redis' have Pod or Service empty. Pod: '' / Service: 'redis'. Please pass NODE_TYPE with pod name to container's env

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail database ==
kafka: active
nodemgr: active
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: initializing (KafkaPub:10.204.216.103:9092,10.204.216.95:9092,10.204.216.96:9092 connection down)
topology: active

== Contrail webui ==
web: active
job: active

== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: active
api: active
schema: backup

[root@nodem14 ~]#

[root@nodem14 ~]# docker logs 6ffac3179269 | grep rror
[2018-10-28 16:49:26,962] INFO Opening socket connection to server 10.204.216.95/10.204.216.95:2182. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-10-28 16:49:38,363] WARN [Producer clientId=producer-1] Error while fetching metadata with correlation id 3 : {__confluent.support.metrics=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

[root@nodem14 ~]# docker ps | grep coll
0c43820a637f 10.204.217.152:5000/contrail-analytics-collector:ocata-5.0-314 "/entrypoint.sh /usr…" 9 hours ago Up 9 hours analytics_collector_1
8f39d66a1538 10.204.217.152:5000/contrail-analytics-snmp-collector:ocata-5.0-314 "/entrypoint.sh /usr…" 9 hours ago Up 9 hours analytics_snmp-collector_1
[root@nodem14 ~]# docker ps | grep kaf
6ffac3179269 10.204.217.152:5000/contrail-external-kafka:ocata-5.0-314 "/docker-entrypoint.…" 9 hours ago Up 9 hours analytics_database_kafka_1
[root@nodem14 ~]#
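
For triage, the per-service status lines in the contrail-status output can be scanned mechanically. The following is a minimal Python sketch, not part of contrail-status itself. The function names `unhealthy_services` and `kafka_brokers` are hypothetical, the line format is inferred only from the output pasted in this report, and treating `backup` as a healthy state is an assumption.

```python
import re

# Matches lines such as "collector: initializing (KafkaPub:... connection down)".
STATUS_LINE = re.compile(
    r"^(?P<service>[\w-]+):\s+(?P<state>\w+)(?:\s+\((?P<detail>.*)\))?\s*$"
)
# Matches section headers such as "== Contrail analytics ==".
SECTION_LINE = re.compile(r"^== Contrail (.+) ==$")
# Assumption: "backup" is the normal standby role and counts as healthy.
HEALTHY = {"active", "backup"}


def unhealthy_services(status_text):
    """Return (section, service, state, detail) for each service not in HEALTHY."""
    section = None
    problems = []
    for raw in status_text.splitlines():
        line = raw.strip()
        header = SECTION_LINE.match(line)
        if header:
            section = header.group(1)
            continue
        m = STATUS_LINE.match(line)
        # Lines before the first section header (table rows, warnings) are ignored.
        if m and section is not None and m.group("state") not in HEALTHY:
            problems.append(
                (section, m.group("service"), m.group("state"), m.group("detail"))
            )
    return problems


def kafka_brokers(detail):
    """Extract host:port entries from a 'KafkaPub:... connection down' detail."""
    m = re.match(r"^KafkaPub:(\S+) connection down$", detail or "")
    return m.group(1).split(",") if m else []


# Excerpt copied from the output in this report.
sample = """== Contrail analytics ==
nodemgr: active
collector: initializing (KafkaPub:10.204.216.103:9092,10.204.216.95:9092,10.204.216.96:9092 connection down)
topology: active
"""

for section, service, state, detail in unhealthy_services(sample):
    print(section, service, state, kafka_brokers(detail))
```

The broker list extracted this way could then be fed to a plain TCP reachability check from inside the collector container to separate network problems from client-library problems.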

logs
--------------
logs are in /cs-shared/bugs/1800407

build - ocata-5.0-314

topology
-----------
https://github.com/Juniper/contrail-tools/blob/master/yamls/new_regr_cluster.yaml

vimal (vappachan)
Changed in juniperopenstack:
assignee: nobody → Sundaresan Rajangam (srajanga)
description: updated
Revision history for this message
musharani (musharani) wrote :

This issue is seen in ocata-master-350 as well.

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: initializing (KafkaPub:10.204.216.64:9092,10.204.216.65:9092,10.204.216.153:9092 connection down)
topology: active

Abhay Joshi (abhayj) wrote :

Assigned to Santosh as Sundar is not here anymore.

manishkn (manishkn) wrote :

Is there any workaround for this bug?

Santosh Gupta (sangupta) wrote :

If I build the latest contrail-analytics collector binary in my contrail-dev-env container, copy it into the official 5.0-314 container, and restart the analytics-collector container, the collector is able to connect to the Kafka server.
This means that the Contrail container environment and the deployer steps are fine.
It also implies that the analytics-collector code itself doesn’t have any issue.
I have compared the ldd output of the official binary against that of my analytics-collector binary built in contrail-dev-env; they match.
In the last few days, there have been no merges in contrail-analytics, contrail-ansible-deployer, or contrail-container-builder that would affect analytics-collector.

This was reported in build 5.0-314. CI was fine on Friday, as per Vinay, so builds 5.0-313/312 should be fine.
From these observations, it looks like the issue is in the build environment used to build the analytics-collector binary. Could someone from the CI team look into this, especially any changes made to the build process or environment late last week?
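
An ldd comparison like the one described in this comment could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `parse_ldd` and `diff_ldd` are hypothetical, and the sample ldd lines are illustrative, not taken from the actual binaries. Load addresses are dropped before comparing because they can differ from run to run.

```python
import re

# One ldd line looks like "\tlibfoo.so.1 => /lib64/libfoo.so.1 (0x00007f...)".
# The "=> path" part and the address are both optional; "not found" is kept.
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)"
    r"(?:\s+=>\s+(?P<path>not found|\S+))?"
    r"(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def parse_ldd(output):
    """Map each library name to its resolved path (or None), ignoring addresses."""
    libs = {}
    for line in output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            libs[m.group("name")] = m.group("path")
    return libs


def diff_ldd(first, second):
    """Return sorted library names whose presence or resolved path differs."""
    a, b = parse_ldd(first), parse_ldd(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Hypothetical ldd outputs that differ only in load addresses.
official = "\tlibrdkafka.so.1 => /lib64/librdkafka.so.1 (0x00007f1a2b3c4000)\n"
dev_env = "\tlibrdkafka.so.1 => /lib64/librdkafka.so.1 (0x00007f9988776000)\n"
print(diff_ldd(official, dev_env))
```

Note that ldd reports shared-object names and resolved paths, not package versions, so two binaries can have matching ldd output even when the packages providing those libraries differ in version.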

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/47380
Submitter: Vinay Vithal Mahuli (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/47381
Submitter: Michal Krawczyk (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/47384
Submitter: Michal Krawczyk (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/47381
Committed: http://github.com/Juniper/contrail-packages/commit/fcd045170844187a750a52d7c3c3d6a2b84125c0
Submitter: opencontrail-admin (<email address hidden>)
Branch: master

commit fcd045170844187a750a52d7c3c3d6a2b84125c0
Author: Michal Krawczyk <email address hidden>
Date: Wed Oct 31 08:52:48 2018 +0100

Make sure librdkafka is installed from tpc.

librdkafka package was upgraded to a newer version
in epel repository which made the newest version be
picked up by the container building jobs.

This makes sure the package from opencontrail-tpc repos
is picked which is in version 0.11.4.

Partial-Bug: #1800407
Change-Id: I74ed28a9bda8e9b717ecdfb6a6150944bd2fa6be

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/47384
Committed: http://github.com/Juniper/contrail-packages/commit/e8b457e05f81cfd9b43fda2f80cec8e49d17e954
Submitter: opencontrail-admin (<email address hidden>)
Branch: R5.0

commit e8b457e05f81cfd9b43fda2f80cec8e49d17e954
Author: Michal Krawczyk <email address hidden>
Date: Wed Oct 31 08:52:48 2018 +0100

Make sure librdkafka is installed from tpc.

librdkafka package was upgraded to a newer version
in epel repository which made the newest version be
picked up by the container building jobs.

This makes sure the package from opencontrail-tpc repos
is picked which is in version 0.11.4.

Partial-Bug: #1800407
Change-Id: I74ed28a9bda8e9b717ecdfb6a6150944bd2fa6be
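
A build job could guard against this kind of silent upgrade by checking the installed package version against the pinned one. The sketch below is an illustration only, not what the merged change does: the helper names are hypothetical, it assumes an RPM-based image with Python 3.7 or later, and it assumes the version string is purely numeric and dot-separated. The pinned value 0.11.4 and the package name librdkafka come from the commit message.

```python
import subprocess

PINNED_VERSION = "0.11.4"  # version named in the commit message


def version_tuple(version):
    """Turn '0.11.4' into (0, 11, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def matches_pin(installed, pinned=PINNED_VERSION):
    """True when the installed version equals the pinned version."""
    return version_tuple(installed) == version_tuple(pinned)


def installed_rpm_version(package="librdkafka"):
    """Ask rpm for the installed version of a package (RPM-based systems only)."""
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}", package],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the package is not installed
    )
    return result.stdout.strip()


def check_pin(package="librdkafka", pinned=PINNED_VERSION):
    """Raise RuntimeError if the installed package does not match the pin."""
    installed = installed_rpm_version(package)
    if not matches_pin(installed, pinned):
        raise RuntimeError(
            f"{package} is {installed}, expected {pinned}; check repository priorities"
        )
```

Comparing integer tuples rather than raw strings avoids mis-ordering versions such as 0.9.5 and 0.11.4, which a plain string comparison would sort the wrong way.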
