Contrail :: R4.1 16.04 27 newton :: at times zookeeper fails to come up on port 2182 in analyticsdb.

Bug #1727186 reported by Ritam Gangopadhyay
This bug affects 1 person
Affects             Status      Importance  Assigned to         Milestone
Juniper Openstack   (status tracked in Trunk)
  R4.1              Incomplete  High        Ritam Gangopadhyay
  R5.0              Incomplete  High        Ritam Gangopadhyay
  Trunk             Incomplete  High        Ritam Gangopadhyay

Bug Description

Once in a while, the analytics services stay in the initializing state because the Kafka connection is down.

root@nodec28:~# docker exec -it analytics contrail-status
== Contrail Analytics ==
contrail-collector: initializing (KafkaPub:192.168.100.11:9092 connection down)
contrail-analytics-api: initializing (UvePartitions:UVE-Aggregation[Partitions:0] connection down)

On investigation, zookeeper on the controller is up on port 2181, but zookeeper on analyticsdb, to which Kafka connects, is not up on port 2182.

After restarting zookeeper on analyticsdb, it was able to acquire port 2182 and the Kafka issue was no longer seen.

root@nodec28:~# docker exec -it analytics contrail-status
== Contrail Analytics ==
contrail-collector: active
contrail-analytics-api: active
contrail-query-engine: active
contrail-alarm-gen: active
contrail-snmp-collector: active
contrail-topology: active
contrail-analytics-nodemgr: active
root@nodec28:~#
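
The port observation above can be reproduced with a small TCP probe. This is a minimal sketch, not part of the original report: the function name `port_is_open` is hypothetical, and the address 192.168.100.11 is assumed from the Kafka and zookeeper log lines in this report.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("192.168.100.11", 2182)` would be expected to return False while the issue is present and True after zookeeper on analyticsdb is restarted. A successful connect only shows that something is listening on the port, not that zookeeper is healthy.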

*************************************************************************************
CONTRAIL ANALYTICS API LOGS
*************************************************************************************

2017-10-25 Wed 03:55:34:858.501 IST nodec28 [Thread 140144673298176, Pid 3355]: Failed to acquire metadata: Local: Broker transport failure
2017-10-25 Wed 03:55:39:859.149 IST nodec28 [Thread 140144669099776, Pid 3355]: No Kafka Callbacks
2017-10-25 Wed 03:55:39:859.217 IST nodec28 [Thread 140144669099776, Pid 3355]: Kafka Needs Restart
2017-10-25 Wed 03:55:44:859.129 IST nodec28 [Thread 140144669099776, Pid 3355]: Failed to acquire metadata: Local: Broker transport failure
2017-10-25 Wed 03:55:49:859.675 IST nodec28 [Thread 140144673298176, Pid 3355]: No Kafka Callbacks
2017-10-25 Wed 03:55:49:859.741 IST nodec28 [Thread 140144673298176, Pid 3355]: Kafka Needs Restart

*************************************************************************************
KAFKA LOGS
*************************************************************************************

[2017-10-25 11:00:04,715] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
        at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1223)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:155)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:129)
        at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:89)
        at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
        at kafka.server.KafkaServer.initZk(KafkaServer.scala:278)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:168)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
        at kafka.Kafka$.main(Kafka.scala:67)
        at kafka.Kafka.main(Kafka.scala)
[2017-10-25 11:00:04,716] INFO shutting down (kafka.server.KafkaServer)
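
The trace above shows Kafka giving up after its 6000 ms zookeeper connection timeout. One way to confirm whether zookeeper is answering before restarting Kafka is to send it the `ruok` four-letter command, to which a healthy server replies `imok`. The sketch below is an illustration under assumptions: the helper names `zk_is_ok` and `wait_for_zk` are hypothetical, the retry counts are arbitrary, and on newer zookeeper releases four-letter commands may need to be whitelisted in the server configuration.

```python
import socket
import time


def zk_is_ok(host, port, timeout=2.0):
    """Send 'ruok' to zookeeper and return True only if it answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        return False
    return reply == b"imok"


def wait_for_zk(host, port, attempts=10, delay=3.0, probe=zk_is_ok):
    """Poll zookeeper up to `attempts` times; return True once it answers."""
    for _ in range(attempts):
        if probe(host, port):
            return True
        time.sleep(delay)
    return False
```

Against the setup in this report, the call would be `wait_for_zk("192.168.100.11", 2182)`, with the address taken from the logs.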

*************************************************************************************
ZOOKEEPER LOGS ON ANALYTICSDB
*************************************************************************************

2017-10-25 00:44:23,339 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.IOException: Unable to create snap directory /var/lib/zookeeper/version-2
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:91)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:104)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-10-25 11:29:00,197 - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
2017-10-25 11:29:00,208 - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 192.168.100.11 to address: /192.168.100.11
2017-10-25 11:29:00,208 - ERROR [main:QuorumPeerConfig@280] - Invalid configuration, only one server specified (ignoring)
2017-10-25 11:29:00,209 - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3

Tags: analytics
Santosh Gupta (sangupta) wrote :

1. Is this an upgrade case or fresh install?
2. All-in-one node, multi-node or HA case?
3. Is it seen on ubuntu14.04 or other openstack releases?
4. Does it happen only on initial installation or have you seen it on reboot too?
5. Once you see the system is in good state, does it go back to bad state on its own?
6. After you restart zookeeper and system is in good state, does it go back to bad state on its own?
7. Please provide the combined.json files used for provisioning.
8. The issue you reported might be due to incorrect permission/ownership of dataDir=/var/lib/zookeeper although I don’t have an answer for why it works fine when you restart zookeeper manually.
9. If you hit this issue again, please leave the box in that state and send login credentials.

Please provide the details below when you hit the issue.
root@sangupta-u14d5(analyticsdb):/etc/zookeeper/conf_example# cat /etc/zookeeper/conf_example/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
dataLogDir=/var/log/zookeeper
clientPort=2182
initLimit=10
syncLimit=5
maxSessionTimeout=120000

autopurge.purgeInterval=3
autopurge.snapRetainCount=3

server.1=192.168.0.44:2889:3889
root@sangupta-u14d5(analyticsdb):/# ls -asl /var/lib/zookeeper/
total 12
4 drwxr-xr-x 3 zookeeper zookeeper 4096 Oct 30 17:55 .
4 drwxr-xr-x 32 root root 4096 Oct 30 03:40 ..
0 lrwxrwxrwx 1 root root 25 Oct 30 17:55 myid -> /etc/zookeeper/conf//myid
4 drwxr-xr-x 2 zookeeper zookeeper 4096 Nov 1 17:25 version-2
root@sangupta-u14d5(analyticsdb):/# ls -asl /var/lib/zookeeper/version-2/
total 344
  4 drwxr-xr-x 2 zookeeper zookeeper 4096 Nov 1 17:25 .
  4 drwxr-xr-x 3 zookeeper zookeeper 4096 Oct 30 17:55 ..
  4 -rw-r--r-- 1 zookeeper zookeeper 296 Oct 30 17:55 snapshot.0
332 -rw-r--r-- 1 zookeeper zookeeper 336862 Nov 1 17:25 snapshot.3477
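
The same details can be collected in one step. The following is a rough sketch, assuming the zoo.cfg format shown above; the helper names `parse_zoo_cfg` and `describe_dir` are hypothetical and not part of Contrail or zookeeper. It reads `dataDir` from the config text and reports the owner and mode of a directory, which is what the "Unable to create snap directory" error would depend on if the permission theory in point 8 holds.

```python
import os
import pwd
import stat


def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, skipping blanks and comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def describe_dir(path):
    """Return existence, owner, mode and writability of a directory."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return {"path": path, "exists": False, "error": str(exc)}
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry
    return {
        "path": path,
        "exists": True,
        "is_dir": stat.S_ISDIR(st.st_mode),
        "owner": owner,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        # os.access checks the user running this script, so run it as
        # the zookeeper user for a meaningful answer.
        "writable": os.access(path, os.W_OK),
    }
```

Typical use would be to read `/etc/zookeeper/conf/zoo.cfg`, pass its contents to `parse_zoo_cfg`, and call `describe_dir` on the resulting `dataDir` value.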

Bassem Aly (babdelmageed) wrote :

I also have the same issue after upgrading from 4.1.12 to 4.1.22. The only solution, as mentioned by the author, is to start zookeeper manually.
