[R4.1]: Analytics down on fresh provisioning due to zookeeper not coming up

Bug #1732406 reported by Pulkit Tandon
This bug affects 1 person
Affects              Status          Importance  Assigned to    Milestone
Juniper Openstack    Status tracked in Trunk
  R4.0               Fix Committed   High        Santosh Gupta
  R4.1               Fix Committed   High        Santosh Gupta
  Trunk              Fix Committed   High        Santosh Gupta

Bug Description

Build : R4.1 - 48
CB Build

Setup:
K8s contrail-ansible setup
3 node setup

1 node with the following roles: k8s master, controller, analytics, analyticsdb
2 nodes as k8s slaves and agent

Summary:
Zookeeper inside analyticsdb tries to bind to port 2181 and fails with an exception:
java.net.BindException: Address already in use

This is because the zookeeper running outside analyticsdb is already bound to the same port, 2181.
Thus, the internal zookeeper fails to bind.
Surprisingly, zoo.cfg has the port set to 2182, yet zookeeper still tries to bind to 2181 during provisioning.
It seems the config file (zoo.cfg) is only updated with the correct port later during provisioning, after the first start attempt.
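
The conflict can be checked from the node itself. The following is a minimal Python sketch, not part of the provisioning code, that tries to bind a TCP socket to a port and reports whether the attempt fails with "Address already in use". The function name `port_in_use` is hypothetical; the ports 2181 and 2182 are the ones from this report.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Any other failure (e.g. permissions) is not the case discussed here.
            raise
        return False


if __name__ == "__main__":
    # 2181: port the failing instance tried to bind; 2182: port set in the config.
    for port in (2181, 2182):
        state = "already in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

If 2181 is reported as in use while the internal zookeeper is down, that would be consistent with the external zookeeper holding the port.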

Workaround:
Since zoo.cfg is correct and specifies port 2182, restarting zookeeper so that it binds to 2182 resolved all the problems.
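
Before restarting, it may be worth confirming which port the on-disk zookeeper config (zoo.cfg) actually requests. The sketch below is an illustration under assumptions: it relies on the standard `clientPort` key, the helper name `read_client_port` is hypothetical, and the sample text is made up rather than copied from the affected node.

```python
def read_client_port(cfg_text):
    """Return the clientPort value from zoo.cfg-style text, or None if absent.

    Hypothetical helper for illustration only.
    """
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "clientPort":
            return int(value.strip())
    return None


# Illustrative fragment only; the real file on the node was not attached here.
SAMPLE_CFG = """\
# sample zoo.cfg fragment
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2182
"""

if __name__ == "__main__":
    print(read_client_port(SAMPLE_CFG))
```

On a real node the text would be read from the config file instead of `SAMPLE_CFG`.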

Logs:
Zookeeper:
2017-11-15 14:18:12,262 - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-11-15 14:18:12,264 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-11-15 14:18:24,985 - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg

Status of Zookeeper:
root@testbed-1-vm1(analyticsdb):/var/log/zookeeper# service zookeeper status
● zookeeper.service - LSB: centralized coordination service
   Loaded: loaded (/etc/init.d/zookeeper; bad; vendor preset: enabled)
   Active: active (exited) since Wed 2017-11-15 14:18:24 IST; 1h 25min ago
     Docs: man:systemd-sysv-generator(8)

Nov 15 14:18:24 testbed-1-vm1 systemd[1]: Stopped LSB: centralized coordination service.
Nov 15 14:18:24 testbed-1-vm1 systemd[1]: Starting LSB: centralized coordination service...
Nov 15 14:18:24 testbed-1-vm1 systemd[1]: Started LSB: centralized coordination service.

Status of Kafka:
root@testbed-1-vm1(analyticsdb):/# service kafka status
● kafka.service - kafka
   Loaded: loaded (/lib/systemd/system/kafka.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2017-11-15 15:38:43 IST; 1s ago
  Process: 25047 ExecStop=/usr/share/kafka/bin/kafka-server-stop.sh /usr/share/kafka/config/server.properties (code=exited, status=0/SUCCESS)
 Main PID: 25129 (java)
   CGroup: /docker/61fd8808a1bc6d384a1674b0b5b3b466979f4485b1214d796320948f4dc7c206/system.slice/kafka.service
           └─25129 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=t
           ‣ 25129 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=t

Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: [2017-11-15 15:38:45,039] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: [2017-11-15 15:38:45,040] INFO Initiating client connection, connectString=10.204.217.194:2182 sessionTim
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: [2017-11-15 15:38:45,057] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: [2017-11-15 15:38:45,061] INFO Opening socket connection to server 10.204.217.194/10.204.217.194:2182. Wi
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: [2017-11-15 15:38:45,069] WARN Session 0x0 for server null, unexpected error, closing socket connection a
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: java.net.ConnectException: Connection refused
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
Nov 15 15:38:45 testbed-1-vm1 kafka-server-start.sh[25129]: at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

The rest of the logs can be found at the following location:
bhushana@mayamruga
Path: /home/bhushana/Documents/technical/bugs/new_bug_zookeeper

Tags: provisioning
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/37561
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/37561
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/7750fd3df9a4faffe23e1cbf46d24413ad7a3c0b
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 7750fd3df9a4faffe23e1cbf46d24413ad7a3c0b
Author: Santosh Gupta <email address hidden>
Date: Wed Nov 15 16:53:27 2017 -0800

Use package name for policy-rc.d

Package name zookeeperd should be used, not service name zookeeper.

Change-Id: I0383fb99d724423ac55f21cf9f32be5daf0dff38
Closes-Bug: #1732406

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/37599
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/37600
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/37642
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/37642
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/5475a02356fc971dfa659ecb868d6522be8bd9bb
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 5475a02356fc971dfa659ecb868d6522be8bd9bb
Author: Santosh Gupta <email address hidden>
Date: Fri Nov 17 17:16:28 2017 -0800

systemd: Do not start zookeeper on installation

1. We donot want zookeeper to start at installtion.
policy-rc.d not working for systemd in ubuntu-1604.
Adding separate command.
2. service name is specified in policy-rc.d. Reverted the change
for ubuntu-1404.

Change-Id: Id173bbfeea468d61517b3cd4003e08603ea2e991
Closes-Bug: #1732406

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/37662
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/37663
Submitter: Santosh Gupta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/37662
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/a6da0135dfc04e0faaf0955088cf35c35a5ab5c5
Submitter: Zuul (<email address hidden>)
Branch: master

commit a6da0135dfc04e0faaf0955088cf35c35a5ab5c5
Author: Santosh Gupta <email address hidden>
Date: Sun Nov 19 16:24:31 2017 -0800

systemd: Do not start zookeeper on installation

We donot want zookeeper to start at installtion.
policy-rc.d not working for systemd in ubuntu-1604.
Adding separate command.

Change-Id: I54b182b031227ccb877b9df841a32cbed6ffa9af
Closes-Bug: #1732406

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/37663
Committed: http://github.com/Juniper/contrail-ansible-internal/commit/fa63d74c10c91736b02e693a7d5539d4c0cf9e44
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit fa63d74c10c91736b02e693a7d5539d4c0cf9e44
Author: Santosh Gupta <email address hidden>
Date: Sun Nov 19 16:24:31 2017 -0800

systemd: Do not start zookeeper on installation

We donot want zookeeper to start at installtion.
policy-rc.d not working for systemd in ubuntu-1604.
Adding separate command.

Change-Id: I54b182b031227ccb877b9df841a32cbed6ffa9af
Closes-Bug: #1732406
