[ubuntu-14.04~mitaka-R4.1~5] vcenter-only provisioning: zookeeper is not running

Bug #1734617 reported by Pavana
This bug affects 1 person

Affects                Status     Importance   Assigned to   Milestone
Juniper Openstack      (status tracked in Trunk)
  R4.1                 Invalid    High         Pavana
  Trunk                Invalid    High         Pavana

Bug Description

root@nodec4:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9b09c9b368d9 10.204.216.61:5100/ubuntu14vcenter5-contrail-vcenter-plugin:5 "/bin/sh -c /entry..." 44 minutes ago Up 18 minutes vcplugin
9493c2576770 10.204.216.61:5100/ubuntu14vcenter5-contrail-analytics:5 "/bin/sh -c /entry..." 44 minutes ago Up 44 minutes analytics
17562b324254 10.204.216.61:5100/ubuntu14vcenter5-contrail-analyticsdb:5 "/bin/sh -c /entry..." 45 minutes ago Up 45 minutes analyticsdb
92d9ab0a54a4 10.204.216.61:5100/ubuntu14vcenter5-contrail-controller:5 "/bin/sh -c /entry..." 46 minutes ago Up 4 minutes controller
b8f97ea18a93 registry:2 "/entrypoint.sh /e..." About an hour ago Up 48 minutes registry

root@nodec4:~# docker exec -it controller bash
root@nodec4(controller):/# contrail-status
== Contrail Control ==
contrail-control initializing (Number of connections:2, Expected:3, Extra:0, Missing:1 Missing:- Database:Cassandra)
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Config ==
contrail-api:0 initializing (Zookeeper:Zookeeper[] connection down)
contrail-config-nodemgr active
contrail-device-manager backup
contrail-schema backup
contrail-svc-monitor backup

== Contrail Config Database==
contrail-database: active

== Contrail Web UI ==
contrail-webui active
contrail-webui-middleware active
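The two `initializing` lines are the ones that matter here. When comparing several runs, a small parser can group the `contrail-status` lines under their `== Section ==` headers and list every service whose state is neither `active` nor `backup`. This is a minimal sketch that assumes the plain-text layout shown above (service name, optional colon, state, optional parenthesised reason); `unhealthy_services` is a hypothetical helper, and treating `backup` as healthy assumes the usual active/backup arrangement for device-manager, schema and svc-monitor.

```python
import re

# Matches a section header such as "== Contrail Config ==" or "== Contrail Config Database==".
SECTION_RE = re.compile(r"^==\s*(.+?)\s*==$")
# Matches "<service>[:] <state>[ (<reason>)]", e.g. "contrail-database: active".
SERVICE_RE = re.compile(r"^(\S+?):?\s+(\w+)(?:\s+\((.*)\))?$")

# States treated as healthy; "backup" is assumed normal for HA standby services.
HEALTHY = {"active", "backup"}


def unhealthy_services(status_text):
    """Return (section, service, state, reason) tuples for non-healthy services."""
    section = None
    problems = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        match = SERVICE_RE.match(line)
        if match is None:
            continue  # ignore lines that are not "service state" pairs (e.g. core file paths)
        service, state, reason = match.groups()
        if state not in HEALTHY:
            problems.append((section, service, state, reason))
    return problems
```

Applied to the output above it returns the `contrail-control` and `contrail-api:0` entries, each with its reason string; applied to the later output in this report it would instead flag `contrail-database: inactive`.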

root@nodec4(controller):/# tail -f /var/log/zookeeper/zookeeper.log
2017-11-27 12:43:10,382 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.61:37682 (no session established for client)
2017-11-27 12:43:10,534 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.62:37062
2017-11-27 12:43:10,535 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2017-11-27 12:43:10,535 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.62:37062 (no session established for client)
2017-11-27 12:43:11,041 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.61:37718
2017-11-27 12:43:11,041 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2017-11-27 12:43:11,041 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.61:37718 (no session established for client)
2017-11-27 12:43:11,609 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.63:60492
2017-11-27 12:43:11,609 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running

root@nodec4(controller):/# /usr/share/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Error contacting service. It is probably not running.
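In the ZooKeeper 3.4.x scripts, `zkServer.sh status` essentially sends the `srvr` four-letter command to the client port and looks for a `Mode:` line in the reply; "Error contacting service" only means that no such line came back. That happens both when nothing is listening and when the JVM is up but the node has not joined a quorum, which is what the "ZooKeeperServer not running" log lines point to. A minimal sketch of the same probe, assuming four-letter-word commands are enabled on the server and the client port is 2181 as in the log; `zk_four_letter` and `zk_mode` are hypothetical helpers.

```python
import socket


def zk_four_letter(host, port, command=b"srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def zk_mode(reply):
    """Return the "Mode:" value (leader/follower/standalone), or None if absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # no Mode line: not serving (e.g. no quorum) or not a srvr reply
```

For example, `zk_mode(zk_four_letter("10.204.216.61", 2181))` would be expected to return `"leader"` or `"follower"` on a healthy member and `None` on a node in the state captured above; a `ConnectionRefusedError` instead means nothing is listening on the port at all.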

The RabbitMQ cluster also does not appear to have formed:
root@nodec4(controller):/# rabbitmqctl cluster_status
Cluster status of node rabbit@nodec4 ...
Error: unable to connect to node rabbit@nodec4: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@nodec4]

rabbit@nodec4:
  * connected to epmd (port 4369) on nodec4
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on nodec4
  * suggestion: start the node

current node details:
- node name: 'rabbitmq-cli-22525@nodec4'
- home dir: /var/lib/rabbitmq
- cookie hash: tBHK3P2Wa419qrcxZV75DQ==
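The diagnostics show that epmd itself is reachable on nodec4 but has no `rabbit` node registered. The same question can be put to epmd directly: per the Erlang distribution protocol, a NAMES_REQ (a two-byte big-endian length of 1 followed by the byte 110, `'n'`) is answered with epmd's own port as a four-byte integer followed by one `name <node> at port <port>` line per registered node. A minimal sketch, assuming epmd on its default port 4369 as reported above; `epmd_names` and `registered_nodes` are hypothetical helpers.

```python
import re
import socket
import struct

EPMD_PORT = 4369  # default Erlang port mapper port
NAMES_REQ = 110   # ord("n"): "list registered node names" request code

NODE_LINE_RE = re.compile(r"^name (\S+) at port (\d+)$")


def epmd_names(host, port=EPMD_PORT, timeout=5.0):
    """Return the NAMES_REQ reply text (everything after the 4-byte epmd port)."""
    request = struct.pack(">HB", 1, NAMES_REQ)  # 2-byte length (=1) + request byte
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # epmd closes the connection after the reply
                break
            chunks.append(data)
    reply = b"".join(chunks)
    return reply[4:].decode("utf-8", errors="replace")  # skip EPMDPortNo


def registered_nodes(names_text):
    """Parse 'name <node> at port <port>' lines into a {node: port} dict."""
    nodes = {}
    for line in names_text.splitlines():
        match = NODE_LINE_RE.match(line.strip())
        if match:
            nodes[match.group(1)] = int(match.group(2))
    return nodes
```

On nodec4 in this state, `"rabbit" in registered_nodes(epmd_names("nodec4"))` would be expected to be `False`; on a running broker the `rabbit` node is typically listed at its distribution port (25672 by default on RabbitMQ 3.x).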

As a result, provisioning failed:

"2017-11-27 12:00:44,138-INFO-sm_ansible_callback.py:53-append(): fatal: [10.204.216.61]: FAILED! => (item - None) {"changed": true, "cmd": "docker exec controller /usr/share/contrail-utils/provision_control.py --api_server_ip 127.0.0.1 --router_asn 64512 ", "delta": "0:00:29.315226", "end": "2017-11-27 12:00:44.125185", "failed": true, "rc": 1, "start": "2017-11-27 12:00:14.809959", "stdout": "", "stdout_lines": []}
stderr:
Traceback (most recent call last):
  File "/usr/share/contrail-utils/provision_control.py", line 226, in <module>
    main()
  File "/usr/share/contrail-utils/provision_control.py", line 222, in main
    ControlProvisioner(args_str)
  File "/usr/share/contrail-utils/provision_control.py", line 33, in __init__
    fq_name=['default-global-system-config'])
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 42, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 531, in _object_read
    res_type, fq_name, fq_name_str, id, ifmap_id)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 858, in _read_args_to_id
    return (True, self.fq_name_to_id(res_type, fq_name))
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 42, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 1117, in fq_name_to_id
    content = self._request_server(rest.OP_POST, uri, data=json_body)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 872, in _request_server
    retry_after_authn=retry_after_authn, retry_count=retry_count)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 974, in _request
    'Service Unavailable Timeout %d' % status)
cfgm_common.exceptions.ServiceUnavailableError: Service unavailable time out due to: Service Unavailable Timeout 503"
"2017-11-27 12:00:44,144-INFO-sm_ansible_callback.py:53-append(): TASK [node : Register control node with controller, no md5 and api auth]"
"2017-11-27 12:00:44,198-INFO-sm_ansible_callback.py:53-append(): skipping: [10.204.216.62]"
"2017-11-27 12:00:44,209-INFO-sm_ansible_callback.py:53-append(): skipping: [10.204.216.63]"
"2017-11-27 12:00:44,216-INFO-sm_ansible_callback.py:53-append(): TASK [node : Register control node with controller, no md5 and no api auth]"
"2017-11-27 12:00:44,650-INFO-sm_ansible_callback.py:53-append(): fatal: [10.204.216.62]Traceback (most recent call last):
  File "/usr/share/contrail-utils/provision_control.py", line 226, in <module>
    main()
  File "/usr/share/contrail-utils/provision_control.py", line 222, in main
    ControlProvisioner(args_str)
  File "/usr/share/contrail-utils/provision_control.py", line 60, in __init__
    use_admin_api=self._args.use_admin_api)
  File "/usr/share/contrail-utils/provision_bgp.py", line 35, in __init__
    api_server_use_ssl=self._api_server_use_ssl)
  File "/usr/share/contrail-utils/vnc_admin_api.py", line 35, in __init__
    auth_host=self.auth_host)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 454, in __init__
    retry_on_error=False)
  File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 935, in _request
    raise ConnectionError
requests.exceptions.ConnectionError"
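Both provisioning tasks fail on the config API rather than in the provisioning script itself: the first gets an HTTP 503 while `contrail-api` is still `initializing` and waiting on ZooKeeper, and the second cannot connect at all. If the goal is only to keep the provisioning step from racing a config API that is still coming up, one option is to poll the API before running `provision_control.py`; this does not address the ZooKeeper failure underneath. A minimal sketch, assuming the API server answers on port 8082 at the loopback address used by the failed command; `wait_for_api` is a hypothetical helper, not part of contrail-utils or the server-manager playbooks.

```python
import time
import urllib.error
import urllib.request


def wait_for_api(url, timeout=300.0, interval=5.0, opener=urllib.request.urlopen,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until it answers with HTTP 2xx or `timeout` seconds elapse.

    Returns True once the server answers successfully, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # HTTPError (e.g. 503), URLError and socket errors all derive from OSError:
            # treat every one of them as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage would look like `wait_for_api("http://127.0.0.1:8082/")` before the `docker exec controller ... provision_control.py` call. If API authentication is enabled, the root URL may answer 401, which this sketch treats as "not ready"; in that case poll an unauthenticated endpoint or accept 401 as a sign of life. The `opener`, `sleep` and `clock` parameters exist only so the loop can be exercised without a live server.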

Pavana (pavanap)
information type: Proprietary → Public
Pavana (pavanap)
summary: - [R4.1~5] vcenter-only provisioning: zookeeper is not running
+ [ubuntu-14.04~mitaka-R4.1~5] vcenter-only provisioning: zookeeper is not running
Changed in juniperopenstack:
milestone: none → r5.0.0
kamlesh parmar (kparmar) wrote :

Johnson,

The puppet alias entry in /etc/hosts has now been moved to the last line. Can you please check the current entries:

root@nodec5:~# cat /etc/hosts

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.0.1 localhost.englab.juniper.net localhost
10.204.216.62 nodec5.englab.juniper.net nodec5
127.0.0.1 nodec5
10.204.216.63 nodec6
10.204.216.62 nodec5
10.204.216.61 nodec4
10.204.216.61 puppet
root@nodec5:~# exit
logout
Connection to 10.204.216.62 closed.
root@nodec4:~# ssh root@10.204.216.61
root@10.204.216.61's password:
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-31-generic x86_64)

 * Documentation: https://help.ubuntu.com/
Last login: Mon Nov 27 23:11:57 2017 from 172.29.108.7
root@nodec4:~# cat /etc/hosts

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.0.1 localhost.englab.juniper.net localhost
10.204.216.61 nodec4.englab.juniper.net nodec4
127.0.0.1 nodec4
10.204.216.63 nodec6
10.204.216.62 nodec5
10.204.216.61 nodec4
10.204.216.61 puppet
root@nodec4:~# exit
logout
Connection to 10.204.216.61 closed.
root@nodec4:~# ssh root@10.204.216.63
The authenticity of host '10.204.216.63 (10.204.216.63)' can't be established.
ECDSA key fingerprint is 33:4a:21:e8:07:d0:b7:3a:91:cd:dc:8d:1d:97:64:f0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.204.216.63' (ECDSA) to the list of known hosts.
root@10.204.216.63's password:
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-31-generic x86_64)

 * Documentation: https://help.ubuntu.com/
Last login: Mon Nov 27 20:09:25 2017 from 10.223.4.29
root@nodec6:~# cat /etc/hosts

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.0.1 localhost.englab.juniper.net localhost
10.204.216.63 nodec6.englab.juniper.net nodec6
127.0.0.1 nodec6
10.204.216.63 nodec6
10.204.216.62 nodec5
10.204.216.61 nodec4
10.204.216.61 puppet
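All three files map the node's own hostname to both its management address and `127.0.0.1` (for example `10.204.216.62 nodec5...` and `127.0.0.1 nodec5` on nodec5). With glibc's `files` backend the first matching line normally wins, so the result depends on line order, and a clustered service that resolves its own hostname to choose a listen or advertised address can end up on loopback. This report does not establish that this is the cause here, but it is cheap to rule out. A minimal sketch that flags names mapped to both loopback and non-loopback addresses; `conflicting_hosts` is a hypothetical helper.

```python
import ipaddress
from collections import defaultdict


def conflicting_hosts(hosts_text):
    """Return {name: sorted addresses} for names mapped to loopback AND non-loopback IPs."""
    addresses = defaultdict(set)
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        for name in fields[1:]:
            addresses[name].add(addr)
    conflicts = {}
    for name, addrs in addresses.items():
        has_loopback = any(a.is_loopback for a in addrs)
        has_routable = any(not a.is_loopback for a in addrs)
        if has_loopback and has_routable:
            conflicts[name] = sorted(str(a) for a in addrs)
    return conflicts
```

On the nodec5 file above it returns `{"nodec5": ["10.204.216.62", "127.0.0.1"]}`; the nodec4 and nodec6 files produce the corresponding entry for their own hostname.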

kamlesh parmar (kparmar) wrote :

The ZooKeeper logs show the following:

2017-11-28 06:09:02,401 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.62:55970 (no session established for client)
2017-11-28 06:09:02,808 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@368] - Cannot open channel to 2 at election address /10.204.216.62:3888
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
2017-11-28 06:09:02,809 - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.204.216.63:3888
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
2017-11-28 06:09:02,809 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2017-11-28 06:09:02,824 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.61:45930
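Here node 1 (`myid=1`, the local server on nodec4) cannot open a channel to servers 2 and 3 on election port 3888. "Connection refused" typically means nothing is listening on that address and port on nodec5/nodec6 (the peer's ZooKeeper is down, or bound to a different address), or that a firewall is actively rejecting the connection. A quick way to check all peers from one node is to read the `server.N=host:quorumPort:electionPort` entries from `/etc/zookeeper/conf/zoo.cfg` (the config path printed by `zkServer.sh status`) and try a TCP connection to each election port. A minimal sketch, assuming hostname or IPv4 entries in that classic form (later suffixes such as `:participant` or `;clientPort` are ignored); `parse_servers`, `port_open` and `election_port_report` are hypothetical helpers.

```python
import re
import socket

# server.<id>=<host>:<quorum port>:<election port>[...]; hostnames/IPv4 only, suffixes ignored.
SERVER_RE = re.compile(r"^server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)")


def parse_servers(zoo_cfg_text):
    """Return {server_id: (host, quorum_port, election_port)} from zoo.cfg text."""
    servers = {}
    for raw in zoo_cfg_text.splitlines():
        match = SERVER_RE.match(raw.strip())
        if match:
            sid, host, quorum, election = match.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def election_port_report(zoo_cfg_text, timeout=3.0):
    """Map each server id to whether its election port accepts TCP connections."""
    return {
        sid: port_open(host, election, timeout)
        for sid, (host, _quorum, election) in parse_servers(zoo_cfg_text).items()
    }
```

For example, `election_port_report(open("/etc/zookeeper/conf/zoo.cfg").read())` run on nodec4 would be expected to report `False` for servers 2 and 3 while the errors above persist.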

Nagendra Prasath (npchandran) wrote :

root@nodec4(controller):/# /usr/share/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Mode: follower

root@nodec4(controller):/# contrail-status
== Contrail Control ==
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Config ==
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-schema backup
contrail-svc-monitor active

== Contrail Config Database==
contrail-database: inactive

== Contrail Web UI ==
contrail-webui active
contrail-webui-middleware active

root@nodec4(controller):/# rabbitmqctl cluster_status
Cluster status of node rabbit@nodec4 ...
[{nodes,[{disc,[rabbit@nodec4]}]},
 {running_nodes,[rabbit@nodec4]},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
root@nodec4(controller):/#
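RabbitMQ is now running on nodec4, but `running_nodes` lists only `rabbit@nodec4`, so the broker has come up as a single-node cluster and has not joined nodec5 and nodec6. To check this on each controller, a small parser can pull the `running_nodes` list out of the Erlang-term output and compare it with the expected members. A minimal sketch, assuming the RabbitMQ 3.x `cluster_status` text format shown here and that all three controllers are meant to form one cluster; `running_nodes` and `missing_rabbit_nodes` are hypothetical helpers.

```python
import re

# Captures the list inside {running_nodes,[...]} from `rabbitmqctl cluster_status`.
RUNNING_RE = re.compile(r"\{running_nodes,\[(.*?)\]\}", re.DOTALL)


def running_nodes(cluster_status_text):
    """Return the node names listed under running_nodes (Erlang quotes stripped)."""
    match = RUNNING_RE.search(cluster_status_text)
    if match is None:
        return []  # e.g. "Error: unable to connect to node ...: nodedown"
    items = match.group(1).split(",")
    return [item.strip().strip("'") for item in items if item.strip()]


def missing_rabbit_nodes(cluster_status_text, expected):
    """Return the expected node names that are not currently running in the cluster."""
    running = set(running_nodes(cluster_status_text))
    return sorted(node for node in expected if node not in running)
```

For the output above, `missing_rabbit_nodes(output, ["rabbit@nodec4", "rabbit@nodec5", "rabbit@nodec6"])` returns `["rabbit@nodec5", "rabbit@nodec6"]`, and for the earlier `nodedown` error it returns all three because no `running_nodes` list is present.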

ERROR [main] 2017-11-30 23:47:40,893 CassandraDaemon.java:752 - Exception encountered during startup
java.lang.AssertionError: Table to_bgp_keyspace.service_chain_ip_address_table did not have any partition key columns in the schema tables
        at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:974) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:949) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:912) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:889) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:877) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:90) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:80) ~[apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:263) [apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.10.jar:3.10]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) [apache-cassandra-3.10.jar:3.10]

The above issue could be caused by the Docker container being restarted continuously.

Could you please retry with build #7? It includes some Web UI bug fixes.
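The restart theory can be checked directly. The `docker ps` output at the top already hints at it (`controller` was created 46 minutes earlier but had been up for only 4), and `docker inspect` exposes each container's `RestartCount` and `State.StartedAt`. A minimal sketch that reads those fields; note that `RestartCount` only counts restarts performed by Docker's restart policy, so restarts triggered from outside (for example an explicit `docker restart`) may not be reflected in it. `inspect_containers` and `restart_info` are hypothetical helpers, and `subprocess.run(..., capture_output=True)` needs Python 3.7 or later.

```python
import json
import subprocess


def restart_info(inspect_json):
    """Extract (name, RestartCount, StartedAt) tuples from `docker inspect` JSON."""
    info = []
    for container in json.loads(inspect_json):
        info.append((
            container.get("Name", "").lstrip("/"),          # docker prefixes names with "/"
            container.get("RestartCount", 0),               # restarts done by the restart policy
            container.get("State", {}).get("StartedAt"),    # time of the most recent start
        ))
    return info


def inspect_containers(*names):
    """Run `docker inspect` on the given containers and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "inspect", *names],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `restart_info(inspect_containers("controller", "analyticsdb"))` returns one `(name, restart_count, started_at)` tuple per container.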

Pavana (pavanap) wrote :

Hi Nagendra,

I am still seeing the issue on build 7. I have left the setup locked; could you please take a look?

root@nodec4(controller):/# contrail-status
== Contrail Control ==
contrail-control initializing (Number of connections:2, Expected:3, Extra:0, Missing:1 Missing:- Database:Cassandra)
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Config ==
contrail-api:0 initializing (Zookeeper:Zookeeper[] connection down)
contrail-config-nodemgr active
contrail-device-manager backup
contrail-schema backup
contrail-svc-monitor backup

== Contrail Config Database==
contrail-database: active

== Contrail Web UI ==
contrail-webui active
contrail-webui-middleware active

========Run time service failures=============
/var/crashes/core.contrail-collec.3219.nodec4.1512622244
root@nodec4(controller):/# /usr/share/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Error contacting service. It is probably not running.
root@nodec4(controller):/# tail -f /var/log/zookeeper/zookeeper.log
2017-12-07 11:25:52,876 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.61:53706 (no session established for client)
2017-12-07 11:25:53,459 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.61:53740
2017-12-07 11:25:53,459 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2017-12-07 11:25:53,459 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.61:53740 (no session established for client)
2017-12-07 11:25:54,122 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.62:53214
2017-12-07 11:25:54,123 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2017-12-07 11:25:54,123 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.62:53214 (no session established for client)
2017-12-07 11:25:54,271 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.204.216.61:53792
2017-12-07 11:25:54,271 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2017-12-07 11:25:54,271 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.204.216.61:53792 (no session established for client)
2017-12-07 11:25:55,420 - INFO [NIOServerCxn.Fact...

tags: added: sanity
removed: sanityblocker
Nagendra Prasath (npchandran) wrote :

Did you get a chance to verify this on the latest build?

Sudheendra Rao (sudheendra-k) wrote :

The problem is not seen on the latest builds, hence closing.
