On network disconnect-and-reconnect, zk connection restoration can take a long time(upto an hour)

Bug #1461237 reported by Vedamurthy Joshi
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Juniper Openstack
Status tracked in Trunk
R2.20
Fix Committed
Medium
Sachin Bansal
Trunk
Fix Committed
Medium
Sachin Bansal

Bug Description

R2.20 Build 30 Ubuntu 14.04 Juno multi-node setup

In the below setup, nodei34 - one of the config nodes got disconnected for about 5 hrs from the network. So repeated connection attempts to zk kept on failing.
On reconnect , it was seen that api-server<-> zookeeper connection was restored only after ~40 mins

Per Sachin, since this is was a case of very long disconnection time, due to exponential backoff, it would only end up retrying the connection only once every hour.

We may need to reduce this time to be something more frequent..maybe like every 5 mins?

env.roledefs = {
    'all': [host2, host3, host4, host5, host6, host7, host8, host9],
    'cfgm': [host2, host3, host4],
    'openstack': [host2, host3, host4],
    'webui': [host3],
    'control': [host2, host3, host4],
    'compute': [host5, host6, host7, host8, host9],
    'collector': [host2, host3, host4],
    'database': [host2, host3, host4],
    'toragent': [host5, host6, host7, host9 ],
    'tsn': [host5, host6, host7,host9 ],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38', 'nodei28', 'nodei27', 'nodei30']
}
------
contrail-api-0-stdout.log :

192.168.1.3 - - [2015-06-02 18:06:37] "GET /physical-router/2401cc60-a1d3-44b5-873b-61e867da981d?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 1951 0.011148
ERROR:nodei34:contrail-api:Config:0:__default__ [SYS_ERR]: VncApiError: RabbitMQ connection down
WARNING:api-0:Connection dropped: outstanding heartbeat ping not received
WARNING:api-0:Transition to CONNECTING
INFO:api-0:Zookeeper connection lost
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)

WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.4:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
INFO:api-0:Connecting to 192.168.1.2:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Connection dropped: socket connection broken
INFO:api-0:Connecting to 192.168.1.3:2181
WARNING:api-0:Connection dropped: socket connection error: No route to host
ERROR:nodei34:contrail-api:Config:0:Session Event: TCP Connection Closed
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_NOTICE]: VncApiError: RabbitMQ connection ESTABLISHED <Connection: amqp://guest@192.168.1.30:5673// at 0x7fe449df8bd0>
127.0.0.1 - - [02/Jun/2015 23:20:25] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [02/Jun/2015 23:20:59] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [02/Jun/2015 23:22:06] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [02/Jun/2015 23:23:13] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [02/Jun/2015 23:27:03] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [2015-06-02 23:27:49] "GET / HTTP/1.1" 200 11413 0.002324
127.0.0.1 - - [02/Jun/2015 23:28:03] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [02/Jun/2015 23:28:36] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
172.29.224.158 - - [02/Jun/2015 23:28:42] "GET / HTTP/1.1" 200 524
172.29.224.158 - - [02/Jun/2015 23:28:42] "GET /css/style.css HTTP/1.1" 200 2364
172.29.224.158 - - [02/Jun/2015 23:28:43] "GET /favicon.ico HTTP/1.1" 404 740
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /sandesh_trace.xml HTTP/1.1" 200 658
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /universal_parse.xsl HTTP/1.1" 200 26753
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /css/bootstrap.min.css HTTP/1.1" 200 106015
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /css/DT_bootstrap.css HTTP/1.1" 200 3743
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /js/jquery-2.0.3.min.js HTTP/1.1" 200 83612
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /js/bootstrap.min.js HTTP/1.1" 200 28631
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /js/jquery.dataTables.min.js HTTP/1.1" 200 110156
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /js/DT_bootstrap.js HTTP/1.1" 200 4270
172.29.224.158 - - [02/Jun/2015 23:28:47] "GET /js/util.js HTTP/1.1" 200 2416
172.29.224.158 - - [02/Jun/2015 23:28:50] "GET /sandesh_uve.xml HTTP/1.1" 200 1220
172.29.224.158 - - [02/Jun/2015 23:28:54] "GET /Snh_SandeshUVETypesReq? HTTP/1.1" 200 2056
172.29.224.158 - - [02/Jun/2015 23:28:54] "GET /Snh_SandeshUVETypesReq? HTTP/1.1" 200 2056
172.29.224.158 - - [02/Jun/2015 23:28:57] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
127.0.0.1 - - [02/Jun/2015 23:31:28] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4121
INFO:api-0:Connecting to 192.168.1.4:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=17179869461, time_out=400000, session_id=238029839708061713, passwd='A\x08\x84c\x0e\xc4\xde\xb6\xe0E\xb7?\x98\\CA', read_only=None)
WARNING:api-0:Session has expired
INFO:api-0:Zookeeper session lost, state: EXPIRED_SESSION
06/02/2015 11:59:50 PM [nodei34:contrail-api:Config:0]: SANDESH: CONNECT TO COLLECTOR: True
INFO:nodei34:contrail-api:Config:0:SANDESH: CONNECT TO COLLECTOR: True
06/02/2015 11:59:50 PM [nodei34:contrail-api:Config:0]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
INFO:nodei34:contrail-api:Config:0:SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
WARNING:keystoneclient.middleware.auth_token:This middleware module is deprecated as of v0.10.0 in favor of keystonemiddleware.auth_token - please update your WSGI pipeline to reference the new middleware package.
WARNING:keystoneclient.middleware.auth_token:Configuring admin URI using auth fragments. This is deprecated, use 'identity_uri' instead.
WARNING:keystoneclient.middleware.auth_token:Configuring auth_uri to point to the public identity endpoint is required; clients may not be able to authenticate against an admin endpoint
ERROR:nodei34:contrail-api:Config:0:Starting Introspect on HTTP Port 8084
ERROR:nodei34:contrail-api:Config:0:Cannot write http_port 8084 to /tmp/contrail-api.27287.http_port
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Auth token fetched from keystone.
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to ifmap on 192.168.1.2:8443 as api-server
Bottle v0.12.0 server starting up (using GeventServer())...
Listening on http://localhost:8095/
Hit Ctrl-C to quit.

WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: IFMAP connection ESTABLISHED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to cassandra on ['192.168.1.2:9160', '192.168.1.3:9160', '192.168.1.4:9160']
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Keyspace names must be case-insensitively unique ("useragent" conflicts with "useragent")')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "useragent_keyval_table" to keyspace "useragent"')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Keyspace names must be case-insensitively unique ("config_db_uuid" conflicts with "config_db_uuid")')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "obj_uuid_table" to keyspace "config_db_uuid"')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "obj_fq_name_table" to keyspace "config_db_uuid"')
06/03/2015 12:00:04 AM [nodei34:contrail-api:Config:0]: SANDESH: CONNECT TO COLLECTOR: True
INFO:nodei34:contrail-api:Config:0:SANDESH: CONNECT TO COLLECTOR: True
06/03/2015 12:00:05 AM [nodei34:contrail-api:Config:0]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
INFO:nodei34:contrail-api:Config:0:SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
WARNING:keystoneclient.middleware.auth_token:This middleware module is deprecated as of v0.10.0 in favor of keystonemiddleware.auth_token - please update your WSGI pipeline to reference the new middleware package.
WARNING:keystoneclient.middleware.auth_token:Configuring admin URI using auth fragments. This is deprecated, use 'identity_uri' instead.
WARNING:keystoneclient.middleware.auth_token:Configuring auth_uri to point to the public identity endpoint is required; clients may not be able to authenticate against an admin endpoint
ERROR:nodei34:contrail-api:Config:0:Starting Introspect on HTTP Port 8084
ERROR:nodei34:contrail-api:Config:0:Cannot write http_port 8084 to /tmp/contrail-api.7292.http_port
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Auth token fetched from keystone.
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to ifmap on 192.168.1.2:8443 as api-server
Bottle v0.12.0 server starting up (using GeventServer())...
Listening on http://localhost:8095/
Hit Ctrl-C to quit.

WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: IFMAP connection ESTABLISHED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to cassandra on ['192.168.1.2:9160', '192.168.1.3:9160', '192.168.1.4:9160']
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Keyspace names must be case-insensitively unique ("useragent" conflicts with "useragent")')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "useragent_keyval_table" to keyspace "useragent"')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Keyspace names must be case-insensitively unique ("config_db_uuid" conflicts with "config_db_uuid")')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "obj_uuid_table" to keyspace "config_db_uuid"')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "obj_fq_name_table" to keyspace "config_db_uuid"')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Cassandra connection ESTABLISHED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to zookeeper on 192.168.1.2:2181,192.168.1.3:2181,192.168.1.4:2181
INFO:api-0:Connecting to 192.168.1.4:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=400000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
INFO:api-0:Zookeeper connection established, state: CONNECTED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Initializing RabbitMQ connection, urls ['pyamqp://guest:guest@192.168.1.30:5673//']
ERROR:nodei34:contrail-api:Config:0:__default__ [SYS_ERR]: VncApiError: RabbitMQ connection down
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: RabbitMQ connection ESTABLISHED <Connection: amqp://guest@192.168.1.30:5673// at 0x7f98d19d52d0>
127.0.0.1 - - [03/Jun/2015 00:02:17] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 3274

Tags: config ha
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11325
Submitter: Sachin Bansal (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11326
Submitter: Sachin Bansal (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11325
Committed: http://github.org/Juniper/contrail-controller/commit/ee2b892244d3227414e2ea55619cc8b5a89898b5
Submitter: Zuul
Branch: master

commit ee2b892244d3227414e2ea55619cc8b5a89898b5
Author: Sachin Bansal <email address hidden>
Date: Fri Jun 5 15:40:00 2015 -0700

Restrict zk timeout to 5 minutes max

Change-Id: I54ace71b88a3b30f0a63fbcbbe2187b809cf1daa
Closes-Bug: 1461237

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11326
Committed: http://github.org/Juniper/contrail-controller/commit/726142b4c1f2db6eb2eb99c2d640df08f811a972
Submitter: Zuul
Branch: R2.20

commit 726142b4c1f2db6eb2eb99c2d640df08f811a972
Author: Sachin Bansal <email address hidden>
Date: Fri Jun 5 15:40:00 2015 -0700

Restrict zk timeout to 5 minutes max

Change-Id: I54ace71b88a3b30f0a63fbcbbe2187b809cf1daa
Closes-Bug: 1461237
(cherry picked from commit ee2b892244d3227414e2ea55619cc8b5a89898b5)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.