While adding large number of vmis, api-server lost connection with zookeeper

Bug #1433886 reported by Vedamurthy Joshi
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
Status tracked in Trunk
R2.1
Fix Committed
High
Hampapur Ajay
R2.20
Fix Committed
High
Hampapur Ajay
Trunk
Fix Committed
High
Hampapur Ajay

Bug Description

R2.1 Build 42 Ubuntu 14.04 multi-node icehouse setup

This is a tor scale setup with about 180 tors, and 40K VMIs.
1K VMIs in each VN. 10 physical ports on each tor, 100 VMIs on each logical port
While adding vmis incrementally, it was seen that api-server connection to zookeeper was lost
After about 10-15 mins, api-server returned to normalcy
Logs will be in http://10.204.216.50/Docs/bugs/#

root@nodei34:/var/log/contrail# neutron net-list
500-{u'NeutronError': {u'message': u'An unknown exception occurred.', u'type': u'NeutronException', u'detail': u''}}
root@nodei34:/var/log/contrail#

env.roledefs = {
    'all': [host2, host3, host4, host5, host6],
    'cfgm': [host2, host3],
    'openstack': [host2],
    'webui': [host3],
    'control': [host3, host4],
    'compute': [host5, host6],
    'collector': [host2, host3],
    'database': [host2, host3, host4],
    'toragent': [host6],
    'tsn': [host6],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38']
}
supervisor-config manually stopped on nodei35.

contrail-api-0-stdout.log :
======================

192.168.1.2 - - [2015-03-19 09:52:43] "POST /fqname-to-id HTTP/1.1" 200 156 0.003323
192.168.1.2 - - [2015-03-19 09:52:43] "GET /virtual-machine-interface/639ee13c-9df4-49d5-b042-2859776dce10?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 1998 0.006464
192.168.1.2 - - [2015-03-19 09:52:43] "POST /fqname-to-id HTTP/1.1" 200 156 0.003882
WARNING:api-0:Connection dropped: socket connection broken
WARNING:api-0:Transition to CONNECTING
INFO:api-0:Zookeeper connection lost
INFO:api-0:Connecting to 192.168.1.2:2181
192.168.1.2 - - [2015-03-19 09:53:04] "GET /virtual-machine-interface/857931f8-a335-4f6a-a357-fd0b18fcae98?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 1627 21.055019
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=4295044169, time_out=20000, session_id=237604193645428757, passwd='S\xef(\xeao\xfe\xeaL\x82\x86T\x1a/)\x18A', read_only=None)
127.0.0.1 - - [2015-03-19 09:53:04] "POST /virtual-machine-interfaces HTTP/1.1" 200 552 21.316497
WARNING:api-0:Session has expired
INFO:api-0:Zookeeper session lost, state: EXPIRED_SESSION
03/19/2015 09:53:06 AM [nodei34:contrail-api:Config:0]: SANDESH: CONNECT TO COLLECTOR: True
INFO:nodei34:contrail-api:Config:0:SANDESH: CONNECT TO COLLECTOR: True
03/19/2015 09:53:17 AM [nodei34:contrail-api:Config:0]: SANDESH: CONNECT TO COLLECTOR: True
INFO:nodei34:contrail-api:Config:0:SANDESH: CONNECT TO COLLECTOR: True
03/19/2015 09:53:17 AM [nodei34:contrail-api:Config:0]: SANDESH: Logging: False -> 1
INFO:nodei34:contrail-api:Config:0:SANDESH: Logging: False -> 1
03/19/2015 09:53:17 AM [nodei34:contrail-api:Config:0]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
INFO:nodei34:contrail-api:Config:0:SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
WARNING:keystoneclient.middleware.auth_token:Configuring auth_uri to point to the public identity endpoint is required; clients may not be able to authenticate against an admin endpoint
Bottle v0.12.0 server starting up (using GeventServer())...
Listening on http://0.0.0.0:8084/
Hit Ctrl-C to quit.

WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_NOTICE]: VncApiError: Auth token fetched from keystone.
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_NOTICE]: VncApiError: Connecting to ifmap on 192.168.1.2:8443 as api-server
Bottle v0.12.0 server starting up (using GeventServer())...
Listening on http://localhost:8095/
Hit Ctrl-C to quit.

ERROR:cfgm_common.ifmap.client:Uknown error sending IF-MAP message to server [Errno 111] Connection refused
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_NOTICE]: VncApiError: IFMAP connection ESTABLISHED
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_NOTICE]: VncApiError: Connecting to cassandra on ['192.168.1.2:9160', '192.168.1.3:9160', '192.168.1.4:9160']
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Keyspace names must be case-insensitively unique ("useragent" conflicts with "useragent")')
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "useragent_keyval_table" to keyspace "useragent"')
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Keyspace names must be case-insensitively unique ("config_db_uuid" conflicts with "config_db_uuid")')
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "obj_uuid_table" to keyspace "config_db_uuid"')
WARNING:nodei34:contrail-api:Config:0:No Connection: __default__ [SYS_WARN]: VncApiError: Warning! InvalidRequestException(why='Cannot add already existing column family "obj_fq_name_table" to keyspace "config_db_uuid"')
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Cassandra connection ESTABLISHED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to zookeeper on 192.168.1.2:2181,192.168.1.3:2181,192.168.1.4:2181
INFO:api-0:Connecting to 192.168.1.4:2181
INFO:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=20000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
INFO:api-0:Zookeeper connection established, state: CONNECTED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to rabbitmq on 192.168.1.2:5672 user guest
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: RabbitMQ connection ESTABLISHED
WARNING:nodei34:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Running greenlet _dbe_oper_subscribe
127.0.0.1 - - [2015-03-19 09:58:22] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 3374 0.002072
127.0.0.1 - - [2015-03-19 09:59:14] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 3374 0.002404
127.0.0.1 - - [2015-03-19 10:05:49] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 3374 0.003003
Bottle v0.12.0 server starting up (using GeventServer())...
Listening on http://0.0.0.0:9100/
Hit Ctrl-C to quit.

WARNING:keystoneclient.middleware.auth_token:Unable to find authentication token in headers
127.0.0.1 - - [2015-03-19 10:12:09] "GET / HTTP/1.1" 401 194 0.000826
WARNING:keystoneclient.middleware.auth_token:Unable to find authentication token in headers
127.0.0.1 - - [2015-03-19 10:12:09] "GET / HTTP/1.1" 401 194 0.000633
127.0.0.1 - - [2015-03-19 10:12:10] "GET / HTTP/1.1" 200 9743 1.000477
127.0.0.1 - - [2015-03-19 10:12:10] "GET / HTTP/1.1" 200 9743 0.652401
127.0.0.1 - - [2015-03-19 10:12:10] "POST /fqname-to-id HTTP/1.1" 200 156 0.011116
127.0.0.1 - - [2015-03-19 10:12:10] "GET /domains?count=False&detail=False HTTP/1.1" 200 280 0.005756
127.0.0.1 - - [2015-03-19 10:12:10] "POST /fqname-to-id HTTP/1.1" 200 156 0.005885
127.0.0.1 - - [2015-03-19 10:12:10] "GET /projects?count=False&detail=False HTTP/1.1" 200 469 0.003296
WARNING:keystoneclient.middleware.auth_token:Unable to find authentication token in headers
192.168.1.2 - - [2015-03-19 10:12:11] "GET / HTTP/1.1" 401 194 0.000655
WARNING:keystoneclient.middleware.auth_token:Unable to find authentication token in headers
192.168.1.2 - - [2015-03-19 10:12:12] "GET / HTTP/1.1" 401 194 0.001126
192.168.1.2 - - [2015-03-19 10:12:12] "GET / HTTP/1.1" 200 9919 0.276946
192.168.1.2 - - [2015-03-19 10:12:12] "GET /virtual-networks?count=False&detail=False HTTP/1.1" 200 12148 0.008781
192.168.1.2 - - [2015-03-19 10:12:12] "GET / HTTP/1.1" 200 9919 0.245284
192.168.1.2 - - [2015-03-19 10:12:12] "GET /routing-instances?count=False&detail=True HTTP/1.1" 200 69059 0.071517
192.168.1.2 - - [2015-03-19 10:12:12] "POST /fqname-to-id HTTP/1.1" 200 156 0.061192
-------------------------------
zookeeper logs on nodei35 (others too have similar logs )
=======================

2015-03-19 09:02:49,290 - WARN [SyncThread:2:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:2 took 1068ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2015-03-19 09:53:04,001 - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x34c23c157610015, timeout of 20000ms exceeded
2015-03-19 09:53:04,001 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x34c23c157610015
2015-03-19 09:53:08,559 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x24c23c147630017, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2015-03-19 09:53:08,560 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.2:38941 which had sessionid 0x24c23c147630017
2015-03-19 09:53:08,573 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x24c23c147630015, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2015-03-19 09:53:08,574 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.2:58058 which had sessionid 0x24c23c147630015
2015-03-19 09:53:26,000 - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x34c23c157610014, timeout of 20000ms exceeded
2015-03-19 09:53:26,001 - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x24c23c147630015, timeout of 20000ms exceeded
2015-03-19 09:53:26,001 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x34c23c157610014
2015-03-19 09:53:26,002 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x24c23c147630015
2015-03-19 09:53:28,000 - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x24c23c147630017, timeout of 20000m

tags: added: blocker
Changed in juniperopenstack:
assignee: nobody → Hampapur Ajay (hajay)
Revision history for this message
Vedamurthy Joshi (vedujoshi) wrote :

The same issue is seen even when booting a VM on a VN with 10K VMis. (with pre-allocated port)

tags: added: bms
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Will target this for 2.20 release onwards only

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/9435
Submitter: Vedamurthy Joshi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/9446
Submitter: Vedamurthy Joshi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/9447
Submitter: Vedamurthy Joshi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/9435
Committed: http://github.org/Juniper/contrail-controller/commit/75d71806cf0528bc7834ea33d6bfcfcaaacab461
Submitter: Zuul
Branch: master

commit 75d71806cf0528bc7834ea33d6bfcfcaaacab461
Author: Vedu Joshi <email address hidden>
Date: Thu Apr 23 12:17:19 2015 +0530

Increase timeout when config components talk to zookeeper

This change is needed in scaled environments where api-requests
take longer times
Closes-bug: #1433886

Change-Id: I715645a91d90944e182b368da89f8daf5cc155b5

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9446
Committed: http://github.org/Juniper/contrail-controller/commit/e61c3efcaa1eb88650b20ba183cca21c85ba2f50
Submitter: Zuul
Branch: R2.1

commit e61c3efcaa1eb88650b20ba183cca21c85ba2f50
Author: Vedu Joshi <email address hidden>
Date: Thu Apr 23 12:17:19 2015 +0530

Increase timeout when config components talk to zookeeper

This change is needed in scaled environments where api-requests
take longer times
Closes-bug: #1433886

Change-Id: I715645a91d90944e182b368da89f8daf5cc155b5

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9447
Committed: http://github.org/Juniper/contrail-controller/commit/39c7b7fc0b3b23105f18572eba784a998f7b2e53
Submitter: Zuul
Branch: R2.20

commit 39c7b7fc0b3b23105f18572eba784a998f7b2e53
Author: Vedu Joshi <email address hidden>
Date: Thu Apr 23 12:17:19 2015 +0530

Increase timeout when config components talk to zookeeper

This change is needed in scaled environments where api-requests
take longer times
Closes-bug: #1433886

Change-Id: I715645a91d90944e182b368da89f8daf5cc155b5

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.