schema/api-server can get disconnected from zk at times in a scale setup

Bug #1463270 reported by Vedamurthy Joshi
Affects                 Status          Importance   Assigned to      Milestone
Juniper Openstack (status tracked in Trunk)
  R2.20                 Fix Committed   High         Hampapur Ajay
  Trunk                 Fix Committed   High         Hampapur Ajay

Bug Description

R2.20 Build 45 Ubuntu 14.04 Juno multi-node setup

In this ToR-scale setup, we have ~64K VNs, 128 ToRs, and 128K VMIs on 500 VNs.

The schema transformer was seen to get disconnected from ZooKeeper every 5-10 minutes, after which it would become active on a new master.

schema-zk.log keeps showing entries like these:

06/09/2015 09:02:37 AM [schema]: Connection dropped: socket connection broken
06/09/2015 09:02:37 AM [schema]: Transition to CONNECTING
06/09/2015 09:02:37 AM [schema]: Zookeeper connection lost
06/09/2015 09:02:37 AM [schema]: Connecting to 192.168.1.3:2181
06/09/2015 09:02:37 AM [schema]: Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=21474837297, time_out=400000, session_id=238079962090569788, passwd='m\xd5r\xaf\xaaJ\xe3\xc1\xf7\xf3A\xdb\x124\xdaE', read_only=None)
06/09/2015 09:02:37 AM [schema]: Session has expired
06/09/2015 09:02:37 AM [schema]: Zookeeper session lost, state: EXPIRED_SESSION
06/09/2015 09:02:38 AM [schema]: Connecting to 192.168.1.4:2181
06/09/2015 09:02:38 AM [schema]: Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=400000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
06/09/2015 09:02:38 AM [schema]: Zookeeper connection established, state: CONNECTED
06/09/2015 09:02:38 AM [schema]: Sending request(xid=1): Exists(path='/schema-transformer', watcher=None)
06/09/2015 09:02:38 AM [schema]: Received response(xid=1): ZnodeStat(czxid=12884901891, mzxid=12884901891, ctime=1432623413370, mtime=1432623413370, version=0, cversion=2420, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=2, pzxid=21474837300)
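To check how often the session actually expires, the "Session has expired" lines in schema-zk.log can be extracted and the gaps between them measured. The following is only a rough sketch: the function names are made up for illustration, and it assumes every line starts with a timestamp in the MM/DD/YYYY HH:MM:SS AM/PM form seen above, followed by " [schema]".

```python
from datetime import datetime

# Marker text and timestamp layout taken from the log excerpt above.
EXPIRED_MARKER = "Session has expired"
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def expiry_times(lines):
    """Return the timestamp of every session-expiry line (hypothetical helper)."""
    times = []
    for line in lines:
        if EXPIRED_MARKER not in line:
            continue
        # The timestamp is everything before the " [schema]" tag.
        stamp = line.split(" [", 1)[0].strip()
        times.append(datetime.strptime(stamp, TS_FORMAT))
    return times


def expiry_gaps_seconds(lines):
    """Return the seconds elapsed between consecutive session expiries."""
    times = expiry_times(lines)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Passing the lines of schema-zk.log to expiry_gaps_seconds() should give gaps in the 300-600 second range if the 5-10 minute pattern described above holds.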

env.roledefs = {
    'all': [host2, host3, host4, host5, host6, host7, host8, host9],
    'cfgm': [host2, host3, host4],
    'openstack': [host2, host3, host4],
    'webui': [host3],
    'control': [host2, host3, host4],
    'compute': [host5, host6, host7, host8, host9],
    'collector': [host2, host3, host4],
    'database': [host2, host3, host4],
    'toragent': [host5, host6, host7, host9],
    'tsn': [host5, host6, host7, host9],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38', 'nodei28', 'nodei27', 'nodei30']
}

Tags: bms config scale
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11461
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11462
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11462
Committed: http://github.org/Juniper/contrail-controller/commit/564be77452b4971a7b3b08e35e676983e18fa7e4
Submitter: Zuul
Branch: R2.20

commit 564be77452b4971a7b3b08e35e676983e18fa7e4
Author: Sachin Bansal <email address hidden>
Date: Wed Jun 10 10:08:31 2015 -0700

Allow zookeeper heartbeat to expire if determined by the application

At init time, api server may be busy doing resync. At this time, it is ok if
zookeeper heartbeat expires. Added callback in zkclient to be called when kazoo
state goes to lost. API server sets it to a stub function initially, then resets
it after init is done.

Change-Id: Icd91cfb9637a8b0086a5486722a47e26a875acbb
Partial-Bug: 1463270
(cherry picked from commit 81f9efa126d7747e5313a61b55831ce01887e808)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11461
Committed: http://github.org/Juniper/contrail-controller/commit/81f9efa126d7747e5313a61b55831ce01887e808
Submitter: Zuul
Branch: master

commit 81f9efa126d7747e5313a61b55831ce01887e808
Author: Sachin Bansal <email address hidden>
Date: Wed Jun 10 10:08:31 2015 -0700

Allow zookeeper heartbeat to expire if determined by the application

At init time, api server may be busy doing resync. At this time, it is ok if
zookeeper heartbeat expires. Added callback in zkclient to be called when kazoo
state goes to lost. API server sets it to a stub function initially, then resets
it after init is done.

Change-Id: Icd91cfb9637a8b0086a5486722a47e26a875acbb
Partial-Bug: 1463270
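
The callback approach described in the commit message can be sketched roughly as below. This is not the actual zkclient or API server code: the class and function names are hypothetical, and the sketch deliberately avoids importing kazoo so that it runs standalone. It relies only on the fact that kazoo reports a lost session to state listeners as the string "LOST" (KazooState.LOST); with kazoo, on_state_change would be registered through KazooClient.add_listener().

```python
# Same string value as kazoo.client.KazooState.LOST.
LOST = "LOST"


def ignore_lost():
    """Stub callback: tolerate a lost session, e.g. while init/resync runs."""


class ZkStateWatcher:
    """Hypothetical wrapper that forwards 'session lost' to a swappable callback."""

    def __init__(self, lost_cb=ignore_lost):
        self._lost_cb = lost_cb

    def set_lost_cb(self, lost_cb):
        # Called once init is done, to install the real handler.
        self._lost_cb = lost_cb

    def on_state_change(self, state):
        # Listener body; only the LOST state triggers the callback.
        if state == LOST:
            self._lost_cb()


# Usage: the first LOST arrives during init and is ignored,
# the second arrives after the real handler has been installed.
events = []
watcher = ZkStateWatcher()
watcher.on_state_change(LOST)
watcher.set_lost_cb(lambda: events.append("session-lost handled"))
watcher.on_state_change(LOST)
print(events)
```

Keeping the callback swappable lets the application decide when an expired heartbeat is acceptable, which is the behaviour the commit message describes for the init phase.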
