control-node to ifmap-server connection did not come up after restart

Bug #1454376 reported by Vedamurthy Joshi
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)
Series   Status          Importance   Assigned to    Milestone
R2.20    Fix Committed   High         Tapan Karwa
Trunk    Fix Committed   High         Tapan Karwa

Bug Description

Setup: R2.20 build 13, Ubuntu 14.04, OpenStack Juno, multinode.

This is a ToR-scale setup with 128 ToRs, 512 logical interfaces (LIFs) on each ToR, and 2 virtual machine interfaces (VMIs) on each LIF.
Each LIF on a ToR is part of a different virtual network (VN), 512 VNs in all.

In total:
128 ToRs
~64K LIFs
~128K VMIs
512 VNs
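
The totals follow directly from the per-ToR numbers. A quick check in Python (the variable names are illustrative only):

```python
# Per-device counts taken from the bug description.
TORS = 128
LIFS_PER_TOR = 512
VMIS_PER_LIF = 2

total_lifs = TORS * LIFS_PER_TOR          # 65,536, reported as ~64K
total_vmis = total_lifs * VMIS_PER_LIF    # 131,072, reported as ~128K

# Reported VN count. It equals LIFS_PER_TOR, which is consistent with each
# LIF on a given ToR being in a different VN.
total_vns = 512

print(total_lifs, total_vmis, total_vns)  # prints: 65536 131072 512
```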

For an unknown reason, the control node on nodei35 crashed. After it restarted, it could not establish a connection to the IF-MAP server on nodei34.
Because this is a scale setup, the IF-MAP server can be heavily loaded at times.

-----
2015-05-12 10:08:24.414 IFMapBestPeerTrace: PeerDown 192.168.1.2:8443:in_use 1 controller/src/ifmap/client/peer_server_finder.cc 202
-----
contrail-control.log:

2015-05-12 Tue 23:39:06:132.574 IST nodei35 [Thread 140658807375616, Pid 1034]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Operation canceled NewSessionResponseWait SsrcStart ifsm::EvResponseTimerExpired controller/src/ifmap/client/ifmap_state_machine.cc 915
2015-05-12 Tue 23:39:14:151.693 IST nodei35 [Thread 140658832566016, Pid 1034]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Operation canceled NewSessionResponseWait SsrcStart ifsm::EvResponseTimerExpired controller/src/ifmap/client/ifmap_state_machine.cc 915
2015-05-12 Tue 23:39:22:375.027 IST nodei35 [Thread 140659378697984, Pid 1034]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Operation canceled SubscribeResponseWait SsrcStart ifsm::EvResponseTimerExpired controller/src/ifmap/client/ifmap_state_machine.cc 915
2015-05-12 Tue 23:39:22:846.321 IST nodei35 [Thread 140659382896384, Pid 1034]: BGP [SYS_NOTICE]: BgpPeerNotificationLog: Bgp Peer 192.168.1.3:179::192.168.1.4:39259 SEND Notification with Code 6 and SubCode 3 ( Cease:Administrator has unconfigured the peer ) controller/src/bgp/bgp_session.cc 96
2015-05-12 Tue 23:39:30:390.763 IST nodei35 [Thread 140659395491584, Pid 1034]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Operation canceled NewSessionResponseWait SsrcStart ifsm::EvResponseTimerExpired controller/src/ifmap/client/ifmap_state_machine.cc 915
2015-05-12 Tue 23:39:38:405.759 IST nodei35 [Thread 140659378697984, Pid 1034]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Operation canceled NewSessionResponseWait SsrcStart ifsm::EvResponseTimerExpired controller/src/ifmap/client/ifmap_state_machine.cc 915
2015-05-12 Tue 23:39:46:421.477 IST nodei35 [Thread 140659424880384, Pid 1034]: IFMapStateMachine [SYS_WARN]: IFMapPeerConnError: Operation canceled NewSessionResponseWait SsrcStart ifsm::EvResponseTimerExpired controller/src/ifmap/client/ifmap_state_machine.cc 915
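
The warnings repeat at a steady cadence, and each time the session is cancelled in NewSessionResponseWait or SubscribeResponseWait because the response timer expired. The following Python sketch pulls the interval between retries out of the log lines above. The lines are shortened to the fields it needs, and the timestamp layout (HH:MM:SS:mmm.uuu) is an assumption read off the log, not a documented format.

```python
import re
from datetime import datetime

# IFMapPeerConnError lines from the contrail-control.log excerpt above,
# shortened to the fields this sketch needs.
LOG_LINES = [
    "2015-05-12 Tue 23:39:06:132.574 IST nodei35 IFMapPeerConnError: Operation canceled NewSessionResponseWait",
    "2015-05-12 Tue 23:39:14:151.693 IST nodei35 IFMapPeerConnError: Operation canceled NewSessionResponseWait",
    "2015-05-12 Tue 23:39:22:375.027 IST nodei35 IFMapPeerConnError: Operation canceled SubscribeResponseWait",
    "2015-05-12 Tue 23:39:30:390.763 IST nodei35 IFMapPeerConnError: Operation canceled NewSessionResponseWait",
    "2015-05-12 Tue 23:39:38:405.759 IST nodei35 IFMapPeerConnError: Operation canceled NewSessionResponseWait",
    "2015-05-12 Tue 23:39:46:421.477 IST nodei35 IFMapPeerConnError: Operation canceled NewSessionResponseWait",
]

# Assumption: the time field is HH:MM:SS:mmm.uuu (milliseconds, then microseconds).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) \w{3} (\d{2}):(\d{2}):(\d{2}):(\d{3})\.(\d{3}) .*"
    r"Operation canceled (\w+)"
)


def parse(line):
    """Return (timestamp, state) for one log line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    date, hh, mm, ss, ms, us, state = m.groups()
    ts = datetime.strptime(date, "%Y-%m-%d").replace(
        hour=int(hh), minute=int(mm), second=int(ss),
        microsecond=int(ms) * 1000 + int(us),
    )
    return ts, state


events = [parse(line) for line in LOG_LINES]
# Seconds between consecutive connection errors.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]

for (ts, state), gap in zip(events[1:], gaps):
    print(f"{ts.time()} {state:<24} +{gap:.1f}s")
```

On these six lines every gap comes out at roughly 8 seconds. The report does not say how that interval divides between the response timer and any reconnect delay, so treat it only as the observed retry cadence.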

env.roledefs = {
    'all': [host2, host3, host4, host5, host6],
    'cfgm': [host2, host3],
    'openstack': [host2],
    'webui': [host3],
    'control': [host3, host4],
    'compute': [host5, host6],
    'collector': [host2, host3],
    'database': [host2, host3, host4],
    'toragent': [host6],
    'tsn': [host6],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38']
}
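
Matching the roles against the hostnames shows which node plays which part. The Python sketch below covers a subset of the roles. It assumes that `env.hostnames['all']` is listed in the same order as `env.roledefs['all']`, which is typical for such testbed files but not stated in the report. The `host2` to `host6` values are placeholders, because the real host strings are not shown.

```python
# Placeholder host values (hypothetical): the real testbed binds host2..host6
# to connection strings that the report does not show.
host2, host3, host4, host5, host6 = "host2", "host3", "host4", "host5", "host6"

# Subset of env.roledefs from the report.
roledefs = {
    'all': [host2, host3, host4, host5, host6],
    'cfgm': [host2, host3],
    'control': [host3, host4],
    'compute': [host5, host6],
}
hostnames = ['nodei34', 'nodei35', 'nodei36', 'nodei37', 'nodei38']

# Assumption: hostnames are listed in the same order as roledefs['all'].
name_of = dict(zip(roledefs['all'], hostnames))


def nodes_for(role):
    """Return the hostnames assigned to a role."""
    return [name_of[h] for h in roledefs[role]]


for role in ('cfgm', 'control', 'compute'):
    print(role, nodes_for(role))
```

Under that assumption, nodei35 is both a config (cfgm) node and a control node, and nodei34 is the other config node. This matches the description: the control node on nodei35 connects to the IF-MAP server on nodei34.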

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10417
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10417
Committed: http://github.org/Juniper/contrail-controller/commit/d7a1f8450653fe82d1ca8f2a95773a1efe48e68b
Submitter: Zuul
Branch: R2.20

commit d7a1f8450653fe82d1ca8f2a95773a1efe48e68b
Author: Tapan Karwa <email address hidden>
Date: Fri May 15 08:22:38 2015 -0700

Change the response timeout from 5 seconds to 60 seconds.

Partial-Bug: 1454376

Change-Id: Ic1e5280762a02b5de50cad9f27a8c7f007c2e91a
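
The fix lengthens how long the IF-MAP client waits for a server response. The following Python toy model shows why that matters when the server is loaded. It is not the contrail-controller state machine (that lives in `ifmap_state_machine.cc`). The functions `attempt_connect` and `connect_with_retries` and the latency values are hypothetical. Each attempt sends the IF-MAP newSession and subscribe requests in turn. If either response takes longer than the timeout, the attempt is abandoned, much as `EvResponseTimerExpired` cancels the session in the log.

```python
def attempt_connect(latencies, timeout):
    """Simulate one connection attempt (hypothetical model).

    latencies: server response times in seconds for the newSession and
    subscribe requests, in that order.
    Returns 'Connected', or the *ResponseWait state in which the response
    timer expired.
    """
    waits = ("NewSessionResponseWait", "SubscribeResponseWait")
    for state, latency in zip(waits, latencies):
        if latency > timeout:
            return state  # timer fired before the response arrived
    return "Connected"


def connect_with_retries(latencies, timeout, max_attempts):
    """Retry until connected; return (final_state, attempts_used)."""
    state = None
    for attempt in range(1, max_attempts + 1):
        state = attempt_connect(latencies, timeout)
        if state == "Connected":
            return state, attempt
    return state, max_attempts


# A loaded server that takes 20 s to answer newSession and 12 s to answer
# subscribe (illustrative numbers only, not measured on this setup).
loaded = (20.0, 12.0)
print(connect_with_retries(loaded, timeout=5.0, max_attempts=6))
print(connect_with_retries(loaded, timeout=60.0, max_attempts=6))
```

The first call prints `('NewSessionResponseWait', 6)` and the second prints `('Connected', 1)`. In this model, when the server's latency stays above the timeout, retrying alone never gets the session up. Only a longer timeout lets the slow responses be accepted.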

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10615
Submitter: Tapan Karwa (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10615
Committed: http://github.org/Juniper/contrail-controller/commit/3899a431552ae3f6a577a30c55693bb9da2974fc
Submitter: Zuul
Branch: master

commit 3899a431552ae3f6a577a30c55693bb9da2974fc
Author: Tapan Karwa <email address hidden>
Date: Fri May 15 08:22:38 2015 -0700

Change the response timeout from 5 seconds to 60 seconds.

Closes-Bug: 1454376

Change-Id: Ic1e5280762a02b5de50cad9f27a8c7f007c2e91a
