On network disconnect-reconnect, few tor-agents not responding to their http ports
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R2.20 | Fix Committed | High | Megh Bhatt |
Trunk | Fix Committed | High | Megh Bhatt |
Bug Description
R2.20 Build 30 Ubuntu 14.04 Juno multi-node
In the tor-scale setup, there are 4 tor-agent/tsn nodes (i.e., two active/standby pairs), each running 64 tor-agents.
nodei38 and nodei28 are one such pair, serving TORs ovs-vm1 to ovs-vm64.
I disconnected the control/data link on nodei38.
contrail-
After about 20-30 minutes, I reconnected the link on nodei38.
It was then seen that 3 tor-agents (like contrail-
contrail-status showed them in the 'timeout' state.
root@nodei38:
root 8672 0.0 0.0 10460 936 pts/0 S+ 12:55 0:00 grep --color=auto agent-49
root 25622 2.7 0.0 2413912 139988 ? Sl 00:54 19:52 /usr/bin/
root@nodei38:
root@nodei38:
contrail-
contrail-
contrail-
root@nodei38:
root@nodei38:
^C
root@nodei38:
contrail- 25622 root 10u IPv4 28219150 0t0 TCP *:9058 (LISTEN)
root@nodei38:
tags: | added: analytics |
contrail-tor-agent is stuck in tbb::mutex::lock, called from TcpSession::IsClosed. Looking at the lock, it appears that the code is trying to take a lock that lives inside an already freed TcpSession.
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f76c3b28657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f76c3b28480 in __GI___pthread_mutex_lock (mutex=0x7f76980e6b98) at ../nptl/pthread_mutex_lock.c:79
#3 0x000000000100c7ec in tbb::mutex::lock (this=0x7f76980e6b98) at /usr/include/tbb/mutex.h:164
#4 0x000000000100c79e in tbb::mutex::scoped_lock::acquire (this=0x7fff60b7fc40, mutex=...) at /usr/include/tbb/mutex.h:105
#5 0x000000000100c759 in tbb::mutex::scoped_lock::scoped_lock (this=0x7fff60b7fc40, mutex=...) at /usr/include/tbb/mutex.h:91
#6 0x00000000012fce92 in TcpSession::IsClosed (this=0x7f76980e6b90) at controller/src/io/tcp_session.h:119
#7 0x0000000001859d81 in SandeshClient::SendSandesh (this=0x7f76b4070f70, snh=0x3dcdc40) at tools/sandesh/library/cpp/sandesh_client.cc:117
#8 0x000000000183f7e3 in Sandesh::SendEnqueue (this=0x3dcdc40) at tools/sandesh/library/cpp/sandesh.cc:591
#9 0x000000000183fa36 in Sandesh::Dispatch (this=0x3dcdc40, sconn=0x0) at tools/sandesh/library/cpp/sandesh.cc:611
#10 0x0000000001616f2e in XmppEventLog::Send (category=..., level=SandeshLevel::SYS_NOTICE, file=..., line=1314, f1=..., f2=..., f3=...,
f4=...) at build/debug/xmpp/sandesh/xmpp_state_machine_sandesh_types.h:504
#11 0x000000000161305d in XmppStateMachine::OnSessionEvent (this=0x7f765c00e5b0, session=0x7f7694088de0, event=TcpSession::CONNECT_FAILED)
at controller/src/xmpp/xmpp_state_machine.cc:1311
#12 0x0000000001631886 in boost::_mfi::mf2<void, XmppStateMachine, TcpSession*, TcpSession::Event>::operator() (this=0x7f7694088e98,
p=0x7f765c00e5b0, a1=0x7f7694088de0, a2=TcpSession::CONNECT_FAILED) at /usr/include/boost/bind/mem_fn_template.hpp:280
#13 0x00000000016300ca in boost::_bi::list3<boost::_bi::value<XmppStateMachine*>, boost::arg<1>, boost::arg<2> >::operator()<boost::_mfi::mf2<void, XmppStateMachine, TcpSession*, TcpSession::Event>, boost::_bi::list2<TcpSession*&, TcpSession::Event&> > (this=0x7f7694088ea8, f=...,
a=...) at /usr/include/boost/bind/bind.hpp:392
#14 0x000000000162edd8 in boost::_bi::bind_t<void, boost::_mfi::mf2<void, XmppStateMachine, TcpSession*, TcpSession::Event>, boost::_bi::list3<boost::_bi::value<XmppStateMachine*>, boost::arg<1>, boost::arg<2> > >::operator()<TcpSession*, TcpSession::Event> (this=0x7f7694088e98,
a1=@0x7fff60b80160: 0x7f7694088de0, a2=@0x7fff60b8015c: TcpSession::CONNECT_FAILED) at /usr/include/boost/bind/bind_template.hpp:61
#15 0x000000000162def1 in boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, XmppStateMachine, TcpSession*, TcpSession::Event>, boost::_bi::list3<boost::_bi::value<XmppStateMachine*>, boost::arg<1>, boost::arg<2> > >, void, TcpSession*, TcpSession::Event>::invoke (function_obj_ptr=..., a0=0x7f7694088de0, a1=TcpSession::CONNECT_FAILED)
at /usr/include/boost/function/function_template.hpp:153
#16 0x000000000182e424 in boost::function2<void, Tcp...
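
The failure mode described in the analysis above (a caller locking a mutex that lives inside a session object that has already been freed) can be illustrated with a small, self-contained sketch. This is not the Contrail code and not the committed fix: `Session`, `Client`, `set_session` and `Send` are hypothetical names, and `std::mutex` with `std::shared_ptr`/`std::weak_ptr` stands in for `tbb::mutex` and whatever ownership scheme the real `TcpSession` uses. The idea shown is only that the sender must confirm the session is still alive before it touches the session's lock.

```cpp
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a TCP session whose state is guarded by an
// internal mutex (std::mutex is used here instead of tbb::mutex).
class Session {
public:
    bool IsClosed() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return closed_;
    }
    void Close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
    }

private:
    mutable std::mutex mutex_;
    bool closed_ = false;
};

// Hypothetical sender that observes a session without owning it.
// Holding a raw Session* here would reproduce the hazard: once the owner
// frees the session, IsClosed() would lock a mutex in freed memory.
class Client {
public:
    void set_session(const std::shared_ptr<Session> &session) {
        session_ = session;
    }

    // Returns "sent", "closed", or "gone" (session already destroyed).
    std::string Send() const {
        // lock() yields an owning pointer, or null if the session is gone,
        // so the mutex is only touched while the object is kept alive.
        std::shared_ptr<Session> session = session_.lock();
        if (!session) {
            return "gone";
        }
        if (session->IsClosed()) {
            return "closed";
        }
        return "sent";
    }

private:
    std::weak_ptr<Session> session_;
};
```

With this arrangement, a `Send()` issued after the owner has released the session returns `"gone"` instead of blocking on (or corrupting) a mutex in freed memory. An alternative under the same assumptions would be for the sender to hold an owning reference for as long as it may use the session, at the cost of delaying the session's destruction.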