TOR Agent crash @ WorkQueue<OVSDB::OvsdbClientTcpSession::queue_msg>::ShutdownLocked

Bug #1426303 reported by Prabhjot Singh Sethi
This bug affects 1 person
Affects            Status         Importance  Assigned to           Milestone
Juniper Openstack  Fix Committed  High        Prabhjot Singh Sethi
R2.1               Fix Committed  High        Prabhjot Singh Sethi

Bug Description

On TCP session close, the TOR agent crashed while shutting down a work queue, because it failed to stop the currently running OVSDB::IO task.

#0 0x00007f923c536bb9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f923c539fc8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f923c52fa76 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f923c52fb22 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x000000000089ea15 in WorkQueue<OVSDB::OvsdbClientTcpSession::queue_msg>::ShutdownLocked (this=0x7f92281cdab0, delete_entries=<optimized out>) at controller/src/base/queue_task.h:342
#5 0x000000000089ca2f in Shutdown (delete_entries=true, this=0x7f92281cdab0) at controller/src/base/queue_task.h:140
#6 OVSDB::OvsdbClientTcpSession::~OvsdbClientTcpSession (this=0x7f92281be190, __in_chrg=<optimized out>) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_tcp.cc:106
#7 0x000000000089cdc9 in OVSDB::OvsdbClientTcpSession::~OvsdbClientTcpSession (this=0x7f92281be190, __in_chrg=<optimized out>) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_tcp.cc:111
#8 0x0000000000c2e91c in TcpServer::DeleteSession (this=0x7f92280024e0, session=<optimized out>) at controller/src/io/tcp_server.cc:157
#9 0x00000000008c6ce7 in OVSDB::intrusive_ptr_release (p=0x7f9230021a10) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_idl.cc:97
#10 0x00000000008cb67f in ~intrusive_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#11 operator= (rhs=0x0, this=0x7f922c169188) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:135
#12 OVSDB::OvsdbDBObject::EmptyTable (this=this@entry=0x7f922c1690d0) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_object.cc:108
#13 0x00000000008d8e69 in OVSDB::UnicastMacRemoteTable::EmptyTable (this=0x7f922c1690d0) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/unicast_mac_remote_ovsdb.cc:306
#14 0x00000000009aad7a in KSyncObject::NotifyEvent (this=0x7f922c1690d0, entry=0x7f922c16fb90, event=<optimized out>) at controller/src/ksync/ksync_object.cc:1111
#15 0x00000000009aaf4d in KSyncObject::SafeNotifyEvent (this=0x7f922c1690d0, entry=0x7f922c16fb90, event=KSyncEntry::DEL_REQ) at controller/src/ksync/ksync_object.cc:138
#16 0x00000000009ab992 in KSyncObjectManager::Process (this=0x7f9228002660, event=0x7f9230058590) at controller/src/ksync/ksync_object.cc:1216
#17 0x00000000009af78a in operator() (a0=0x7f9230058590, this=0x7f9235bbba90) at /usr/include/boost/function/function_template.hpp:767
#18 RunQueue (this=0x7f92300ab170) at controller/src/base/queue_task.h:53
#19 QueueTaskRunner<KSyncObjectEvent*, WorkQueue<KSyncObjectEvent*> >::Run (this=0x7f92300ab170) at controller/src/base/queue_task.h:36
#20 0x0000000000cfb5d0 in TaskImpl::execute (this=0x7f9235d87740) at controller/src/base/task.cc:232
#21 0x00007f923d73eb3a in ?? () from /usr/lib/libtbb.so.2
#22 0x00007f923d73a816 in ?? () from /usr/lib/libtbb.so.2
#23 0x00007f923d739f4b in ?? () from /usr/lib/libtbb.so.2
#24 0x00007f923d7360ff in ?? () from /usr/lib/libtbb.so.2
#25 0x00007f923d7362f9 in ?? () from /usr/lib/libtbb.so.2
#26 0x00007f923d95a182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#27 0x00007f923c5fafbd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 was still running the OVSDB::IO work queue at that point:

(gdb) thr 2
[Switching to thread 2 (Thread 0x7f92357bb700 (LWP 3220))]
#0 0x00000000008e17d9 in json_serialize_string ()
(gdb) bt
#0 0x00000000008e17d9 in json_serialize_string ()
#1 0x00000000008e151f in json_serialize_object_member ()
#2 0x00000000008e1668 in json_serialize_object ()
#3 0x00000000008e13e9 in json_serialize ()
#4 0x00000000008e1358 in json_to_ds ()
#5 0x00000000008e130f in json_to_string ()
#6 0x00000000008c6e6c in OVSDB::OvsdbClientIdl::SendJsonRpc (this=0x7f9230021a10, msg=<optimized out>) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_idl.cc:169
#7 0x00000000008c864d in OVSDB::OvsdbClientIdl::MessageProcess (this=0x7f9230021a10, buf=buf@entry=0x7f922c16e6a0 "{\"method\":\"echo\",\"id\":\"echo\",\"params\":[]}Log", len=len@entry=41)
    at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_idl.cc:211
#8 0x00000000008ca64e in OVSDB::OvsdbClientSession::MessageProcess (this=this@entry=0x7f92281be190, buf=buf@entry=0x7f922c16e6a0 "{\"method\":\"echo\",\"id\":\"echo\",\"params\":[]}Log", len=len@entry=41)
    at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_session.cc:31
#9 0x000000000089b1d2 in OVSDB::OvsdbClientTcpSession::ReceiveDequeue (this=0x7f92281be190, msg=...) at controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_tcp.cc:133
#10 0x000000000089d2e0 in operator() (a1=..., p=<optimized out>, this=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:165
#11 operator()<bool, boost::_mfi::mf1<bool, OVSDB::OvsdbClientTcpSession, OVSDB::OvsdbClientTcpSession::queue_msg>, boost::_bi::list1<OVSDB::OvsdbClientTcpSession::queue_msg&> > (a=<synthetic pointer>, f=..., this=<optimized out>)
    at /usr/include/boost/bind/bind.hpp:303
#12 operator()<OVSDB::OvsdbClientTcpSession::queue_msg> (a1=<synthetic pointer>, this=<optimized out>) at /usr/include/boost/bind/bind_template.hpp:32
#13 boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, OVSDB::OvsdbClientTcpSession, OVSDB::OvsdbClientTcpSession::queue_msg>, boost::_bi::list2<boost::_bi::value<OVSDB::OvsdbClientTcpSession*>, boost::arg<1> > >, bool, OVSDB::OvsdbClientTcpSession::queue_msg>::invoke (function_obj_ptr=..., a0=...) at /usr/include/boost/function/function_template.hpp:132
#14 0x000000000089ffa8 in operator() (a0=..., this=0x7f92357baa90) at /usr/include/boost/function/function_template.hpp:767
#15 RunQueue (this=0x7f922c0a51f0) at controller/src/base/queue_task.h:53
#16 QueueTaskRunner<OVSDB::OvsdbClientTcpSession::queue_msg, WorkQueue<OVSDB::OvsdbClientTcpSession::queue_msg> >::Run (this=0x7f922c0a51f0) at controller/src/base/queue_task.h:36
#17 0x0000000000cfb5d0 in TaskImpl::execute (this=0x7f9235d8c740) at controller/src/base/task.cc:232
#18 0x00007f923d73eb3a in ?? () from /usr/lib/libtbb.so.2
#19 0x00007f923d73a816 in ?? () from /usr/lib/libtbb.so.2
#20 0x00007f923d739f4b in ?? () from /usr/lib/libtbb.so.2
#21 0x00007f923d7360ff in ?? () from /usr/lib/libtbb.so.2
#22 0x00007f923d7362f9 in ?? () from /usr/lib/libtbb.so.2
#23 0x00007f923d95a182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#24 0x00007f923c5fafbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
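
The two backtraces together suggest a shutdown precondition being violated: thread 1 destroys the session and shuts down its work queue while thread 2 is still inside that queue's runner. The report does not show the condition checked at queue_task.h:342, so the following is only a minimal sketch under that assumption. WorkQueueSketch and all of its members are hypothetical names, not the real WorkQueue API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical, simplified model of a work queue with a runner flag.
// It is NOT the WorkQueue from controller/src/base/queue_task.h.
class WorkQueueSketch {
public:
    void Enqueue(const std::string &msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.push_back(msg);
    }

    // The runner marks itself active while it drains entries on its thread.
    void BeginRun() { running_.store(true); }
    void EndRun() { running_.store(false); }
    bool IsRunning() const { return running_.load(); }

    // Shutdown is only legal once no runner is active. Calling it while
    // another thread is still inside the runner trips the assertion, which
    // is the failure mode assumed here for frame #4 of the first backtrace.
    void Shutdown() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(!running_.load() && "runner still active on another thread");
        entries_.clear();
        shutdown_ = true;
    }

    bool IsShutdown() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return shutdown_;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<std::string> entries_;
    std::atomic<bool> running_{false};
    bool shutdown_ = false;
};
```

In this model, calling Shutdown() between BeginRun() and EndRun() aborts the process, much like the __assert_fail in frame #3. Calling it after EndRun() completes normally.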

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/7906
Committed: http://github.org/Juniper/contrail-controller/commit/4e6c23466d19feb3a4ee8010db961c904955b0a5
Submitter: Zuul
Branch: R2.1

commit 4e6c23466d19feb3a4ee8010db961c904955b0a5
Author: Prabhjot Singh Sethi <email address hidden>
Date: Fri Feb 27 02:34:38 2015 -0800

Fix TOR Agent Crash while workqueue shutdown

Issue:
------
The TOR agent was running two work queues to provide better
turnaround for OVSDB keepalive messages. On receiving a
message, the TCP reader enqueued it to a work queue running
in the OVSDB::IO task context, which only parsed the JSON
message and replied to keepalive messages. All other parsed
messages were enqueued to a second queue, processed in the
KSYNC task.

On TCP session close, the internal data structures are
cleaned up in the context of the KSYNC task, which also
deletes the session object. Since the OVSDB::IO task does
not run in exclusion to the KSYNC task, the work queue could
still be actively scheduled on another thread when Shutdown
was called. The queue then failed to stop, resulting in an
assertion.

Fix:
----
Remove the OVSDB::IO work queue and move the JSON parsing of
OVSDB messages and the replies to keepalive messages to the
IO reader task, which runs independently and already
provides all the necessary infrastructure and a clean exit.
This also avoids unnecessary task context switches.

Closes-Bug: 1426303
Change-Id: Ia7561a5712fc9532eb404235f0579a958f4a62ad
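
The shape of the fix described in the commit message can be sketched roughly as follows: the reader answers keepalives itself and forwards everything else to the one remaining processing queue, so there is no second work queue left to shut down. This is an illustration only. ReaderSketch, Handler, OnMessage and IsEchoRequest are hypothetical names not taken from ovsdb_client_tcp.cc, the JSON parsing is reduced to a substring check to keep it short, and the reply format assumes the standard OVSDB echo response.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch of the fixed flow; names are illustrative only.
class ReaderSketch {
public:
    typedef std::function<void(const std::string &)> Handler;

    ReaderSketch(Handler send, Handler enqueue_for_processing)
        : send_(std::move(send)), enqueue_(std::move(enqueue_for_processing)) {
        assert(send_ && enqueue_);
    }

    // Runs in the reader's own context: keepalives are answered at once,
    // everything else is handed to the single remaining processing queue.
    void OnMessage(const std::string &raw) {
        if (IsEchoRequest(raw)) {
            send_("{\"result\":[],\"error\":null,\"id\":\"echo\"}");
        } else {
            enqueue_(raw);
        }
    }

private:
    // Crude stand-in for real JSON parsing (assumption for brevity).
    static bool IsEchoRequest(const std::string &raw) {
        return raw.find("\"method\":\"echo\"") != std::string::npos;
    }

    Handler send_;
    Handler enqueue_;
};
```

Because the keepalive reply no longer depends on a separately scheduled task, session teardown only has to stop the reader and the processing queue.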

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/8031
Committed: http://github.org/Juniper/contrail-controller/commit/29b54c29cbbe8dab2c8818205ab32e70a96cbfe9
Submitter: Zuul
Branch: master

commit 29b54c29cbbe8dab2c8818205ab32e70a96cbfe9
Author: Prabhjot Singh Sethi <email address hidden>
Date: Fri Feb 27 02:34:38 2015 -0800

Fix TOR Agent Crash while workqueue shutdown

Issue:
------
The TOR agent was running two work queues to provide better
turnaround for OVSDB keepalive messages. On receiving a
message, the TCP reader enqueued it to a work queue running
in the OVSDB::IO task context, which only parsed the JSON
message and replied to keepalive messages. All other parsed
messages were enqueued to a second queue, processed in the
KSYNC task.

On TCP session close, the internal data structures are
cleaned up in the context of the KSYNC task, which also
deletes the session object. Since the OVSDB::IO task does
not run in exclusion to the KSYNC task, the work queue could
still be actively scheduled on another thread when Shutdown
was called. The queue then failed to stop, resulting in an
assertion.

Fix:
----
Remove the OVSDB::IO work queue and move the JSON parsing of
OVSDB messages and the replies to keepalive messages to the
IO reader task, which runs independently and already
provides all the necessary infrastructure and a clean exit.
This also avoids unnecessary task context switches.

Closes-Bug: 1426303
(cherry picked from commit 4e6c23466d19feb3a4ee8010db961c904955b0a5)

Conflicts:
 src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_tcp.cc
 src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_tcp.h

Change-Id: Ib899782aa6a2306e8739ef3ab65176eb9cbad637

Changed in juniperopenstack:
status: In Progress → Fix Committed
tags: added: releasenote
information type: Proprietary → Public
Changed in juniperopenstack:
importance: Undecided → High