tor-agent crash at KSyncSM_DelAckWait

Bug #1451782 reported by Vedamurthy Joshi
Affects            Status         Importance  Assigned to           Milestone
Juniper Openstack  (status tracked in Trunk)
  R2.20            Fix Committed  High        Prabhjot Singh Sethi
  Trunk            Fix Committed  High        Prabhjot Singh Sethi

Bug Description

R2.20 Build 9 Ubuntu 14.04 Icehouse multi-node setup

Could not get a valid backtrace from the core file, but Prabhjot seems to have a lead...

Core file will be in http://10.204.216.50/Docs/bugs/#

root@nodek3:/var/crashes# strings core.contrail-tor-ag.1938.nodek3.1430507668 |grep -i assert
nt: controller/src/ksync/ksync_object.cc:1058: KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent): Assertion `0' faile
contrail-tor-agent: controller/src/ksync/ksync_object.cc:1058: KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent): Assertion `0' failed.
TBBmalloc: TBB_USE_ASSERT 0
__assert_fail
contrail-tor-agent: controller/src/ksync/ksync_object.cc:1058: KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent): Assertion `0' failed.
root@nodek3:/var/crashes#

Tags: bms vrouter
Changed in juniperopenstack:
assignee: Prakash Bailkeri (prakashmb) → Prabhjot Singh Sethi (prabhjot)
OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/9993
Submitter: Prabhjot Singh Sethi (<email address hidden>)

Changed in juniperopenstack:
status: New → In Progress
OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/9994
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/9993
Committed: http://github.org/Juniper/contrail-controller/commit/3e8598de6c6a49a64a4355c85fd6f69d157d10b6
Submitter: Zuul
Branch: master

commit 3e8598de6c6a49a64a4355c85fd6f69d157d10b6
Author: Prabhjot Singh Sethi <email address hidden>
Date: Wed May 6 14:47:50 2015 +0530

Fix TOR Agent Crash in KSync infra

Issue:
------
The following sequence of events occurred in the TOR Agent:
an active unicast remote route entry was deleted, and before
the ack for that delete was received, a stale entry was
created. This left the DB state stuck with the KSync entry,
so further DB operations on the same entry resulted in a
bad state.

Fix:
----
When a deleted KSyncEntry is converted to a stale entry,
remove its association with the DB entry, so that the state
machine is not triggered unless the KSyncEntry is reclaimed
by removing the stale entry flag.

Added a test case for this scenario.

Closes-Bug: 1451782
Change-Id: I1d4a86d5f73fbb882e6a34fe89b381d8c821aa4c

chhandak (chhandak) wrote :

Also seen in Build R2.20.10

#0 0x00007f11c90becc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f11c90c20d8 in __GI_abort () at abort.c:89
#2 0x00007f11c90b7b86 in __assert_fail_base (fmt=0x7f11c9208830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x19b6fc8 "0",
    file=file@entry=0x19b70d8 "controller/src/ksync/ksync_object.cc", line=line@entry=1058,
    function=function@entry=0x19b9980 <KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent)::__PRETTY_FUNCTION__> "KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent)") at assert.c:92
#3 0x00007f11c90b7c32 in __GI___assert_fail (assertion=0x19b6fc8 "0", file=0x19b70d8 "controller/src/ksync/ksync_object.cc", line=1058,
    function=0x19b9980 <KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent)::__PRETTY_FUNCTION__> "KSyncEntry::KSyncState KSyncSM_DelAckWait(KSyncObject*, KSyncEntry*, KSyncEntry::KSyncEvent)") at assert.c:101
#4 0x0000000001458fe2 in KSyncSM_DelAckWait (obj=0x7f1164545630, entry=0x7f115059f7f0, event=KSyncEntry::DEL_REQ) at controller/src/ksync/ksync_object.cc:1058
#5 0x000000000145936f in KSyncObject::NotifyEvent (this=0x7f1164545630, entry=0x7f115059f7f0, event=KSyncEntry::DEL_REQ) at controller/src/ksync/ksync_object.cc:1154
#6 0x0000000001456967 in KSyncObject::SafeNotifyEvent (this=0x7f1164545630, entry=0x7f115059f7f0, event=KSyncEntry::DEL_REQ) at controller/src/ksync/ksync_object.cc:203
#7 0x0000000001456893 in KSyncObject::Delete (this=0x7f1164545630, entry=0x7f115059f7f0) at controller/src/ksync/ksync_object.cc:189
#8 0x000000000145960f in KSyncObject::StaleEntryCleanupCb (this=0x7f1164545630) at controller/src/ksync/ksync_object.cc:1206
#9 0x00000000014650d7 in boost::_mfi::mf0<bool, KSyncObject>::operator() (this=0x7f1164545c80, p=0x7f1164545630) at /usr/include/boost/bind/mem_fn_template.hpp:49
#10 0x0000000001463d8b in boost::_bi::list1<boost::_bi::value<KSyncObject*> >::operator()<bool, boost::_mfi::mf0<bool, KSyncObject>, boost::_bi::list0> (this=0x7f1164545c90, f=...,
    a=...) at /usr/include/boost/bind/bind.hpp:243
#11 0x000000000146213d in boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, KSyncObject>, boost::_bi::list1<boost::_bi::value<KSyncObject*> > >::operator() (this=0x7f1164545c80)
    at /usr/include/boost/bind/bind_template.hpp:20
#12 0x0000000001460875 in boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, KSyncObject>, boost::_bi::list1<boost::_bi::value<KSyncObject*> > >, bool>::invoke (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:132
#13 0x000000000114d52c in boost::function0<bool>::operator() (this=0x7f1164545c78) at /usr/include/boost/function/function_template.hpp:767
#14 0x000000000192630f in Timer::TimerTask::Run (this=0x3849520) at controller/src/base/timer.cc:42
#15 0x000000000191338a in TaskImpl::execute (this=0x7f11c291bb40) at controller/src/base/task.cc:232
#16 0x00007f11c9c8db3a in ?? () from /usr/lib/libtbb.so.2
#17 0x00007f11c9c89816 in ?? () from /usr/lib/libtbb.so.2
#18 0x00007f11c9c88f4b in ?? () from /u...

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/9994
Committed: http://github.org/Juniper/contrail-controller/commit/3bba364e858bcea658aa40e2a7cf65a46912141c
Submitter: Zuul
Branch: R2.20

commit 3bba364e858bcea658aa40e2a7cf65a46912141c
Author: Prabhjot Singh Sethi <email address hidden>
Date: Wed May 6 14:47:50 2015 +0530

Fix TOR Agent Crash in KSync infra

Issue:
------
The following sequence of events occurred in the TOR Agent:
an active unicast remote route entry was deleted, and before
the ack for that delete was received, a stale entry was
created. This left the DB state stuck with the KSync entry,
so further DB operations on the same entry resulted in a
bad state.

Fix:
----
When a deleted KSyncEntry is converted to a stale entry,
remove its association with the DB entry, so that the state
machine is not triggered unless the KSyncEntry is reclaimed
by removing the stale entry flag.

Added a test case for this scenario.

Closes-Bug: 1451782
(cherry picked from commit 3e8598de6c6a49a64a4355c85fd6f69d157d10b6)

Change-Id: I39b197d44fb9372fcc498a77e7c4e1c7dcd179b4
