vrouter crash in DBEntryBase::ClearState

Bug #1571584 reported by vageesan
This bug affects 1 person
Affects            Status         Importance  Assigned to   Milestone
Juniper Openstack  (status tracked in Trunk)
  R2.20            Won't Fix      Undecided   Manish Singh
  R2.21.x          New            Critical    Manish Singh
  R2.22.x          Won't Fix      Medium      Manish Singh
  R3.0             Fix Committed  Critical    Manish Singh
  Trunk            Fix Committed  Critical    Manish Singh

Bug Description

The vrouter crashed with the following backtrace during a solution test run.

3.0.2.0-26~kilo

10.84.5.112:/cs-shared/bugs/<bug-id>/

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f7d8b447cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#0 0x00007f7d8b447cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f7d8b44b0d8 in __GI_abort () at abort.c:89
#2 0x00007f7d8b440b86 in __assert_fail_base (fmt=0x7f7d8b591830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x1294c76 "state_.erase(listener) != 0", file=file@entry=0x1294c4b "controller/src/db/db_entry.cc", line=line@entry=77, function=function@entry=0x1294ea0 "void DBEntryBase::ClearState(DBTableBase*, DBEntryBase::ListenerId)") at assert.c:92
#3 0x00007f7d8b440c32 in __GI___assert_fail (assertion=0x1294c76 "state_.erase(listener) != 0", file=0x1294c4b "controller/src/db/db_entry.cc", line=77, function=0x1294ea0 "void DBEntryBase::ClearState(DBTableBase*, DBEntryBase::ListenerId)") at assert.c:101
#4 0x000000000106c11b in DBEntryBase::ClearState(DBTableBase*, int) ()
#5 0x0000000000c388f4 in ?? ()
#6 0x0000000000c38cf9 in PktFlowInfo::Process(PktInfo const*, PktControlInfo*, PktControlInfo*) ()
#7 0x0000000000c44acd in FlowHandler::Run() ()
#8 0x0000000000c40fe4 in Proto::ProcessProto(boost::shared_ptr<PktInfo>) ()
#9 0x0000000000c26958 in FlowProto::FlowEventHandler(FlowEvent*, FlowTable*) ()
#10 0x0000000000c2c2df in QueueTaskRunner<FlowEvent*, WorkQueue<FlowEvent*> >::Run() ()
#11 0x0000000001186b3c in TaskImpl::execute() ()
#12 0x00007f7d8c016b3a in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f7d8c012816 in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f7d8c011f4b in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f7d8c00e0ff in ?? () from /usr/lib/libtbb.so.2
#16 0x00007f7d8c00e2f9 in ?? () from /usr/lib/libtbb.so.2
#17 0x00007f7d8c232182 in start_thread (arg=0x7f7d84ace700) at pthread_create.c:312
#18 0x00007f7d8b50b47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

vageesan (vageesant)
Changed in juniperopenstack:
milestone: r3.0.2.0 → none
amit surana (asurana-t)
tags: added: blocker
Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Manish Singh (manishs)
Manish Singh (manishs) wrote :

(gdb) bt
#0 0x00007f7d8b447cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f7d8b44b0d8 in __GI_abort () at abort.c:89
#2 0x00007f7d8b440b86 in __assert_fail_base (fmt=0x7f7d8b591830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x1294c76 "state_.erase(listener) != 0", file=file@entry=0x1294c4b "controller/src/db/db_entry.cc", line=line@entry=77, function=function@entry=0x1294ea0 <DBEntryBase::ClearState(DBTableBase*, int)::__PRETTY_FUNCTION__> "void DBEntryBase::ClearState(DBTableBase*, DBEntryBase::ListenerId)") at assert.c:92
#3 0x00007f7d8b440c32 in __GI___assert_fail (assertion=0x1294c76 "state_.erase(listener) != 0", file=0x1294c4b "controller/src/db/db_entry.cc", line=77, function=0x1294ea0 <DBEntryBase::ClearState(DBTableBase*, int)::__PRETTY_FUNCTION__> "void DBEntryBase::ClearState(DBTableBase*, DBEntryBase::ListenerId)") at assert.c:101
#4 0x000000000106c11b in DBEntryBase::ClearState (this=0x7f7d643cf148, tbl_base=0x7f7d6cdf5600, listener=0) at controller/src/db/db_entry.cc:77
#5 0x0000000000ac0dd0 in AgentDBEntry::ClearRefState (this=<optimized out>) at controller/src/vnsw/agent/cmn/agent_db.cc:24
#6 0x0000000000c388f4 in intrusive_ptr_release (p=<optimized out>) at controller/src/vnsw/agent/cmn/agent_db.h:35
#7 ~intrusive_ptr (this=0x7f7d84acd628, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#8 ~ComponentNH (this=0x7f7d84acd620, __in_chrg=<optimized out>) at controller/src/vnsw/agent/oper/nexthop.h:1120
#9 SetInEcmpIndex (pkt=pkt@entry=0x7f7d5c959c40, flow_info=flow_info@entry=0x7f7d84acd8a0, out=0x7f7d84acd860, in=0x7f7d84acd820, in=0x7f7d84acd820) at controller/src/vnsw/agent/pkt/pkt_flow_info.cc:570
#10 0x0000000000c38cf9 in PktFlowInfo::Process (this=this@entry=0x7f7d84acd8a0, pkt=0x7f7d5c959c40, in=in@entry=0x7f7d84acd820, out=out@entry=0x7f7d84acd860) at controller/src/vnsw/agent/pkt/pkt_flow_info.cc:1452
#11 0x0000000000c44acd in FlowHandler::Run (this=0x7f7d8032dbb0) at controller/src/vnsw/agent/pkt/flow_handler.cc:112
#12 0x0000000000c40fe4 in RunProtoHandler (handler=0x7f7d8032dbb0, this=0x7f7d84acda90) at controller/src/vnsw/agent/pkt/proto.cc:51
#13 Proto::ProcessProto (this=this@entry=0x7f7d6cea05f0, msg_info=(boost::shared_ptr<PktInfo>) (count 5, weak count 1) 0x7f7d5c959c40) at controller/src/vnsw/agent/pkt/proto.cc:66
#14 0x0000000000c26958 in FlowProto::FlowEventHandler (this=0x7f7d6cea05f0, req=0x7f7d5c959d70, table=<optimized out>) at controller/src/vnsw/agent/pkt/flow_proto.cc:390
#15 0x0000000000c2c2df in operator() (a0=0x7f7d5c959d70, this=0x7f7d84acdb30) at /usr/include/boost/function/function_template.hpp:767
#16 RunQueue (this=0x7f7d38483bd0) at controller/src/base/queue_task.h:87
#17 QueueTaskRunner<FlowEvent*, WorkQueue<FlowEvent*> >::Run (this=0x7f7d38483bd0) at controller/src/base/queue_task.h:66
#18 0x0000000001186b3c in TaskImpl::execute (this=0x7f7d84c9fe40) at controller/src/base/task.cc:253
#19 0x00007f7d8c016b3a in ?? () from /usr/lib/libtbb.so.2
#20 0x00007f7d8c012816 in ?? () from /usr/lib/libtbb.so.2
#21 0x00007f7d8c011f4b in ?? () from /usr/lib/libtbb.so.2
#22 0x00007f7d8c00e0ff ...


Manish Singh (manishs) wrote :

Proper backtrace.

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19848
Submitter: Manish Singh (<email address hidden>)

Hari Prasad Killi (haripk) wrote :

Also seen in R2.21.x build 37:

(gdb) bt
#0 0x00007f16963c2cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f16963c60d8 in __GI_abort () at abort.c:89
#2 0x00007f16963bbb86 in __assert_fail_base (fmt=0x7f169650c830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x110c0db "state_count_[listener] == 0",
    file=file@entry=0x110c0b4 "controller/src/db/db_table.cc", line=line@entry=89,
    function=function@entry=0x110c480 <DBTableBase::ListenerInfo::Unregister(int)::__PRETTY_FUNCTION__> "void DBTableBase::ListenerInfo::Unregister(DBTableBase::ListenerId)")
    at assert.c:92
#3 0x00007f16963bbc32 in __GI___assert_fail (assertion=0x110c0db "state_count_[listener] == 0", file=0x110c0b4 "controller/src/db/db_table.cc", line=89,
    function=0x110c480 <DBTableBase::ListenerInfo::Unregister(int)::__PRETTY_FUNCTION__> "void DBTableBase::ListenerInfo::Unregister(DBTableBase::ListenerId)") at assert.c:101
#4 0x0000000000f193ef in Unregister (listener=2, this=0x7f168406cb00) at controller/src/db/db_table.cc:89
#5 DBTableBase::Unregister (this=0x7f168406c990, listener=2) at controller/src/db/db_table.cc:182
#6 0x0000000000b2553e in RouteFlowUpdate::WalkDone (partition=<optimized out>, info=0x7f16840a7880) at controller/src/vnsw/agent/pkt/flow_table.cc:2121
#7 0x0000000000f1e27b in operator() (a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#8 DBTableWalker::Worker::Run (this=0x7f1641ee2e20) at controller/src/db/db_table_walker.cc:151
#9 0x000000000102334c in TaskImpl::execute (this=0x7f168fc23c40) at controller/src/base/task.cc:253
#10 0x00007f1696f91b3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f1696f8d816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f1696f8cf4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f1696f890ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f1696f892f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f16971ad182 in start_thread (arg=0x7f168f247700) at pthread_create.c:312
#16 0x00007f169648647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/19848
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19848
Committed: http://github.org/Juniper/contrail-controller/commit/40e5eb0ab00c65c151f04bcbc7aafee8ba77a513
Submitter: Zuul
Branch: R3.0

commit 40e5eb0ab00c65c151f04bcbc7aafee8ba77a513
Author: Manish <email address hidden>
Date: Mon May 2 11:14:59 2016 +0530

Take interface nh reference after creating in VmInterface.

Previously, a DB entry was created with a refcount of 0. References to it are
now taken and released from multiple partitions (currently in flow). The
refcount manipulation itself is atomic, but the operations that follow it are
not. Without a self reference, the refcount can drop to 0 and come back to 1,
and the DB state can be deleted in the process even though the entry is not
marked for deletion and is still valid. This leads to a second call to free the
state, and that call asserts because no state is left.

A bit more explanation of why the refcount is 1 with no state in the case above:
say there are two referrers, a and b. a takes a reference and then releases it,
so the refcount goes to 1 and back to 0. While a has modified the refcount and
is going through ClearRefState (refcount being 0), b runs in parallel,
increments the refcount and tries to add state. Meanwhile a proceeds and deletes
the state. The result is a refcount of 1 with no state, which is wrong.

Closes-bug: #1571584
Change-Id: I2237971d2f36aac00fa76b90770943bdddc86c19
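
The interleaving described in the commit message can be replayed on a single thread. The sketch below is illustrative only: Entry, Acquire, Release, SetState and ReplayInterleaving are hypothetical names, not the Contrail classes, and it assumes that state is attached when the count goes from 0 to 1 and cleared when it goes from 1 to 0. ClearState returns false at the point where the real code would hit the assert.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical, simplified stand-in for a reference-counted DB entry.
struct Entry {
    std::atomic<int> refcount{0};
    std::map<int, int> state;  // listener id -> state marker

    // Atomic part of taking a reference; returns the new count.
    int Acquire() { return refcount.fetch_add(1) + 1; }
    // Atomic part of dropping a reference; returns the new count.
    int Release() { return refcount.fetch_sub(1) - 1; }
    // Follow-up steps that are NOT atomic with the count change.
    void SetState(int listener) { state[listener] = 1; }
    // Returns false when there is nothing to erase (the assert condition).
    bool ClearState(int listener) { return state.erase(listener) != 0; }
};

// Replays "a releases while b acquires" in a fixed order.
// Returns true if every ClearState call found state to erase.
bool ReplayInterleaving(bool self_reference) {
    Entry e;
    const int kListener = 0;

    if (self_reference) {  // the fix: the creator holds its own reference
        e.Acquire();
        e.SetState(kListener);
    }

    // a takes a reference; the first referrer attaches state.
    if (e.Acquire() == 1) e.SetState(kListener);

    // a drops its reference and, on reaching 0, decides to clear state,
    // but is preempted before it actually does so.
    bool a_will_clear = (e.Release() == 0);

    // b runs in between: it bumps the count and tries to add state.
    if (e.Acquire() == 1) e.SetState(kListener);

    // a resumes and removes the state that b now relies on.
    bool ok = true;
    if (a_will_clear) ok = e.ClearState(kListener);

    // b later drops its reference and tries to clear state again.
    if (e.Release() == 0) ok = e.ClearState(kListener) && ok;
    return ok;
}
```

In this sketch ReplayInterleaving(false) returns false, because b's final ClearState finds nothing to erase, while ReplayInterleaving(true) returns true, because the extra reference keeps the count from ever reaching 0 during the overlap.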

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/20084
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/20091
Submitter: Hari Prasad Killi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/20084
Committed: http://github.org/Juniper/contrail-controller/commit/2bebbbbc8406ff8bf426376643431b43047cbb4c
Submitter: Zuul
Branch: master

commit 2bebbbbc8406ff8bf426376643431b43047cbb4c
Author: Manish <email address hidden>
Date: Mon May 2 11:14:59 2016 +0530

Take interface nh reference after creating in VmInterface.

Previously, a DB entry was created with a refcount of 0. References to it are
now taken and released from multiple partitions (currently in flow). The
refcount manipulation itself is atomic, but the operations that follow it are
not. Without a self reference, the refcount can drop to 0 and come back to 1,
and the DB state can be deleted in the process even though the entry is not
marked for deletion and is still valid. This leads to a second call to free the
state, and that call asserts because no state is left.

A bit more explanation of why the refcount is 1 with no state in the case above:
say there are two referrers, a and b. a takes a reference and then releases it,
so the refcount goes to 1 and back to 0. While a has modified the refcount and
is going through ClearRefState (refcount being 0), b runs in parallel,
increments the refcount and tries to add state. Meanwhile a proceeds and deletes
the state. The result is a refcount of 1 with no state, which is wrong.

Closes-bug: #1571584
Change-Id: I2237971d2f36aac00fa76b90770943bdddc86c19
