agent crashed during GR testing

Bug #1615149 reported by Ananth Suryanarayana
This bug report is a duplicate of a private bug.
This bug affects 1 person
Affects            Status  Importance  Assigned to  Milestone
Juniper Openstack  New     High        RAVI KIRAN

Bug Description

/cs-shared/ananth/gr_agent/1/contrail-vrouter-agent.gz
/cs-shared/ananth/gr_agent/1/core.contrail-vroute.23127.a6s1.1471554785.gz

During GR testing, the agent crashed repeatedly when run with the production version of the contrail-vrouter-agent binary, with the following signature. With the debug version, however, the crash did not happen. I guess the production version of the binary has different timings (exposing race conditions) because of the optimized code generated by the compiler?

(gdb) bt
#0 0x00007fae1ec75c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fae1ec79028 in __GI_abort () at abort.c:89
#2 0x00007fae1ec6ebf6 in __assert_fail_base (
    fmt=0x7fae1edbf3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x1312f46 "index < bitmap_.size()",
    file=file@entry=0x1312fe0 "controller/src/vnsw/agent/cmn/index_vector.h",
    line=line@entry=73,
    function=function@entry=0x13183e0 <IndexVector<MplsLabel>::Update(unsigned long, MplsLabel*)::__PRETTY_FUNCTION__> "void IndexVector<EntryType>::Update(size_t, EntryType*) [with EntryType = MplsLabel; size_t = long unsigned int]") at assert.c:92
#3 0x00007fae1ec6eca2 in __GI___assert_fail (
    assertion=0x1312f46 "index < bitmap_.size()",
    file=0x1312fe0 "controller/src/vnsw/agent/cmn/index_vector.h", line=73,
    function=0x13183e0 <IndexVector<MplsLabel>::Update(unsigned long, MplsLabel*)::__PRETTY_FUNCTION__> "void IndexVector<EntryType>::Update(size_t, EntryType*) [with EntryType = MplsLabel; size_t = long unsigned int]") at assert.c:101
#4 0x0000000000ac9dd2 in Update (entry=0x7fadfc091960, index=<optimized out>,
    this=0x7fae10a60608) at controller/src/vnsw/agent/cmn/index_vector.h:73
#5 UpdateLabel (entry=0x7fadfc091960, label=<optimized out>, this=0x7fae10a60550)
    at controller/src/vnsw/agent/oper/mpls.h:179
#6 MplsTable::Add (this=0x7fae10a60550, req=0x7fae172f5510)
    at controller/src/vnsw/agent/oper/mpls.cc:53
#7 0x000000000118af21 in DBTable::Input (this=0x7fae10a60550, tbl_partition=
    0x7fae10a6a560, client=<optimized out>, req=0x7fae172f5510)
    at controller/src/db/db_table.cc:503
#8 0x0000000000ac8607 in MplsTable::CreateMcastLabel (this=0x7fae10a60550,
    label=<optimized out>, type=type@entry=Composite::L2COMP,
    component_nh_key_list=std::vector of length 2, capacity 2 = {...},
    vrf_name="default-domain:admin:test1:test1")
    at controller/src/vnsw/agent/oper/mpls.cc:403
#9 0x0000000000a723e9 in BridgeRouteEntry::ReComputeMulticastPaths (
    this=0x7fadfc08da10, path=<optimized out>, del=<optimized out>)
    at controller/src/vnsw/agent/oper/bridge_route.cc:797
#10 0x0000000000a61759 in AgentRouteTable::Input (this=0x7fadfc04fd10,
    part=0x7fadfc04ff40, client=<optimized out>, req=0x7fadfc08ac80)
    at controller/src/vnsw/agent/oper/agent_route.cc:368
#11 0x0000000000a6113b in AgentRouteTable::Input (this=0x7fadec76b2a0,
    part=0x7fadec76b4d0, client=0x0, req=0x7fadfc08ac80)
    at controller/src/vnsw/agent/oper/agent_route.cc:335
#12 0x000000000118a9ce in DBPartition::QueueRunner::Run (this=0x7fadfc04d600)
    at controller/src/db/db_partition.cc:208
#13 0x00000000012cc06f in TaskImpl::execute (this=0x7fae184abc40)
    at controller/src/base/task.cc:262
#14 0x00007fae1f844b3a in ?? () from /usr/lib/libtbb.so.2
#15 0x00007fae1f840816 in ?? () from /usr/lib/libtbb.so.2
#16 0x00007fae1f83ff4b in ?? () from /usr/lib/libtbb.so.2
#17 0x00007fae1f83c0ff in ?? () from /usr/lib/libtbb.so.2
#18 0x00007fae1f83c2f9 in ?? () from /usr/lib/libtbb.so.2
#19 0x00007fae1fa60184 in start_thread (arg=0x7fae172f6700) at pthread_create.c:312
#20 0x00007fae1ed3937d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
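
The assertion in frame #4, `index < bitmap_.size()`, fires when Update() is called with an index at or beyond the current size of the bitmap. The sketch below illustrates that kind of bounds check on a simplified container. It is not the Contrail IndexVector code: the class SketchIndexVector, the methods Insert, TryUpdate, At and Size, and the members in_use_ and entries_ are all hypothetical names made up for illustration, and the real class may size and grow its bitmap differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for an index-allocating table.
// NOT the Contrail IndexVector implementation.
template <typename EntryType>
class SketchIndexVector {
public:
    // Allocate the first free slot, growing the table if none is free.
    std::size_t Insert(EntryType *entry) {
        for (std::size_t i = 0; i < in_use_.size(); ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                entries_[i] = entry;
                return i;
            }
        }
        in_use_.push_back(true);
        entries_.push_back(entry);
        return in_use_.size() - 1;
    }

    // Same style of check as the one in the backtrace: aborts the process
    // when the index is outside the table (unless NDEBUG is defined).
    void Update(std::size_t index, EntryType *entry) {
        assert(index < in_use_.size());
        entries_[index] = entry;
    }

    // Non-aborting variant (an assumption, for illustration only):
    // reports an out-of-range index to the caller instead of asserting.
    bool TryUpdate(std::size_t index, EntryType *entry) {
        if (index >= in_use_.size()) {
            return false;
        }
        entries_[index] = entry;
        return true;
    }

    // Returns the entry stored at index, or nullptr if out of range.
    EntryType *At(std::size_t index) const {
        return index < entries_.size() ? entries_[index] : nullptr;
    }

    std::size_t Size() const { return in_use_.size(); }

private:
    std::vector<bool> in_use_;          // which slots are allocated
    std::vector<EntryType *> entries_;  // entry stored in each slot
};
```

With a table holding one entry, Update(5, ...) on this sketch would trip the assert in the same way as frame #4, while TryUpdate(5, ...) returns false. Whether the out-of-range label here comes from a race or from some other path is not established by the backtrace.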

Tags: vrouter
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → RAVI KIRAN (ravibk)
Ananth Suryanarayana (anantha-l) wrote :

Maybe I got an incorrect signature above (from a different core).
This is what I see now (from the core):

(gdb) bt
#0 0x00007ff224783c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ff224787028 in __GI_abort () at abort.c:89
#2 0x00007ff22477cbf6 in __assert_fail_base (fmt=0x7ff2248cd3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x2388103 "0", file=file@entry=0x23880c8 "controller/src/vnsw/agent/contrail/linux/pkt0_interface.cc", line=line@entry=53, function=function@entry=0x2388420 <Pkt0Interface::InitControlInterface()::__PRETTY_FUNCTION__> "virtual void Pkt0Interface::InitControlInterface()") at assert.c:92
#3 0x00007ff22477cca2 in __GI___assert_fail (assertion=0x2388103 "0", file=0x23880c8 "controller/src/vnsw/agent/contrail/linux/pkt0_interface.cc", line=53, function=0x2388420 <Pkt0Interface::InitControlInterface()::__PRETTY_FUNCTION__> "virtual void Pkt0Interface::InitControlInterface()") at assert.c:101
#4 0x00000000018ef72b in Pkt0Interface::InitControlInterface (this=0x7ff2140022f0) at controller/src/vnsw/agent/contrail/linux/pkt0_interface.cc:53
#5 0x0000000001b0e057 in ControlInterface::Init (this=0x7ff2140022f0, pkt_handler=0x7ff214addb70) at controller/src/vnsw/agent/pkt/control_interface.h:29
#6 0x0000000001b0d792 in PktModule::Init (this=0x7ff2140021c0, run_with_vrouter=true) at controller/src/vnsw/agent/pkt/pkt_init.cc:36
#7 0x00000000018f9d24 in ContrailInitCommon::InitModules (this=0x7ffc71edde80) at controller/src/vnsw/agent/init/contrail_init_common.cc:87
#8 0x00000000018f2de5 in AgentInit::InitModulesBase (this=0x7ffc71edde80) at controller/src/vnsw/agent/init/agent_init.cc:239
#9 0x00000000018f2751 in AgentInit::InitBase (this=0x7ffc71edde80) at controller/src/vnsw/agent/init/agent_init.cc:149
#10 0x00000000018f8bcf in boost::_mfi::mf0<bool, AgentInit>::operator() (this=0x4c71e18, p=0x7ffc71edde80) at /usr/include/boost/bind/mem_fn_template.hpp:49
#11 0x00000000018f8139 in boost::_bi::list1<boost::_bi::value<AgentInit*> >::operator()<bool, boost::_mfi::mf0<bool, AgentInit>, boost::_bi::list0> (this=0x4c71e28, f=..., a=...) at /usr/include/boost/bind/bind.hpp:243
#12 0x00000000018f793b in boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, AgentInit>, boost::_bi::list1<boost::_bi::value<AgentInit*> > >::operator() (this=0x4c71e18) at /usr/include/boost/bind/bind_template.hpp:20
#13 0x00000000018f7203 in boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, AgentInit>, boost::_bi::list1<boost::_bi::value<AgentInit*> > >, bool>::invoke (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:132
#14 0x00000000014f7ac6 in boost::function0<bool>::operator() (this=0x4c71e10) at /usr/include/boost/function/function_template.hpp:767
#15 0x00000000022f7fcc in TaskTrigger::WorkerTask::Run (this=0x4c72440) at controller/src/base/task_trigger.cc:22
#16 0x00000000022e6338 in TaskImpl::execute (this=0x7ff21dfebd40) at controller/src/base/task.cc:262
#17 0x00007ff225352b3a in ?? () from /usr/lib/libtbb.so.2
#18 0x00007ff22534e816 in ?? () from /usr/lib/libtbb.so.2
#19 0x00007ff22534df4b in ?? () from /u...
