continuous-build18, multiple vrouter-agent cores seen while running sanity

Bug #1710286 reported by Sudheendra Rao
Affects: Juniper Openstack (status tracked in Trunk)
  Trunk
    Status: Fix Committed
    Importance: Critical
    Assigned to: Hari Prasad Killi

Bug Description

Multiple vrouter-agent cores were seen on continuous build18 mainline, on both Mitaka and Newton.

The backtraces are:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000bcc562 in AgentRouteWalker::VrfWalkDoneInternal(DBTableBase*) ()
#2 0x00000000016095db in DBTableWalkMgr::ProcessWalkDone() ()
#3 0x0000000001792377 in TaskTrigger::WorkerTask::Run() ()
#4 0x000000000178ab4d in TaskImpl::execute() ()
#5 0x00007f48f6408fdd in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#6 0x00007f48f64020dc in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#7 0x00007f48f6400fd3 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#8 0x00007f48f63fca91 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#9 0x00007f48f63fccf9 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#10 0x00007f48f66266ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f48f587e82d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

(gdb) bt
#0 0x00007fa67464c030 in ?? ()
#1 0x0000000001609203 in DBTableWalkMgr::InvokeWalkCb(DBTablePartBase*, DBEntryBase*) ()
#2 0x00000000016015ec in DBTable::WalkWorker::Run() ()
#3 0x000000000178ab4d in TaskImpl::execute() ()
#4 0x00007fa6a8996fdd in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#5 0x00007fa6a89900dc in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#6 0x00007fa6a898efd3 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#7 0x00007fa6a898aa91 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#8 0x00007fa6a898acf9 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#9 0x00007fa6a8bb46ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fa6a7e0c82d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

Core files are placed under the folder:
/cs-share/test_runs/<bug_id>

Setup details:
Compute: nodei24, nodei25 and nodei26
Cfgm0: nodei21

Tags: sanity vrouter
Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
Sudheendra Rao (sudheendra-k) wrote :

Backtrace on the Mitaka HA setup:

Core file is copied to /cs-shared/test_runs/1710286

(gdb) bt
#0  0x00007f692bd8ac37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f692bd8e028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f692bd83bf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f692bd83ca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x000000000088eda0 in ?? ()
#5  0x0000000000c55ad5 in VmInterface::StaticRoute::AddL3(Agent const*, VmInterface*) const ()
#6  0x0000000000c47fdf in VmInterfaceState::Update(Agent const*, VmInterface*, VmInterfaceState::Op, VmInterfaceState::Op) const ()
#7  0x0000000000c4c0b1 in VmInterface::StaticRouteList::UpdateList(Agent const*, VmInterface*, VmInterfaceState::Op, VmInterfaceState::Op) ()
#8  0x0000000000c565cf in VmInterface::ApplyConfig(bool, bool, bool, boost::asio::ip::address_v4 const&, unsigned char) ()
#9  0x0000000000c56997 in VmInterface::Resync(InterfaceTable const*, VmInterfaceData const*) ()
#10 0x0000000000b84b4a in AgentOperDBTable::OnChange(DBEntry*, DBRequest const*) ()
#11 0x000000000140b428 in DBTable::Input(DBTablePartition*, DBClient*, DBRequest*) ()
#12 0x0000000000bc38e9 in InterfaceTable::CreateVhost() ()
#13 0x0000000000cdab23 in ContrailInitCommon::CreateInterfaces() ()
#14 0x0000000000cd834e in AgentInit::SetResourceManagerReady() ()
#15 0x0000000000cbb3e4 in ResourceManager::ResourceManager(Agent*) ()
#16 0x0000000000cd83a3 in AgentInit::CreateResourceManager() ()
#17 0x0000000000cd8430 in AgentInit::InitBase() ()
#18 0x0000000001562ed7 in TaskTrigger::WorkerTask::Run() ()
#19 0x000000000155ca57 in TaskImpl::execute() ()
#20 0x00007f692c959b3a in ?? () from /usr/lib/libtbb.so.2
#21 0x00007f692c955816 in ?? () from /usr/lib/libtbb.so.2
#22 0x00007f692c954f4b in ?? () from /usr/lib/libtbb.so.2
#23 0x00007f692c9510ff in ?? () from /usr/lib/libtbb.so.2
#24 0x00007f692c9512f9 in ?? () from /usr/lib/libtbb.so.2
#25 0x00007f692cb75184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#26 0x00007f692be4e37d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

Setup:
Compute: nodem8, nodem9 and nodem10
Cfgm0: nodem14

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/34531
Submitter: Hari Prasad Killi (<email address hidden>)

Jeba Paulaiyan (jebap)
tags: added: sanity
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/34545
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/34545
Committed: http://github.com/Juniper/contrail-controller/commit/78980400c4995b7b53d8a9ee311dab06019875fb
Submitter: Zuul (<email address hidden>)
Branch: master

commit 78980400c4995b7b53d8a9ee311dab06019875fb
Author: Manish <email address hidden>
Date: Mon Aug 14 17:38:38 2017 +0530

Agent crash @ AgentRouteWalker::VrfWalkDoneInternal

Problem:
There are two types of agent route walker for each BGP peer:
one to notify and the other to delete. When the delete walker is started on a
peer going away, its walk done releases the walker. Walk done of the notify
walker can then result in a crash on an invalid pointer.

Solution:
Release the notify walker (stop its walk) when the delete walk is started.

Change-Id: Ibe83bf86d6fd0abf70a8feeda12a6befcac3830d
Closes-bug: #1710286
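
The ordering described in the commit message above can be illustrated with a small standalone sketch. This is not the contrail-controller code: Walker, WalkManager, Peer and RunPeerDownScenario are hypothetical stand-ins, and the use of std::weak_ptr to skip released walkers is an assumption made only to keep the example safe to run. It shows the idea of the fix: the notify walker is released when the delete walk starts, so its walk-done callback never fires against freed state.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a route walker; not the real AgentRouteWalker.
struct Walker {
    std::string name;
};

// Hypothetical stand-in for the walk manager that runs walk-done callbacks.
class WalkManager {
public:
    using DoneCb = std::function<void()>;

    // Queue a walk. Only a weak reference is kept, so a walker that the
    // owner has released is skipped instead of being called back.
    void Start(const std::shared_ptr<Walker> &walker, DoneCb done) {
        pending_.push_back(Entry{walker, std::move(done)});
    }

    // Run walk-done for every walker that is still alive.
    // Returns the number of callbacks actually invoked.
    int ProcessWalkDone() {
        std::vector<Entry> batch;
        batch.swap(pending_);
        int invoked = 0;
        for (auto &entry : batch) {
            if (entry.walker.expired()) {
                continue;  // walker was released (walk stopped): no callback
            }
            assert(entry.done);
            entry.done();
            ++invoked;
        }
        return invoked;
    }

private:
    struct Entry {
        std::weak_ptr<Walker> walker;
        DoneCb done;
    };
    std::vector<Entry> pending_;
};

// Hypothetical peer that owns one notify walker and one delete walker.
class Peer {
public:
    explicit Peer(WalkManager *mgr) : mgr_(mgr) {}

    void StartNotifyWalk() {
        notify_ = std::make_shared<Walker>(Walker{"notify"});
        mgr_->Start(notify_, [this] { ++notify_done_; });
    }

    void StartDeleteWalk() {
        // The fix from the commit message: release the notify walker
        // (stop its walk) before the delete walk is started.
        notify_.reset();
        delete_ = std::make_shared<Walker>(Walker{"delete"});
        // Walk done of the delete walker releases the walker.
        mgr_->Start(delete_, [this] { ++delete_done_; delete_.reset(); });
    }

    int notify_done() const { return notify_done_; }
    int delete_done() const { return delete_done_; }

private:
    WalkManager *mgr_;
    std::shared_ptr<Walker> notify_;
    std::shared_ptr<Walker> delete_;
    int notify_done_ = 0;
    int delete_done_ = 0;
};

struct ScenarioResult {
    int callbacks_run;
    int notify_done;
    int delete_done;
};

// Peer goes away while a notify walk is still pending.
ScenarioResult RunPeerDownScenario() {
    WalkManager mgr;
    Peer peer(&mgr);
    peer.StartNotifyWalk();
    peer.StartDeleteWalk();
    int run = mgr.ProcessWalkDone();
    return ScenarioResult{run, peer.notify_done(), peer.delete_done()};
}
```

In this sketch, calling RunPeerDownScenario() runs only the delete walker's walk-done callback; the notify walker's callback is skipped because the walker was released when the delete walk started.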
