observed core on TSN node /var/crashes/core.contrail-vroute.1757.cd-st-lnxserver-08.1432285937

Bug #1458057 reported by Raj Sahu
This bug affects 1 person
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  Fix Committed  High        Praveen
R2.20              Fix Committed  High        Praveen

Bug Description

root@cd-st-lnxserver-08:/var/crashes# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-tor-agent-1 active
contrail-tor-agent-2 active
contrail-vrouter-agent active
contrail-vrouter-nodemgr EXITED

========Run time service failures=============
/var/crashes/core.contrail-vroute.1757.cd-st-lnxserver-08.1432285937
/var/crashes/core.contrail-vroute.3167.cd-st-lnxserver-08.1432327718
root@cd-st-lnxserver-08:/var/crashes#

After rebooting the TOR connected to the TSN, I see that broadcast traffic from the TSN is not replicated for a long time.

tags: added: vrouter
Hari Prasad Killi (haripk) wrote :

Both cores have the following backtrace:

#0 0x00007f8ed252f63c in std::string::assign(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x0000000001366ffa in PathSandeshData::set_peer (this=0x7f8eb3bfd5f0, val=...) at build/debug/vnsw/agent/oper/agent_types.h:2932
#2 0x00000000013659d1 in AgentPath::SetSandeshData (this=0x7f8eb8011500, pdata=...) at controller/src/vnsw/agent/oper/agent_path.cc:1069
#3 0x000000000138997e in BridgeRouteEntry::DBEntrySandesh (this=0x7f8e8c015020, sresp=0x7f8ebc0ab280, stale=false)
    at controller/src/vnsw/agent/oper/bridge_route.cc:749
#4 0x0000000001355062 in AgentLayer2RtSandesh::UpdateResp (this=0x7f8ebc0ab410, entry=0x7f8e8c015020) at controller/src/vnsw/agent/oper/agent_sandesh.cc:273
#5 0x0000000001357b4b in AgentSandesh::EntrySandesh (this=0x7f8ebc0ab410, entry=0x7f8e8c015020, first=0, last=99)
    at controller/src/vnsw/agent/oper/agent_sandesh.cc:722
#6 0x000000000136012c in boost::_mfi::mf3<bool, AgentSandesh, DBEntryBase*, int, int>::operator() (this=0x7f8ebc0ad070, p=0x7f8ebc0ab410,
    a1=0x7f8e8c015020, a2=0, a3=99) at /usr/include/boost/bind/mem_fn_template.hpp:393
#7 0x000000000135f9ea in boost::_bi::list4<boost::_bi::value<AgentSandesh*>, boost::arg<2>, boost::_bi::value<int>, boost::_bi::value<int> >::operator()<bool, boost::_mfi::mf3<bool, AgentSandesh, DBEntryBase*, int, int>, boost::_bi::list2<DBTablePartBase*&, DBEntryBase*&> > (this=0x7f8ebc0ad080, f=..., a=...)
    at /usr/include/boost/bind/bind.hpp:447
#8 0x000000000135f0a0 in boost::_bi::bind_t<bool, boost::_mfi::mf3<bool, AgentSandesh, DBEntryBase*, int, int>, boost::_bi::list4<boost::_bi::value<AgentSandesh*>, boost::arg<2>, boost::_bi::value<int>, boost::_bi::value<int> > >::operator()<DBTablePartBase*, DBEntryBase*> (this=0x7f8ebc0ad070,
    a1=@0x7f8eb3bfda90: 0x7f8e84003fd0, a2=@0x7f8eb3bfda88: 0x7f8e8c015020) at /usr/include/boost/bind/bind_template.hpp:61
#9 0x000000000135e78c in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::_mfi::mf3<bool, AgentSandesh, DBEntryBase*, int, int>, boost::_bi::list4<boost::_bi::value<AgentSandesh*>, boost::arg<2>, boost::_bi::value<int>, boost::_bi::value<int> > >, bool, DBTablePartBase*, DBEntryBase*>::invoke (function_obj_ptr=..., a0=0x7f8e84003fd0, a1=0x7f8e8c015020) at /usr/include/boost/function/function_template.hpp:132
#10 0x0000000001b61a86 in boost::function2<bool, DBTablePartBase*, DBEntryBase*>::operator() (this=0x7f8ebc0ab030, a0=0x7f8e84003fd0, a1=0x7f8e8c015020)
    at /usr/include/boost/function/function_template.hpp:767
#11 0x0000000001b60e8b in DBTableWalker::Worker::Run (this=0x7f8ebc0ab460) at controller/src/db/db_table_walker.cc:142
#12 0x0000000001caeaa2 in TaskImpl::execute (this=0x7f8ecb40ec40) at controller/src/base/task.cc:232

Hari Prasad Killi (haripk) wrote :

The peer is invalid and the VRF name is empty in the path.
The route is ff:ff:ff:ff:ff:ff.

Changed in juniperopenstack:
assignee: nobody → Manish Singh (manishs)
Changed in juniperopenstack:
milestone: none → r2.30-fcs
OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10759
Submitter: Praveen K V (<email address hidden>)

Changed in juniperopenstack:
importance: Undecided → High
Changed in opencontrail:
importance: Undecided → High
Changed in juniperopenstack:
assignee: Manish Singh (manishs) → Praveen (praveen-karadakal)
Changed in opencontrail:
assignee: Akhil Ranjan (aranjan) → Praveen (praveen-karadakal)
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10759
Committed: http://github.org/Juniper/contrail-controller/commit/ba144cc5153a099b12d08e7e32a57ab433c4f697
Submitter: Zuul
Branch: R2.20

commit ba144cc5153a099b12d08e7e32a57ab433c4f697
Author: Praveen K V <email address hidden>
Date: Sun May 24 16:51:21 2015 +0530

Fix access to bgp_peer after free

Bridge route for multicast MAC can contain multiple paths from
control-node.

When the connection from control-node breaks, we walk thru all route
tables and delete path from the peer. But, in DeletePathUsingKeyData we
delete only one path for a peer. This will result in a stale reference
to peer in the route.

Modified DeletePathUsingKeyData to delete all path from a peer.

Change-Id: Icd35349aa3ad3784351f884b533812e5c719e3ce
closes-bug:#1458057
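
The difference described in the commit message can be illustrated with a small standalone sketch. This is not the contrail-controller code: Peer, Path, Route and the three helper functions below are hypothetical stand-ins, and the sketch only assumes what the message states, namely that a route can hold several paths from one peer and that the old delete removed just one of them.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the agent's real classes.
struct Peer {
    std::string name;
};

struct Path {
    const Peer *peer;    // non-owning pointer to the peer that advertised the path
    std::string label;
};

struct Route {
    std::vector<Path> paths;
};

// Behaviour described as the bug: stop after the first path from the peer.
void DeleteFirstPathFromPeer(Route &route, const Peer *peer) {
    auto it = std::find_if(route.paths.begin(), route.paths.end(),
                           [peer](const Path &p) { return p.peer == peer; });
    if (it != route.paths.end()) {
        route.paths.erase(it);
    }
}

// Behaviour described as the fix: remove every path from the peer.
void DeleteAllPathsFromPeer(Route &route, const Peer *peer) {
    route.paths.erase(
        std::remove_if(route.paths.begin(), route.paths.end(),
                       [peer](const Path &p) { return p.peer == peer; }),
        route.paths.end());
}

// Number of paths on the route that still point at the given peer.
std::size_t CountPathsFromPeer(const Route &route, const Peer *peer) {
    return static_cast<std::size_t>(
        std::count_if(route.paths.begin(), route.paths.end(),
                      [peer](const Path &p) { return p.peer == peer; }));
}
```

If a route holds two paths from the same peer, DeleteFirstPathFromPeer leaves one path that still points at that peer. Once the peer object is freed, any later read through that pointer (for example while building an introspect response, as in the backtrace above) is a use after free. DeleteAllPathsFromPeer leaves no such path behind.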

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10859
Submitter: Praveen K V (<email address hidden>)

no longer affects: opencontrail
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10859
Committed: http://github.org/Juniper/contrail-controller/commit/de456bdf3aa2b693e5d5ba155c47ecd19179709e
Submitter: Zuul
Branch: master

commit de456bdf3aa2b693e5d5ba155c47ecd19179709e
Author: Praveen K V <email address hidden>
Date: Sun May 24 16:51:21 2015 +0530

Fix access to bgp_peer after free

Bridge route for multicast MAC can contain multiple paths from
control-node.

When the connection from control-node breaks, we walk thru all route
tables and delete path from the peer. But, in DeletePathUsingKeyData we
delete only one path for a peer. This will result in a stale reference
to peer in the route.

Modified DeletePathUsingKeyData to delete all path from a peer.

Change-Id: Icd35349aa3ad3784351f884b533812e5c719e3ce
closes-bug:#1458057

Changed in juniperopenstack:
status: New → Fix Committed