vrouter-agent and tor-agent crash at SendProuterMsgFromPhyInterface on contrail-api restart in a scale setup

Bug #1428975 reported by Vedamurthy Joshi
Affects             Status          Importance   Assigned to        Milestone
Juniper Openstack   Fix Committed   High         Vedamurthy Joshi
R2.1                Fix Committed   High         Vedamurthy Joshi

Bug Description

R2.10 39, Ubuntu 14.04, multi-node setup

nodei38 is a node with 128 tor-agents (110 active TORs) and 11K VMIs (1K active endpoints)

There are two contrail-api nodes (nodei34 and nodei35).
When contrail-api was restarted on both of these nodes, one after the other, vrouter-agent and the tor-agents crashed.

Cores will be in http://10.204.216.50/Docs/bugs/#

root@nodei38:/var/crashes# ls -ltr
total 1107500
-rw------- 1 root root 162619392 Feb 27 10:17 core.contrail-tor-ag.2295.nodei38.1425012472
-rw------- 1 root root 153624576 Feb 27 10:18 core.contrail-tor-ag.23796.nodei38.1425012481
-rw------- 1 root root 158982144 Feb 27 10:25 core.contrail-tor-ag.24876.nodei38.1425012910
-rw------- 1 root root 153276416 Feb 27 10:48 core.contrail-tor-ag.2339.nodei38.1425014304
-rw------- 1 root root 2637926 Feb 27 11:32 core.contrail-tor-ag.2341.nodei38.1425016957.gz
-rw------- 1 root root 1868295 Feb 27 11:32 core.contrail-tor-ag.21087.nodei38.1425016971.gz
-rw------- 1 root root 1874133 Feb 27 11:33 core.contrail-tor-ag.21576.nodei38.1425016986.gz
-rw------- 1 root root 1855571 Feb 27 11:35 core.contrail-tor-ag.28710.nodei38.1425017134.gz
-rw------- 1 root root 174551040 Mar 6 13:04 core.contrail-tor-ag.2328.nodei38.1425627280
-rw------- 1 root root 171933696 Mar 6 13:04 core.contrail-tor-ag.2324.nodei38.1425627280
-rw------- 1 root root 174628864 Mar 6 13:04 core.contrail-tor-ag.2289.nodei38.1425627281
-rw------- 1 root root 175931392 Mar 6 13:04 core.contrail-tor-ag.2369.nodei38.1425627282
-rw------- 1 root root 172654592 Mar 6 13:04 core.contrail-tor-ag.2346.nodei38.1425627282
-rw------- 1 root root 184504320 Mar 6 13:04 core.contrail-tor-ag.2318.nodei38.1425627283
-rw------- 1 root root 791908352 Mar 6 13:04 core.contrail-vroute.2383.nodei38.1425627292
-rw------- 1 root root 252092416 Mar 6 13:08 core.contrail-vroute.8873.nodei38.1425627485
root@nodei38:/var/crashes#

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000c03e2c in ProuterUveTable::SendProuterMsgFromPhyInterface(Interface const*) ()
(gdb) bt
#0 0x0000000000c03e2c in ProuterUveTable::SendProuterMsgFromPhyInterface(Interface const*) ()
#1 0x0000000000c03ee1 in ProuterUveTable::DeleteLogicalInterface(Interface const*, LogicalInterface const*) ()
#2 0x0000000000c04491 in ProuterUveTable::InterfaceNotify(DBTablePartBase*, DBEntryBase*) ()
#3 0x0000000000deb412 in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#4 0x0000000000ded268 in DBTablePartBase::RunNotify() ()
#5 0x0000000000dea0cd in DBPartition::QueueRunner::Run() ()
#6 0x0000000000ed4b90 in TaskImpl::execute() ()
#7 0x00007ff254b1fb3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007ff254b1b816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007ff254b1af4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007ff254b170ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007ff254b172f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007ff254d3b182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007ff254013fbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) quit

Tags: bms vrouter
Vedamurthy Joshi (vedujoshi) wrote :

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000c03c7b in ProuterUveTable::EnqueueProuterMsg(PhysicalDevice const*) ()
(gdb) bt
#0 0x0000000000c03c7b in ProuterUveTable::EnqueueProuterMsg(PhysicalDevice const*) ()
#1 0x0000000000c041c3 in ProuterUveTable::AddLogicalInterface(Interface const*, LogicalInterface const*) ()
#2 0x0000000000c044a8 in ProuterUveTable::InterfaceNotify(DBTablePartBase*, DBEntryBase*) ()
#3 0x0000000000deb412 in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#4 0x0000000000ded268 in DBTablePartBase::RunNotify() ()
#5 0x0000000000dea0cd in DBPartition::QueueRunner::Run() ()
#6 0x0000000000ed4b90 in TaskImpl::execute() ()
#7 0x00007fe8a816db3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007fe8a8169816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007fe8a8168f4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fe8a81650ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fe8a81652f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fe8a8389182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fe8a7661fbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

Ashok Singh (ashoksr)
Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Ashok Singh (ashoksr)
Ashok Singh (ashoksr) wrote :

Reviewed: https://review.opencontrail.org/7907
Committed: http://github.org/Juniper/contrail-controller/commit/dcd844e7a8930afa0acf4b8cdfaa7f93e13dcaa1
Submitter: Zuul
Branch: R2.1

commit dcd844e7a8930afa0acf4b8cdfaa7f93e13dcaa1
Author: ashoksingh <email address hidden>
Date: Fri Feb 27 19:02:01 2015 +0530

Fix Agent crash in UVE code when out of order deletes are received.

Crash was happening because we were accessing an object pointer after it was deleted. As a fix we no longer store any object pointers in UVE data-structures. Instead store the required data in UVE data-structures.

Change-Id: Id744fbec18f54d644d122605976d91b3c77ba5b3
Closes-Bug: #1425415
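
For readers without the diff at hand, here is a minimal sketch of the idea described in the commit message: the UVE-side table keeps copies of the fields it needs, keyed by UUID, instead of pointers to DB entries. It is not the code from the commit. `PhysicalDeviceStub`, `LogicalInterfaceStub`, `ProuterUveStore` and `OutOfOrderDeleteIsSafe` are hypothetical names made up for this illustration, and the field layout is an assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-ins for the agent's DB entries (not the real classes).
struct PhysicalDeviceStub {
    std::string uuid;
    std::string name;
};

struct LogicalInterfaceStub {
    std::string uuid;
    std::string device_uuid;  // UUID of the owning physical device
};

// UVE-side record: holds copies of the data to report, never a pointer
// to the DB entry, so it cannot dangle when the entry is freed.
struct ProuterUveEntry {
    std::string name;
    std::set<std::string> logical_interfaces;
};

class ProuterUveStore {
public:
    void AddDevice(const PhysicalDeviceStub &dev) {
        entries_[dev.uuid].name = dev.name;  // copy, do not keep &dev
    }

    void DeleteDevice(const std::string &uuid) { entries_.erase(uuid); }

    // Returns true if the interface was recorded against a known device.
    bool AddLogicalInterface(const LogicalInterfaceStub &li) {
        auto it = entries_.find(li.device_uuid);
        if (it == entries_.end()) return false;
        it->second.logical_interfaces.insert(li.uuid);
        return true;
    }

    // Returns true if something was removed. If the device entry is already
    // gone (out-of-order delete), this is a harmless no-op.
    bool DeleteLogicalInterface(const std::string &device_uuid,
                                const std::string &li_uuid) {
        auto it = entries_.find(device_uuid);
        if (it == entries_.end()) return false;
        return it->second.logical_interfaces.erase(li_uuid) > 0;
    }

    std::size_t Size() const { return entries_.size(); }

private:
    std::map<std::string, ProuterUveEntry> entries_;
};

// Replays the out-of-order sequence: the device is deleted and freed
// before the delete for its logical interface arrives.
bool OutOfOrderDeleteIsSafe() {
    ProuterUveStore store;
    std::unique_ptr<PhysicalDeviceStub> dev(
        new PhysicalDeviceStub{"dev-1", "tor-1"});
    LogicalInterfaceStub li{"li-1", dev->uuid};

    store.AddDevice(*dev);
    bool added = store.AddLogicalInterface(li);

    store.DeleteDevice(dev->uuid);
    dev.reset();  // the DB object is gone; the store never pointed at it

    bool removed = store.DeleteLogicalInterface(li.device_uuid, li.uuid);
    return added && !removed && store.Size() == 0;
}
```

Calling `OutOfOrderDeleteIsSafe()` from a test or `main` replays the sequence. With a raw `const PhysicalDevice *` cached in the table, the last step would be the point where a freed object gets dereferenced, which is consistent with the backtraces above.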

Ashok Singh (ashoksr) wrote :

Reviewed: https://review.opencontrail.org/8299
Committed: http://github.org/Juniper/contrail-controller/commit/3609f2d3bce558197136c674002dfffe39d4046e
Submitter: Zuul
Branch: master

commit 3609f2d3bce558197136c674002dfffe39d4046e
Author: ashoksingh <email address hidden>
Date: Sat Feb 21 23:11:25 2015 +0530

Changed in juniperopenstack:
status: New → Fix Committed
assignee: Ashok Singh (ashoksr) → Vedamurthy Joshi (vedujoshi)
Ashok Singh (ashoksr) wrote :

Fix committed as part of bug #1425415 (for both R2.1 and mainline)

tags: added: bms