tor-agent crash at DeleteLogicalInterface on tor scale setup

Bug #1425415 reported by Vedamurthy Joshi
This bug affects 1 person
Affects             Status         Importance  Assigned to  Milestone
Juniper Openstack   Fix Committed  High        Ashok Singh
R2.1                Fix Committed  High        Ashok Singh

Bug Description

R2.1 Build 36 Ubuntu 12.04.3 Icehouse

This is a tor scale setup with 110 TORs and 11K VMIs
128 tor-agent services run on nodei38

The core file will be available at http://10.204.216.51/Docs/bugs/#

gdb /usr/bin/contrail-tor-agent core.contrail-tor-ag.18557.nodei38.1424763436

#0 0x0000000000a307b5 in ProuterUveTable::DeleteLogicalInterface(Interface const*, LogicalInterface const*) ()
(gdb) bt
#0 0x0000000000a307b5 in ProuterUveTable::DeleteLogicalInterface(Interface const*, LogicalInterface const*) ()
#1 0x0000000000a33056 in ProuterUveTable::InterfaceNotify(DBTablePartBase*, DBEntryBase*) ()
#2 0x0000000000cc91aa in DBTableBase::RunNotify(DBTablePartBase*, DBEntryBase*) ()
#3 0x0000000000ccb92a in DBTablePartBase::RunNotify() ()
#4 0x0000000000cc7bd7 in DBPartition::QueueRunner::Run() ()
#5 0x0000000000da80f5 in TaskImpl::execute() ()
#6 0x00007f0da7d0ee52 in ?? () from /usr/lib/libtbb.so.2
#7 0x00007f0da7d0ac2d in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f0da7d0a0db in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f0da7d07c1f in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f0da7d07e59 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f0da7f25e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f0da6bd93fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()

Tags: scale vrouter
Revision history for this message
Vedamurthy Joshi (vedujoshi) wrote :

This was seen around the time contrail-api was restarted on both config nodes of the setup.

tags: added: scale
Changed in juniperopenstack:
assignee: Praveen (praveen-karadakal) → Ashok Singh (ashoksr)
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/7907
Submitter: Ashok Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/7907
Committed: http://github.org/Juniper/contrail-controller/commit/dcd844e7a8930afa0acf4b8cdfaa7f93e13dcaa1
Submitter: Zuul
Branch: R2.1

commit dcd844e7a8930afa0acf4b8cdfaa7f93e13dcaa1
Author: ashoksingh <email address hidden>
Date: Fri Feb 27 19:02:01 2015 +0530

Fix Agent crash in UVE code when out of order deletes are received.

The crash was happening because we were accessing an object pointer after the object was deleted. As a fix, we no longer store any object pointers in UVE data structures. Instead, the required data is stored in the UVE data structures.

Change-Id: Id744fbec18f54d644d122605976d91b3c77ba5b3
Closes-Bug: #1425415
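
The idea in this commit message (keep copied data in the UVE structures rather than pointers to DB entries) can be illustrated with a minimal sketch. This is not the code from the actual change: `LogicalInterfaceEntry` and `ProuterUveSketch` are hypothetical names invented for illustration, and the real agent classes carry far more state. The sketch only shows why a delete that arrives out of order becomes harmless once the bookkeeping is keyed by copied names.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for an agent DB entry; not the real Contrail class.
struct LogicalInterfaceEntry {
    std::string name;                // name of the logical interface
    std::string physical_interface;  // name of its parent physical interface
};

// Hypothetical UVE-side bookkeeping. It copies the names it needs and keeps
// no pointers to DB entries, so it never dereferences a freed object.
class ProuterUveSketch {
public:
    void AddLogicalInterface(const LogicalInterfaceEntry &itf) {
        logical_by_physical_[itf.physical_interface].insert(itf.name);
    }

    // Deleting the parent drops all of its children from the bookkeeping.
    void DeletePhysicalInterface(const std::string &physical) {
        logical_by_physical_.erase(physical);
    }

    // Works on plain strings. Returns true if an entry was erased, false if
    // the parent or the child had already been removed (out-of-order delete).
    bool DeleteLogicalInterface(const std::string &physical,
                                const std::string &logical) {
        auto it = logical_by_physical_.find(physical);
        if (it == logical_by_physical_.end()) {
            return false;  // parent already gone, nothing to do
        }
        bool erased = it->second.erase(logical) > 0;
        if (it->second.empty()) {
            logical_by_physical_.erase(it);
        }
        return erased;
    }

    std::size_t LogicalCount(const std::string &physical) const {
        auto it = logical_by_physical_.find(physical);
        return it == logical_by_physical_.end() ? 0 : it->second.size();
    }

private:
    // physical interface name -> names of its logical interfaces
    std::map<std::string, std::set<std::string> > logical_by_physical_;
};
```

With this layout, a logical-interface delete that arrives after its parent has been removed simply finds nothing to erase.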

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/8299
Submitter: Ashok Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/8299
Committed: http://github.org/Juniper/contrail-controller/commit/3609f2d3bce558197136c674002dfffe39d4046e
Submitter: Zuul
Branch: master

commit 3609f2d3bce558197136c674002dfffe39d4046e
Author: ashoksingh <email address hidden>
Date: Sat Feb 21 23:11:25 2015 +0530

In scaled scenario reduce the number of prouter UVEs sent from agent

In a scaled setup, the agent was observed to spend a lot of time sending UVEs. One of the issues identified is that when multiple DB change notifications are received, the agent ends up building and sending multiple prouter UVE messages (one for each change notification). To avoid this, the agent now builds a prouter UVE message only if a request to build the prouter UVE has not already been enqueued. This achieves state compression and is implemented using TaskTrigger. With these changes, the agent was observed to start about 35 seconds faster than it did without them.

Partial-Bug: #1424218
(cherry picked from commit 714103efe78012b781b119e1087a0c7cb49a5822)

Fix Agent crash in UVE code when out of order deletes are received.

The crash was happening because we were accessing an object pointer after the object was deleted. As a fix, we no longer store any object pointers in UVE data structures. Instead, the required data is stored in the UVE data structures.

Closes-Bug: #1425415
(cherry picked from commit dcd844e7a8930afa0acf4b8cdfaa7f93e13dcaa1)

Change-Id: I50fe8dd1eea4932bb6291f5bc3813f356d8ef70a
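
The first change in this commit (building the prouter UVE only if a build request is not already queued) is a coalescing pattern. The following is a minimal single-threaded sketch of that pattern, not Contrail's TaskTrigger: `CoalescingTrigger`, `JobQueue`, and `RunAll` are hypothetical names, and a real implementation running on a task scheduler would need thread-safe state.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical work queue standing in for the agent's task scheduler.
typedef std::queue<std::function<void()> > JobQueue;

// Hypothetical trigger: however many times Set() is called before the queued
// job runs, the callback is enqueued (and therefore executed) only once.
class CoalescingTrigger {
public:
    CoalescingTrigger(std::function<void()> fn, JobQueue *queue)
        : fn_(std::move(fn)), queue_(queue), pending_(false) {}

    void Set() {
        if (pending_) {
            return;  // a build is already queued; compress this notification
        }
        pending_ = true;
        queue_->push([this] {
            // Clear the flag first so that changes arriving while the
            // callback runs can schedule another build.
            pending_ = false;
            fn_();
        });
    }

private:
    std::function<void()> fn_;
    JobQueue *queue_;
    bool pending_;
};

// Drains the queue, running each job in FIFO order.
inline void RunAll(JobQueue *queue) {
    while (!queue->empty()) {
        std::function<void()> job = std::move(queue->front());
        queue->pop();
        job();
    }
}
```

In this sketch, calling `Set()` once per DB change notification results in a single UVE build per drain of the queue, regardless of how many notifications arrived in between.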

Changed in juniperopenstack:
status: New → Fix Committed
Ashok Singh (ashoksr)
Changed in juniperopenstack:
status: Fix Committed → In Progress
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : R2.1

Review in progress for https://review.opencontrail.org/8360
Submitter: Ashok Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/8360
Committed: http://github.org/Juniper/contrail-controller/commit/7023e6bc4e214a9cda4b0a5b9e76a639049f73b9
Submitter: Zuul
Branch: R2.1

commit 7023e6bc4e214a9cda4b0a5b9e76a639049f73b9
Author: ashoksingh <email address hidden>
Date: Mon Mar 16 12:37:26 2015 +0530

Fix contrail-tor-agent crash in Prouter UVE code.

When a PhysicalDevice delete notification is received before the notification for a Physical Interface associated with that PhysicalDevice, the Physical Interface notification handler was assuming that the UVE object corresponding to the PhysicalDevice is always present. Add a check for the presence of the UVE object corresponding to the PhysicalDevice before referring to it.

Change-Id: I38b81ec992d5385e2932bcdeb6eb999c78a1610b
Closes-Bug: #1425415
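
The presence check described in this commit message can be sketched as follows. This is an illustrative outline under assumptions, not the code from the actual change: `ProuterUveEntrySketch` and `ProuterUveTableSketch` are hypothetical names, and devices and interfaces are reduced to plain strings.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical per-device UVE state; the real UVE object holds more fields.
struct ProuterUveEntrySketch {
    std::set<std::string> physical_interfaces;
};

class ProuterUveTableSketch {
public:
    void AddDevice(const std::string &device) {
        entries_[device];  // creates an empty entry if none exists
    }

    void DeleteDevice(const std::string &device) {
        entries_.erase(device);
    }

    // Handles an add or delete notification for a physical interface.
    // Returns false and does nothing when the device's UVE entry is absent,
    // which happens if the device delete was processed first.
    bool PhysicalInterfaceNotify(const std::string &device,
                                 const std::string &itf, bool deleted) {
        auto it = entries_.find(device);
        if (it == entries_.end()) {
            return false;  // presence check: device UVE entry already gone
        }
        if (deleted) {
            it->second.physical_interfaces.erase(itf);
        } else {
            it->second.physical_interfaces.insert(itf);
        }
        return true;
    }

    std::size_t InterfaceCount(const std::string &device) const {
        auto it = entries_.find(device);
        return it == entries_.end() ? 0 : it->second.physical_interfaces.size();
    }

private:
    std::map<std::string, ProuterUveEntrySketch> entries_;
};
```

The lookup result is tested before use, so an interface notification that arrives after its device has been deleted is ignored rather than dereferencing a missing entry.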

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/8376
Submitter: Ashok Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/8376
Committed: http://github.org/Juniper/contrail-controller/commit/ed6cc4fc56a1225e063c84e35db449b999416e11
Submitter: Zuul
Branch: master

commit ed6cc4fc56a1225e063c84e35db449b999416e11
Author: ashoksingh <email address hidden>
Date: Mon Mar 16 12:37:26 2015 +0530

Fix contrail-tor-agent crash in Prouter UVE code.

When a PhysicalDevice delete notification is received before the notification for a Physical Interface associated with that PhysicalDevice, the Physical Interface notification handler was assuming that the UVE object corresponding to the PhysicalDevice is always present. Add a check for the presence of the UVE object corresponding to the PhysicalDevice before referring to it.

Change-Id: I38b81ec992d5385e2932bcdeb6eb999c78a1610b
Closes-Bug: #1425415
(cherry picked from commit 7023e6bc4e214a9cda4b0a5b9e76a639049f73b9)

Changed in juniperopenstack:
status: In Progress → Fix Committed