[Build R2.20.10 Juno] TOR Scale: Tor Agent crash @ OVSDB::VMInterfaceKSyncObject::~VMInterfaceKSyncObject()

Bug #1453064 reported by chhandak
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series  Status         Importance  Assigned to           Milestone
R2.20   Fix Committed  High        Prabhjot Singh Sethi
Trunk   Fix Committed  High        Prabhjot Singh Sethi

Bug Description

Trigger:
-------------
Observed the crash while upgrading the QFX image. The SSL connection with the QFX went down and came back up with the new image.

Backtrace
-----------------
(gdb) bt
#0 0x00007fb8b6ecacc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fb8b6ece0d8 in __GI_abort () at abort.c:89
#2 0x00007fb8b6ec3b86 in __assert_fail_base (fmt=0x7fb8b7014830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xdb3d38 "px != 0",
    file=file@entry=0xdc7918 "/usr/include/boost/smart_ptr/intrusive_ptr.hpp", line=line@entry=162,
    function=function@entry=0xdf8040 "T* boost::intrusive_ptr<T>::operator->() const [with T = OVSDB::OvsdbClientIdl]") at assert.c:92
#3 0x00007fb8b6ec3c32 in __GI___assert_fail (assertion=0xdb3d38 "px != 0", file=0xdc7918 "/usr/include/boost/smart_ptr/intrusive_ptr.hpp", line=162,
    function=0xdf8040 "T* boost::intrusive_ptr<T>::operator->() const [with T = OVSDB::OvsdbClientIdl]") at assert.c:101
#4 0x00000000006c7e9e in ?? ()
#5 0x0000000000945a6f in OVSDB::OvsdbDBObject::~OvsdbDBObject() ()
#6 0x0000000000958800 in OVSDB::VMInterfaceKSyncObject::~VMInterfaceKSyncObject() ()
#7 0x0000000000a2988f in KSyncObjectManager::Process(KSyncObjectEvent*) ()
#8 0x0000000000a2d922 in QueueTaskRunner<KSyncObjectEvent*, WorkQueue<KSyncObjectEvent*> >::Run() ()
#9 0x0000000000da7e30 in TaskImpl::execute() ()
#10 0x00007fb8b7a99b3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fb8b7a95816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fb8b7a94f4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fb8b7a910ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007fb8b7a912f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007fb8b7cb5182 in start_thread (arg=0x7fb89f3fc700) at pthread_create.c:312
#16 0x00007fb8b6f8e47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Setup
-----------
env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6],
    'cfgm': [host1, host2, host3],
    'openstack': [host1, host2, host3],
    'webui': [host2],
    'control': [host1, host3],
    'compute': [host4, host5, host6],
    'tsn': [host4, host5],
    'toragent': [host4, host5],
    'collector': [host1, host3],
    'database': [host1, host2, host3],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodei6', 'nodei7', 'nodei8', 'nodei9', 'nodei10', 'nodei19']
}

chhandak (chhandak) wrote :
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Prabhjot Singh Sethi (prabhjot)
Prabhjot Singh Sethi (prabhjot) wrote :

The issue occurs in a scaled setup when the connection comes up and then goes away immediately.

This results in a crash while trying to stop the walk on the DB table.

Changed in juniperopenstack:
status: New → In Progress
information type: Proprietary → Public
tags: added: scale
tags: added: blocker
OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10171
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10217
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10217
Committed: http://github.org/Juniper/contrail-controller/commit/4738e98773a217423079f0d036374d08674e0d2f
Submitter: Zuul
Branch: R2.20

commit 4738e98773a217423079f0d036374d08674e0d2f
Author: Prabhjot Singh Sethi <email address hidden>
Date: Thu May 14 03:31:47 2015 -0700

Fix TOR Agent Crash

Issue:
------
in a scaled setup when connection flaps in a short
duration, we try to stop the DB table walk in OVSDB
object destructor, but by that time we already would
have released client idl pointer, here it tries to
stop walk using NULL client idl pointer resulting
in this crash

Fix:
----
Stop DB table walk, before removing client idl reference
in EmptyTable.

Added test case to simulate

Closes-Bug: 1453064
Change-Id: I7fd0233acd8d2ce0d5ea094c396850d435149cd1
(cherry picked from commit fce762986222432d979e8d42476948e7b8020cb4)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/10171
Committed: http://github.org/Juniper/contrail-controller/commit/fce762986222432d979e8d42476948e7b8020cb4
Submitter: Zuul
Branch: master

commit fce762986222432d979e8d42476948e7b8020cb4
Author: Prabhjot Singh Sethi <email address hidden>
Date: Thu May 14 03:31:47 2015 -0700

Fix TOR Agent Crash

Issue:
------
in a scaled setup when connection flaps in a short
duration, we try to stop the DB table walk in OVSDB
object destructor, but by that time we already would
have released client idl pointer, here it tries to
stop walk using NULL client idl pointer resulting
in this crash

Fix:
----
Stop DB table walk, before removing client idl reference
in EmptyTable.

Added test case to simulate

Closes-Bug: 1453064
Change-Id: I7fd0233acd8d2ce0d5ea094c396850d435149cd1
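
The ordering problem described in the commit message can be illustrated with a minimal sketch. This is not the contrail-controller code: ClientIdl, DbObject, and all of their members are hypothetical stand-ins, and std::shared_ptr is used in place of boost::intrusive_ptr only to keep the example self-contained. It shows the fixed ordering, where the walk is stopped while the client idl reference is still held.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the OVSDB client idl; not the real class.
struct ClientIdl {
    int walks_stopped = 0;
    void StopWalk() { ++walks_stopped; }
};

// Hypothetical stand-in for an OVSDB DB object that walks a DB table.
class DbObject {
public:
    explicit DbObject(std::shared_ptr<ClientIdl> idl)
        : idl_(std::move(idl)), walk_active_(true) {}

    // Fixed ordering: stop the walk first, then drop the idl reference.
    void EmptyTable() {
        StopWalk();
        idl_.reset();
    }

    // The destructor still calls StopWalk(), but it is a no-op once
    // EmptyTable() has already stopped the walk.
    ~DbObject() { StopWalk(); }

    bool walk_active() const { return walk_active_; }
    bool has_idl() const { return idl_ != nullptr; }

private:
    void StopWalk() {
        if (!walk_active_) {
            return;
        }
        // Analogous to the "px != 0" assertion seen in the backtrace.
        assert(idl_ != nullptr);
        idl_->StopWalk();
        walk_active_ = false;
    }

    std::shared_ptr<ClientIdl> idl_;
    bool walk_active_;
};
```

If EmptyTable() in this sketch released idl_ without calling StopWalk() first, the destructor would reach the assertion with a null pointer, which mirrors the failure in the backtrace above.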
