[Ubuntu R2.20.64] TOR Agent Crash @ __gnu_cxx::__verbose_terminate_handler

Bug #1471101 reported by chhandak
This bug affects 2 people
Affects              Status          Importance   Assigned to    Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.20              Fix Committed   High         Manish Singh
  Trunk              Fix Committed   High         Manish Singh

Bug Description

Trigger: Observed the crash after restarting both control nodes and all IF-MAP servers.

Backtrace
------------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-15.c'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f6322bd7cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f6322bd7cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f6322bdb0d8 in __GI_abort () at abort.c:89
#2 0x00007f63234e26b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f63234e0836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f63234e0863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f63234e133f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x0000000000cdad77 in DBTableWalker::Worker::Run() ()
#7 0x0000000000dd2830 in TaskImpl::execute() ()
#8 0x00007f63237a6b3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f63237a2816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f63237a1f4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f632379e0ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f632379e2f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f63239c2182 in start_thread (arg=0x7f631864f700) at pthread_create.c:312
#14 0x00007f6322c9b47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Setup
------------------
host1 = 'root@10.204.217.118'
host2 = 'root@10.204.217.119'
host3 = 'root@10.204.217.120'
host4 = 'root@10.204.217.121'
host5 = 'root@10.204.217.122'
host6 = 'root@10.204.217.131'
host7 = 'root@10.204.217.123'
host8 = 'root@10.204.217.124'

env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6, host7, host8],
    'cfgm': [host1, host2, host3],
    'openstack': [host1, host2, host3],
    'webui': [host2],
    'control': [host1, host3],
    'compute': [host4, host5, host6, host7, host8],
    'tsn': [host4, host5, host7, host8],
    'toragent': [host4, host5, host7, host8],
    'collector': [host1, host3],
    'database': [host1, host2, host3],
    'build': [host_build],
}

Nischal Sheth (nsheth) wrote :

Might be worth checking whether this has the same root cause as bug 1468052.

Manish Singh (manishs) wrote :

Here a walk was started on a table that had already been deleted. The agent controls this: it deletes the route table itself as part of VRF delete, but it does not reset the table pointer to NULL. If a walk is then started on this VRF (to delete state, if any), the walker sees the non-NULL pointer and starts a route table walk as well, which eventually crashes.
One solution is to maintain a bitmap of non-deleted route tables for the VRF. This bitset should be consulted before starting any operation that uses a route table pointer.
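
A minimal sketch of the bitmap idea described above, not the actual agent code. All names here (VrfSketch, RouteTableSketch, RouteTableKind, StartRouteWalks) are hypothetical and chosen only for illustration. The table pointer is kept after delete, as in the agent, but a per-VRF bitset records which tables are still live, and the walk path consults the bitset instead of trusting a non-NULL pointer.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical route table kinds; the real agent's list may differ.
enum RouteTableKind { kInet4Unicast = 0, kInet4Multicast, kEvpn, kBridge, kRouteTableMax };

// Stand-in for a route table; only counts how many walks were started on it.
struct RouteTableSketch {
    int walks_started = 0;
};

class VrfSketch {
public:
    // Attach a table and mark it live.
    void SetRouteTable(RouteTableKind kind, RouteTableSketch *table) {
        assert(kind < kRouteTableMax);
        tables_[kind] = table;
        live_.set(kind);
    }

    // Called when the table is deleted. The pointer is deliberately left
    // in place (retained for debugging); only the live bit is cleared.
    void MarkRouteTableDeleted(RouteTableKind kind) {
        assert(kind < kRouteTableMax);
        live_.reset(kind);
    }

    // Returns the table only if it has not been deleted, else nullptr.
    RouteTableSketch *GetLiveRouteTable(RouteTableKind kind) const {
        assert(kind < kRouteTableMax);
        return live_.test(kind) ? tables_[kind] : nullptr;
    }

private:
    RouteTableSketch *tables_[kRouteTableMax] = {};
    std::bitset<kRouteTableMax> live_;
};

// Starts a walk on every live route table of the VRF and returns how many
// walks were started. Deleted tables are skipped even though their
// pointers are still non-NULL.
inline int StartRouteWalks(VrfSketch &vrf) {
    int started = 0;
    for (std::size_t i = 0; i < kRouteTableMax; ++i) {
        RouteTableSketch *table =
            vrf.GetLiveRouteTable(static_cast<RouteTableKind>(i));
        if (table == nullptr) {
            continue;  // never attached, or already deleted
        }
        ++table->walks_started;
        ++started;
    }
    return started;
}
```

With this shape, a VRF walk that picks up a VRF whose delete is still enqueued simply finds no live route tables and starts no route walks.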

Manish Singh (manishs)
Changed in juniperopenstack:
assignee: nobody → Manish Singh (manishs)
Changed in juniperopenstack:
importance: Undecided → High
milestone: none → r2.30-fcs
information type: Proprietary → Public
chhandak (chhandak) wrote :

Logs saved at http://mayamruga.englab.juniper.net/bugs/1471101

To access the core:

ssh to bhushana@10.204.216.50 Password bhu@123

cd /home/bhushana/Documents/technical/bugs/1471101

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/12291
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/12323
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12323
Committed: http://github.org/Juniper/contrail-controller/commit/436914e1ad8541c4770c116559a0687caeac0490
Submitter: Zuul
Branch: master

commit 436914e1ad8541c4770c116559a0687caeac0490
Author: Manish <email address hidden>
Date: Thu Jul 16 09:39:56 2015 +0530

Walk on deleted route table when VRF is deleted.

Problem:
When a VRF is marked for delete, its route tables are deleted from the agent, but
the table pointers are not set to NULL (they are retained for debugging). This
happens via Onzerorefcount, where the delete of the VRF entry (DB entry) is
enqueued later. A walk may be in progress on Vrftable, and since this VRF is
still in the queue and not yet deleted, the walk can pick it up. When that
happens, further route table walks are started in this VRF. Because the route
tables were already deleted, these table pointers are invalid, resulting in a
crash.

Solution:
On a route table walk, identify whether the table is deleted; if so, the walk
need not be done. The fix uses a bitmap and resets the table's entry in it once
the table is deleted.

Change-Id: I3ca5e6eb0eb4fb54ec873242b57a4389448c5279
Closes-bug: 1471101

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/12659
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12659
Committed: http://github.org/Juniper/contrail-controller/commit/24ee31ba848bb9e87c1cd036cd58f388f1e113ad
Submitter: Zuul
Branch: R2.20

commit 24ee31ba848bb9e87c1cd036cd58f388f1e113ad
Author: Manish <email address hidden>
Date: Thu Jul 16 09:39:56 2015 +0530

Walk on deleted route table when VRF is deleted.

Problem:
When a VRF is marked for delete, its route tables are deleted from the agent, but
the table pointers are not set to NULL (they are retained for debugging). This
happens via Onzerorefcount, where the delete of the VRF entry (DB entry) is
enqueued later. A walk may be in progress on Vrftable, and since this VRF is
still in the queue and not yet deleted, the walk can pick it up. When that
happens, further route table walks are started in this VRF. Because the route
tables were already deleted, these table pointers are invalid, resulting in a
crash.

Solution:
On a route table walk, identify whether the table is deleted; if so, the walk
need not be done. The fix uses a bitmap and resets the table's entry in it once
the table is deleted.

Change-Id: I3ca5e6eb0eb4fb54ec873242b57a4389448c5279
Closes-bug: 1471101

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22-dev

Review in progress for https://review.opencontrail.org/13927
Submitter: Vinay Vithal Mahuli (<email address hidden>)
