contrail-control core seen at boost::detail::sp_counted_impl_p<TaskTrigger>::dispose()

Bug #1786154 reported by vimal
This bug affects 1 person.

Affects: Juniper Openstack (status tracked in Trunk)
  R5.0:  Fix Released  | Importance: Medium | Assigned to: Mahesh Sivakumar
  Trunk: Fix Committed | Importance: Medium | Assigned to: Mahesh Sivakumar

Bug Description

contrail-control core seen at boost::detail::sp_counted_impl_p<TaskTrigger>::dispose()

backtrace
----------------
gdb /usr/bin/contrail-control core.contrail-contro.1.nodem18.153376
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/contrail-control...Reading symbols from /usr/bin/contrail-control...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 110]
[New LWP 117]
[New LWP 111]
[New LWP 115]
[New LWP 118]
[New LWP 126]
[New LWP 130]
[New LWP 129]
[New LWP 134]
[New LWP 114]
[New LWP 112]
[New LWP 122]
[New LWP 132]
[New LWP 128]
[New LWP 124]
[New LWP 137]
[New LWP 125]
[New LWP 140]
[New LWP 143]
[New LWP 142]

[New LWP 157]
[New LWP 152]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-control'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f481f011ab7 in abort () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install contrail-control-5.0-176.el7.x86_64
(gdb) bt
#0 0x00007f481f011ab7 in abort () from /lib64/libc.so.6
#1 0x00007f481f009096 in __assert_fail_base () from /lib64/libc.so.6
#2 0x00007f481f009142 in __assert_fail () from /lib64/libc.so.6
#3 0x0000000000743b04 in TaskTrigger::~TaskTrigger() ()
#4 0x00000000006b6682 in boost::detail::sp_counted_impl_p<TaskTrigger>::dispose() ()
#5 0x000000000047cd2e in boost::detail::sp_counted_base::release() ()
#6 0x0000000000cde713 in ConfigCassandraPartition::~ConfigCassandraPartition() ()
#7 0x0000000000cde859 in ConfigCassandraPartition::~ConfigCassandraPartition() ()
#8 0x0000000000ce1cfe in ConfigCassandraClient::PostShutdown() ()
#9 0x0000000000cb1f31 in ConfigClientManager::PostShutdown() ()
#10 0x0000000000cb25ed in ConfigClientManager::InitConfigClient() ()
#11 0x0000000000743c97 in TaskTrigger::WorkerTask::Run() ()
#12 0x000000000073b90f in TaskImpl::execute() ()
#13 0x00007f481fdef8ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
#14 0x00007f481fdeb5b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
#15 0x00007f481fdeac8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
#16 0x00007f481fde867f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
#17 0x00007f481fde8879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
#18 0x00007f482000ae25 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f481f0d8bad in clone () from /lib64/libc.so.6

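The key frames are #3 through #6: ConfigCassandraPartition's destructor releases a boost::shared_ptr<TaskTrigger>, and the TaskTrigger destructor asserts because the trigger still has a scheduled run. A minimal toy model of that invariant (not the real TaskTrigger code; names and behavior are simplified for illustration):

#include <cassert>
#include <memory>

// Toy stand-in for TaskTrigger: destroying it while a run is still scheduled
// aborts, which is what frames #3-#5 of the backtrace correspond to.
class TriggerModel {
public:
    void Set() { scheduled_ = true; }          // a worker task has been queued
    void Run() { scheduled_ = false; }         // the queued task actually executed
    ~TriggerModel() { assert(!scheduled_); }   // fails if a run is still pending
private:
    bool scheduled_ = false;
};

int main() {
    auto trigger = std::make_shared<TriggerModel>();  // shared_ptr-owned, as in the core
    trigger->Set();     // the reader task gets scheduled...
    trigger.reset();    // ...and tearing the owner down before it runs hits the assert
    return 0;
}
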
logs:
----------

/cs-shared/bugs/1786154
/home/vappachan/logs/1786154

Image
----------------

queens-5.0-176

vimal (vappachan)
description: updated
description: updated
Jeba Paulaiyan (jebap)
tags: added: contrail-control contrail-networking
tags: added: sanityblocker
Revision history for this message
Ananth Suryanarayana (anantha-l) wrote :

Hi Mahesh, it looks like the assumption made by the code in ConfigClientManager::InitConfigClient() is not fully correct. While it is true that none of the other tasks mentioned can be running at the same time, there is no logic there to ensure that no task has already been triggered to run, especially the config_reader_ task, when reinit of the db is triggered, I guess due to signal USR1.

We have to add logic to check whether the config_reader_ task trigger has been triggered and, if so, pause. Later, whenever the config_reader_ task trigger completes, we have to wake up reinit again.
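
A rough sketch of that idea, purely illustrative: the names ReaderTrigger, IsPending(), NotifyOnCompletion() and ReinitConfigClient() are assumptions for this sketch, not the actual contrail-common/contrail-controller API.

#include <functional>
#include <utility>

// Hypothetical reader trigger that can report whether a run is pending and
// invoke a callback once the pending run finishes.
class ReaderTrigger {
public:
    bool IsPending() const { return pending_; }
    void NotifyOnCompletion(std::function<void()> cb) { on_complete_ = std::move(cb); }
    void Set() { pending_ = true; }        // a run has been scheduled
    void RunComplete() {                   // called by the worker when the run finishes
        pending_ = false;
        if (on_complete_) on_complete_();
    }
private:
    bool pending_ = false;
    std::function<void()> on_complete_;
};

// Reinit path: if the config_reader_ trigger is already scheduled, pause and let
// the completion callback wake reinit up later instead of tearing down now.
void ReinitConfigClient(ReaderTrigger *config_reader, std::function<void()> do_reinit) {
    if (config_reader->IsPending()) {
        config_reader->NotifyOnCompletion(do_reinit);  // wake up reinit later
        return;                                        // pause: triggers stay alive
    }
    do_reinit();                                       // safe to shut down and recreate
}

int main() {
    ReaderTrigger reader;
    reader.Set();                          // reader task already scheduled at reinit time
    ReinitConfigClient(&reader, [] { /* shut down and recreate the config db client */ });
    reader.RunComplete();                  // reinit now wakes up and proceeds
    return 0;
}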

Revision history for this message
Ananth Suryanarayana (anantha-l) wrote :

Changed milestone to R5.0.2, as this is a very corner-case scenario in which the DB reader task would have to be triggered at the same time that reinit is retriggered.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/45579
Submitter: Mahesh Sivakumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/45579
Committed: http://github.com/Juniper/contrail-common/commit/8ae5f21f9cbb7e4b5771db25fe2c2d7a0dd6d658
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 8ae5f21f9cbb7e4b5771db25fe2c2d7a0dd6d658
Author: Mahesh Sivakumar <email address hidden>
Date: Tue Aug 14 16:15:57 2018 -0700

Control node crash when handling config client reinit:
The config client can be reinitialized through a signal or when some config
params change. While processing the reinit, the config manager tries to
shut down and recreate the config db client. The config client and its
associated partition tasks cannot be running, since they are mutually
exclusive with the client init task, and this is a valid assumption.
However, it is possible that the tasks have been scheduled to run but have
yet to begin execution. If that is the case and we try to shut down the
config db client/partition, TaskTriggers assert if there are scheduled
tasks. This is not a problem for the WorkQueue and Task classes. Fix this
for TaskTrigger tasks by checking whether they have been scheduled and
retrying the config init.
UT will be done at a later time, when the test infra for the reinit code
is added. For now, verified that the config client tests pass.

Change-Id: Id6d9c6aed32d32688d88932a8ec6c85a5207ad28
Partial-Bug: 1786154
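
The merged change takes the retry route described in the commit message. A simplified sketch of that pattern follows; the names IsAnyTriggerScheduled() and ScheduleInitRetry() and the stubbed bodies are assumptions for illustration, not the actual patch.

#include <iostream>

// Simplified sketch: InitConfigClient backs off and retries while any of the old
// client's triggers is still scheduled, instead of destroying them and hitting
// the TaskTrigger destructor assert seen in the backtrace.
class ConfigClientManagerSketch {
public:
    void InitConfigClient() {
        if (IsAnyTriggerScheduled()) {   // an old partition's trigger is still set
            ScheduleInitRetry();         // re-run InitConfigClient() later
            return;
        }
        PostShutdown();                  // now safe: no scheduled triggers remain
        RecreateConfigDbClient();
    }

private:
    bool IsAnyTriggerScheduled() const { return pending_triggers_ > 0; }
    void ScheduleInitRetry() { std::cout << "old triggers still scheduled, retrying init later\n"; }
    void PostShutdown() { std::cout << "deleting partitions and their triggers\n"; }
    void RecreateConfigDbClient() { std::cout << "recreating config db client\n"; }

    int pending_triggers_ = 0;           // stand-in for querying each partition's trigger
};

int main() {
    ConfigClientManagerSketch mgr;
    mgr.InitConfigClient();
    return 0;
}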

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/46183
Submitter: Mahesh Sivakumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/46183
Committed: http://github.com/Juniper/contrail-common/commit/39a4814b15cce9da235c00127a52cbf41e27125b
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit 39a4814b15cce9da235c00127a52cbf41e27125b
Author: Mahesh Sivakumar <email address hidden>
Date: Tue Aug 14 16:15:57 2018 -0700

Control node crash when handling config client reinit:
The config client can be reinitialized through a signal or when some config
params change. While processing the reinit, the config manager tries to
shut down and recreate the config db client. The config client and its
associated partition tasks cannot be running, since they are mutually
exclusive with the client init task, and this is a valid assumption.
However, it is possible that the tasks have been scheduled to run but have
yet to begin execution. If that is the case and we try to shut down the
config db client/partition, TaskTriggers assert if there are scheduled
tasks. This is not a problem for the WorkQueue and Task classes. Fix this
for TaskTrigger tasks by checking whether they have been scheduled and
retrying the config init.
UT will be done at a later time, when the test infra for the reinit code
is added. For now, verified that the config client tests pass.

Change-Id: Id6d9c6aed32d32688d88932a8ec6c85a5207ad28
Partial-Bug: 1786154
(cherry picked from commit 8ae5f21f9cbb7e4b5771db25fe2c2d7a0dd6d658)
