tbb::assertion_failure exceptions

Bug #1701096 reported by Piyush Srivastava
This bug affects 1 person
Affects              Status                     Importance  Assigned to        Milestone
Juniper Openstack    Status tracked in Trunk
  R2.21.x            Fix Committed              Undecided   Hari Prasad Killi
  Trunk              Invalid                    Undecided   Hari Prasad Killi
OpenContrail         Fix Committed              Undecided   Hari Prasad Killi

Bug Description

Build: 2.21.3-71 Icehouse

We recently deployed a fix that uses the `enqueue` function in the TBB library instead of `spawn`. This seems to have helped, as we no longer see frequent TBB lockups. However, we recently ran into another issue where contrail-vrouter-agent appears to have crashed and been restarted by the supervisor daemon. Looking at the gcore, we saw the following:

(gdb) bt
#0 0x00002b04ed090625 in raise () from /lib64/libc.so.6
#1 0x00002b04ed091e05 in abort () from /lib64/libc.so.6
#2 0x00002b04ec687617 in tbb::assertion_failure (filename=0x2b04ec6a6ad8 "/ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/arena.cpp", line=50,
    expression=0x2b04ec6a6b1e "governor::is_set(&s)", comment=0x0) at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/tbb_assert_impl.h:80
#3 0x00002b04ec690c96 in tbb::internal::arena::process (this=0x2614380, s=...) at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/arena.cpp:50
#4 0x00002b04ec68f906 in tbb::internal::market::process (this=0x2610f00, j=...) at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/market.cpp:393
#5 0x00002b04ec68a4bc in tbb::internal::rml::private_worker::run (this=0x2611400) at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/private_server.cpp:263
#6 0x00002b04ec68a362 in tbb::internal::rml::private_worker::thread_routine (arg=0x2611400) at /ecbuilds/PipeLine/sb/third_party/tbb40_20111130oss/src/tbb/private_server.cpp:231
#7 0x00002b04ec452aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00002b04ed14693d in clone () from /lib64/libc.so.6
(gdb)

We haven't enabled the new TaskMonitor feature that was added in build 65, so this looks like a different problem from the earlier TBB issue. In the supervisor logs, we see:

2017-06-01 08:49:10,252 INFO stopped: contrail-vrouter-agent (terminated by SIGKILL)
2017-06-01 08:49:11,184 INFO spawned: 'contrail-vrouter-agent' with pid 5151
2017-06-01 08:49:16,928 INFO success: contrail-vrouter-agent entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2017-06-03 06:44:55,652 INFO exited: contrail-vrouter-agent (terminated by SIGABRT (core dumped); not expected)
2017-06-03 06:44:56,655 INFO spawned: 'contrail-vrouter-agent' with pid 79685
2017-06-03 06:45:01,670 INFO success: contrail-vrouter-agent entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)

Even though supervisor automatically restarts contrail-vrouter-agent, the new process is effectively dead. We had to manually restart supervisor-vrouter to get vrouter-agent back into a working state.
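
For anyone trying to spot the same pattern on other compute nodes, here is a minimal sketch that scans supervisor log lines for signal terminations of the agent and flags the ones supervisor marked "not expected". The regular expression is based only on the log lines quoted above, and the names `EXIT_RE`, `Termination`, and `find_terminations` are hypothetical helpers, not part of Contrail or supervisor.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumption: lines look like the supervisor entries quoted in this report, e.g.
# 2017-06-03 06:44:55,652 INFO exited: contrail-vrouter-agent (terminated by SIGABRT (core dumped); not expected)
EXIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"(?P<event>exited|stopped): (?P<proc>\S+) "
    r"\(terminated by (?P<signal>SIG[A-Z0-9]+)"
)


class Termination(NamedTuple):
    timestamp: str
    event: str        # "exited" or "stopped"
    process: str
    signal: str       # e.g. "SIGABRT"
    unexpected: bool  # True when supervisor logged "not expected"


def find_terminations(
    lines: Iterable[str], process: str = "contrail-vrouter-agent"
) -> Iterator[Termination]:
    """Yield one record per signal termination of `process` found in `lines`."""
    for line in lines:
        match = EXIT_RE.match(line.strip())
        if match is None or match.group("proc") != process:
            continue
        yield Termination(
            timestamp=match.group("ts"),
            event=match.group("event"),
            process=match.group("proc"),
            signal=match.group("signal"),
            unexpected="not expected" in line,
        )
```

Passing the lines of a node's supervisor log to `find_terminations` and filtering on `unexpected` should surface SIGABRT exits like the one on 2017-06-03. It only detects the crash itself; it cannot tell whether the respawned agent is actually functional.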

Tags: wpc
Vineet Gupta (vineetrf)
tags: added: wpc
Sachin Bansal (sbansal)
Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
Changed in opencontrail:
assignee: nobody → Hari Prasad Killi (haripk)
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/33627
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/33628
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/33765
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/33766
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33766
Committed: http://github.com/Juniper/contrail-controller/commit/dae3ce31c86abb586dd4d2d61e4e3b4063dd00e3
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit dae3ce31c86abb586dd4d2d61e4e3b4063dd00e3
Author: Nikhil B <email address hidden>
Date: Wed Jul 19 10:07:24 2017 +0530

Use newer version of libtbb even for centos 6.5+

There were some random crashes reported with older versions of libtbb.
We need to use newer version to avoid those crashes. SConscript should
be able to work with both old and new versions

Change-Id: Ibf3960323479b1d1afbd2a0cc1fce66c3d403050
Partial-Bug: #1701096

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/33628
Submitter: Nikhil Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/33628
Committed: http://github.com/Juniper/contrail-third-party/commit/249be28d0acfa5a44d2b2b9ae032fc4db521dbad
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit 249be28d0acfa5a44d2b2b9ae032fc4db521dbad
Author: Nikhil B <email address hidden>
Date: Fri Jul 14 12:19:00 2017 +0530

Use newer version of libtbb even for centos 6.5+

There were some random crashes reported with older versions of libtbb.
We need to use newer version to avoid those crashes

Change-Id: I9aff585fcd9f956550ac37081b82cd658d4974d9
Partial-Bug: #1701096

Changed in opencontrail:
status: New → Fix Committed