vrouter agent core seen at tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits

Bug #1783698 reported by vimal
This bug affects 2 people
Affects             Status          Importance   Assigned to    Milestone
Juniper Openstack   (status tracked in Trunk)
  R5.0              Fix Released    High         sangarshan p
  Trunk             Fix Committed   High         sangarshan p

Bug Description

vrouter agent core seen at tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>.

backtrace
-----------
gdb /usr/bin/contrail-vrouter-agent core.contrail-vroute.37800.nodem19
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/contrail-vrouter-agent...Reading symbols from /usr/bin/contrail-vrouter-agent...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 37831]
[New LWP 37832]
[New LWP 37825]
[New LWP 37829]
[New LWP 37826]
[New LWP 37830]
[New LWP 37800]
[New LWP 37827]
[New LWP 37828]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal 6, Aborted.
#0 0x00007f988331e277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install contrail-vrouter-agent-5.0-156.el7.x86_64
(gdb) bt
#0 0x00007f988331e277 in raise () from /lib64/libc.so.6
#1 0x00007f988331f968 in abort () from /lib64/libc.so.6
#2 0x00007f9883317096 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f9883317142 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000e94a4a in TaskImpl::execute() ()
#5 0x00007f9883ef58ca in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
#6 0x00007f9883ef15b6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
#7 0x00007f9883ef0c8b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
#8 0x00007f9883eee67f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
#9 0x00007f9883eee879 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
#10 0x00007f9884110e25 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f98833e6bad in clone () from /lib64/libc.so.6

image
--------

queens-5.0-156

core
------
/cs-shared/bugs/1783698/core.contrail-vroute.37800.nodem19.1532522003

vimal (vappachan)
Changed in juniperopenstack:
milestone: none → r5.0.1
vimal (vappachan)
description: updated
tags: added: sanityblocker
Sivakumar Ganapathy (hotlava51) wrote :

Assigned to Sangarshan.

vimal (vappachan) wrote :

This core has not been seen in the last 5 runs.

vimal (vappachan)
tags: added: sanity
removed: sanityblocker
sangarshan p (sangarshp) wrote :

Please also share the contrail-vrouter-agent log files when the crash is seen. Moving the defect to Incomplete for now.

Jeba Paulaiyan (jebap)
tags: added: contrail-networking
alok kumar (kalok) wrote :

This is seen in RHOSP13 setup with build rhel-queens-5.0-182.

(gdb) bt
#0 0x00007fbe66c1c207 in raise () from /lib64/libc.so.6
#1 0x00007fbe66c1d8f8 in abort () from /lib64/libc.so.6
#2 0x00007fbe66c15026 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007fbe66c150d2 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000d2e777 in VrfEntry::DeleteTimeout() ()
#5 0x0000000000e9d8d9 in Timer::TimerTask::Run() ()
#6 0x0000000000e956af in TaskImpl::execute() ()
#7 0x00007fbe677f396a in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*)
    () from /lib64/libtbb.so.2
#8 0x00007fbe677ef5a6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
#9 0x00007fbe677eec6b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
#10 0x00007fbe677ec65f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
#11 0x00007fbe677ec859 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
#12 0x00007fbe67a0edd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fbe66ce4b3d in clone () from /lib64/libc.so.6

Logs, core, and agent binary are copied to /cs-shared/bugs/1783698/rhosp13/

[kalok@nodem4 rhosp13]$ ls -l
total 209064
drwxr-xr-x 2 fedora fedora 4096 Jan 1 2014 contrail
-rwxr-xr-x 1 fedora fedora 25397920 Jan 1 2014 contrail-vrouter-agent
-rw------- 1 fedora fedora 188678144 Jan 1 2014 core.contrail-vroute.177.overcloud-novacompute-2.1533974014

sangarshan p (sangarshp) wrote :

The agent crashed due to a std::bad_cast exception. We need the log files to understand the possible exception cases. Please share /var/log/files when the issue is seen next time.

vimal (vappachan) wrote :

Same core is seen in queens-5.0-270

Logs are at: /cs-shared/bugs/1783698/270

gdb /usr/bin/contrail-vroute
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/contrail-vrouter-agent...Reading symbols from /usr/bin/contrail-vrouter-agent...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install contrail-vrouter-agent-5.0-2
(gdb) q
gdb /usr/bin/contrail-vrouter-agent core.contrail-vroute.28908.nodem20.1538154508
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/contrail-vrouter-agent...Reading symbols from /usr/bin/contrail-vrouter-agent...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 28944]
[New LWP 39590]
[New LWP 28946]
[New LWP 28947]
[New LWP 39591]
[New LWP 28942]
[New LWP 28948]
[New LWP 28943]
[New LWP 28949]
[New LWP 28908]
[New LWP 28945]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal 6, Aborted.
#0 0x00007fa53c894277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install contrail-vrouter-agent-5.0-2
(gdb) bt
#0 0x00007fa53c894277 in raise () from /lib64/libc.so.6
#1 0x00007fa53c895968 in abort () from /lib64/libc.so.6
#2 0x00007fa53c88d096 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007fa53c88d142 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000e9ae1a in TaskImpl::execute() ()
#5 0x00007fa53d68c66a in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
   from /lib/libtbb.so.2
#6 0x00007fa53d685f00 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) ()
#7 0x00007fa53d6849a3 in tbb::internal::market::process(rml::job&) ()
   from /lib/libtbb.so.2
#8 0x00007fa53d6809c7 in tbb::internal::rml::private_worker::run() ()
   from /lib/libtbb.so.2
#9 0x00007fa53d680c39 in tbb::internal::rml::private_worker::thread_routine(void*) ()
#10 0x00007fa53d451e25 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fa53c95cbad in clone () from /lib64/libc.so.6
(gdb)

vimal (vappachan) wrote :

Same core is seen in 5.0-279. Logs are updated in /cs-shared/bugs/1783698/279

[New LWP 28998]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal 6, Aborted.
#0 0x00007ff0508f8277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install contrail-vrouter-agent-5.0-279.el7
(gdb) bt
#0 0x00007ff0508f8277 in raise () from /lib64/libc.so.6
#1 0x00007ff0508f9968 in abort () from /lib64/libc.so.6
#2 0x00007ff0508f1096 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ff0508f1142 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000e9ae2a in TaskImpl::execute() ()
#5 0x00007ff0516f066a in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
   from /lib/libtbb.so.2
#6 0x00007ff0516e9f00 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) ()
#7 0x00007ff0516e89a3 in tbb::internal::market::process(rml::job&) ()
   from /lib/libtbb.so.2
#8 0x00007ff0516e49c7 in tbb::internal::rml::private_worker::run() ()
   from /lib/libtbb.so.2
#9 0x00007ff0516e4c39 in tbb::internal::rml::private_worker::thread_routine(void*) ()
#10 0x00007ff0514b5e25 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ff0509c0bad in clone () from /lib64/libc.so.6
(gdb)

tags: added: sanityblocker
Sivakumar Ganapathy (hotlava51) wrote :

Hi Sudhee,
With the instrumented build, we are hitting a different core (1786148).
The core is generated because of an exception, and when an exception is thrown we do not get the throwing stack trace from the core. We got relevant information only from the 5.0-279 build sanity run: the log file shows that the exception was thrown while the task Agent::controllerXmpp was running. We added traces in the possible paths to narrow down the issue, but we could not hit it with the instrumented build.

Since the problem is not reproduced with the instrumented build, DE is looking at the other bug, 1786148, while waiting for a core from the instrumented image for 1783698.
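
For readers wondering why the backtraces in this report stop at TaskImpl::execute() with no frame from the failing code, here is a minimal C++ sketch. It assumes (this is not confirmed by the report) that the task runner catches the exception and then asserts. RunTask and ThrowingBody are hypothetical names, not the agent's code, and the sketch returns a string instead of asserting so the behaviour can be observed without aborting.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a task runner; NOT the agent's actual TaskImpl.
// It reports the caught exception as a string instead of calling assert(0),
// so the effect can be checked without aborting the process.
std::string RunTask(const std::function<void()>& body) {
    try {
        body();
    } catch (const std::exception& e) {
        // When control reaches this handler the stack has already been
        // unwound: every frame between the throw site and RunTask is gone.
        // An assert(0) placed here would dump core with RunTask as the
        // innermost application frame, which is why only the log message
        // (not the core) can say where the exception came from.
        return std::string("exception caught in task: ") + e.what();
    }
    return "ok";
}

// Hypothetical task body that fails with the exception type named in this bug.
void ThrowingBody() { throw std::bad_cast(); }
```

Calling RunTask(ThrowingBody) yields a message that starts with "exception caught in task: ", and nothing in the caller's stack records which function threw. That is consistent with needing the log files, or a debugger breakpoint at the throw site, to locate the origin.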

sangarshan p (sangarshp) wrote :

Root-caused the issue; working on a fix. ETA: 18/10.

sangarshan p (sangarshp) wrote :
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/47103
Submitter: sangarshan p (<email address hidden>)

sangarshan p (sangarshp) wrote :

Running sanity with the fix; waiting for the sanity report before checking in the fix.

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/47103
Committed: http://github.com/Juniper/contrail-controller/commit/a7bbb3c655bd02ff31be07650e422d354494e58c
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit a7bbb3c655bd02ff31be07650e422d354494e58c
Author: sangarshp <email address hidden>
Date: Thu Oct 18 16:56:32 2018 +0530

Local IPv6 route handling is not done correctly

IPv6 routes are received from the controller with the VRF table MPLS label.
The IP address family is not checked and the address is always cast to V4.
Made changes to skip IPv6 route processing if the MPLS label
is a VRF table label.

Change-Id: I904a1e2604a73fde53c271cb49e4b6d74e84c4a3
Closes-Bug: #1783698
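
The failure mode and the fix described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in the commit: IpAddress, Route, ProcessRouteUnchecked and ProcessRouteChecked are hypothetical names, and IpAddress only mimics an address type whose to_v4() throws std::bad_cast when the stored address is not IPv4.

```cpp
#include <cassert>
#include <cstdint>
#include <typeinfo>

// Hypothetical, minimal address type (assumption for illustration only).
// to_v4() throws std::bad_cast when the address is not IPv4.
class IpAddress {
public:
    enum Family { kV4, kV6 };

    explicit IpAddress(Family family, uint32_t v4_bits = 0)
        : family_(family), v4_bits_(v4_bits) {}

    bool is_v4() const { return family_ == kV4; }

    uint32_t to_v4() const {
        if (family_ != kV4) throw std::bad_cast();
        return v4_bits_;
    }

private:
    Family family_;
    uint32_t v4_bits_;
};

// Hypothetical route record: an address plus a flag saying whether the
// MPLS label on the route is the VRF table label.
struct Route {
    IpAddress addr;
    bool label_is_vrf_table_label;
};

// Before the fix (as described): the address is always cast to V4, so an
// IPv6 route with a VRF table label throws std::bad_cast.
bool ProcessRouteUnchecked(const Route& r) {
    if (r.label_is_vrf_table_label) {
        (void)r.addr.to_v4();  // throws for IPv6
    }
    return true;
}

// After the fix (as described): IPv6 routes whose label is the VRF table
// label are skipped. Returns false when the route was skipped.
bool ProcessRouteChecked(const Route& r) {
    if (r.label_is_vrf_table_label) {
        if (!r.addr.is_v4()) return false;  // skip IPv6 route processing
        (void)r.addr.to_v4();               // safe: address is IPv4
    }
    return true;
}
```

With an IPv6 route that carries the VRF table label, ProcessRouteUnchecked throws std::bad_cast while ProcessRouteChecked returns false and leaves the route unprocessed. IPv4 routes behave the same in both versions.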

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/47940
Submitter: sangarshan p (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/47940
Committed: http://github.com/Juniper/contrail-controller/commit/206c34b16ef225538cc8ef0e3fb2fff0c486519a
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 206c34b16ef225538cc8ef0e3fb2fff0c486519a
Author: sangarshp <email address hidden>
Date: Thu Oct 18 16:56:32 2018 +0530

Local IPv6 route handling is not done correctly

IPv6 routes are received from the controller with the VRF table MPLS label.
The IP address family is not checked and the address is always cast to V4.
Made changes to skip IPv6 route processing if the MPLS label
is a VRF table label.

Change-Id: I904a1e2604a73fde53c271cb49e4b6d74e84c4a3
Closes-Bug: #1783698
(cherry picked from commit a7bbb3c655bd02ff31be07650e422d354494e58c)
