Contrail 2.21-14: contrail-vrouter-agent Crash with no traffic, not recovering

Bug #1528106 reported by Deepak Jeyaraman
This bug affects 1 person
Affects            Status      Importance  Assigned to        Milestone
Juniper Openstack  (status tracked in Trunk)
  Trunk            Incomplete  High        Hari Prasad Killi

Bug Description

Hit this core dump on one of the compute nodes, and the contrail-vrouter-agent service never came back up.

root@ccra-07:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (No control-nodes configured)
contrail-vrouter-nodemgr active

========Run time service failures=============
/var/crashes/core.virsh.45453.ccra-07.1449712625
/var/crashes/core.contrail-vroute.3283.ccra-07.1450667198
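The two crash files listed above appear to follow a `core.<executable>.<pid>.<hostname>.<epoch-seconds>` naming layout. That layout is an assumption inferred from the names themselves (the `3283` field does match the `pid = 3283` that gdb reports later in this report), not something the report states. Under that assumption, a minimal Python sketch for recovering the crash time from a core file name could look like this; `parse_core_name` is a hypothetical helper, not part of Contrail:

```python
import re
from datetime import datetime, timezone

# Assumed layout: core.<executable>.<pid>.<hostname>.<epoch-seconds>
# (inferred from the file names in this report, not from Contrail documentation).
CORE_NAME = re.compile(
    r"core\.(?P<exe>.+)\.(?P<pid>\d+)\.(?P<host>[^.]+)\.(?P<epoch>\d+)$"
)


def parse_core_name(path):
    """Split a core file path into executable, pid, host and UTC crash time."""
    name = path.rsplit("/", 1)[-1]  # drop the directory part, if any
    match = CORE_NAME.match(name)
    if match is None:
        raise ValueError("unrecognised core file name: %r" % name)
    crashed_at = datetime.fromtimestamp(int(match.group("epoch")), tz=timezone.utc)
    return {
        "exe": match.group("exe"),
        "pid": int(match.group("pid")),
        "host": match.group("host"),
        "crashed_at": crashed_at,
    }


if __name__ == "__main__":
    for core in (
        "/var/crashes/core.virsh.45453.ccra-07.1449712625",
        "/var/crashes/core.contrail-vroute.3283.ccra-07.1450667198",
    ):
        info = parse_core_name(core)
        print(info["exe"], info["pid"], info["host"], info["crashed_at"].isoformat())
```

If the trailing field really is a Unix timestamp, the agent core decodes to 2015-12-21 03:06:38 UTC and the older virsh core to 2015-12-10 01:57:05 UTC, which would make the virsh core an earlier, separate event.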

root@ccra-16:~# contrail-version
Package                                Version                        Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-fabric-utils                  2.21.1-14                      14
contrail-install-packages              2.21.1-14~icehouse             14
contrail-lib                           2.21.1-14                      14
contrail-nodemgr                       2.21.1-14                      14
contrail-nova-vif                      2.21.1-14                      14
contrail-openstack-vrouter             2.21.1-14                      14

root@ccra-16:~# scp root@ccra-07://var/crashes/core.contrail-vroute.3283.ccra-07.1450667198 .
The authenticity of host 'ccra-07 (10.102.28.81)' can't be established.
ECDSA key fingerprint is c2:09:b2:86:cb:22:99:45:78:65:94:eb:53:8a:84:21.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ccra-07,10.102.28.81' (ECDSA) to the list of known hosts.
root@ccra-07's password:
core.contrail-vroute.3283.ccra-07.1450667198 100% 433MB 108.3MB/s 00:04
root@ccra-16:~# which contrail-vrouter-agent
/usr/bin/contrail-vrouter-agent
root@ccra-16:~# gdb /usr/bin/contrail-vrouter-agent core.contrail-vroute.3283.ccra-07.1450667198
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/contrail-vrouter-agent...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 4588]
[New LWP 4589]
[New LWP 4601]
[New LWP 4593]
[New LWP 4602]
[New LWP 4631]
[New LWP 4616]
[New LWP 4624]
[New LWP 4605]
[New LWP 4606]
[New LWP 4615]
[New LWP 4713]
[New LWP 4586]
[New LWP 4614]
[New LWP 4622]
[New LWP 4607]
[New LWP 4710]
[New LWP 4608]
[New LWP 4604]
[New LWP 4595]
[New LWP 4612]
[New LWP 4714]
[New LWP 4599]
[New LWP 4625]
[New LWP 4626]
[New LWP 4627]
[New LWP 4591]
[New LWP 4617]
[New LWP 4587]
[New LWP 4628]
[New LWP 4600]
[New LWP 4715]
[New LWP 4712]
[New LWP 4609]
[New LWP 4629]
[New LWP 4618]
[New LWP 4597]
[New LWP 4623]
[New LWP 4619]
[New LWP 4610]
[New LWP 4611]
[New LWP 4590]
[New LWP 4711]
[New LWP 4632]
[New LWP 4630]
[New LWP 4621]
[New LWP 3283]
[New LWP 4598]
[New LWP 4594]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f425e176cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f425e176cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f425e17a0d8 in __GI_abort () at abort.c:89
#2 0x00007f425e16fb86 in __assert_fail_base (fmt=0x7f425e2c0830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x1019f75 "0",
    file=file@entry=0x1028320 "controller/src/vnsw/agent/oper/vrf.cc", line=line@entry=333, function=function@entry=0x10284e0 "bool VrfEntry::DeleteTimeout()") at assert.c:92
#3 0x00007f425e16fc32 in __GI___assert_fail (assertion=0x1019f75 "0", file=0x1028320 "controller/src/vnsw/agent/oper/vrf.cc", line=333,
    function=0x10284e0 "bool VrfEntry::DeleteTimeout()") at assert.c:101
#4 0x00000000009e177d in VrfEntry::DeleteTimeout() ()
#5 0x0000000000fe3eb9 in Timer::TimerTask::Run() ()
#6 0x0000000000fdd8b0 in TaskImpl::execute() ()
#7 0x00007f425ed45b3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f425ed41816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f425ed40f4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f425ed3d0ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f425ed3d2f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f425ef61182 in start_thread (arg=0x7f4256ffb700) at pthread_create.c:312
#13 0x00007f425e23a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) bt full
#0 0x00007f425e176cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 3283
        selftid = 4588
#1 0x00007f425e17a0d8 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7fffbbd13e62, sa_sigaction = 0x7fffbbd13e62}, sa_mask = {__val = {139923024497948, 16941856, 333, 4294967295,
              139923023140067, 4294967296, 139922904164272, 332, 8340631, 54654256, 0, 0, 0, 21474836480, 139923070885888, 139923024513072}}, sa_flags = 16883573,
          sa_restorer = 0x10284e0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f425e16fb86 in __assert_fail_base (fmt=0x7f425e2c0830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x1019f75 "0",
    file=file@entry=0x1028320 "controller/src/vnsw/agent/oper/vrf.cc", line=line@entry=333, function=function@entry=0x10284e0 "bool VrfEntry::DeleteTimeout()") at assert.c:92
        str = 0x7f424c0a7fb0 "\220\304\002LB\177"
        total = 4096
#3 0x00007f425e16fc32 in __GI___assert_fail (assertion=0x1019f75 "0", file=0x1028320 "controller/src/vnsw/agent/oper/vrf.cc", line=333,
    function=0x10284e0 "bool VrfEntry::DeleteTimeout()") at assert.c:101
No locals.
#4 0x00000000009e177d in VrfEntry::DeleteTimeout() ()
No symbol table info available.
#5 0x0000000000fe3eb9 in Timer::TimerTask::Run() ()
No symbol table info available.
#6 0x0000000000fdd8b0 in TaskImpl::execute() ()
No symbol table info available.
#7 0x00007f425ed45b3a in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#8 0x00007f425ed41816 in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#9 0x00007f425ed40f4b in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#10 0x00007f425ed3d0ff in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#11 0x00007f425ed3d2f9 in ?? () from /usr/lib/libtbb.so.2
No symbol table info available.
#12 0x00007f425ef61182 in start_thread (arg=0x7f4256ffb700) at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7f4256ffb700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139922904168192, -1957536147036793459, 0, 0, 139922904168896, 139922904168192, 1896511055331497357, 1896528720881554829},
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#13 0x00007f425e23a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
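The key information in the backtrace above is in the `__GI___assert_fail` frame: the asserted expression is the literal `0`, so the abort is unconditional once `vrf.cc` line 333 in `VrfEntry::DeleteTimeout()` is reached, and the caller frames suggest it was reached from a timer task. For triaging similar cores, a small Python sketch that pulls the assertion site out of gdb `bt` text might look like the following. The regular expression is written against the frame format shown in this report only, and `find_assert_site` is a hypothetical helper:

```python
import re

# Matches the glibc __assert_fail frame as printed by gdb in this report.
# The trailing " (" keeps __assert_fail_base (which takes fmt=...) from matching.
ASSERT_FRAME = re.compile(
    r'__assert_fail \(assertion=0x[0-9a-f]+ "(?P<expr>[^"]*)", '
    r'file=0x[0-9a-f]+ "(?P<file>[^"]*)", line=(?P<line>\d+),\s+'
    r'function=0x[0-9a-f]+ "(?P<func>[^"]*)"\)'
)


def find_assert_site(backtrace):
    """Return the failed assertion's expression, file, line and function, or None."""
    match = ASSERT_FRAME.search(backtrace)
    if match is None:
        return None
    return {
        "expr": match.group("expr"),
        "file": match.group("file"),
        "line": int(match.group("line")),
        "func": match.group("func"),
    }


# Frame #3 copied from the backtrace in this report.
SAMPLE = (
    '#3 0x00007f425e16fc32 in __GI___assert_fail (assertion=0x1019f75 "0", '
    'file=0x1028320 "controller/src/vnsw/agent/oper/vrf.cc", line=333,\n'
    '    function=0x10284e0 "bool VrfEntry::DeleteTimeout()") at assert.c:101\n'
)

if __name__ == "__main__":
    site = find_assert_site(SAMPLE)
    print("%(file)s:%(line)d in %(func)s: assert(%(expr)s)" % site)
```

The helper only locates the assertion. It says nothing about why the delete timeout fired, which would need the agent log or a build with debugging symbols.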

Tags: vrouter
information type: Proprietary → Public
Deepak Jeyaraman (jdeepak) wrote :

A reboot didn't help either.

root@ccra-07:~# uptime
 22:28:11 up 9 min, 1 user, load average: 0.13, 0.27, 0.20

A service restart didn't help either.

summary: - Contrail 2.21-14: contrail-vrouter-agent Crash with no traffic
+ Contrail 2.21-14: contrail-vrouter-agent Crash with no traffic, not
+ recovering
Deepak Jeyaraman (jdeepak) wrote :

Coredump located here:

$ pwd
/volume/vmcores/sagrawal/jdeepak_temp/NFV/contrail_bug/1528106
jdeepak@ttsv-shell04 /volume/vmcores/sagrawal/jdeepak_temp/NFV/contrail_bug/1528106
$ ls
total 55592
-rwxrwxrwx 1 jdeepak software 55640064 Dec 20 22:40 core.contrail-vroute.3283.ccra-07.1450667198
-rwxrwxrwx 1 jdeepak software 1048678 Dec 20 22:43 vrouter.log.1

tags: added: vrouter
Changed in juniperopenstack:
milestone: none → r3.0-fcs
Hari Prasad Killi (haripk) wrote :

The core file is not available.
