VrfEntry::DeleteTimeout() crash during test_verify_flow_tables

Bug #1531295 reported by Vivek
This bug affects 1 person
Affects             Status                    Importance   Assigned to    Milestone
Juniper Openstack   Status tracked in Trunk
  Trunk             Fix Committed             High         Manish Singh

Bug Description

The crash occurred while running the tests test_verify_flow_tables and test_vm_file_trf_scp_tests in the regular sanity run.

Core files are located at:
/cs-shared/test_runs/nodeg28/2016_01_05_12_19_34/nodek12_core.contrail-vroute.2786.nodek12.1451986955.gz
/cs-shared/test_runs/nodeg28/2016_01_05_12_19_34/nodek11_core.contrail-vroute.2876.nodek11.1451984793.gz

Contrail Version: 3.0-2692-kilo

Traceback:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f06899cfcc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#0 0x00007f06899cfcc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f06899d30d8 in __GI_abort () at abort.c:89
#2 0x00007f06899c8b86 in __assert_fail_base (
    fmt=0x7f0689b19830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x113b9b5 "0",
    file=file@entry=0x114cf30 "controller/src/vnsw/agent/oper/vrf.cc",
    line=line@entry=343,
    function=function@entry=0x114d4a0 "bool VrfEntry::DeleteTimeout()")
    at assert.c:92
#3 0x00007f06899c8c32 in __GI___assert_fail (assertion=0x113b9b5 "0",
    file=0x114cf30 "controller/src/vnsw/agent/oper/vrf.cc", line=343,
    function=0x114d4a0 "bool VrfEntry::DeleteTimeout()") at assert.c:101
#4 0x0000000000a58472 in VrfEntry::DeleteTimeout() ()
#5 0x00000000010fd7a9 in Timer::TimerTask::Run() ()
#6 0x00000000010f6970 in TaskImpl::execute() ()
#7 0x00007f068a59eb3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f068a59a816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f068a599f4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f068a5960ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f068a5962f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f068a7ba182 in start_thread (arg=0x7f0682c55700)
    at pthread_create.c:312
#13 0x00007f0689a9347d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Tags: sanity vrouter
Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Daisuke Nakajima (dnakajima) wrote :

The same core is seen on the ToR agent. (Is this the same core as https://bugs.launchpad.net/juniperopenstack/+bug/1493861 ?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-0.co'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fe412181cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fe412181cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fe4121850d8 in __GI_abort () at abort.c:89
#2 0x00007fe41217ab86 in __assert_fail_base (fmt=0x7fe4122cb830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0xe4c275 "0", file=file@entry=0xe5a640 "controller/src/vnsw/agent/oper/vrf.cc", line=line@entry=333,
    function=function@entry=0xe5a800 "bool VrfEntry::DeleteTimeout()") at assert.c:92
#3 0x00007fe41217ac32 in __GI___assert_fail (assertion=0xe4c275 "0", file=0xe5a640 "controller/src/vnsw/agent/oper/vrf.cc",
    line=333, function=0xe5a800 "bool VrfEntry::DeleteTimeout()") at assert.c:101
#4 0x000000000091b56d in VrfEntry::DeleteTimeout() ()
#5 0x0000000000e1c3f9 in Timer::TimerTask::Run() ()
#6 0x0000000000e15df0 in TaskImpl::execute() ()
#7 0x00007fe412d50b3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007fe412d4c816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007fe412d4bf4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fe412d480ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fe412d482f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fe412f6c182 in start_thread (arg=0x7fe40b808700) at pthread_create.c:312
#13 0x00007fe41224547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Jeba Paulaiyan (jebap)
tags: added: sanity
Manish Singh (manishs) wrote :

Where is the core located?

Manish Singh (manishs) wrote :

Daisuke-san, did you push your core somewhere?

Manish Singh (manishs) wrote :

Vivek, I didn't see the core directory on ubuntu-build-02:
manishs@ubuntu-build02:~/bugs$ ls /cs-shared/test_runs/nodeg28/2016_01_05_12_19_34/
ls: cannot access /cs-shared/test_runs/nodeg28/2016_01_05_12_19_34/: No such file or directory
manishs@ubuntu-build02:~/bugs$ ls /cs-shared/test_runs/nodeg28/2016_01_05_12_19_34/
ls: cannot access /cs-shared/test_runs/nodeg28/2016_01_05_12_19_34/: No such file or directory
manishs@ubuntu-build02:~/bugs$

Jeba Paulaiyan (jebap) wrote :

Manish, it looks like the core mentioned in the bug was deleted because it was in a temporary location. I have copied a core from another instance of the same crash to /cs-shared/bugs/1531295/ (3.0-2697 Juno).

Jeba Paulaiyan (jebap) wrote :

Another core from Ubuntu 14.04 Kilo Sanity

[New LWP 2724]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fc737302cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
(gdb) bt
#0 0x00007fc737302cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fc7373060d8 in __GI_abort () at abort.c:89
#2 0x00007fc7372fbb86 in __assert_fail_base (fmt=0x7fc73744c830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x11bf3f5 "0",
    file=file@entry=0x11d0cd0 "controller/src/vnsw/agent/oper/vrf.cc", line=line@entry=344,
    function=function@entry=0x11d1240 <VrfEntry::DeleteTimeout()::__PRETTY_FUNCTION__> "bool VrfEntry::DeleteTimeout()") at assert.c:92
#3 0x00007fc7372fbc32 in __GI___assert_fail (assertion=0x11bf3f5 "0", file=0x11d0cd0 "controller/src/vnsw/agent/oper/vrf.cc", line=344,
    function=0x11d1240 <VrfEntry::DeleteTimeout()::__PRETTY_FUNCTION__> "bool VrfEntry::DeleteTimeout()") at assert.c:101
#4 0x0000000000aacc12 in VrfEntry::DeleteTimeout (this=0x7fc7180a20a0) at controller/src/vnsw/agent/oper/vrf.cc:344
#5 0x0000000001174a79 in operator() (this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#6 Timer::TimerTask::Run (this=0x1e72b00) at controller/src/base/timer.cc:42
#7 0x000000000116db0c in TaskImpl::execute (this=0x7fc730b6ba40) at controller/src/base/task.cc:253
#8 0x00007fc737ed1b3a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007fc737ecd816 in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fc737eccf4b in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fc737ec90ff in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fc737ec92f9 in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fc7380ed182 in start_thread (arg=0x7fc730588700) at pthread_create.c:312
#14 0x00007fc7373c647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Will copy core to /cs-shared/bugs/1531295/

Manish Singh (manishs) wrote :

Both crashes are the same.
An ICMP flow entry was pending.
The root cause is yet to be diagnosed.

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16545
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16545
Committed: http://github.org/Juniper/contrail-controller/commit/e0e6be975f82c439a8df0708322fa148a89adfba
Submitter: Zuul
Branch: master

commit e0e6be975f82c439a8df0708322fa148a89adfba
Author: Manish Singh <email address hidden>
Date: Wed Jan 27 17:27:51 2016 +0530

Flows not deleted resulting in vrf del pending.

Problem:
Flows were expecting an index to be allocated by vrouter; however, vrouter
returned an error for the allocation. On receiving this event, these flows are
marked as short flows. The stats collector visits these flows and enqueues a
delete for them. The delete routine checks whether the index is allocated. If
it is not, deletion is skipped on the assumption that vrouter has not yet added
the index. It does not check whether index generation resulted in an error, in
which case the flow has to be deleted.

Solution:
In delete, check whether the flow has been marked as a short flow because of
index allocation failure, and if so let it get deleted.

Change-Id: I71e8c3b6ddec7fb3884cc613c9f79d2fdb15301f
Closes-bug: 1531295
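
The Problem and Solution paragraphs of the commit message above can be illustrated with a minimal sketch. This is not the contrail-controller code: every name here (Flow, ShortFlowReason, kInvalidFlowIndex, CanDeleteBeforeFix, CanDeleteAfterFix) is hypothetical and chosen only to show the shape of the check, assuming a flow carries an index assigned by vrouter and a reason for being marked short.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel meaning "vrouter has not assigned an index".
constexpr uint32_t kInvalidFlowIndex = 0xFFFFFFFF;

// Hypothetical reasons for a flow being marked as a short flow.
enum class ShortFlowReason { kNone, kIndexAllocFailed, kOther };

// Hypothetical, stripped-down flow record.
struct Flow {
    uint32_t index = kInvalidFlowIndex;  // set once vrouter allocates an index
    ShortFlowReason short_reason = ShortFlowReason::kNone;
};

// Behaviour described under "Problem": a delete is skipped whenever the
// index is missing, even if allocation already failed for good.
bool CanDeleteBeforeFix(const Flow &flow) {
    return flow.index != kInvalidFlowIndex;
}

// Behaviour described under "Solution": a missing index no longer blocks
// the delete if the flow was marked short because allocation failed.
bool CanDeleteAfterFix(const Flow &flow) {
    if (flow.index != kInvalidFlowIndex) {
        return true;
    }
    return flow.short_reason == ShortFlowReason::kIndexAllocFailed;
}

// Small usage example: a flow whose index allocation failed is stuck
// before the fix and deletable after it.
void Example() {
    Flow failed;
    failed.short_reason = ShortFlowReason::kIndexAllocFailed;
    assert(!CanDeleteBeforeFix(failed));
    assert(CanDeleteAfterFix(failed));
}
```

In this sketch, a flow that is still waiting for an index (short_reason is kNone) stays undeletable in both versions, which matches the original intent of the skip. Only the allocation-failure case changes, and that is the case that, per the commit message, left the VRF delete pending.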
