agent crash at pthread_mutex_lock with fast, repeated hping tcp setup/teardown

Bug #1542656 reported by Vedamurthy Joshi
Affects              Status          Importance  Assigned to  Milestone
Juniper Openstack    (status tracked in Trunk)
  Trunk              Fix Committed   High        Praveen

Bug Description

R3.0 Build 2711 Ubuntu 14.04 Kilo multinode

When hping3 is run repeatedly between VMs with fast TCP setup/teardown (inter-packet gap of 100 microseconds), the crash below is sometimes seen.

Example: hping3 -S -p 22 10.1.1.3 -s 10000 -c 1000 -i u100
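A rough load estimate for this example, assuming hping3's default behaviour of incrementing the source port for every packet sent (no `-k` given), so that each SYN opens a new TCP 5-tuple and therefore a new flow:

```latex
r = \frac{1}{100\,\mu\mathrm{s}} = 10^{4}\ \text{packets/s}, \qquad
T \approx 1000 \times 100\,\mu\mathrm{s} = 0.1\ \mathrm{s}
```

If each SYN results in both a forward and a reverse flow entry, that is on the order of 2000 flow entries created within about 0.1 s, a level of churn at which flow eviction and index reuse can plausibly happen in quick succession.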

The core will be placed at http://10.204.216.50/Docs/bugs/#

(gdb) bt
#0 0x00007fcbb8b90414 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x0000000000cbe419 in KSyncFlowIndexManager::FindByIndex(unsigned int) ()
#2 0x0000000000cd1a00 in KSyncSandeshContext::FlowMsgHandler(vr_flow_req*) ()
#3 0x00000000010b9640 in Sandesh::ReceiveBinaryMsgOne(unsigned char*, unsigned int, int*, SandeshContext*) ()
#4 0x0000000000ceeb16 in ?? ()
#5 0x0000000000ceec8b in KSyncBulkSandeshContext::Decoder(char*, unsigned int, unsigned int, bool) ()
#6 0x0000000000ced690 in KSyncSock::ProcessKernelData(char*) ()
#7 0x0000000000cf377f in QueueTaskRunner<char*, WorkQueue<char*> >::Run() ()
#8 0x00000000011790ac in TaskImpl::execute() ()
#9 0x00007fcbb8972b3a in ?? () from /usr/lib/libtbb.so.2
#10 0x00007fcbb896e816 in ?? () from /usr/lib/libtbb.so.2
#11 0x00007fcbb896df4b in ?? () from /usr/lib/libtbb.so.2
#12 0x00007fcbb896a0ff in ?? () from /usr/lib/libtbb.so.2
#13 0x00007fcbb896a2f9 in ?? () from /usr/lib/libtbb.so.2
#14 0x00007fcbb8b8e182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fcbb7e6747d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

Tags: vrouter

Praveen (praveen-karadakal) wrote :

The crash happens with the following sequence:

1. A flow is added:
    - flow-1 with index i1
    - rflow-1 (its reverse flow) with index i2
2. vrouter evicts flow-1 and rflow-1 and reuses their indices:
    - flow-1 is evicted and index i1 is reused by flow-2
    - rflow-1 is evicted and index i2 is reused by rflow-2
3. The agent processes flow-2 and starts evicting flow-1.
4. As part of the flow-1 eviction, rflow-1 is also deleted.
5. Deleting rflow-1 results in a vrouter delete operation on index i2, which inadvertently deletes rflow-2.
6. Later, when the reverse flow for flow-2 (rflow-2) is being set up, it points to index i2, which has already been deleted, and this results in a vrouter error.

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16960
Submitter: Praveen K V (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16960
Committed: http://github.org/Juniper/contrail-controller/commit/23d3519be2aeb1f5f17b1b4121ace61d28a24f71
Submitter: Zuul
Branch: master

commit 23d3519be2aeb1f5f17b1b4121ace61d28a24f71
Author: Praveen K V <email address hidden>
Date: Sun Feb 7 11:27:51 2016 +0530

On eviction of a flow, don't delete reverse flow

When a flow is evicted, unlink the reverse flow and mark it as a short
flow.

Change-Id: Ic468f3431c820cbd4d914519f57c76fc26152861
Partial-Bug: #1542656

Vinod Nair (vinodnair) wrote :

Still seeing a similar issue in build 2712 (Kilo).
The core is copied to /cs-shared/bugs/1542656.

The backtrace is below:
[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-vrouter-agent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI___pthread_mutex_lock (mutex=0x7fdadab39e40) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x7fdadab39e40) at ../nptl/pthread_mutex_lock.c:66
#1 0x0000000000cbe809 in lock (this=0x801ad1b7dfd8) at /usr/include/tbb/mutex.h:164
#2 acquire (mutex=..., this=<synthetic pointer>) at /usr/include/tbb/mutex.h:105
#3 scoped_lock (mutex=..., this=<synthetic pointer>) at /usr/include/tbb/mutex.h:91
#4 KSyncFlowIndexManager::FindByIndex (this=this@entry=0x7fdad8014480, idx=<optimized out>)
    at controller/src/vnsw/agent/vrouter/ksync/ksync_flow_index_manager.cc:257
#5 0x0000000000cd1df0 in KSyncSandeshContext::FlowMsgHandler (this=<optimized out>, r=0x7fdac4159170)
    at controller/src/vnsw/agent/vrouter/ksync/sandesh_ksync.cc:134
#6 0x00000000010b9650 in Sandesh::ReceiveBinaryMsgOne (buf=buf@entry=0x7fdad426f766 "", buf_len=buf_len@entry=262, error=error@entry=0x7fdae0b5f730,
    client_context=client_context@entry=0x7fdacc0157a8) at tools/sandesh/library/cpp/sandesh.cc:565
#7 0x0000000000ceeef6 in DecodeSandeshMessages (buf=0x7fdad426f766 "", buf_len=262, sandesh_context=sandesh_context@entry=0x7fdacc0157a8, alignment=4)
    at controller/src/ksync/ksync_sock.cc:167
#8 0x0000000000cef06b in KSyncBulkSandeshContext::Decoder (this=0x7fdacc0157a8, data=<optimized out>, len=<optimized out>, alignment=<optimized out>,
    more=<optimized out>) at controller/src/ksync/ksync_sock.cc:940
#9 0x0000000000ceda80 in KSyncSock::ProcessKernelData (this=0x7fdab7fbe7a0, data=0x7fdad426f730 "<\001") at controller/src/ksync/ksync_sock.cc:323
#10 0x0000000000cf3bdf in operator() (a0=0x7fdad426f730 "<\001", this=0x7fdae0b5fb30) at /usr/include/boost/function/function_template.hpp:767
#11 RunQueue (this=0x7fdab63ce270) at controller/src/base/queue_task.h:87
#12 QueueTaskRunner<char*, WorkQueue<char*> >::Run (this=0x7fdab63ce270) at controller/src/base/queue_task.h:66
#13 0x00000000011793ec in TaskImpl::execute (this=0x7fdae211fc40) at controller/src/base/task.cc:253
#14 0x00007fdae94adb3a in ?? () from /usr/lib/libtbb.so.2
#15 0x00007fdae94a9816 in ?? () from /usr/lib/libtbb.so.2
#16 0x00007fdae94a8f4b in ?? () from /usr/lib/libtbb.so.2
#17 0x00007fdae94a50ff in ?? () from /usr/lib/libtbb.so.2
#18 0x00007fdae94a52f9 in ?? () from /usr/lib/libtbb.so.2
#19 0x00007fdae96c9182 in start_thread (arg=0x7fdae0b60700) at pthread_create.c:312
#20 0x00007fdae89a247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

vageesan (vageesant) wrote :

Added a core from Kilo build 2714 to /cs-shared/bugs/1542656/core.contrail-vroute.27490.csol1-node10.1455832222.kilo.2714

Hari Prasad Killi (haripk) wrote :