VM UVE showing empty data during sanity tests

Bug #1388221 reported by Vedamurthy Joshi
This bug affects 1 person
Affects: Juniper Openstack
Status: Fix Committed
Importance: High
Assigned to: Vedamurthy Joshi

Bug Description

2.0 Build 2427, CentOS, Havana, multi-node multi-interface testbed

This has happened during multiple tests in the sanity run:
http://10.204.216.50/Docs/logs/2427_2014_10_31_19_46_43/test_report.html

Logs and Cassandra logs are in /cs-shared/test_runs/nodec21/2014_10_31_19_46_43 on nodeb10.englab.juniper.net (use your Juniper ID).

In the test case test_svc_mirroring, the test failed because it did not find the VM UVE for vn2_vm2:

2014-10-31 22:43:06,877 - WARNING - Failed to get VM vn2_vm2, ID 55b6e5b0-e75d-4114-809e-c3e9c7e1a517 info from Opserver
2014-10-31 22:43:08,879 - INFO - Verifying the vm in opserver
2014-10-31 22:43:08,879 - INFO - Verifying in collector 10.204.217.6 ...
Key Error
2014-10-31 22:43:08,886 - WARNING - Failed to get VM vn2_vm2, ID 55b6e5b0-e75d-4114-809e-c3e9c7e1a517 info from Opserver
2014-10-31 22:43:10,888 - INFO - Verifying the vm in opserver
2014-10-31 22:43:10,889 - INFO - Verifying in collector 10.204.217.6 ...
Key Error

Megh Bhatt (meghb) wrote:

The issue was caused by the Sandesh send queue getting stuck with ready_to_send_ set to false due to a race condition. It was introduced by the recent check-in https://github.com/Juniper/contrail-sandesh/commit/fc865df0e682e5b63c852fdf7b4710035000e518, which removed the mutex from SandeshWriter and instead used a tbb::atomic to update ready_to_send_. However, the mutex is needed: TcpSession::Send and the TcpSession::WriteReady callback can run in parallel, so their execution, together with the update of ready_to_send_, must be serialized.

The following is the gdb analysis of a gcore of contrail-vrouter-agent:

[root@nodec60 ~]# gdb /var/tmp/contrail-vrouter-agent /var/tmp/core.6504
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/tmp/contrail-vrouter-agent...done.

warning: core file may not match specified executable file.
[New Thread 7013]
[New Thread 7014]
[New Thread 7015]
[New Thread 7016]
[New Thread 11208]
[New Thread 6504]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/b5/d86fbcf0ccb03331e6c7c73897b96845e0a4eb
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /usr/lib/libthriftasio.so.0...done.
Loaded symbols for /usr/lib/libthriftasio.so.0
Reading symbols from /usr/lib/libthrift-0.8.0.so...done.
Loaded symbols for /usr/lib/libthrift-0.8.0.so
Reading symbols from /usr/lib/libboost_filesystem.so.1.48.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_filesystem.so.1.48.0
Reading symbols from /usr/lib/libboost_program_options.so.1.48.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_program_options.so.1.48.0
Reading symbols from /usr/lib/libboost_regex.so.1.48.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_regex.so.1.48.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/lib/libboost_system.so.1.48.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_system.so.1.48.0
Reading symbols from /usr/lib/liblog4cplus-1.1.so.7...done.
Loaded symbols for /usr/lib/liblog4cplus-1.1.so.7
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/lib/libtbb_debug.so.2...done.
Loaded symbols for /usr/lib/libtbb_debug.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so...

Changed in juniperopenstack:
assignee: Raj Reddy (rajreddy) → Megh Bhatt (meghb)
OpenContrail Admin (ci-admin-f) wrote: A change has been merged

Reviewed: https://review.opencontrail.org/4322
Committed: http://github.org/Juniper/contrail-sandesh/commit/421e705ef085d6f2b225bf1d3b27beb5c423f8da
Submitter: Zuul
Branch: master

commit 421e705ef085d6f2b225bf1d3b27beb5c423f8da
Author: Megh Bhatt <email address hidden>
Date: Tue Nov 4 18:14:34 2014 -0800

TcpSession::Send() and TcpSession::WriteReady() can be called from
concurrent threads, and hence just having an atomic bool to update
the send status is not enough to guarantee that the correct send
status is finally set. A recent change in SandeshWriter removed
the mutex that was acquired before calling TcpSession::Send() and
updating the send status, and that was also used when updating the
send status in the TcpSession::WriteReady() callback. This change
reverts to using the mutex in SandeshWriter.
Closes-Bug: #1388221

Change-Id: Ibb560021d6a0d8963db6284f1d722a974cb45be2

Megh Bhatt (meghb)
Changed in juniperopenstack:
assignee: Megh Bhatt (meghb) → Vedamurthy Joshi (vedujoshi)
status: New → Fix Committed