tor-agent continuously crashing while trying to bring up SSL connection with openvswitch

Bug #1458243 reported by Vedamurthy Joshi
This bug affects 1 person
Affects            Status         Importance  Assigned to           Milestone
Juniper Openstack  Fix Committed  High        Prabhjot Singh Sethi
R2.20              Fix Committed  High        Prabhjot Singh Sethi

Bug Description

R2.20 Build 27 Ubuntu 14.04 Juno multi-node setup

Was trying to bring up tor-agent controlling an openvswitch over SSL and saw that tor-agent is continuously crashing.

env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6, host7],
    'cfgm': [host1,host2,host3],
    'openstack': [host1,host2,host3],
    'control': [host1,host2,host3],
    'compute': [host4,host5, host6, host7],
    'collector': [host1,host2,host3],
    'webui': [host1],
    'database': [host1,host2,host3],
    'toragent': [host6, host7],
    'tsn': [host6, host7],
    'build': [host_build],
}

env.hostnames = {
    'all': ['nodec1', 'nodec2', 'nodec3', 'nodek1', 'nodek2', 'nodek3', 'nodeg11']

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-2.co'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f9d54a8bcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f9d54a8bcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f9d54a8f0d8 in __GI_abort () at abort.c:89
#2 0x00007f9d54a84b86 in __assert_fail_base (fmt=0x7f9d54bd5830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xde2ad5 "0",
    file=file@entry=0xdf6b60 "controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_session.cc", line=line@entry=70,
    function=function@entry=0xdf6c40 "void OVSDB::OvsdbClientSession::MessageProcess(const u_int8_t*, std::size_t)") at assert.c:92
#3 0x00007f9d54a84c32 in __GI___assert_fail (assertion=0xde2ad5 "0", file=0xdf6b60 "controller/src/vnsw/agent/ovs_tor_agent/ovsdb_client/ovsdb_client_session.cc", line=70,
    function=0xdf6c40 "void OVSDB::OvsdbClientSession::MessageProcess(const u_int8_t*, std::size_t)") at assert.c:101
#4 0x0000000000914c4a in OVSDB::OvsdbClientSession::MessageProcess(unsigned char const*, unsigned long) ()
#5 0x00000000009154a5 in OVSDB::OvsdbClientSslSession::RecvMsg(unsigned char const*, unsigned long) ()
#6 0x0000000000cea191 in TcpMessageReader::OnRead(boost::asio::const_buffer) ()
#7 0x0000000000cca930 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, TcpSession, boost::asio::const_buffer>, boost::_bi::list2<boost::_bi::value<SslSession*>, boost::arg<1> > >, void, boost::asio::const_buffer>::invoke(boost::detail::function::function_buffer&, boost::asio::const_buffer) ()
#8 0x0000000000ccbb74 in SslSession::SslReader::Run() ()
#9 0x0000000000dab4e0 in TaskImpl::execute() ()
#10 0x00007f9d5565ab3a in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f9d55656816 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f9d55655f4b in ?? () from /usr/lib/libtbb.so.2
#13 0x00007f9d556520ff in ?? () from /usr/lib/libtbb.so.2
#14 0x00007f9d556522f9 in ?? () from /usr/lib/libtbb.so.2
#15 0x00007f9d55876182 in start_thread (arg=0x7f9d357f5700) at pthread_create.c:312
#16 0x00007f9d54b4f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Core will be in http://10.204.216.50/Docs/bugs/#

root@nodek3:/var/crashes# ls -ltr
total 248160
-rw------- 1 root root 178106368 May 23 21:16 core.contrail-tor-ag.23883.nodek3.1432396005
-rw------- 1 root root 157593600 May 23 21:17 core.contrail-tor-ag.15871.nodek3.1432396024
-rw------- 1 root root 157597696 May 23 21:17 core.contrail-tor-ag.17014.nodek3.1432396043
-rw------- 1 root root 153407488 May 23 23:49 core.contrail-tor-ag.2388.nodek3.1432405182
-rw------- 1 root root 157597696 May 24 01:36 core.contrail-tor-ag.2388.nodek3.1432411604
-rw------- 1 root root 149209088 May 24 02:17 core.contrail-tor-ag.12073.nodek3.1432414076
root@nodek3:/var/crashes#
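
The core file names above appear to follow the pattern core.<process>.<pid>.<host>.<epoch seconds>. That layout is inferred from the listing, not from the node's core_pattern setting, so treat it as an assumption. Under that assumption, a small Python sketch (the helper names `parse_core_name` and `crash_gaps` are hypothetical) can show how closely spaced the crashes are:

```python
import re
from datetime import datetime, timezone

# Assumed layout: core.<process>.<pid>.<host>.<epoch seconds>
CORE_RE = re.compile(
    r"^core\.(?P<comm>[^.]+)\.(?P<pid>\d+)\.(?P<host>.+)\.(?P<epoch>\d+)$"
)


def parse_core_name(name):
    """Split a core file name into its fields, or return None if it does not match."""
    m = CORE_RE.match(name)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "host": m.group("host"),
        "time": datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc),
    }


def crash_gaps(names):
    """Return the gaps in seconds between consecutive crashes, oldest first."""
    parsed = [p for p in map(parse_core_name, names) if p is not None]
    times = sorted(p["time"] for p in parsed)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


cores = [
    "core.contrail-tor-ag.23883.nodek3.1432396005",
    "core.contrail-tor-ag.15871.nodek3.1432396024",
    "core.contrail-tor-ag.17014.nodek3.1432396043",
]
print(crash_gaps(cores))
```

For the first three cores in the listing, the gaps come out at under twenty seconds each, which is consistent with the process crashing again shortly after every restart.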

Tags: bms vrouter

Prabhjot Singh Sethi (prabhjot) wrote :

There is an error in the received message that causes the parser to fail. Currently we assert on parse failure.

Ideally, on a parse failure we should close the session and let it be re-established on the next attempt instead of asserting.

Changed in juniperopenstack:
status: New → In Progress

OpenContrail Admin (ci-admin-f) wrote : master

Review in progress for https://review.opencontrail.org/10821
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : R2.20

Review in progress for https://review.opencontrail.org/10822
Submitter: Prabhjot Singh Sethi (<email address hidden>)

Prabhjot Singh Sethi (prabhjot) wrote :

The issue was seen because ovs-vswitchd was also trying to connect to ToR-Agent using the SSL certificates.

On reading the first message from this peer, the ovsdb-client in ToR-Agent fails to parse the message and asserts.

In the usual case these failures should not happen, so I will change the assertion to closing the connection with the peer.
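
The idea of closing the connection instead of asserting can be illustrated with a minimal Python sketch. This is not the contrail-controller code (the actual change is in C++ in ovsdb_client_session.cc); the `Session` class and its methods are hypothetical stand-ins. It also assumes each buffer holds one complete JSON-RPC message, which a real stream reader cannot take for granted.

```python
import json


class Session:
    """Hypothetical stand-in for an OVSDB client session (not the real class)."""

    def __init__(self):
        self.closed = False
        self.handled = []

    def close(self):
        # A real session would tear down the SSL/TCP connection here so the
        # peer can be re-established on a later connection attempt.
        self.closed = True

    def message_process(self, buf):
        """Parse one JSON-RPC message; close the session on a parse failure."""
        try:
            # UnicodeDecodeError and json.JSONDecodeError both derive from ValueError.
            msg = json.loads(buf.decode("utf-8"))
            if not isinstance(msg, dict):
                raise ValueError("JSON-RPC message must be an object")
        except ValueError:
            # Previously this situation was an assert(0); a peer that is not an
            # OVSDB server should only cost us this one session.
            self.close()
            return False
        self.handled.append(msg)
        return True


session = Session()
ok = session.message_process(b'{"method": "echo", "params": [], "id": "echo"}')
bad = session.message_process(b"this is not an OVSDB message")
print(ok, bad, session.closed)
```

The point of the design is that a malformed message from one peer ends only that session, while the agent process keeps running and can accept the next connection.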

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/10821
Committed: http://github.org/Juniper/contrail-controller/commit/e11dda764cece8c02d41bdfcb30ffa4cbbaf4025
Submitter: Zuul
Branch: master

commit e11dda764cece8c02d41bdfcb30ffa4cbbaf4025
Author: Prabhjot Singh Sethi <email address hidden>
Date: Tue May 26 12:47:40 2015 +0530

ToR Agent Crash - OVSDB parser failure

Issue:
------
If connection originates from non-OVSDB-server, connection
gets established because of proper certificates use, but
message parsing in ToR Agent fails, currently we assert
for such scenario.

Fix:
----
Changing assertion with session closure.
Moving OVSDB SM event trace to separate buffer

Closes-Bug: 1458243
Change-Id: I500fa1a4bf5ef9440c9da5295d64b53a1ce207d7

Changed in juniperopenstack:
status: In Progress → Fix Committed

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/10822
Committed: http://github.org/Juniper/contrail-controller/commit/3489b664f62e59d2318d2980f6691129ec2172e4
Submitter: Zuul
Branch: R2.20

commit 3489b664f62e59d2318d2980f6691129ec2172e4
Author: Prabhjot Singh Sethi <email address hidden>
Date: Tue May 26 12:47:40 2015 +0530

ToR Agent Crash - OVSDB parser failure

Issue:
------
If connection originates from non-OVSDB-server, connection
gets established because of proper certificates use, but
message parsing in ToR Agent fails, currently we assert
for such scenario.

Fix:
----
Changing assertion with session closure.
Moving OVSDB SM event trace to separate buffer

Closes-Bug: 1458243
Change-Id: I500fa1a4bf5ef9440c9da5295d64b53a1ce207d7
(cherry picked from commit e11dda764cece8c02d41bdfcb30ffa4cbbaf4025)
