collector crash @ RedisProcessorExec::SyncDeleteUVEs

Bug #1648601 reported by Vinoth Kannan Ganapathy
This bug affects 3 people
Affects              Status                     Importance  Assigned to          Milestone
Juniper Openstack    (status tracked in Trunk)
  R3.1               Fix Committed              High        Sundaresan Rajangam
  R3.2               Fix Committed              High        Sundaresan Rajangam
  Trunk              Fix Committed              High        Sundaresan Rajangam

Bug Description

Contrail version 3.2, build 8

Multi-node HA setup, with OpenStack, Contrail, collector, and database each on a separate node.

Seen a collector crash at RedisProcessorExec::SyncDeleteUVEs.
Many core files were also generated.

root@5b8s30-vm3:~# contrail-status
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector failed
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

========Run time service failures=============
/var/crashes/core.contrail-collec.13777.5b8s30-vm3.1481220943
/var/crashes/core.contrail-collec.21972.5b8s30-vm3.1481220968
/var/crashes/core.contrail-collec.19114.5b8s30-vm3.1481221224
/var/crashes/core.contrail-collec.19207.5b8s30-vm3.1481221228
/var/crashes/core.contrail-collec.19071.5b8s30-vm3.1481221169
/var/crashes/core.contrail-collec.19226.5b8s30-vm3.1481221231
/var/crashes/core.contrail-collec.18995.5b8s30-vm3.1481221127
/var/crashes/core.contrail-collec.25292.5b8s30-vm3.1481220978
/var/crashes/core.contrail-collec.17522.5b8s30-vm3.1481220952
/var/crashes/core.contrail-collec.18975.5b8s30-vm3.1481221123
/var/crashes/core.contrail-collec.20898.5b8s30-vm3.1481220960
/var/crashes/core.contrail-collec.18955.5b8s30-vm3.1481221120
/var/crashes/core.contrail-collec.19264.5b8s30-vm3.1481221240
/var/crashes/core.contrail-collec.19595.5b8s30-vm3.1481220956
/var/crashes/core.contrail-collec.19091.5b8s30-vm3.1481221171
/var/crashes/core.contrail-collec.24636.5b8s30-vm3.1481220975
/var/crashes/core.contrail-collec.19014.5b8s30-vm3.1481221164
/var/crashes/core.contrail-collec.19245.5b8s30-vm3.1481221235
/var/crashes/core.contrail-collec.23392.5b8s30-vm3.1481220973
/var/crashes/core.contrail-collec.18752.5b8s30-vm3.1481221118
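The core file names above appear to follow a kernel core_pattern of the form /var/crashes/core.%e.%p.%h.%t (executable name truncated by the kernel to 15 characters, PID, hostname, Unix timestamp). That pattern is an assumption inferred from the names, not read from the node's configuration. A minimal Python sketch for turning such a listing into a crash count and time window during triage (function names are hypothetical):

```python
import os
from typing import Iterable, NamedTuple, Tuple


class CoreFile(NamedTuple):
    comm: str   # process name as truncated by the kernel (15 chars max)
    pid: int
    host: str
    epoch: int  # crash time, seconds since the Unix epoch


def parse_core_name(path: str) -> CoreFile:
    """Split 'core.<comm>.<pid>.<host>.<epoch>' into its fields.

    Assumes <comm> contains no dots; <host> may contain dots (e.g. an FQDN).
    """
    parts = os.path.basename(path).split(".")
    if len(parts) < 5 or parts[0] != "core":
        raise ValueError(f"unexpected core file name: {path}")
    return CoreFile(
        comm=parts[1],
        pid=int(parts[2]),
        host=".".join(parts[3:-1]),
        epoch=int(parts[-1]),
    )


def crash_window(paths: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number of cores, first crash epoch, seconds spanned)."""
    epochs = sorted(parse_core_name(p).epoch for p in paths)
    if not epochs:
        return 0, 0, 0
    return len(epochs), epochs[0], epochs[-1] - epochs[0]
```

Run over the 20 paths listed above, crash_window reports 20 cores, the first at epoch 1481220943 (2016-12-08 18:15:43 UTC) and a span of 297 seconds, which is consistent with a crash/restart loop lasting roughly five minutes.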

(gdb) bt
#0 0x00007f129410ecc9 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f12941120d8 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f1294107b86 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f1294107c32 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00000000005af30d in RedisProcessorExec::SyncDeleteUVEs (redis_ip=..., redis_port=<optimized out>, redis_password="", source="5b8s35",
    node_type="Compute", module="contrail-vrouter-agent", instance_id="0", delReply=std::vector of length 0, capacity 0)
    at controller/src/analytics/redis_processor_vizd.cc:229
#5 0x00000000005fb3ed in OpServerProxy::DeleteUVEs (this=0x1b592b0, source="5b8s35", module="contrail-vrouter-agent", node_type="Compute", instance_id="0")
    at controller/src/analytics/OpServerProxy.cc:826
#6 0x00000000005a1e31 in SandeshGenerator::DisconnectSession (this=this@entry=0x7f12740f40a0, vsession=vsession@entry=0x1c27820)
    at controller/src/analytics/generator.cc:175
#7 0x00000000005903dd in Collector::ReceiveSandeshCtrlMsg (this=0x1b67f60, state_machine=<optimized out>, session=<optimized out>, sandesh=<optimized out>)
    at controller/src/analytics/collector.cc:285
#8 0x0000000000740ff0 in SandeshServerConnection::ProcessSandeshCtrlMessage (this=this@entry=0x1c4fa60,
    msg="<SandeshHeader><Namespace type=\"string\" identifier=\"1\"></Namespace><Timestamp type=\"i64\" identifier=\"2\">1481220943296491</Timestamp><Module type=\"string\" identifier=\"3\">contrail-vrouter-agent</Module>"..., header=..., sandesh_name="SandeshCtrlClientToServer",
    header_offset=header_offset@entry=734) at tools/sandesh/library/cpp/sandesh_connection.cc:172
#9 0x000000000073fd7e in ssm::ServerInit::react (this=this@entry=0x7f126010dee0, event=...) at tools/sandesh/library/cpp/sandesh_state_machine.cc:303
#10 0x00000000007404e5 in react<ssm::ServerInit, boost::statechart::event_base, void const*> (eventType=<synthetic pointer>, evt=..., stt=...)
    at /usr/include/boost/statechart/custom_reaction.hpp:42
#11 local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (
    eventType=0xb8e140 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., stt=...)
    at /usr/include/boost/statechart/simple_state.hpp:816
#12 local_react<boost::mpl::list3<boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
    eventType=0xb8e140 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., this=0x7f126010dee0)
    at /usr/include/boost/statechart/simple_state.hpp:851
#13 local_react_impl<boost::mpl::list4<boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (eventType=<optimized out>, evt=..., stt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#14 local_react<boost::mpl::list4<boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (eventType=<optimized out>, evt=..., this=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:851
#15 local_react_impl<boost::mpl::list<boost::statechart::transition<ssm::EvStop, ssm::Idle, SandeshStateMachine, &SandeshStateMachine::OnIdle>, boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> >, boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (
    eventType=0xb8e140 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., stt=...)
    at /usr/include/boost/statechart/simple_state.hpp:820
#16 local_react<boost::mpl::list<boost::statechart::transition<ssm::EvStop, ssm::Idle, SandeshStateMachine, &SandeshStateMachine::OnIdle>, boost::statechart::custom_reaction<ssm::EvTcpClose>, boost::statechart::custom_reaction<ssm::EvSandeshCtrlMessageRecv>, boost::statechart::custom_reaction<ssm::EvSandeshMessageRecv>, boost::statechart::in_state_reaction<ssm::EvTcpDeleteSession, SandeshStateMachine, &SandeshStateMachine::DeleteTcpSession> > > (
    eventType=0xb8e140 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>, evt=..., this=0x7f126010dee0)
    at /usr/include/boost/statechart/simple_state.hpp:851
#17 boost::statechart::simple_state<ssm::ServerInit, SandeshStateMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (this=0x7f126010dee0, evt=..., eventType=0xb8e140 <boost::statechart::detail::id_holder<ssm::EvSandeshCtrlMessageRecv>::idProvider_>)
    at /usr/include/boost/statechart/simple_state.hpp:489
#18 0x000000000073e07b in operator() (this=<synthetic pointer>) at /usr/include/boost/statechart/state_machine.hpp:87
#19 operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<SandeshStateMachine, ssm::Idle>::exception_event_handler> (this=0x1c2b9c8,
    action=...) at /usr/include/boost/statechart/null_exception_translator.hpp:33
#20 boost::statechart::state_machine<SandeshStateMachine, ssm::Idle, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (
    this=0x1c2b970, evt=...) at /usr/include/boost/statechart/state_machine.hpp:889
#21 0x0000000000735fb5 in process_event (evt=..., this=0x1c2b970) at /usr/include/boost/statechart/state_machine.hpp:275
#22 SandeshStateMachine::DequeueEvent (this=0x1c2b970, ec=...) at tools/sandesh/library/cpp/sandesh_state_machine.cc:800
#23 0x000000000073d1d7 in operator() (a0=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x7f128970de90)
    at /usr/include/boost/function/function_template.hpp:767
#24 QueueTaskRunner<SandeshStateMachine::EventContainer, WorkQueue<SandeshStateMachine::EventContainer> >::RunQueue (this=0x7f1278226700)
    at controller/src/base/queue_task.h:92
#25 0x000000000046a2bf in TaskImpl::execute (this=0x7f128d10b340) at controller/src/base/task.cc:262
#26 0x00007f1295699b3a in ?? () from /usr/lib/libtbb.so.2
#27 0x00007f1295695816 in ?? () from /usr/lib/libtbb.so.2
#28 0x00007f1295694f4b in ?? () from /usr/lib/libtbb.so.2
#29 0x00007f12956910ff in ?? () from /usr/lib/libtbb.so.2
#30 0x00007f12956912f9 in ?? () from /usr/lib/libtbb.so.2
#31 0x00007f12958b5182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007f12941d247d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Vinoth Kannan Ganapathy (vganapathy) wrote :

Cores and logs have been copied to the location below:

vganapathy@ubuntu-build04:/cs-shared/bugs/1648601$ pwd
/cs-shared/bugs/1648601
vganapathy@ubuntu-build04:/cs-shared/bugs/1648601$ ls -altr
total 218860
drwxrwxrwx 278 root root 24576 Dec 8 11:52 ..
-rw------- 1 vganapathy support1 216436736 Dec 8 11:52 core.contrail-collec.13777.5b8s30-vm3.1481220943
-rw-r----- 1 vganapathy support1 1109079 Dec 8 11:55 redis-server.log
-rw-r--r-- 1 vganapathy support1 269155 Dec 8 11:55 contrail-collector.log
-rw-r--r-- 1 vganapathy support1 1048798 Dec 8 11:55 contrail-collector.log.1
-rw-r--r-- 1 vganapathy support1 1048789 Dec 8 11:55 contrail-collector.log.2
-rw-r--r-- 1 vganapathy support1 1048605 Dec 8 11:55 contrail-collector.log.3
-rw-r--r-- 1 vganapathy support1 1048976 Dec 8 11:55 contrail-collector.log.4
-rw-r--r-- 1 vganapathy support1 1050774 Dec 8 11:55 contrail-collector.log.5
drwxr-xr-x 2 vganapathy support1 4096 Dec 8 11:55 .
-rw-r--r-- 1 vganapathy support1 83624 Dec 8 11:55 contrail-collector-stdout.log
vganapathy@ubuntu-build04:/cs-shared/bugs/1648601$

description: updated
Raj Reddy (rajreddy)
Changed in juniperopenstack:
milestone: r3.2.0.0-fcs → none
information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/27057
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/27058
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/27059
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/27060
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/27058
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/734b540aa33b4f55c5a542cb3acf06050cd0d576
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 734b540aa33b4f55c5a542cb3acf06050cd0d576
Author: Sundaresan Rajangam <email address hidden>
Date: Thu Dec 8 18:26:24 2016 -0800

Remove tcp-check connect option for redis on haproxy

tcp-check connect option for redis on haproxy.cfg causes the client
connections in the redis-server to grow continuously and reaches the max
limit resulting in connection failure/response error for requests from
collector and other analytics services to redis.

Change-Id: If088ba40e7f0bc420a753ec11bca9dd081ffb160
Partial-Bug: #1648601
(cherry picked from commit 32388a96b848629bf8f4b7d7ea832fca8c3dccd9)
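The commit message above describes Redis client connections growing until they hit the server's limit. One hedged way to watch for that condition is to compare connected_clients from `redis-cli INFO clients` with the value of `redis-cli CONFIG GET maxclients` (Redis defaults maxclients to 10000; the value configured on these analytics nodes is not stated in this report). The Python sketch below only parses the INFO text; the function names and the 90% alert threshold are illustrative assumptions, not part of the fix.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output ('# Section' headers, 'key:value' lines)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key] = value
    return fields


def client_headroom(info_text: str, maxclients: int) -> int:
    """Connections left before Redis starts rejecting new clients."""
    connected = int(parse_info(info_text)["connected_clients"])
    return maxclients - connected


def near_limit(info_text: str, maxclients: int, ratio: float = 0.9) -> bool:
    """Hypothetical alert rule: True once usage reaches `ratio` of maxclients."""
    connected = maxclients - client_headroom(info_text, maxclients)
    return connected >= ratio * maxclients
```

For example, with INFO text containing connected_clients:9950 and maxclients set to 10000, client_headroom returns 50 and near_limit returns True. A count that keeps climbing between samples, rather than a single high reading, would be the more telling symptom of the growth described in the commit.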

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/27057
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/32388a96b848629bf8f4b7d7ea832fca8c3dccd9
Submitter: Zuul (<email address hidden>)
Branch: master

commit 32388a96b848629bf8f4b7d7ea832fca8c3dccd9
Author: Sundaresan Rajangam <email address hidden>
Date: Thu Dec 8 18:26:24 2016 -0800

Remove tcp-check connect option for redis on haproxy

tcp-check connect option for redis on haproxy.cfg causes the client
connections in the redis-server to grow continuously and reaches the max
limit resulting in connection failure/response error for requests from
collector and other analytics services to redis.

Change-Id: If088ba40e7f0bc420a753ec11bca9dd081ffb160
Partial-Bug: #1648601
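At the protocol level, the "response error" mentioned in the commit message is a RESP error reply: a line beginning with `-`. When a new connection would exceed maxclients, Redis replies `-ERR max number of clients reached` and closes the connection. The sketch below decodes one complete RESP2 reply from a byte buffer so that a caller can decide how to handle an error reply (for example, log and retry) rather than aborting. It is a hypothetical Python illustration, not the collector's C++ code in redis_processor_vizd.cc, and the report does not say whether this is how the assertion in SyncDeleteUVEs was reached. It ignores partial reads and RESP3 types.

```python
from typing import List, Tuple, Union


class RedisError(Exception):
    """A RESP error reply such as b'-ERR max number of clients reached'."""


Reply = Union[str, int, bytes, None, List["Reply"], RedisError]


def parse_reply(buf: bytes, pos: int = 0) -> Tuple[Reply, int]:
    """Decode one RESP2 reply starting at buf[pos].

    Hypothetical illustration; not the collector's code. Returns
    (value, next_pos). Error replies are returned as RedisError instances
    instead of being raised. Assumes buf holds the complete reply.
    """
    end = buf.index(b"\r\n", pos)
    kind, line = buf[pos:pos + 1], buf[pos + 1:end]
    nxt = end + 2
    if kind == b"+":                      # simple string
        return line.decode(), nxt
    if kind == b"-":                      # error reply
        return RedisError(line.decode()), nxt
    if kind == b":":                      # integer
        return int(line), nxt
    if kind == b"$":                      # bulk string; length -1 means nil
        n = int(line)
        if n < 0:
            return None, nxt
        return buf[nxt:nxt + n], nxt + n + 2
    if kind == b"*":                      # array; length -1 means nil
        n = int(line)
        if n < 0:
            return None, nxt
        items: List[Reply] = []
        for _ in range(n):
            item, nxt = parse_reply(buf, nxt)
            items.append(item)
        return items, nxt
    raise ValueError(f"unknown RESP type byte {kind!r}")
```

For example, parse_reply(b"-ERR max number of clients reached\r\n") returns a RedisError whose message is "ERR max number of clients reached", leaving the retry or reconnect policy to the caller.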

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/27059
Committed: http://github.org/Juniper/contrail-puppet/commit/08cc8fc04c4a521826d82b919ff1e3142be8f4ac
Submitter: Zuul (<email address hidden>)
Branch: master

commit 08cc8fc04c4a521826d82b919ff1e3142be8f4ac
Author: Sundaresan Rajangam <email address hidden>
Date: Thu Dec 8 18:43:44 2016 -0800

Remove tcp-check connect option for redis on haproxy

tcp-check connect option for redis on haproxy.cfg causes the client
connections in the redis-server to grow continuously and reaches the max
limit resulting in connection failure/response error for requests from
collector and other analytics services to redis.

Change-Id: Id459548fd69b6f87a1514f2bb38a039bbbec256b
Closes-Bug: #1648601

Ananth Suryanarayana (anantha-l) wrote :

This is happening again with the latest build 4.0.0.0-3043.

The files are in /cs-shared/bugs/1648601/again/files.tgz

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f59be6ebc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f59be6ebc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f59be6ef028 in __GI_abort () at abort.c:89
#2 0x00007f59be6e4bf6 in __assert_fail_base (fmt=0x7f59be8353b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f3eee "0",
    file=file@entry=0x813318 "controller/src/analytics/redis_processor_vizd.cc", line=line@entry=254,
    function=function@entry=0x8133e0 "static bool RedisProcessorExec::SyncDeleteUVEs(const string&, short unsigned int, const string&, const string&, const string&, const string&, const string&, std::vector<std::pair<std::basic_string<cha"...) at assert.c:92
#3 0x00007f59be6e4ca2 in __GI___assert_fail (assertion=0x7f3eee "0", file=0x813318 "controller/src/analytics/redis_processor_vizd.cc", line=254,
    function=0x8133e0 "static bool RedisProcessorExec::SyncDeleteUVEs(const string&, short unsigned int, const string&, const string&, const string&, const string&, const string&, std::vector<std::pair<std::basic_string<cha"...) at assert.c:101
#4 0x00000000005ba55d in ?? ()
#5 0x0000000000618dfd in ?? ()
#6 0x00000000005ad1e1 in ?? ()
#7 0x000000000059ca7f in ?? ()
#8 0x00000000007545dd in ?? ()
#9 0x0000000000751200 in ?? ()
#10 0x0000000000751528 in ?? ()
#11 0x000000000075100b in ?? ()
#12 0x0000000000749a85 in ?? ()
#13 0x00000000007504a5 in ?? ()
#14 0x0000000000473e2f in ?? ()
#15 0x00007f59bfc73b3a in ?? () from /usr/lib/libtbb.so.2
#16 0x00007f59bfc6f816 in ?? () from /usr/lib/libtbb.so.2
#17 0x00007f59bfc6ef4b in ?? () from /usr/lib/libtbb.so.2
#18 0x00007f59bfc6b0ff in ?? () from /usr/lib/libtbb.so.2
#19 0x00007f59bfc6b2f9 in ?? () from /usr/lib/libtbb.so.2
#20 0x00007f59bfe8f184 in start_thread (arg=0x7f599cff3700) at pthread_create.c:312
#21 0x00007f59be7af37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Arvind (arvindv) wrote :

Ananth, the traceback points to a different code path, so I opened a new bug:
https://bugs.launchpad.net/juniperopenstack/+bug/1670908

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/29932
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/29933
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29933
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/271c4c2fadb367edc6e9ec33fa1c13d83724d511
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 271c4c2fadb367edc6e9ec33fa1c13d83724d511
Author: Sundaresan Rajangam <email address hidden>
Date: Thu Dec 8 18:26:24 2016 -0800

Remove tcp-check connect option for redis on haproxy

tcp-check connect option for redis on haproxy.cfg causes the client
connections in the redis-server to grow continuously and reaches the max
limit resulting in connection failure/response error for requests from
collector and other analytics services to redis.

Change-Id: If088ba40e7f0bc420a753ec11bca9dd081ffb160
Partial-Bug: #1648601
(cherry picked from commit 32388a96b848629bf8f4b7d7ea832fca8c3dccd9)

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/29932
Committed: http://github.org/Juniper/contrail-puppet/commit/9a28809941716efdf416cf4bb576d7a6cdb4d35f
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 9a28809941716efdf416cf4bb576d7a6cdb4d35f
Author: Sundaresan Rajangam <email address hidden>
Date: Thu Dec 8 18:43:44 2016 -0800

Remove tcp-check connect option for redis on haproxy

tcp-check connect option for redis on haproxy.cfg causes the client
connections in the redis-server to grow continuously and reaches the max
limit resulting in connection failure/response error for requests from
collector and other analytics services to redis.

Change-Id: Id459548fd69b6f87a1514f2bb38a039bbbec256b
Closes-Bug: #1648601
(cherry picked from commit 08cc8fc04c4a521826d82b919ff1e3142be8f4ac)

Jim Reilly (jpreilly)
tags: added: att-aic-contrail