controller crash in OpServerProxy.cc : reply->type != 6

Bug #1531712 reported by vageesan
This bug affects 4 people
Affects             Status          Importance  Assigned to   Milestone
Juniper Openstack   (status tracked in Trunk)
  R2.20             Won't Fix       High        Anish Mehta
  R2.20.x           Fix Committed   High        Anish Mehta
  R2.21.x           Fix Committed   High        Anish Mehta
  R2.22.x           Fix Committed   High        Anish Mehta
  R3.0              Fix Committed   High        Anish Mehta
  Trunk             Fix Committed   High        Anish Mehta

Bug Description

Version: 2.21.1 #15.

The core file is at 10.84.5.112:/auto/cs-shared/bugs/1531712/core.contrail-collec.11692.csol1-node3.1452045454

During a traffic test (high throughput/CPS), contrail-controller crashes with the following backtrace.

(gdb) bt
#0 0x00007fb137625cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fb1376290d8 in __GI_abort () at abort.c:89
#2 0x00007fb13761eb86 in __assert_fail_base (
    fmt=0x7fb13776f830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x76b92c "reply->type != 6",
    file=file@entry=0x76ba00 "controller/src/analytics/OpServerProxy.cc", line=line@entry=357,
    function=function@entry=0x76c9e0 <OpServerProxy::OpServerImpl::processorCallbackProcess(redisAsyncContext const*, void*, void*)::__PRETTY_FUNCTION__> "void OpServerProxy::OpServerImpl::processorCallbackProcess(const redisAsyncContext*, void*, void*)") at assert.c:92
#3 0x00007fb13761ec32 in __GI___assert_fail (assertion=0x76b92c "reply->type != 6",
    file=0x76ba00 "controller/src/analytics/OpServerProxy.cc", line=357,
    function=0x76c9e0 <OpServerProxy::OpServerImpl::processorCallbackProcess(redisAsyncContext const*, void*, void*)::__PRETTY_FUNCTION__> "void OpServerProxy::OpServerImpl::processorCallbackProcess(const redisAsyncContext*, void*, void*)") at assert.c:101
#4 0x000000000052dbe7 in OpServerProxy::OpServerImpl::processorCallbackProcess (
    this=<optimized out>, c=<optimized out>, r=<optimized out>, privdata=<optimized out>)
    at controller/src/analytics/OpServerProxy.cc:357
#5 0x00000000004dad3a in operator() (a2=0x0, a1=0xe14e60, a0=0xcffcd0, this=0x7fff6f59d5d0)
    at /usr/include/boost/function/function_template.hpp:767
#6 RedisAsyncConnection::RAC_AsyncCmdCallback (c=0xcffcd0, r=0xe14e60, privdata=0x0)
    at controller/src/analytics/redis_connection.cc:239
#7 0x000000000074c853 in __redisRunCallback (cb=0x7fff6f59d6f0, cb=0x7fff6f59d6f0,
    reply=<optimized out>, ac=0xcffcd0) at build/third_party/hiredis/src/async.c:219
#8 redisProcessCallbacks (ac=0xcffcd0) at build/third_party/hiredis/src/async.c:417
#9 0x000000000074dc79 in redisBoostClient::handle_read (this=0xdb0ec0, ec=...)
    at build/third_party/hiredis/hiredis-boostasio-adapter/boostasio.cpp:62
#10 0x000000000074e344 in call<boost::shared_ptr<redisBoostClient>, boost::system::error_code> (
    b1=<synthetic pointer>, u=..., this=<optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:156
#11 operator()<boost::shared_ptr<redisBoostClient> > (a1=..., u=..., this=<optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:171
#12 operator()<boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<const boost::system::error_code&, long unsigned int const&> > (a=<synthetic pointer>, f=...,
    this=<optimized out>) at /usr/include/boost/bind/bind.hpp:313
#13 operator()<boost::system::error_code, long unsigned int> (a2=<optimized out>, a1=...,
    this=<optimized out>) at /usr/include/boost/bind/bind_template.hpp:102
#14 operator() (this=<optimized out>) at /usr/include/boost/asio/detail/bind_handler.hpp:127
#15 asio_handler_invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long> > (function=...)
    at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#16 invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long>, boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> > > (context=..., function=...)
    at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#17 boost::asio::detail::reactive_null_buffers_op<boost::_bi::bind_t<void, boost::_mfi::mf1<void, redisBoostClient, boost::system::error_code>, boost::_bi::list2<boost::_bi::value<boost::shared_ptr<redisBoostClient> >, boost::arg<1> (*)()> > >::do_complete (owner=<optimized out>, base=<optimized out>)
    at /usr/include/boost/asio/detail/reactive_null_buffers_op.hpp:75
#18 0x000000000058d37f in complete (bytes_transferred=0, ec=..., owner=..., this=<optimized out>)
    at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#19 boost::asio::detail::epoll_reactor::descriptor_state::do_complete (owner=0xce3540,
    base=0xcfefc0, ec=..., bytes_transferred=<optimized out>)
    at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:651
#20 0x0000000000562777 in complete (bytes_transferred=5, ec=..., owner=..., this=0xcfefc0)
    at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#21 do_run_one (ec=..., this_thread=..., lock=..., this=0xce3540)
    at /usr/include/boost/asio/detail/impl/task_io_service.ipp:384
#22 boost::asio::detail::task_io_service::run (this=0xce3540, ec=...)
    at /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
#23 0x000000000057fa58 in run (this=0xccda00, ec=...)
    at /usr/include/boost/asio/impl/io_service.ipp:66
#24 EventManager::Run (this=0xccda00) at controller/src/io/event_manager.cc:32
#25 0x00000000004218cb in main (argc=<optimized out>, argv=<optimized out>)
    at controller/src/analytics/main.cc:412
(gdb)

Tags: analytics soln
vageesan (vageesant)
Changed in juniperopenstack:
milestone: none → r2.23
no longer affects: juniperopenstack
vageesan (vageesant)
description: updated
Revision history for this message
Anish Mehta (amehta00) wrote :

From the code, this is the case in which the assert fires:

            // If redis returns error for async request, then perhaps it
            // is busy executing a script and it has reached the maximum
            // execution time limit.
            assert(reply->type != REDIS_REPLY_ERROR);

This timeout for script execution is set to 15 seconds.
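
For readers matching the backtrace to the code: hiredis defines REDIS_REPLY_ERROR as 6, which is why the failed assertion is printed as "reply->type != 6". The sketch below shows one way the error string carried by such a reply could be extracted for logging before the assert fires. It is not the actual OpServerProxy code; FakeReply, kReplyError and DescribeFailure are hypothetical names, and FakeReply only mimics the type, str and len fields of hiredis' redisReply so that the example compiles without hiredis.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the redisReply fields used here
// (the real struct is declared in hiredis.h).
struct FakeReply {
    int type;          // reply type code
    const char *str;   // error or status text, may be NULL
    std::size_t len;   // length of str in bytes
};

// Same numeric value as REDIS_REPLY_ERROR in hiredis.
const int kReplyError = 6;

// Returns an empty string when the reply is not an error; otherwise
// returns a log line that includes the error text sent by redis.
std::string DescribeFailure(const FakeReply &reply) {
    if (reply.type != kReplyError) {
        return std::string();
    }
    std::string text;
    if (reply.str != NULL) {
        text.assign(reply.str, reply.len);
    }
    return "redis returned error: " + text;
}
```

A callback could log the result of DescribeFailure and then assert that it is empty, so that the core file is accompanied by the reason redis gave for rejecting the request.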

Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Revision history for this message
Jeba Paulaiyan (jebap) wrote :

This crash is seen in 2.22.1-5 as well. Will copy the cores to /cs-shared/bugs/1531712-22.22.1-5

Revision history for this message
Raj Reddy (rajreddy) wrote :

Discussed with Anish and Megh: we will have to add instrumentation to see which command, and under what circumstances, has taken more than 15 seconds to complete. Anish will commit the code and we will monitor.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/18353
Submitter: Anish Mehta (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/18355
Submitter: Anish Mehta (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18355
Committed: http://github.org/Juniper/contrail-controller/commit/ed7b118a064e5138fcbd2d41ab8b163260fac2fd
Submitter: Zuul
Branch: master

commit ed7b118a064e5138fcbd2d41ab8b163260fac2fd
Author: Anish Mehta <email address hidden>
Date: Fri Mar 11 16:36:04 2016 -0800

Extra logs are added to the lua scripts at the debug level
When lua script returns an error, we print the error string while asserting

Change-Id: I5ce6acfc1c7abcf98151b8558d1770f3859d21cf
Closes-Bug: 1531712

Changed in juniperopenstack:
milestone: none → r3.1.0.0-fcs
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/18416
Submitter: Anish Mehta (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/18417
Submitter: Anish Mehta (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20.x

Review in progress for https://review.opencontrail.org/18418
Submitter: Anish Mehta (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18417
Committed: http://github.org/Juniper/contrail-controller/commit/bb5ea4e3d65d7969c11268a8dec1157b7895c86c
Submitter: Zuul
Branch: R2.21.x

commit bb5ea4e3d65d7969c11268a8dec1157b7895c86c
Author: Anish Mehta <email address hidden>
Date: Fri Mar 11 16:36:04 2016 -0800

Extra logs are added to the lua scripts at the debug level
When lua script returns an error, we print the error string while asserting

Change-Id: I5ce6acfc1c7abcf98151b8558d1770f3859d21cf
Closes-Bug: 1531712

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18418
Committed: http://github.org/Juniper/contrail-controller/commit/3f7a8d149fadc2b2b92ec00ad841794bf523e2da
Submitter: Zuul
Branch: R2.20.x

commit 3f7a8d149fadc2b2b92ec00ad841794bf523e2da
Author: Anish Mehta <email address hidden>
Date: Fri Mar 11 16:36:04 2016 -0800

Extra logs are added to the lua scripts at the debug level
When lua script returns an error, we print the error string while asserting

Change-Id: I5ce6acfc1c7abcf98151b8558d1770f3859d21cf
Closes-Bug: 1531712

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/18416
Committed: http://github.org/Juniper/contrail-controller/commit/38b2722f256445ca7387938e9a88dfc2c79662c1
Submitter: Zuul
Branch: R2.22.x

commit 38b2722f256445ca7387938e9a88dfc2c79662c1
Author: Anish Mehta <email address hidden>
Date: Fri Mar 11 16:36:04 2016 -0800

Extra logs are added to the lua scripts at the debug level
When lua script returns an error, we print the error string while asserting

Change-Id: I5ce6acfc1c7abcf98151b8558d1770f3859d21cf
Closes-Bug: 1531712

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/18545
Submitter: Anish Mehta (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18545
Committed: http://github.org/Juniper/contrail-controller/commit/1779c0eb891df0792f99ce7d29aac38a671aa2f8
Submitter: Zuul
Branch: R3.0

commit 1779c0eb891df0792f99ce7d29aac38a671aa2f8
Author: Anish Mehta <email address hidden>
Date: Fri Mar 11 16:36:04 2016 -0800

Extra logs are added to the lua scripts at the debug level
When lua script returns an error, we print the error string while asserting

Change-Id: I5ce6acfc1c7abcf98151b8558d1770f3859d21cf
Closes-Bug: 1531712
