Juniper Openstack

Healthcheck: agent crash @ InstanceTaskExecvp::ReadData

Bug #1533627 reported by Senthilnathan Murugappan on 2016-01-13

Affects            Status                   Importance  Assigned to           Milestone
Juniper Openstack  Status tracked in Trunk
Trunk              Fix Committed            High        Prabhjot Singh Sethi  Juniper Openstack r3.0-fcs

Bug Description

Observed the crash below with the 2696 kilo build. The box had one HTTP and one Ping healthcheck instance.
The core will be copied to /cs-shared/bugs/<bug_id>
(gdb) bt
#0 0x0000000000a33eb9 in close (ec=..., impl=..., this=0x313538333038365a) at /usr/include/boost/asio/detail/impl/reactive_descriptor_service.ipp:128
#1 close (ec=..., impl=..., this=0x3135383330383632) at /usr/include/boost/asio/posix/stream_descriptor_service.hpp:129
#2 close (ec=..., this=0x7fa5fc2ca4d8) at /usr/include/boost/asio/posix/basic_descriptor.hpp:222
#3 InstanceTaskExecvp::ReadData (this=0x7fa5fc2ca450, ec=..., read_bytes=<optimized out>) at controller/src/vnsw/agent/oper/instance_task.cc:34
#4 0x0000000000a35342 in operator() (a2=<optimized out>, a1=..., p=<optimized out>, this=0x7fff13d8c960) at /usr/include/boost/bind/mem_fn_template.hpp:280
#5 operator()<boost::_mfi::mf2<void, InstanceTaskExecvp, const boost::system::error_code&, long unsigned int>, boost::_bi::list2<const boost::system::error_code&, long unsigned int const&> > (a=<synthetic pointer>, f=..., this=0x7fff13d8c970) at /usr/include/boost/bind/bind.hpp:392
#6 operator()<boost::system::error_code, long unsigned int> (a2=@0x7fff13d8c988: 0, a1=..., this=0x7fff13d8c960) at /usr/include/boost/bind/bind_template.hpp:102
#7 operator() (this=0x7fff13d8c960) at /usr/include/boost/asio/detail/bind_handler.hpp:127
#8 asio_handler_invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, InstanceTaskExecvp, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<InstanceTaskExecvp*>, boost::arg<1> (*)(), boost::arg<2> (*)()> >, boost::system::error_code, unsigned long> > (function=...) at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#9 invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, InstanceTaskExecvp, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<InstanceTaskExecvp*>, boost::arg<1> (*)(), boost::arg<2> (*)()> >, boost::system::error_code, unsigned long>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, InstanceTaskExecvp, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<InstanceTaskExecvp*>, boost::arg<1> (*)(), boost::arg<2> (*)()> > > (context=..., function=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#10 boost::asio::detail::descriptor_read_op<boost::asio::mutable_buffers_1, boost::_bi::bind_t<void, boost::_mfi::mf2<void, InstanceTaskExecvp, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<InstanceTaskExecvp*>, boost::arg<1> (*)(), boost::arg<2> (*)()> > >::do_complete (owner=0x2e6d190, base=<optimized out>) at /usr/include/boost/asio/detail/descriptor_read_op.hpp:104
#11 0x00000000009d3e57 in complete (bytes_transferred=0, ec=..., owner=..., this=0x7fa5d404d420) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#12 do_run_one (ec=..., this_thread=..., lock=..., this=0x2e6d190) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:384
#13 boost::asio::detail::task_io_service::run (this=0x2e6d190, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
#14 0x000000000105ec51 in run (this=0x2e6d120, ec=...) at /usr/include/boost/asio/impl/io_service.ipp:66
#15 EventManager::RunWithExceptionHandling (this=0x2e6d120) at controller/src/io/event_manager.cc:51
#16 0x00000000007b437e in main (argc=<optimized out>, argv=0x7fff13d8d678) at controller/src/vnsw/agent/contrail/main.cc:115
(gdb)

Senthilnathan Murugappan (msenthil) wrote on 2016-01-14:

This is observed frequently when a healthcheck instance is detached from the VMI.

Prabhjot Singh Sethi (prabhjot) wrote on 2016-01-26:

The issue is caused by parallel access to the health check instance from two threads.

OpenContrail Admin (ci-admin-f) wrote on 2016-01-26: [Review update] master

Review in progress for https://review.opencontrail.org/16523
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote on 2016-01-30: A change has been merged

Reviewed: https://review.opencontrail.org/16523
Committed: http://github.org/Juniper/contrail-controller/commit/4a4bb3d5b8af10b15a2c4776ea077778c51284bb
Submitter: Zuul
Branch: master

commit 4a4bb3d5b8af10b15a2c4776ea077778c51284bb
Author: Prabhjot Singh Sethi <email address hidden>
Date: Tue Jan 26 23:51:34 2016 +0530

Fix Healthcheck instance parallel access & cleanup

Issue:
------
The health check instance is accessed from both the asio and
DBTable task contexts, causing a race in which one context
accesses the object while the other deletes it.

Fix:
----
- move operation for READ and EXIT to a new HealthCheck
task context which runs in exclusion with DBTable task
- move cleanup of instance from DBTable to HealthCheck
task context to put events in correct sequence
- instance holds reference to service object to assure
sanity of access till cleanup is complete

Closes-Bug: 1533627
Related-Bug: 1530539
Change-Id: I2880a2c21a8a642bd6612067be5b67ba02c88fe8
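
The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the code from the commit and does not use the contrail task scheduler API: `Service`, `Instance`, `SerialQueue`, and `RunDemo` are hypothetical names chosen for illustration. The sketch assumes that READ, EXIT, and cleanup are all posted as events to one queue that is drained from a single context, so the delete can never overlap a read, and that the instance keeps a reference to its service object until its own cleanup has run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Trace = std::vector<std::string>;

// Hypothetical stand-in for the health check service object.
struct Service {
    explicit Service(Trace *t) : trace(t) {}
    ~Service() { trace->push_back("service destroyed"); }
    Trace *trace;
};

// Hypothetical stand-in for a health check instance. It holds a reference
// to the service so the service stays valid until the instance is cleaned up.
class Instance {
public:
    explicit Instance(std::shared_ptr<Service> service)
        : service_(std::move(service)) {}
    ~Instance() { service_->trace->push_back("instance destroyed"); }
    void OnRead(const std::string &data) {
        service_->trace->push_back("read:" + data);
    }
    void OnExit() { service_->trace->push_back("exit"); }

private:
    std::shared_ptr<Service> service_;
};

// Events may be enqueued from any thread, but are executed one at a time,
// in FIFO order, by whichever single context calls Drain().
class SerialQueue {
public:
    void Enqueue(std::function<void()> event) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push(std::move(event));
    }

    std::size_t Drain() {
        std::size_t count = 0;
        for (;;) {
            std::function<void()> event;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (events_.empty()) break;
                event = std::move(events_.front());
                events_.pop();
            }
            event();  // run outside the lock
            ++count;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> events_;
};

Trace RunDemo() {
    Trace trace;
    SerialQueue queue;

    auto service = std::make_shared<Service>(&trace);
    Instance *instance = new Instance(service);
    service.reset();  // the instance's reference now keeps the service alive

    // I/O side: READ and EXIT are posted as events instead of running inline.
    queue.Enqueue([instance] { instance->OnRead("ok"); });
    queue.Enqueue([instance] { instance->OnExit(); });
    // Config side: deletion is posted to the same queue, so it is ordered
    // after the pending READ and EXIT rather than racing with them.
    queue.Enqueue([instance] { delete instance; });

    queue.Drain();
    return trace;
}
```

Calling `RunDemo()` returns the recorded event order. Because cleanup is just another event in the same queue, the instance is destroyed only after the pending READ and EXIT have run, and the service object is released last.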
