Activity log for bug #1770123

Date Who What changed Old value New value Message
2018-05-09 09:02:01 Parth Sarupria bug added bug
2018-05-09 09:02:01 Parth Sarupria attachment added valgrind memcheck report, referred to in the description https://bugs.launchpad.net/bugs/1770123/+attachment/5136631/+files/valgrind_memcheck_contrailcollector_orig_without_structured_syslog_kafka.log
2018-05-09 09:02:54 Parth Sarupria bug added subscriber Sundaresan Rajangam
2018-05-09 09:03:53 Parth Sarupria bug added subscriber Zhiqiang Cui
2018-05-09 09:05:28 Parth Sarupria information type Proprietary Public
2018-05-09 09:06:44 Parth Sarupria description updated

Old value:

I have come across a memory leak in contrail-collector, in kafka_processor.cc (https://github.com/Juniper/contrail-analytics/blob/master/contrail-collector/kafka_processor.cc#L186). It is observed only after a restart of kafka in analyticsdb. I have attached the report generated by the valgrind memcheck tool showing the two leaks below, and at the end are the steps I used to reproduce the issue on two setups running CAN version 4.1.1.0-10. Can you please look into this?

From valgrind_memcheck_contrailcollector_orig_without_structured_syslog_kafka.log (attachment):

First leak:

==10081==
==10081== 142 (72 direct, 70 indirect) bytes in 1 blocks are definitely lost in loss record 8,005 of 9,287
==10081==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10081==    by 0x549589: RdKafka::HandleImpl::metadata(bool, RdKafka::Topic const*, RdKafka::Metadata**, int) (in /usr/bin/contrail-collector)
==10081==    by 0x7EFD34: KafkaProcessor::KafkaTimer() (kafka_processor.cc:406)
==10081==    by 0x8003C0: boost::_mfi::mf0<bool, KafkaProcessor>::operator()(KafkaProcessor*) const (mem_fn_template.hpp:49)
==10081==    by 0x7FFD7E: bool boost::_bi::list1<boost::_bi::value<KafkaProcessor*> >::operator()<bool, boost::_mfi::mf0<bool, KafkaProcessor>, boost::_bi::list0>(boost::_bi::type<bool>, boost::_mfi::mf0<bool, KafkaProcessor>&, boost::_bi::list0&, long) (bind.hpp:243)
==10081==    by 0x7FF46A: boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, KafkaProcessor>, boost::_bi::list1<boost::_bi::value<KafkaProcessor*> > >::operator()() (bind_template.hpp:20)
==10081==    by 0x7FE59A: boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, KafkaProcessor>, boost::_bi::list1<boost::_bi::value<KafkaProcessor*> > >, bool>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:132)
==10081==    by 0x450387: boost::function0<bool>::operator()() const (function_template.hpp:767)
==10081==    by 0x491C84: Timer::TimerTask::Run() (timer.cc:44)
==10081==    by 0x46841B: TaskImpl::execute() (task.cc:277)
==10081==    by 0x6F95B39: ??? (in /usr/lib/libtbb.so.2)
==10081==    by 0x6F91815: ??? (in /usr/lib/libtbb.so.2)

Second leak:

==10081==
==10081== 5,184 bytes in 39 blocks are definitely lost in loss record 9,076 of 9,287
==10081==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10081==    by 0x4C2CF1F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10081==    by 0x53FF61: rd_list_grow (in /usr/bin/contrail-collector)
==10081==    by 0x53FFE6: rd_list_init (in /usr/bin/contrail-collector)
==10081==    by 0x545189: rd_kafka_metadata_leader_query_tmr_cb (in /usr/bin/contrail-collector)
==10081==    by 0x503467: rd_kafka_timers_run (in /usr/bin/contrail-collector)
==10081==    by 0x4E45A6: rd_kafka_thread_main (in /usr/bin/contrail-collector)
==10081==    by 0x5405E6: _thrd_wrapper_function (in /usr/bin/contrail-collector)
==10081==    by 0x6D5F183: start_thread (pthread_create.c:312)
==10081==    by 0x848003C: clone (clone.S:111)
==10081==

I could remove the first leak by deleting the metadata pointer object at the end. I could not work out the cause of the second leak (though it also involves an rdkafka timer).

diff --git a/src/analytics/kafka_processor.cc b/src/analytics/kafka_processor.cc
index 6b23d28..e760344 100644
--- a/src/analytics/kafka_processor.cc
+++ b/src/analytics/kafka_processor.cc
@@ -406,10 +406,14 @@ KafkaProcessor::KafkaTimer() {
                                    &metadata, 5000);
             if (err != RdKafka::ERR_NO_ERROR) {
                 LOG(ERROR, "Failed to acquire metadata: " << RdKafka::err2str(err));
             } else {
                 LOG(ERROR, "Kafka Metadata Detected");
                 LOG(ERROR, "Metadata for " << metadata->orig_broker_id() <<
                     ":" << metadata->orig_broker_name());
                 if (collector_ && redis_up_) {
                     LOG(ERROR, "Kafka Restarting Redis");
                     k_event_cb.disableKafka = false;
                 }
             }
+            LOG(DEBUG, "Deleting metadata !!!");
+            delete metadata;
         }
     }

Steps used to reproduce the issue:

1. root@csp-ucpe-bglr153(analytics):/usr/bin# service contrail-collector stop
2. root@csp-ucpe-bglr153(analytics):/usr/bin# valgrind --leak-check=full --track-origins=yes --show-leak-kinds=all ./contrail-collector &> valgrind_memcheck_contrailcollector_orig_without_structured_syslog_kafka.log
3. Check contrail-status:
   root@csp-ucpe-bglr152(analytics):/# contrail-status
   == Contrail Analytics ==
   contrail-alarm-gen active
   contrail-analytics-api active
   contrail-analytics-nodemgr active
   contrail-collector inactive   // remains inactive because contrail-collector was started under valgrind, not as a service
   contrail-query-engine active
   contrail-snmp-collector active
   contrail-topology active
4. root@csp-ucpe-bglr153(analyticsdb):/# service kafka stop
   kafka: stopped
   root@csp-ucpe-bglr153(analyticsdb):/# service kafka start
   kafka: started
   Wait for some time, then stop the valgrind process; the report will be generated.

New value: identical to the old value except that the sentence "Can you please look into this?" was removed.
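The patch above plugs the first leak by adding an explicit `delete metadata;` at the end of the timer callback. The same ownership rule (the librdkafka out-parameter allocates, the caller must free) can also be expressed with RAII, so the object is released on every exit path rather than only the one the `delete` reaches. A minimal, self-contained sketch of that pattern follows; `Metadata`, `fetch_metadata`, and `kafka_timer_tick` are hypothetical stand-ins for `RdKafka::Metadata`, `RdKafka::Handle::metadata()`, and `KafkaProcessor::KafkaTimer()`, not the real contrail-collector code.

```cpp
#include <cassert>
#include <memory>

// Stand-in for RdKafka::Metadata: counts live instances so a leak
// (or its absence) is directly observable.
struct Metadata {
    static int live;
    int orig_broker_id_ = 1001;
    Metadata() { ++live; }
    ~Metadata() { --live; }
};
int Metadata::live = 0;

// Mimics the librdkafka out-parameter style seen in the stack trace:
// the callee allocates, ownership transfers to the caller.
bool fetch_metadata(Metadata **out) {
    *out = new Metadata();
    return true;
}

// One timer tick: take ownership with unique_ptr immediately, so the
// object is freed on every path out of the scope -- the RAII
// equivalent of the patch's `delete metadata;`.
int kafka_timer_tick() {
    Metadata *raw = nullptr;
    if (!fetch_metadata(&raw))
        return -1;
    std::unique_ptr<Metadata> metadata(raw);
    return metadata->orig_broker_id_;  // metadata deleted when scope ends
}
```

With this structure, early returns added later (for example, on a broker-name mismatch) cannot reintroduce the leak, because the destructor runs regardless of which branch exits the function.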
2018-05-10 01:48:18 Jeba Paulaiyan nominated for series juniperopenstack/trunk
2018-05-10 01:48:18 Jeba Paulaiyan bug task added juniperopenstack/trunk
2018-05-10 01:48:26 Jeba Paulaiyan juniperopenstack/trunk: importance Undecided Medium
2018-05-10 01:48:33 Jeba Paulaiyan juniperopenstack/trunk: assignee Sundaresan Rajangam (srajanga)
2018-05-10 01:48:36 Jeba Paulaiyan juniperopenstack/trunk: milestone r5.1.0
2018-05-10 21:01:50 Sundaresan Rajangam juniperopenstack/trunk: assignee Sundaresan Rajangam (srajanga) Zhiqiang Cui (zcui)
2018-05-15 19:51:22 OpenContrail Admin juniperopenstack/trunk: status New In Progress
2018-05-22 17:24:18 Sundaresan Rajangam nominated for series juniperopenstack/r5.0
2018-05-22 17:24:18 Sundaresan Rajangam bug task added juniperopenstack/r5.0
2018-05-22 17:24:18 Sundaresan Rajangam nominated for series juniperopenstack/r4.0
2018-05-22 17:24:18 Sundaresan Rajangam bug task added juniperopenstack/r4.0
2018-05-22 17:24:18 Sundaresan Rajangam nominated for series juniperopenstack/r4.1
2018-05-22 17:24:18 Sundaresan Rajangam bug task added juniperopenstack/r4.1
2018-05-22 17:24:29 Sundaresan Rajangam juniperopenstack/r5.0: assignee Zhiqiang Cui (zcui)
2018-05-22 17:24:38 Sundaresan Rajangam juniperopenstack/r4.1: assignee Zhiqiang Cui (zcui)
2018-05-22 17:24:58 Sundaresan Rajangam juniperopenstack/r4.0: assignee Zhiqiang Cui (zcui)
2018-05-22 17:25:08 Sundaresan Rajangam juniperopenstack/r4.1: milestone r4.1.1.0
2018-05-22 17:25:22 Sundaresan Rajangam juniperopenstack/r5.0: importance Undecided Medium
2018-05-22 17:25:24 Sundaresan Rajangam juniperopenstack/r4.1: importance Undecided Medium
2018-05-22 17:25:26 Sundaresan Rajangam juniperopenstack/r4.0: importance Undecided Medium
2018-05-22 17:27:19 OpenContrail Admin juniperopenstack/r5.0: status New In Progress
2018-05-22 18:14:41 OpenContrail Admin juniperopenstack/trunk: status In Progress Fix Committed
2018-05-24 04:43:47 Jeba Paulaiyan juniperopenstack/r5.0: milestone r5.0.1
2018-05-24 04:43:49 Jeba Paulaiyan juniperopenstack/r4.1: milestone r4.1.1.0 r4.1.2.0
2018-05-24 04:43:52 Jeba Paulaiyan juniperopenstack/r4.0: milestone r4.0.4.0
2018-05-29 23:24:20 OpenContrail Admin juniperopenstack/r4.0: status New In Progress
2018-05-29 23:27:18 OpenContrail Admin juniperopenstack/r4.1: status New In Progress
2018-06-01 05:41:36 OpenContrail Admin juniperopenstack/r5.0: status In Progress Fix Committed
2018-06-01 21:24:43 OpenContrail Admin juniperopenstack/r4.1: status In Progress Fix Committed
2018-06-01 21:24:44 OpenContrail Admin juniperopenstack/r4.1: milestone r4.1.2.0 r4.1.1.0
2018-06-28 06:59:46 OpenContrail Admin juniperopenstack/r4.0: status In Progress Fix Committed
2018-06-28 06:59:48 OpenContrail Admin juniperopenstack/r4.0: milestone r4.0.4.0 r4.0.3.0