heka_monitoring_filter out of memory

Bug #1545743 reported by Ivan Lozgachev
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
StackLight | Fix Released | High | guillaume thouvenin |

Bug Description

This problem was reproduced after https://bugs.launchpad.net/lma-toolchain/+bug/1545739, but according to the design there is no connection between the two crashes.

From logs:
2016/02/15 13:55:47 Plugin 'heka_monitoring_filter' error: Terminated. Reason: process_message() not enough memory
2016/02/15 13:55:47 Plugin 'heka_monitoring_filter': stopped
2016/02/15 13:55:47 Plugin 'heka_monitoring_filter': has stopped, exiting plugin without shutting down.
So, 10 minutes after LMA was restarted, the heka_monitoring_filter was killed because it used too much memory.
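
To spot this kind of crash in a larger log, the termination line can be matched mechanically. The following is a minimal sketch in Python, assuming the log line layout is exactly the one quoted above; the function name `find_terminated_plugins` and the regular expression are illustrative and not part of Heka or the LMA collector.

```python
import re

# Matches the termination line format quoted in this report (assumed stable).
TERMINATED = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Plugin '(?P<plugin>[^']+)' error: Terminated\. Reason: (?P<reason>.+)$"
)


def find_terminated_plugins(lines):
    """Return (timestamp, plugin, reason) for every termination line found."""
    hits = []
    for line in lines:
        match = TERMINATED.match(line.strip())
        if match:
            hits.append((match["ts"], match["plugin"], match["reason"]))
    return hits


sample = [
    "2016/02/15 13:55:47 Plugin 'heka_monitoring_filter' error: "
    "Terminated. Reason: process_message() not enough memory",
    "2016/02/15 13:55:47 Plugin 'heka_monitoring_filter': stopped",
]
print(find_terminated_plugins(sample))
```

Only the first sample line is reported, because the "stopped" line carries no termination reason.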

Design explanation:
"The problem is that we reached the limit on the amount of memory a filter plugin can use. The heka_monitoring is becoming too big, and the result is that no data is sent. That is the first effect. The second effect is that there is a bug in our code: we keep data that cannot be sent and we add new data on top of it. So the heka_monitoring filter eats more and more memory. At some point it is killed by Heka. So the fact that the filter ran out of memory is not linked to the issue with Elasticsearch. It is another bug."

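The growth pattern described in the design explanation can be illustrated with a small model. This is a hedged sketch in Python, not the actual filter code (Heka sandbox filters are written in Lua): the class `LeakyFilter`, the parameters `memory_limit` and `max_message_size`, and the helper `ticks_until_killed` are all hypothetical names, and sizes are counted in entries rather than bytes for simplicity.

```python
class LeakyFilter:
    """Hypothetical model: data that cannot be sent stays in the buffer."""

    def __init__(self, memory_limit, max_message_size):
        self.memory_limit = memory_limit          # entries allowed before the host kills the filter
        self.max_message_size = max_message_size  # largest buffer that can still be sent
        self.buffer = []

    def process_message(self, entry):
        self.buffer.append(entry)
        if len(self.buffer) > self.memory_limit:
            raise MemoryError("process_message() not enough memory")

    def timer_event(self):
        """Try to send the buffer; return True if it was sent."""
        if len(self.buffer) > self.max_message_size:
            return False      # the bug: unsendable data is kept, not dropped
        self.buffer = []      # a successful send empties the buffer
        return True


def ticks_until_killed(flt, entries_per_tick, max_ticks=100):
    """Return the tick at which the filter runs out of memory, or None."""
    for tick in range(1, max_ticks + 1):
        try:
            for i in range(entries_per_tick):
                flt.process_message((tick, i))
        except MemoryError:
            return tick
        flt.timer_event()
    return None


print(ticks_until_killed(LeakyFilter(10, 2), entries_per_tick=3))
print(ticks_until_killed(LeakyFilter(10, 2), entries_per_tick=2))
```

In this model the filter survives indefinitely while each interval's data fits in one message, and is killed after a few intervals once it does not, which mirrors the two effects described in the quote.
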
Problem observation:
Open the Grafana "LMA self-monitoring" dashboard for any controller and check the "ENCODER PLUGINS" row. Metric collection there should have stopped after the plugin crash.

Environment:
3 controllers
15 compute + ceph nodes
1 elasticsearch node
1 influxdb node

Revision history for this message
Ivan Lozgachev (ilozgachev) wrote :

This was found on Fuel 8.0 build 552, with the LMA toolchain from origin/master.

summary: - heka_monitoring_filter crash failure after Kibana failure
+ heka_monitoring_filter crash after Kibana failure
summary: - heka_monitoring_filter crash after Kibana failure
+ heka_monitoring_filter out of memory
Changed in lma-toolchain:
status: New → Confirmed
assignee: nobody → LMA-Toolchain Fuel Plugins (mos-lma-toolchain)
importance: Undecided → Medium
Revision history for this message
Simon Pasquier (simon-pasquier) wrote :

Setting the importance to High since it will make troubleshooting and debugging harder if this isn't fixed.

Changed in lma-toolchain:
milestone: none → 0.9.0
importance: Medium → High
status: Confirmed → Triaged
Changed in lma-toolchain:
assignee: LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → guillaume thouvenin (guillaume-thouvenin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-collector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/280783

Changed in lma-toolchain:
status: Triaged → In Progress
Revision history for this message
Simon Pasquier (simon-pasquier) wrote :

I've opened another bug to track the root cause: https://bugs.launchpad.net/lma-toolchain/+bug/1546424

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-collector (master)

Reviewed: https://review.openstack.org/280783
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=f0520cd46c7e413e2ca991fa3c979b3addac449c
Submitter: Jenkins
Branch: master

commit f0520cd46c7e413e2ca991fa3c979b3addac449c
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Feb 16 16:53:48 2016 +0100

    Fix the OOM of heka monitoring filter

    This change resets the table that holds data for the heka monitoring
    filter. Otherwise the table may grow infinitely and the sandbox will
    eventually be killed by Heka.

    Change-Id: If8c07944e42700d913831b500466b33831a41482
    Partial-Bug: #1545743
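
The idea stated in the commit message, resetting the table so it cannot grow without bound, can be sketched as follows. This is an illustrative Python model under stated assumptions, not the committed change (which is in the collector's Lua filter): the class `ResettingFilter`, its parameters, and the choice to reset on every flush attempt are hypothetical, and sizes are counted in entries rather than bytes.

```python
class ResettingFilter:
    """Hypothetical model of a filter that resets its table on every flush."""

    def __init__(self, memory_limit, max_message_size):
        self.memory_limit = memory_limit          # entries allowed before the host kills the filter
        self.max_message_size = max_message_size  # largest table that can still be sent
        self.table = []
        self.dropped = 0
        self.killed = False

    def process_message(self, entry):
        self.table.append(entry)
        if len(self.table) > self.memory_limit:
            self.killed = True
            raise MemoryError("process_message() not enough memory")

    def timer_event(self):
        """Try to send the table, then reset it whether or not the send worked."""
        sent = len(self.table) <= self.max_message_size
        if not sent:
            self.dropped += len(self.table)   # data is lost, but memory stays bounded
        self.table = []
        return sent


flt = ResettingFilter(memory_limit=10, max_message_size=2)
for tick in range(50):
    for i in range(3):
        flt.process_message((tick, i))
    flt.timer_event()
print(flt.killed, len(flt.table), flt.dropped)
```

In this model the filter is never killed, but data that is too large to send is dropped rather than delivered, which is consistent with the change being marked Partial-Bug.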

Changed in lma-toolchain:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-collector (stable/0.9)

Fix proposed to branch: stable/0.9
Review: https://review.openstack.org/281859

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-collector (stable/0.9)

Reviewed: https://review.openstack.org/281859
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=82e85a4259d96c039987f824182380c7b14f584b
Submitter: Jenkins
Branch: stable/0.9

commit 82e85a4259d96c039987f824182380c7b14f584b
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Feb 16 16:53:48 2016 +0100

    Fix the OOM of heka monitoring filter

    This change resets the table that holds data for the heka monitoring
    filter. Otherwise the table may grow infinitely and the sandbox will
    eventually be killed by Heka.

    Change-Id: If8c07944e42700d913831b500466b33831a41482
    Partial-Bug: #1545743
    (cherry picked from commit f0520cd46c7e413e2ca991fa3c979b3addac449c)

Changed in lma-toolchain:
status: Fix Committed → Fix Released