StackLight

heka_monitoring_filter out of memory

Bug #1545743 reported by Ivan Lozgachev on 2016-02-15

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	StackLight	Fix Released	High	guillaume thouvenin	StackLight 0.9.0

Bug Description

This problem was reproduced after https://bugs.launchpad.net/lma-toolchain/+bug/1545739, but by design the two crashes are not connected.

From logs:
2016/02/15 13:55:47 Plugin 'heka_monitoring_filter' error: Terminated. Reason: process_message() not enough memory
2016/02/15 13:55:47 Plugin 'heka_monitoring_filter': stopped
2016/02/15 13:55:47 Plugin 'heka_monitoring_filter': has stopped, exiting plugin without shutting down.
So, 10 minutes after LMA was restarted, heka_monitoring_filter was killed because it was using too much memory.

Design explanation:
"The problem is that we reached the limit on the amount of memory a filter plugin may use. The heka_monitoring is becoming too big, and the result is that no data is sent. That is the first effect. The second effect is that there is a bug in our code: we keep data that cannot be sent and we keep adding new data. So the heka_monitoring filter eats more and more memory. At some point it is killed by Heka. So the fact that the filter ran out of memory is not linked to the issue with Elasticsearch. It is another bug."
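
As a rough illustration of the failure mode in the quote above, the following Python sketch models a buffer that keeps unsent data while new data keeps arriving, next to a variant that resets the buffer after every flush attempt (the approach the fix later in this report describes as resetting the table). This is not the actual filter code: the class `MonitoringBuffer`, the `reset_after_flush` flag, and the always-failing `failing_send` function are hypothetical names introduced only for this example.

```python
class MonitoringBuffer:
    """Hypothetical stand-in for the table held by the monitoring filter."""

    def __init__(self, reset_after_flush):
        # Assumption for illustration: a flag selects the leaky or fixed behavior.
        self.reset_after_flush = reset_after_flush
        self.datapoints = []

    def process_message(self, metric):
        # Every incoming metric is appended to the buffer.
        self.datapoints.append(metric)

    def flush(self, send):
        """Try to send the buffered datapoints; return True on success."""
        try:
            send(list(self.datapoints))
            sent = True
        except OSError:
            sent = False
        # Leaky variant: the buffer is cleared only when the send succeeded,
        # so unsent data piles up. Fixed variant: always reset the buffer.
        if sent or self.reset_after_flush:
            self.datapoints = []
        return sent


def failing_send(batch):
    # Simulates a payload that can never be delivered.
    raise OSError("payload too large")


def simulate(reset_after_flush, ticks=5, per_tick=3):
    """Return the buffer size after each flush attempt."""
    buf = MonitoringBuffer(reset_after_flush)
    sizes = []
    for tick in range(ticks):
        for i in range(per_tick):
            buf.process_message(("metric", tick, i))
        buf.flush(failing_send)
        sizes.append(len(buf.datapoints))
    return sizes


if __name__ == "__main__":
    print("leaky:", simulate(reset_after_flush=False))
    print("fixed:", simulate(reset_after_flush=True))
```

In the leaky variant the buffer grows by `per_tick` entries on every cycle for as long as sending fails, which is the kind of unbounded growth that ends with the sandbox hitting its memory limit. The reset variant trades the unsent datapoints for bounded memory use.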

Problem observation:
Open the Grafana "LMA self-monitoring" dashboard for any controller and check the "ENCODER PLUGINS" row. Metric collection there stops after the plugin crashes.

Environment:
3 controllers
15 compute + ceph nodes
1 elasticsearch node
1 influxdb node


Ivan Lozgachev (ilozgachev) wrote on 2016-02-15:

This was found on Fuel 8.0 build 552, with the LMA toolchain from origin/master.

summary:
- heka_monitoring_filter crash failure after Kibana failure
+ heka_monitoring_filter crash after Kibana failure
summary:
- heka_monitoring_filter crash after Kibana failure
+ heka_monitoring_filter out of memory

guillaume thouvenin (guillaume-thouvenin) on 2016-02-15

Changed in lma-toolchain:
status:	New → Confirmed
assignee:	nobody → LMA-Toolchain Fuel Plugins (mos-lma-toolchain)
importance:	Undecided → Medium

Simon Pasquier (simon-pasquier) wrote on 2016-02-15:

Setting the importance to High since it will make troubleshooting and debugging harder if this isn't fixed.

Changed in lma-toolchain:
milestone:	none → 0.9.0
importance:	Medium → High
status:	Confirmed → Triaged

guillaume thouvenin (guillaume-thouvenin) on 2016-02-16

Changed in lma-toolchain:
assignee:	LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → guillaume thouvenin (guillaume-thouvenin)

OpenStack Infra (hudson-openstack) wrote on 2016-02-16: Fix proposed to fuel-plugin-lma-collector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/280783

Changed in lma-toolchain:
status:	Triaged → In Progress

Simon Pasquier (simon-pasquier) wrote on 2016-02-17:

I've opened another bug to track the root cause: https://bugs.launchpad.net/lma-toolchain/+bug/1546424

OpenStack Infra (hudson-openstack) wrote on 2016-02-18: Fix merged to fuel-plugin-lma-collector (master)

Reviewed: https://review.openstack.org/280783
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=f0520cd46c7e413e2ca991fa3c979b3addac449c
Submitter: Jenkins
Branch: master

commit f0520cd46c7e413e2ca991fa3c979b3addac449c
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Feb 16 16:53:48 2016 +0100

Fix the OOM of heka monitoring filter

    This change resets the table that holds data for the heka monitoring
    filter. Otherwise the table may grow infinitely and the sandbox will
    eventually be killed by Heka.

    Change-Id: If8c07944e42700d913831b500466b33831a41482
    Partial-Bug: #1545743

guillaume thouvenin (guillaume-thouvenin) on 2016-02-18

Changed in lma-toolchain:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-02-18: Fix proposed to fuel-plugin-lma-collector (stable/0.9)

Fix proposed to branch: stable/0.9
Review: https://review.openstack.org/281859

OpenStack Infra (hudson-openstack) wrote on 2016-02-18: Fix merged to fuel-plugin-lma-collector (stable/0.9)

Reviewed: https://review.openstack.org/281859
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=82e85a4259d96c039987f824182380c7b14f584b
Submitter: Jenkins
Branch: stable/0.9

commit 82e85a4259d96c039987f824182380c7b14f584b
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Feb 16 16:53:48 2016 +0100

Fix the OOM of heka monitoring filter

    This change resets the table that holds data for the heka monitoring
    filter. Otherwise the table may grow infinitely and the sandbox will
    eventually be killed by Heka.

    Change-Id: If8c07944e42700d913831b500466b33831a41482
    Partial-Bug: #1545743
    (cherry picked from commit f0520cd46c7e413e2ca991fa3c979b3addac449c)

Simon Pasquier (simon-pasquier) on 2016-04-28

Changed in lma-toolchain:
status:	Fix Committed → Fix Released
