Anon memory overage is not being alarmed

Bug #2000251 reported by Eric MacDonald
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Low         Eric MacDonald

Bug Description

Brief Description
-----------------
The StarlingX collectd memory monitoring plugin no longer raises an alarm on Anon memory overage because of this previous commit:

https://opendev.org/starlingx/monitoring/commit/fcc8ddda66b507e747a6e5f32c2300b84e4f7ad6

The 'val.type' dispatched for Anon (Anonymous) memory also needs to be changed from 'percent' to 'memory', as was done for platform memory in that commit. See the workaround below.

Severity
--------
Minor: Anon memory overage is not alarmed

Steps to Reproduce
------------------
Consume more than 80% of Anon memory

Expected Behavior
------------------
Alarm should get raised after 3 minutes

Actual Behavior
----------------
No alarm is raised

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Any pull after Aug 24, 2021

Last Pass
---------
Prior to Aug 24, 2021

Timestamp/Logs
--------------
Logged but not alarmed

controller-0 collectd[95219]: info 4K memory usage: Anon: 95.2%, Anon: 56605.9 MiB, cgroup-rss: 56165.2 MiB, Avail: 2833.5 MiB, Total: 59439.5 MiB

controller-0 collectd[95219]: info 4K numa memory usage: node0, Anon: 92.16%, Anon: 56605.9 MiB, Avail: 4812.6 MiB, Total: 61418.6 MiB
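
The percentages in these two log lines are consistent with the Anon MiB figure divided by the Total MiB figure. The short sketch below checks that arithmetic against the 80% threshold named in the reproduction steps. The helper name `anon_percent` and the constant name are hypothetical, and the formula is inferred from the logged values rather than taken from memory.py.

```python
# Hypothetical helper: the formula is inferred from the logged numbers,
# not copied from the plugin source.
ANON_THRESHOLD_PERCENT = 80.0  # threshold named in "Steps to Reproduce"


def anon_percent(anon_mib: float, total_mib: float) -> float:
    """Return Anon memory usage as a percentage of total memory."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    return 100.0 * anon_mib / total_mib


# Values taken from the two log lines above.
system_pct = anon_percent(56605.9, 59439.5)
node0_pct = anon_percent(56605.9, 61418.6)

for label, pct in (("system", system_pct), ("node0", node0_pct)):
    over = pct > ANON_THRESHOLD_PERCENT
    print(f"{label}: {pct:.2f}% over_threshold={over}")
```

Both readings are well above the threshold, so under the expected behavior an alarm would have been raised for each.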

Test Activity
-------------
Normal Use

Workaround
----------
On the affected system, change 'percent' to 'memory' on the following lines and restart collectd:

https://opendev.org/starlingx/monitoring/src/branch/master/collectd-extensions/src/memory.py#L878
https://opendev.org/starlingx/monitoring/src/branch/master/collectd-extensions/src/memory.py#L887

On an installed system, the file is located at:

/opt/collectd/extensions/python/memory.py
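
As a rough illustration of what the workaround changes, the sketch below dispatches an Anon reading with 'val.type' set to 'memory'. It is not the code in memory.py: `FakeValues` is a hypothetical stand-in for `collectd.Values` so the example can run outside the collectd daemon, `dispatch_anon_usage` is a hypothetical name, and every field value other than `type` is an assumption.

```python
# Sketch only: FakeValues stands in for collectd.Values so the example runs
# outside the collectd daemon. Field values other than 'type' are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeValues:
    """Minimal stand-in for collectd.Values (hypothetical, for illustration)."""
    host: str = ""
    plugin: str = ""
    plugin_instance: str = ""
    type: str = ""
    type_instance: str = ""
    dispatched: List[list] = field(default_factory=list)

    def dispatch(self, values: list) -> None:
        # The real object hands the sample to collectd; here we just record it.
        self.dispatched.append(list(values))


def dispatch_anon_usage(hostname: str, anon_percent: float) -> FakeValues:
    """Dispatch an Anon usage reading with val.type set to 'memory'."""
    val = FakeValues(host=hostname)
    val.plugin = "memory"
    val.type = "memory"            # was 'percent' before the workaround
    val.type_instance = "used"     # assumed value
    val.plugin_instance = "total"  # assumed value
    val.dispatch(values=[anon_percent])
    return val


sample = dispatch_anon_usage("controller-0", 95.2)
print(sample.type, sample.dispatched)
```

The only line that corresponds to the workaround is the `val.type` assignment; the rest is scaffolding to make the sketch self-contained.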

OpenStack Infra (hudson-openstack) wrote : Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/monitoring/+/868322

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/868322
Committed: https://opendev.org/starlingx/monitoring/commit/4fad452db5a55bddc7fb1d663fdbac01ff2f648d
Submitter: "Zuul (22348)"
Branch: master

commit 4fad452db5a55bddc7fb1d663fdbac01ff2f648d
Author: Eric MacDonald <email address hidden>
Date: Wed Dec 21 09:13:37 2022 -0500

    Enable Anon memory alarming

    The starlingX collectd memory monitoring plugin is no longer
    alarming Anon memory overage due to this previous commit.

    https://opendev.org/
    starlingx/monitoring/commit/fcc8ddda66b507e747a6e5f32c2300b84e4f7ad6

    The Anon (Anonymous) memory 'val.type' dispatched also needs
    to be changed from 'percent' to 'memory' like the platform
    memory was in that commit so that the reading notification
    is sent to the fm_notifier which manages alarm and degrade.

    Test Plan: for both total and numa nodes

    PASS: Verify Anon memory major alarming and clear
    PASS: Verify Anon memory critical alarming, degrade and clear
    PASS: Verify Anon memory alarms/degrade clear over collectd restart
    PASS: Verify Anon memory degrade handling over multiple alarm
          severity threshold assertion/clear changes across different
          eids. Test for stuck degrade case.

    Closes-Bug: 2000251
    Signed-off-by: Eric MacDonald <email address hidden>
    Change-Id: I7c436a64886ecb619d2db751a1f92f2ffb1c4e9b

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
importance: Undecided → Low
tags: added: stx.8.0 stx.monitor