Platform Memory usage alarm calculation incorrect

Bug #1880605 reported by Eric MacDonald
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Eric MacDonald
Milestone: (none listed)

Bug Description

Brief Description
-----------------
The memory plugin for collectd is monitoring, sampling and alarming on hugepage memory utilization, but it should not be.

Severity
--------
Major: Can result in a host being degraded because of memory usage that is not associated with the platform.

Steps to Reproduce
------------------
Drive hugepage memory utilization above 80% on a host.
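
To confirm that a host is actually past the 80% mark, hugepage utilization can be estimated from the standard `HugePages_Total` and `HugePages_Free` fields of `/proc/meminfo`. The helper below is a minimal sketch, not the plugin's code: the function name is hypothetical, and it computes a host-wide figure, whereas the alarms in this report are raised per NUMA node.

```python
def hugepage_usage_percent(meminfo_text):
    """Return host-wide hugepage utilization in percent.

    Hypothetical helper for illustration. Expects text in the format of
    /proc/meminfo. Returns None when no hugepages are configured.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value;
            # an optional unit such as "kB" may follow it.
            fields[name.strip()] = int(parts[0])

    total = fields.get("HugePages_Total", 0)
    free = fields.get("HugePages_Free", 0)
    if total == 0:
        return None
    return 100.0 * (total - free) / total
```

On a live host this could be called as `hugepage_usage_percent(open("/proc/meminfo").read())`.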

Expected Behavior
------------------
No platform memory alarm

Actual Behavior
----------------
Platform memory alarm is raised

Reproducibility
---------------
Reproducible: whenever hugepage memory usage exceeds the alarm threshold

System Configuration
--------------------
System with hugepage memory

Branch/Pull Time/Commit
-----------------------
All recent loads in the last year.

Last Pass
---------
N/A

Timestamp/Logs
--------------
100.103 Platform Memory threshold exceeded ; threshold 80.00%, actual 81.82% host=worker-5.numa=node0_hugepage major 2020-05-20T23:25:36

100.103 Platform Memory threshold exceeded ; threshold 80.00%, actual 81.82% host=worker-3.numa=node1_hugepages major 2020-05-20T19:28:56
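
When triaging, it can help to pick out which of these alarms refer to hugepage instances. The sketch below parses lines in the format shown above and flags any instance whose name contains "hugepage". The regular expression and the function name are assumptions based only on the two sample lines in this report, not on a documented alarm format.

```python
import re

# Pattern inferred from the two sample alarm lines in this report (assumption).
ALARM_RE = re.compile(
    r"threshold (?P<threshold>\d+(?:\.\d+)?)%, "
    r"actual (?P<actual>\d+(?:\.\d+)?)% "
    r"host=(?P<host>[^.\s]+)\.numa=(?P<instance>\S+)"
)


def parse_memory_alarm(line):
    """Parse one alarm line into a dict, or return None if it does not match."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    instance = match.group("instance")
    return {
        "host": match.group("host"),
        "instance": instance,
        "threshold": float(match.group("threshold")),
        "actual": float(match.group("actual")),
        # Both "node0_hugepage" and "node1_hugepages" appear in the logs,
        # so match on the common substring.
        "is_hugepage": "hugepage" in instance,
    }
```

Both sample lines above parse with `is_hugepage` set to `True`, which is what identifies them as instances of this bug.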

Test Activity
-------------
[Feature Testing, Regression Testing, Developer Testing]

Workaround
----------
none

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/730680

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/730680
Committed: https://git.openstack.org/cgit/starlingx/monitoring/commit/?id=f7437000c75ec15096b97e8c78ecc3c094239b7a
Submitter: Zuul
Branch: master

commit f7437000c75ec15096b97e8c78ecc3c094239b7a
Author: Eric MacDonald <email address hidden>
Date: Mon May 25 16:50:02 2020 -0400

    Platform Memory usage alarm calculation incorrect

    This update removes hugepage memory monitoring, sampling and
    alarming for over usage.

    Hugepage memory is only used by k8s pods or openstack vm's.
    Therefore its usage and alarming should not be tied to the
    platform.

    Change-Id: Iab8104ff56fdd641c058a4fdc587313cbeec9faf
    Closes-Bug: 1880605
    Signed-off-by: Eric MacDonald <email address hidden>
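
The effect of the change described in this commit message can be sketched as follows: hugepage instances are left out when platform memory usage is compared against the threshold. This is an illustrative sketch only, not the code from the merged change. The function name, the shape of the `samples` mapping, and the non-hugepage instance names are hypothetical.

```python
def platform_memory_alarms(samples, threshold=80.0):
    """Return the instances that should raise a platform memory alarm.

    `samples` maps an instance name to its usage in percent (hypothetical
    structure). Hugepage instances are skipped because that memory is used
    by k8s pods or OpenStack VMs, not by the platform.
    """
    return sorted(
        instance
        for instance, percent in samples.items()
        if "hugepage" not in instance and percent > threshold
    )
```

With this filter, a sample such as `{"node0_hugepage": 81.82}` yields no alarm, matching the expected behavior in the bug description.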

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.4.0 stx.metal