STX-O Master | memory threshold exceeded when app is applied
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Thales Elero Cervi |
Bug Description
Brief Description
-----------------
After applying the STX-Openstack app, the '100.103 memory threshold exceeded' alarm is displayed
Severity
--------
Major
Steps to Reproduce
------------------
- Apply the STX-Openstack app in a StarlingX master deployment (see the sketch below)
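A minimal reproduction sketch, assuming the standard StarlingX system/fm CLIs on the active controller (the tarball path is illustrative):
  $ system application-upload /home/sysadmin/stx-openstack.tgz   # upload the app tarball
  $ system application-apply stx-openstack                       # apply; wait for "applied" status
  $ fm alarm-list --query alarm_id=100.103                       # check whether the alarm was raised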
Expected Behavior
------------------
app is applied without alarms
Actual Behavior
----------------
app is applied and 100.103 alarm is displayed
Reproducibility
---------------
Reproducible
System Configuration
-------
DX
Branch/Pull Time/Commit
-------
STX master 20240204T070002Z
STX Openstack 2024-01-31
Last Pass
---------
Nov-28 sanity report; the alarm was not reproduced there
Timestamp/Logs
--------------
alarms:
[sysadmin@
+----------+--------------------------------------------------------------+-----------------------------------+----------+----------------------------+
| Alarm ID | Reason Text                                                  | Entity ID                         | Severity | Time Stamp                 |
+----------+--------------------------------------------------------------+-----------------------------------+----------+----------------------------+
| 100.103  | Memory threshold exceeded ; threshold 90.00%, actual 94.77%  | host=controller-0.memory=platform | critical | 2024-02-06T17:10:28.506106 |
+----------+--------------------------------------------------------------+-----------------------------------+----------+----------------------------+
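For cross-checking, a hedged way to inspect the alarm detail and the platform memory accounting behind it from the active controller (the alarm UUID is a placeholder taken from 'fm alarm-list'):
  $ fm alarm-show <alarm-uuid>              # full details of the 100.103 alarm
  $ system host-memory-list controller-0    # platform-reserved memory per NUMA node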
Test Activity
-------------
Sanity
Workaround
----------
N/A
tags: added: stx.distro.openstack
tags: added: stx.9.0
Changed in starlingx:
importance: Undecided → High
Changed in starlingx:
assignee: nobody → Thales Elero Cervi (tcervi)
importance: High → Medium
Changed in starlingx:
importance: Medium → High
I used kube-memory to trace the resource usage of pods in the "openstack" namespace and got the following:
+-----------+-------------------------+
| Namespace | Resident Set Size (MiB) |
+-----------+-------------------------+
| openstack | 5431.63                 |
+-----------+-------------------------+
The top pods by memory usage are:
keystone-api : ~560 (MiB)
maria-db-server : ~367 (MiB)
neutron-dhcp-agent : ~482 (MiB)
neutron-l3-agent : ~398 (MiB)
neutron-server : ~553 (MiB)
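For reference, a similar per-pod view can be obtained with kubectl, assuming metrics-server is available (note that 'kubectl top' reports working-set memory rather than RSS, so values will differ somewhat from kube-memory):
  $ kubectl top pods -n openstack --sort-by=memory | head -n 10   # ten largest pods by memory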
So I tried an old app tarball that I knew was not raising this alarm (app built on 20231218T170059Z) and, to my surprise, the memory usage is pretty much the same (even a bit higher) and it DOES NOT trigger the memory threshold alarm:
+-----------+-------------------------+
| Namespace | Resident Set Size (MiB) |
+-----------+-------------------------+
| openstack | 5509.31                 |
+-----------+-------------------------+
The top pods by memory usage are:
keystone-api : ~545 (MiB)
maria-db-server : ~384 (MiB)
neutron-dhcp-agent : ~476 (MiB)
neutron-l3-agent : ~397 (MiB)
neutron-server : ~554 (MiB)
Since the StarlingX ISO is the same in both tests (20240206T070059Z build), I will review the latest commits to stx/openstack-armada-app, but I am unsure that anything in the app has changed that could cause such an issue.
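A hedged sketch of that review, assuming a local clone of the repo and using the two app build dates as the window:
  $ git clone https://opendev.org/starlingx/openstack-armada-app.git
  $ git -C openstack-armada-app log --oneline --since=2023-12-18 --until=2024-01-31   # commits between the two app builds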
One point to note is that the majority of our Docker images currently point to the "stable/2023.1" branches of the OpenStack repos, and those are eventually patched with cherry-picked fixes. One new (broken) image could be causing a memory usage peak... but since I do not see differences between the "latest" and the "20231218T170059Z" apps with regard to memory usage, this is probably not the case here.
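Should a broken image be suspected later, a hedged way to capture the exact images each app version runs (apply one version, dump the list, then repeat with the other and diff; file names are illustrative):
  $ kubectl get pods -n openstack \
      -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
      | sort -u > images-latest.txt   # rerun as images-20231218.txt with the old app, then: diff images-*.txt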