2019-04-08 22:22:48 |
Tee Ngo |
description |
Brief Description
-----------------
Shortly after the stx-openstack application is deployed in simplex, platform memory is rapidly
depleted, leading to an unresponsive system and, finally, an OOM-induced reboot.
However, platform memory appears to stabilize after a series of reboots:
controller-0:~$ last reboot
reboot system boot 3.10.0-957.1.3.e Mon Apr 8 09:52 - 19:44 (09:51)
reboot system boot 3.10.0-957.1.3.e Mon Apr 8 09:13 - 19:44 (10:30)
reboot system boot 3.10.0-957.1.3.e Mon Apr 8 07:42 - 19:44 (12:01)
reboot system boot 3.10.0-957.1.3.e Mon Apr 8 06:19 - 19:44 (13:24)
reboot system boot 3.10.0-957.1.3.e Mon Apr 8 04:53 - 06:17 (01:23)
reboot system boot 3.10.0-957.1.3.e Mon Apr 8 04:02 - 06:17 (02:14)
controller-0:~$ uptime
19:44:11 up 9:51, 2 users, load average: 3.30, 3.01, 2.69
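The reboot log above can be turned into boot-to-boot intervals to quantify the crash loop. This is a minimal sketch that parses `last reboot` lines in the form shown in this report. Because `last` omits the year, the caller passes it in (2019 here, taken from the report timestamp). The helper names `boot_times` and `minutes_between_boots` are hypothetical.

```python
import re
from datetime import datetime
from typing import List

# Matches the boot-time field of a `last reboot` line, e.g.
# "reboot system boot 3.10.0-957.1.3.e Mon Apr 8 09:52 - 19:44 (09:51)".
BOOT_RE = re.compile(
    r"reboot\s+system boot\s+\S+\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2})"
)


def boot_times(last_output: str, year: int) -> List[datetime]:
    """Return boot timestamps, oldest first. `last` omits the year, so the caller supplies it."""
    times = []
    for line in last_output.splitlines():
        match = BOOT_RE.search(line)
        if match:
            month, day, hour, minute = match.groups()
            stamp = f"{year} {month} {day} {hour}:{minute}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M"))
    return sorted(times)


def minutes_between_boots(times: List[datetime]) -> List[int]:
    """Whole minutes elapsed between each pair of consecutive boots."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]
```

When applied to the six entries above, this yields gaps of 51, 86, 83, 91 and 39 minutes. The last boot (09:52) is consistent with the 9:51 uptime.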
Severity
--------
Critical
Steps to Reproduce
------------------
Install, configure, and unlock AIOSX, then apply the stx-openstack application.
Expected Behavior
------------------
No memory alarms. Ideally, platform memory should stay below 50% to accommodate occasional/periodic
surges from audits, VM deployments/migrations, and maintenance-related activities.
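The 50% target can be checked with a simple used-memory ratio. This is only a minimal sketch. StarlingX's own alarm accounts for platform memory (the memory reserved for platform services) rather than the whole host, so using the host-wide `MemTotal` and `MemAvailable` fields from `/proc/meminfo` here is an assumption. The function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def used_percent(meminfo: Dict[str, int]) -> float:
    """Percentage of memory in use, treating MemAvailable as reclaimable headroom."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemAvailable"]) / total


def within_budget(meminfo: Dict[str, int], limit_percent: float = 50.0) -> bool:
    """True while usage stays below the target (50% by default, per Expected Behavior)."""
    return used_percent(meminfo) < limit_percent
```

On a live host, `within_budget(parse_meminfo(open("/proc/meminfo").read()))` performs the check.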
Actual Behavior
----------------
Major memory alarms are raised after the stx-openstack application is applied and are soon escalated
to critical. Processes (e.g. kube-apiserver, mysqld) then start getting killed, seemingly at random, by the OOM killer.
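The kernel log can be scanned for the OOM killer's victim lines to confirm which processes it chose. This is a minimal sketch that assumes the log text comes from `dmesg` or a persisted kernel log (this report does not name the file). It matches the system-wide message `Out of memory: Kill process <pid> (<name>) ...` printed by 3.10-era kernels, and also the `Killed process` wording used by newer kernels. `oom_victims` is a hypothetical helper.

```python
import re
from collections import Counter

# System-wide OOM killer victim line, e.g.
#   "Out of memory: Kill process 1234 (mysqld) score 512 or sacrifice child" (3.10-era kernels)
#   "Out of memory: Killed process 1234 (mysqld) total-vm:..." (newer kernels)
# Memory-cgroup OOM lines ("Memory cgroup out of memory: ...") are not matched.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]*)\)")


def oom_victims(log_text: str) -> Counter:
    """Count OOM-killed processes by command name."""
    return Counter(match.group(2) for match in OOM_RE.finditer(log_text))
```

On 3.10 kernels each victim also produces a follow-up `Killed process ...` line without the `Out of memory:` prefix. The pattern skips that line, so each kill is counted once.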
Reproducibility
---------------
Reproducible
System Configuration
--------------------
One-node system, HTTP, IPv4. In duplex configurations, the nginx workers are likely split between the two
controllers. It is highly likely that the memory alarm condition is also observable in AIODX.
Branch/Pull Time/Commit
-----------------------
BUILD_ID="20190406T203346Z"
JOB="STX_build_master_master"
BUILD_BY="starlingx.build@cengn.ca"
Last Pass
---------
It is unknown when this issue was introduced.
Timestamp/Logs
--------------
See the attached memory dumps.
After the osh-openstack-ingress chart was processed (around 2019-04-08 05:18:04 in sysinv.log), there were 72 nginx workers (see rss.dump) with an average RSS of 27791.
During deployment of the osh-openstack-mariadb chart, the number of workers jumped to 144 (Mon Apr 8 05:18:25 in rss.dump) and then to 216 (Mon Apr 8 05:18:25 in rss.dump). The worker count reached 220 before the first OOM-induced reboot.
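The worker counts above can be reproduced from a process listing and converted into a rough memory footprint. The format of rss.dump is not described here, so this sketch works from `ps -eo rss,args` output instead. In that output, nginx workers appear as `nginx: worker process` and RSS is reported in KiB. Treating the report's average of 27791 as KiB is also an assumption. The helper names are hypothetical.

```python
from typing import Tuple


def nginx_worker_stats(ps_output: str) -> Tuple[int, float]:
    """Return (worker count, average RSS in KiB) from `ps -eo rss,args` output."""
    rss_values = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # "<rss> <command line>"; the header row fails isdigit()
        # startswith() also matches "nginx: worker process is shutting down".
        if len(parts) == 2 and parts[0].isdigit() and parts[1].startswith("nginx: worker process"):
            rss_values.append(int(parts[0]))
    if not rss_values:
        return 0, 0.0
    return len(rss_values), sum(rss_values) / len(rss_values)


def total_rss_mib(workers: int, avg_rss_kib: float) -> float:
    """Rough aggregate footprint; summing RSS counts shared pages repeatedly, so it tends to overstate."""
    return workers * avg_rss_kib / 1024


for count in (72, 144, 216, 220):
    print(f"{count} workers -> ~{total_rss_mib(count, 27791):.0f} MiB")
```

At the reported average, 72 workers account for roughly 1.9 GiB of RSS and 220 workers for roughly 5.8 GiB.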
Test Activity
-------------
Developer Testing |
|