pm-qos-mgr robustness: remove RPM from controller and make daemon restartable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Al Bailey |
Bug Description
Brief Description
-----------------
The pm-qos-mgr package should be installed only on worker nodes and needs to be excluded from controllers. Currently it is excluded only from storage nodes. This is not a functional problem, but it means controllers would need unnecessary patching if the package ever has to be updated. The packaging should match that of worker-utils.
The pm-qos-mgr process does not restart automatically if it is killed or stops abnormally. The RPM spec needs to be modified to handle this case so that systemd automatically restarts the process when it stops abnormally.
Proposed changes:
Add "Restart=on-abnormal" to the [Service] section of the pm-qos-mgr service file.
For example, it should look like this:
stx-config/
. . .
[Service]
Type=simple
ExecStart=
Restart=on-abnormal
Align the packaging with that of worker-utils, e.g. in ./stx-metal:
./bsp-files/
./bsp-files/
The following needs to be added:
./bsp-files/
The following already exists.
./bsp-files/
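To show why "Restart=on-abnormal" covers the kill -9 case in the reproduction steps, here is a minimal Python sketch of the restart decision table documented in the systemd.service(5) man page. It is a simplification under stated assumptions: it ignores SuccessExitStatus=, RestartPreventExitStatus= and start rate limiting, and the names RESTART_TABLE, exit_cause and would_restart are hypothetical, not part of systemd or StarlingX. systemd's default is Restart=no, which matches the observed behavior of the process never coming back.

```python
from typing import Optional

# Signals systemd treats as a "clean" termination by default.
# Any other signal (e.g. SIGKILL from `pkill -9`) counts as "unclean".
CLEAN_SIGNALS = {"SIGHUP", "SIGINT", "SIGTERM", "SIGPIPE"}

# Exit causes that trigger a restart for each Restart= value,
# modeled on the table in systemd.service(5). Simplified sketch.
RESTART_TABLE = {
    "no": set(),
    "always": {"clean", "unclean-exit", "unclean-signal", "timeout", "watchdog"},
    "on-success": {"clean"},
    "on-failure": {"unclean-exit", "unclean-signal", "timeout", "watchdog"},
    "on-abnormal": {"unclean-signal", "timeout", "watchdog"},
    "on-abort": {"unclean-signal"},
    "on-watchdog": {"watchdog"},
}


def exit_cause(exit_code: Optional[int] = None, signal: Optional[str] = None,
               timeout: bool = False, watchdog: bool = False) -> str:
    """Classify how the main process ended (ignores SuccessExitStatus=)."""
    if watchdog:
        return "watchdog"
    if timeout:
        return "timeout"
    if signal is not None:
        return "clean" if signal in CLEAN_SIGNALS else "unclean-signal"
    return "clean" if exit_code == 0 else "unclean-exit"


def would_restart(policy: str, manual_stop: bool = False, **end_state) -> bool:
    """Return True if systemd would restart the unit under `policy`."""
    if manual_stop:
        # Stopping the unit with `systemctl stop` never triggers a restart.
        return False
    return exit_cause(**end_state) in RESTART_TABLE[policy]


if __name__ == "__main__":
    print(would_restart("on-abnormal", signal="SIGKILL"))  # True
    print(would_restart("no", signal="SIGKILL"))           # False
    print(would_restart("on-abnormal", exit_code=1))       # False
```

Note that on-abnormal does not restart on a plain non-zero exit code, whereas on-failure would; which of the two is preferable for pm-qos-mgr is a design choice this sketch does not settle.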
Severity
--------
Minor: System/Feature is usable with minor issue.
Steps to Reproduce
------------------
Install a load on a 2+2 Standard configuration.
On a controller, observe that the RPM is installed even though it should not be:
rpm -qa|grep pm-qos-mgr
pm-qos-
Kill the process:
sudo pkill -9 -f /usr/bin/pm-qos-mgr
After the process dies, it does not get restarted with a new PID.
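The kill-and-check step above could be automated along these lines. This is a minimal sketch, assuming pgrep and sudo are available on the node; parse_pgrep, current_pids, restarted and check_restart are hypothetical helper names, and the 10 second wait is an arbitrary assumption rather than a documented restart delay.

```python
import subprocess
import time
from typing import Set

PROCESS_PATH = "/usr/bin/pm-qos-mgr"


def parse_pgrep(output: str) -> Set[int]:
    """Turn pgrep's newline-separated PID output into a set of ints."""
    return {int(tok) for tok in output.split()}


def current_pids(pattern: str = PROCESS_PATH) -> Set[int]:
    # `pgrep -f` matches the full command line, like `pkill -f` in the repro.
    # pgrep exits 1 with empty stdout when nothing matches.
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True, text=True)
    return parse_pgrep(result.stdout)


def restarted(old: Set[int], new: Set[int]) -> bool:
    """True if the process is running again under PIDs not seen before the kill."""
    return bool(new) and not (new & old)


def check_restart(wait_s: float = 10.0, poll_s: float = 0.5) -> bool:
    old = current_pids()
    if not old:
        raise RuntimeError(f"{PROCESS_PATH} is not running")
    # Same as the reproduction step: sudo pkill -9 -f /usr/bin/pm-qos-mgr
    subprocess.run(["sudo", "pkill", "-9", "-f", PROCESS_PATH], check=False)
    deadline = time.monotonic() + wait_s
    while time.monotonic() < deadline:
        if restarted(old, current_pids()):
            return True
        time.sleep(poll_s)
    return False
```

On an affected node, check_restart() would be expected to return False; with the fix in place it should return True. Comparing PID sets, rather than only checking that some process is running, is what distinguishes "restarted" from "never died".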
Expected Behavior
------------------
The pm-qos-mgr RPM should be installed only on worker nodes, including AIO controller,worker nodes.
The pm-qos-mgr process should be restarted automatically if it is killed.
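The expected package placement can be expressed as a simple per-node check. This sketch assumes the node's subfunctions are available as a comma-separated string such as "controller,worker" (how that string is obtained is left out), and relies on `rpm -q`, which exits non-zero when a package is not installed. should_have_package, is_installed and placement_ok are hypothetical names.

```python
import subprocess

PACKAGE = "pm-qos-mgr"


def should_have_package(subfunctions: str) -> bool:
    """Expected placement: any node whose subfunctions include 'worker'
    (plain workers and AIO 'controller,worker' nodes), and no others."""
    return "worker" in {s.strip() for s in subfunctions.split(",")}


def is_installed(package: str = PACKAGE) -> bool:
    # `rpm -q <pkg>` exits 0 when the package is installed, non-zero otherwise.
    return subprocess.run(["rpm", "-q", package], capture_output=True).returncode == 0


def placement_ok(subfunctions: str, installed: bool) -> bool:
    """True if the package is present exactly where it is expected."""
    return should_have_package(subfunctions) == installed
```

For example, placement_ok("controller", is_installed()) on a standard controller returns False while the bug is present.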
Actual Behavior
----------------
pm-qos-mgr is installed on every node type except storage, including controllers.
After process is killed, it does not get restarted with a new PID.
Reproducibility
---------------
100%
System Configuration
--------------------
Multi-node system (2+2 Standard)
Branch/Pull Time/Commit
-----------------------
Branch and the time when code was pulled or git commit or cengn load info
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
SRC_BUILD_ID="10"
JOB="StarlingX_
BUILD_BY="jenkins"
BUILD_NUMBER="10"
BUILD_HOST=
BUILD_DATE=
Introduced by this commit:
stx-config
commit 76b1a7a16f536f1
Author: Jim Gauld <email address hidden>
Date: Tue May 28 16:34:12 2019 -0400
Introduce PM QoS cpu latency manager for kubelet
Last Pass
---------
n/a
Timestamp/Logs
--------------
n/a
Test Activity
-------------
Developer testing.
Changed in starlingx:
status: Fix Committed → Fix Released
Marking as stx.3.0 / medium priority - robustness/cleanup