k8s tmp files are cleared every 10 days causing config failures
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | High | Andy | | |
Bug Description
Brief Description
-----------------
The starlingx config code calls the k8s python client to perform a number of operations. The k8s python client creates a file under /tmp and continues to use this tmp file for the life-cycle of the sysinv-conductor process. After 10 days, sysinv starts to fail with an error that the tmp file is no longer there. There is a cleanup service in starlingx/centos which runs daily and removes /tmp files that have not been used for 10 days.
This is a known issue with k8s:
https:/
The best option is to keep these files in a location other than /tmp. This is required for any starlingx process that calls the k8s python client. Keeping the files in /var/run is a good option.
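One possible way to move the files, sketched here under assumptions: if the client creates its files through Python's standard tempfile module (assumed here, not confirmed in this report), then pointing tempfile.tempdir at another directory before the client is first used would keep later files out of /tmp. The helper name use_private_tmpdir and the directory name in the usage note are hypothetical, and this is not necessarily how the proposed fix works.

```python
import os
import tempfile


def use_private_tmpdir(path):
    """Make later tempfile.mkstemp()/mkdtemp() calls in this process use `path`.

    Assumption: the library creating the files goes through the tempfile
    module and does not pass an explicit dir= argument.
    """
    # Create the directory if needed; 0o700 keeps it private to the service user.
    os.makedirs(path, mode=0o700, exist_ok=True)
    # tempfile consults this module-level variable before TMPDIR and /tmp.
    tempfile.tempdir = path
    return path
```

A service would call this once at startup, for example use_private_tmpdir("/var/run/sysinv-k8s") (hypothetical path, needs write access to /var/run), before any k8s client configuration is loaded. Exporting TMPDIR in the service environment has a similar effect without a code change.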
Severity
--------
Major - sysinv/config cmds will start failing after the system is up for 10 days w/o any controller swact
Steps to Reproduce
------------------
- Leave a system up for more than 10 days
- Attempt to make a config change -- For example: updating from http to https
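The failure mode can be imitated without waiting 10 days. The sketch below is not the k8s client code: CachedTempFile and simulate_tmp_cleanup are hypothetical stand-ins for a long-running process that remembers a temp file path, and for the daily cleanup job that removes the file.

```python
import os
import tempfile


class CachedTempFile:
    """Writes content to a temp file once and reuses the same path afterwards."""

    def __init__(self, content):
        fd, self.path = tempfile.mkstemp()
        with os.fdopen(fd, "w") as handle:
            handle.write(content)

    def read(self):
        # Re-opens by path on every call, so it depends on the file still existing.
        with open(self.path) as handle:
            return handle.read()


def simulate_tmp_cleanup(path):
    """Stand-in for the cleanup service: delete the file behind the process's back."""
    os.remove(path)
```

Calling read() works until simulate_tmp_cleanup(cached.path) runs; the next read() raises FileNotFoundError, which mirrors the "tmp file is no longer there" error described above.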
Expected Behavior
------------------
config cmds remain functional regardless of how long the system has been up
Actual Behavior
----------------
config cmds start failing after the system is up for 10 days
Reproducibility
---------------
Was seen on one system which was up for more than 10 days, but expected to be reproducible
System Configuration
-------
any
Branch/Pull Time/Commit
-------
Seen with a recent stx master load, but is a day 1 issue
Last Pass
---------
Never
Timestamp/Logs
--------------
sysinv 2020-06-11 20:51:51.446 106052 ERROR sysinv.
sysinv 2020-06-11 22:27:03.641 106052 ERROR sysinv.
sysinv 2020-06-11 22:29:09.146 106052 ERROR sysinv.
sysinv 2020-06-11 22:40:19.170 106052 ERROR sysinv.
Test Activity
-------------
System soak
Workaround
----------
Restart the sysinv-conductor to recover the system:
sudo sm-restart service sysinv-conductor
description: | updated |
tags: | added: stx.config stx.containers |
Changed in starlingx: | |
assignee: | nobody → Andy (andy.wrs) |
importance: | Undecided → High |
Changed in starlingx: | |
status: | New → In Progress |
tags: | added: stx.4.0 |
Changed in starlingx: | |
status: | Confirmed → In Progress |
Fix proposed to branch: master
Review: https://review.opendev.org/736246