Comment 0 for bug 1883599

Ghada Khalil (gkhalil) wrote :

Brief Description
-----------------
The starlingx config code calls the k8s python client to perform a number of operations. The k8s python client creates a file under /tmp and continues to use this tmp file for the life-cycle of the sysinv-conductor process. A cleanup service in starlingx/centos runs daily and removes /tmp files that have not been used for 10 days. As a result, after 10 days sysinv starts to fail with an error that the tmp file is no longer there.

This is a known issue with k8s:
https://github.com/kubernetes-client/python/issues/765

Options for resolution:

Severity
--------
Major - sysinv/config cmds will start failing after the system is up for 10 days w/o any controller swact

Steps to Reproduce
------------------
- Leave a system up for more than 10 days
- Attempt to make a config change -- For example: updating from http to https

Expected Behavior
------------------
config cmds remain functional regardless of how long the system has been up

Actual Behavior
----------------
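config cmds fail once the system has been up for more than 10 days; sysinv logs a ConfigException reporting that the tmp file does not exist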

Reproducibility
---------------
Was seen on one system which was up for more than 10 days, but expected to be reproducible

System Configuration
--------------------
any

Branch/Pull Time/Commit
-----------------------
Seen with a recent stx master load, but is a day 1 issue

Last Pass
---------
Never

Timestamp/Logs
--------------
sysinv 2020-06-11 20:51:51.446 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:27:03.641 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:29:09.146 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:40:19.170 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
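
To check whether a system has hit this condition, the log lines above can be scanned for the error text. The following is a small sketch; the function name missing_temp_files is hypothetical and the pattern is taken only from the error wording in the lines shown here, so it may need adjusting for other log formats.

```python
import re

# Matches the error text seen in the sysinv log lines above
# (the message itself reads "does not exists").
MISSING_FILE_RE = re.compile(
    r"ConfigException: File does not exists: (?P<path>\S+)"
)


def missing_temp_files(log_lines):
    """Return the set of file paths reported missing in the given log lines."""
    paths = set()
    for line in log_lines:
        match = MISSING_FILE_RE.search(line)
        if match:
            paths.add(match.group("path"))
    return paths
```

A non-empty result for a path under /tmp suggests the conductor is still holding a reference to a file that has been cleaned up.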

Test Activity
-------------
System soak

Workaround
----------
Restart the sysinv-conductor to recover the system:
sudo sm-restart service sysinv-conductor