Activity log for bug #1883599

Date Who What changed Old value New value Message
2020-06-15 20:17:03 Ghada Khalil bug added bug
2020-06-15 20:19:49 Ghada Khalil description

Old value:

Brief Description
-----------------
The starlingx config code calls the k8s python client to perform a number of operations. The k8s python client creates a file under /tmp and continues to use this tmp file for the life-cycle of the sysinv-conductor process. After 10 days, sysinv starts to fail with an error that the tmp file is no longer there. There is a cleanup service in starlingx/centos that runs daily and removes /tmp files which are not in use for 10 days.

This is a known issue with k8s: https://github.com/kubernetes-client/python/issues/765

Options for resolution:

Severity
--------
Major - sysinv/config cmds will start failing after the system is up for 10 days w/o any controller swact

Steps to Reproduce
------------------
- Leave a system up for more than 10 days
- Attempt to make a config change
-- For example: updating from http to https

Expected Behavior
------------------
config cmds remain functional regardless of how long the system has been up

Actual Behavior
----------------
config

Reproducibility
---------------
Was seen on one system which was up for more than 10 days, but expected to be reproducible

System Configuration
--------------------
any

Branch/Pull Time/Commit
-----------------------
Seen with a recent stx master load, but is a day 1 issue

Last Pass
---------
Never

Timestamp/Logs
--------------
sysinv 2020-06-11 20:51:51.446 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:27:03.641 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:29:09.146 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:40:19.170 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr

Test Activity
-------------
System soak

Workaround
----------
Restart the sysinv-conductor to recover the system:
sudo sm-restart service sysinv-conductor

New value:

Brief Description
-----------------
The starlingx config code calls the k8s python client to perform a number of operations. The k8s python client creates a file under /tmp and continues to use this tmp file for the life-cycle of the sysinv-conductor process. After 10 days, sysinv starts to fail with an error that the tmp file is no longer there. There is a cleanup service in starlingx/centos that runs daily and removes /tmp files which are not in use for 10 days.

This is a known issue with k8s: https://github.com/kubernetes-client/python/issues/765

The best option is to use a different location other than /tmp to keep these files. This is required for any starlingx process that calls the k8s python client. Keeping the files in /var/run is a good option.

Severity
--------
Major - sysinv/config cmds will start failing after the system is up for 10 days w/o any controller swact

Steps to Reproduce
------------------
- Leave a system up for more than 10 days
- Attempt to make a config change
-- For example: updating from http to https

Expected Behavior
------------------
config cmds remain functional regardless of how long the system has been up

Actual Behavior
----------------
config

Reproducibility
---------------
Was seen on one system which was up for more than 10 days, but expected to be reproducible

System Configuration
--------------------
any

Branch/Pull Time/Commit
-----------------------
Seen with a recent stx master load, but is a day 1 issue

Last Pass
---------
Never

Timestamp/Logs
--------------
sysinv 2020-06-11 20:51:51.446 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:27:03.641 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:29:09.146 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:40:19.170 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr

Test Activity
-------------
System soak

Workaround
----------
Restart the sysinv-conductor to recover the system:
sudo sm-restart service sysinv-conductor

2020-06-15 20:20:23 Ghada Khalil description

Old value:

Brief Description
-----------------
The starlingx config code calls the k8s python client to perform a number of operations. The k8s python client creates a file under /tmp and continues to use this tmp file for the life-cycle of the sysinv-conductor process. After 10 days, sysinv starts to fail with an error that the tmp file is no longer there. There is a cleanup service in starlingx/centos that runs daily and removes /tmp files which are not in use for 10 days.

This is a known issue with k8s: https://github.com/kubernetes-client/python/issues/765

The best option is to use a different location other than /tmp to keep these files. This is required for any starlingx process that calls the k8s python client. Keeping the files in /var/run is a good option.

Severity
--------
Major - sysinv/config cmds will start failing after the system is up for 10 days w/o any controller swact

Steps to Reproduce
------------------
- Leave a system up for more than 10 days
- Attempt to make a config change
-- For example: updating from http to https

Expected Behavior
------------------
config cmds remain functional regardless of how long the system has been up

Actual Behavior
----------------
config

Reproducibility
---------------
Was seen on one system which was up for more than 10 days, but expected to be reproducible

System Configuration
--------------------
any

Branch/Pull Time/Commit
-----------------------
Seen with a recent stx master load, but is a day 1 issue

Last Pass
---------
Never

Timestamp/Logs
--------------
sysinv 2020-06-11 20:51:51.446 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:27:03.641 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:29:09.146 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:40:19.170 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr

Test Activity
-------------
System soak

Workaround
----------
Restart the sysinv-conductor to recover the system:
sudo sm-restart service sysinv-conductor

New value:

Brief Description
-----------------
The starlingx config code calls the k8s python client to perform a number of operations. The k8s python client creates a file under /tmp and continues to use this tmp file for the life-cycle of the sysinv-conductor process. After 10 days, sysinv starts to fail with an error that the tmp file is no longer there. There is a cleanup service in starlingx/centos that runs daily and removes /tmp files which are not in use for 10 days.

This is a known issue with k8s: https://github.com/kubernetes-client/python/issues/765

The best option is to use a different location other than /tmp to keep these files. This is required for any starlingx process that calls the k8s python client. Keeping the files in /var/run is a good option.

Severity
--------
Major - sysinv/config cmds will start failing after the system is up for 10 days w/o any controller swact

Steps to Reproduce
------------------
- Leave a system up for more than 10 days
- Attempt to make a config change
-- For example: updating from http to https

Expected Behavior
------------------
config cmds remain functional regardless of how long the system has been up

Actual Behavior
----------------
config cmds start failing after the system is up for 10 days

Reproducibility
---------------
Was seen on one system which was up for more than 10 days, but expected to be reproducible

System Configuration
--------------------
any

Branch/Pull Time/Commit
-----------------------
Seen with a recent stx master load, but is a day 1 issue

Last Pass
---------
Never

Timestamp/Logs
--------------
sysinv 2020-06-11 20:51:51.446 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:27:03.641 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:29:09.146 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr
sysinv 2020-06-11 22:40:19.170 106052 ERROR sysinv.puppet.puppet [-] failed to create secure_system config: ConfigException: File does not exists: /tmp/tmpFQ1byr

Test Activity
-------------
System soak

Workaround
----------
Restart the sysinv-conductor to recover the system:
sudo sm-restart service sysinv-conductor

2020-06-15 22:32:39 Ghada Khalil tags stx.config stx.containers
2020-06-15 22:32:50 Ghada Khalil bug added subscriber Daniel Badea
2020-06-15 22:33:06 Ghada Khalil starlingx: assignee Andy (andy.wrs)
2020-06-15 22:33:10 Ghada Khalil starlingx: importance Undecided High
2020-06-16 14:41:45 Ghada Khalil starlingx: status New In Progress
2020-06-17 18:00:13 Ghada Khalil tags stx.config stx.containers stx.4.0 stx.config stx.containers
2020-06-18 13:02:51 OpenStack Infra starlingx: status In Progress Fix Released
2020-06-18 13:02:52 OpenStack Infra bug watch added https://github.com/kubernetes-client/python/issues/765
2020-06-18 14:38:14 Ghada Khalil starlingx: status Fix Released Confirmed
2020-06-18 15:34:37 Ghada Khalil starlingx: status Confirmed In Progress
2020-06-18 20:30:08 OpenStack Infra starlingx: status In Progress Fix Released
2020-09-25 01:09:01 Ghada Khalil starlingx: status Fix Released Triaged
2020-09-25 01:09:08 Ghada Khalil bug added subscriber Allain Legacy
2020-09-25 14:52:56 OpenStack Infra starlingx: status Triaged In Progress
2020-09-25 15:46:59 OpenStack Infra starlingx: status In Progress Fix Released
2020-10-31 15:17:13 Bart Wensley bug added subscriber Bart Wensley
2021-06-16 12:26:18 OpenStack Infra tags stx.4.0 stx.config stx.containers in-f-centos8 stx.4.0 stx.config stx.containers