DC upgrade orchestration fails if started more than 10 days ago due to deleted /tmp files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | ayyappa |
Bug Description
Brief Description
-----------------
dcmanager-orchestrator calls the k8s python client to perform a
number of operations. The k8s python client creates temp files under
/tmp and continues to use these tmp files for the life-cycle of the
processes.
However, systemd-tmpfiles-clean.service runs every day to clean up
files in the /tmp dir that are older than 10 days. If the k8s client code
is not triggered for more than 10 days (thus its temp files are not
accessed for more than 10 days), these temp files will be removed as
part of the cleanup. Certain dcmanager-orchestrator operations then
start to fail with an error that the tmp file is no longer there.
Severity
--------
Major.
Steps to Reproduce
------------------
1) Perform subcloud upgrade orchestration after cleaning up the /tmp directory
Expected Behavior
------------------
subcloud upgrade orchestration should work without any issues
Actual Behavior
----------------
subcloud upgrade orchestration fails
Reproducibility
---------------
100% reproducible.
System Configuration
-------
Any.
Branch/Pull Time/Commit
-------
NA.
Last Pass
---------
NA.
Timestamp/Logs
--------------
NA.
Test Activity
-------------
NA.
Workaround
----------
NA
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
assignee: nobody → ayyappa (mantri425)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.distcloud
Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/919601
Committed: https://opendev.org/starlingx/distcloud/commit/54eca85c7444a6db6ea1824b7387d3a2c5e384ed
Submitter: "Zuul (22348)"
Branch: master
commit 54eca85c7444a6db6ea1824b7387d3a2c5e384ed
Author: amantri <email address hidden>
Date: Tue May 14 11:41:54 2024 -0400
Setup /var/run/dcmanager_orchestrator_tmp as orchestrator temp dir
dcmanager-orchestrator calls the k8s python client to perform a
number of operations. The k8s python client creates temp files under
/tmp and continues to use these tmp files for the life-cycle of the
processes.
However, systemd-tmpfiles-clean.service runs every day to clean up
files in the /tmp dir that are older than 10 days. If the k8s client code
is not triggered for more than 10 days (thus its temp files are not
accessed for more than 10 days), these temp files will be removed as
part of the cleanup. Certain dcmanager-orchestrator operations then
start to fail with an error that the tmp file is no longer there.
This is a known issue of the kubernetes python client:
https://github.com/kubernetes-client/python/issues/765
The commit fixes this issue by setting TMPDIR to /var/run/dcmanager_orchestrator_tmp when sm starts dcmanager-orchestrator.
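As a rough illustration of why setting TMPDIR is enough, the sketch below launches a child Python process with TMPDIR overridden and reports the directory that the standard `tempfile` module selects there. The helper name `tempdir_seen_by_child` is hypothetical, and the way sm actually exports the variable is not shown in this report. Python's `tempfile` consults TMPDIR first and falls back to other locations if that directory is not writable.

```python
import os
import subprocess
import sys
import tempfile


def tempdir_seen_by_child(tmpdir: str) -> str:
    """Start a child Python process with TMPDIR set and return its temp dir."""
    # Copy the current environment and override only TMPDIR for the child.
    env = dict(os.environ, TMPDIR=tmpdir)
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # A throwaway directory is used here; the commit uses
    # /var/run/dcmanager_orchestrator_tmp for the real service.
    scratch = tempfile.mkdtemp()
    print(tempdir_seen_by_child(scratch))
    os.rmdir(scratch)
```

Because the variable is read by the process at startup, files created through `tempfile` land outside /tmp and are no longer subject to the 10-day /tmp cleanup described above.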
The following similar commits were added for sysinv, dcmanager
services in the past:
https://review.opendev.org/c/starlingx/config/+/736761
https://review.opendev.org/c/starlingx/distcloud/+/736247
Closes-bug: 2066048
Change-Id: I3d39f5b034e3ef2e6ad9636e86f26f0e93f16d45
Signed-off-by: amantri <email address hidden>