DC upgrade orchestration fails if started more than 10 days ago due to deleted /tmp files

Bug #2066048 reported by ayyappa
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       ayyappa

Bug Description

Brief Description
-----------------
dcmanager-orchestrator calls the k8s python client to perform a
number of operations. The k8s python client creates temp files under
/tmp and continues to use these temp files for the life-cycle of the
processes.

However, systemd-tmpfiles-clean.service runs every day to clean up
files in the /tmp dir that are older than 10 days. If the k8s client code
is not triggered for more than 10 days (so its temp files are not
accessed for more than 10 days), these temp files are removed as
part of the cleanup. Certain dcmanager-orchestrator operations then
start to fail with an error that the temp file is no longer there.
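
The failure mode described above can be illustrated with a minimal Python sketch. It is not the k8s python client's actual code: the function name and the "k8s-client-" prefix are hypothetical, and os.remove() merely stands in for the systemd-tmpfiles-clean.service run. It only shows that a process which remembers a temp file path fails once that file is deleted underneath it.

```python
import os
import tempfile


def simulate_cleanup_failure():
    """Return the exception raised when a cached temp file has been removed.

    Hypothetical helper for illustration only.
    """
    # Stand-in for a temp file that a client library writes once and
    # then keeps referring to by path for the life of the process.
    fd, cached_path = tempfile.mkstemp(prefix="k8s-client-")
    os.close(fd)

    # Stand-in for the daily cleanup removing the aged file.
    os.remove(cached_path)

    # A later operation tries to reuse the cached path and fails.
    try:
        with open(cached_path, "rb"):
            return None
    except FileNotFoundError as exc:
        return exc


if __name__ == "__main__":
    print(type(simulate_cleanup_failure()).__name__)
```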

Severity
--------
Major.

Steps to Reproduce
------------------
1) Perform subcloud upgrade orchestration after cleaning up the /tmp directory.

Expected Behavior
------------------
Subcloud upgrade orchestration should work without any issues.

Actual Behavior
----------------
Subcloud upgrade orchestration fails with an error that a temp file is no longer there.

Reproducibility
---------------
100% reproducible.

System Configuration
--------------------
Any.

Branch/Pull Time/Commit
-----------------------
NA.

Last Pass
---------
NA.

Timestamp/Logs
--------------
NA.

Test Activity
-------------
NA.

Workaround
----------
NA

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → ayyappa (mantri425)
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/919601
Committed: https://opendev.org/starlingx/distcloud/commit/54eca85c7444a6db6ea1824b7387d3a2c5e384ed
Submitter: "Zuul (22348)"
Branch: master

commit 54eca85c7444a6db6ea1824b7387d3a2c5e384ed
Author: amantri <email address hidden>
Date: Tue May 14 11:41:54 2024 -0400

    Setup /var/run/dcmanager_orchestrator_tmp as orchestrator temp dir

    dcmanager-orchestrator call the k8s python client to perform a
    number of operations. The k8s python client creates temp files under
    /tmp and continues use these tmp files for the life-cycle of the
    processes.

    However systemd-tmpfiles-clean.service will run every day to clean up
    files in /tmp dir that are older than 10 days. If the k8s client code
    is not triggered for more than 10 days (thus its temp files are not
    accessed for more than 10 days), these temp files will be removed as
    part of the cleanup. Certain dcmanager-orchestrator operations then
    starts to fail with an error that the tmp file is no longer there.

    This is a known issue of kubernetes python client:
    https://github.com/kubernetes-client/python/issues/765

    The commit fixes this issue by setting TMPDIR to
    /var/run/dcmanager_orchestrator_tmp when sm starts dcmanager-orchestrator.

    The following similar commits were added for sysinv,dcmanager
    services in the past
    https://review.opendev.org/c/starlingx/config/+/736761
    https://review.opendev.org/c/starlingx/distcloud/+/736247

    Closes-bug: 2066048

    Change-Id: I3d39f5b034e3ef2e6ad9636e86f26f0e93f16d45
    Signed-off-by: amantri <email address hidden>
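
As a rough illustration of why setting TMPDIR redirects the temp files, the sketch below checks which directory Python's standard tempfile module resolves when TMPDIR points at a given path. It assumes the k8s python client creates its files through tempfile, which this report does not state explicitly; resolve_tempdir is a hypothetical helper and is not part of the merged change.

```python
import os
import tempfile


def resolve_tempdir(tmpdir_path):
    """Return the directory tempfile uses when TMPDIR is set to tmpdir_path.

    Hypothetical helper for illustration only. The directory is created
    if missing, because tempfile skips TMPDIR when it is not writable.
    """
    os.makedirs(tmpdir_path, exist_ok=True)
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    os.environ["TMPDIR"] = tmpdir_path
    # tempfile caches its choice; clear it so TMPDIR is read again.
    tempfile.tempdir = None
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the previous state so the caller is unaffected.
        tempfile.tempdir = old_cached
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env


if __name__ == "__main__":
    # A scratch directory stands in for /var/run/dcmanager_orchestrator_tmp.
    scratch = tempfile.mkdtemp()
    print(resolve_tempdir(scratch) == os.path.abspath(scratch))
```

Because tempfile caches the resolved directory on first use, TMPDIR has to be in the environment before the process first creates a temp file, which is consistent with setting it when sm starts the service.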

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.distcloud