StarlingX

DC upgrade orchestration fails if started more than 10 days ago due to deleted /tmp files

Bug #2066048 reported by ayyappa on 2024-05-17

This bug affects 1 person

Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      ayyappa

Bug Description

Brief Description
-----------------
dcmanager-orchestrator calls the k8s python client to perform a
number of operations. The k8s python client creates temp files under
/tmp and continues to use these temp files for the life cycle of the
process.

However, systemd-tmpfiles-clean.service runs every day and removes
files under /tmp that are older than 10 days. If the k8s client code
is not triggered for more than 10 days (so its temp files are not
accessed for more than 10 days), these temp files are removed as
part of the cleanup. Certain dcmanager-orchestrator operations then
start to fail with an error that the temp file no longer exists.
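
The failure mode can be illustrated with a minimal sketch. It is a simplified stand-in, not the k8s python client's actual code: the cache, the helper names (temp_file_for, read_cached, simulate_cleanup_failure) and the sample content are all hypothetical, and os.remove() stands in for the systemd-tmpfiles-clean.service cleanup.

```python
import os
import tempfile

# Hypothetical stand-in for a path cache kept by a long-lived process.
_cache = {}


def temp_file_for(content: bytes) -> str:
    """Write content to a temp file once, then keep returning the cached path."""
    if content not in _cache:
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        _cache[content] = path
    return _cache[content]


def read_cached(content: bytes) -> bytes:
    """Re-open the cached temp file, as a later operation would."""
    with open(temp_file_for(content), "rb") as f:
        return f.read()


def simulate_cleanup_failure() -> bool:
    """Return True if reading fails once the temp file has been removed."""
    data = b"example-content"
    assert read_cached(data) == data   # first use works
    os.remove(temp_file_for(data))     # stands in for the /tmp cleanup
    try:
        read_cached(data)              # cached path now points at nothing
    except FileNotFoundError:
        return True
    return False
```

Because the process only remembers the path, it has no way to notice that the file was removed until the next operation tries to open it.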

Severity
--------
Major.

Steps to Reproduce
------------------
1) Perform subcloud upgrade orchestration after cleaning up the /tmp directory.

Expected Behavior
------------------
Subcloud upgrade orchestration should complete without any issues.

Actual Behavior
----------------
Subcloud upgrade orchestration fails with an error that a temp file no longer exists.

Reproducibility
---------------
100% reproducible.

System Configuration
--------------------
Any.

Branch/Pull Time/Commit
-----------------------
NA.

Last Pass
---------
NA.

Timestamp/Logs
--------------
NA.

Test Activity
-------------
NA.

Workaround
----------
NA

Tags: stx.10.0 stx.distcloud

OpenStack Infra (hudson-openstack) on 2024-05-17

Changed in starlingx:
status:	New → In Progress

Ghada Khalil (gkhalil) on 2024-05-17

Changed in starlingx:
assignee:	nobody → ayyappa (mantri425)


OpenStack Infra (hudson-openstack) wrote on 2024-05-17: Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/919601
Committed: https://opendev.org/starlingx/distcloud/commit/54eca85c7444a6db6ea1824b7387d3a2c5e384ed
Submitter: "Zuul (22348)"
Branch: master

commit 54eca85c7444a6db6ea1824b7387d3a2c5e384ed
Author: amantri <email address hidden>
Date: Tue May 14 11:41:54 2024 -0400

    Setup /var/run/dcmanager_orchestrator_tmp as orchestrator temp dir

    dcmanager-orchestrator call the k8s python client to perform a
    number of operations. The k8s python client creates temp files under
    /tmp and continues use these tmp files for the life-cycle of the
    processes.

    However systemd-tmpfiles-clean.service will run every day to clean up
    files in /tmp dir that are older than 10 days. If the k8s client code
    is not triggered for more than 10 days (thus its temp files are not
    accessed for more than 10 days), these temp files will be removed as
    part of the cleanup. Certain dcmanager-orchestrator operations then
    starts to fail with an error that the tmp file is no longer there.

    This is a known issue of kubernetes python client:
    https://github.com/kubernetes-client/python/issues/765

    The commit fixes this issue by setting TMPDIR to
    /var/run/dcmanager_orchestrator_tmp when sm starts
    dcmanager-orchestrator.

    The following similar commits were added for sysinv,dcmanager
    services in the past
    https://review.opendev.org/c/starlingx/config/+/736761
    https://review.opendev.org/c/starlingx/distcloud/+/736247

    Closes-bug: 2066048

    Change-Id: I3d39f5b034e3ef2e6ad9636e86f26f0e93f16d45
    Signed-off-by: amantri <email address hidden>
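
The effect of the TMPDIR change described in the commit message can be checked with a short sketch. It relies on the documented behaviour of Python's tempfile module, which consults the TMPDIR environment variable first when choosing its default directory. The helper name tempdir_seen_by_child is hypothetical, and the child interpreter only stands in for a service started with TMPDIR set; this is not how sm launches dcmanager-orchestrator.

```python
import os
import subprocess
import sys
import tempfile


def tempdir_seen_by_child(tmpdir: str) -> str:
    """Start a child Python process with TMPDIR set and report its temp dir."""
    env = dict(os.environ, TMPDIR=tmpdir)
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, calling tempdir_seen_by_child() with an existing, writable directory should return that directory, so temp files created by libraries in the child process land there rather than under /tmp, outside the scope of the 10-day /tmp cleanup described in this report.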

Changed in starlingx:
status:	In Progress → Fix Released

Ghada Khalil (gkhalil) on 2024-05-17

Changed in starlingx:
importance:	Undecided → Medium
tags:	added: stx.10.0 stx.distcloud
