Apps take a long time to apply and the progress status remains at 0%
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | Low | Dan Voiculeasa | |
Bug Description
Brief Description
-----------------
This is a follow-up to LP: https:/
A specific solution was implemented for auditd to reduce the repository update interval and allow the app to apply in a timely fashion: https:/
However, the issue also applies to other applications in the kube-system namespace, which hosts multiple apps.
As discussed w/ Bob Church in the review above, this needs further investigation to find a more generic solution.
Severity
--------
Minor - with the above fix, auditd is applying properly. This is tracking a better solution.
Steps to Reproduce
------------------
- Revert the review above
- system apply auditd
Expected Behavior
------------------
- apply completes in a timely fashion, on the order of minutes
Actual Behavior
----------------
- apply takes more than an hour to complete
Reproducibility
---------------
Intermittent. Not seen on every apply; seems to be a timing issue related to the helm repository update interval
System Configuration
-------
Seen on a DX system, but unsure if that's related
Branch/Pull Time/Commit
-------
reported in Debian build: 2022-09-01_18-00-06
Last Pass
---------
Intermittent
Timestamp/Logs
--------------
See https:/
Test Activity
-------------
Regression Testing
Workaround
----------
None
tags: added: stx.apps
description: updated
description: updated
Changed in starlingx:
importance: Undecided → Low
description: updated
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
assignee: nobody → Dan Voiculeasa (dvoicule)
tags: added: stx.8.0
Reviewed: https://review.opendev.org/c/starlingx/config/+/865856
Committed: https://opendev.org/starlingx/config/commit/b83b0e70fef2073e0d56e4a18fc2fb61fd973b84
Submitter: "Zuul (22348)"
Branch: master
commit b83b0e70fef2073e0d56e4a18fc2fb61fd973b84
Author: Dan Voiculeasa <email address hidden>
Date: Mon Nov 28 16:39:21 2022 +0200
AppFwk: Add FluxCD recovery logic for apply operation [2]
Add some robustness to the app framework. It is observed that the
framework can reach a state where Helm charts are not uploaded to
HelmRepository. This leads to the app framework waiting for reconciliation
of HelmRepository to be fired. Currently the reconciliation interval
is set to 60 minutes for every app checked.
The issue becomes obvious when updating the app to use newer HelmCharts.
HelmChart observed status is '''chart pull error: failed to get chart
version for remote reference: no chart name found''' which is a
string the recovery logic will attempt to recover from.
Update recovery logic to trigger a HelmRepository reconciliation
before a HelmChart reconciliation.
Skip CentOS testing because we use the same fluxcd and kubernetes.
The only difference is the python kubernetes library, but the
implementation does not use any new API calls.
Tests on AIO-SX Debian:
PASS: AIO-SX unlocked enabled available
PASS: inspect logs to see HelmRepository reconciliation is triggered by the recovery logic.
Closes-Bug: 1995748
Signed-off-by: Dan Voiculeasa <email address hidden>
Change-Id: I34ae586a5a267b636164d011b5fa5d44ce8c9a6c
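
For readers who want to see the ordering described in the commit message, here is a minimal sketch. It is not the code from the review. It only illustrates matching the quoted HelmChart error and requesting a HelmRepository reconciliation before the HelmChart one. The function names (`is_recoverable`, `reconcile_patch`, `plan_recovery`) and the resource names in the usage note are hypothetical. The annotation key `reconcile.fluxcd.io/requestedAt` is the one Flux controllers watch for on-demand reconciliation.

```python
from datetime import datetime, timezone

# Status text quoted in the commit message above.
RECOVERABLE_ERROR = ("chart pull error: failed to get chart version for "
                     "remote reference: no chart name found")

# Annotation that Flux controllers watch to start a reconciliation
# outside the regular interval.
RECONCILE_ANNOTATION = "reconcile.fluxcd.io/requestedAt"


def is_recoverable(status_message):
    """Return True when a HelmChart status message contains the known error."""
    # Collapse whitespace so a line-wrapped message still matches.
    return RECOVERABLE_ERROR in " ".join(status_message.split())


def reconcile_patch(now=None):
    """Build a merge-patch body that requests an immediate reconciliation."""
    now = now or datetime.now(timezone.utc)
    return {"metadata": {"annotations": {RECONCILE_ANNOTATION: now.isoformat()}}}


def plan_recovery(status_message, repo_name, chart_name, now=None):
    """Return ordered (kind, name, patch) steps, or [] if nothing to recover."""
    if not is_recoverable(status_message):
        return []
    patch = reconcile_patch(now)
    # HelmRepository goes first so the repository index is refreshed
    # before the HelmChart is asked to pull again.
    return [("HelmRepository", repo_name, patch),
            ("HelmChart", chart_name, patch)]
```

Calling `plan_recovery(message, "stx-platform", "auditd")` (both names are placeholders) yields the two patch steps in order. Each step could then be applied with the Python kubernetes client's `CustomObjectsApi.patch_namespaced_custom_object` against the `source.toolkit.fluxcd.io` group, with the API version depending on the Flux release installed.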