Misleading app status after failed override update
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Low | David Barbosa Bastos | | |
Bug Description
Brief Description
-----------------
Application status is misleading after a failed override update with illegal values. The application should be in the failed (apply-failed) state, and an alarm should be raised accordingly. Instead, the status suggests that the update completed successfully.
Severity
--------
Minor
Steps to Reproduce
------------------
1) The application must be in the applied state.
2) Modify user overrides with illegal values: system helm-override-
3) Reapply the app. The status changes to applied even though the helmrelease fails.
Expected Behavior
------------------
Application should be in the failed (apply-failed) state.
Actual Behavior
----------------
Within seconds of application-apply, the progress becomes 'completed' and the status 'applied'.
Reproducibility
---------------
Reproducible 100%
System Configuration
-------
AIO-SX, AIO-DX+, Std
Branch/Pull Time/Commit
-------
SW_VERSION="23.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
SRC_BUILD_ID="1699"
Last Pass
---------
n/a
Timestamp/Logs
--------------
[sysadmin@
+------
| Property | Value |
+------
| name | ks-ingress-nginx |
| namespace | kube-system |
| user_overrides | controller: |
| | resources: |
| | limits: |
| | cpu: "10" |
| | requests: |
| | cpu: "255" |
| | |
+------
[sysadmin@
+------
| Property | Value |
+------
| active | True |
| app_version | 22.12-1 |
| created_at | 2023-12-
| manifest_file | fluxcd-manifests |
| manifest_name | nginx-ingress-
| name | nginx-ingress-
| progress | None |
| status | applying |
| updated_at | 2024-01-
+------
Please use 'system application-list' or 'system application-show nginx-ingress-
[sysadmin@
NAME AGE READY STATUS
ceph-pools-audit 39d True Release reconciliation succeeded
cephfs-provisioner 39d True Release reconciliation succeeded
ks-ingress-nginx 39d False upgrade retries exhausted
rbd-provisioner 39d True Release reconciliation succeeded
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-
calico-node-94gr2 1/1 Running 1 (39d ago) 39d 192.168.204.2 controller-0 <none> <none>
ceph-pools-
ceph-pools-
ceph-pools-
cephfs-
cephfs-
cephfs-
coredns-
ic-nginx-
kube-apiserver-
kube-controller
kube-multus-
kube-proxy-gr89h 1/1 Running 1 (39d ago) 39d 192.168.204.2 controller-0 <none> <none>
kube-scheduler-
kube-sriov-
rbd-nodeplugin-
rbd-provisioner
rbd-storage-
+------
| application | version | manifest name | manifest file | status | progress |
+------
| cert-manager | 22.12-8 | cert-manager-
| metrics-server | 22.12-1 | metrics-
| nginx-ingress-
| oidc-auth-apps | 22.12-6 | oidc-auth-
| platform-integ-apps | 22.12-62 | platform-
| wr-analytics | 24.03-0 | wr-analytics-
+------
[sysadmin@
Name: ks-ingress-nginx
Namespace: kube-system
Labels: chart_group=
Annotations: <none>
API Version: helm.toolkit.
Kind: HelmRelease
Metadata:
Creation Timestamp: 2023-12-
Finalizers:
finalizers.
Generation: 1
Managed Fields:
API Version: helm.toolkit.
Fields Type: FieldsV1
fieldsV1:
f:metadata:
.:
Manager: helm-controller
Operation: Update
Time: 2023-12-
API Version: helm.toolkit.
Fields Type: FieldsV1
fieldsV1:
f:metadata:
.:
f:labels:
.:
f:spec:
.:
f:chart:
.:
f:spec:
.:
.:
f:install:
.:
f:interval:
f:test:
.:
f:enable:
f:timeout:
f:upgrade:
.:
Manager: kubectl-
Operation: Update
Time: 2023-12-
API Version: helm.toolkit.
Fields Type: FieldsV1
fieldsV1:
f:status:
f:failures:
Manager: helm-controller
Operation: Update
Subresource: status
Time: 2024-01-
Resource Version: 20773261
UID: b4cd91a4-
Spec:
Chart:
Spec:
Chart: ingress-nginx
Reconcile Strategy: ChartVersion
Source Ref:
Kind: HelmRepository
Name: stx-platform
Version: 4.0.15
Install:
Disable Hooks: false
Interval: 1m
Release Name: ic-nginx-ingress
Test:
Enable: false
Timeout: 30m
Upgrade:
Disable Hooks: false
Values From:
Kind: Secret
Name: ingress-
Values Key: ingress-
Kind: Secret
Name: ingress-
Values Key: ingress-
Status:
Conditions:
Last Transition Time: 2024-01-
Message: upgrade retries exhausted
Reason: UpgradeFailed
Status: False
Type: Ready
Last Transition Time: 2024-01-
Message: Helm upgrade failed: cannot patch "ic-nginx-
error updating the resource "ic-nginx-
cannot patch "ic-nginx-
Looks like there are no changes for IngressClass "nginx"
Patch ValidatingWebho
warning: Upgrade "ic-nginx-ingress" failed: cannot patch "ic-nginx-
Reason: UpgradeFailed
Status: False
Type: Released
Failures: 9
Helm Chart: kube-system/
Last Applied Revision: 4.0.15
Last Attempted Revision: 4.0.15
Last Attempted Values Checksum: 78fa2175ba00ae1
Last Release Revision: 8
Observed Generation: 1
Upgrade Failures: 1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal info 5m27s (x6 over 35m) helm-controller Helm upgrade succeeded
Normal info 3m27s (x7 over 35m) helm-controller Helm upgrade has started
Warning error 3m21s helm-controller Helm upgrade failed: cannot patch "ic-nginx-
error updating the resource "ic-nginx-
cannot patch "ic-nginx-
Looks like there are no changes for IngressClass "nginx"
Patch ValidatingWebho
warning: Upgrade "ic-nginx-ingress" failed: cannot patch "ic-nginx-
Warning error 3m21s helm-controller reconciliation failed: Helm upgrade failed: cannot patch "ic-nginx-
Warning error 9s (x8 over 3m20s) helm-controller reconciliation failed: upgrade retries exhausted
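The HelmRelease above reports Ready=False with reason UpgradeFailed while the application itself is reported as applied. A minimal sketch of the check the report implies is missing, assuming the conditions are read as a list of dicts shaped like `status.conditions` in `kubectl get helmrelease -o json` (fields `type`, `status`, `reason`, `message`). The helper name `app_status_from_conditions` and the reason set are hypothetical illustrations, not sysinv's actual code.

```python
from typing import Any, Dict, List

# Reasons that mark a failed release. Only "UpgradeFailed" appears in this
# report; the set is an illustrative assumption, not an exhaustive list.
FAILURE_REASONS = {"UpgradeFailed"}


def app_status_from_conditions(conditions: List[Dict[str, Any]]) -> str:
    """Map HelmRelease status.conditions to a hypothetical app status.

    "applied" only when the Ready condition is True, "apply-failed" when
    Ready is False with a known failure reason, otherwise "applying".
    """
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    if ready is None:
        # No Ready condition yet: the controller has not reported back.
        return "applying"
    if ready.get("status") == "True":
        return "applied"
    if ready.get("reason") in FAILURE_REASONS:
        return "apply-failed"
    return "applying"


if __name__ == "__main__":
    # Conditions abridged from the ks-ingress-nginx HelmRelease above.
    conditions = [
        {"type": "Ready", "status": "False", "reason": "UpgradeFailed",
         "message": "upgrade retries exhausted"},
        {"type": "Released", "status": "False", "reason": "UpgradeFailed"},
    ]
    print(app_status_from_conditions(conditions))  # apply-failed
```

Keying on the Ready condition rather than on Events avoids depending on event retention, since the events above are aggregated ("x6 over 35m") and can include older successes.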
Test Activity
-------------
Testing
Workaround
----------
n/a
Changed in starlingx: | |
status: | New → In Progress |
Changed in starlingx: | |
assignee: | nobody → David Barbosa Bastos (dbarbosa-wr) |
Changed in starlingx: | |
importance: | Undecided → Low |
tags: | added: stx.9.0 stx.apps |
Reviewed: https://review.opendev.org/c/starlingx/config/+/908856
Committed: https://opendev.org/starlingx/config/commit/ce4b7c1eb328c8c6bc443da4fd5b241f5384b207
Submitter: "Zuul (22348)"
Branch: master
commit ce4b7c1eb328c8c6bc443da4fd5b241f5384b207
Author: David Bastos <email address hidden>
Date: Mon Feb 12 15:51:59 2024 -0300
Fix misleading app status after failed override update
Application status was misleading after a failed override update with
illegal values. Application should be in failed (apply-failed) state,
and alarm should be raised accordingly. Instead, we're led to believe
that the update was completed successfully.
The solution adds a default system delay of 60 seconds before
changing the helmrelease status, which ensures that reconciliation
has already been called.
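One plausible reading of this fix is that the status read right after the override update can still reflect the previous, successful reconciliation, so the check is deferred until the helm-controller has had time to act. A minimal sketch under that assumption; `wait_then_check` and `DEFAULT_HR_RECONCILE_CHECK_DELAY` are hypothetical names, and `sleep` is injectable only so the sketch can be exercised without waiting.

```python
import time
from typing import Callable

# Hypothetical constant mirroring the 60-second default in the commit message.
DEFAULT_HR_RECONCILE_CHECK_DELAY = 60


def wait_then_check(get_status: Callable[[], str],
                    delay: float = DEFAULT_HR_RECONCILE_CHECK_DELAY,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Wait `delay` seconds, then read the release status once.

    Deferring the read gives the controller a chance to reconcile the new
    values, so a stale "applied" result is less likely to be reported.
    """
    sleep(delay)
    return get_status()


if __name__ == "__main__":
    state = {"reconciled": False}

    def fake_sleep(seconds: float) -> None:
        # Simulate the controller reconciling while we wait.
        state["reconciled"] = True

    def fake_status() -> str:
        return "apply-failed" if state["reconciled"] else "applied"

    print(wait_then_check(fake_status, sleep=fake_sleep))  # apply-failed
```

A fixed delay is simple but trades latency for correctness; polling until the HelmRelease `observedGeneration` or last-attempted values change would be an alternative design, not what this change describes.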
Any application can also override this default value via its
metadata: define a key with the same name, set to the required
delay.
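The override can be sketched as a lookup with a fallback, assuming the application metadata has already been parsed into a dict. The key name below is reconstructed from the test plan fragments in this change, and whether it sits at the top level of the metadata is also an assumption; `resolve_check_delay` is a hypothetical helper.

```python
from typing import Any, Mapping

DEFAULT_HR_RECONCILE_CHECK_DELAY = 60  # default from the commit message

# Assumed key name, reconstructed from the test plan of this change.
DELAY_KEY = "fluxcd_hr_reconcile_check_delay"


def resolve_check_delay(app_metadata: Mapping[str, Any]) -> int:
    """Return the app's override if present and valid, else the default."""
    value = app_metadata.get(DELAY_KEY)
    if value is None:
        return DEFAULT_HR_RECONCILE_CHECK_DELAY
    try:
        delay = int(value)
    except (TypeError, ValueError):
        # Malformed override: fall back rather than fail the apply.
        return DEFAULT_HR_RECONCILE_CHECK_DELAY
    # Reject negative values, which cannot be a meaningful delay.
    return delay if delay >= 0 else DEFAULT_HR_RECONCILE_CHECK_DELAY


if __name__ == "__main__":
    print(resolve_check_delay({}))                # 60
    print(resolve_check_delay({DELAY_KEY: 120}))  # 120
```

Falling back to the default on a malformed value is a design choice for the sketch; rejecting the metadata at upload time would be a stricter alternative.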
Test Plan:
PASS: Build-pkgs && build-image
PASS: Upload, apply, delete and update nginx-ingress-controller
PASS: Upload, apply, delete and update platform-integ-apps
PASS: Upload, apply, delete and update metrics-server
PASS: Update user overrides (system user-override-update) with illegal
values. When reapplying the app it should fail.
PASS: Update user overrides (system user-override-update) with correct
values. When reapplying the app it should complete successfully.
PASS: If the app has the fluxcd_hr_reconcile_check_delay key in its
metadata, the system's default delay value must be overwritten.
Closes-Bug: 2053276
Change-Id: I5e75745009be23 5e2646a79764cb4 ff619a93d59
Signed-off-by: David Bastos <email address hidden>