stx-openstack: Cinder helm release fails to upgrade after helm override

Bug #2018930 reported by Luan Nunes Utimura
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| StarlingX | Fix Released | Medium | Luan Nunes Utimura | |

Bug Description

Brief Description
-----------------
After performing a helm override of the `backup_driver` and `backup_mount_options` fields in `.Values.conf.cinder.DEFAULT`, the `osh-openstack-cinder` helm release fails to upgrade due to changes in the `cinder-backup-storage-init` job spec.

Severity
--------
Major. After the upgrade fails, the application is left in the `apply-failed` state.

Although the helm release is able to recover after some time (by rolling back), the application will stay in the `apply-failed` state until a second apply is triggered. This can impact some automated tests depending on the timing of those events.

Steps to Reproduce
------------------
1) Upload/apply stx-openstack;
2) Perform a helm override of `backup_driver` and `backup_mount_options` fields:
   ```
     system helm-override-update stx-openstack cinder openstack \
       --values cinder-static-overrides.yaml
   ```
   and re-apply stx-openstack;
3) Delete helm overrides:
   ```
     system helm-override-delete stx-openstack cinder openstack
   ```
   and re-apply stx-openstack.
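
The report does not include the contents of `cinder-static-overrides.yaml`. As a rough sketch only, a file overriding the two fields under `conf.cinder.DEFAULT` might be created as follows; the driver path and mount options shown are placeholder values chosen for illustration, not the ones used when the bug was reproduced:

```shell
# Hypothetical overrides file for step 2. Only the key names
# (backup_driver, backup_mount_options) come from the report;
# the values below are illustrative placeholders.
cat > cinder-static-overrides.yaml <<'EOF'
conf:
  cinder:
    DEFAULT:
      backup_driver: cinder.backup.drivers.nfs.NFSBackupDriver
      backup_mount_options: vers=4
EOF
```

Any values that change the rendered `cinder-backup-storage-init` job template should be enough to trigger the immutable-field error described in this report.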

Expected Behavior
-----------------
The application is successfully applied both times.

Actual Behavior
---------------
The application fails to apply the second time.

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
AIO-SX.

Branch/Pull Time/Commit
-----------------------
master:
  * /mirror/starlingx/master/debian/monolithic/latest_green_build/

Last Pass
---------
N/A.

Timestamp/Logs
--------------
sysinv 2023-02-25 08:38:03.607 2806263 ERROR sysinv.conductor.kube_app [-] Application stx-openstack: release cinder: Failed during apply :Helm upgrade failed: cannot patch "cinder-backup-storage-init" with kind Job: Job.batch "cinder-backup-storage-init" is invalid: spec.template: Invalid value: core.PodTemplateSpec{[...]}: field is immutable.

Test Activity
-------------
Developer Testing.

Workaround
----------
Before applying stx-openstack (and after performing the helm overrides), remove the conflicting job manually: `kubectl -n openstack delete job/cinder-backup-storage-init`.
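
The workaround can be wrapped in a small helper so the job removal and the apply always happen in order. This is a minimal sketch: the function name `reapply_stx_openstack` is hypothetical, and it assumes `kubectl` and the `system` CLI are available with credentials already loaded.

```shell
# Hypothetical helper: remove the stale job, then re-apply the application.
reapply_stx_openstack() {
    # --ignore-not-found keeps the delete from failing when the job
    # has already been removed (for example, by an earlier rollback).
    kubectl -n openstack delete job/cinder-backup-storage-init \
        --ignore-not-found || return 1

    # Trigger the apply only after the conflicting job is gone.
    system application-apply stx-openstack
}
```

Run `reapply_stx_openstack` after updating or deleting the helm overrides, in place of a plain `system application-apply stx-openstack`.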

Changed in starlingx:
assignee: nobody → Luan Nunes Utimura (lutimura)
tags: added: stx.9.0 stx.distro.openstack
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/882610
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/1fb92fb8ba2cc201ddc21d90ae12a81b1ebf1d42
Submitter: "Zuul (22348)"
Branch: master

commit 1fb92fb8ba2cc201ddc21d90ae12a81b1ebf1d42
Author: Luan Nunes Utimura <email address hidden>
Date: Mon May 8 14:53:40 2023 -0300

    Fixing cinder helm release storage bootstrap hooks

    After performing a helm override of the `backup_driver` and
    `backup_mount_options` fields in `.Values.conf.cinder.DEFAULT`, it has
    been observed that `osh-openstack-cinder` fails to upgrade due to
    changes to immutable fields in `cinder-backup-storage-init` job spec.

    Since this is only a problem because the job is being kept in the system
    even after it has finished its task, one can avoid this upgrade failure
    by simply leveraging helm hooks [1].

    [1] https://helm.sh/docs/topics/charts_hooks/

    Test Plan:
    PASS - Build openstack-helm package
    PASS - Build stx-openstack-helm-fluxcd package
    PASS - Build stx-openstack helm charts
    PASS - Upload/apply stx-openstack
    PASS - While watching the jobs:
            `kubectl -n openstack get jobs -w | grep cinder`
           Perform a helm override and verify that storage bootstrap jobs
           are being terminated and reinitialized during the helm release
           upgrade
    PASS - Remove/delete stx-openstack

    Closes-Bug: 2018930

    Change-Id: Icfda326ac390564a3bca1c358e8e444b95d66808
    Signed-off-by: Luan Nunes Utimura <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium