Brief Description
-----------------
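After backup and restore, and after unlocking controller-0, the cert-manager, nginx-ingress-controller and platform-integ-apps applications remain in the "restore-requested" status with the progress message "failed to download one or more image(s)". According to sysinv.log, this happens because the images YAML files under the /opt/platform/fluxcd folder are missing.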
Severity
--------
<Major: System/Feature is usable but degraded>
Steps to Reproduce
------------------
* Install a simplex system with StarlingX master
* Run the backup Ansible playbook from controller-0
* Install a clean StarlingX master image on the same system with wipedisk=false
* Run the restore Ansible playbook locally with the backup file created above
* Unlock controller-0 and check the application status with system application-list
Expected Behavior
------------------
All applications are in the applied or uploaded state.
Actual Behavior
----------------
The cert-manager, nginx-ingress-controller and platform-integ-apps applications remain in the restore-requested state.
Reproducibility
---------------
3/3
System Configuration
--------------------
SX and DX (may also happen on Standard/Storage), IPv4
Branch/Pull Time/Commit
-----------------------
Branch and the time when code was pulled or git commit or cengn load info
Last Pass
---------
Before nginx-ingress-controller, platform-integ-apps and cert-manager were migrated to FluxCD
Timestamp/Logs
--------------
{code:java}
controller-0:~$ source /etc/platform/openrc
[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+---------+-------------------------------------------+------------------+-------------------+------------------------------------------+
| application              | version | manifest name                             | manifest file    | status            | progress                                 |
+--------------------------+---------+-------------------------------------------+------------------+-------------------+------------------------------------------+
| cert-manager             | 1.0-34  | cert-manager-fluxcd-manifests             | fluxcd-manifests | restore-requested | failed to download one or more image(s). |
| nginx-ingress-controller | 1.1-34  | nginx-ingress-controller-fluxcd-manifests | fluxcd-manifests | restore-requested | failed to download one or more image(s). |
| oidc-auth-apps           | 1.0-66  | oidc-auth-apps-fluxcd-manifests           | fluxcd-manifests | uploaded          | completed                                |
| platform-integ-apps      | 1.0-52  | platform-integ-apps-fluxcd-manifests      | fluxcd-manifests | restore-requested | failed to download one or more image(s). |
| rook-ceph-apps           | 1.0-17  | rook-ceph-manifest                        | manifest.yaml    | uploaded          | completed                                |
+--------------------------+---------+-------------------------------------------+------------------+-------------------+------------------------------------------+
{code}
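When many applications are installed, the stuck ones can be extracted from the system application-list output with a script. The Python sketch below is hypothetical: stuck_apps is not part of StarlingX, and the function assumes the pipe-delimited table layout shown above (border lines, one header row, then one row per application).

```python
def stuck_apps(table_text, status="restore-requested"):
    """Return the application names whose 'status' column equals `status`.

    Assumes the pipe-delimited layout printed by 'system application-list':
    '+---+' border lines, one header row, then one row per application.
    """
    header = None
    stuck = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe row is the column header
            continue
        row = dict(zip(header, cells))
        if row.get("status") == status:
            stuck.append(row.get("application"))
    return stuck
```

Given the application-list output above, it would return cert-manager, nginx-ingress-controller and platform-integ-apps.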
sysinv.log
{code:java}
sysinv 2022-06-16 00:31:10.165 108519 INFO sysinv.conductor.manager [-] Restore in progress - defer platform managed application activity
sysinv 2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager [-] [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/nginx-ingress-controller/1.1-34/nginx-ingress-controller-images.yaml': IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/nginx-ingress-controller/1.1-34/nginx-ingress-controller-images.yaml'
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 6634, in _restore_download_images
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager self._app.download_images(rapp)
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 867, in download_images
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager saved_images_list = self._retrieve_images_list(app.sync_imgfile)
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 855, in _retrieve_images_list
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager with io.open(app_images_file, 'r', encoding='utf-8') as f:
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/nginx-ingress-controller/1.1-34/nginx-ingress-controller-images.yaml'
2022-06-16 00:31:10.166 108519 ERROR sysinv.conductor.manager
sysinv 2022-06-16 00:31:27.241 109357 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_start_2022-06-16-00-31-27 patch
sysinv 2022-06-16 00:31:27.242 109357 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_end. No changes from mtce/1.0.
sysinv 2022-06-16 00:31:56.590 109357 WARNING keystonemiddleware.auth_token [-] Authorization failed for token: InvalidToken: Token authorization failed
sysinv 2022-06-16 00:32:10.136 108519 INFO sysinv.common.rest_api [-] GET cmd:http://localhost:30001/nfvi-plugins/v1/sw-update hdr:{'Content-type': 'application/json', 'User-Agent': 'sysinv/1.0'} payload:None
sysinv 2022-06-16 00:32:10.137 108519 INFO sysinv.common.rest_api [-] Response={u'status': u'success', u'in-progress': None, u'sw-update-type': None}
sysinv 2022-06-16 00:32:10.145 108519 INFO sysinv.conductor.manager [-] Request downloading images for cert-manager:
sysinv 2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager [-] [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/cert-manager/1.0-34/cert-manager-images.yaml': IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/cert-manager/1.0-34/cert-manager-images.yaml'
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 6634, in _restore_download_images
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager self._app.download_images(rapp)
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 867, in download_images
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager saved_images_list = self._retrieve_images_list(app.sync_imgfile)
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 855, in _retrieve_images_list
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager with io.open(app_images_file, 'r', encoding='utf-8') as f:
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/cert-manager/1.0-34/cert-manager-images.yaml'
2022-06-16 00:32:10.155 108519 ERROR sysinv.conductor.manager
sysinv 2022-06-16 00:32:10.161 108519 INFO sysinv.conductor.manager [-] Request downloading images for platform-integ-apps:
sysinv 2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager [-] [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/platform-integ-apps/1.0-52/platform-integ-apps-images.yaml': IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/platform-integ-apps/1.0-52/platform-integ-apps-images.yaml'
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 6634, in _restore_download_images
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager self._app.download_images(rapp)
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 867, in download_images
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager saved_images_list = self._retrieve_images_list(app.sync_imgfile)
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 855, in _retrieve_images_list
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager with io.open(app_images_file, 'r', encoding='utf-8') as f:
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/platform-integ-apps/1.0-52/platform-integ-apps-images.yaml'
2022-06-16 00:32:10.169 108519 ERROR sysinv.conductor.manager
sysinv 2022-06-16 00:32:10.172 108519 INFO sysinv.conductor.manager [-] Request downloading images for nginx-ingress-controller:
sysinv 2022-06-16 00:32:10.177 108519 INFO sysinv.conductor.manager [-] Restore in progress - defer platform managed application activity
sysinv 2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager [-] [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/nginx-ingress-controller/1.1-34/nginx-ingress-controller-images.yaml': IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/nginx-ingress-controller/1.1-34/nginx-ingress-controller-images.yaml'
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 6634, in _restore_download_images
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager self._app.download_images(rapp)
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 867, in download_images
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager saved_images_list = self._retrieve_images_list(app.sync_imgfile)
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 855, in _retrieve_images_list
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager with io.open(app_images_file, 'r', encoding='utf-8') as f:
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager IOError: [Errno 2] No such file or directory: '/opt/platform/fluxcd/22.06/nginx-ingress-controller/1.1-34/nginx-ingress-controller-images.yaml'
2022-06-16 00:32:10.178 108519 ERROR sysinv.conductor.manager {code}
The /opt/platform/fluxcd folder is missing, that is, it was not restored:
{code:java}
controller-0:~$ ls /opt/platform/
armada config device_images extra helm helm_charts keystone lost+found nfv puppet sysinv
{code}
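Every error above follows the same path pattern: /opt/platform/fluxcd/<release>/<app>/<version>/<app>-images.yaml. The following is a minimal Python sketch for listing which images files are absent after a restore. It assumes that pattern holds for every FluxCD application. The helper names are hypothetical and are not sysinv functions.

```python
import os

FLUXCD_ROOT = "/opt/platform/fluxcd"  # root seen in the sysinv.log errors

def images_file_path(release, app, version, root=FLUXCD_ROOT):
    """Build the images file path sysinv tried to open, per the log pattern."""
    return os.path.join(root, release, app, version, app + "-images.yaml")

def missing_images_files(release, apps, root=FLUXCD_ROOT):
    """Return the images file paths that do not exist.

    `apps` is an iterable of (name, version) pairs, for example taken from
    the application and version columns of 'system application-list'.
    """
    paths = (images_file_path(release, name, version, root) for name, version in apps)
    return [path for path in paths if not os.path.isfile(path)]

if __name__ == "__main__":
    apps = [("cert-manager", "1.0-34"),
            ("nginx-ingress-controller", "1.1-34"),
            ("platform-integ-apps", "1.0-52")]
    for path in missing_images_files("22.06", apps):
        print("missing:", path)
```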
Test Activity
-------------
Regression Testing
Workaround
----------
- Copy the /opt/platform/fluxcd folder from another lab
- Reset the status of each affected application, for example for cert-manager: sudo -u postgres psql -U postgres -d sysinv -c "update kube_app set status='uploaded' where name='cert-manager'"
- Lock and unlock the controller
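When several applications are stuck, the status reset can be repeated for each one. The hypothetical Python helper below only builds the command strings, in the same form as the cert-manager example above. It does not run them. Using the same statement for the other affected applications is an assumption.

```python
import re

# Hypothetical helper: builds (does not run) the workaround command per app.
_NAME = re.compile(r"[a-z0-9][a-z0-9-]*")

def reset_status_commands(apps, status="uploaded"):
    """Return one psql command per application name, mirroring the workaround."""
    # Accept only plain names so no value can break out of the quoted SQL.
    if not _NAME.fullmatch(status):
        raise ValueError("unexpected status: %r" % status)
    cmds = []
    for app in apps:
        if not _NAME.fullmatch(app):
            raise ValueError("unexpected application name: %r" % app)
        sql = "update kube_app set status='{}' where name='{}'".format(status, app)
        cmds.append('sudo -u postgres psql -U postgres -d sysinv -c "{}"'.format(sql))
    return cmds

if __name__ == "__main__":
    for cmd in reset_status_commands(
            ["cert-manager", "nginx-ingress-controller", "platform-integ-apps"]):
        print(cmd)
```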
Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/846244
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/97a6c9b64f0b574ce298af3c28aeaee8312e07a0
Submitter: "Zuul (22348)"
Branch: master
commit 97a6c9b64f0b574ce298af3c28aeaee8312e07a0
Author: Thiago Brito <email address hidden>
Date: Thu Jun 16 22:55:33 2022 -0400
Backup and restore fluxcd folder
Include /opt/platform/fluxcd folder in the list of folders to backup.
Also, checks if the armada folder is contained in the backup now that we
might not have armada apps installed on the system anymore.
TEST PLAN (AIO-SX)
PASS Backup & Restore stx.7.0 with fluxcd and armada (rook-ceph) apps
PASS Backup & Restore stx.7.0 with fluxcd apps only
PASS Upgrade stx.6.0 to stx.7.0 with only armada apps, passes restore
but seeing an unrelated problem during upgrade-activate
Closes-Bug: 1979124
Signed-off-by: Thiago Brito <email address hidden>
Change-Id: I9924113aa0099ec6e703ce61d70132dfed0fbe23
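The fix makes the backup include /opt/platform/fluxcd, and it makes restore tolerate a backup that contains no armada folder. The actual change lives in the Ansible playbooks. The Python sketch below only illustrates the restore-side presence check. It assumes the backup is a tar archive whose member names are stored without a leading "/" (for example opt/platform/fluxcd/...). The function names and the PLATFORM_DIRS tuple are hypothetical.

```python
import tarfile

# Hypothetical subset of /opt/platform folders the backup may contain;
# after the fix, "fluxcd" is archived alongside the optional "armada".
PLATFORM_DIRS = ("armada", "fluxcd")

def platform_dirs_in(member_names, wanted=PLATFORM_DIRS, root="opt/platform"):
    """Return the entries of `wanted` that appear under `root` in `member_names`."""
    present = []
    for name in wanted:
        prefix = root + "/" + name
        # Match the folder itself or anything below it, but not "armada-old".
        if any(m == prefix or m.startswith(prefix + "/") for m in member_names):
            present.append(name)
    return present

def platform_dirs_in_archive(archive_path, wanted=PLATFORM_DIRS):
    """Inspect member names only; nothing is extracted.

    tarfile.open() in its default "r" mode detects gzip/bz2/xz compression.
    """
    with tarfile.open(archive_path) as tar:
        return platform_dirs_in(tar.getnames(), wanted)
```

With this kind of check, restore can skip the armada step when "armada" is not in the result, instead of failing on a system that only has FluxCD applications.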