restore fails on an upgraded system

Bug #2042971 reported by ayyappa
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Joshua Kraitberg
Milestone: none listed

Bug Description

Brief Description
-----------------
Legacy restore fails on an upgraded system

Severity
--------
major

Steps to Reproduce
------------------
1) Apply the WRA, WAD, Metrics-server, and Vault applications on the build -> /folk/cgts/patches-to-verify/21.12/wind-river-cloud-platform-host-installer-21.12-b45-PATCH_0012.iso

2) Upgrade the lab to -> /localdisk/loadbuild/jenkins/wrcp-22.12-formal-patch/vWRCP_22.12_PATCH_0004/export/outputs/patch/WRCP_22.12_0004.iso (WRCP_22.12_PATCH_0004_NOV02)

3) Upgrade WRA to 22.12
4) Perform the backup:
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=Li69nux*" -e "admin_password=Li69nux*"
5) SCP the backup file from /opt/backups to the workstation
6) Install the previous build (WRCP_22.12_PATCH_0004_NOV02) using 'STOP: install_controller'
7) Log in to controller-0 (active controller)
8) SCP the backup file from the workstation to controller-0
9) Restore:

ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin" -e "ansible_become_pass=Li69nux*" -e "admin_password=Li69nux*" -e "backup_filename=localhost_platform_backup_2023_11_06_10_42_49.tgz" -e "wipe_ceph_osds=false"

Expected Behavior
------------------
Restore playbook should run successfully

Actual Behavior
----------------
Restore playbook is failing.

Reproducibility
---------------
100%

System Configuration
--------------------
standard, DC systems

Load info (eg: 2022-03-10_20-00-07)
-----------------------------------
stx.8.0

Branch/Pull Time/Commit
-----------------------
na

Last Pass
---------
na

Timestamp/Logs
--------------
na

Test Activity
-------------
upgrade testing

Workaround
----------
Describe workaround if available

ayyappa (mantri425)
summary: - backup and restore fails on an upgraded system
+ restore fails on an upgraded system
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/900369

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/900369
Committed: https://opendev.org/starlingx/config/commit/a641b9fe9fce96da2ea79592deef8be429da25a5
Submitter: "Zuul (22348)"
Branch: master

commit a641b9fe9fce96da2ea79592deef8be429da25a5
Author: amantri <email address hidden>
Date: Tue Nov 7 13:14:10 2023 -0500

    Remove 'resourceVersion' from cert-manager-backup.yml file

    During upgrade, in 64-upgrade-cert-manager.sh script
    we backup the existing cert-manager resources to
    cert-manager-backup.yaml file which contains the
    resourceVersion and convert the apiVersion and copy to
    new file cert-manager-v1.yaml and once the cert-manager
    app is upgraded, we apply the cert-manager-v1.yaml so in
    the last-configuration-applied which contains the
    resourceVersion is causing issue during legacy restore
    operation. This Fix addresses this issue by removing the
    "resourceVersion" from all the cm resources in the
    cert-manager-backup.yaml file.

    Test Cases:
    PASS: Perform upgrade on the system, after upgrade verify
          that cert-manager resources like issuer,clusterissuer,
          certificates,certificaterequests doesn't contain the
          resourceVersion in the last-applied-configuration under
          annotations
    PASS: On an upgraded system, try to update the clusterissuer
          with the clusterissuer definition file and verify it is
          successfully updated
    PASS: On an upgraded system, perform backup and restore and
          verify it is successful

    Closes-bug: 2042971

    Change-Id: I31c77b75182d953d5a7050f9ea08b3f66bef1e47
    Signed-off-by: amantri <email address hidden>
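
The idea behind this fix can be sketched in Python. This is an illustration only: the actual change is in the upgrade script named in the commit message, and the function name `strip_resource_version` and the sample ClusterIssuer below are hypothetical. The sketch drops `metadata.resourceVersion` from a resource and from the copy of the manifest embedded in its `kubectl.kubernetes.io/last-applied-configuration` annotation.

```python
import copy
import json

# Annotation that "kubectl apply" uses to record the last applied manifest.
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def strip_resource_version(resource):
    """Return a copy of a resource dict with resourceVersion removed."""
    cleaned = copy.deepcopy(resource)
    metadata = cleaned.get("metadata") or {}
    metadata.pop("resourceVersion", None)

    # The annotation stores a JSON copy of the manifest, which can carry
    # its own stale resourceVersion.
    annotations = metadata.get("annotations") or {}
    embedded = annotations.get(LAST_APPLIED)
    if embedded:
        inner = json.loads(embedded)
        (inner.get("metadata") or {}).pop("resourceVersion", None)
        annotations[LAST_APPLIED] = json.dumps(inner)
    return cleaned


# Hypothetical ClusterIssuer as it might look after an upgrade.
issuer = {
    "apiVersion": "cert-manager.io/v1",
    "kind": "ClusterIssuer",
    "metadata": {
        "name": "example-issuer",
        "resourceVersion": "12345",
        "annotations": {
            LAST_APPLIED: json.dumps({
                "apiVersion": "cert-manager.io/v1",
                "kind": "ClusterIssuer",
                "metadata": {
                    "name": "example-issuer",
                    "resourceVersion": "12345",
                },
            }),
        },
    },
}

cleaned = strip_resource_version(issuer)
print(json.dumps(cleaned, indent=2))
```

The function works on a deep copy, so the original resource is left untouched and can still be compared against the cleaned version.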

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → ayyappa (mantri425)
importance: Undecided → Medium
tags: added: stx.9.0 stx.update
Ghada Khalil (gkhalil) wrote :

Re-opening as an additional fix is required...

The restore still failed with the following error

TASK [restore-platform/restore-more-data : Restore sysinv default configuration file] ******************************************************************************************************************************************************
Wednesday 08 November 2023 18:48:06 +0000 (0:00:00.551) 0:29:52.355 ****
fatal: [localhost]: FAILED! => changed=true
  cmd:
  - tar
  - -C
  - /opt/platform/sysinv/22.12
  - -xpf
  - /opt/platform-backup/localhost_platform_backup_2023_11_08_23_01_40.tgz
  - --wildcards
  - --transform=s,.*/,,
  - '*/sysinv.conf.default'
  delta: '0:00:00.150347'
  end: '2023-11-08 18:48:07.367373'
  msg: non-zero return code
  rc: 2
  start: '2023-11-08 18:48:07.217026'
  stderr: |-
    tar: */sysinv.conf.default: Not found in archive
    tar: Exiting with failure status due to previous errors
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

By default, upgrades follow the "optimized restore" path, so the file "sysinv.conf.default" never gets restored during an upgrade, and a legacy restore performed afterwards fails. But this may still be needed for a legacy restore performed on a non-upgraded system?
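
Before running a legacy restore, the backup archive can be checked for the file that the failing task expects. The following is a minimal Python sketch using only the standard library; the helper names `find_members` and `make_demo_archive` and the member paths in the demo archives are hypothetical, and the default pattern mirrors the one passed to `tar --wildcards` in the log above.

```python
import fnmatch
import os
import tarfile
import tempfile


def find_members(archive_path, pattern="*/sysinv.conf.default"):
    """Return the names of archive members matching a shell-style pattern."""
    with tarfile.open(archive_path, mode="r:*") as tar:
        return [name for name in tar.getnames()
                if fnmatch.fnmatchcase(name, pattern)]


def make_demo_archive(directory, filename, member_names):
    """Create a small gzip-compressed tar of empty files (demo only)."""
    path = os.path.join(directory, filename)
    with tarfile.open(path, mode="w:gz") as tar:
        for name in member_names:
            tar.addfile(tarfile.TarInfo(name))
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical member paths; a real backup layout may differ.
    good = make_demo_archive(tmp, "good.tgz", [
        "opt/platform/sysinv/22.12/sysinv.conf.default",
        "opt/platform/sysinv/22.12/sysinv.conf",
    ])
    bad = make_demo_archive(tmp, "bad.tgz", [
        "opt/platform/sysinv/22.12/sysinv.conf",
    ])
    good_matches = find_members(good)
    bad_matches = find_members(bad)

print("good archive:", good_matches)
print("bad archive:", bad_matches or "sysinv.conf.default missing")
```

On a real system, `find_members` would be pointed at the backup file itself; an empty result corresponds to the "Not found in archive" error reported by tar.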

Changed in starlingx:
status: Fix Released → In Progress
assignee: ayyappa (mantri425) → Joshua Kraitberg (jkraitbe-wr)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/900550

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/900550
Committed: https://opendev.org/starlingx/config/commit/d9e12f6c455c4c2705848c026467a646f22db813
Submitter: "Zuul (22348)"
Branch: master

commit d9e12f6c455c4c2705848c026467a646f22db813
Author: Joshua Kraitberg <email address hidden>
Date: Thu Nov 9 12:46:50 2023 -0500

    Fix: Retain sysinv data during migration

    The migration code was deleting /opt/platform/<FROM_RELEASE>/sysinv,
    before it was migrated to /opt/platform/<TO_RELEASE>/sysinv.
    This caused the files inside like `sysinv.conf.default` to be lost
    during simplex upgrade.

    Originally, in legacy restore, `sysinv.conf.default` was
    individually restored after migration so the deletion
    had no impact.

    `sysinv.conf.default` is required on non-SX systems.
    This is used so that other hosts sysinv-agent can mount and
    have an initial sysinv.conf suitable for RPC to the controller.

    The loss of the file is not problematic on a SX system, but would
    prevent a later SX-to-DX migration.

    TEST PLAN
    PASS: Optimized upgrade AIO-SX, stx6 to stx8
    PASS: Optimized upgrade AIO-SX, stx7 to stx8

    Closes-bug: 2042971
    Signed-off-by: Joshua Kraitberg <email address hidden>
    Change-Id: I7a22e050f74785b99ea6b7758cf23d3419add1de
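
The ordering problem described in this commit can be illustrated with a small Python sketch. It is not the actual migration code: `migrate_release_dir` and the temporary directory layout are hypothetical, and the only point is that the old release directory is removed after its contents have been copied, not before.

```python
import os
import shutil
import tempfile


def migrate_release_dir(from_dir, to_dir):
    """Copy from_dir into to_dir, then remove from_dir (copy comes first)."""
    shutil.copytree(from_dir, to_dir, dirs_exist_ok=True)  # Python 3.8+
    shutil.rmtree(from_dir)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for the FROM and TO release directories.
    old = os.path.join(root, "21.12", "sysinv")
    new = os.path.join(root, "22.12", "sysinv")
    os.makedirs(old)
    with open(os.path.join(old, "sysinv.conf.default"), "w") as f:
        f.write("# placeholder\n")

    migrate_release_dir(old, new)
    migrated = os.path.exists(os.path.join(new, "sysinv.conf.default"))
    old_removed = not os.path.exists(old)

print("file migrated:", migrated)
print("old directory removed:", old_removed)
```

Reversing the two calls inside `migrate_release_dir` would reproduce the failure mode from the commit message: the source directory would be gone before anything was copied.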

Changed in starlingx:
status: In Progress → Fix Released