DC restore subcloud group to previous release: Some subclouds were installed with active release instead of inactive release

Bug #2044564 reported by Victor Romano
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Victor Romano
Milestone: -

Bug Description

Brief Description

DC subcloud group restore to previous release: some subclouds were installed with the active release instead of the inactive release.

CMD:

dcmanager subcloud-backup restore --group restore-group --with-install --release 21.12

Severity

Minor

Note: The reproducibility rate is low, and the restore operation can be run again when it fails. So far it has been seen only on sushy subclouds.

Steps to Reproduce

Deploy SystemController and subclouds with 21.12
Upgrade SystemController
Apply network restrictions
Back up the subclouds
Upgrade the subclouds
Restore the subclouds back to 21.12

Expected Behavior

The subclouds should be restored to the 21.12 load.

Actual Behavior

Some subclouds were installed with the 22.12 load, and the restore then failed because no 22.12 backup was available.

Reproducibility

2 out of 93 subclouds

System Configuration

dc

Load info (eg: 2022-03-10_20-00-07)

cat /etc/build.info
SW_VERSION="22.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2022-12-19_02-22-00"
SRC_BUILD_ID="38"
JOB="wrcp-22.12-debian"
BUILD_BY="jenkins"
BUILD_NUMBER="50"
BUILD_HOST="yow-wrcp-lx.wrs.com"
BUILD_DATE="2022-12-19 07:22:00 +0000"

Last Pass

intermittent issue

Timestamp/Logs

logs:

TASK [subcloud-bnr/restore : Fail if there is no backup file or there are more than one backup file] ***
Thursday 23 November 2023 22:56:14 +0000 (0:00:00.460) 0:00:07.635 *****
fatal: [subcloud3015]: FAILED! => changed=false
  msg: There must be one platform backup file in /opt/dc-vault/backups/subcloud3015/22.12.
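The failing task above requires exactly one platform backup file in the release-specific vault directory. The following is a minimal Python sketch of that kind of check, not the playbook's actual implementation. The function name, the exception class, and the `*_platform_backup_*.tgz` file pattern are assumptions made for illustration; only the directory layout and the error message come from the log.

```python
import tempfile
from pathlib import Path


class BackupLookupError(Exception):
    """Raised when a directory does not hold exactly one backup file."""


def find_single_backup(backup_dir, pattern="*_platform_backup_*.tgz"):
    """Return the only backup file in backup_dir, or raise.

    The glob pattern is a hypothetical naming convention, not taken
    from the bug report.
    """
    directory = Path(backup_dir)
    # A missing directory is treated the same as an empty one.
    matches = sorted(directory.glob(pattern)) if directory.is_dir() else []
    if len(matches) != 1:
        raise BackupLookupError(
            f"There must be one platform backup file in {directory} "
            f"(found {len(matches)})."
        )
    return matches[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as vault:
        # The backup was taken on 21.12, so only that directory exists.
        old_release = Path(vault) / "subcloud3015" / "21.12"
        old_release.mkdir(parents=True)
        (old_release / "subcloud3015_platform_backup_example.tgz").touch()

        print(find_single_backup(old_release).name)
        try:
            # Looking under 22.12 fails, as in the log above.
            find_single_backup(Path(vault) / "subcloud3015" / "22.12")
        except BackupLookupError as err:
            print(err)
```

In this sketch the lookup under `22.12` fails because the backup was stored under `21.12`, which is consistent with the Actual Behavior described above: a subcloud that comes up on the wrong load looks for a backup in the wrong release directory.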

Alarms

no alarms

Test Activity

Regression Testing

Workaround

Re-run the restore operation.

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/901864
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/e7167fdc7382b2e0f88ee29d8520a3d8bfb6198c
Submitter: "Zuul (22348)"
Branch: master

commit e7167fdc7382b2e0f88ee29d8520a3d8bfb6198c
Author: Victor Romano <email address hidden>
Date: Fri Nov 24 14:39:18 2023 -0300

    Enforce password change when installing via rvmc

    When installing a subcloud via rvmc, it's possible that the BMC
    reports a successful operation when booting with new install media
    without actually installing the subcloud. To prevent this, a new
    variable called enforce_password_change was created, failing the
    installation if there is an error during initial password change
    (observed if the system wasn't correctly reinstalled).

    Test plan:
      - PASS: Perform a subcloud add with install forcing a failure by
              changing the password before the playbook and verify the
              installation fails as expected.
      - PASS: Perform a normal subcloud add with install and verify
              the operation completed successfully.
      - PASS: Deploy a standalone SX and verify the bootstrap completed
              successfully.
      - PASS: Upgrade a subcloud from stx6 to stx8 and verify the
              operation completed successfully.

    Closes-Bug: 2044564

    Change-Id: I80bee246dedfdf9688507c3529d7d080992da08b
    Signed-off-by: Victor Romano <email address hidden>
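The behavior described in the commit message can be sketched roughly as follows. This is an illustrative Python model, not the Ansible code from the fix. Only the variable name `enforce_password_change` comes from the commit; the function names and the exception class are hypothetical, and it is an assumption that a failed password change is tolerated when the variable is not set.

```python
class InstallError(Exception):
    """Raised when the installation must be reported as failed."""


def run_initial_password_change(change_password, enforce_password_change=False):
    """Attempt the initial password change and report whether it worked.

    change_password is a hypothetical callable that raises on failure,
    for example because the host still runs the old, already configured
    system and so was not actually reinstalled.
    """
    try:
        change_password()
    except Exception as exc:
        if enforce_password_change:
            # A failed initial password change is taken as a sign that
            # the reinstall did not happen, so fail the installation.
            raise InstallError(
                "Initial password change failed; the subcloud may not "
                "have been reinstalled."
            ) from exc
        # Assumed behavior when not enforced: tolerate the error.
        return False
    return True


def stale_system_password_change():
    """Simulates a host that was never reinstalled."""
    raise RuntimeError("authentication failed")


if __name__ == "__main__":
    print(run_initial_password_change(lambda: None, enforce_password_change=True))
    print(run_initial_password_change(stale_system_password_change))
    try:
        run_initial_password_change(
            stale_system_password_change, enforce_password_change=True
        )
    except InstallError as err:
        print(f"install failed: {err}")
```

With enforcement on, a BMC that reports success without actually booting the new install media would be caught at this step instead of later, when the restore looks for a backup of the wrong release.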

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.9.0 stx.distcloud
Changed in starlingx:
assignee: nobody → Victor Romano (vgluzrom)