Backup and restore operation fails on a subcloud that was migrated to a non-primary site

Bug #2049651 reported by Gustavo Herzmann
Affects    Status        Importance  Assigned to       Milestone
StarlingX  Fix Released  Medium      Gustavo Herzmann

Bug Description

Brief Description
-----------------
During geo-redundancy feature testing, running backup and restore on a subcloud that was migrated to the non-primary site fails at the restore step because the subcloud has no install values on that site.

Severity
--------
<Major: System/Feature is usable but degraded>

Steps to Reproduce
------------------
1) Perform subcloud-peer-group migration from site1 to site2

2) Create the backup for the subcloud with the following command:
dcmanager subcloud-backup create --subcloud subcloud3 --sysadmin-password Li69nux*

3) Restore a subcloud, including remote installation, from system backup data in central storage:

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud subcloud3 --with-install
The server could not comply with the request since it is either malformed or otherwise incorrect. The restore operation was requested with_install, but the following subcloud(s) does not contain install values: subcloud3
ERROR (app) Unable to restore subcloud backup
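
The rejection above is consistent with a pre-check that refuses --with-install for any subcloud whose data_install field is empty. A minimal Python sketch of such a check follows. The Subcloud record and the check_install_values function are hypothetical names used for illustration and are not taken from the distcloud source; only the error text is copied from the output above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subcloud:
    # Hypothetical record holding only the fields needed for this illustration.
    name: str
    data_install: Optional[str] = None


def check_install_values(subclouds: List[Subcloud], with_install: bool) -> None:
    """Raise ValueError if a restore with install targets subclouds
    that have no install values stored."""
    if not with_install:
        # A restore without remote installation does not need install values.
        return
    missing = [s.name for s in subclouds if not s.data_install]
    if missing:
        raise ValueError(
            "The restore operation was requested with_install, but the "
            "following subcloud(s) does not contain install values: "
            + ", ".join(missing)
        )
```

Under this model, a subcloud migrated to the non-primary site without its install values fails the check, while the same request without --with-install passes it.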

Expected Behavior
------------------
The subcloud restore with --with-install should complete successfully.

Actual Behavior
----------------
The subcloud restore is rejected because the migrated subcloud has no install values.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Master (2023-01-16)

Last Pass
---------
This is the first time attempting this test

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud subcloud3 --with-install
The server could not comply with the request since it is either malformed or otherwise incorrect. The restore operation was requested with_install, but the following subcloud(s) does not contain install values: subcloud3
ERROR (app) Unable to restore subcloud backup

Test Activity
-------------
Feature Testing

Workaround
----------
Update the subcloud with the install values to populate the data_install field, then retry the restore with --with-install:
dcmanager subcloud update --install-values <install_values_yaml> subcloud3
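
The two workaround steps can be scripted. The sketch below only assembles and runs the two commands exactly as they appear in this report. The function names and the dry_run switch are assumptions made for illustration, and the sketch does not handle any interactive prompt the CLI may issue.

```python
import subprocess
from typing import List


def workaround_commands(subcloud: str, install_values_yaml: str) -> List[List[str]]:
    """Return the two commands from the workaround, in order."""
    return [
        # Step 1: populate data_install on the migrated subcloud.
        ["dcmanager", "subcloud", "update",
         "--install-values", install_values_yaml, subcloud],
        # Step 2: retry the restore with remote installation.
        ["dcmanager", "subcloud-backup", "restore",
         "--subcloud", subcloud, "--with-install"],
    ]


def run_workaround(subcloud: str, install_values_yaml: str,
                   dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on first failure."""
    for cmd in workaround_commands(subcloud, install_values_yaml):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling run_workaround("subcloud3", "install_values.yaml") prints both commands without executing them; passing dry_run=False runs them on a controller where dcmanager is available.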

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/905973

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/905973
Committed: https://opendev.org/starlingx/distcloud/commit/5fcf4aee3c3cd5cd83e020deaf630d7ab34562a0
Submitter: "Zuul (22348)"
Branch: master

commit 5fcf4aee3c3cd5cd83e020deaf630d7ab34562a0
Author: Gustavo Herzmann <email address hidden>
Date: Tue Jan 16 17:43:58 2024 -0300

    Synchronize install values with secondary subclouds

    This commit ensures that the install values of secondary subclouds on
    the non-primary site are synchronized with those on the primary site.
    This synchronization enables users to perform operations requiring the
    'data_install' field without having to execute 'dcmanager subcloud
    update --install-values <install_values.yaml> <secondary_subcloud_ref>'
    beforehand.

    Validation of install values is skipped on the secondary side when the
    request comes from a peer system. This is because the user may not
    intend to perform operations requiring the 'data_install' field.
    The values are already validated on the primary site. Without skipping
    validation, the sync operation could fail if the non-primary site does
    not have the load already imported.

    Test Plan:
    1. PASS: After the initial association sync, verify that secondary
             subclouds created on the non-primary site have the
             'data_install' field synchronized with subclouds on the
             primary site.
    2. PASS: After the initial sync is completed, update the install
             values on the primary site and re-sync. Verify that the
             new install values were synchronized between peers.
    3. PASS: Run the previous test when the non-primary site does not have
             the load imported and verify that the sync still works.
    4. PASS: Delete the 'data_install' field from the secondary subcloud
             on the non-primary site and then run
             'dcmanager peer-group-association sync' on the primary site.
             Verify that the secondary subcloud 'data_install' field is
             updated with the correct value.
    5. PASS: On the non-primary site, use 'dcmanager subcloud update' to
             update the install values, verifying that the command still
             works as expected.
    6. PASS: Repeat the previous test, but this time using an install
             values file without required fields, and verify that the
             operation fails during the install values validation.

    Closes-Bug: 2049651

    Change-Id: I4dbaaa16e40f6a214bbb93f9e48f614c10de7d42
    Signed-off-by: Gustavo Herzmann <email address hidden>
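
The validation rule described in the commit message can be illustrated as follows. This is a sketch under stated assumptions, not the merged code: the function names, the from_peer flag, and the required_fields parameter are hypothetical, and the check models only the missing-required-fields case from test 6, not the imported-load check.

```python
import json
from typing import Any, Dict, Iterable


def validate_install_values(install_values: Dict[str, Any],
                            required_fields: Iterable[str]) -> None:
    """Raise ValueError if any required field is absent (hypothetical check)."""
    missing = [f for f in required_fields if f not in install_values]
    if missing:
        raise ValueError("Missing required install values: " + ", ".join(missing))


def update_data_install(subcloud: Dict[str, Any],
                        install_values: Dict[str, Any],
                        required_fields: Iterable[str],
                        from_peer: bool = False) -> Dict[str, Any]:
    """Store install values in the subcloud's data_install field.

    Requests coming from a peer system skip validation, on the assumption
    that the primary site has already validated the values.
    """
    if not from_peer:
        validate_install_values(install_values, required_fields)
    subcloud["data_install"] = json.dumps(install_values)
    return subcloud
```

With this shape, a sync from the primary site populates data_install on the secondary subcloud even when local validation would fail, while a local 'dcmanager subcloud update' style request is still validated.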

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.distcloud stx.update
Changed in starlingx:
assignee: nobody → Gustavo Herzmann (gherzman)