Restore of the system controller fails if backup is taken with unprovisioned host.

Bug #1960321 reported by Virginia Martins Perozim
This bug affects 1 person
Affects    Status        Importance  Assigned to               Milestone
StarlingX  Fix Released  Medium      Virginia Martins Perozim

Bug Description

Brief Description
-----------------
Restore of the system controller fails if the backup was taken while an unprovisioned host was present.

Severity
--------
Major: Restore fails and the system is unusable afterwards.

Steps to Reproduce
------------------
1. Back up the system while it has an unprovisioned host:
$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=Li69nux* admin_password=Li69nux*" -e "backup_user_local_registry=true"
2. Reinstall controller-0.
3. Restore controller-0 from the backup file:
$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "backup_filename=localhost_platform_backup_2021_11_10_14_59_21.tgz admin_password=Li69nux* ansible_become_pass=Li69nux* initial_backup_dir=/home/sysadmin"

Expected Behavior
------------------
Both backup and restore should complete successfully.

Actual Behavior
----------------
Backup succeeds, but restore fails with "Failed to create puppet hiera host config."

Reproducibility
---------------
Reproducible

System Configuration
--------------------
DC system, AIO-DX, Standard

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
N/A

Timestamp/Logs
--------------
TASK [recover-ceph-data : Get list of OSDs defined in /etc/ceph/ceph.conf] ****
Wednesday 10 November 2021 18:32:45 +0000 (0:00:12.174) 0:32:31.509 ****
changed: [localhost]

TASK [recover-ceph-data : Remove "[osd.*]" sections from /etc/ceph/ceph.conf] ****
Wednesday 10 November 2021 18:32:46 +0000 (0:00:00.164) 0:32:31.674 ****
changed: [localhost] => (item=osd.0)

TASK [recover-ceph-data : Set initial ceph-mon name] ****
Wednesday 10 November 2021 18:32:46 +0000 (0:00:00.340) 0:32:32.014 ****
ok: [localhost]

TASK [recover-ceph-data : Update host config data to get ceph-mon size] ****
Wednesday 10 November 2021 18:32:46 +0000 (0:00:00.034) 0:32:32.049 ****
changed: [localhost]

TASK [recover-ceph-data : Fail if host hieradata cannot be generated] ****
Wednesday 10 November 2021 18:32:52 +0000 (0:00:06.187) 0:32:38.236 ****
fatal: [localhost]: FAILED! => changed=false
msg: Failed to create puppet hiera host config.

PLAY RECAP ****
localhost : ok=511 changed=287 unreachable=0 failed=1

Test Activity
-------------
Regression Testing

Workaround
----------
Watch the restore playbook output until the task preceding "Check controller-0 is in online state" completes, as shown here:
2021-11-10 18:32:26,797 p=10486 u=sysadmin | Wednesday 10 November 2021 18:32:26 +0000 (0:01:30.560) 0:32:12.449 ****
2021-11-10 18:32:27,033 p=10486 u=sysadmin | changed: [localhost]
2021-11-10 18:32:27,038 p=10486 u=sysadmin | TASK [restore-platform/restore-more-data : Check controller-0 is in online state]
Then delete the unprovisioned host (ID 3 on this system):
$ system host-delete 3
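
The host ID in the workaround is specific to this system. As a minimal sketch of how the IDs to delete could be found, the Python snippet below scans `system host-list` output for rows whose hostname is None and prints the matching delete commands. The helper name `unprovisioned_host_ids`, the sample table and its column layout are assumptions for illustration, not taken from this report or from the merged fix.

```python
from typing import List


def unprovisioned_host_ids(host_list_output: str) -> List[int]:
    """Return the IDs of rows whose hostname column is the literal 'None'.

    Assumes the ASCII table layout where the first two columns are
    'id' and 'hostname' (an assumption; check it against your system).
    """
    ids = []
    for line in host_list_output.splitlines():
        # Data rows look like "| 3  | None | ... |"; borders start with "+".
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or not cells[0].isdigit():
            continue  # skip border lines and the header row
        if cells[1] == "None":
            ids.append(int(cells[0]))
    return ids


# Illustrative sample only; the values are not taken from the bug report.
SAMPLE = """\
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
| 3  | None         | None        | locked         | disabled    | offline      |
+----+--------------+-------------+----------------+-------------+--------------+
"""

if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed.
    for host_id in unprovisioned_host_ids(SAMPLE):
        print(f"system host-delete {host_id}")
```

The sketch only prints the commands. Run them at the point in the restore described above, not before.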

Changed in starlingx:
assignee: nobody → Virginia Martins Perozim (vmperozim)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Ghada Khalil (gkhalil)
tags: added: stx.update
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/828914
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/3b5aa651e5fa278bb513234bb282e3b5ebaadfeb
Submitter: "Zuul (22348)"
Branch: master

commit 3b5aa651e5fa278bb513234bb282e3b5ebaadfeb
Author: Virginia Martins Perozim <email address hidden>
Date: Fri Feb 11 11:34:17 2022 -0500

    Handle B&R for unprovisioned hosts

    During the Restore
    -----------------
    Remove unprovisioned hosts during restore in case the backup was taken
    while they were present. The change is in
    restore-platform/restore-more-data and runs once the hosts have been
    loaded from the backup and after the check that controller-0 is online.

    Test Plan:

    PASS: controller-0 + controller-1 + None(id=3) + None(id=4) (backup)
    PASS: controller-0 + controller-1 + None(id=3) + None(id=5) (backup)
    PASS: controller-0 + controller-1 (backup)
    PASS: controller-0 + controller-1 + None(id=3) + None(id=4) (restore)
    PASS: controller-0 + controller-1 + None(id=3) + None(id=5) (restore)
    PASS: controller-0 + controller-1 (restore)

    Regression:

    PASS: AIO-SX upgrade

    Closes-Bug: 1960321
    Signed-off-by: Virginia Martins Perozim <email address hidden>
    Change-Id: I2adc59f2f58a0efa7426b03e38a920bb9cc82537

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.7.0
Changed in starlingx:
importance: Undecided → Medium