Restore playbook is not reentrant

Bug #1987536 reported by Thiago Paiva Brito
This bug affects 1 person
Affects    Status  Importance  Assigned to  Milestone
StarlingX  New     Low         Unassigned

Bug Description

Brief Description
-----------------
The restore_platform.yml playbook imports the bootstrap playbook to perform some of the restore tasks. If any bootstrap task fails during the restore, the restore cannot be re-run, because a leftover flag file (/etc/platform/.restore_in_progress) is not removed on teardown.

Severity
--------
Minor: System/Feature is usable with minor issue

Steps to Reproduce
------------------
- Run restore with any config that will make bootstrap fail

Expected Behavior
------------------
Playbook fails but can be re-executed

Actual Behavior
----------------
Playbook refuses to run again and stops at "Fail if restore is already in progress", forcing either re-installation of the system or manual removal of the flag file (for experts)

Reproducibility
---------------
10/10 when the restore fails during bootstrap tasks

System Configuration
--------------------
AIO-SX (Might happen on all configs)

Branch/Pull Time/Commit
-----------------------
2022-08-22

Last Pass
---------
N/A

Timestamp/Logs
--------------

### 1st run failed because of this task
2022-08-23 20:16:43,861 p=1209313 u=tbrito n=ansible | TASK [bootstrap/bringup-essential-services : set_fact] ******************************************************************************************************************************************
2022-08-23 20:16:43,874 p=1209313 u=tbrito n=ansible | fatal: [lab_vbox_2]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: 'target_backup_dir' is undefined\n\nThe error appears to be in '/home/tbrito/workspace/repos/starlingx/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/bringup-essential-services/tasks/main.yml': line 81, column 9, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n block:\n - set_fact:\n ^ here\n"
}
2022-08-23 20:16:43,876 p=1209313 u=tbrito n=ansible | PLAY RECAP **************************************************************************************************************************************************************************************
2022-08-23 20:16:43,876 p=1209313 u=tbrito n=ansible | lab_vbox_2 : ok=299 changed=122 unreachable=0 failed=1 skipped=311 rescued=0 ignored=0

### 2nd run after defining variable failed here:
2022-08-24 09:54:16,896 p=1338321 u=tbrito n=ansible | TASK [restore-platform/prepare-env : Check if restore is in progress] ***************************************************************************************************************************
2022-08-24 09:54:16,908 p=1338754 u=tbrito n=ansible | <10.127.130.10> ESTABLISH SSH CONNECTION FOR USER: sysadmin
2022-08-24 09:54:16,912 p=1338754 u=tbrito n=ansible | <10.127.130.10> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=10200 -o 'User="sysadmin"' -o ConnectTimeout=10 -o ControlPath=/home/tbrito/.ansible/cp/7f214f7080 10.127.130.10 '/bin/sh -c '"'"'echo ~sysadmin && sleep 0'"'"''
2022-08-24 09:54:17,008 p=1338754 u=tbrito n=ansible | <10.127.130.10> (0, b'/home/sysadmin\n', b'')
2022-08-24 09:54:17,010 p=1338754 u=tbrito n=ansible | <10.127.130.10> ESTABLISH SSH CONNECTION FOR USER: sysadmin
2022-08-24 09:54:17,012 p=1338754 u=tbrito n=ansible | <10.127.130.10> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=10200 -o 'User="sysadmin"' -o ConnectTimeout=10 -o ControlPath=/home/tbrito/.ansible/cp/7f214f7080 10.127.130.10 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/sysadmin/.ansible/tmp `"&& mkdir "` echo /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028 `" && echo ansible-tmp-1661345657.009449-1338754-142997854121028="` echo /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028 `" ) && sleep 0'"'"''
2022-08-24 09:54:17,168 p=1338754 u=tbrito n=ansible | <10.127.130.10> (0, b'ansible-tmp-1661345657.009449-1338754-142997854121028=/home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028\n', b'')
2022-08-24 09:54:17,173 p=1338754 u=tbrito n=ansible | Using module file /tmp/tbrito_ansible-playbookstox/venv/lib/python3.9/site-packages/ansible/modules/stat.py
2022-08-24 09:54:17,175 p=1338754 u=tbrito n=ansible | <10.127.130.10> PUT /home/tbrito/.ansible/tmp/ansible-local-1338321yf2oqsu_/tmpwnlf503j TO /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028/AnsiballZ_stat.py
2022-08-24 09:54:17,177 p=1338754 u=tbrito n=ansible | <10.127.130.10> SSH: EXEC sshpass -d11 sftp -o BatchMode=no -b - -C -o ControlMaster=auto -o ControlPersist=60s -o Port=10200 -o 'User="sysadmin"' -o ConnectTimeout=10 -o ControlPath=/home/tbrito/.ansible/cp/7f214f7080 '[10.127.130.10]'
2022-08-24 09:54:17,526 p=1338754 u=tbrito n=ansible | <10.127.130.10> (0, b'sftp> put /home/tbrito/.ansible/tmp/ansible-local-1338321yf2oqsu_/tmpwnlf503j /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028/AnsiballZ_stat.py\n', b'')
2022-08-24 09:54:17,528 p=1338754 u=tbrito n=ansible | <10.127.130.10> ESTABLISH SSH CONNECTION FOR USER: sysadmin
2022-08-24 09:54:17,530 p=1338754 u=tbrito n=ansible | <10.127.130.10> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=10200 -o 'User="sysadmin"' -o ConnectTimeout=10 -o ControlPath=/home/tbrito/.ansible/cp/7f214f7080 10.127.130.10 '/bin/sh -c '"'"'chmod u+x /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028/ /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028/AnsiballZ_stat.py && sleep 0'"'"''
2022-08-24 09:54:17,634 p=1338754 u=tbrito n=ansible | <10.127.130.10> (0, b'', b'')
2022-08-24 09:54:17,635 p=1338754 u=tbrito n=ansible | <10.127.130.10> ESTABLISH SSH CONNECTION FOR USER: sysadmin
2022-08-24 09:54:17,635 p=1338754 u=tbrito n=ansible | <10.127.130.10> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=10200 -o 'User="sysadmin"' -o ConnectTimeout=10 -o ControlPath=/home/tbrito/.ansible/cp/7f214f7080 -tt 10.127.130.10 '/bin/sh -c '"'"'sudo -H -S -p "[sudo via ansible, key=zlttdfhtzadcdtwnhanaaezfildzqtsr] password:" -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-zlttdfhtzadcdtwnhanaaezfildzqtsr ; /usr/bin/python /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028/AnsiballZ_stat.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
2022-08-24 09:54:17,821 p=1338754 u=tbrito n=ansible | Escalation succeeded
2022-08-24 09:54:18,207 p=1338754 u=tbrito n=ansible | <10.127.130.10> (0, b'\r\n\r\n{"changed": false, "stat": {"exists": true, "path": "/etc/platform/.restore_in_progress", "mode": "0644", "isdir": false, "ischr": false, "isblk": false, "isreg": true, "isfifo": false, "islnk": false, "issock": false, "uid": 0, "gid": 0, "size": 0, "inode": 1053275, "dev": 2052, "nlink": 1, "atime": 1661284731.32077, "mtime": 1661284731.32077, "ctime": 1661284731.32077, "wusr": true, "rusr": true, "xusr": false, "wgrp": false, "rgrp": true, "xgrp": false, "woth": false, "roth": true, "xoth": false, "isuid": false, "isgid": false, "blocks": 0, "block_size": 4096, "device_type": 0, "readable": true, "writeable": true, "executable": false, "pw_name": "root", "gr_name": "root", "checksum": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "mimetype": "inode/x-empty", "charset": "binary", "version": "3903128544", "attributes": ["extents"], "attr_flags": "e"}, "invocation": {"module_args": {"path": "/etc/platform/.restore_in_progress", "follow": false, "get_md5": false, "get_checksum": true, "get_mime": true, "get_attributes": true, "checksum_algorithm": "sha1"}}}\r\n', b'Shared connection to 10.127.130.10 closed.\r\n')
2022-08-24 09:54:18,210 p=1338754 u=tbrito n=ansible | <10.127.130.10> ESTABLISH SSH CONNECTION FOR USER: sysadmin
2022-08-24 09:54:18,212 p=1338754 u=tbrito n=ansible | <10.127.130.10> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=10200 -o 'User="sysadmin"' -o ConnectTimeout=10 -o ControlPath=/home/tbrito/.ansible/cp/7f214f7080 10.127.130.10 '/bin/sh -c '"'"'rm -f -r /home/sysadmin/.ansible/tmp/ansible-tmp-1661345657.009449-1338754-142997854121028/ > /dev/null 2>&1 && sleep 0'"'"''
2022-08-24 09:54:18,313 p=1338754 u=tbrito n=ansible | <10.127.130.10> (0, b'', b'')
2022-08-24 09:54:18,317 p=1338321 u=tbrito n=ansible | ok: [lab_vbox_2] => {
    "changed": false,
    "invocation": {
        "module_args": {
            "checksum_algorithm": "sha1",
            "follow": false,
            "get_attributes": true,
            "get_checksum": true,
            "get_md5": false,
            "get_mime": true,
            "path": "/etc/platform/.restore_in_progress"
        }
    },
    "stat": {
        "atime": 1661284731.32077,
        "attr_flags": "e",
        "attributes": [
            "extents"
        ],
        "block_size": 4096,
        "blocks": 0,
        "charset": "binary",
        "checksum": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "ctime": 1661284731.32077,
        "dev": 2052,
        "device_type": 0,
        "executable": false,
        "exists": true,
        "gid": 0,
        "gr_name": "root",
        "inode": 1053275,
        "isblk": false,
        "ischr": false,
        "isdir": false,
        "isfifo": false,
        "isgid": false,
        "islnk": false,
        "isreg": true,
        "issock": false,
        "isuid": false,
        "mimetype": "inode/x-empty",
        "mode": "0644",
        "mtime": 1661284731.32077,
        "nlink": 1,
        "path": "/etc/platform/.restore_in_progress",
        "pw_name": "root",
        "readable": true,
        "rgrp": true,
        "roth": true,
        "rusr": true,
        "size": 0,
        "uid": 0,
        "version": "3903128544",
        "wgrp": false,
        "woth": false,
        "writeable": true,
        "wusr": true,
        "xgrp": false,
        "xoth": false,
        "xusr": false
    }
}
2022-08-24 09:54:18,324 p=1338321 u=tbrito n=ansible | Read vars_file 'vars/common/main.yml'
2022-08-24 09:54:18,325 p=1338321 u=tbrito n=ansible | Read vars_file 'host_vars/backup-restore/default.yml'
2022-08-24 09:54:18,326 p=1338321 u=tbrito n=ansible | TASK [restore-platform/prepare-env : Fail if restore is already in progress] ********************************************************************************************************************
2022-08-24 09:54:18,349 p=1338321 u=tbrito n=ansible | fatal: [lab_vbox_2]: FAILED! => {
    "changed": false,
    "msg": " Restore is already in progress!"
}
2022-08-24 09:54:18,351 p=1338321 u=tbrito n=ansible | PLAY RECAP **************************************************************************************************************************************************************************************
2022-08-24 09:54:18,352 p=1338321 u=tbrito n=ansible | lab_vbox_2 : ok=22 changed=6 unreachable=0 failed=1 skipped=28 rescued=0 ignored=0

Test Activity
-------------
Developer Testing

Workaround
----------
- sudo rm /etc/platform/.restore_in_progress
- Retry running restore_platform.yml playbook
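
The two workaround steps can be sketched as a small shell helper. This is only an illustration, not part of the playbooks: the function name clear_restore_flag and the DEFAULT_FLAG variable are hypothetical, and the flag path is the one reported by the stat task in the logs above. It assumes it runs on the target controller with root privileges (the flag is owned by root) and that no restore is actually running at the time.

```shell
#!/bin/sh
# Sketch of the manual workaround: clear a stale restore flag so that
# restore_platform.yml can be re-run. Hypothetical helper, not shipped
# with the StarlingX playbooks.

# Flag path as reported by the stat task in the logs above.
DEFAULT_FLAG=/etc/platform/.restore_in_progress

# clear_restore_flag [PATH]
# Removes the flag file if it exists. Returns 0 when the flag is gone
# afterwards, non-zero when removal failed (for example, not root).
clear_restore_flag() {
    flag="${1:-$DEFAULT_FLAG}"
    if [ -e "$flag" ]; then
        rm -f -- "$flag" || return 1
        echo "removed stale restore flag: $flag"
    else
        echo "no restore flag at: $flag"
    fi
    return 0
}
```

After sourcing the helper as root on the controller, call clear_restore_flag with no argument, then re-run restore_platform.yml from the Ansible host with the same arguments as the failed run.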

Tags: stx.update
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.update