Subsequent subcloud backup fails after powering off subcloud during backup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Guilherme Schons |
Bug Description
Brief Description
-----------------
dcmanager fails to back up a subcloud on retry if the subcloud was powered off during a previous backup.
Severity
--------
Major
Steps to Reproduce
------------------
Create subcloud backup
dcmanager subcloud-backup create --subcloud subcloud1 --sysadmin-password Li69nux*
Watch Ansible logs and power off the subcloud via IPMI after reaching the following task: "Run subcloud1 backup playbook"
The backup fails because it cannot connect to the subcloud (this takes a few minutes):
'Failed to connect to the host via ssh: ssh: connect to host *** port 22: No route to host'
Check that the backup status goes to the 'failed' state
Power on the subcloud again
Once subcloud is online/in-sync, retry the subcloud backup creation:
dcmanager subcloud-backup create --subcloud subcloud1 --sysadmin-password Li69nux*
The backup operation fails with the following error:
msg: Backup is already in progress!
Try again and you'll get the same failure.
Expected Behavior
------------------
The operator is able to back up the subcloud again after a power-off failure
Actual Behavior
----------------
The operator can no longer back up a subcloud once that subcloud has gone down during a backup.
Reproducibility
---------------
100% - Interrupt the backup process and try again, or manually create the .backup_in_progress file.
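
The reproduction above is consistent with a flag file that is created when a backup starts and is never removed if the run is interrupted. The following Python sketch simulates that failure mode in a temporary directory. It is not the playbook's code: the functions `start_backup` and `finish_backup` and the directory layout are hypothetical, and only the file name `.backup_in_progress` and the error message are taken from this report.

```python
import tempfile
from pathlib import Path

FLAG_NAME = ".backup_in_progress"  # file name taken from the report


class BackupInProgress(Exception):
    """Raised when the flag file from an earlier run is still present."""


def start_backup(flag_dir):
    """Hypothetical stand-in for the pre-backup check: refuse to run if the flag exists."""
    flag = Path(flag_dir) / FLAG_NAME
    if flag.exists():
        raise BackupInProgress("Backup is already in progress!")
    flag.touch()
    return flag


def finish_backup(flag):
    """Hypothetical cleanup step that only runs when a backup completes."""
    flag.unlink()


def simulate():
    """Replay the reported sequence and return what happened at each stage."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        flag_dir = Path(tmp)

        # First run: the flag is created, then the host is powered off,
        # so finish_backup() is never reached.
        start_backup(flag_dir)
        results.append("interrupted")

        # Retry after power-on: the stale flag blocks the new run.
        try:
            start_backup(flag_dir)
            results.append("started")
        except BackupInProgress as exc:
            results.append(str(exc))

        # Removing the stale flag by hand unblocks the next attempt.
        (flag_dir / FLAG_NAME).unlink()
        finish_backup(start_backup(flag_dir))
        results.append("completed")
    return results


if __name__ == "__main__":
    print(simulate())
```

In this model the retry keeps failing for as long as the flag file exists, which matches the "Try again and you'll get the same failure" observation.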
System Configuration
--------------------
Distributed Cloud
Last Pass
---------
Never tested before.
Timestamp/Logs
--------------
TASK [subcloud-
Tuesday 01 November 2022 18:03:36 +0000 (0:00:00.015) 0:08:36.261 ******
fatal: [subcloud1]: UNREACHABLE! => changed=false
msg: 'Failed to connect to the host via ssh: ssh: connect to host ***::1016 port 22: No route to host'
unreachable: true
| 8 | subcloud1 | managed | offline | complete | unknown | failed | None |
+----+-
// Powered on Subcloud1 and initiated backup process again
| 8 | subcloud1 | managed | online | complete | in-sync | failed | None |
+----+-
$ dcmanager subcloud-backup create --subcloud subcloud1 --sysadmin-password Li69nux*
| 8 | subcloud1 | managed | online | complete | in-sync | backing-up | None |
// Failed as previous backup was still in-progress
| 8 | subcloud1 | managed | online | complete | in-sync | failed | None |
TASK [backup/prepare-env : Fail if backup is already in progress] **************
Tuesday 01 November 2022 18:34:37 +0000 (0:00:00.293) 0:00:03.329 ******
fatal: [localhost]: FAILED! => changed=false
msg: Backup is already in progress!
Test Activity
-------------
Feature testing
Workaround
----------
SSH into the subcloud and remove the hidden flag file - ./etc/platform/
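
The workaround can be scripted once the full path of the flag file is known. The sketch below is meant to run on the subcloud and takes that path as an argument, because the path is truncated in this report. The helper name `remove_stale_flag` is hypothetical, and it should only be used when no backup is actually running.

```python
from pathlib import Path


def remove_stale_flag(flag_path):
    """Delete the flag file if it exists; return True if a file was removed."""
    flag = Path(flag_path)
    if flag.is_file():
        flag.unlink()
        return True
    return False
```

Calling `remove_stale_flag(path)` with the flag file's path returns `True` when a stale file was deleted and `False` when there was nothing to remove, so it is safe to run more than once.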
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.distcloud
Changed in starlingx:
assignee: nobody → Guilherme Schons (gdossant)
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/869478