Backup & Restore: should not fail if backup is larger than the filesystems that will be used during restore

Bug #1992345 reported by Virginia Martins Perozim
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Virginia Martins Perozim

Bug Description

Brief Description
-----------------
If the backup file is larger than 1G, scp to /tmp hangs and the subcloud remote restore eventually fails. Nothing indicates that the restore failed because of the 1G size limit of /tmp.

The issue was caused by a rather large backup file (3.5G) being transferred over scp through /tmp, which has only 1G. As a workaround, I provided a different directory (/home/sysadmin/ansible-restore) in the values.yaml used for the dcmanager call (see Workaround).

Severity
--------
Major: Subcloud Remote Restore fails

Steps to Reproduce
------------------
1. Backup subcloud
2. Remote restore using the following restore values
initial_backup_dir: /home/sysadmin/subcloud2
backup_filename: <backup file name>
on_box_data: false

Expected Behavior
------------------
Subcloud remote restore should work

Actual Behavior
----------------
Remote restore fails

Reproducibility
---------------
Reproducible

System Configuration
--------------------
DC system with redfish enabled subcloud
DC9/subcloud2

###

Wind River Cloud Platform
Release 21.12
###
Wind River Systems, Inc.
###
SW_VERSION="21.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2022-01-10_02-00-35"
SRC_BUILD_ID="45"

JOB="WRCP_21.12_Build"
BUILD_BY="jenkins"
BUILD_NUMBER="45"
BUILD_HOST="yow-cgts4-lx.wrs.com"
BUILD_DATE="2022-01-10 02:02:25 -0500"

Branch/Pull Time/Commit
-----------------------
-

Last Pass
---------
-

Timestamp/Logs
--------------
-

Test Activity
-------------
PATCH Testing

Workaround
----------
Use a directory other than /tmp by setting ansible_remote_tmp in the restore values:

initial_backup_dir: /home/sysadmin/subcloud2
backup_filename: localhost_platform_backup_2022_07_28_18_14_15.tgz
on_box_data: false
ansible_remote_tmp: /home/sysadmin/ansible-restore

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.8.0 stx.distcloud stx.update
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Virginia Martins Perozim (vmperozim)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/860828
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/16e4a61f1617bd39463e41a76f10cc2faebd8557
Submitter: "Zuul (22348)"
Branch: master

commit 16e4a61f1617bd39463e41a76f10cc2faebd8557
Author: Virginia Martins Perozim <email address hidden>
Date: Mon Oct 10 10:57:09 2022 -0400

    Fail restore if no enough space for backup file

    During subcloud remote restore, when backup file is on the System
    controller, the backup file must be tranfered to subcloud. If the
    backup file is bigger than the /tmp, that is the default staging
    directory, the restore fails and there is no indication
    that it failed because of /tmp 1G size limit.
    The solution is to check available space of staging directory on
    the target before transferring the backup file. The playbook will
    fail with useful hint if there isn't enough space.

    Test Plan:

    PASSED: DC - subcloud B&R from system controller
            file less than 1G

    PASSED: DC - subcloud B&R from system controller
            file bigger than 1G
            ansible_remote_tmp defined

    PASSED: DC - subcloud B&R from system controller
            file bigger than 1G
            ansible_remote_tmp not defined

    PASSED: DC - subcloud local B&R
            file bigger than 1G

    PASSED: AIO-SX remote B&R
            file less than 1G

    PASSED: AIO-SX remote B&R
            file bigger than 1G

    Closes-Bug: 1992345
    Signed-off-by: Virginia Martins Perozim <email address hidden>
    Change-Id: If380713ccc3136856a55241cecd318b979f9231c
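
The check described in the commit message (compare the backup size with the free space in the staging directory before transferring, and fail with a hint) can be sketched as follows. This is only an illustration in Python, not the merged Ansible change: the function names `space_error` and `check_staging_space` are hypothetical, and the sketch assumes it runs on the host that owns the staging directory, whereas the actual playbook checks the target before the transfer.

```python
import os
import shutil


def space_error(backup_size, free_bytes, staging_dir):
    """Return a hint message if the backup does not fit, else None.

    Hypothetical helper for illustration; sizes are in bytes.
    """
    if backup_size > free_bytes:
        return (
            f"Not enough space in {staging_dir}: the backup file needs "
            f"{backup_size} bytes but only {free_bytes} bytes are free. "
            "Set ansible_remote_tmp to a directory with more free space."
        )
    return None


def check_staging_space(backup_path, staging_dir="/tmp"):
    """Raise RuntimeError before any transfer if staging_dir is too small."""
    backup_size = os.path.getsize(backup_path)
    # disk_usage reports the free space of the filesystem holding staging_dir.
    free_bytes = shutil.disk_usage(staging_dir).free
    message = space_error(backup_size, free_bytes, staging_dir)
    if message is not None:
        raise RuntimeError(message)
```

With the sizes from this report, `space_error(int(3.5 * 1024**3), 1024**3, "/tmp")` returns the hint instead of letting the transfer hang, while a backup smaller than the free space returns `None`.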

Changed in starlingx:
status: In Progress → Fix Released