During restore of the central controller, the boot ISO is placed in /opt/dc-vault while it is still on the / partition.

Bug #1914258 reported by Mihnea Saracin
This bug affects 1 person
Affects     Status        Importance  Assigned to     Milestone
StarlingX   Fix Released  Medium      Mihnea Saracin

Bug Description

Brief Description
-----------------
During restore of the system controller, the boot ISO is moved to /opt/dc-vault while /opt/dc-vault still resides on the / file system instead of on its own file system.

Severity
--------
Major

Steps to Reproduce
------------------
Run the restore platform procedure

Expected Behavior
------------------
The restore completes successfully.

Actual Behavior
----------------
The restore fails with "No space left on device".

Reproducibility
---------------
100% when /opt/dc-vault contains enough large files that their total size exceeds the free space on /.

System Configuration
--------------------
DC system (AIO-DX system controller)

Branch/Pull Time/Commit
-----------------------
stx master build on "2020-01-07"

Last Pass
---------
Not tested previously.

Timestamp/Logs
--------------

# Ansible log of the playbook:

TASK [restore-platform/restore-more-data : Restore dc-vault filesystem] **********************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["tar", "-C", "/", "--overwrite", "-xpf", "/home/sysadmin/cr2_platform_backup_2021_01_07_17_22_03.tgz", "opt/dc-vault"], "delta": "0:00:32.374888", "end": "2021-01-07 20:27:32.168301", "msg": "non-zero return code", "rc": 2, "start": "2021-01-07 20:26:59.793413", "stderr": "tar: opt/dc-vault/loads/20.06/bootimage1.iso: Wrote only 7168 of 10240 bytes\ntar: opt/dc-vault/lost+found: Cannot mkdir: No space left on device\ntar:

# Reason why it failed

The restore on the central controller fails with "No space left on device" because extracting /opt/dc-vault pushes / over its 20G size limit:

/ file system sized to 20G
O/S install consumes 11G of /
Backup file placed in /home/sysadmin consumes 7G of /
Restore initiated
Boot ISO placed in /opt/dc-vault consumes just over 2G of /
Restore fails due to no space left

# df of /opt/dc-vault after the failed restore:
localhost:~$ df -BG /opt/dc-vault/
Filesystem  1G-blocks  Used  Available  Use%  Mounted on
/dev/sda4         20G   20G         0G  100%  /
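A pre-flight check along these lines would surface the problem before tar starts writing. This is a minimal sketch, not the fix that was merged for this bug: it assumes the backup is a tarball readable by Python's `tarfile` module, and the helper names `archive_member_bytes` and `fits_on_target` are hypothetical.

```python
import shutil
import tarfile


def archive_member_bytes(tar_path: str, prefix: str) -> int:
    """Sum the uncompressed sizes of regular files stored under `prefix`.

    `prefix` is matched against member names as tar stores them, e.g.
    "opt/dc-vault" (no leading slash, as in the tar command in the log).
    """
    prefix = prefix.rstrip("/")
    total = 0
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" autodetects compression
        for member in tar:
            name = member.name
            if member.isfile() and (name == prefix or name.startswith(prefix + "/")):
                total += member.size
    return total


def fits_on_target(tar_path: str, prefix: str, target_dir: str) -> bool:
    """Return True if the files under `prefix` would fit in the free space
    of the file system that currently holds `target_dir`.

    Block rounding and metadata overhead are ignored, so a result close
    to the limit should be treated as a failure in practice.
    """
    needed = archive_member_bytes(tar_path, prefix)
    free = shutil.disk_usage(target_dir).free  # target_dir must already exist
    return needed <= free
```

For this report it would be called with the backup file from the log, the `opt/dc-vault` prefix, and `/` as the target directory, since df shows /opt/dc-vault living on /. Because the free space is measured at the moment the check runs, the 7G backup file already sitting in /home/sysadmin is correctly counted as used.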

Test Activity
-------------
Normal use

Changed in starlingx:
assignee: nobody → Mihnea Saracin (msaracin)
description: updated
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium - issue with backup & restore functionality

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.5.0 stx.update
Mihnea Saracin (msaracin) wrote :
Changed in starlingx:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/794298

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/792195

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/794324
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/163ec9989cc7360dba4c572b2c43effd10306048
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 4e96b762f549aadb0291cc9bcf3352ae923e94eb
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 15:48:19 2021 +0000

    Revert "Restore host filesystems with collected sizes"

    This reverts commit 255488739efa4ac072424b19f2dbb7a3adb0254e.

    Reason for revert: Did a rework to fix https://bugs.launchpad.net/starlingx/+bug/1926591. The original problem was in puppet, and this fix in ansible was not good enough, it generated some other problems.

    Change-Id: Iea79701a874effecb7fe995ac468d50081d1a84f
    Depends-On: I55ae6954d24ba32e40c2e5e276ec17015d9bba44

commit c064aacc377c8bd5336ceab825d4bcbf5af0b5e8
Author: Angie Wang <email address hidden>
Date: Fri May 21 21:28:02 2021 -0400

    Ensure apiserver keys are present before extract from tarball

    This is to fix the upgrade playbook issue that happens during
    AIO-SX upgrade from stx4.0 to stx5.0, which was introduced by
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/792093.
    The apiserver keys are not available in stx4.0 side so we need
    to ensure the keys under /etc/kubernetes/pki are present in the
    backed-up tarball before extracting, otherwise playbook fails
    because the keys are not found in the archive.

    Change-Id: I8602f07d1b1041a7fd3fff21e6f9a422b9784ab5
    Closes-Bug: 928925
    Signed-off-by: Angie Wang <email address hidden>
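The check this commit describes, confirming that a path is stored in the archive before asking tar to extract it, can be sketched with Python's `tarfile` module. This is a hypothetical illustration, not the Ansible task from the commit; the helper name `missing_members` is an assumption, and any file names passed to it are examples only.

```python
import tarfile
from typing import Iterable, List


def missing_members(tar_path: str, wanted: Iterable[str]) -> List[str]:
    """Return the entries of `wanted` that are not stored in the archive.

    Names are compared as tar stores them, i.e. without a leading "/".
    Archives created with a "./" prefix would need that prefix added.
    An empty result means every requested path can be extracted.
    """
    with tarfile.open(tar_path, "r:*") as tar:
        names = set(tar.getnames())
    return [w for w in wanted if w.lstrip("/") not in names]
```

A caller would extract the requested paths only when the returned list is empty, and otherwise skip them or fall back to another source, which avoids the "not found in archive" failure from tar.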

commit 0261f22ff7c23d2a8608fe3b51725c9f29931281
Author: Don Penney <email address hidden>
Date: Thu May 20 23:09:07 2021 -0400

    Update SX to DX migration to wait for coredns config

    This commit updates the SX to DX migration playbook to wait after
    modifying the system mode to duplex until the runtime manifest that
    updates coredns config has completed. The playbook will wait for up to
    20 minutes to allow for the possibility that sysinv has multiple
    runtime manifests queued up, each of which could take several minutes.

    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/792494
    Depends-On: https://review.opendev.org/c/starlingx/config/+/792496
    Change-Id: I3bf94d3493ae20eeb16b3fdcb27576ee18c0dc4d
    Closes-Bug: 1929148
    Signed-off-by: Don Penney <email address hidden>
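The bounded wait this commit describes (poll until the runtime manifest has completed, give up after 20 minutes) follows a common deadline-polling pattern. A minimal Python sketch, assuming a caller-supplied `is_done` predicate; how the playbook actually detects manifest completion is not described here, so that part is left abstract, and `wait_until` is a hypothetical name.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool],
               timeout_s: float = 20 * 60,
               interval_s: float = 10.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_done` until it returns True or `timeout_s` elapses.

    Returns True on success and False on timeout. The last sleep is
    shortened to end at the deadline, and the predicate is checked once
    more there so a completion just before the deadline is not missed.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval_s, remaining))
```

Using a monotonic clock keeps the timeout correct even if the wall clock is adjusted during the wait; `clock` and `sleep` are injectable only so the loop can be exercised without real delays.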

commit 7c4f17bd0d92fc1122823211e1c9787829d206a9
Author: Daniel Safta <email address hidden>
Date: Wed May 19 09:08:16 2021 +0000

    Fixed missing apiserver-etcd-client certs

    When controller-1 is the active controller
    the backup archive does not contain
    /etc/etcd/apiserver-etcd-client.{crt, key}

    This change adds a new task which brings
    the certs from /etc/kubernetes/pki

    Closes-bug: 1928925
    Signed-off-by: Daniel Safta <email address hidden>
    Change-Id: I3c68377603e1af9a71d104e5b1108e9582497a09

commit e221ef8fbe51aa6ca229b584fb5632fe512ad5cb
Author: David Sullivan <email address hidden>
Date: Wed May 19 16:01:27 2021 -0500

    Support boo...

tags: added: in-f-centos8