Backup & Restore: AIO-SX backup hangs on IPv6 setup when creating the etcd snapshot

Bug #1916053 reported by Mihnea Saracin
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Mihnea Saracin

Bug Description

Brief Description
-----------------
In an IPv6 environment, the AIO-SX backup hangs while trying to create the etcd snapshot.

Severity
--------
Major

Steps to Reproduce
------------------
1. Deploy an AIO-SX setup
2. Run a backup

Expected Behavior
------------------
AIO-SX backup completes successfully

Actual Behavior
----------------
AIO-SX backup hangs

Reproducibility
---------------
Happened 2/2 times

System Configuration
--------------------
AIO-SX IPv6

Branch/Pull Time/Commit
-----------------------
stx master build on "2021-02-15"

Last Pass
---------
N/A

Timestamp/Logs
--------------

E TASK [common/prepare-env : stat] *************************************************************************************************************************************************************************************************************************
 E ok: [localhost -> localhost] => (item=/home/sysadmin/secrets.yml)
 E ok: [localhost -> localhost] => (item=/home/sysadmin/localhost_secrets.yml)
 E ok: [localhost -> localhost] => (item=/home/sysadmin/site.yml)
 E ok: [localhost -> localhost] => (item=/home/sysadmin/localhost.yml)
 E
 E TASK [common/prepare-env : include_vars] *****************************************************************************************************************************************************************************************************************
 E ok: [localhost] => (item={'_ansible_parsed': True, u'stat': {u'isuid': False, u'uid': 42425, u'exists': True, u'attr_flags': u'e', u'woth': False, u'device_type': 0, u'mtime': 1613376938.0, u'block_size': 4096, u'inode': 794990, u'isgid': False, u'size': 612, u'executable': False, u'roth': True, u'charset': u'us-ascii', u'readable': True, u'isreg': True, u'version': u'18446744073635291974', u'pw_name': u'sysadmin', u'gid': 345, u'ischr': False, u'wusr': True, u'writeable': True, u'isdir': False, u'blocks': 8, u'xoth': False, u'rusr': True, u'nlink': 1, u'issock': False, u'rgrp': True, u'gr_name': u'sys_protected', u'path': u'/home/sysadmin/site.yml', u'xusr': False, u'atime': 1613573491.9119966, u'mimetype': u'text/x-c', u'ctime': 1613573375.6709979, u'isblk': False, u'checksum': u'b15847ff866422a484fe61aa5e5a642b4344702a', u'dev': 2052, u'wgrp': True, u'isfifo': False, u'mode': u'0664', u'xgrp': False, u'islnk': False, u'attributes': [u'extents']}, '_ansible_item_result': True, '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'localhost', 'ansible_host': u'localhost'}, u'changed': False, 'failed': False, 'item': u'/home/sysadmin/site.yml', u'invocation': {u'module_args': {u'get_checksum': True, u'follow': False, u'checksum_algorithm': u'sha1', u'path': u'/home/sysadmin/site.yml', u'get_mime': True, u'get_md5': None, u'get_attributes': True}}, '_ansible_ignore_errors': None, '_ansible_item_label': u'/home/sysadmin/site.yml'})
 E ok: [localhost] => (item={'_ansible_parsed': True, u'stat': {u'isuid': False, u'uid': 42425, u'exists': True, u'attr_flags': u'e', u'woth': False, u'device_type': 0, u'mtime': 1613376938.0, u'block_size': 4096, u'inode': 795006, u'isgid': False, u'size': 793, u'executable': False, u'roth': True, u'charset': u'us-ascii', u'readable': True, u'isreg': True, u'version': u'18446744073635291990', u'pw_name': u'sysadmin', u'gid': 345, u'ischr': False, u'wusr': True, u'writeable': True, u'isdir': False, u'blocks': 8, u'xoth': False, u'rusr': True, u'nlink': 1, u'issock': False, u'rgrp': True, u'gr_name': u'sys_protected', u'path': u'/home/sysadmin/localhost.yml', u'xusr': False, u'atime': 1613573492.0529966, u'mimetype': u'text/x-c', u'ctime': 1613573378.3439977, u'isblk': False, u'checksum': u'47a0aa9cee9029d22348748da6e3ceac2a1ec997', u'dev': 2052, u'wgrp': True, u'isfifo': False, u'mode': u'0664', u'xgrp': False, u'islnk': False, u'attributes': [u'extents']}, '_ansible_item_result': True, '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'localhost', 'ansible_host': u'localhost'}, u'changed': False, 'failed': False, 'item': u'/home/sysadmin/localhost.yml', u'invocation': {u'module_args': {u'get_checksum': True, u'follow': False, u'checksum_algorithm': u'sha1', u'path': u'/home/sysadmin/localhost.yml', u'get_mime': True, u'get_md5': None, u'get_attributes': True}}, '_ansible_ignore_errors': None, '_ansible_item_label': u'/home/sysadmin/localhost.yml'})
 E
 E TASK [common/prepare-env : Set SSH port] *****************************************************************************************************************************************************************************************************************
 E
 E TASK [common/prepare-env : Update SSH known hosts] *******************************************************************************************************************************************************************************************************
 E
 E TASK [common/prepare-env : Check connectivity] ***********************************************************************************************************************************************************************************************************
 E
 E TASK [common/prepare-env : Fail if host is unreachable] **************************************************************************************************************************************************************************************************
 E
 E TASK [common/prepare-env : Fail if password change response sequence is not defined] *********************************************************************************************************************************************************************
 E
 E TASK [common/prepare-env : debug] ************************************************************************************************************************************************************************************************************************
 E
 E TASK [common/prepare-env : Change initial password] ******************************************************************************************************************************************************************************************************
 E
 E TASK [backup-restore/prepare-env : Check archive dir] ****************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup-restore/prepare-env : Fail if archive dir does not exist] ***********************************************************************************************************************************************************************************
 E
 E TASK [backup-restore/prepare-env : Retrieve software version number] *************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup-restore/prepare-env : Fail if software version is not defined] ******************************************************************************************************************************************************************************
 E
 E TASK [backup-restore/prepare-env : Retrieve system type] *************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup-restore/prepare-env : Fail if system type is not defined] ***********************************************************************************************************************************************************************************
 E
 E TASK [backup-restore/prepare-env : Set software version fact] ********************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/prepare-env : Check if backup is in progress] ***********************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/prepare-env : Fail if backup is already in progress] ****************************************************************************************************************************************************************************************
 E
 E TASK [backup/prepare-env : Check if it is the active controller] *****************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/prepare-env : Fail if it is not an active controller] ***************************************************************************************************************************************************************************************
 E
 E TASK [backup/prepare-env : Check disk usage of /home directory] ******************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/prepare-env : Fail if disk usage of /home directory is over 2000MB] *************************************************************************************************************************************************************************
 E
 E TASK [backup/prepare-env : Search for system_mode in /etc/platform/platform.conf] ************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/prepare-env : Fail if system_mode is not set in /etc/platform/platform.conf] ****************************************************************************************************************************************************************
 E
 E TASK [backup/prepare-env : Check if portieris application is applied] ************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/prepare-env : Fail if portieris application is applied] *************************************************************************************************************************************************************************************
 E
 E TASK [backup/prepare-env : set system_mode] **************************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/prepare-env : Set config path facts] ********************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/prepare-env : Check if ceph is configured] **************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/prepare-env : Create backup in progress flag file] ******************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Generate backup_in_progress alarm] ******************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Fail if alarm script throws an exception] ***********************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Create temp dir] ************************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Create postgres temp dir] ***************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Backup roles, table spaces and schemas for databases.] **********************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Backup postgres, template1, sysinv, barbican, helmv2 db data] ***************************************************************************************************************************************************************
 E changed: [localhost] => (item=postgres)
 E changed: [localhost] => (item=template1)
 E changed: [localhost] => (item=sysinv)
 E changed: [localhost] => (item=barbican)
 E changed: [localhost] => (item=helmv2)
 E
 E TASK [backup/backup-system : Backup fm db data] **********************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Backup keystone db data] ****************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Check if it is dc controller] ***********************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Backup dcmanager db for dc controller] **************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Backup dcorch db for dc controller] *****************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Update dcorch tables that will be excluded from backup] *********************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Backup dcorch db] ***********************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Create mariadb temp dir] ****************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Check if mariadb pod is running] ********************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Set k8s cmd prefix] *********************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Show databases] *************************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Backup mariadb] *************************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Get stx-openstack status] ***************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Fail the backup if MariaDB is not running] **********************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Create Helm overrides temp dir] *********************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Get the openstack Helm overrides from the from the database] ****************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Generate postgres update commands for Helm overrides] ***********************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Backup Helm overrides] ******************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Check the size (in KiB) of directories that will be backed up for platform] *************************************************************************************************************************************************
 E changed: [localhost] => (item=/etc)
 E changed: [localhost] => (item=/home)
 E changed: [localhost] => (item=/opt/platform/config/20.12)
 E changed: [localhost] => (item=/opt/platform/sysinv/20.12)
 E changed: [localhost] => (item=/opt/platform/puppet/20.12/hieradata)
 E changed: [localhost] => (item=/opt/platform/.keyring/20.12)
 E changed: [localhost] => (item=/opt/platform/extra)
 E changed: [localhost] => (item=/opt/patching)
 E changed: [localhost] => (item=/www/pages/updates)
 E changed: [localhost] => (item=/opt/extension)
 E changed: [localhost] => (item=/opt/dc-vault)
 E changed: [localhost] => (item=/opt/platform/deploy/20.12)
 E changed: [localhost] => (item=/opt/backups/ansible.LQyePE/postgres)
 E changed: [localhost] => (item=/opt/platform/armada/20.12)
 E changed: [localhost] => (item=/opt/platform/helm_charts)
 E changed: [localhost] => (item=/opt/platform/helm/20.12)
 E changed: [localhost] => (item=/opt/backups/ansible.LQyePE/helm_overrides_dir)
 E
 E TASK [backup/backup-system : Estimate the total required disk size for platform backup archive] **********************************************************************************************************************************************************
 E ok: [localhost] => (item=/etc)
 E ok: [localhost] => (item=/home)
 E ok: [localhost] => (item=/opt/platform/config/20.12)
 E ok: [localhost] => (item=/opt/platform/sysinv/20.12)
 E ok: [localhost] => (item=/opt/platform/puppet/20.12/hieradata)
 E ok: [localhost] => (item=/opt/platform/.keyring/20.12)
 E ok: [localhost] => (item=/opt/platform/extra)
 E ok: [localhost] => (item=/opt/patching)
 E ok: [localhost] => (item=/www/pages/updates)
 E ok: [localhost] => (item=/opt/extension)
 E ok: [localhost] => (item=/opt/dc-vault)
 E ok: [localhost] => (item=/opt/platform/deploy/20.12)
 E ok: [localhost] => (item=/opt/backups/ansible.LQyePE/postgres)
 E ok: [localhost] => (item=/opt/platform/armada/20.12)
 E ok: [localhost] => (item=/opt/platform/helm_charts)
 E ok: [localhost] => (item=/opt/platform/helm/20.12)
 E ok: [localhost] => (item=/opt/backups/ansible.LQyePE/helm_overrides_dir)
 E
 E TASK [backup/backup-system : Check the free space in the archive dir] ************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Parse backup directory size] ************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/backup-system : Fail if there is not enough free space in the archive dir to create platform backup] ****************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Estimate remaining space after reserving space for platform backup] *********************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/backup-system : Check the size (in KiB) of directories that will be backed up for openstack] ************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Estimate the total required disk size for platform openstack archive] *******************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Fail if there is not enough free space in the archive dir to create openstack backup] ***************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Estimate remaining space after reserving space for openstack backup] ********************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Create ldap temp dir] *******************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Name ldap db backup] ********************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/backup-system : Backup ldap db] *************************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Create ceph temp dir] *******************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Name ceph crushmap backup] **************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Create ceph crushmap backup] ************************************************************************************************************************************************************************************************
 E
 E TASK [backup/backup-system : Create etcd snapshot temp dir] **********************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Name etcd snapshot backup] **************************************************************************************************************************************************************************************************
 E ok: [localhost]
 E
 E TASK [backup/backup-system : Get etcd endpoints] *********************************************************************************************************************************************************************************************************
 E changed: [localhost]
 E
 E TASK [backup/backup-system : Create etcd snapshot]

From this point the playbook hangs indefinitely.
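
This report does not record the root cause of the hang. As a minimal sketch only, assuming the problem is related to how an IPv6 address is passed as an endpoint to the snapshot command (an assumption, not confirmed here), the snippet below shows two generic precautions: bracketing IPv6 literals when building an endpoint URL, and bounding a command with a timeout so it fails instead of hanging. The helper names `format_endpoint` and `run_bounded` are hypothetical, and port 2379 is used only because it is the default etcd client port.

```python
import ipaddress
import subprocess
import sys


def format_endpoint(host, port=2379, scheme="https"):
    """Build an endpoint URL, bracketing the host if it is an IPv6 literal."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal (e.g. a hostname); leave unchanged
    if is_v6:
        return f"{scheme}://[{host}]:{port}"
    return f"{scheme}://{host}:{port}"


def run_bounded(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this
        return None


if __name__ == "__main__":
    print(format_endpoint("fd00::1"))
    # Stand-in for a command that never returns; replace with the real one.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_bounded(hung, timeout_s=1))
```

With a timeout in place, a stuck snapshot step would surface as a failed task rather than a playbook that never finishes.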

Test Activity
-------------
Normal use

Changed in starlingx:
assignee: nobody → Mihnea Saracin (msaracin)
summary: - Backup & Restore: AIO-SX backup hangs when creating the etcd snapshot
+ Backup & Restore: AIO-SX backup hangs on IPv6 setup when creating the
+ etcd snapshot
Ghada Khalil (gkhalil) wrote :

stx.5.0 / high - issue w/ backup & restore

Changed in starlingx:
status: New → Triaged
tags: added: stx.5.0
tags: added: stx.update
Changed in starlingx:
importance: Undecided → High
Bob Church (rchurch) wrote :
Changed in starlingx:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/794297

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/792195

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/794324
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/163ec9989cc7360dba4c572b2c43effd10306048
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 4e96b762f549aadb0291cc9bcf3352ae923e94eb
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 15:48:19 2021 +0000

    Revert "Restore host filesystems with collected sizes"

    This reverts commit 255488739efa4ac072424b19f2dbb7a3adb0254e.

    Reason for revert: Did a rework to fix https://bugs.launchpad.net/starlingx/+bug/1926591. The original problem was in puppet, and this fix in ansible was not good enough, it generated some other problems.

    Change-Id: Iea79701a874effecb7fe995ac468d50081d1a84f
    Depends-On: I55ae6954d24ba32e40c2e5e276ec17015d9bba44

commit c064aacc377c8bd5336ceab825d4bcbf5af0b5e8
Author: Angie Wang <email address hidden>
Date: Fri May 21 21:28:02 2021 -0400

    Ensure apiserver keys are present before extract from tarball

    This is to fix the upgrade playbook issue that happens during
    AIO-SX upgrade from stx4.0 to stx5.0 which introduced by
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/792093.
    The apiserver keys are not available in stx4.0 side so we need
    to ensure the keys under /etc/kubernetes/pki are present in the
    backed-up tarball before extracting, otherwise playbook fails
    because the keys are not found in the archive.

    Change-Id: I8602f07d1b1041a7fd3fff21e6f9a422b9784ab5
    Closes-Bug: 928925
    Signed-off-by: Angie Wang <email address hidden>

commit 0261f22ff7c23d2a8608fe3b51725c9f29931281
Author: Don Penney <email address hidden>
Date: Thu May 20 23:09:07 2021 -0400

    Update SX to DX migration to wait for coredns config

    This commit updates the SX to DX migration playbook to wait after
    modifying the system mode to duplex until the runtime manifest that
    updates coredns config has completed. The playbook will wait for up to
    20 minutes to allow for the possibilty that sysinv has multiple
    runtime manifests queued up, each of which could take several minutes.

    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/792494
    Depends-On: https://review.opendev.org/c/starlingx/config/+/792496
    Change-Id: I3bf94d3493ae20eeb16b3fdcb27576ee18c0dc4d
    Closes-Bug: 1929148
    Signed-off-by: Don Penney <email address hidden>

commit 7c4f17bd0d92fc1122823211e1c9787829d206a9
Author: Daniel Safta <email address hidden>
Date: Wed May 19 09:08:16 2021 +0000

    Fixed missing apiserver-etcd-client certs

    When controller-1 is the active controller
    the backup archive does not contain
    /etc/etcd/apiserver-etcd-client.{crt, key}

    This change adds a new task which brings
    the certs from /etc/kubernetes/pki

    Closes-bug: 1928925
    Signed-off-by: Daniel Safta <email address hidden>
    Change-Id: I3c68377603e1af9a71d104e5b1108e9582497a09

commit e221ef8fbe51aa6ca229b584fb5632fe512ad5cb
Author: David Sullivan <email address hidden>
Date: Wed May 19 16:01:27 2021 -0500

    Support boo...

tags: added: in-f-centos8