Subcloud backup fails to transfer backup file to SystemController

Bug #2011604 reported by Victor Romano
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Victor Romano

Bug Description

Brief Description
-----------------
If a subcloud is installed with the same hostname as a previous subcloud that was backed up remotely through the System Controller, the backup of the new subcloud fails.

Severity
--------
Minor

Steps to Reproduce
------------------
- Back up a subcloud through the System Controller
- Delete this subcloud and add another one with the same name
- Back up the new subcloud

Expected Behavior
------------------
Backup is successful

Actual Behavior
----------------
Backup fails

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
DC with any number of subclouds

Branch/Pull Time/Commit
-----------------------
NA

Last Pass
---------
NA

Timestamp/Logs
--------------
subcloudN_playbook_output.log:
TASK [subcloud-bnr/backup : Transfer platform backup of subcloud1 to the system controller] ***
Saturday 04 March 2023 16:30:21 +0000 (0:00:00.018) 0:04:05.610 ********
FAILED - RETRYING: Transfer platform backup of subcloud1 to the system controller (3 retries left).
FAILED - RETRYING: Transfer platform backup of subcloud1 to the system controller (2 retries left).
FAILED - RETRYING: Transfer platform backup of subcloud1 to the system controller (1 retries left).
fatal: [subcloud1]: FAILED! => changed=true
  attempts: 3
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'

Test Activity
-------------
BnR test

Workaround
----------
Clear the stale host key from root's known_hosts file on the System Controller, replacing "subcloudN" with the hostname of the subcloud (for example, "subcloud1"):
ssh-keygen -f "/root/.ssh/known_hosts" -R "subcloudN"
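
The workaround can also be wrapped so that it only touches known_hosts when an entry for the subcloud is actually recorded. The following is a minimal sketch, not part of the reported fix: the function name clear_stale_host_key and its optional second argument are hypothetical, and it relies only on the standard ssh-keygen -F (find) and -R (remove) options.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarlingX): remove a subcloud's host key
# from a known_hosts file only if an entry for that hostname is present.
clear_stale_host_key() {
    host="$1"
    # Defaults to root's known_hosts, the file named in the workaround.
    known_hosts="${2:-/root/.ssh/known_hosts}"

    # ssh-keygen -F exits with status 0 when the host is found in the file.
    if ssh-keygen -F "$host" -f "$known_hosts" >/dev/null 2>&1; then
        # -R removes every key for the host; ssh-keygen keeps the previous
        # contents in "<file>.old".
        ssh-keygen -R "$host" -f "$known_hosts"
    else
        echo "No entry for $host in $known_hosts"
    fi
}
```

Run as root on the System Controller, for example: clear_stale_host_key subcloud1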

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/877428
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/1f0ba3b7156f44fc9494af03e71cd75c29b8126c
Submitter: "Zuul (22348)"
Branch: master

commit 1f0ba3b7156f44fc9494af03e71cd75c29b8126c
Author: Victor Romano <email address hidden>
Date: Tue Mar 14 19:07:05 2023 -0300

    Change subcloud backup transfer to synchronize

    If the user makes a backup of a subcloud or any other operation that
    requires ssh/scp as root by hostname using the System Controller,
    the subcloud host key will be saved on the root known_hosts. If the
    subcloud host key changes, like if the system is reinstalled, and
    another backup is attempted, the operation will fail because of the
    presence of the previous host key on the System Controller root
    known_hosts.
    The backup transfer method was changed from running rsync using command
    to the synchronize module from Ansible posix collection, so the ssh
    handling is taken care by Ansible.

    Test plan:
      PASS:
      - Backup subcloud1 using dcmanager
      - Change subcloud1 host keys
        (by removing old keys and running "ssh-keygen -A")
      - Backup new subcloud and verify that the process ended without errors
      PASS:
      - Backup subcloud1 using dcmanager
      - Delete and reinstall subcloud1
      - Backup new subcloud and verify that the process ended without errors

    Closes-Bug: 2011604

    Signed-off-by: Victor Romano <email address hidden>
    Change-Id: I561b496be978a13fb2bdded0acf6f6534aed42ac

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/879365
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/8faf51938f89bf86662475384532345f790c9acf
Submitter: "Zuul (22348)"
Branch: master

commit 8faf51938f89bf86662475384532345f790c9acf
Author: Victor Romano <email address hidden>
Date: Tue Mar 14 19:07:05 2023 -0300

    Change subcloud backup transfer to synchronize

    If the user makes a backup of a subcloud or any other operation that
    requires ssh/scp as root by hostname using the System Controller,
    the subcloud host key will be saved on the root known_hosts. If the
    subcloud host key changes, like if the system is reinstalled, and
    another backup is attempted, the operation will fail because of the
    presence of the previous host key on the System Controller root
    known_hosts.
    The backup transfer method was changed from running rsync using command
    to the synchronize module from Ansible posix collection, so the ssh
    handling is taken care by Ansible.
    The restore transfer was also changed to synchronize, to make sure
    the file transfer was still possible after the folder permission
    change during the backup process.

    Test plan:
      PASS:
      - Backup subcloud1 using dcmanager
      - Change subcloud1 host keys
        (by removing old keys and running "ssh-keygen -A")
      - Backup new subcloud and verify that the process ended without errors
      - Reinstall and restore the subcloud and verify that the process ended
        without errors
      PASS:
      - Backup subcloud1 using dcmanager
      - Delete and reinstall subcloud1
      - Backup new subcloud and verify that the process ended without errors
      - Reinstall and restore the subcloud and verify that the process ended
        without errors

    Closes-Bug: 2011604

    Signed-off-by: Victor Romano <email address hidden>
    Change-Id: I3cc7624186aac62dcfb0d9dff413e7d98d57810b

Changed in starlingx:
assignee: nobody → Victor Romano (vgluzrom)
tags: added: stx.9.0 stx.con
tags: added: stx.config
removed: stx.con
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low