Upgrade DC subcloud failed due to data-migration-failed

Bug #1951431 reported by Samuel Presa Toledo
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Samuel Presa Toledo
Milestone: (none)

Bug Description

Brief Description
-----------------
Upgrade DC subcloud from StarlingX r/stx4.0 to StarlingX r/stx5.0 failed due to data-migration-failed

Severity
--------
Major

Steps to Reproduce
------------------
Upgrade DC SystemController
Upgrade subcloud

Expected Behavior
-----------------
subcloud upgrade succeeds

Actual Behavior
---------------
subcloud upgrade failed due to data-migration-failed

Reproducibility
---------------
Intermittent

System Configuration
--------------------
DC-4 subcloud11

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
N/A

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list

+----+------------+------------+--------------+-----------------------+-------------+
| id | name       | management | availability | deploy status         | sync        |
+----+------------+------------+--------------+-----------------------+-------------+
| 3  | subcloud11 | managed    | offline      | data-migration-failed | out-of-sync |
| 6  | subcloud12 | managed    | offline      | complete              | unknown     |
+----+------------+------------+--------------+-----------------------+-------------+

log:

TASK [bootstrap/bringup-bootstrap-applications : Copy default-registry-key to deployment namespace] ***
Friday 29 October 2021 22:26:50 +0000 (0:00:00.450) 0:28:59.004 ********
fatal: [subcloud11]: FAILED! => changed=true
  cmd: 'kubectl get secret default-registry-key --namespace=kube-system -o yaml | sed ''s/namespace: kube-system/namespace: deployment/'' | kubectl apply --namespace=deployment -f -'
  delta: '0:00:00.327462'
  end: '2021-10-29 22:26:51.480412'
  msg: non-zero return code
  rc: 1
  start: '2021-10-29 22:26:51.152950'
  stderr: |-
    Error from server (NotFound): secrets "default-registry-key" not found
    error: no objects passed to apply
  stderr_lines:
  - 'Error from server (NotFound): secrets "default-registry-key" not found'
  - 'error: no objects passed to apply'
  stdout: ''
  stdout_lines: <omitted>
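
Note on the failure above: the task pipes `kubectl get secret ... -o yaml` through `sed` into `kubectl apply`. When the secret is missing, the first command prints nothing to stdout, so `kubectl apply` receives an empty stream and reports "no objects passed to apply". The following is a minimal Python sketch of the same namespace rewrite with an explicit guard for that empty-input case. The names `retarget_secret_manifest` and `SecretNotFound` are hypothetical and not part of ansible-playbooks; this only illustrates the failure mode, it is not the merged fix.

```python
class SecretNotFound(Exception):
    """Raised when the source manifest is empty (the secret was not found)."""


def retarget_secret_manifest(manifest: str,
                             src_ns: str = "kube-system",
                             dst_ns: str = "deployment") -> str:
    """Rewrite 'namespace: <src_ns>' to 'namespace: <dst_ns>' in a manifest.

    Mirrors the sed expression in the failing task (first match on each
    line), but fails loudly on empty input instead of handing an empty
    stream to `kubectl apply`.
    """
    if not manifest.strip():
        raise SecretNotFound(
            f"no manifest received; secret is missing from namespace {src_ns}"
        )
    old = f"namespace: {src_ns}"
    new = f"namespace: {dst_ns}"
    # sed without the 'g' flag replaces only the first match per line.
    return "\n".join(line.replace(old, new, 1) for line in manifest.split("\n"))
```

With a guard like this, the error names the missing secret directly rather than surfacing as a generic non-zero return code from the pipeline.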

Test Activity
-------------
Feature Testing

Workaround
----------
N/A

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Ghada Khalil (gkhalil) wrote :

screening: low; there is no official upgrade support in stx; it's best effort

Changed in starlingx:
assignee: nobody → Samuel Presa Toledo (spresato)
tags: added: stx.distcloud stx.update
Changed in starlingx:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/818468
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/eb4f81b2c55853d8c3cfc52132889997178cd783
Submitter: "Zuul (22348)"
Branch: master

commit eb4f81b2c55853d8c3cfc52132889997178cd783
Author: Samuel Presa Toledo <email address hidden>
Date: Thu Nov 18 15:59:11 2021 -0500

    Add check/re-generate default-registry-key on B&R

    This checking prevents errors during the upgrade process mainly for
    All-in-one Simplex (AIO-SX), which relies on the correct execution of
    the Backup and Restore (B&R) process. From this, if there is no
    kube-system default-registry-key, the backup playbook will stop its
    execution and the user will be notified about that absence.

    Moreover, if a platform backup with no default-registry-key for
    kube-system tries to be restored, this modification ensures that
    a default-registry-key for kube-system be created in order to
    restore a host properly.

    Test Plan:
    PASS: Run initial bootstrap successfully
    PASS: B&R runs successfully when host contains the
      default-registry-key for kube-system
    PASS: Platform backup playbook stops its execution when host
      does not contain the default-registry-key for kube-system
    PASS: Restore runs successfully when platform backup
      does not contain the default-registry-key for kube-system
    PASS: Run AIO-SX upgrade successfully

    NOTE:
    The upgrade process was only able to run successfully after manually
    removing the change from
    https://review.opendev.org/c/starlingx/stx-puppet/+/820418
    before the upgrade-activate.

    Closes-bug: 1951431
    Signed-off-by: Samuel Presa Toledo <email address hidden>
    Change-Id: Iaeb1e481503f5d1c3a53332f140d35f373b7d6e1
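
The behaviour described in the commit message can be summarised as a small decision table: the backup stops when the kube-system default-registry-key is absent, and the restore re-creates it when the backup did not contain one. The sketch below is an illustration in Python only; the actual change is implemented in the Ansible playbooks and may differ in detail. `registry_key_action`, `secret_exists` and the injectable `run` parameter are hypothetical names introduced here. The only external command assumed is `kubectl get secret <name> --namespace=<ns>`, which exits non-zero when the secret does not exist.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def kubectl_rc(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def secret_exists(run: Runner = kubectl_rc,
                  name: str = "default-registry-key",
                  namespace: str = "kube-system") -> bool:
    """Return True if the named secret is present in the namespace."""
    return run(["kubectl", "get", "secret", name,
                f"--namespace={namespace}"]) == 0


def registry_key_action(phase: str, present: bool) -> str:
    """Map (phase, secret present?) to the action the commit describes."""
    if phase not in ("backup", "restore"):
        raise ValueError(f"unknown phase: {phase!r}")
    if present:
        return "proceed"
    # Backup: stop and notify the user. Restore: re-generate the key.
    return "abort" if phase == "backup" else "create"
```

For example, `registry_key_action("backup", secret_exists())` would yield "abort" on a host where the secret is missing, which corresponds to the third test-plan item above.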

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.7.0