Backup & Restore: Applications stuck at restore-requested state

Bug #1901636 reported by Senthil Mukundakumar
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Dan Voiculeasa

Bug Description

Brief Description
-----------------

After restoring the controller and compute nodes, the applications are stuck in the restore-requested state:
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+----------+-----------------------------------+----------------------------------------+-------------------+-----------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+----------+-----------------------------------+----------------------------------------+-------------------+-----------+
| cert-manager | 20.06-5 | cert-manager-manifest | certmanager-manifest.yaml | restore-requested | completed |
| nginx-ingress-controller | 20.06-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | restore-requested | completed |
| oidc-auth-apps | 20.06-28 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 20.06-11 | platform-integration-manifest | manifest.yaml | restore-requested | completed |
+--------------------------+----------+-----------------------------------+----------------------------------------+-------------------+-----------+

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cm-cert-manager-856678cfb7-ckmgh 0/1 ErrImagePull 0 175m
cert-manager cm-cert-manager-856678cfb7-tdjsj 0/1 ImagePullBackOff 0 175m
cert-manager cm-cert-manager-cainjector-85849bd97-c6xks 0/1 ImagePullBackOff 0 175m
cert-manager cm-cert-manager-cainjector-85849bd97-mrxxz 0/1 ImagePullBackOff 0 175m
cert-manager cm-cert-manager-webhook-5745478cbc-87krp 0/1 ImagePullBackOff 0 175m
cert-manager cm-cert-manager-webhook-5745478cbc-9qmqc 0/1 ErrImagePull 0 175m
kube-system calico-kube-controllers-5cd4695574-jz8fs 1/1 Running 5 6h18m
kube-system calico-node-5zqnr 1/1 Running 0 2d9h
kube-system calico-node-7zpgb 1/1 Running 0 2d9h
kube-system calico-node-fqqbh 1/1 Running 0 2d9h
kube-system calico-node-hd2ww 1/1 Running 2 2d10h
kube-system calico-node-pqnzd 1/1 Running 0 2d9h
kube-system ceph-pools-audit-1603733100-rh44n 0/1 Completed 0 7h20m
kube-system ceph-pools-audit-1603733400-jbvhd 0/1 Completed 0 7h15m
kube-system ceph-pools-audit-1603733700-bwncq 0/1 Completed 0 7h10m
kube-system ceph-pools-audit-1603736700-z9twk 0/1 ImagePullBackOff 0 175m
kube-system coredns-6d64d47ff4-hvwhh 1/1 Running 0 132m
kube-system coredns-6d64d47ff4-v5cmw 1/1 Running 0 132m
kube-system ic-nginx-ingress-controller-8n2qs 0/1 ImagePullBackOff 0 131m
kube-system ic-nginx-ingress-controller-gtg4c 0/1 ImagePullBackOff 0 175m
kube-system ic-nginx-ingress-default-backend-5ffcfd7744-dbzg9 0/1 ImagePullBackOff 0 175m
kube-system kube-apiserver-controller-0 1/1 Running 6 2d10h
kube-system kube-apiserver-controller-1 1/1 Running 0 2d9h
kube-system kube-controller-manager-controller-0 1/1 Running 3 2d10h
kube-system kube-controller-manager-controller-1 1/1 Running 0 2d9h
kube-system kube-multus-ds-amd64-62b52 1/1 Running 0 78m
kube-system kube-multus-ds-amd64-nq8s2 1/1 Running 0 131m
kube-system kube-multus-ds-amd64-ssxk4 1/1 Running 2 2d10h
kube-system kube-multus-ds-amd64-ts66k 1/1 Running 0 82m
kube-system kube-multus-ds-amd64-vzbfp 1/1 Running 0 78m
kube-system kube-proxy-2tm9n 1/1 Running 0 2d9h
kube-system kube-proxy-2wzv5 1/1 Running 0 2d9h
kube-system kube-proxy-blqq9 1/1 Running 0 2d9h
kube-system kube-proxy-gft8q 1/1 Running 2 2d10h
kube-system kube-proxy-xdjcj 1/1 Running 0 2d9h
kube-system kube-scheduler-controller-0 1/1 Running 3 2d10h
kube-system kube-scheduler-controller-1 1/1 Running 0 2d9h
kube-system kube-sriov-cni-ds-amd64-dq28x 1/1 Running 0 78m
kube-system kube-sriov-cni-ds-amd64-mj8bn 1/1 Running 0 82m
kube-system kube-sriov-cni-ds-amd64-mjzt2 1/1 Running 0 131m
kube-system kube-sriov-cni-ds-amd64-sqksp 1/1 Running 2 2d10h
kube-system kube-sriov-cni-ds-amd64-zdtxk 1/1 Running 0 78m
kube-system kube-sriov-device-plugin-amd64-78qft 1/1 Running 0 78m
kube-system kube-sriov-device-plugin-amd64-9q8fl 1/1 Running 0 82m
kube-system kube-sriov-device-plugin-amd64-wht5r 1/1 Running 0 78m
kube-system rbd-provisioner-77bfb6dbb-gbh9b 0/1 ImagePullBackOff 0 175m
kube-system rbd-provisioner-77bfb6dbb-n4pk6 0/1 ImagePullBackOff 0 175m
kube-system storage-init-rbd-provisioner-dz457 0/1 Completed 0 2d9h
platform-deployment-manager platform-deployment-manager-0 2/2 Running 1 175m
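
The image-pull failures above can be inspected per pod to see which image and registry the pull is failing against, e.g. (diagnostic command added for reference, using a pod name from the listing above):

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl -n cert-manager describe pod cm-cert-manager-856678cfb7-ckmgh

The Events section of the describe output reports the failing image reference and the pull error.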

Severity
--------
Major

Steps to Reproduce
------------------

1. Make sure the standard system is up and active
2. Take a backup from the active controller
3. Re-install the active controller
4. scp the backup file to the controller
5. Restore the active controller from the backup file:
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=Li69nux* admin_password=Li69nux* backup_filename=<backup file>"
6. Restore the remaining controller and compute nodes from the active controller; status can then be checked with the commands shown after this list
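
After the restore playbook completes, application and pod status can be checked with the same commands used in this report (repeated here for convenience):

[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pods --all-namespaces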

Expected Behavior
------------------
Applications should be uploaded and applied after the restore

Actual Behavior
----------------
Applications are stuck in the restore-requested state

Reproducibility
---------------
(1/1)

System Configuration
--------------------
wcp_71_75

Branch/Pull Time/Commit
-----------------------
2020-10-23_20-00-07

Last Pass
---------
unknown

Timestamp/Logs
--------------
https://files.starlingx.kube.cengn.ca/launchpad/1901636

Test Activity
-------------
Feature Testing

description: updated
Changed in starlingx:
assignee: nobody → Dan Voiculeasa (dvoicule)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/760317

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.update
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium - issue w/ B&R

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/760317
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=3722db3e545f6d8c01efa3fac3949bd2ac78cf50
Submitter: Zuul
Branch: master

commit 3722db3e545f6d8c01efa3fac3949bd2ac78cf50
Author: Dan Voiculeasa <email address hidden>
Date: Thu Oct 29 10:40:43 2020 +0200

    Make sure OSDs are placed on nodes

    With the recent work that added the possibility to take a backup on
    controller-1 and restore on controller-0, the [osd.X] entries are removed
    from ceph.conf before the controller-0 unlock.

    An [osd.X] entry must be present in ceph.conf so that a ceph-osd process
    is spawned and the directory structure for that osd is created in
    /var/lib/ceph/osd/ceph-X.

    Mtc runs an initialization script which expects that directory to be
    populated, otherwise it fails and places the node in a degraded state.

    For AIO systems the [osd.X] entry was already populated, but for STANDARD
    systems it was not. This commit ensures the entry is populated for both
    AIO and STANDARD systems.

    Tested full install + platform B&R on AIO-SX, AIO-DX, STANDARD with
    controller storage, all with ceph.

    Closes-Bug: 1901636
    Change-Id: I669ad3b22f59136253d42cdf6ac603fba31000be
    Signed-off-by: Dan Voiculeasa <email address hidden>
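
For illustration, the kind of ceph.conf stanza the fix ensures is present on each node hosting an OSD (hypothetical values; the actual osd id and host depend on the deployment):

[osd.0]
host = controller-0

With such an entry present before the controller-0 unlock, a ceph-osd process is started for osd.0 and the directory structure under /var/lib/ceph/osd/ceph-0 is created, which is what the Mtc initialization script expects to find.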

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/762919
