Backup & Restore: Controller restore fails - Pods not ready

Bug #1869399 reported by Senthil Mukundakumar
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Dan Voiculeasa

Bug Description

Brief Description
-----------------

Platform restore fails with the following error:

fatal: [localhost]: FAILED! => {"msg": "The conditional check 'item.stdout is not search(\" condition met\")' failed. The error was: Unexpected templating type error occurred on ({% if item.stdout is not search(\" condition met\") %} True {% else %} False {% endif %}): expected string or buffer\n\nThe error appears to have been in '/usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/bringup-essential-services/tasks/main.yml': line 153, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Fail if any of the Kubernetes component, Networking and Tiller pods is not ready by this time\n ^ here\n"}

PLAY RECAP *********************************************************************************************
localhost : ok=364 changed=195 unreachable=0 failed=1
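The failing task evaluates `item.stdout is not search(" condition met")` for each loop item. The message `expected string or buffer` is what Python 2's `re` module raises when it is given something other than a string, so the conditional appears to have received a non-string `stdout` rather than simply reporting a pod as not ready. The following is a minimal sketch of that readiness check in Python, assuming each loop result is a dict with an optional `stdout` key and that a ready pod prints a line containing ` condition met` (the format `kubectl wait` uses). `not_ready` and the sample data are hypothetical illustrations, not the StarlingX playbook code. The sketch treats a missing or non-string `stdout` as "not ready" instead of raising.

```python
import re

# Hypothetical helper mirroring the playbook's readiness check (not the
# actual StarlingX code). Each result is assumed to look like an Ansible
# loop result, e.g. {"item": "calico", "stdout": "pod/x condition met"}.
READY_PATTERN = re.compile(r" condition met")


def not_ready(results):
    """Return the items whose stdout does not report ' condition met'.

    A missing or non-string stdout is reported as not ready, instead of
    letting re.search() raise TypeError ("expected string or buffer" on
    Python 2, "expected string or bytes-like object" on Python 3).
    """
    failed = []
    for result in results:
        stdout = result.get("stdout")
        if not isinstance(stdout, str) or READY_PATTERN.search(stdout) is None:
            failed.append(result.get("item"))
    return failed


if __name__ == "__main__":
    # Illustrative data only; these are not values from the attached logs.
    results = [
        {"item": "calico", "stdout": "pod/calico-node-abc condition met"},
        {"item": "coredns", "stdout": None},  # non-string stdout
        {"item": "tiller"},                   # stdout key missing entirely
    ]
    print(not_ready(results))  # ['coredns', 'tiller']

    # The unguarded search fails the same way the Jinja conditional did.
    try:
        re.search(" condition met", None)
    except TypeError as exc:
        print("TypeError:", exc)
```

Guarding the type turns the templating crash into an ordinary "pod not ready" result, which would name the missing component in the failure message rather than hiding it behind a TypeError.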

Severity
--------
Major: System failed to restore

Expected Behavior
------------------

The active controller should be restored successfully.

Actual Behavior
----------------

The active controller failed to restore.

Reproducibility
---------------
Reproducible on AIO-PLUS, AIO-DX and Standard systems.

Steps to Reproduce
------------------
1. Make sure the AIO-PLUS system is up and active.
2. Back up the active controller:
   ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=Li69nux* admin_password=Li69nux*"
3. Bring down all the nodes and reinstall the active controller.
4. Copy the backup file to the controller with scp.
5. Restore the active controller from the backup file:
   ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "backup_filename=localhost_platform_backup_2019_09_12_20_37_23.tgz admin_password=Li69nux* ansible_become_pass=Li69nux* initial_backup_dir=/home/sysadmin"

System Configuration
--------------------
AIO-PLUS WCP_99_103

Branch/Pull Time/Commit
-----------------------
2020-03-25_21-02-05

Test Activity
-------------
Regression

Senthil Mukundakumar (smukunda) wrote :
description: updated
Frank Miller (sensfan22) wrote :

Based on their logs, the tiller, calico, multus and sriov pods were running at the time of the Ansible failure, but there was no coredns pod. Debugging should focus on why this pod was not launched.

Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - pod recovery issue / appears to be reproducible on at least one config

tags: added: stx.4.0 stx.containers
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Frank Miller (sensfan22)
Senthil Mukundakumar (smukunda) wrote :

This issue was recently reproduced on AIO-SX (WCP_112) using load 2020-04-20_20-00-00.
Logs attached.

Senthil Mukundakumar (smukunda) wrote :
Senthil Mukundakumar (smukunda) wrote :
summary: - Backup & Restore: AIO-Plus restore fails - Pods not ready
+ Backup & Restore: Controller restore fails - Pods not ready
description: updated
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Frank Miller (sensfan22) → Dan Voiculeasa (dvoicule)
Ghada Khalil (gkhalil)
tags: added: stx.update
Senthil Mukundakumar (smukunda) wrote :

The issue is also reproducible in the regular lab WCP_71_75 with load 2020-05-15_20-00-00.

Dan Voiculeasa (dvoicule) wrote :

I've done multiple restores on different loads but couldn't reproduce it.

Changed in starlingx:
status: Triaged → Invalid