Backup & Restore: AIO-DX Controller hangs at "recover-ceph-data"

Bug #1857146 reported by Kristine Bujold
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  Medium      Ovidiu Poncea

Bug Description

Brief Description
-----------------

In an AIO-DX configuration, the restore hangs at "[recover-ceph-data : Update host config data to get ceph-mon size]". This was experienced in SM 5-6 and reproduced 3 times. Logs are located at /folk/cgts_logs/logs/LP-1857146

TASK [restore-platform/restore-more-data : Restart services] ***********************************************************************************
changed: [localhost] => (item=openstack-keystone)
changed: [localhost] => (item=fminit)
changed: [localhost] => (item=fm-api)
changed: [localhost] => (item=sysinv-api)
changed: [localhost] => (item=sysinv-conductor)
changed: [localhost] => (item=sysinv-agent)
changed: [localhost] => (item=openstack-barbican-api)

TASK [restore-platform/restore-more-data : Bring up Maintenance Agent] *************************************************************************
changed: [localhost]

TASK [restore-platform/restore-more-data : Wait for 90 secs before check if services come up] **************************************************
ok: [localhost]

TASK [restore-platform/restore-more-data : Make sure admin-keystone is ready] ******************************************************************
changed: [localhost]

TASK [restore-platform/restore-more-data : Check controller-0 is in online state] **************************************************************
changed: [localhost]

TASK [restore-platform/restore-more-data : Inform user that restore_platform is not successful] ************************************************

TASK [restore-platform/restore-more-data : Check if setup has storage nodes] *******************************************************************
changed: [localhost]

TASK [restore-platform/restore-more-data : Retrieve system mode] *******************************************************************************
changed: [localhost]

TASK [restore-platform/restore-more-data : Fail if system mode is not defined] *****************************************************************

TASK [restore-platform/restore-more-data : Set system mode fact] *******************************************************************************
ok: [localhost]

TASK [restore-platform/restore-more-data : Create flag file in /etc/platform to skip wiping OSDs] **********************************************
changed: [localhost]

TASK [include_role : recover-ceph-data] ********************************************************************************************************

TASK [recover-ceph-data : Restore ceph.conf file] **********************************************************************************************
changed: [localhost]

TASK [recover-ceph-data : Set initial ceph-mon name] *******************************************************************************************
ok: [localhost]

TASK [recover-ceph-data : Update host config data to get ceph-mon size] ************************************************************************
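
The point where the playbook stalled can be read off the console output above: it is the last TASK banner that has no result line after it. A minimal sketch of that check, assuming output in the default Ansible callback format shown in this report. The function name and the regular expressions are illustrative and not part of StarlingX. Note that a skipped task also prints a banner with no result line, so only the final banner of a run that never returned is meaningful.

```python
import re
from typing import Optional

# Task banner as printed in the log above, e.g.
#   TASK [role : task name] *********
TASK_RE = re.compile(r"^TASK \[(?P<name>.+)\] \*+\s*$")
# Per-host result lines printed under a banner (assumed default callback format).
RESULT_RE = re.compile(r"^(ok|changed|failed|fatal|skipping): ")


def last_task_without_result(log_text: str) -> Optional[str]:
    """Return the name of the final TASK banner if no result line follows it."""
    current = None
    has_result = False
    for line in log_text.splitlines():
        match = TASK_RE.match(line)
        if match:
            # A new banner starts a new task; forget the previous task's state.
            current = match.group("name")
            has_result = False
        elif current is not None and RESULT_RE.match(line):
            has_result = True
    if current is not None and not has_result:
        return current
    return None


sample = """\
TASK [recover-ceph-data : Set initial ceph-mon name] ****
ok: [localhost]

TASK [recover-ceph-data : Update host config data to get ceph-mon size] ****
"""
print(last_task_without_result(sample))
```

Run against a saved copy of the restore output, this returns the role and task name of the last banner, which here is the "Update host config data to get ceph-mon size" task.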

Severity
--------
Critical: Unable to restore active controller

Steps to Reproduce
------------------
1. Bring up the AIO-DX system
2. Back up the system locally using Ansible
3. Re-install the controller with the same load
4. Restore the active controller

Expected Behavior
-----------------
The active controller should be successfully restored

Actual Behavior
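---------------
The restore hangs at the task "[recover-ceph-data : Update host config data to get ceph-mon size]" and does not complete.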

Reproducibility
---------------
Reproducible

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-12-18_00-10-00"

Test Activity
-------------
Developer Testing

description: updated
Ghada Khalil (gkhalil) wrote :

Marking as stx.4.0 / medium priority - needs further investigation

tags: added: stx.4.0 stx.update
tags: added: stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
Ovidiu Poncea (ovidiuponcea) wrote :

Fix released based on the B&R work done in the past few months.

Changed in starlingx:
status: Triaged → Fix Released