Backup & Restore: Controller failed to become active after restore in Regular system

Bug #1849379 reported by Senthil Mukundakumar
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Ovidiu Poncea
Milestone: (none)

Bug Description

Brief Description
-----------------

The active controller failed to become active after restore and unlock. The sm process keeps waiting for node configuration to complete.

controller-0:~$ source /etc/platform/openrc
Openstack Admin credentials can only be loaded from the active controller.

Relevant sm logs:

2019-10-21T17:37:22.000 controller-0 sm: debug time[3055.793] log<274> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:37:32.000 controller-0 sm: debug time[3065.793] log<275> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:37:42.000 controller-0 sm: debug time[3075.793] log<276> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:37:52.000 controller-0 sm: debug time[3085.793] log<277> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:38:02.000 controller-0 sm: debug time[3095.794] log<278> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:38:12.000 controller-0 sm: debug time[3105.794] log<279> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:38:22.000 controller-0 sm: debug time[3115.794] log<280> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:38:32.000 controller-0 sm: debug time[3125.794] log<281> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:38:42.000 controller-0 sm: debug time[3135.794] log<282> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
2019-10-21T17:38:52.000 controller-0 sm: debug time[3145.795] log<283> INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.
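
To quantify how long sm has been stuck, the repeated message can be counted and timed. The following is a minimal sketch that assumes the log lines follow exactly the format quoted above; the function name `waiting_span` and the regular expression are illustrative and not part of StarlingX.

```python
import re
from datetime import datetime

# Matches sm log lines in the format quoted in this report (assumed format).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) (?P<host>\S+) sm: .*?"
    r"sm_process\.c\(\d+\): (?P<msg>.*)$"
)
WAIT_MSG = "Waiting for node configuration to complete."


def waiting_span(lines):
    """Return (count, seconds) for lines carrying WAIT_MSG, or (0, 0.0)."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("msg") == WAIT_MSG:
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f"))
    if not stamps:
        return 0, 0.0
    # Elapsed time between the first and last matching timestamps.
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


# Usage with the first and last lines of the excerpt above.
sample = [
    "2019-10-21T17:37:22.000 controller-0 sm: debug time[3055.793] log<274> "
    "INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.",
    "2019-10-21T17:38:52.000 controller-0 sm: debug time[3145.795] log<283> "
    "INFO: sm[100741]: sm_process.c(557): Waiting for node configuration to complete.",
]
print(waiting_span(sample))
```

A count that keeps growing at a steady 10-second interval, as in the excerpt, suggests sm is polling for a condition that never becomes true rather than crashing and restarting.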

Severity
--------
Critical: Unable to restore active controller in Regular system

Steps to Reproduce
------------------
1. Bring up the Regular system
2. Back up the system locally using Ansible
3. Re-install the controller with the same load
4. Restore the active controller
5. Unlock active controller

Expected Behavior
-----------------
The active controller should be successfully restored and become active

Actual Behavior
---------------
Active controller failed to become active after unlock

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Regular System

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-10-20_20-00-00"

Test Activity
-------------
Feature Testing

Openstack Admin credentials can only be loaded from the active controller.
Error: can only run collect for remote hosts on active controller (reason:35)

Ghada Khalil (gkhalil) wrote :

stx.3.0 / high priority - B&R is a feature deliverable for stx.3.0

tags: added: stx.3.0 stx.update
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/690924

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/690924
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=bfa2d65fe688e2995044925b7c3cc8d883b29687
Submitter: Zuul
Branch: master

commit bfa2d65fe688e2995044925b7c3cc8d883b29687
Author: Ovidiu Poncea <email address hidden>
Date: Thu Oct 24 07:10:23 2019 -0400

    B&R: Fix failed to unlock controller-0 after platform restore

    Issue is caused by lighttpd not starting on https configurations.
    This happens because restore doesn't copy server-cert.pem from the
    backup. This was fixed by https://review.opendev.org/#/c/685390/,
    problem is it got reverted in https://review.opendev.org/#/c/686057/
    in a refactoring.

    This commit adds back the fix.

    Change-Id: I1e8a15bc95064974675614be8eb4b15cb091f685
    Closes-Bug: 1849379
    Signed-off-by: Ovidiu Poncea <email address hidden>
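
Since the root cause is a certificate file missing after restore, one quick sanity check is to confirm that the backup archive contains it in the first place. The sketch below assumes the backup is a tar archive; the function name `find_in_archive` is hypothetical, and the report does not state the archive layout, so the match is on the file name only.

```python
import tarfile


def find_in_archive(archive_path, basename="server-cert.pem"):
    """Return paths of regular files in a tar archive whose file name equals basename."""
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz).
    with tarfile.open(archive_path, "r:*") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.rsplit("/", 1)[-1] == basename
        ]
```

An empty list would point to the backup side rather than the restore side. A non-empty list is consistent with the commit message above, which says the restore step does not copy the file from the backup.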

Changed in starlingx:
status: In Progress → Fix Released
Yang Liu (yliu12)
tags: added: stx.retestneeded
Senthil Mukundakumar (smukunda) wrote :

Verified using build 2019-10-27_20-00-00

tags: removed: stx.retestneeded