Backup & Restore: Subcloud restore fail - 'dc_root_ca_cert' is undefined

Bug #1876418 reported by Senthil Mukundakumar
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Andy

Bug Description

Brief Description
-----------------
In a distributed cloud system, restoring a subcloud fails with the following error:

TASK [bootstrap/bringup-bootstrap-applications : Install DC admin endpoint root CA certificate] *********************************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dc_root_ca_cert' is undefined\n\nThe error appears to have been in '/usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/bringup-bootstrap-applications/tasks/setup_sc_adminep_certs.yml': line 11, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Install DC admin endpoint root CA certificate\n ^ here\n"}

PLAY RECAP **********************************************************************************************************************************
localhost : ok=411 changed=222 unreachable=0 failed=1

Severity
--------
Major - Backup & Restore on subcloud fails

Steps to Reproduce
------------------
1. Install all the apps
2. Verify the subcloud system in sync
3. Backup the subcloud
4. Unmanage subcloud
5. Reinstall subcloud controller
6. config_management
7. Restore from backup file
8. Unlock controller

Expected Behavior
------------------
Subcloud active controller restore should be successful

Actual Behavior
----------------
Subcloud restore fails

Reproducibility
---------------
Tried only once in DC subcloud

System Configuration
--------------------
Distributed system with subcloud (WCP_23-34/subcloud5)

Branch/Pull Time/Commit
-----------------------
2020-04-30_20-00-00

Last Pass
---------
Not passed in DC subcloud

Timestamp/Logs
--------------
https://files.starlingx.kube.cengn.ca/launchpad/1876418

Test Activity
-------------
Regression System

tags: added: stx.retestneeded
description: updated
Dan Voiculeasa (dvoicule) wrote :

Commit 3bb26d81d51f0590dba2a19caf9cc430673f6018 introduced variable `dc_root_ca_cert`

Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - issue seems to be introduced by recent code changes related to the https internal endpoint feature as per the note above. Issue affects B&R on distributed cloud

tags: added: stx.distcloud stx.update
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Andy (andy.wrs)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/725395

Changed in starlingx:
status: Triaged → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.4.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/734701

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/735627

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (master)

Change abandoned by Andy Ning (<email address hidden>) on branch: master
Review: https://review.opendev.org/735627
Reason: Abandon this accidentally created review.

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/734701
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=6fa5826b36623ab3ad955caf564bce6aa9003889
Submitter: Zuul
Branch: master

commit 6fa5826b36623ab3ad955caf564bce6aa9003889
Author: Andy Ning <email address hidden>
Date: Tue Jun 9 16:29:58 2020 -0400

    Config sysinv to use internal endpoint when validate client token

    Currently in sysinv.conf, "interface" attribute of [keystone_authtoken]
    is not configured, so sysinv will use default admin endpoint when it
    validates client token against keystone. For a system restored from backup,
    admin endpoints are https before controller unlock (restored from
    keystone DB) while haproxy for the https admin endpoints is not
    configured yet (will be configured during unlock). So ansible B&R task
    restore-more-data will fail when it tries to run "system host show"
    command to check host status.

    This update configures sysinv to use "internal" endpoint as default to
    validate client token. Sysinv shouldn't use admin endpoint anyway.

    Change-Id: I32739af9e1867298fdf3e0b0d3779e6c05bb5d31
    Closes-Bug: 1876418
    Signed-off-by: Andy Ning <email address hidden>
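
The endpoint selection described in this commit message can be sketched as follows. This is only an illustration under stated assumptions, not the stx-puppet change itself: the `effective_interface` helper and the two sample config strings are hypothetical, and the `admin` default is taken from the commit message rather than checked against the keystone middleware source.

```python
import configparser

# Assumption from the commit message: when "interface" is not set in
# [keystone_authtoken], token validation goes to the admin endpoint.
DEFAULT_INTERFACE = "admin"


def effective_interface(conf_text: str) -> str:
    """Return the endpoint interface selected by a sysinv.conf-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback covers both a missing section and a missing option
    return parser.get("keystone_authtoken", "interface",
                      fallback=DEFAULT_INTERFACE)


# Hypothetical config fragments, before and after the fix.
BEFORE_FIX = """
[keystone_authtoken]
"""

AFTER_FIX = """
[keystone_authtoken]
interface = internal
"""

if __name__ == "__main__":
    print("before fix:", effective_interface(BEFORE_FIX))
    print("after fix: ", effective_interface(AFTER_FIX))
```

With the option unset, a restored system would try the https admin endpoint before haproxy is configured for it, which is the failure the commit describes; setting `interface = internal` avoids that dependency.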

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/725395
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=92d781b75065359d9e36c4e0388f54488656198f
Submitter: Zuul
Branch: master

commit 92d781b75065359d9e36c4e0388f54488656198f
Author: Andy Ning <email address hidden>
Date: Mon May 4 16:55:00 2020 -0400

    Fix subloud restore 'dc_root_ca_cert' undefined

    'dc_root_ca_cert' holds the DC root CA certificate for admin endpoints.
    For a restored subcloud, DC root CA certificate, subcloud admin
    endpoint root CA certificate and final admin endpoint certificate,
    along with the related cert manager certificate resource and k8s secrets
    are all restored from the backup. So the admin endpoint certificate set
    up is skipped in ansible.

    The same applies to a restored system controller as well. So the dc CA
    certificates creation is skipped in ansible too.

    Change-Id: I2c7005a474e2363e12b76a1b52b7e68fddf88da1
    Closes-Bug: 1876418
    Depends-On: https://review.opendev.org/#/c/734701/
    Signed-off-by: Andy Ning <email address hidden>
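
The skip logic described in this commit message can be sketched as below. This is a hedged illustration and not the playbook code: the actual fix lives in Ansible task files, and the function name, the `mode` and `dc_role` parameters and their string values, and the `create_dc_ca_certs` step name are all hypothetical. Only `setup_sc_adminep_certs` is taken from the task file path in the error message.

```python
from typing import List

# Step names are illustrative. "setup_sc_adminep_certs" mirrors the task file
# named in the error message; "create_dc_ca_certs" is a hypothetical label.
SUBCLOUD_STEP = "setup_sc_adminep_certs"
SYSTEMCONTROLLER_STEP = "create_dc_ca_certs"


def cert_steps(mode: str, dc_role: str) -> List[str]:
    """Return the certificate steps to run for a given mode and DC role."""
    if mode == "restore":
        # Certificates, cert-manager resources and k8s secrets come back
        # from the backup, so nothing has to be generated or installed.
        return []
    if dc_role == "subcloud":
        return [SUBCLOUD_STEP]
    if dc_role == "systemcontroller":
        return [SYSTEMCONTROLLER_STEP]
    return []


if __name__ == "__main__":
    print(cert_steps("bootstrap", "subcloud"))
    print(cert_steps("restore", "subcloud"))
```

Because the restore path never reaches the subcloud step, a variable such as `dc_root_ca_cert` that is only supplied at bootstrap time is no longer referenced during a restore.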

Senthil Mukundakumar (smukunda) wrote :

Verified in DC3/subcloud1 using 2020-06-24_22-16-59

tags: removed: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/762919
