Failure while upgrading controller-0 of System Controller due to dc-cert

Bug #1913039 reported by Jessica Castelino
This bug affects 1 person
Affects    Status    Importance    Assigned to    Milestone
StarlingX  Triaged   Low           Bin Qian

Bug Description

Brief Description
-----------------
While upgrading a Standard system controller from stx.4.0 to stx.5.0, unlocking controller-0 fails with a dc-cert error, shown below. After running "system host-upgrade controller-0", I am unable to unlock controller-0.

[sysadmin@controller-1 ~(keystone_admin)]$ system host-unlock controller-0
Invalid secret dc-cert\dc-adminep-certificate
Traceback (most recent call last):

  File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 437, in _process_data
    **args)

  File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch
    result = getattr(proxyobj, method)(ctxt, **kwargs)

  File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 1785, in configure_ihost
    self._puppet.update_secure_system_config()

  File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 31, in _wrapper
    func(self, *args, **kwargs)

  File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 131, in update_secure_system_config
    config.update(puppet_plugin.obj.get_secure_system_config())

  File "/usr/lib64/python2.7/site-packages/sysinv/puppet/platform.py", line 53, in get_secure_system_config
    config.update(self._get_dc_root_ca_config())

  File "/usr/lib64/python2.7/site-packages/sysinv/puppet/platform.py", line 893, in _get_dc_root_ca_config
    system.distributed_cloud_role)

  File "/usr/lib64/python2.7/site-packages/sysinv/common/utils.py", line 2316, in get_admin_ep_cert
    endpoint_cert_secret_ns, endpoint_cert_secret_name

Exception: Invalid secret dc-cert\dc-adminep-certificate

Severity
--------
Critical

Steps to Reproduce
------------------
Upgrade a standard system controller using the steps given below:
system upgrade-start --force
system upgrade-show (wait for "started")
system host-lock controller-1
system host-list
system host-upgrade controller-1

system upgrade-show (wait for "data-migration-complete")
system host-unlock controller-1 (Wait for controller-1 to become unlocked-enabled. Wait for the DRBD sync 400.001 services-related alarm to be cleared)
system host-swact controller-0
system host-lock controller-0
system host-upgrade controller-0 (wait for "upgrading-hosts")
system host-unlock controller-0
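
Several of the steps above say to wait for a particular upgrade state before continuing. A minimal sketch of that wait loop follows. It assumes a caller-supplied get_state function that returns the current state as a string; wait_for_state and its parameters are hypothetical names for illustration and are not part of the StarlingX CLI.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], target: str,
                   timeout: float = 1800.0, interval: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns target or timeout seconds elapse.

    get_state is supplied by the caller (hypothetical); it could, for
    example, wrap whatever is used to read the "system upgrade-show" state.
    Returns True if the target state was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, wait_for_state(read_upgrade_state, "data-migration-complete") would block until that state is reported or 30 minutes pass, where read_upgrade_state is whatever helper the caller provides. The timeout and interval defaults are arbitrary choices, not values taken from the upgrade procedure.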

Expected Behavior
------------------
Should be able to unlock controller-0

Actual Behavior
----------------
Error while unlocking controller-0

Reproducibility
---------------
100% Reproducible

System Configuration
--------------------
DC setup with standard system controller and aiosx subcloud

Branch/Pull Time/Commit
-----------------------
22nd January, 2021 at 12:28 am

Last Pass
---------
Yes. The last pass was 15th January, 2021.

Timestamp/Logs
--------------
Collect logs attached

Test Activity
-------------
Developer Testing

Workaround
----------
None

Tags: stx.update
Jessica Castelino (jcasteli) wrote :
Ghada Khalil (gkhalil)
tags: added: stx.update
Ghada Khalil (gkhalil) wrote :

stx.5.0 / high - issue is blocking testing of the upgrade framework

Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
tags: added: stx.5.0
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
Bin Qian (bqian20) wrote :

Many important k8s resources are not found on the impacted system, which is a VBox lab.
Basically, there are no pods running, and the cert-manager and dc-cert namespaces don't exist.
sysinv tries to read the k8s secret before unlock; because that critical data isn't accessible, the unlock is rejected.
This looks like a system-level failure. As it is on VBox, resource constraints could be part of the issue (this has not been confirmed).

controller-1:~$ kubectl get pods --all-namespaces
No resources found
controller-1:~$ kubectl get namespaces
NAME              STATUS   AGE
default           Active   6h2m
kube-node-lease   Active   6h2m
kube-public       Active   6h2m
kube-system       Active   6h2m
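
The namespace check above can also be done programmatically. The sketch below parses the text output of "kubectl get namespaces" and reports which of the namespaces named in this comment (cert-manager and dc-cert) are absent. The function names and the REQUIRED_NAMESPACES tuple are hypothetical, and the sample text is the listing from this system; this is not how sysinv performs its own check.

```python
# Namespaces this comment expects to exist on a healthy system controller.
REQUIRED_NAMESPACES = ("cert-manager", "dc-cert")


def parse_namespaces(kubectl_output: str) -> set:
    """Return the namespace names found in `kubectl get namespaces` output."""
    names = set()
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the "No resources found" message.
        if not fields or fields[0] == "NAME" or line.startswith("No resources"):
            continue
        names.add(fields[0])
    return names


def missing_namespaces(kubectl_output: str, required=REQUIRED_NAMESPACES) -> list:
    """Return the required namespaces that do not appear in the output."""
    present = parse_namespaces(kubectl_output)
    return [ns for ns in required if ns not in present]


SAMPLE = """\
NAME STATUS AGE
default Active 6h2m
kube-node-lease Active 6h2m
kube-public Active 6h2m
kube-system Active 6h2m
"""

print(missing_namespaces(SAMPLE))  # prints ['cert-manager', 'dc-cert']
```

An empty list would mean both namespaces are present; here both are reported missing, matching the observation in this comment.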

cert-manager application was applied
system application-list
+--------------------------+----------+-----------------------------------+------------------------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+----------+-----------------------------------+------------------------------------+----------+-----------+
| cert-manager | 20.06-4 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 20.06-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest. | applied | completed |
| | | | yaml | | |
| | | | | | |
| oidc-auth-apps | 20.06-26 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 20.06-9 | platform-integration-manifest | manifest.yaml | uploaded | completed |
+--------------------------+----------+-----------------------------------+------------------------------------+----------+-----------+
[sysadmin@controller-1 ~(keystone_admin)]$

Bin Qian (bqian20) wrote :
Bart Wensley (bartwensley) wrote :

This LP should not gate stx.5.0. Reasons:
- This issue has only been seen on virtual box.
- Based on Bin's comment above, it is likely that this was a resource-related issue.

Please remove the stx.5.0 tag.

Bill Zvonar (billzvonar) wrote :

removed the stx.5.0 tag.

tags: removed: stx.5.0
Ghada Khalil (gkhalil) wrote :

Lowering the priority given this is no longer gating for stx.5.0

Changed in starlingx:
importance: High → Low