Comment 0 for bug 1970443

Enzo Candotti (ecandotti) wrote :

Brief Description

platform-integ-apps fails to reach the applied state after an SX system is migrated to a DX system

Severity
major

Steps to Reproduce

1) Install the subcloud as SX (AIO-SX).
2) Create the following file, "migrate-subcloud-overrides.yaml", on the central controller:

[sysadmin@controller-0 ~(keystone_admin)]$ cat migrate-subcloud-overrides.yaml
---
{
  "ansible_ssh_pass": ******,
  "external_oam_node_0_address": "2620:10A:A001:A103::218",
  "external_oam_node_1_address": "2620:10A:A001:A103::42",
}
[sysadmin@controller-0 ~(keystone_admin)]$

3) Run migrate_sx_to_dx.yml on the central cloud and verify that there are no errors in the output; a hedged sketch of the invocation is shown below.
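
A hedged sketch of the invocation on the system controller (standard ansible-playbook options; the playbook path is the usual StarlingX location, and the inventory file name is an assumption for illustration):

# Sketch only -- run on the system controller (central cloud); inventory name is hypothetical.
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/migrate_sx_to_dx.yml \
    -i subcloud12-inventory.yml \
    -e "@migrate-subcloud-overrides.yaml"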

4) When the subcloud is online and managed, log in to the subcloud and verify that it has been converted from simplex to duplex:

[sysadmin@controller-0 ~(keystone_admin)]$ system show
+------------------------+--------------------------------------+
| Property | Value |
+------------------------+--------------------------------------+
| contact | None |
| created_at | 2021-12-05T18:36:08.995159+00:00 |
| description | None |
| distributed_cloud_role | subcloud |
| https_enabled | True |
| latitude | None |
| location | None |
| longitude | None |
| name | dc-subcloud12 |
| region_name | subcloud12 |
| sdn_enabled | False |
| security_feature | spectre_meltdown_v1 |
| service_project_name | services |
| shared_services | [] |
| software_version | 21.12 |
| system_mode | duplex |
| system_type | All-in-one |
| timezone | UTC |
| updated_at | 2021-12-07T19:47:27.308407+00:00 |
| uuid | 0c86a371-c387-4104-9d75-e6948454ffe3 |
| vswitch_type | none |
+------------------------+--------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

5) However, platform-integ-apps fails to reach the applied state and remains stuck at 25% indefinitely:

[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------+
| cert-manager | 1.0-25 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.1-17 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-59 | oidc-auth-manifest | manifest.yaml | applied | completed |
| platform-integ-apps | 1.0-42 | platform-integration-manifest | manifest.yaml | applying | processing chart: stx-rbd-provisioner |
| | | | | | , overall completion: 25.0% |
| | | | | | |
| rook-ceph-apps | 1.0-13 | rook-ceph-manifest | manifest.yaml | uploaded | completed |
| vault | 1.0-22 | vault-manifest | vault_manifest.yaml | applied | completed |
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------+

When the apply was retried, it failed again (retry commands sketched below).
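
The retry was a plain re-apply on the subcloud, along the lines of the following sketch (aborting the stuck apply first may be necessary):

# Sketch of the retry using the standard application commands.
system application-abort platform-integ-apps
system application-apply platform-integ-apps
system application-show platform-integ-apps    # check status and progress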

Expected Behavior

After the migration, the application should reach the applied state.

Actual Behavior

platform-integ-apps fails to reach the applied state.

Reproducibility

100%

System Configuration
AIO-SX subcloud. Also seen on a standalone AIO-SX system.

Branch/Pull Time/Commit
21.12

Last Pass

21.05

Timestamp/Logs

/var/log/armada/platform-integ-apps-apply_2021-12-07-21-12-42.log
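
The relevant failure can be pulled directly from that log (sketch):

# Sketch: extract the error lines from the armada apply log on the subcloud.
grep -i error /var/log/armada/platform-integ-apps-apply_2021-12-07-21-12-42.log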

2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller [-] [chart=kube-system-rbd-provisioner]: Error while installing release stx-rbd-provisioner: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
>---status = StatusCode.UNKNOWN
>---details = "release stx-rbd-provisioner failed: timed out waiting for the condition"
>---debug_error_string = "{"created":"@1638913371.198181944","description":"Error received from peer ipv4:127.0.0.1:24134","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"release stx-rbd-provisioner failed: timed out waiting for the condition","grpc_status":2}"
>
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller Traceback (most recent call last):
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 465, in install_release
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller metadata=self.metadata)
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 923, in __call__
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller return _end_unary_response_blocking(state, call, False, None)
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller raise _InactiveRpcError(state)
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >--status = StatusCode.UNKNOWN
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >--details = "release stx-rbd-provisioner failed: timed out waiting for the condition"
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >--debug_error_string = "{"created":"@1638913371.198181944","description":"Error received from peer ipv4:127.0.0.1:24134","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"release stx-rbd-provisioner failed: timed out waiting for the condition","grpc_status":2}"
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller ^[[00m
2021-12-07 21:42:51.199 178 DEBUG armada.handlers.tiller [-] [chart=kube-system-rbd-provisioner]: Helm getting release status for release=stx-rbd-provisioner, version=0 get_release_status /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:531^[[00m

/var/log/pods/kube-system_rbd-provisioner-759dfb8b6b-cfbnf_fedb37bb-ca78-4e48-93b2-9d14d98327da/rbd-provisioner/26.log

2021-12-07T21:44:52.687682317Z stderr F F1207 21:44:52.687211 1 main.go:80] Error getting server version: the server has asked for the client to provide credentials
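
On the subcloud, the provisioner pod state and the log of its previous run can be checked with standard kubectl commands (sketch; the pod name is taken from the log path above):

# Sketch: confirm the rbd-provisioner pod state and inspect its previous run.
kubectl -n kube-system get pods | grep rbd-provisioner
kubectl -n kube-system logs rbd-provisioner-759dfb8b6b-cfbnf --previous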

Alarms
-

Test Activity
Regression

Workaround
-

More info:
The problem might be that the certificate is missing controller-0's cluster host IP in its SANs (Subject Alternative Names); a possible check is sketched below.
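
If that is the case, it should show up when inspecting the SANs of the kube-apiserver certificate on the subcloud; a sketch, assuming the certificate is at the default kubeadm path:

# Sketch: list the apiserver certificate SANs and check whether
# controller-0's cluster-host IP is present.
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"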