2022-04-26 14:37:31
Enzo Candotti
Brief Description
platform-integ-apps fails to reach applied state after SX system is migrated to DX system
Severity
major
Steps to Reproduce
1) Install the subcloud as SX.
2) Create the following file "migrate-subcloud-overrides.yaml" on the central controller:
[sysadmin@controller-0 ~(keystone_admin)]$ cat migrate-subcloud-overrides.yaml
---
{
"ansible_ssh_pass": ******,
"external_oam_node_0_address": "2620:10A:A001:A103::218",
"external_oam_node_1_address": "2620:10A:A001:A103::42",
}
[sysadmin@controller-0 ~(keystone_admin)]$
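The overrides file from step 2 can be created in a single step with a heredoc; a minimal sketch (the ssh password is redacted in the report, so a placeholder is used, and the OAM addresses are the ones shown above):

```shell
# Create the migration overrides file (placeholder password; replace before use).
cat > migrate-subcloud-overrides.yaml <<'EOF'
---
{
  "ansible_ssh_pass": "CHANGE_ME",
  "external_oam_node_0_address": "2620:10A:A001:A103::218",
  "external_oam_node_1_address": "2620:10A:A001:A103::42",
}
EOF
```

The JSON-style flow mapping is valid YAML, so the file can be consumed directly as Ansible extra-vars.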
3) Run migrate_sx_to_dx.yml on the central cloud and verify that there are no errors in the output.
4) When the subcloud is online and managed, log in to the subcloud and verify that SX has been converted to duplex:
[sysadmin@controller-0 ~(keystone_admin)]$ system show
+------------------------+--------------------------------------+
| Property | Value |
+------------------------+--------------------------------------+
| contact | None |
| created_at | 2021-12-05T18:36:08.995159+00:00 |
| description | None |
| distributed_cloud_role | subcloud |
| https_enabled | True |
| latitude | None |
| location | None |
| longitude | None |
| name | dc-subcloud12 |
| region_name | subcloud12 |
| sdn_enabled | False |
| security_feature | spectre_meltdown_v1 |
| service_project_name | services |
| shared_services | [] |
| software_version | 21.12 |
| system_mode | duplex |
| system_type | All-in-one |
| timezone | UTC |
| updated_at | 2021-12-07T19:47:27.308407+00:00 |
| uuid | 0c86a371-c387-4104-9d75-e6948454ffe3 |
| vswitch_type | none |
+------------------------+--------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
5) But platform-integ-apps fails to reach the applied state and remains stuck at 25% indefinitely:
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------+
| cert-manager | 1.0-25 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.1-17 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-59 | oidc-auth-manifest | manifest.yaml | applied | completed |
| platform-integ-apps | 1.0-42 | platform-integration-manifest | manifest.yaml | applying | processing chart: stx-rbd-provisioner |
| | | | | | , overall completion: 25.0% |
| | | | | | |
| rook-ceph-apps | 1.0-13 | rook-ceph-manifest | manifest.yaml | uploaded | completed |
| vault | 1.0-22 | vault-manifest | vault_manifest.yaml | applied | completed |
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------+
When the apply was retried, it failed again.
Expected Behavior
After the migration the application should reach the applied state.
Actual Behavior
platform-integ-apps fails to reach the applied state.
Reproducibility
100%
System Configuration
SX subcloud; also seen on standalone AIO-SX.
Branch/Pull Time/Commit
21.12
Last Pass
21.05
Timestamp/Logs
/var/log/armada/platform-integ-apps-apply_2021-12-07-21-12-42.log
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller [-] [chart=kube-system-rbd-provisioner]: Error while installing release stx-rbd-provisioner: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
>---status = StatusCode.UNKNOWN
>---details = "release stx-rbd-provisioner failed: timed out waiting for the condition"
>---debug_error_string = "{"created":"@1638913371.198181944","description":"Error received from peer ipv4:127.0.0.1:24134","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"release stx-rbd-provisioner failed: timed out waiting for the condition","grpc_status":2}"
>
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller Traceback (most recent call last):
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 465, in install_release
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller metadata=self.metadata)
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 923, in __call__
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller return _end_unary_response_blocking(state, call, False, None)
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller raise _InactiveRpcError(state)
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >--status = StatusCode.UNKNOWN
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >--details = "release stx-rbd-provisioner failed: timed out waiting for the condition"
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >--debug_error_string = "{"created":"@1638913371.198181944","description":"Error received from peer ipv4:127.0.0.1:24134","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"release stx-rbd-provisioner failed: timed out waiting for the condition","grpc_status":2}"
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller >
2021-12-07 21:42:51.198 178 ERROR armada.handlers.tiller ^[[00m
2021-12-07 21:42:51.199 178 DEBUG armada.handlers.tiller [-] [chart=kube-system-rbd-provisioner]: Helm getting release status for release=stx-rbd-provisioner, version=0 get_release_status /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:531^[[00m
/var/log/pods/kube-system_rbd-provisioner-759dfb8b6b-cfbnf_fedb37bb-ca78-4e48-93b2-9d14d98327da/rbd-provisioner/26.log
2021-12-07T21:44:52.687682317Z stderr F F1207 21:44:52.687211 1 main.go:80] Error getting server version: the server has asked for the client to provide credentials
Alarms
-
Test Activity
Regression
Workaround
-
More info:
The problem might be that the certificate is missing controller-0's cluster host IP in its SANs (Subject Alternative Names).
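One way to check this suspicion is to inspect the certificate's SANs with openssl. The commands below demonstrate the inspection on a throwaway self-signed certificate (the IP address and file paths are illustrative, not taken from the report); on the subcloud, the same `openssl x509` inspection would target the certificate actually served to the rbd-provisioner, which would be expected to lack controller-0's cluster host IP.

```shell
# Generate a throwaway certificate with a single IP SAN (illustrative values).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/test-san.key -out /tmp/test-san.crt \
  -subj "/CN=san-demo" -addext "subjectAltName=IP:192.168.206.2"

# List the SANs; a missing cluster host IP here would confirm the theory.
openssl x509 -in /tmp/test-san.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```

Note that `-addext` requires OpenSSL 1.1.1 or newer; the inspection half works on older releases as well.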