Cert-manager app failed to apply after controller-1 of the system controller is upgraded
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Won't Fix | Medium | Dan Voiculeasa | |
Bug Description
Brief Description
-----------------
After controller-1 of the system controller is upgraded to 20.06, the cert-manager app is in the apply-failed state.
Severity
--------
Major
Steps to Reproduce
------------------
- import 20.06 load
- apply upgrade patch to enable upgrade to 20.06
- Execute the following commands to upgrade the system controller
- system upgrade-start --force
Note: the --force option allows upgrade while the system has non-service impacting alarms
- system host-lock controller-1
- system host-upgrade controller-1
Note: to monitor the upgrade, run command "system upgrade-show"
- system host-unlock controller-1
- system host-swact controller-0
- system host-lock controller-0
- system host-upgrade controller-0
- system host-unlock controller-0
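The command order above can be captured in a small dry-run helper. This is only a sketch: the function name `upgrade_commands` and its parameters are illustrative, it uses only the commands listed in these steps, and it prints them rather than executing anything. On a real system each step has to finish before the next one is issued (progress can be checked with "system upgrade-show").

```python
def upgrade_commands(first="controller-1", second="controller-0"):
    """Return the CLI commands from the reproduction steps, in order.

    Nothing is executed here; the list is meant for review or logging.
    The host names default to the ones used in this report.
    """
    return [
        # --force allows the upgrade despite non-service-impacting alarms
        "system upgrade-start --force",
        f"system host-lock {first}",
        f"system host-upgrade {first}",
        f"system host-unlock {first}",
        # move activity away from the not-yet-upgraded controller
        f"system host-swact {second}",
        f"system host-lock {second}",
        f"system host-upgrade {second}",
        f"system host-unlock {second}",
    ]


if __name__ == "__main__":
    for number, command in enumerate(upgrade_commands(), start=1):
        print(f"{number}. {command}")
```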
Expected Behavior
------------------
All applied apps remain applied after system controller is upgraded to 20.06
Actual Behavior
----------------
Both the platform-integ-apps and cert-manager apps were in a failed state after the upgrade. However, only platform-integ-apps could be reapplied successfully.
Armada apply logs:
~~~~~~~~~~~~~~~~~~
2020-06-12 00:05:01.158 16 ERROR armada.cli [-] Caught unexpected exception: grpc._channel.
status = StatusCode.
details = "Connect Failed"
>
2020-06-12 00:05:01.158 16 ERROR armada.cli Traceback (most recent call last):
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli self.invoke()
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli resp = self.handle(
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli return future.result()
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/lib/
2020-06-12 00:05:01.158 16 ERROR armada.cli return self.__get_result()
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/lib/
2020-06-12 00:05:01.158 16 ERROR armada.cli raise self._exception
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/lib/
2020-06-12 00:05:01.158 16 ERROR armada.cli result = self.fn(*self.args, **self.kwargs)
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli return armada.sync()
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli known_releases = self.tiller.
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli releases = get_results()
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli for message in response:
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli return self._next()
2020-06-12 00:05:01.158 16 ERROR armada.cli File "/usr/local/
2020-06-12 00:05:01.158 16 ERROR armada.cli raise self
2020-06-12 00:05:01.158 16 ERROR armada.cli grpc._channel.
2020-06-12 00:05:01.158 16 ERROR armada.cli status = StatusCode.
2020-06-12 00:05:01.158 16 ERROR armada.cli details = "Connect Failed"
2020-06-12 00:05:01.158 16 ERROR armada.cli debug_error_string = "{"created"
After several manual reapply attempts failed, the app was reapplied once more after removing armada_service (sudo docker rm armada_service). This apply went a bit further but still failed:
2020-06-15 15:54:59.588 16 ERROR armada.
status = StatusCode.UNKNOWN
details = "timed out waiting for the condition"
>
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.588 16 ERROR armada.
2020-06-15 15:54:59.589 16 DEBUG armada.
2020-06-15 15:54:59.825 16 DEBUG armada.
info {
status {
code: FAILED
}
first_deployed {
seconds: 1591892571
nanos: 383032032
}
last_deployed {
seconds: 1592234698
nanos: 398297163
}
Description: "Upgrade \"cm-cert-manager\" failed: timed out waiting for the condition"
}
namespace: "cert-manager"
get_release_status /usr/local/
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.826 16 ERROR armada.
2020-06-15 15:54:59.827 16 ERROR armada.
2020-06-15 15:55:00.180 16 INFO armada.
2020-06-15 15:55:00.186 16 ERROR armada.cli [-] Caught internal exception: armada.
2020-06-15 15:55:00.186 16 ERROR armada.cli Traceback (most recent call last):
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/local/
2020-06-15 15:55:00.186 16 ERROR armada.cli self.invoke()
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/local/
2020-06-15 15:55:00.186 16 ERROR armada.cli resp = self.handle(
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/local/
2020-06-15 15:55:00.186 16 ERROR armada.cli return future.result()
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/lib/
2020-06-15 15:55:00.186 16 ERROR armada.cli return self.__get_result()
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/lib/
2020-06-15 15:55:00.186 16 ERROR armada.cli raise self._exception
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/lib/
2020-06-15 15:55:00.186 16 ERROR armada.cli result = self.fn(*self.args, **self.kwargs)
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/local/
2020-06-15 15:55:00.186 16 ERROR armada.cli return armada.sync()
2020-06-15 15:55:00.186 16 ERROR armada.cli File "/usr/local/
2020-06-15 15:55:00.186 16 ERROR armada.cli raise armada_
2020-06-15 15:55:00.186 16 ERROR armada.cli armada.
2020-06-15 15:55:00.186 16 ERROR armada.cli
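The first_deployed and last_deployed fields in the release status dump are epoch seconds plus nanoseconds, which are hard to compare with the log timestamps by eye. A minimal Python sketch of the conversion follows; the helper name `to_utc` is illustrative, and treating the armada log timestamps as UTC is an assumption that the report does not confirm.

```python
from datetime import datetime, timezone


def to_utc(seconds, nanos=0):
    """Convert an epoch seconds/nanos pair to an aware UTC datetime."""
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only has microsecond resolution, so nanoseconds are truncated
    return base.replace(microsecond=nanos // 1000)


# Values copied from the release status dump in the log
first_deployed = to_utc(1591892571, 383032032)
last_deployed = to_utc(1592234698, 398297163)

# Timestamp of the DEBUG line that precedes the dump (assumed to be UTC)
failure_logged = datetime(2020, 6, 15, 15, 54, 59, 825000, tzinfo=timezone.utc)
elapsed = failure_logged - last_deployed

if __name__ == "__main__":
    print("first_deployed:", first_deployed.isoformat())
    print("last_deployed: ", last_deployed.isoformat())
    print("elapsed until failure was logged:", elapsed)
```

Under that UTC assumption, the failure was logged roughly 30 minutes after last_deployed, which matches the "timed out waiting for the condition" description. The configured wait timeout itself is not stated in the report.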
Reproducibility
---------------
Seen once
System Configuration
-------
IPv6 distributed cloud
Branch/Pull Time/Commit
-------
Jun 6th load
Last Pass
---------
N/A. This is the first time a distributed cloud upgrade has been performed.
Timestamp/Logs
--------------
Test Activity
-------------
Developer Testing
tags: | added: stx.containers |
tags: | added: stx.5.0 |
Changed in starlingx: | |
status: | New → Triaged |
assignee: | nobody → Dan Voiculeasa (dvoicule) |
importance: | Undecided → Medium |
Changed in starlingx: | |
status: | Triaged → Won't Fix |
This was seen in load 2020-06-15_20-00-00 in ip-1-4.
fm alarm-list
+----------+-----------------------------------------------------------------------------------------+-----------------------+----------+------------------+
| Alarm ID | Reason Text                                                                             | Entity ID             | Severity | Time Stamp       |
+----------+-----------------------------------------------------------------------------------------+-----------------------+----------+------------------+
| 400.003  | Evaluation license key will expire on 30-dec-2020; there are 196 days remaining in this | host=controller-0     | minor    | 2020-06-17T14:00 |
|          | evaluation                                                                              |                       |          | :30.211280       |
|          |                                                                                         |                       |          |                  |
| 400.003  | Evaluation license key will expire on 30-dec-2020; there are 196 days remaining in this | host=controller-1     | minor    | 2020-06-17T14:00 |
|          | evaluation                                                                              |                       |          | :25.009131       |
|          |                                                                                         |                       |          |                  |
| 750.002  | Application Apply Failure                                                               | k8s_application=cert- | major    | 2020-06-16T20:08 |
|          |                                                                                         | manager               |          | :15.049402       |
|          |                                                                                         |                       |          |                  |
| 500.101  | Developer patch certificate is enabled                                                  | host=controller       | critical | 2020-06-16T19:37 |
|          |                                                                                         |                       |          | :54.333533       |
|          |                                                                                         |                       |          |                  |
| 900.005  | System Upgrade in progress.                                                             | host=controller       | minor    | 2020-06-16T19:37 |
|          |                                                                                         |                       |          | :04.041417       |
|          |                                                                                         |                       |          |                  |
+----------+-----------------------------------------------------------------------------------------+-----------------------+----------+------------------+
[sysadmin@controller-1 ~(keystone_admin)]$ timed out waiting for input: auto-logout
Connect...
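The alarm listing wraps long cells onto continuation rows, so the Application Apply Failure entry is split across lines. The following Python sketch merges continuation rows back into one record per alarm. It assumes the pipe-delimited layout seen in this comment; the function name `parse_alarm_table` and the joining rules are illustrative and not part of fm, and the sample rows are pieced together from the listing in this comment.

```python
def parse_alarm_table(text):
    """Merge wrapped rows of a pipe-delimited alarm listing into dicts."""
    header = None
    alarms = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # borders, prompts, blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe row holds the column names
            continue
        if not any(cells):
            continue  # spacer row between alarms
        if cells[0]:
            alarms.append(dict(zip(header, cells)))  # new alarm starts
        elif alarms:
            # continuation row: append each non-empty cell to the last alarm
            for name, cell in zip(header, cells):
                if cell:
                    sep = " " if name == "Reason Text" else ""
                    alarms[-1][name] += sep + cell
    return alarms


SAMPLE = """
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
| 400.003 | Evaluation license key will expire on 30-dec-2020; there are 196 days remaining in this | host=controller-0 | minor | 2020-06-17T14:00 |
| | evaluation | | | :30.211280 |
| | | | | |
| 750.002 | Application Apply Failure | k8s_application=cert- | major | 2020-06-16T20:08 |
| | | manager | | :15.049402 |
"""

if __name__ == "__main__":
    for alarm in parse_alarm_table(SAMPLE):
        if alarm["Alarm ID"] == "750.002":
            print(alarm["Entity ID"], alarm["Severity"], alarm["Time Stamp"])
```

Prose cells are joined with a space, while identifiers and timestamps are joined without one, since those are split mid-token in the listing.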