Upgrade activation failed, leading to swact
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Chris Friesen |
Bug Description
Brief Description
The DC central cloud upgrade activation failed. A swact occurred soon after the activation started, and the activation was left in a failed state.
The following information was captured during the investigation.
Code related to the activation failure:
https:/
This fails the puppet manifest because Kubernetes isn't up. I assume it won't come up until etcd is restarted, which would normally be the next action in the puppet manifest.
Swacting between the controllers restarts etcd.
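To make the suspected ordering problem concrete, here is a toy Python model of how a dependency failure causes the dependent step to be skipped. It is only a sketch, not StarlingX or Puppet code: `apply_resources`, the `state` dictionary, and the `kube_apiserver_up` key are hypothetical, and the two resource names are taken from the puppet.log line quoted later in this report.

```python
# Toy model (not StarlingX or Puppet code): resources are applied in order,
# and a resource is skipped when any of its dependencies did not succeed.

def apply_resources(resources, state):
    """Apply resources in order and return a name -> status mapping.

    resources: list of (name, dependencies, check) tuples, where check is a
               callable that takes the state dict and returns True on success.
    """
    status = {}
    for name, deps, check in resources:
        if any(status.get(dep) != "ok" for dep in deps):
            # Mirrors the "Dependency ... has failures" behaviour.
            status[name] = "skipped"
        elif check(state):
            status[name] = "ok"
        else:
            status[name] = "failed"
    return status


# Assumed state during upgrade activation: the API server is down
# because etcd has not been restarted yet.
state = {"kube_apiserver_up": False}

manifest = [
    ("wait_for_kube_api_server", [], lambda s: s["kube_apiserver_up"]),
    ("sm-restart-etcd", ["wait_for_kube_api_server"], lambda s: True),
]

result = apply_resources(manifest, state)
print(result)
# {'wait_for_kube_api_server': 'failed', 'sm-restart-etcd': 'skipped'}
```

In this model the etcd restart never runs because the step it depends on needs the very service that the restart would bring back, which matches the circular wait described above.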
**
2021-12-08 19:37:44,664 p=601519 u=root | changed: [localhost] => (item=apiserver
2021-12-08 19:37:44,750 p=601519 u=root | changed: [localhost] => (item=apiserver
2021-12-08 19:37:44,821 p=601519 u=root | TASK [Create list of etcd classes to pass to puppet] *******
2021-12-08 19:37:44,822 p=601519 u=root | Wednesday 08 December 2021 19:37:44 +0000 (0:00:00.273) 0:00:09.706 ****
2021-12-08 19:37:45,111 p=601519 u=root | changed: [localhost]
2021-12-08 19:37:45,180 p=601519 u=root | TASK [Applying puppet for enabling etcd security] *******
2021-12-08 19:37:45,180 p=601519 u=root | Wednesday 08 December 2021 19:37:45 +0000 (0:00:00.358) 0:00:10.064 ****
2021-12-08 19:38:06,749 p=601519 u=root | fatal: [localhost]: FAILED! => changed=true
cmd:
- /usr/local/
- /opt/platform/
- fd01:1::3
- controller
- runtime
- /tmp/etcd.yml
delta: '0:00:21.453324'
end: '2021-12-08 19:38:06.732159'
msg: non-zero return code
rc: 1
start: '2021-12-08 19:37:45.278835'
stderr: ''
stderr_lines: []
stdout: |-
Applying puppet runtime manifest...
[WARNING]
Warnings found. See /var/log/
stdout_lines: <omitted>
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
**
Severity
Major
Steps to Reproduce
Follow the upgrade procedure to upgrade the DC central cloud from 21.05 to 21.12.
Upgrade activation failure was seen during the upgrade activation step.
Expected Behavior
Upgrade activation succeeds.
Actual Behavior
As described above, upgrade activation failed.
Reproducibility
**
System Configuration
DC-1 Distributed system
Branch/Pull Time/Commit
2021-12-06_23-00-09
Last Pass
"2021-12-
description: | updated |
Changed in starlingx: | |
assignee: | nobody → Chris Friesen (cbf123) |
importance: | Undecided → Medium |
tags: | added: stx.containers |
tags: | added: stx.6.0 stx.cherrypickneeded |
tags: | added: in-r-stx60 removed: stx.cherrypickneeded |
The fix delivered for bug 1953183 in the Dec 6 load appears to have triggered this issue, as noted in puppet.log on upgrade-activate:
2021-12-08T19:38:06.519 Notice: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]/Exec[sm-restart-etcd]: Dependency Exec[wait_for_kube_api_server] has failures: true
The workaround used in the lab to progress the upgrade was to revert the changes in https://ala-codereviewti-prod.wrs.com/plugins/gitiles/cgcs/opendev.org.starlingx.stx-puppet/+/89f051d03ecad99f857088320dcda79a5a11fcb4 and rerun upgrade-activate.
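For anyone triaging a similar failure, the following Python sketch pulls "Dependency ... has failures" notices out of puppet.log lines like the one quoted above. The function name `find_dependency_failures` and both regular expressions are my own assumptions, not part of any StarlingX tooling, and the pattern assumes resource titles contain no spaces, as in this log line.

```python
import re

# Strips ANSI colour codes such as ESC[m and ESC[0m that puppet writes to the log.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

# Captures the skipped resource path and the dependency that failed.
# Assumes neither contains whitespace.
DEP_FAILURE = re.compile(
    r"(?P<resource>/Stage\[\S+): Dependency (?P<dependency>\S+) has failures: true"
)


def find_dependency_failures(lines):
    """Return (skipped_resource, failed_dependency) pairs found in log lines."""
    failures = []
    for line in lines:
        match = DEP_FAILURE.search(ANSI_ESCAPE.sub("", line))
        if match:
            failures.append((match.group("resource"), match.group("dependency")))
    return failures


# The puppet.log line from this report, with its colour codes restored.
sample = (
    "2021-12-08T19:38:06.519 \x1b[mNotice: 2021-12-08 19:38:06 +0000 "
    "/Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]"
    "/Exec[sm-restart-etcd]: Dependency Exec[wait_for_kube_api_server] "
    "has failures: true\x1b[0m"
)

for resource, dependency in find_dependency_failures([sample]):
    print(f"skipped: {resource}")
    print(f"because: {dependency}")
```

To scan a real log, pass an open file object instead of the one-element list, since iterating over a file yields its lines.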