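Brief Description:
On a standalone AIO-SX system, the Kubernetes upgrade from v1.21.8 to v1.24.4, run after a platform upgrade, failed and left the system in the upgrade-aborting-failed state.
There are two problems here:
1) The control-plane upgrade to v1.22.5 failed.
2) The subsequent upgrade abort also failed (upgrade-aborting-failed).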
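The v1.22.5 failure shows up in a v1.21 to v1.24 upgrade because Kubernetes control planes are upgraded one minor version at a time. A minimal sketch of how the intermediate minor versions can be enumerated follows; the function name minor_hops is hypothetical, and the patch release chosen for each hop (for example v1.22.5) is decided by the platform and is not computed here.

```python
def minor_hops(from_version, to_version):
    """Return the minor versions an upgrade passes through, in order.

    Illustrative helper only; it is not part of sysinv or sw-manager.
    """
    def parse(version):
        # "v1.21.8" -> (1, 21); the patch number is ignored on purpose.
        major, minor, _patch = version.lstrip("v").split(".")
        return int(major), int(minor)

    from_major, from_minor = parse(from_version)
    to_major, to_minor = parse(to_version)
    if from_major != to_major:
        raise ValueError("major version changes are out of scope for this sketch")

    # One hop per minor version, ending at the target minor version.
    return ["v%d.%d" % (from_major, m) for m in range(from_minor + 1, to_minor + 1)]


hops = minor_hops("v1.21.8", "v1.24.4")
print(hops)
```

For the versions in this report, the first hop is the v1.22 step, which is where the control-plane upgrade failed.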
Severity:
Critical
Steps to Reproduce:
Follow the Multi-version Kubernetes Version Upgrade Cloud Orchestration Strategy Procedure (Simplex):
sw-manager kube-upgrade-strategy create --to-version v1.24.4
sw-manager kube-upgrade-strategy apply
Expected Behavior:
The Kubernetes upgrade completes successfully.
Actual Behavior:
[sysadmin@controller-0 ~(keystone_admin)]$ sw-manager kube-upgrade-strategy show
Strategy Kubernetes Upgrade Strategy:
strategy-uuid: 496f68ef-66dd-41df-b1a8-394f22477ac0
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: abort
current-phase-completion: 100%
state: abort-failed
apply-result: timed-out
apply-reason:
abort-result: failed
abort-reason: Unexpected state: upgrade-aborting
[sysadmin@controller-0 scratch(keystone_admin)]$ system kube-upgrade-show
+--------------+--------------------------------------+
| Property     | Value                                |
+--------------+--------------------------------------+
| uuid         | ee87012f-f2d2-418c-8156-2b399aece993 |
| from_version | v1.21.8                              |
| to_version   | v1.24.4                              |
| state        | upgrade-aborting-failed              |
| created_at   | 2023-10-22T03:31:54.187459+00:00     |
| updated_at   | 2023-10-22T13:00:42.916532+00:00     |
+--------------+--------------------------------------+
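The two outputs above report the failure at different layers: the orchestration strategy is in abort-failed, and the sysinv upgrade record is in upgrade-aborting-failed. A minimal sketch of how a test script might extract both states from captured output is shown below. It assumes the output formats stay as shown in this report; the function names and the suffix check on "failed" are illustrative assumptions, not part of sw-manager or the system CLI.

```python
def parse_property_table(text):
    """Parse a two-column '| Property | Value |' table into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and malformed rows
        props[cells[0]] = cells[1]
    return props


def parse_strategy_show(text):
    """Parse 'key: value' lines; only the first colon separates key and value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and " " not in key:
            fields[key] = value.strip()
    return fields


def looks_failed(state):
    # Assumption for this sketch: failure states end in "failed".
    return state.endswith("failed")


TABLE = """
+--------------+-------------------------+
| Property     | Value                   |
+--------------+-------------------------+
| from_version | v1.21.8                 |
| to_version   | v1.24.4                 |
| state        | upgrade-aborting-failed |
+--------------+-------------------------+
"""

STRATEGY = """
Strategy Kubernetes Upgrade Strategy:
state: abort-failed
apply-result: timed-out
abort-result: failed
abort-reason: Unexpected state: upgrade-aborting
"""

upgrade = parse_property_table(TABLE)
strategy = parse_strategy_show(STRATEGY)
print(upgrade["state"], looks_failed(upgrade["state"]))
print(strategy["state"], strategy["abort-reason"])
```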
Reproducibility:
100%
System Configuration:
AIO-SX
Last Pass:
NA
Timestamp/Logs:
sysinv 2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host abcd:204::3: subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/22.12/hieradata', 'abcd:204::3', 'controller', 'runtime', '/tmp/tmpxk901ovy.yaml']' returned non-zero exit status 1.
2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common Traceback (most recent call last):
2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common File "/usr/lib/python3/dist-packages/sysinv/puppet/common.py", line 91, in puppet_apply_manifest
2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable
2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common raise CalledProcessError(retcode, cmd)
2023-10-22 09:31:24.175 16390 ERROR sysinv.puppet.common subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/22.12/hieradata', 'abcd:204::3', 'controller', 'runtime', '/tmp/tmpxk901ovy.yaml']' returned non-zero exit status 1.
sysinv 2023-10-22 09:31:24.192 16390 INFO sysinv.agent.manager [-] Manifests application failed. Reporting failure to conductor. Details: {'personalities': ['controller'], 'classes': ['platform::kubernetes::upgrade_abort'], 'report_status': 'upgrade_abort', 'force': False, 'config_type': 'config_apply_runtime_manifest', 'created_at': '2023-10-22T03:49:37.552467', 'host_uuids': ['be9b4716-05f2-432d-9de5-334f7c3d2b49'], 'host_uuid': 'be9b4716-05f2-432d-9de5-334f7c3d2b49'}.
sysinv 2023-10-22 09:31:24.193 16390 ERROR sysinv.openstack.common.rpc.common [-] Returning exception Failed to execute runtime manifest for host abcd:204::3 to caller: sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host abcd:204::3
sysinv 2023-10-22 09:31:24.193 16390 ERROR sysinv.openstack.common.rpc.common [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/sysinv/puppet/common.py", line 91, in puppet_apply_manifest\n subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable\n', ' File "/usr/lib/python3.9/subprocess.py", line 373, in check_call\n raise CalledProcessError(retcode, cmd)\n', "subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/22.12/hieradata', 'abcd:204::3', 'controller', 'runtime', '/tmp/tmpxk901ovy.yaml']' returned non-zero exit status 1.\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/sysinv/agent/manager.py", line 1926, in config_apply_runtime_manifest\n self._apply_runtime_manifest(config_dict)\n', ' File "/usr/lib/python3/dist-packages/sysinv/agent/manager.py", line 1994, in _apply_runtime_manifest\n puppet.puppet_apply_manifest(self._mgmt_ip,\n', ' File "/usr/lib/python3/dist-packages/sysinv/puppet/common.py", line 96, in puppet_apply_manifest\n raise exception.SysinvException(_(msg))\n', 'sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host abcd:204::3\n']: sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host abcd:204::3
sysinv 2023-10-22 09:31:24.200 20887 WARNING sysinv.conductor.manager [-] k8s upgrade abort failed 3 times, giving up
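The log sequence above follows a common pattern: a subprocess exits non-zero, the CalledProcessError is translated into a service-level exception, and the caller gives up after a fixed number of attempts. The sketch below reproduces that pattern in isolation. It is not the sysinv implementation: ManifestApplyError, apply_manifest and abort_with_retries are hypothetical names, the failing command is a stand-in for puppet-manifest-apply.sh, and only the limit of 3 attempts is taken from the warning in the log.

```python
import subprocess
import sys


class ManifestApplyError(Exception):
    """Hypothetical stand-in for the SysinvException seen in the log."""


def apply_manifest(cmd):
    """Run cmd and translate a non-zero exit status into ManifestApplyError."""
    try:
        # Output is discarded, as in the traceback (stdout=fnull, stderr=fnull),
        # so only the exit status is available to the caller.
        subprocess.check_call(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except subprocess.CalledProcessError as exc:
        # Raising inside the except block yields the chained traceback
        # "During handling of the above exception, another exception occurred".
        raise ManifestApplyError(
            "Failed to execute runtime manifest (exit status %d)" % exc.returncode
        )


def abort_with_retries(cmd, max_attempts=3):
    """Try the abort up to max_attempts times; return (succeeded, attempts)."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            apply_manifest(cmd)
            return True, attempts
        except ManifestApplyError:
            continue  # retry until the attempt limit is reached
    return False, attempts


# Stand-in commands: one always exits with status 1, one always succeeds.
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(1)"]
PASSING_CMD = [sys.executable, "-c", "pass"]

print(abort_with_retries(FAILING_CMD))
print(abort_with_retries(PASSING_CMD))
```

Because the command output is discarded in this pattern, the exception carries only the exit status; the reason the manifest failed has to be read from the logs of the command itself.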
Alarms:
NA
Test Activity:
Feature Testing
Workaround:
NA
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/899743