Brief Description
Subcloud K8s upgrade orchestration failed because sysinv could not get the kubelet versions: its connections to the Kubernetes API on port 6443 were refused.
$ dcmanager strategy-step list
| subcloud229 | 1 | failed | kube applying vim kube upgrade strategy: (kube-upgrade) Vim strategy apply failed. Unexpected State: aborted. | 2022-12-08 16:40:07.163173 | 2022-12-08 17:05:26.931960 |
Severity
Major.
Steps to Reproduce
System Controller running with 1000 subclouds
Verify that there is a 50 ms delay between the System Controller and the subclouds. If not, add the delay using Delayomatic.
Apply Subcloud K8s upgrade orchestration (prerequisites: install the System Controller and subclouds with K8s 1.23, then upgrade K8s on the System Controller first).
$ dcmanager kube-upgrade-strategy create --max-parallel-subclouds 250 --subcloud-apply-type parallel --to-version v1.24.4
$ dcmanager kube-upgrade-strategy apply
Expected Behavior
Subcloud K8s upgraded to 1.24.4
Actual Behavior
K8s upgrade failed
Reproducibility
Failed on 6 out of 1000 subclouds.
System Configuration
Distributed Cloud (DC1000-2)
Last Pass
NA
Timestamp/Logs
// Collect all
System Controller: /folk/cgts_logs/CGTS-41773/ALL_NODES_20221208.172849.tar
Subcloud: /folk/cgts_logs/CGTS-41773/subcloud229_20221208.174415.tar
...
sysinv 2022-12-08 17:02:50.620 74053 WARNING urllib3.connectionpool [-] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62567a3640>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')': /api/v1/nodes
sysinv 2022-12-08 17:02:50.620 74053 WARNING urllib3.connectionpool [-] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62569e7d60>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')': /api/v1/nodes
sysinv 2022-12-08 17:02:50.621 74053 WARNING urllib3.connectionpool [-] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62569e7640>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')': /api/v1/nodes
sysinv 2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager [-] Problem getting kubelet versions.: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='aefd::1', port=6443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62569e76d0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 169, in _new_conn
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager conn = connection.create_connection(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 96, in create_connection
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager raise err
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 86, in create_connection
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager sock.connect(sa)
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 253, in connect
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager socket_checkerr(fd)
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 51, in socket_checkerr
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager raise socket.error(err, errno.errorcode[err])
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager During handling of the above exception, another exception occurred:
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager httplib_response = self._make_request(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager self._validate_conn(conn)
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager conn.connect()
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 353, in connect
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager conn = self._new_conn()
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 181, in _new_conn
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager raise NewConnectionError(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f62569e76d0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager During handling of the above exception, another exception occurred:
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager Traceback (most recent call last):
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 14818, in kube_upgrade_kubelet
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager kubelet_versions = kube_operator.kube_get_kubelet_versions()
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/sysinv/common/kubernetes.py", line 893, in kube_get_kubelet_versions
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager api_response = c.list_node()
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/api/core_v1_api.py", line 16414, in list_node
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.list_node_with_http_info(**kwargs) # noqa: E501
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/api/core_v1_api.py", line 16517, in list_node_with_http_info
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.api_client.call_api(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.__call_api(resource_path, method,
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager response_data = self.request(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/api_client.py", line 373, in request
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.rest_client.GET(url,
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/rest.py", line 239, in GET
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.request("GET", url,
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/kubernetes/client/rest.py", line 212, in request
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager r = self.pool_manager.request(method, url,
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/request.py", line 74, in request
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.request_encode_url(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/request.py", line 96, in request_encode_url
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.urlopen(method, url, **extra_kw)
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/poolmanager.py", line 375, in urlopen
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager response = conn.urlopen(method, u.request_uri, **kw)
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 783, in urlopen
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.urlopen(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 783, in urlopen
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.urlopen(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 783, in urlopen
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager return self.urlopen(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager retries = retries.increment(
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 574, in increment
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager raise MaxRetryError(_pool, url, error or ResponseError(cause))
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='aefd::1', port=6443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62569e76d0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
2022-12-08 17:02:50.622 74053 ERROR sysinv.conductor.manager
sysinv 2022-12-08 17:03:00.634 74053 WARNING urllib3.connectionpool [-] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62569e7af0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')': /api/v1/nodes
sysinv 2022-12-08 17:03:00.635 74053 WARNING urllib3.connectionpool [-] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f62569e7280>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')': /api/v1/nodes
sysinv 2022-12-08 17:03:00.637 74053 WARNING urllib3.connectionpool [-] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f6257bf8220>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')': /api/v1/nodes
sysinv 2022-12-08 17:03:00.638 74053 ERROR sysinv.conductor.manager [-] Problem getting kubelet versions.: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='aefd::1', port=6443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f6256711280>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
...
...
sysinv 2022-12-08 17:03:28.592 79432 ERROR wsme.api [-] Server-side error: "min() arg is an empty sequence". Detail:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/wsmeext/pecan.py", line 84, in callfunction
    result = f(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/sysinv/api/controllers/v1/kube_host_upgrade.py", line 159, in get_all
    cp_versions = self._kube_operator.kube_get_control_plane_versions()
  File "/usr/lib/python3/dist-packages/sysinv/common/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/sysinv/common/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/sysinv/common/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/sysinv/common/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python3/dist-packages/sysinv/common/kubernetes.py", line 883, in kube_get_control_plane_versions
    node_versions[node_name] = str(min(versions))
ValueError: min() arg is an empty sequence
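The last traceback shows `min()` being called on an empty list of versions. The snippet below is a minimal sketch of one way to guard against that case. It is not the sysinv code and not necessarily the proposed fix. The function names `parse_version` and `lowest_version_per_node`, and the shape of the input dictionary, are assumptions made for illustration only.

```python
def parse_version(version):
    """Turn a string such as 'v1.24.4' into (1, 24, 4) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def lowest_version_per_node(versions_by_node):
    """Map each node name to its lowest reported version.

    versions_by_node is a hypothetical dict of node name -> list of version
    strings. A node with no reported versions maps to None instead of
    raising ValueError, which is what min() does on an empty sequence.
    """
    result = {}
    for node_name, versions in versions_by_node.items():
        # default=None is returned when 'versions' is empty.
        result[node_name] = min(versions, key=parse_version, default=None)
    return result
```

A caller could then treat `None` as "versions not available yet" and retry or report a clear error, rather than surfacing a server-side `ValueError`.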
Alarms
NA
Test Activity
Scalability Testing
Workaround
Re-apply the k8s strategy.
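Since re-applying the strategy succeeds, the connection refusals in the logs may be transient. Under that assumption, a retry with exponential backoff around the version query could be sketched as follows. The helper `call_with_retry` and its parameters are hypothetical and are not part of sysinv or dcmanager. The default `retry_on` covers Python's built-in `ConnectionError` family only, so a real caller would pass whichever exception types it actually sees (the logs above show `urllib3.exceptions.MaxRetryError`).

```python
import time


def call_with_retry(func, attempts=5, initial_delay=1.0, backoff=2.0,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func() and retry on the given exceptions with growing delays.

    Re-raises the last exception once 'attempts' calls have failed.
    'sleep' is injectable so the helper can be exercised without waiting.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            sleep(delay)
            delay *= backoff
```

With the defaults, the waits between attempts are 1, 2, 4 and 8 seconds, so a short API server restart would be absorbed while a persistent outage would still fail after about 15 seconds.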
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/871114