Brief Description
-----------------
The dcmanager audit process receives 401 errors when querying the Kubernetes version through the sysinv API. This happens after the kube-cert-rotation.sh script rotates the admin.conf certificate, because the sysinv API service (sysinv-inv) is not restarted after the renewal and keeps using the old certificate.
This is related to https://bugs.launchpad.net/starlingx/+bug/1943080
Severity
--------
<Minor: System/Feature is usable with minor issue>
Steps to Reproduce
------------------
Wait for the admin.conf certificate to approach expiration (it is valid for 1 year; the expiry date can be checked with kubeadm certs check-expiration).
Fifteen days before the expiration date, kube-cert-rotation.sh renews the certificate (a cron job runs the script every day at 00:10).
About fifteen days after the rotation (once the old certificate has expired), check the dcmanager audit.log and sysinv.log for 401 errors.
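The timing in the steps above can be sketched in Python. This is only an illustration of the arithmetic described in this report: the function name and the example expiry date are hypothetical, and the 15-day lead is taken from the description, not read from kube-cert-rotation.sh itself.

```python
from datetime import date, timedelta

# Assumption from this report: rotation happens 15 days before expiry.
ROTATION_LEAD_DAYS = 15


def rotation_timeline(cert_expiry, lead_days=ROTATION_LEAD_DAYS):
    """Return (rotation_date, first_failure_date) for a certificate expiry date.

    A service that was not restarted keeps presenting the old certificate,
    so its requests start failing once that certificate expires.
    """
    rotation_date = cert_expiry - timedelta(days=lead_days)
    first_failure_date = cert_expiry
    return rotation_date, first_failure_date


# Hypothetical expiry date, for illustration only.
rotation, failure = rotation_timeline(date(2030, 1, 31))
print("rotation on:", rotation)
print("401 errors expected from:", failure)
```

The gap between the two dates is why the problem only shows up about two weeks after the cron job has reported a successful renewal.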
Expected Behavior
-----------------
No errors after certificate renewal
Actual Behavior
---------------
401 errors after certificate renewal
Reproducibility
---------------
Happened once. In theory it should be 100% reproducible, but it is hard to reproduce because of the long certificate expiry period.
System Configuration
--------------------
Distributed Cloud
Branch/Pull Time/Commit
-----------------------
2021-06-09
Last Pass
---------
Not tested before
Timestamp/Logs
--------------
cron.log:
2022-10-21T00:10:01.000 controller-0 CROND[2895533]: info (root) CMD (/usr/bin/kube-cert-rotation.sh)
2022-10-21T00:10:01.000 controller-0 CROND[2895539]: info (root) CMD (/usr/lib64/sa/sa1 1 1)
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for serving the Kubernetes API renewed)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the API server to connect to kubelet renewed)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the front proxy client renewed)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the controller manager to use renewed)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the scheduler manager to use renewed)
2022-10-21T00:10:05.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (sysinv-conductor) is restarting.)
2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (cert-mon) is restarting.)
2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (FM_ERR_ENTITY_NOT_FOUND)
audit.log:
2022-11-04 03:41:07.685 433961 INFO dcmanager.audit.subcloud_audit_manager [-] Triggered subcloud audit: patch=(True) firmware=(True) kube=(True)
2022-11-04 03:41:12.503 433961 WARNING cgtsclient.common.http [-] Request returned failure status.
2022-11-04 03:41:12.504 433961 ERROR dcmanager.audit.subcloud_audit_manager [-] Error in periodic subcloud audit loop: HTTPInternalServerError: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
sysinv.log:
sysinv 2022-11-04 03:41:12.499 3494410 ERROR wsme.api [-] Server-side error: "(401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
". Detail:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
    result = f(self, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/sysinv/api/controllers/v1/kube_version.py", line 117, in get_all
    version_states = self._kube_operator.kube_get_version_states()
  File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 664, in kube_get_version_states
    cp_versions = self.kube_get_control_plane_versions()
  File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 619, in kube_get_control_plane_versions
    label_selector="node-role.kubernetes.io/master")
  File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 13437, in list_node
    (data) = self.list_node_with_http_info(**kwargs)
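To check whether a system is hitting this issue, the logs can be scanned for ERROR records that report a 401. The following is a minimal Python 3 sketch, assuming the record layout seen in the audit.log and sysinv.log excerpts above; the function name and regular expression are illustrative and not part of dcmanager or sysinv.

```python
import re

# Start of a log record: optional one-word prefix (e.g. "sysinv "),
# then "YYYY-MM-DD HH:MM:SS.mmm ". Layout assumed from the excerpts above.
RECORD_START = re.compile(
    r"^(?:\w+ )?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ "
)


def find_unauthorized(lines):
    """Return the timestamps of ERROR records whose first line mentions (401)."""
    hits = []
    for line in lines:
        match = RECORD_START.match(line)
        if match and " ERROR " in line and "(401)" in line:
            hits.append(match.group(1))
    return hits


# Sample lines copied from the logs in this report.
sample = [
    "2022-11-04 03:41:12.503 433961 WARNING cgtsclient.common.http [-] "
    "Request returned failure status.",
    "2022-11-04 03:41:12.504 433961 ERROR dcmanager.audit.subcloud_audit_manager [-] "
    "Error in periodic subcloud audit loop: HTTPInternalServerError: (401)",
    "Reason: Unauthorized",
    'sysinv 2022-11-04 03:41:12.499 3494410 ERROR wsme.api [-] '
    'Server-side error: "(401)',
]

for timestamp in find_unauthorized(sample):
    print("401 error logged at", timestamp)
```

In practice the lines would come from reading the log files on the active controller instead of the inline sample.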
Test Activity
-------------
Found during normal use
Workaround
----------
Restarting the sysinv API service (sysinv-inv) resolves the issue. It can be done by running:
sudo sm-restart service sysinv-inv