Activity log for bug #2000256

Date Who What changed Old value New value Message
2022-12-21 14:57:17 Gustavo Herzmann bug added bug
2022-12-21 19:50:46 Gustavo Herzmann description Brief Description ----------------- The dcmanager audit process receives 401 errors when querying the Kubernetes version through sysinv-api. This happens after the kube-cert-rotation.sh script rotates the admin.conf certificate because the sysinv-api service is not restarted after the certificate renewal. This is related to https://bugs.launchpad.net/starlingx/+bug/1943080 Severity -------- <Minor: System/Feature is usable with minor issue> Steps to Reproduce ------------------ Wait for the admin.conf certificate to approach expiration (1-year expiry; this can be checked with kubeadm certs check-expiration). Fifteen days before the expiration date, kube-cert-rotation.sh will run (a cron job runs it every day at 00:10). Fifteen days after the rotation (when the old certificate expires), check dcmanager audit.log and sysinv.log for 401 errors. Expected Behavior ------------------ No errors after certificate renewal Actual Behavior ---------------- 401 errors after certificate renewal Reproducibility --------------- Happened once, but in theory it should be 100% reproducible; it is just hard to reproduce because of the long expiry period. System Configuration -------------------- Distributed Cloud Branch/Pull Time/Commit ----------------------- 2021-06-09 Last Pass --------- Not tested before Timestamp/Logs -------------- cron.log: 2022-10-21T00:10:01.000 controller-0 CROND[2895533]: info (root) CMD (/usr/bin/kube-cert-rotation.sh) 2022-10-21T00:10:01.000 controller-0 CROND[2895539]: info (root) CMD (/usr/lib64/sa/sa1 1 1) 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for serving the Kubernetes API renewed) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the API server to connect to kubelet renewed) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the front proxy client renewed) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the controller manager to use renewed) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the scheduler manager to use renewed) 2022-10-21T00:10:05.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (sysinv-conductor) is restarting.) 2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (cert-mon) is restarting.) 
2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (FM_ERR_ENTITY_NOT_FOUND) audit.log: 2022-11-04 03:41:07.685 433961 INFO dcmanager.audit.subcloud_audit_manager [-] Triggered subcloud audit: patch=(True) firmware=(True) kube=(True) 2022-11-04 03:41:12.503 433961 WARNING cgtsclient.common.http [-] Request returned failure status. 2022-11-04 03:41:12.504 433961 ERROR dcmanager.audit.subcloud_audit_manager [-] Error in periodic subcloud audit loop: HTTPInternalServerError: (401) Reason: Unauthorized HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401} sysinv.log: sysinv 2022-11-04 03:41:12.499 3494410 ERROR wsme.api [-] Server-side error: "(401) Reason: Unauthorized HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401} ". 
Detail: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction result = f(self, *args, **kwargs) File "/usr/lib64/python2.7/site-packages/sysinv/api/controllers/v1/kube_version.py", line 117, in get_all version_states = self._kube_operator.kube_get_version_states() File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 664, in kube_get_version_states cp_versions = self.kube_get_control_plane_versions() File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 619, in kube_get_control_plane_versions label_selector="node-role.kubernetes.io/master") File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 13437, in list_node (data) = self.list_node_with_http_info(**kwargs) Test Activity ------------- Found during normal use Workaround ---------- Restarting the sysinv-api service resolves the issue. It can be done by running: sudo sm-restart service sysinv-conductor Brief Description ----------------- The dcmanager audit process receives 401 errors when querying the Kubernetes version through the sysinv API. This happens after the kube-cert-rotation.sh script rotates the admin.conf certificate because the sysinv API service (sysinv-inv) is not restarted after the certificate renewal. This is related to https://bugs.launchpad.net/starlingx/+bug/1943080 Severity -------- <Minor: System/Feature is usable with minor issue> Steps to Reproduce ------------------ Wait for the admin.conf certificate to approach expiration (1-year expiry; this can be checked with kubeadm certs check-expiration). Fifteen days before the expiration date, kube-cert-rotation.sh will run (a cron job runs it every day at 00:10). Fifteen days after the rotation (when the old certificate expires), check dcmanager audit.log and sysinv.log for 401 errors. 
Expected Behavior ------------------ No errors after certificate renewal Actual Behavior ---------------- 401 errors after certificate renewal Reproducibility --------------- Happened once, but in theory it should be 100% reproducible; it is just hard to reproduce because of the long expiry period. System Configuration -------------------- Distributed Cloud Branch/Pull Time/Commit ----------------------- 2021-06-09 Last Pass --------- Not tested before Timestamp/Logs -------------- cron.log: 2022-10-21T00:10:01.000 controller-0 CROND[2895533]: info (root) CMD (/usr/bin/kube-cert-rotation.sh) 2022-10-21T00:10:01.000 controller-0 CROND[2895539]: info (root) CMD (/usr/lib64/sa/sa1 1 1) 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for serving the Kubernetes API renewed) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the API server to connect to kubelet renewed) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the front proxy client renewed) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the controller manager to use renewed) 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...) 
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml') 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT () 2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the scheduler manager to use renewed) 2022-10-21T00:10:05.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (sysinv-conductor) is restarting.) 2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (cert-mon) is restarting.) 2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (FM_ERR_ENTITY_NOT_FOUND) audit.log: 2022-11-04 03:41:07.685 433961 INFO dcmanager.audit.subcloud_audit_manager [-] Triggered subcloud audit: patch=(True) firmware=(True) kube=(True) 2022-11-04 03:41:12.503 433961 WARNING cgtsclient.common.http [-] Request returned failure status. 2022-11-04 03:41:12.504 433961 ERROR dcmanager.audit.subcloud_audit_manager [-] Error in periodic subcloud audit loop: HTTPInternalServerError: (401) Reason: Unauthorized HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401} sysinv.log: sysinv 2022-11-04 03:41:12.499 3494410 ERROR wsme.api [-] Server-side error: "(401) Reason: Unauthorized HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401} ". 
Detail: Traceback (most recent call last):   File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction     result = f(self, *args, **kwargs)   File "/usr/lib64/python2.7/site-packages/sysinv/api/controllers/v1/kube_version.py", line 117, in get_all     version_states = self._kube_operator.kube_get_version_states()   File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 664, in kube_get_version_states     cp_versions = self.kube_get_control_plane_versions()   File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 619, in kube_get_control_plane_versions     label_selector="node-role.kubernetes.io/master")   File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 13437, in list_node     (data) = self.list_node_with_http_info(**kwargs) Test Activity ------------- Found during normal use Workaround ---------- Restarting the sysinv-api service resolves the issue. It can be done by running: sudo sm-restart service sysinv-conductor
2022-12-21 19:51:05 Gustavo Herzmann starlingx: assignee Gustavo Herzmann (gherzman)
2022-12-21 19:51:07 Gustavo Herzmann starlingx: status New In Progress
2023-01-16 14:33:43 Ghada Khalil tags stx.distcloud
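
Note on the timeline in the description: the delay between the rotation and the first 401 can be modelled with a small sketch. It is not taken from sysinv or dcmanager. It assumes that a service loads the admin.conf certificate once at start-up and never reloads it, which is how the description explains the failure. The function name first_unauthorized_time and the old certificate's expiry time are hypothetical; the log excerpts do not state the actual expiry. The rotation and restart times are the ones shown in cron.log.

```python
from datetime import datetime
from typing import Optional


def first_unauthorized_time(
    old_cert_not_after: datetime,
    rotated_at: datetime,
    last_restart: datetime,
) -> Optional[datetime]:
    """Return when a service would start receiving 401s, or None if it would not.

    Assumption (hypothetical model): the service reads the admin.conf
    certificate only at start-up and keeps using that copy afterwards.
    """
    if last_restart >= rotated_at:
        # Restarted after the rotation, so it holds the renewed certificate.
        return None
    # Still holding the pre-rotation certificate: it keeps working until
    # that certificate expires, then requests are rejected.
    return old_cert_not_after


if __name__ == "__main__":
    rotated_at = datetime(2022, 10, 21, 0, 10, 3)  # admin.conf renewal in cron.log
    # Hypothetical expiry of the old certificate (not given in the logs).
    old_expiry = datetime(2022, 11, 4, 0, 0, 0)

    # sysinv-conductor was restarted by the script at 00:10:05.
    print(first_unauthorized_time(old_expiry, rotated_at,
                                  datetime(2022, 10, 21, 0, 10, 5)))
    # A service last started before the rotation (start time is illustrative).
    print(first_unauthorized_time(old_expiry, rotated_at,
                                  datetime(2022, 6, 1, 0, 0, 0)))
```

Under this model a service restarted by the rotation script is unaffected, while one left running fails only once the old certificate expires, which matches the gap between the cron.log and audit.log timestamps.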