dcmanager receives 401 when querying Kubernetes versions through sysinv-api after admin.conf certificate renewal

Bug #2000256 reported by Gustavo Herzmann
Affects: StarlingX
Status: In Progress
Importance: Undecided
Assigned to: Gustavo Herzmann

Bug Description

Brief Description
-----------------
The dcmanager audit process receives 401 (Unauthorized) errors when it queries the Kubernetes versions through the sysinv API. This happens after the kube-cert-rotation.sh script rotates the admin.conf certificate, because the sysinv API service (sysinv-inv) is not restarted after the certificate renewal.
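
The description above suggests that the sysinv API process keeps using the client credentials it loaded before the rotation. As a rough illustration of that failure mode and one possible way around it, the sketch below caches a client built from a credentials file and rebuilds it whenever the file changes on disk. The class name ReloadingClient and the factory callable are hypothetical and are not taken from the sysinv code; the fix actually adopted for this bug may be different (for example, restarting the service from the rotation script).

```python
import os
from typing import Any, Callable, Optional, Tuple


class ReloadingClient:
    """Cache a client built from a credentials file and rebuild it when
    the file changes on disk (hypothetical helper, not sysinv code)."""

    def __init__(self, path: str, factory: Callable[[str], Any]) -> None:
        self._path = path
        # factory is an assumption: any callable that builds a client
        # from the path of a kubeconfig-style file.
        self._factory = factory
        self._client: Optional[Any] = None
        self._stamp: Optional[Tuple[int, int, int]] = None

    def get(self) -> Any:
        """Return the cached client, rebuilding it if the file changed."""
        st = os.stat(self._path)
        # mtime alone can miss a quick rewrite, so size and inode are
        # compared as well.
        stamp = (st.st_mtime_ns, st.st_size, st.st_ino)
        if self._client is None or stamp != self._stamp:
            self._client = self._factory(self._path)
            self._stamp = stamp
        return self._client
```

A process that builds its client once at startup behaves like calling the factory a single time: it keeps presenting the old certificate until it is restarted, which matches the symptom reported here.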

This is related to https://bugs.launchpad.net/starlingx/+bug/1943080

Severity
--------
<Minor: System/Feature is usable with minor issue>

Steps to Reproduce
------------------
Wait for the admin.conf certificate to get close to its expiration date (it has a 1 year expiry, which can be checked with kubeadm certs check-expiration).
Fifteen days before the expiration date, kube-cert-rotation.sh will rotate the certificate (a cronjob runs the script every day at 00:10).
Fifteen days after the rotation (when the old certificate expires), check the dcmanager audit.log and sysinv.log for 401 errors.
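
The last step can be scripted. The following is a minimal sketch that filters log lines for the two 401 signatures that appear in the logs attached to this report. The function names are hypothetical, and the log file locations are left to the caller because they are not stated in this report.

```python
import re
from typing import Iterable, List

# The two 401 signatures seen in the audit.log and sysinv.log excerpts
# of this report.
UNAUTHORIZED = re.compile(r'HTTPInternalServerError: \(401\)|"code":401')


def find_unauthorized(lines: Iterable[str]) -> List[str]:
    """Return the lines that carry one of the 401 signatures."""
    return [line.rstrip("\n") for line in lines if UNAUTHORIZED.search(line)]


def scan_file(path: str) -> List[str]:
    """Scan one log file; the path is supplied by the caller."""
    with open(path, errors="replace") as handle:
        return find_unauthorized(handle)
```

Calling scan_file() on the dcmanager audit.log and on sysinv.log should return matching lines once the old certificate has expired, and an empty list before that.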

Expected Behavior
------------------
No errors after certificate renewal

Actual Behavior
----------------
401 errors after certificate renewal

Reproducibility
---------------
Happened once. In theory it should be 100% reproducible, but it is hard to reproduce because of the long certificate expiry period.
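
To make the timing concrete, the sketch below works out when the rotation would run and when a process that still holds the old certificate would start failing. It assumes the 15 day window described in the steps to reproduce and the 00:10 cron time seen in cron.log, and it assumes the script rotates on the first daily run that falls inside that window. The expiry date used in the example is hypothetical.

```python
from datetime import datetime, time, timedelta

ROTATION_WINDOW = timedelta(days=15)  # from the steps to reproduce
CRON_TIME = time(0, 10)               # from cron.log: the job fires at 00:10


def first_rotation_run(not_after: datetime) -> datetime:
    """Return the first daily cron run at or after (not_after - 15 days).

    Assumption: the script rotates as soon as a daily run falls inside
    the 15 day window before the certificate expires.
    """
    window_start = not_after - ROTATION_WINDOW
    run = datetime.combine(window_start.date(), CRON_TIME)
    if run < window_start:
        run += timedelta(days=1)
    return run


# Hypothetical expiry date, chosen only for illustration.
old_cert_expiry = datetime(2022, 11, 4, 3, 0, 0)
print("rotation runs at:", first_rotation_run(old_cert_expiry))
print("stale processes start failing at:", old_cert_expiry)
```

Under these assumptions the errors only appear about two weeks after the rotation, and the rotation itself only happens about once a year, which is why a natural reproduction takes so long.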

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
2021-06-09

Last Pass
---------
Not tested before

Timestamp/Logs
--------------
cron.log:
2022-10-21T00:10:01.000 controller-0 CROND[2895533]: info (root) CMD (/usr/bin/kube-cert-rotation.sh)
2022-10-21T00:10:01.000 controller-0 CROND[2895539]: info (root) CMD (/usr/lib64/sa/sa1 1 1)
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:01.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for serving the Kubernetes API renewed)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the API server to connect to kubelet renewed)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate for the front proxy client renewed)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:02.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the controller manager to use renewed)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT ()
2022-10-21T00:10:03.000 controller-0 CROND[2895425]: info (root) CMDOUT (certificate embedded in the kubeconfig file for the scheduler manager to use renewed)
2022-10-21T00:10:05.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (sysinv-conductor) is restarting.)
2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (Service (cert-mon) is restarting.)
2022-10-21T00:10:07.000 controller-0 CROND[2895425]: info (root) CMDOUT (FM_ERR_ENTITY_NOT_FOUND)

audit.log:
2022-11-04 03:41:07.685 433961 INFO dcmanager.audit.subcloud_audit_manager [-] Triggered subcloud audit: patch=(True) firmware=(True) kube=(True)
2022-11-04 03:41:12.503 433961 WARNING cgtsclient.common.http [-] Request returned failure status.
2022-11-04 03:41:12.504 433961 ERROR dcmanager.audit.subcloud_audit_manager [-] Error in periodic subcloud audit loop: HTTPInternalServerError: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

sysinv.log:
sysinv 2022-11-04 03:41:12.499 3494410 ERROR wsme.api [-] Server-side error: "(401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 04 Nov 2022 03:41:12 GMT', 'Content-Length': '129', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

". Detail:
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
    result = f(self, *args, **kwargs)

  File "/usr/lib64/python2.7/site-packages/sysinv/api/controllers/v1/kube_version.py", line 117, in get_all
    version_states = self._kube_operator.kube_get_version_states()

  File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 664, in kube_get_version_states
    cp_versions = self.kube_get_control_plane_versions()

  File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 619, in kube_get_control_plane_versions
    label_selector="node-role.kubernetes.io/master")

  File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 13437, in list_node
    (data) = self.list_node_with_http_info(**kwargs)

Test Activity
-------------
Found during normal use

Workaround
----------
Restarting the sysinv-api service (sysinv-inv) resolves the issue. This can be done by running:
sudo sm-restart service sysinv-inv

description: updated
Changed in starlingx:
assignee: nobody → Gustavo Herzmann (gherzman)
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.distcloud