Kube root CA upgrade strategy failed - kube-rootca-host-update phase trust-new-ca rejected

Bug #1978365 reported by Kaustubh Dhokte
This bug affects 1 person
Affects   | Status       | Importance | Assigned to     | Milestone
StarlingX | Fix Released | Medium     | Kaustubh Dhokte |

Bug Description

*+Brief Description+*

Kubernetes root CA update orchestration failed.

{code:java}
Logs from subcloud1 nfv-vim.log
2022-05-17T20:46:55.932 controller-0 VIM_Thread[1371003] INFO nfvi_infrastructure_api.py.1086 Existing Host state for controller-0 is updated-host-update-certs
2022-05-17T20:46:56.060 controller-0 VIM_Thread[1371003] ERROR Caught API exception while trying kube-rootca-update-host. error=[OpenStack Rest-API Exception: method=POST, url=https://[2620:10a:a001:ac12::42]:6386/v1/ihosts/ca8cc06a-7491-4cc0-b65c-e3f211644489/kube_update_ca , headers={'Content-Type': 'application/json', 'User-Agent': 'vim/1.0'}, body={"phase": "trust-new-ca"}, status_code=400, reason=HTTP Error 400: Bad Request, response_headers=[('Date', 'Tue, 17 May 2022 20:46:56 GMT'), ('Content-Length', '199'), ('Strict-Transport-Security', 'max-age=63072000; includeSubDomains'), ('Content-Type', 'application/json')], response_body={"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"kube-rootca-host-update phase trust-new-ca rejected: failed to get new root CA cert secret from kubernetes.\"}"}]
Traceback (most recent call last):

dcmanager kube-rootca-update-strategy show
+------------------------+----------------------------+
| Field                  | Value                      |
+------------------------+----------------------------+
| strategy type          | kube-rootca-update         |
| subcloud apply type    | None                       |
| max parallel subclouds | None                       |
| stop on failure        | False                      |
| state                  | failed                     |
| created_at             | 2022-05-17 21:06:04.824714 |
| updated_at             | 2022-05-17 21:08:37.273878 |
+------------------------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get nodes
NAME           STATUS   ROLES                  AGE     VERSION
compute-0      Ready    <none>                 6d20h   v1.21.8
compute-1      Ready    <none>                 6d20h   v1.21.8
compute-2      Ready    <none>                 6d20h   v1.21.8
compute-3      Ready    <none>                 6d20h   v1.21.8
compute-4      Ready    <none>                 6d20h   v1.21.8
compute-5      Ready    <none>                 6d20h   v1.21.8
controller-0   Ready    control-plane,master   6d21h   v1.21.8
controller-1   Ready    control-plane,master   6d21h   v1.21.8

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get clusterrole
NAME CREATED AT
admin 2022-05-11T18:41:52Z
armada-api-runner 2022-05-11T18:42:12Z
calico-kube-controllers 2022-05-11T18:41:56Z
calico-node 2022-05-11T18:41:56Z
cephfs-provisioner 2022-05-11T19:36:42Z
cluster-admin 2022-05-11T18:41:52Z
cm-cert-manager-cainjector 2022-05-11T18:46:03Z
cm-cert-manager-controller-approve:cert-manager-io 2022-05-11T18:46:03Z
cm-cert-manager-controller-certificates 2022-05-11T18:46:03Z
cm-cert-manager-controller-certificatesigningrequests 2022-05-11T18:46:03Z
cm-cert-manager-controller-challenges 2022-05-11T18:46:03Z
cm-cert-manager-controller-clusterissuers 2022-05-11T18:46:03Z
cm-cert-manager-controller-ingress-shim 2022-05-11T18:46:03Z
cm-cert-manager-controller-issuers 2022-05-11T18:46:03Z
cm-cert-manager-controller-orders 2022-05-11T18:46:03Z
cm-cert-manager-edit 2022-05-11T18:46:03Z
cm-cert-manager-view 2022-05-11T18:46:03Z
cm-cert-manager-webhook:subjectaccessreviews 2022-05-11T18:46:03Z
edit 2022-05-11T18:41:52Z
ic-nginx-ingress-ingress-nginx 2022-05-11T18:45:25Z
kubeadm:get-nodes 2022-05-11T18:41:54Z
manager-role 2022-05-11T18:46:36Z
mon-elastic-services 2022-05-13T15:18:15Z
mon-filebeat-cluster-role 2022-05-13T15:38:46Z
mon-ingress-nginx 2022-05-13T15:15:28Z
mon-kube-state-metrics 2022-05-13T15:16:00Z
mon-metricbeat-cluster-role 2022-05-13T15:40:14Z
multus 2022-05-11T18:41:59Z
platform-deployment-manager-proxy-role 2022-05-11T18:46:36Z
privileged-psp-user 2022-05-11T18:42:02Z
rbd-provisioner 2022-05-11T19:36:16Z
restricted-psp-user 2022-05-11T18:42:02Z
system:aggregate-to-admin 2022-05-11T18:41:52Z
system:aggregate-to-edit 2022-05-11T18:41:52Z
system:aggregate-to-view 2022-05-11T18:41:52Z
system:auth-delegator 2022-05-11T18:41:52Z
system:basic-user 2022-05-11T18:41:52Z
system:certificates.k8s.io:certificatesigningrequests:nodeclient 2022-05-11T18:41:52Z
system:certificates.k8s.io:certificatesigningrequests:selfnodeclient 2022-05-11T18:41:52Z
system:certificates.k8s.io:kube-apiserver-client-approver 2022-05-11T18:41:52Z
system:certificates.k8s.io:kube-apiserver-client-kubelet-approver 2022-05-11T18:41:52Z
system:certificates.k8s.io:kubelet-serving-approver 2022-05-11T18:41:52Z
system:certificates.k8s.io:legacy-unknown-approver 2022-05-11T18:41:52Z
system:controller:attachdetach-controller 2022-05-11T18:41:52Z
system:controller:certificate-controller 2022-05-11T18:41:52Z
system:controller:clusterrole-aggregation-controller 2022-05-11T18:41:52Z
system:controller:cronjob-controller 2022-05-11T18:41:52Z
system:controller:daemon-set-controller 2022-05-11T18:41:52Z
system:controller:deployment-controller 2022-05-11T18:41:52Z
system:controller:disruption-controller 2022-05-11T18:41:52Z
system:controller:endpoint-controller 2022-05-11T18:41:52Z
system:controller:endpointslice-controller 2022-05-11T18:41:52Z
system:controller:endpointslicemirroring-controller 2022-05-11T18:41:52Z
system:controller:ephemeral-volume-controller 2022-05-11T18:41:52Z
system:controller:expand-controller 2022-05-11T18:41:52Z
system:controller:generic-garbage-collector 2022-05-11T18:41:52Z
system:controller:horizontal-pod-autoscaler 2022-05-11T18:41:52Z
system:controller:job-controller 2022-05-11T18:41:52Z
system:controller:namespace-controller 2022-05-11T18:41:52Z
system:controller:node-controller 2022-05-11T18:41:52Z
system:controller:persistent-volume-binder 2022-05-11T18:41:52Z
system:controller:pod-garbage-collector 2022-05-11T18:41:52Z
system:controller:pv-protection-controller 2022-05-11T18:41:52Z
system:controller:pvc-protection-controller 2022-05-11T18:41:52Z
system:controller:replicaset-controller 2022-05-11T18:41:52Z
system:controller:replication-controller 2022-05-11T18:41:52Z
system:controller:resourcequota-controller 2022-05-11T18:41:52Z
system:controller:root-ca-cert-publisher 2022-05-11T18:41:52Z
system:controller:route-controller 2022-05-11T18:41:52Z
system:controller:service-account-controller 2022-05-11T18:41:52Z
system:controller:service-controller 2022-05-11T18:41:52Z
system:controller:statefulset-controller 2022-05-11T18:41:52Z
system:controller:ttl-after-finished-controller 2022-05-11T18:41:52Z
system:controller:ttl-controller 2022-05-11T18:41:52Z
system:coredns 2022-05-11T18:41:54Z
system:discovery 2022-05-11T18:41:52Z
system:heapster 2022-05-11T18:41:52Z
system:kube-aggregator 2022-05-11T18:41:52Z
system:kube-controller-manager 2022-05-11T18:41:52Z
system:kube-dns 2022-05-11T18:41:52Z
system:kube-scheduler 2022-05-11T18:41:52Z
system:kubelet-api-admin 2022-05-11T18:41:52Z
system:monitoring 2022-05-11T18:41:52Z
system:node 2022-05-11T18:41:52Z
system:node-bootstrapper 2022-05-11T18:41:52Z
system:node-problem-detector 2022-05-11T18:41:52Z
system:node-proxier 2022-05-11T18:41:52Z
system:persistent-volume-provisioner 2022-05-11T18:41:52Z
system:public-info-viewer 2022-05-11T18:41:52Z
system:service-account-issuer-discovery 2022-05-11T18:41:52Z
system:volume-scheduler 2022-05-11T18:41:52Z
view 2022-05-11T18:41:52Z
{code}
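The 400 response in the nfv-vim log nests one JSON document inside the "error_message" string of another, so the sysinv fault text is only readable after decoding twice. A minimal Python sketch for pulling it out of a captured response_body; the helper name extract_faultstring is hypothetical and not part of the VIM or sysinv code:

```python
import json


def extract_faultstring(response_body):
    """Return the faultstring from a sysinv error body (hypothetical helper).

    The outer JSON object carries the real fault as a JSON-encoded string
    under "error_message", so the body has to be decoded twice.
    """
    outer = json.loads(response_body)
    inner = json.loads(outer["error_message"])
    return inner["faultstring"]


# response_body copied from the nfv-vim log above.
body = (
    '{"error_message": "{\\"debuginfo\\": null, \\"faultcode\\": \\"Client\\", '
    '\\"faultstring\\": \\"kube-rootca-host-update phase trust-new-ca rejected: '
    'failed to get new root CA cert secret from kubernetes.\\"}"}'
)
print(extract_faultstring(body))
```

Reassembled this way, the body is exactly 199 bytes long, which matches the Content-Length header in the log.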

Output from subcloud1:
The kubectl get clusterrole and kubectl get role commands fail in the subclouds:

{code:java}
sw-manager kube-rootca-update-strategy show
Strategy Kubernetes RootCA Update Strategy:
  strategy-uuid: 175e1f58-66c5-4b1a-bb89-47eabac74231
  controller-apply-type: serial
  storage-apply-type: parallel
  worker-apply-type: parallel
  max-parallel-worker-hosts: 10
  default-instance-action: migrate
  alarm-restrictions: relaxed
  current-phase: abort
  current-phase-completion: 100%
  state: aborted
  apply-result: failed
  apply-reason: remote error: apiexception (403)
reason: forbidden
http response headers: httpheaderdict({'content-length': '256', 'x-content-type-options': 'nosniff', 'x-kubernetes-pf-prioritylevel-uid': 'c199c7aa-c5a4-48be-a84a-43fbc163f6fc', 'cache-control': 'no-cache, private', 'date': 'tue, 17 may 2022 21:07:23 gmt', 'x-kubernetes-pf-flowschema-uid': 'ef9c6975-7b47-4e6e-adef-c93f7b7de4b7', 'content-type': 'application/json'})
http response body: {"kind":"status","apiversion":"v1","metadata":{},"status":"failure","message":"nodes is forbidden: user \"kubernetes-admin\" cannot list resource \"nodes\" in api group \"\" at the cluster scope","reason":"forbidden","details":{"kind":"nodes"},"code":403}

[u'traceback (most recent call last):\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 436, in _process_data\n **args)\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 12088, in get_system_health\n alarm_ignore_list=alarm_ignore_list)\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/common/health.py", line 525, in get_system_health_kube_upgrade\n alarm_ignore_list=alarm_ignore_list)\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/common/health.py", line 385, in get_system_health\n success, error_nodes = self._check_kube_nodes_ready()\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/common/health.py", line 217, in _check_kube_nodes_ready\n nodes = self._kube_operator.kube_get_nodes()\n', u' file "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 296, in kube_get_nodes\n api_response = self._get_kubernetesclient_core().list_node()\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 13437, in list_node\n (data) = self.list_node_with_http_info(**kwargs)\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 13534, in list_node_with_http_info\n collection_formats=collection_formats)\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 321, in call_api\n _return_http_data_only, collection_formats, _preload_content, _request_timeout)\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 155, in __call_api\n _request_timeout=_request_timeout)\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 342, in request\n headers=headers)\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/rest.py", line 231, in get\n 
query_params=query_params)\n', u' file "/usr/lib/python2.7/site-packages/kubernetes/client/rest.py", line 222, in request\n raise apiexception(http_resp=r)\n', u'apiexception: (403)\nreason: forbidden\nhttp response headers: httpheaderdict({\'content-length\': \'256\', \'x-content-type-options\': \'nosniff\', \'x-kubernetes-pf-prioritylevel-uid\': \'c199c7aa-c5a4-48be-a84a-43fbc163f6fc\', \'cache-control\': \'no-cache, private\', \'date\': \'tue, 17 may 2022 21:07:23 gmt\', \'x-kubernetes-pf-flowschema-uid\': \'ef9c6975-7b47-4e6e-adef-c93f7b7de4b7\', \'content-type\': \'application/json\'})\nhttp response body: {"kind":"status","apiversion":"v1","metadata":{},"status":"failure","message":"nodes is forbidden: user \\"kubernetes-admin\\" cannot list resource \\"nodes\\" in api group \\"\\" at the cluster scope","reason":"forbidden","details":{"kind":"nodes"},"code":403}\n\n\n']
  abort-result: success
  abort-reason:

 kubectl get nodes
Error from server (Forbidden): nodes is forbidden: User "kubernetes-admin" cannot list resource "nodes" in API group "" at the cluster scope
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get clusterrole
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io is forbidden: User "kubernetes-admin" cannot list resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope
{code}
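In subcloud1 the failure is different: requests made as kubernetes-admin are rejected with 403 Forbidden, including the node listing done by the sysinv health check (see _check_kube_nodes_ready in the traceback). When triaging many subclouds it can help to reduce these messages to the denied user, verb, resource and API group. A minimal Python sketch; parse_forbidden and its regular expression are hypothetical, written against the messages quoted in this report rather than any documented Kubernetes format:

```python
import json
import re

# Pattern for RBAC "forbidden" messages, based only on the messages in this
# report (an assumption, not a documented format). IGNORECASE is needed
# because sw-manager lowercased its copy of the message.
_FORBIDDEN_RE = re.compile(
    r'is forbidden: user "(?P<user>[^"]*)" cannot (?P<verb>\S+) '
    r'resource "(?P<resource>[^"]*)" in api group "(?P<group>[^"]*)"',
    re.IGNORECASE,
)


def parse_forbidden(message):
    """Split an RBAC denial message into user, verb, resource and API group."""
    match = _FORBIDDEN_RE.search(message)
    if match is None:
        raise ValueError("not an RBAC forbidden message: %r" % message)
    return {
        "user": match.group("user"),
        "verb": match.group("verb"),
        "resource": match.group("resource"),
        "api_group": match.group("group"),
    }


# Status body from the sw-manager apply-reason above.
status = json.loads(
    '{"kind":"status","apiversion":"v1","metadata":{},"status":"failure",'
    '"message":"nodes is forbidden: user \\"kubernetes-admin\\" cannot list '
    'resource \\"nodes\\" in api group \\"\\" at the cluster scope",'
    '"reason":"forbidden","details":{"kind":"nodes"},"code":403}'
)
print(status["code"], parse_forbidden(status["message"]))

# Message printed by kubectl on the same subcloud.
print(parse_forbidden(
    'clusterroles.rbac.authorization.k8s.io is forbidden: User "kubernetes-admin" '
    'cannot list resource "clusterroles" in API group "rbac.authorization.k8s.io" '
    'at the cluster scope'
))
```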

*+Severity+*


Minor: Failed to upgrade

*+Steps to Reproduce+*

1. Install 250 AWS subclouds.
2. dcmanager kube-rootca-update-strategy create --expiry-date 2030-01-01 --max-parallel-subclouds 250 --force
3. dcmanager kube-rootca-update-strategy apply

*+Expected Behavior+*

The Kubernetes root CA update strategy is applied successfully.

*+Actual Behavior+*

The Kubernetes root CA update strategy fails to apply.

*+Reproducibility+*

Reproducible

description: updated
Changed in starlingx:
assignee: nobody → Kaustubh Dhokte (kdhokte)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/845490
Committed: https://opendev.org/starlingx/config/commit/144f6fc9c5d81217ae4887711ef36236215e9426
Submitter: "Zuul (22348)"
Branch: master

commit 144f6fc9c5d81217ae4887711ef36236215e9426
Author: Kaustubh Dhokte <email address hidden>
Date: Fri Jun 10 20:30:49 2022 -0400

    Update certs spec to work with version v1

    The change https://review.opendev.org/c/starlingx/config/+/838594
    updated the certificate apiVersion from cert-manager.io/v1alpha2 to
    cert-manager.io/v1, but did not make the changes to the certificate
    specs that the new version requires.
    This change makes only the certificate spec changes required to work
    with cert-manager.io/v1.

    The spec field organization[] must now be subject:organizations[].
    See the difference between
    https://cert-manager.io/v0.13-docs/reference/api-docs/#cert-manager.io/v1alpha2.Certificate
    and https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.CertificateSpec

    The organization 'system:masters' in the admin.conf certificate is
    required to authorize kubernetes-admin's access to cluster objects.
    This authorization is granted by the 'cluster-admin'
    clusterrolebinding. Without this change, all kubectl commands fail.

    In v1, unlike in v1alpha2, the CN is ignored by TLS clients during
    authorization if any subject alt name is set
    (https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.CertificateSpec).
    My initial understanding was that the CN field value was being ignored
    because of subject:organizations:['system:masters'] (in v1), since all
    the deployment and daemonset pods were failing with an authorization
    error for the user 'kube-apiserver-kubelet-client' after
    "system kube-rootca-pods-update --phase=trust-new-ca" was run during
    the root CA update.
    These failures force the removal of organizations from the apiserver
    kubelet client certificate, because all deployment and daemonset pods
    authenticate and authorize as the 'kube-apiserver-kubelet-client' user.

    Without 'system:nodes' in the kubelet client certificate,
    kube-scheduler and kube-controller-manager fail to authorize.
    More Info: https://kubernetes.io/docs/reference/access-authn-authz/node/

    Test Plan:
    On CentOS AIO-SX:
    PASS: Manual kubernetes RootCA update successful
    PASS: Orchestrated kubernetes RootCA update successful.
    PASS: All deployments, daemonsets and pods running as expected after
          RootCA update.

    Closes-Bug: 1978365

    Signed-off-by: Kaustubh Dhokte <email address hidden>
    Change-Id: I767a70a07ab540510e4eb734cb4e282c9918840c
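The organization[] to subject:organizations[] move described in the commit message can be expressed as a small transformation on a Certificate manifest held as a Python dict. This is a sketch only: like the commit, it handles nothing but the organization field, and the manifest values (the name admin-cert, the namespace, secret and issuer names) are hypothetical placeholders, not the ones StarlingX uses:

```python
import copy


def move_organization_to_v1(cert):
    """Return a copy of a cert-manager Certificate with spec.organization
    (v1alpha2) moved to spec.subject.organizations (v1)."""
    new = copy.deepcopy(cert)
    new["apiVersion"] = "cert-manager.io/v1"
    spec = new.setdefault("spec", {})
    orgs = spec.pop("organization", [])
    if orgs:
        existing = spec.setdefault("subject", {}).setdefault("organizations", [])
        # Append without duplicating entries already under subject.
        for org in orgs:
            if org not in existing:
                existing.append(org)
    return new


# Hypothetical v1alpha2 manifest for an admin client certificate.
v1alpha2_cert = {
    "apiVersion": "cert-manager.io/v1alpha2",
    "kind": "Certificate",
    "metadata": {"name": "admin-cert", "namespace": "example"},
    "spec": {
        "commonName": "kubernetes-admin",
        "organization": ["system:masters"],
        "secretName": "admin-cert-secret",
        "issuerRef": {"name": "example-issuer", "kind": "Issuer"},
    },
}

v1_cert = move_organization_to_v1(v1alpha2_cert)
print(v1_cert["spec"]["subject"])
```

The input dict is deep-copied so the original manifest stays untouched and can be compared against the converted one.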

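The authorization points in the commit message rest on how the API server turns an X.509 client certificate into an identity: the subject CN becomes the user name and each O value becomes a group. The Python model below is a simplified sketch, not the API server's code. It assumes, as the commit states, that kubernetes-admin's access comes from the 'cluster-admin' ClusterRoleBinding to system:masters, and it covers only the identity part of the Node authorizer, which additionally restricts what each node may read. The node credential is a hypothetical example:

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class UserInfo(NamedTuple):
    name: str
    groups: Tuple[str, ...]


def user_from_client_cert(common_name: str, organizations: Sequence[str]) -> UserInfo:
    """Map a client certificate subject to a Kubernetes identity.

    X.509 client-certificate authentication takes the CN as the user name and
    each O entry as a group; every authenticated user also gets
    system:authenticated.
    """
    return UserInfo(common_name, tuple(organizations) + ("system:authenticated",))


# Simplified subjects of the 'cluster-admin' ClusterRoleBinding, following
# the commit message: access is granted to the group system:masters.
CLUSTER_ADMIN_SUBJECTS = {("Group", "system:masters")}


def has_cluster_admin(user: UserInfo) -> bool:
    """True if the user, or one of its groups, is a cluster-admin subject."""
    if ("User", user.name) in CLUSTER_ADMIN_SUBJECTS:
        return True
    return any(("Group", g) in CLUSTER_ADMIN_SUBJECTS for g in user.groups)


def node_name_for(user: UserInfo) -> Optional[str]:
    """Return the node name if the Node authorizer would see a node.

    The credential must be in the system:nodes group and use the user name
    system:node:<nodeName>; otherwise None is returned.
    """
    prefix = "system:node:"
    if "system:nodes" not in user.groups or not user.name.startswith(prefix):
        return None
    return user.name[len(prefix):] or None


# admin.conf certificate without O=system:masters (the state before the fix).
admin_before = user_from_client_cert("kubernetes-admin", [])
# admin.conf certificate with O=system:masters (after the fix).
admin_after = user_from_client_cert("kubernetes-admin", ["system:masters"])
print(has_cluster_admin(admin_before), has_cluster_admin(admin_after))

# Hypothetical node credential for controller-0.
kubelet = user_from_client_cert("system:node:controller-0", ["system:nodes"])
print(node_name_for(kubelet))
```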
Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.7.0 stx.security
Changed in starlingx:
importance: Undecided → Medium