kube-controller-manager crashes due to 5 duplicate certs in ca.crt

Bug #1926817 reported by Alex Zero
This bug affects 5 people
Affects          Status     Importance  Assigned to  Milestone
magnum (Ubuntu)  Confirmed  Undecided   Unassigned   -

Bug Description

On Kubernetes deployments with cert_manager_api enabled, the kube-controller-manager container fails to come up because /etc/kubernetes/certs/ca.crt contains five copies of the same public certificate. This causes the service to crash:

Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 bash[153677]: I0430 23:44:21.416326 1 job_controller.go:144] Starting job controller
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 bash[153677]: I0430 23:44:21.416470 1 shared_informer.go:223] Waiting for caches to sync for job
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 bash[153677]: I0430 23:44:21.429543 1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "csr-controller::/etc/kubernetes/certs/ca.crt::/etc/kubernetes/certs/ca.key"
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 bash[153677]: E0430 23:44:21.430347 1 controllermanager.go:521] Error starting "csrsigning"
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 bash[153677]: F0430 23:44:21.430532 1 controllermanager.go:235] error starting controllers: failed to start certificate controller: error reading CA cert file "csr-controller::/etc/kubernetes/certs/ca.crt::/etc/kubernetes/certs/ca.key": expected 1 certificate, found 5
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 podman[153677]: 2021-04-30 23:44:21.470875534 +0000 UTC m=+49.221008858 container died df7295074c1b7cbef19a79e6c8741b9dfbcb4fd608863978fb5924de8946ba05 (image=k8s.gcr.io/hyperkube:v1.18.2, name=kube-controller-manager)
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 systemd[1]: kube-controller-manager.service: Main process exited, code=exited, status=255/EXCEPTION
Apr 30 23:44:21 k8s-prod-24e2ug52zqb4-master-0 systemd[1]: kube-controller-manager.service: Failed with result 'exit-code'.
Apr 30 23:44:31 k8s-prod-24e2ug52zqb4-master-0 systemd[1]: kube-controller-manager.service: Scheduled restart job, restart counter is at 456.
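
To confirm the "expected 1 certificate, found 5" diagnosis on an affected master, the number of PEM blocks in ca.crt can be counted. The following is a minimal sketch, not part of Magnum; the helper name count_certificates is hypothetical, and it assumes the file holds standard PEM-encoded certificates.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_certificates(pem_text: str) -> int:
    """Return the number of PEM certificate blocks found in pem_text."""
    return len(PEM_CERT.findall(pem_text))
```

Calling count_certificates on the contents of /etc/kubernetes/certs/ca.crt should return 1 on a healthy node; any larger value matches the failure in the log above.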

Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in magnum (Ubuntu):
status: New → Confirmed
Till Plüer (tplueer) wrote:

I had the same problem, which caused the entire cluster creation to fail, since it blocks the installation of the network plugin and occm.

The problem is probably inside the generate_certificates function in make-cert-client.sh / make-cert.sh.

The function gets executed 5 times for server, kubelet, admin and proxy, and every time the CA certificate is appended to ca.crt:

curl $VERIFY_CA -X GET \
        -H "X-Auth-Token: $USER_TOKEN" \
        -H "OpenStack-API-Version: container-infra latest" \
        $MAGNUM_URL/certificates/$CLUSTER_UUID | python -c 'import sys, json; print(json.load(sys.stdin)["pem"])' >> $CA_CERT
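
If the append (>>) is indeed the cause, one possible remedy is to overwrite the file on each run so that repeated calls leave exactly one certificate. The sketch below illustrates that idea in Python rather than in the script's own shell; it is not the actual Magnum patch, and save_ca_cert is a hypothetical helper.

```python
from pathlib import Path


def save_ca_cert(path: Path, pem: str) -> None:
    """Write the CA certificate, replacing any previous content.

    This mirrors a shell `>` redirect instead of `>>`, so calling it
    several times with the same certificate does not create duplicates.
    """
    # Ensure the file ends with a newline, as PEM files conventionally do.
    path.write_text(pem if pem.endswith("\n") else pem + "\n")
```

Because write_text truncates the file before writing, five consecutive calls leave ca.crt with a single certificate.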

As a workaround, I just deleted the duplicate certificates inside ca.crt and restarted the kube-controller-manager. After that, the cluster was created successfully.
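
The manual cleanup can also be scripted. This is a minimal sketch under the assumption that ca.crt contains only PEM certificate blocks; dedupe_certificates is a hypothetical helper, and the original file should be backed up before it is overwritten.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def dedupe_certificates(pem_text: str) -> str:
    """Return pem_text with repeated certificate blocks removed.

    The first occurrence of each certificate is kept, in original order.
    """
    seen = set()
    unique = []
    for block in PEM_CERT.findall(pem_text):
        # Compare with all whitespace removed so that differing line
        # endings or trailing spaces do not hide a duplicate.
        key = "".join(block.split())
        if key not in seen:
            seen.add(key)
            unique.append(block)
    return ("\n".join(unique) + "\n") if unique else ""
```

Writing the returned text back to /etc/kubernetes/certs/ca.crt and restarting kube-controller-manager.service reproduces the workaround described above.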
