k8s 1.16: openstack-cloud-controller-manager pod stuck in CrashLoopBackOff

Bug #1845231 reported by Joshua Genet
This bug affects 1 person
Affects      Status         Importance   Assigned to   Milestone
CDK Addons   Fix Released   High         Cory Johns    1.16

Bug Description

Our fresh deploy of 1.16 on Serverstack is timing out waiting for 8 kube-system pods to start.
The openstack-cloud-controller-manager pod is getting stuck in CrashLoopBackOff.

ubuntu@juju-99b6a7-kubernetes-9:~$ sudo kubectl get po -n kube-system
NAME                                             READY   STATUS             RESTARTS   AGE
coredns-568cb7d86-6zmqg                          0/1     Pending            0          167m
coredns-568cb7d86-qlg4f                          0/1     Pending            0          167m
csi-cinder-controllerplugin-0                    0/4     Pending            0          167m
heapster-v1.6.0-beta.1-59878976bc-m5dtr          0/4     Pending            0          155m
heapster-v1.6.0-beta.1-6747db6947-nrz79          0/4     Pending            0          159m
kubernetes-dashboard-68d79745d4-hxz98            0/1     Pending            0          167m
metrics-server-v0.3.4-589dbbf5f8-hkwrf           0/2     Pending            0          167m
monitoring-influxdb-grafana-v4-b6484f9bb-65mrm   0/2     Pending            0          167m
openstack-cloud-controller-manager-dfk54         0/1     CrashLoopBackOff   35         156m
openstack-cloud-controller-manager-sqdrg         0/1     CrashLoopBackOff   50         167m
openstack-cloud-controller-manager-xprjn         0/1     CrashLoopBackOff   39         159m

This only happens in 1.16. We've had passes with our 1.15 bundle. The only difference between the bundles is the charm revisions.

Our run can be found here:
https://solutions.qa.canonical.com/#/qa/testRun/b8932922-c8d9-4032-84fc-7ed1eded2672

The artifacts and logs can be found here:
https://oil-jenkins.canonical.com/artifacts/b8932922-c8d9-4032-84fc-7ed1eded2672/index.html

Cory Johns (johnsca) wrote :

This seems to be an upstream issue [1]: external cloud providers pull in too much logic from the in-tree code, which causes them to attempt unnecessary authentication checks. When RBAC is enabled, those checks fail with:

W0924 05:33:34.037050 1 authentication.go:262] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
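
For reference, the hint in the warning can be filled in from the error text itself: the service account is kube-system:cloud-controller-manager and the role is extension-apiserver-authentication-reader. A minimal sketch follows. The binding name ccm-auth-reader is a hypothetical placeholder, and this is only the generic suggestion printed in the log, not the fix adopted for this bug.

```shell
# Values taken from the warning and error text above.
NAMESPACE="kube-system"
ROLE="extension-apiserver-authentication-reader"
SERVICE_ACCOUNT="kube-system:cloud-controller-manager"
# Hypothetical binding name; any unused name would do.
BINDING_NAME="ccm-auth-reader"

# Wrapped in a function so nothing touches the cluster until it is called.
create_auth_reader_binding() {
  kubectl create rolebinding -n "$NAMESPACE" "$BINDING_NAME" \
    --role="$ROLE" \
    --serviceaccount="$SERVICE_ACCOUNT"
}
```

Calling create_auth_reader_binding from a node with admin kubectl access would create the binding.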

The DigitalOcean folks hit this [2] and came up with a workaround: add --authentication-skip-lookup=true to the CCM pod's args. That seems reasonable until the upstream code is refactored.

[1]: https://github.com/kubernetes/cloud-provider/issues/29
[2]: https://github.com/digitalocean/digitalocean-cloud-controller-manager/issues/217
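
To try that workaround by hand on a running cluster, one option is a JSON patch that appends the flag to the container's args. This is a sketch under several assumptions: that the CCM runs as a DaemonSet named openstack-cloud-controller-manager (inferred from the pod names in the report), that it is the first container in the pod, and that its spec already has an args list. A manual patch like this may also be overwritten when the addon manifests are reapplied.

```shell
NAMESPACE="kube-system"
# Assumed workload kind and name, inferred from the pod names in the report.
DAEMONSET="openstack-cloud-controller-manager"
FLAG="--authentication-skip-lookup=true"
# JSON Patch: the trailing "/-" appends to the existing args array of container 0.
PATCH='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"'"$FLAG"'"}]'

# Wrapped in a function so nothing touches the cluster until it is called.
add_skip_lookup_flag() {
  kubectl -n "$NAMESPACE" patch daemonset "$DAEMONSET" --type=json -p "$PATCH"
}
```

After calling add_skip_lookup_flag, the pods should be recreated with the new argument, and kubectl get po -n kube-system shows whether they leave CrashLoopBackOff.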

Cory Johns (johnsca) wrote :
Changed in charm-kubernetes-master:
assignee: nobody → Cory Johns (johnsca)
status: New → In Progress
importance: Undecided → High
no longer affects: charm-kubernetes-master
Changed in cdk-addons:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Cory Johns (johnsca)
milestone: none → 1.16
Joshua Genet (genet022)
tags: added: cdo-release-blocker
Cory Johns (johnsca)
Changed in cdk-addons:
status: In Progress → Fix Committed
Changed in cdk-addons:
status: Fix Committed → Fix Released