k8s 1.16: openstack-cloud-controller-manager pod stuck in CrashLoopBackOff
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
CDK Addons | Fix Released | High | Cory Johns | | |
Bug Description
Our fresh deploy of 1.16 on Serverstack is timing out waiting for 8 kube-system pods to start.
The openstack-
ubuntu@
NAME READY STATUS RESTARTS AGE
coredns-
coredns-
csi-cinder-
heapster-
heapster-
kubernetes-
metrics-
monitoring-
openstack-
openstack-
openstack-
This only happens in 1.16. We've had passes with our 1.15 bundle. The only difference between the bundles is the charm revisions.
Our run can be found here:
https:/
The artifacts and logs can be found here:
https:/
Changed in charm-kubernetes-master: | |
assignee: | nobody → Cory Johns (johnsca) |
status: | New → In Progress |
importance: | Undecided → High |
no longer affects: | charm-kubernetes-master |
Changed in cdk-addons: | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Cory Johns (johnsca) |
milestone: | none → 1.16 |
tags: | added: cdo-release-blocker |
Changed in cdk-addons: | |
status: | In Progress → Fix Committed |
Changed in cdk-addons: | |
status: | Fix Committed → Fix Released |
This seems to be an upstream issue[1]: external cloud providers pull in too much logic from the in-tree code, so the cloud controller manager attempts unnecessary authentication checks, which fail when RBAC is enabled with:
W0924 05:33:34.037050 1 authentication.go:262] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
The DigitalOcean folks hit this[2] and came up with the workaround of adding --authentication-skip-lookup=true to the CCM pod's args, which seems reasonable until the upstream code is refactored.
[1]: https://github.com/kubernetes/cloud-provider/issues/29
[2]: https://github.com/digitalocean/digitalocean-cloud-controller-manager/issues/217
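For anyone wanting to apply the workaround by hand, the change amounts to appending one flag to the CCM container's args. Here is a minimal sketch in Python, assuming the manifest has already been loaded into a dict and that the pod spec sits under spec.template.spec, as it does for a DaemonSet or Deployment. The helper name, the default container name and the example args are hypothetical and are not taken from the actual cdk-addons fix.

```python
# Flag described in the workaround above.
SKIP_LOOKUP_FLAG = "--authentication-skip-lookup=true"


def add_skip_lookup_flag(manifest, container_name="openstack-cloud-controller-manager"):
    """Append the skip-lookup flag to the named container's args.

    `manifest` is a dict shaped like a DaemonSet/Deployment manifest.
    `container_name` is an assumed name; adjust it to match your manifest.
    Returns True if the flag was added, False if it was already present.
    """
    containers = manifest["spec"]["template"]["spec"]["containers"]
    for container in containers:
        if container.get("name") != container_name:
            continue
        args = container.setdefault("args", [])
        # Leave an existing setting alone, whatever its value.
        if any(a.split("=", 1)[0] == "--authentication-skip-lookup" for a in args):
            return False
        args.append(SKIP_LOOKUP_FLAG)
        return True
    raise KeyError("container %r not found in manifest" % container_name)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed manifest for illustration only.
    example = {
        "spec": {"template": {"spec": {"containers": [
            {"name": "openstack-cloud-controller-manager",
             "args": ["--cloud-provider=openstack"]},
        ]}}}
    }
    add_skip_lookup_flag(example)
    print(example["spec"]["template"]["spec"]["containers"][0]["args"])
```

The check for an existing flag keeps the helper safe to run more than once against the same manifest.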