Completed pods in kube-system namespace cause "Waiting for X kube-system pods to start" status

Bug #1903055 reported by Diko Parvanov
This bug affects 5 people
Affects                          Status         Importance   Assigned to     Milestone
Kubernetes Control Plane Charm   Fix Released   Medium       Mateo Florido   1.27

Bug Description

kubernetes-master/0 waiting idle 0 <MASKED> 6443/tcp Waiting for 7 kube-system pods to start
kubernetes-master/1 waiting idle 1 <MASKED> 6443/tcp Waiting for 7 kube-system pods to start
kubernetes-master/2* waiting idle 2 <MASKED> 6443/tcp Waiting for 7 kube-system pods to start

DSV MODEL(kubernetes) jujumanage@jumphost-1:~$ kubectl -n kube-system get all
NAME READY STATUS RESTARTS AGE
pod/calico-kube-controllers-7b8948df7f-7vnb6 1/1 Running 0 40d
pod/coredns-6b59b8bd9f-v577w 1/1 Running 0 3d4h
pod/kube-state-metrics-69f474f8cb-74w9k 1/1 Running 0 40d
pod/metrics-server-v0.3.6-74c87686d-lgw4v 2/2 Running 0 13d
pod/node-shell-03efe4df-013a-46f8-a344-7f997ed00de7 0/1 Completed 0 4d7h
pod/node-shell-263d1e01-a845-4fe6-bc05-674ce912c1dc 0/1 Completed 0 4d7h
pod/node-shell-2d69cbcf-5525-4713-97a4-bf9820abde49 0/1 Completed 0 10d
pod/node-shell-41356d3f-7fc4-46e1-87a4-a3f6f6cceb26 0/1 Completed 0 4d7h
pod/node-shell-588c0c38-d3a6-426d-94b7-96f46371c0fc 0/1 Completed 0 16d
pod/node-shell-ab38b3fe-724c-4b2b-9ff2-8ed3abd2bfc0 0/1 Completed 0 4d7h
pod/node-shell-d7fd97a0-17ed-4393-86fa-c05de079e1b4 0/1 Completed 0 4d7h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP <MASKED> <none> 53/UDP,53/TCP,9153/TCP 40d
service/kube-state-metrics ClusterIP None <none> 8080/TCP,8081/TCP 40d
service/metrics-server ClusterIP <MASKED> <none> 443/TCP 13d

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/calico-kube-controllers 1/1 1 1 40d
deployment.apps/coredns 1/1 1 1 40d
deployment.apps/kube-state-metrics 1/1 1 1 40d
deployment.apps/metrics-server-v0.3.6 1/1 1 1 13d

NAME DESIRED CURRENT READY AGE
replicaset.apps/calico-kube-controllers-7b8948df7f 1 1 1 40d
replicaset.apps/coredns-6b59b8bd9f 1 1 1 40d
replicaset.apps/kube-state-metrics-69f474f8cb 1 1 1 40d
replicaset.apps/metrics-server-v0.3.6-74c87686d 1 1 1 13d

The node-shell pods are 'Completed' and, as far as I can see, should not be counted by this check.

kubectl -n kube-system delete pod/node-shell-* fixed this.
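
A more general form of that cleanup (a sketch only, assuming every kube-system pod already in the Succeeded phase, which kubectl shows as Completed, is safe to remove) would be something like:

kubectl -n kube-system delete pods --field-selector=status.phase==Succeeded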

Diko Parvanov (dparv)
description: updated
Revision history for this message
George Kraft (cynerva) wrote :

Looks like this could be easily reproduced by creating a Job in the kube-system namespace.

Seems like "Completed" should be added as an acceptable state here: https://github.com/charmed-kubernetes/charm-kubernetes-master/blob/93883d785a5e6394e2de133bc52164aa74695fd5/reactive/kubernetes_master.py#L2473-L2477
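
For illustration, a minimal sketch of that idea, assuming the check shells out to kubectl and inspects pod phases (the function name below is hypothetical, and the actual charm code at the linked lines may be structured differently). kubectl displays pods in the Succeeded phase as "Completed":

import json
import subprocess

def pending_kube_system_pods():
    """Names of kube-system pods that are still worth waiting on."""
    out = subprocess.check_output(
        ["kubectl", "-n", "kube-system", "get", "pods", "-o", "json"])
    pending = []
    for pod in json.loads(out)["items"]:
        # "Succeeded" is the phase behind kubectl's "Completed" status;
        # one-shot pods (node-shell, Job pods) stay there forever, so
        # accepting it stops the charm from waiting on them.
        if pod["status"].get("phase") in ("Running", "Succeeded"):
            continue
        pending.append(pod["metadata"]["name"])
    return pending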

summary: - kubernetes-master units in Waiting for X kube-system pods to start
+ Completed pods in kube-system namespace cause "Waiting for X kube-system
+ pods to start" status
Changed in charm-kubernetes-master:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
David van der Spek (vanderspek-david) wrote :

I am also experiencing this, as I am using the descheduler-cronjob, which is placed in the kube-system namespace so that it does not influence itself.

Revision history for this message
flibio (flibionacci) wrote :

Hi, please help me.
I am getting the same error. How can I deploy an older version of aws-integrator?

I know I can deploy an older version of charmed-kubernetes using "juju deploy charmed-kubernetes-<version_nos> --overlay ~/aws-overlay.yaml --trust".

Does this command also deploy the corresponding earlier version of aws-integrator?

Thanks a lot.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Thanks for the workaround in this bug, Diko.

In my instance, this was hanging up on:

ingress-nginx-admission-create-kkm8v 0/1 Completed 0 14d

I think it may be better for kubernetes-master to delete the Completed pods it creates for single-run tasks during this check step than to simply allow for the Completed state.
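
A rough sketch of that alternative, assuming the charm's one-shot pods could be identified by a label (the created-by=kubernetes-master label here is hypothetical, not something the charm is known to set):

kubectl -n kube-system delete pods -l created-by=kubernetes-master --field-selector=status.phase==Succeeded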

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Another thought: does this mean that all kube-system pods are checked for sanity by kubernetes-master, even if they were deployed by the cloud admin rather than by the kubernetes-master charm? This seems like a potential oversight, as cluster admins may want to deploy things into the kube-system namespace for the access rights granted there, but not want the kubernetes-master charm to own the status of those pods.

Changed in charm-kubernetes-master:
assignee: nobody → Mateo Florido (mateoflorido)
Revision history for this message
Mateo Florido (mateoflorido) wrote :
Changed in charm-kubernetes-master:
status: Triaged → Fix Committed
Changed in charm-kubernetes-master:
milestone: none → 1.27
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released