baremetal + kube_ovn: k8s-cp stuck waiting for 46 pods to start

Bug #2006957 reported by Alexander Balderson
Affects: Kubernetes Control Plane Charm
Status: Fix Released
Importance: High
Assigned to: Mateo Florido
Milestone: 1.27

Bug Description

On a baremetal deployment of Charmed Kubernetes using kube_ovn as the CNI, all 3 k8s control planes are stuck waiting for 46 pods to start. All the charms are running latest/stable and are on the k8s 1.26 snap channel. This is running in our standard environment with no changes, where we see passes regularly.

From the logs in /var/log/pods I see a few notes about ovn-central pods failing to find a leader or to connect to any meaningful addresses, but since there are logs, I'd assume the pods are actually up?

The logs we collect from kubectl after the run show that there are 138 pods across all the k8s control planes, which makes sense with 46 pods on each of the 3 control-plane nodes, but I'm unsure why the pods are failing to start.

The test run can be found at:
https://solutions.qa.canonical.com/v2/testruns/5ed81b47-3413-44ba-8207-d16cc4afb030/
bundle at:
https://solutions.qa.canonical.com/v2/testruns/5ed81b47-3413-44ba-8207-d16cc4afb030/
and crashdump at:
https://oil-jenkins.canonical.com/artifacts/5ed81b47-3413-44ba-8207-d16cc4afb030/generated/generated/kubernetes-maas/juju-crashdump-kubernetes-maas-2023-02-10-01.34.49.tar.gz

Revision history for this message
George Kraft (cynerva) wrote :

I'm still looking into this. It's a bizarre one. There are 23 ovn-central pods and 29 kube-ovn-monitor pods. There should only be 3 of each. The extra pods are failing with:

Status: Failed
Reason: NodeAffinity
Message: Pod was rejected: Predicate NodeAffinity failed

These pods have anti-affinity policies that prevent more than 1 of each from being placed on any given node. The failure makes sense, but it does not make sense why there are so many pods. The charm correctly requested 3 replicas.
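
A minimal sketch of how one might confirm this, assuming the kubernetes Python client and that the duplicate pods live in the kube-system namespace (namespace and label names are assumptions, not taken from the crashdump):

    # Hypothetical sketch: count Failed pods by app label and failure reason,
    # to confirm the duplicate ovn-central / kube-ovn-monitor pods were
    # rejected with "Predicate NodeAffinity failed".
    from collections import Counter
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when run in-cluster
    v1 = client.CoreV1Api()

    failed = v1.list_namespaced_pod(
        "kube-system", field_selector="status.phase=Failed").items
    # Expect entries like ('ovn-central', 'NodeAffinity') with large counts.
    print(Counter(
        ((p.metadata.labels or {}).get("app"), p.status.reason) for p in failed))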

From the kube-controller-manager logs, it looks like the ReplicaSet controller spam-created pods during a 10-second window in which one of the control-plane nodes was NotReady. The extra pods were never deleted.

I think I should be able to reproduce this by forcing a kubernetes-control-plane node into a NotReady state during kube-ovn deployment. Let me give that a try.

Revision history for this message
George Kraft (cynerva) wrote :

I'm not having any luck reproducing this. Under normal conditions, if there are too many pods, the NodeAffinity error doesn't occur because kube-scheduler is smart enough not to schedule multiple conflicting pods onto the same node. Somehow, this deployment hit a corner case where kube-scheduler thought that the node had room for a new pod, but kubelet thought that it had a conflict.

Revision history for this message
George Kraft (cynerva) wrote :

Regardless of how the deployment got here, I think the kubernetes-control-plane charm either should not block on the Failed pods (which will not retry and will not be cleaned up by Kubernetes), or it should clean up Failed pods on behalf of the user.
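
For illustration, a minimal sketch of the second option (cleaning up Failed pods on behalf of the user), assuming the kubernetes Python client and that the affected pods are in the kube-system namespace; this is not the charm's actual implementation:

    # Hypothetical cleanup sketch: delete pods that are Failed with reason
    # NodeAffinity, since Kubernetes will neither retry nor garbage-collect them.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    for pod in v1.list_namespaced_pod(
            "kube-system", field_selector="status.phase=Failed").items:
        if pod.status.reason == "NodeAffinity":
            v1.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
            print("deleted", pod.metadata.namespace + "/" + pod.metadata.name)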

Changed in charm-kubernetes-master:
importance: Undecided → Critical
importance: Critical → High
status: New → Triaged
Revision history for this message
Alexander Balderson (asbalderson) wrote :

Thanks for the triage, George.

I can say that we've only seen this one time, so something weird definitely happened, and I'm not certain we can reproduce it either, but I'll let you know if we see it again.

Changed in charm-kubernetes-master:
assignee: nobody → Mateo Florido (mateoflorido)
Revision history for this message
Mateo Florido (mateoflorido) wrote :
Changed in charm-kubernetes-master:
status: Triaged → Fix Committed
Changed in charm-kubernetes-master:
milestone: none → 1.27
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released