Kubernetes master fails to start the pods when using calico

Bug #1854520 reported by Alexander Balderson
Affects: Kubernetes Control Plane Charm
Status: Invalid
Importance: High
Assigned to: George Kraft

Bug Description

All Solutions-QA runs have been failing when trying to start the pods since we moved to calico.

All other units come up normally, and nothing is standing out to me in the logs.

Crashdump is attached.

Revision history for this message
Alexander Balderson (asbalderson) wrote :
George Kraft (cynerva)
Changed in charm-kubernetes-master:
assignee: nobody → George Kraft (cynerva)
tags: added: cdo-qa foundations-engine
Revision history for this message
Alexander Balderson (asbalderson) wrote :

subscribing field-high since this is blocking solutions-qa testing of calico

Revision history for this message
George Kraft (cynerva) wrote :

Kubelet is failing to start pods with:

Error syncing pod eb88b735-ce2c-4b1d-ab97-24390482bada ("coredns-568cb7d86-x5v7c_kube-system(eb88b735-ce2c-4b1d-ab97-24390482bada)"), skipping: failed to "CreatePodSandbox" for "coredns-568cb7d86-x5v7c_kube-system(eb88b735-ce2c-4b1d-ab97-24390482bada)" with CreatePodSandboxError: "CreatePodSandbox for pod \"coredns-568cb7d86-x5v7c_kube-system(eb88b735-ce2c-4b1d-ab97-24390482bada)\" failed: rpc error: code = Unknown desc = failed to setup network for sandbox \"a7d821e0e598a2d78c38c115dc804f9e839cfb393ee3e99f35b0b0125fd3e473\": Get https://10.5.0.7:443/api/v1/namespaces/kube-system: Service Unavailable"

That's the IP for kubeapi-load-balancer/0. I'm not seeing that request in the kubeapi-load-balancer/0 nginx logs, which tells me that traffic is not going where it's supposed to - possibly a proxy or firewall issue.

I haven't been able to reproduce this. I need more information. How are the applications configured? That information isn't included in the crashdump, see https://github.com/juju/juju-crashdump/issues/50 and a proposed fix here: https://github.com/juju/juju-crashdump/pull/52

Changed in charm-kubernetes-master:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Alexander Balderson (asbalderson) wrote :

Thanks for the proposal!

I'm attaching the bundle from the deploy. It is basically our standard daily run of kubernetes on serverstack, but with flannel replaced by calico. The flannel-based run passes every night, whereas the calico-based run fails.

Changed in charm-kubernetes-master:
status: Incomplete → New
Revision history for this message
George Kraft (cynerva) wrote :

Ah, thanks. I suspect it has something to do with proxy configuration on containerd. I'll have another go at reproducing this and get back to you.

Changed in charm-kubernetes-master:
status: New → In Progress
Revision history for this message
George Kraft (cynerva) wrote :

I can reproduce this by deploying cs:~containers/kubernetes-calico with http_proxy and https_proxy set. If I add the local network to no_proxy, though, then it works.

Can you try adding 10.5.0.0/24 (or whatever your actual subnet is) to containerd's no_proxy config and see if that resolves your problem?
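The suggestion works because containerd's Go HTTP client honours CIDR blocks in its no_proxy setting, so listing the subnet makes requests to the load balancer bypass the proxy. A rough Python emulation of that matching (an illustration only, not containerd's actual code; the `bypass_proxy` helper is hypothetical) shows why 10.5.0.7 is only exempted once the subnet is added:

```python
import ipaddress

def bypass_proxy(host: str, no_proxy: str) -> bool:
    """Rough emulation of CIDR-aware no_proxy matching, in the style of
    Go's HTTP proxy handling; not containerd's actual implementation."""
    for entry in filter(None, (e.strip() for e in no_proxy.split(","))):
        try:
            # CIDR or bare-IP entries: check network membership.
            if ipaddress.ip_address(host) in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            # Hostname entries: exact match or domain-suffix match.
            if host == entry or host.endswith("." + entry.lstrip(".")):
                return True
    return False

# Without the subnet listed, requests to the load balancer go via the proxy:
print(bypass_proxy("10.5.0.7", "localhost,127.0.0.1"))    # False
print(bypass_proxy("10.5.0.7", "localhost,10.5.0.0/24"))  # True
```

With the containerd charm, this would typically be applied through its proxy config options, e.g. something like `juju config containerd no_proxy=10.5.0.0/24` (option name assumed from the charm's standard proxy settings).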

George Kraft (cynerva)
Changed in charm-kubernetes-master:
status: In Progress → Incomplete
Revision history for this message
Alexander Balderson (asbalderson) wrote :

That did seem to resolve the issue. Thanks for the help!

Revision history for this message
George Kraft (cynerva) wrote :

Cool, thanks for the follow-up!

Changed in charm-kubernetes-master:
status: Incomplete → Invalid