When machines have no default route, Kubernetes service IPs are unreachable with "network is unreachable"

Bug #1915357 reported by Narinder Gupta
This bug affects 1 person

Affects: Kubernetes Control Plane Charm
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

We did an air-gapped deployment. The deployment itself was fine: no errors in juju status, and the images were pulled. But none of the pods can reach the cluster API.

2021-02-11 00:32:09.266 [INFO][1] main.go 87: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node,policy,namespace,workloadendpoint,serviceaccount", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"etcdv3"}
I0211 00:32:09.267422 1 client.go:352] parsed scheme: ""
I0211 00:32:09.267439 1 client.go:352] scheme "" not registered, fallback to default scheme
I0211 00:32:09.267474 1 passthrough.go:48] ccResolverWrapper: sending new addresses to cc: [{10.77.195.109:2379 0 <nil>}]
I0211 00:32:09.267547 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.77.195.109:2379 <nil>} {10.77.195.125:2379 <nil>} {10.77.195.77:2379 <nil>}]
W0211 00:32:09.281379 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0211 00:32:09.281402 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.77.195.77:2379 <nil>}]
2021-02-11 00:32:09.282 [INFO][1] main.go 108: Ensuring Calico datastore is initialized
2021-02-11 00:32:09.295 [INFO][1] watchersyncer.go 89: Start called
2021-02-11 00:32:09.296 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2021-02-11 00:32:09.296 [INFO][1] main.go 182: Starting status report routine
2021-02-11 00:32:09.296 [INFO][1] node_syncer.go 39: Node controller syncer status updated: wait-for-ready
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="NetworkPolicy"
2021-02-11 00:32:09.296 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="Namespace"
2021-02-11 00:32:09.296 [INFO][1] namespace_controller.go 155: Starting Namespace/Profile controller
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="Pod"
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="ServiceAccount"
2021-02-11 00:32:09.296 [INFO][1] policy_controller.go 146: Starting NetworkPolicy controller
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="Node"
2021-02-11 00:32:09.296 [INFO][1] serviceaaccount_controller.go 149: Starting ServiceAccount/Profile controller
2021-02-11 00:32:09.297 [INFO][1] pod_controller.go 196: Starting Pod/WorkloadEndpoint controller
2021-02-11 00:32:09.297 [INFO][1] node_controller.go 133: Starting Node controller
I0211 00:32:09.297561 1 client.go:352] parsed scheme: ""
E0211 00:32:09.297564 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://192.168.183.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
I0211 00:32:09.297590 1 client.go:352] scheme "" not registered, fallback to default scheme
I0211 00:32:09.297629 1 passthrough.go:48] ccResolverWrapper: sending new addresses to cc: [{10.77.195.109:2379 0 <nil>}]
E0211 00:32:09.297734 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.NetworkPolicy: Get https://192.168.183.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
I0211 00:32:09.297770 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.77.195.109:2379 <nil>} {10.77.195.125:2379 <nil>} {10.77.195.77:2379 <nil>}]
E0211 00:32:09.297960 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.ServiceAccount: Get https://192.168.183.1:443/api/v1/serviceaccounts?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
E0211 00:32:09.298034 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.Node: Get https://192.168.183.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
2021-02-11 00:32:09.299 [INFO][1] watchercache.go 289: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-02-11 00:32:09.299 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2021-02-11 00:32:09.299 [INFO][1] node_syncer.go 39: Node controller syncer status updated: resync
2021-02-11 00:32:09.299 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2021-02-11 00:32:09.299 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2021-02-11 00:32:09.300 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2021-02-11 00:32:09.300 [INFO][1] node_syncer.go 39: Node controller syncer status updated: in-sync
2021-02-11 00:32:09.301 [ERROR][1] main.go 233: Failed to reach apiserver error=<nil>
E0211 00:32:09.301935 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.Pod: Get https://192.168.183.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable

Revision history for this message
Narinder Gupta (narindergupta) wrote :

attaching the juju crash dump.

George Kraft (cynerva) wrote :

The attached crash dump is missing important information. Please run juju-crashdump again with:

sudo snap install juju-crashdump --classic --channel edge
juju-crashdump -a debug-layer -a config

and attach the resulting tarball.

Changed in charm-kubernetes-master:
status: New → Incomplete
Narinder Gupta (narindergupta) wrote :

I think I see what's going on. Yes, this is happening because there's no default route.

The routing for the 192.168.183.0/24 subnet is handled by kube-proxy, which creates iptables rules to handle forwarding the traffic. In a normal cluster, the initial request is sent to 192.168.183.1:443, where it gets processed by iptables and forwarded to <master-ip>:6443.

The problem here is that there is no route to 192.168.183.0/24, so the request fails before it even gets processed by iptables.

If you can't set up a default route, then you might be able to make it work by creating a dummy route for 192.168.183.0/24 or something like that.

This is very much unexplored/untested territory; we may not be able to offer much assistance.

You would need this route on all masters and workers.
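A minimal sketch of that workaround, assuming the 192.168.183.0/24 service CIDR seen in the logs above (substitute your cluster's CIDR). The DRY_RUN guard and script shape are just for illustration; the actual command is the single ip route add line:

```shell
#!/bin/sh
# Dummy-route workaround: give the kernel *a* route for the service CIDR so
# packets enter the output path where kube-proxy's iptables DNAT rules can
# rewrite the service VIP to <master-ip>:6443. Run on every master and worker.
# DRY_RUN=1 (the default here) only prints the command instead of running it.
SERVICE_CIDR="${SERVICE_CIDR:-192.168.183.0/24}"
CMD="ip route add $SERVICE_CIDR dev lo"
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: sudo $CMD"
else
    sudo $CMD && ip route get 192.168.183.1   # verify the VIP now resolves to a route
fi
```

The route does not need to point anywhere real; it only has to exist so the kernel stops failing with "network is unreachable" before iptables ever sees the packet.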

Revision history for this message
Narinder Gupta (narindergupta) wrote :

George Kraft (cynerva)

sudo ip route add 192.168.183.0/24 dev lo resolves this in my environment.
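If the route needs to survive reboots, one possible (untested, illustrative) way to persist it is a systemd oneshot unit on each machine; the unit name and path below are hypothetical:

```ini
# /etc/systemd/system/k8s-svc-route.service  (illustrative name)
[Unit]
Description=Dummy route for Kubernetes service CIDR (machines with no default route)
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/sbin/ip route add 192.168.183.0/24 dev lo

[Install]
WantedBy=multi-user.target
```

Enable it with "sudo systemctl enable --now k8s-svc-route.service", adjusting the CIDR to your cluster's service network.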

George Kraft (cynerva)
summary: - Failed to list *v1.NetworkPolicy: network is unreachable
+ When machines have no default route, Kubernetes service IPs are
+ unreachable with "network is unreachable"
Changed in charm-kubernetes-master:
status: Incomplete → New
George Kraft (cynerva) wrote :

This scenario where machines have no default route seems particularly unusual and is not what we typically expect in an offline deployment. It's not clear to us that this is a use case we should support.

I'm marking this as "Incomplete" for now. If you think there is a valid use case for this kind of environment, please describe the use case to us and we'll re-evaluate the bug accordingly.

Changed in charm-kubernetes-master:
status: New → Incomplete