When machines have no default route, Kubernetes service IPs are unreachable with "network is unreachable"

Bug #1915357 reported by Narinder Gupta
This bug affects 1 person

Affects: Kubernetes Control Plane Charm
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

We did an air-gapped deployment. The deployment itself was fine: no errors in juju status, and the images were pulled. But none of the pods can reach the cluster API.

2021-02-11 00:32:09.266 [INFO][1] main.go 87: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node,policy,namespace,workloadendpoint,serviceaccount", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"etcdv3"}
I0211 00:32:09.267422 1 client.go:352] parsed scheme: ""
I0211 00:32:09.267439 1 client.go:352] scheme "" not registered, fallback to default scheme
I0211 00:32:09.267474 1 passthrough.go:48] ccResolverWrapper: sending new addresses to cc: [{10.77.195.109:2379 0 <nil>}]
I0211 00:32:09.267547 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.77.195.109:2379 <nil>} {10.77.195.125:2379 <nil>} {10.77.195.77:2379 <nil>}]
W0211 00:32:09.281379 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0211 00:32:09.281402 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.77.195.77:2379 <nil>}]
2021-02-11 00:32:09.282 [INFO][1] main.go 108: Ensuring Calico datastore is initialized
2021-02-11 00:32:09.295 [INFO][1] watchersyncer.go 89: Start called
2021-02-11 00:32:09.296 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2021-02-11 00:32:09.296 [INFO][1] main.go 182: Starting status report routine
2021-02-11 00:32:09.296 [INFO][1] node_syncer.go 39: Node controller syncer status updated: wait-for-ready
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="NetworkPolicy"
2021-02-11 00:32:09.296 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="Namespace"
2021-02-11 00:32:09.296 [INFO][1] namespace_controller.go 155: Starting Namespace/Profile controller
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="Pod"
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="ServiceAccount"
2021-02-11 00:32:09.296 [INFO][1] policy_controller.go 146: Starting NetworkPolicy controller
2021-02-11 00:32:09.296 [INFO][1] main.go 364: Starting controller ControllerType="Node"
2021-02-11 00:32:09.296 [INFO][1] serviceaaccount_controller.go 149: Starting ServiceAccount/Profile controller
2021-02-11 00:32:09.297 [INFO][1] pod_controller.go 196: Starting Pod/WorkloadEndpoint controller
2021-02-11 00:32:09.297 [INFO][1] node_controller.go 133: Starting Node controller
I0211 00:32:09.297561 1 client.go:352] parsed scheme: ""
E0211 00:32:09.297564 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://192.168.183.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
I0211 00:32:09.297590 1 client.go:352] scheme "" not registered, fallback to default scheme
I0211 00:32:09.297629 1 passthrough.go:48] ccResolverWrapper: sending new addresses to cc: [{10.77.195.109:2379 0 <nil>}]
E0211 00:32:09.297734 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.NetworkPolicy: Get https://192.168.183.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
I0211 00:32:09.297770 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{10.77.195.109:2379 <nil>} {10.77.195.125:2379 <nil>} {10.77.195.77:2379 <nil>}]
E0211 00:32:09.297960 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.ServiceAccount: Get https://192.168.183.1:443/api/v1/serviceaccounts?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
E0211 00:32:09.298034 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.Node: Get https://192.168.183.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable
2021-02-11 00:32:09.299 [INFO][1] watchercache.go 289: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-02-11 00:32:09.299 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2021-02-11 00:32:09.299 [INFO][1] node_syncer.go 39: Node controller syncer status updated: resync
2021-02-11 00:32:09.299 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2021-02-11 00:32:09.299 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2021-02-11 00:32:09.300 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2021-02-11 00:32:09.300 [INFO][1] node_syncer.go 39: Node controller syncer status updated: in-sync
2021-02-11 00:32:09.301 [ERROR][1] main.go 233: Failed to reach apiserver error=<nil>
E0211 00:32:09.301935 1 reflector.go:125] <email address hidden>+incompatible/tools/cache/reflector.go:98: Failed to list *v1.Pod: Get https://192.168.183.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 192.168.183.1:443: connect: network is unreachable

Revision history for this message
Narinder Gupta (narindergupta) wrote :

attaching the juju crash dump.

George Kraft (cynerva) wrote :

The attached crash dump is missing important information. Please run juju-crashdump again with:

sudo snap install juju-crashdump --classic --channel edge
juju-crashdump -a debug-layer -a config

and attach the resulting tarball.

Changed in charm-kubernetes-master:
status: New → Incomplete
Narinder Gupta (narindergupta) wrote :

I think I see what's going on. Yes, this is happening because there's no default route.

The routing for the 192.168.183.0/24 subnet is handled by kube-proxy, which creates iptables rules to handle forwarding the traffic. In a normal cluster, the initial request is sent to 192.168.183.1:443, where it gets processed by iptables and forwarded to <master-ip>:6443.

The problem here is that there is no route to 192.168.183.0/24, so the request fails before it even gets processed by iptables.

If you can't set up a default route, then you might be able to make it work by creating a dummy route for 192.168.183.0/24 or something like that.

This is very much unexplored/untested territory; we may not be able to offer much assistance.

You would need this route on all masters and workers.
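A minimal sketch of that workaround, assuming the 192.168.183.0/24 service CIDR seen in the logs above (substitute your cluster's CIDR). The DRY_RUN guard and script shape are just for illustration; the actual command is the single ip route add line:

```shell
#!/bin/sh
# Dummy-route workaround: give the kernel *a* route for the service CIDR so
# packets enter the output path where kube-proxy's iptables DNAT rules can
# rewrite the service VIP to <master-ip>:6443. Run on every master and worker.
# DRY_RUN=1 (the default here) only prints the command instead of running it.
SERVICE_CIDR="${SERVICE_CIDR:-192.168.183.0/24}"
CMD="ip route add $SERVICE_CIDR dev lo"
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: sudo $CMD"
else
    sudo $CMD && ip route get 192.168.183.1   # verify the VIP now resolves to a route
fi
```

The route does not need to point anywhere real; it only has to exist so the kernel stops failing with "network is unreachable" before iptables ever sees the packet.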

Revision history for this message
Narinder Gupta (narindergupta) wrote :

George Kraft (cynerva)

sudo ip route add 192.168.183.0/24 dev lo resolves this in my environment.
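If the route needs to survive reboots, one possible (untested, illustrative) way to persist it is a systemd oneshot unit on each machine; the unit name and path below are hypothetical:

```ini
# /etc/systemd/system/k8s-svc-route.service  (illustrative name)
[Unit]
Description=Dummy route for Kubernetes service CIDR (machines with no default route)
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/sbin/ip route add 192.168.183.0/24 dev lo

[Install]
WantedBy=multi-user.target
```

Enable it with "sudo systemctl enable --now k8s-svc-route.service", adjusting the CIDR to your cluster's service network.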

George Kraft (cynerva)
summary: - Failed to list *v1.NetworkPolicy: network is unreachable
+ When machines have no default route, Kubernetes service IPs are
+ unreachable with "network is unreachable"
Changed in charm-kubernetes-master:
status: Incomplete → New
George Kraft (cynerva) wrote :

This scenario where machines have no default route seems particularly unusual and is not what we typically expect in an offline deployment. It's not clear to us that this is a use case we should support.

I'm marking this as "Incomplete" for now. If you think there is a valid use case for this kind of environment, please describe the use case to us and we'll re-evaluate the bug accordingly.

Changed in charm-kubernetes-master:
status: New → Incomplete