API server cannot connect to Metrics service

Bug #1870590 reported by Elvinas
This bug affects 1 person

Affects                          Status    Importance  Assigned to  Milestone
Kubernetes Control Plane Charm   Triaged   Medium      Unassigned
Kubernetes Worker Charm          Triaged   Medium      Unassigned

Bug Description

Removed the old deployment and did a fresh install just to be sure.

Cloud environment: Ubuntu MaaS using KVM
Juju: 2.7.5

Used bundle:
https://jaas.ai/u/containers/kubernetes-calico
Reduced the number of nodes and limited CPU/RAM so the whole environment fits on a single host.

The deployment completed successfully and Kubernetes is operational, i.e. I can create deployments.

However "kubectl top" does not work. After half a day of reading manuals, github issues I have stumbled upon the following command: kubectl describe apiservice v1beta1.metrics.k8s.io

In the output I see:
-----------------------------------------------
Status:
  Conditions:
    Last Transition Time: 2020-04-03T14:53:31Z
    Message: failing or missing response from https://192.168.66.64:443/apis/metrics.k8s.io/v1beta1: Get https://192.168.66.64:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason: FailedDiscoveryCheck
    Status: False
    Type: Available
Events: <none>
------------------------------------------------

However, the metrics server itself is working fine: if I proxy the port and run curl, I do get a response from the metrics service. This IP address is reachable from a user pod and from the worker node too. However, it is not reachable from the master node; it times out. On the master node I see Calico running and a route to the metrics service IP, but it does not respond:
-------------------
root@tops-calf:~# ip r l
default via 192.168.101.1 dev eth0 proto static
192.168.66.64/26 via 192.168.101.7 dev eth0 proto bird
192.168.101.0/24 dev eth0 proto kernel scope link src 192.168.101.5
192.168.201.128/26 via 192.168.101.6 dev eth0 proto bird
------------------
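For reference, the proxy-and-curl check mentioned above looks roughly like this (a sketch only; the namespace, service name and local port are the usual metrics-server defaults and may differ in this deployment):
-------------------
# forward a local port to the metrics-server service (names assumed)
kubectl -n kube-system port-forward svc/metrics-server 4443:443 &
# any response here, even an authorization error, shows the pod is serving
curl -k https://127.0.0.1:4443/apis/metrics.k8s.io/v1beta1
-------------------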

What Calico or metrics-server options am I missing so that the API server can communicate with the metrics server?

PS. The crashdump is 200 MB. Should I really attach it?

George Kraft (cynerva) wrote:

I'm not sure if this is the cause, but one major issue I'm seeing here is that the Calico charm's default CIDR is 192.168.0.0/16, which conflicts with the 192.168.101.0/24 network that you're using.

Can you try setting the Calico charm's cidr option to one that doesn't overlap with your 192.168.101.0/24 subnet? If you can, I recommend setting the cidr config at deploy time by using a bundle overlay - changing the config on a live cluster might not work.
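For example, a minimal overlay along these lines should do it (just a sketch; the file name is arbitrary, the CIDR is only an example that stays clear of your 192.168.x networks, and I'm assuming the bundle name that matches the jaas.ai URL above):
-------------------
cat > calico-cidr-overlay.yaml <<EOF
applications:
  calico:
    options:
      cidr: 10.100.0.0/16
EOF
# apply the overlay at deploy time
juju deploy cs:~containers/kubernetes-calico --overlay calico-cidr-overlay.yaml
-------------------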

Changed in charmed-kubernetes-bundles:
status: New → Incomplete
Elvinas (elvinas-3) wrote:

As expected, changing the Calico CIDR did not help. Networking works on the worker node where the pod runs, regardless of the CIDR. Networking to the metrics server does not work on the master and on the other worker host. Not sure if it should be that way, i.e. whether hosts are not supposed to reach pod networks directly.

As the control plane components are not run as containers but directly on the host, and there is no Docker environment, I am not yet sure how to run a debug container attached to the same namespace. Will leave that for next week. :)
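(Though thinking about it, since hosts are supposed to route to pod IPs directly, a plain curl from the master to the metrics-server pod IP might be enough, no container needed. Something like this rough sketch, with the pod IP taken from the routes further down and the port assumed from the apiservice endpoint, so it may differ:)
-------------
# quick reachability test from the master host, no debug container needed
ping -c 2 10.100.89.134
# even a TLS or authorization error would prove the network path works
curl -vk --connect-timeout 5 https://10.100.89.134:443/apis/metrics.k8s.io/v1beta1
-------------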

-------------
juju config calico | grep "cidr:" -A8
  cidr:
    default: 192.168.0.0/16
    description: |
      Network CIDR assigned to Calico. This is applied to the default Calico
      pool, and is also communicated to the Kubernetes charms for use in
      kube-proxy configuration.
    source: user
    type: string
    value: 10.100.0.0/16
-----------------

On the master host I do see the following routes and I can ping the gateway to the metrics server subnet, but the metrics server does not respond. The same happens on the other worker node: it cannot reach the pod either.

-------------------
ubuntu@super-cub:~$ ip r l
default via 192.168.101.1 dev eth0 proto static
10.100.69.128/26 via 192.168.101.18 dev eth0 proto bird
>>>>10.100.89.128/26 via 192.168.101.17 dev eth0 proto bird <<<<
192.168.101.0/24 dev eth0 proto kernel scope link src 192.168.101.19

>>>>ubuntu@super-cub:~$ ping 192.168.101.17 <<<<<
PING 192.168.101.17 (192.168.101.17) 56(84) bytes of data.
64 bytes from 192.168.101.17: icmp_seq=1 ttl=64 time=0.598 ms
64 bytes from 192.168.101.17: icmp_seq=2 ttl=64 time=0.333 ms
^C
--- 192.168.101.17 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1027ms
rtt min/avg/max/mdev = 0.333/0.465/0.598/0.134 ms

ubuntu@super-cub:~$ ping 10.100.89.134
PING 10.100.89.134 (10.100.89.134) 56(84) bytes of data.
^C
--- 10.100.89.134 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1011ms
-------------------------------

On the worker host I am not sure about the routes. Not sure what "blackhole" means (need more RTFM :D), but the metrics server responds properly on the worker host __where the pod runs__.

-----------------------------
ubuntu@fair-fish:~$ ip r l
default via 192.168.101.1 dev eth0 proto static
10.100.69.128/26 via 192.168.101.18 dev eth0 proto bird
10.100.89.128 dev calia1e8dd009f6 scope link
>>>>>>>>>blackhole 10.100.89.128/26 proto bird <<<<<<<<<<<<<<
10.100.89.130 dev calif4dc95e3deb scope link
10.100.89.131 dev caliec7dab409c0 scope link
10.100.89.132 dev cali95554be076a scope link
>>>>>>>>>10.100.89.134 dev cali104c2cee7b4 scope link <<<<<<<<<<< This is the pod
10.100.89.135 dev calif5b837d795e scope link
10.100.89.136 dev calia2d4e67ab0d scope link
192.168.101.0/24 dev eth0 proto kernel scope link src 192.168.101.17

ubuntu@fair-fish:~$ ping 10.100.89.134
PING 10.100.89.134 (10.100.89.134) 56(84) bytes of data.
64 bytes from 10.100.89.134: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 10.100.89.134: icmp_seq=2 ttl=64 time=0.064 ms
^C
--- 10.100.89.134 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.064/0.073/0.083/0.012 ms
----------...


George Kraft (cynerva) wrote:

> Networking to metrics server does not work on master and another worker host. Not sure if it should be that way, i.e. hosts not supposed to reach pod networks directly.

It should not be that way. All master and worker hosts should be able to reach the metrics server, either by Service IP or Pod IP.

Your `ip r l` output looks normal and correct to me, for both the masters and workers.

> Not sure what means blackhole (need more RTFM :D)

Me neither, lol. But I see the same blackhole route on a test cluster where I can reach the metrics-server pod just fine, so I'm not too suspicious of it.

Can you run `sysctl net.ipv4.ip_forward` on your workers and make sure it is set to 1?

How are your KVM instances networked together? Is it possible that Calico traffic is being filtered by a firewall? On AWS, for example, they filter any traffic where the packet's destination IP does not match where the packet is actually being sent. That kind of filtering causes problems for Calico, which works by routing packets directly to pods through the worker host via routing table entries.

If you don't have a specific need for Calico to use direct routing, then you could try configuring Calico to use IP-in-IP encapsulation by setting the calico charm's ipip option to 'Always'. That can bypass these sorts of issues sometimes.
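That would be a one-line change, for example (assuming the application is named calico as in the bundle):
-------------------
# switch Calico to IP-in-IP encapsulation for pod traffic
juju config calico ipip=Always
-------------------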

Elvinas (elvinas-3) wrote:

Short version:
Found the culprit, and things seem to be working now. Not sure who is to blame.
The KVM virtualization host was missing a route to the container IP subnet.

Long version.

All Kubernetes hosts are VMs on my workstation, deployed via Juju on MaaS. So the networking was "automagically" created by MaaS, and it seems to be working: I can communicate between nodes, and MaaS/Juju deployments work as expected.

sysctl net.ipv4.ip_forward shows 1 as expected.

Juju cluster machines:

Machine State DNS Inst id Series AZ Message
0 started 192.168.101.8 tight-gar bionic default Deployed
1 started 192.168.101.9 modest-aphid bionic default Deployed
2 started 192.168.101.10 live-hound bionic default Deployed
3 started 192.168.101.19 super-cub bionic default Deployed
4 started 192.168.101.17 fair-fish bionic default Deployed
5 started 192.168.101.18 happy-viper bionic default Deployed

Test Kubernetes pod (I have deployed a generic deployment in case the metrics server is somehow special):
bacila@juodas ~/Documents/Work/Kubernetes $ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-574b87c764-z2trs 1/1 Running 0 19s 10.100.69.132 happy-viper <none> <none>

Regarding the firewall, that was an idea worth checking. I was just not sure which corner to kick and which corner to listen at for an echo to identify the problem. "ip a l" listed 37 interfaces, so there were quite a few of them :) So I started with good old tcpdump.

* SSH to the worker node and run "ping <metrics_server_IP>"
* On the workstation, run "tcpdump -i any icmp"

22:17:03.045019 IP 192.168.101.17 > 10.100.69.132: ICMP echo request, id 9430, seq 1, length 64
22:17:03.045067 IP 192.168.101.1 > 10.100.69.132: ICMP echo request, id 9430, seq 1, length 64
22:17:03.045299 IP 10.100.69.132 > 192.168.101.1: ICMP echo reply, id 9430, seq 1, length 64

So I can see that the ping reaches the test pod, and the pod sends its reply to the Juju network gateway, but the reply never makes it back to the origin.

I looked at the main host's route list and it lacked the container network IP range, which means the ICMP response is sent to the default GW:

root@juodas:/home/bacila# ip r l
default via 192.168.4.254 dev enp8s0 proto dhcp metric 100
169.254.0.0/16 dev enp8s0 scope link metric 1000
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.18.0.0/16 dev br-03ebb1003d68 proto kernel scope link src 172.18.0.1 linkdown
192.168.4.0/24 dev enp8s0 proto kernel scope link src 192.168.4.2 metric 100
192.168.100.0/24 dev virbr1 proto kernel scope link src 192.168.100.1 linkdown
192.168.101.0/24 dev virbr2 proto kernel scope link src 192.168.101.1
192.168.123.0/24 dev virbr0 proto kernel scope link src 192.168.123.1 linkdown

So I added a route to the container IP range via the Juju cluster GW:
root@juodas:/home/bacila# ip r a 10.100.0.0/16 via 192.168.101.1

Voila! Ping responds, the metrics apiservice no longer shows an error, and I have my metrics.
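For completeness, the two checks from the top of this report now pass:
-------------------
# the apiservice should now report Available: True
kubectl describe apiservice v1beta1.metrics.k8s.io
# and kubectl top returns data (same for kubectl top pods)
kubectl top nodes
-------------------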

I have...


George Kraft (cynerva) wrote:

Thanks for the investigation and detailed follow-up.

> 22:17:03.045019 IP 192.168.101.17 > 10.100.69.132: ICMP echo request, id 9430, seq 1, length 64
> 22:17:03.045067 IP 192.168.101.1 > 10.100.69.132: ICMP echo request, id 9430, seq 1, length 64

This is the part that looks off to me. The host NAT'd the source IP from 192.168.101.17 to 192.168.101.1, which doesn't make sense given that the traffic should have been able to route directly without passing through the gateway. I suspect MaaS created iptables NAT rules that are causing this to happen, but I'm not sure.

If you have a minute to share the output of `sudo iptables-save` from the host, it would be a big help.
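In case it turns out to be a typical blanket MASQUERADE rule for the 192.168.101.0/24 network, one illustrative way to exempt pod-bound traffic from that NAT would be a RETURN rule ahead of it, roughly like this (just a sketch using the pod CIDR from your calico config, not a confirmed fix):
-------------------
# insert before the MASQUERADE rules so pod-bound traffic is left un-NATed
iptables -t nat -I POSTROUTING -s 192.168.101.0/24 -d 10.100.0.0/16 -j RETURN
-------------------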

> So it was missing routes on virtualization host. Why they are missing it is a question. Juju is run via non privileged user, so juju cannot update host level routing. So should it be done somehow via MaaS? Or simply install instructions updated with some manual post deployment tasks.

Indeed. We'll need to investigate if this is something that can be fixed in Juju or MaaS, but we may just need to doc it.

Changed in charmed-kubernetes-bundles:
status: Incomplete → Confirmed
Elvinas (elvinas-3) wrote:

Hm... interesting point. Indeed, worker nodes should not need to go through the GW as they are on the same subnet. However, it might be because my workstation is a training ground for various stuff: I also have Docker deployed on this host, as well as MaaS itself. As I don't have enough money to run my own personal DC, I have to run an all-in-one solution. But I have some future ideas regarding one dark corner in my house; the 19" rack is already there. :D

Here is the iptables output. Note: I cut off lots of junk IP addresses inserted by fail2ban.

root@juodas:/home/bacila# iptables-save
# Generated by iptables-save v1.6.1 on Thu Apr 9 19:17:26 2020
*mangle
:PREROUTING ACCEPT [84002114:41938433452]
:INPUT ACCEPT [28876446:22838428054]
:FORWARD ACCEPT [54283218:19050383319]
:OUTPUT ACCEPT [27677762:24122793349]
:POSTROUTING ACCEPT [82173660:43230321739]
-A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Thu Apr 9 19:17:26 2020
# Generated by iptables-save v1.6.1 on Thu Apr 9 19:17:26 2020
*nat
:PREROUTING ACCEPT [493405:31870381]
:INPUT ACCEPT [81386:7674088]
:OUTPUT ACCEPT [295675:39671491]
:POSTROUTING ACCEPT [505514:52291054]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 192.168.101.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.101.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.101.0/24 ! -d 192.168.101.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.101.0/24 ! -d 192.168.101.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.101.0/24 ! -d 192.168.101.0/24 -j MASQUERADE
-A POSTROUTING -s 192.168.123.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.123.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o br-03ebb1003d68 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i br-03ebb1003d68 -j RETURN
COMMIT
# Completed on Thu Apr 9 19:17:26 2020
# Generated by iptables-save v1.6.1 on Thu Apr 9 19:17:26 2020
*filter
:INPUT ACCEPT [11946:5027672]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [11399:4530630]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:f2b-sshd - [0:0]
-A INPUT -i virbr1 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr1 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr1 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i virbr2 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr2 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr2 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr2 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i ...


George Kraft (cynerva)
Changed in charmed-kubernetes-bundles:
importance: Undecided → High
Changed in charm-kubernetes-master:
importance: Undecided → High
Changed in charm-kubernetes-worker:
importance: Undecided → High
no longer affects: charmed-kubernetes-bundles
Changed in charm-kubernetes-master:
status: New → Triaged
Changed in charm-kubernetes-worker:
status: New → Triaged
George Kraft (cynerva)
Changed in charm-kubernetes-master:
importance: High → Medium
Changed in charm-kubernetes-worker:
importance: High → Medium