kubernetes-master can't talk to kubeapi-load-balancer on dual stack deployment

Bug #1857173 reported by Chris Privitere
This bug affects 1 person

Affects: Kubernetes API Load Balancer
Status: Fix Released
Importance: Medium
Assigned to: Cory Johns
Milestone: 1.19

Bug Description

Background:
* Deploying charmed-kubernetes via juju 2.7.0 on Ubuntu 18.04.
* Using the stock bundle with no alterations or overlays ("juju deploy charmed-kubernetes").
* The cluster gets to a state where everything is active except the kubernetes-master units.
* The Juju cloud is a vSphere cloud. All machines are in a dedicated subnet with DHCP, and both IPv4 and IPv6 are enabled on the vSphere/networking side of things.
* I see no way to disable IPv6 from Juju or via the charmed-kubernetes charms or overlays, so I cannot tell the deployment not to use IPv6.
* Debug output indicates the failure comes from "kubectl get po -n kube-system" not working.
* Doing a juju ssh to a kubernetes-master node and running "kubectl get po -n kube-system" as root gives the following error: "The connection to the server 2620:72:0:8064:250:56ff:fe23:9cd4:443 was refused - did you specify the right host or port?"
* Doing a juju ssh to the kubeapi-load-balancer node and inspecting the nginx config confirms that nginx is only binding to the IPv4 addresses on the node, so the command above SHOULD fail (see the verification sketch after this list).
* The worker nodes have the IPv4 address of the API server in their kubeconfig files, so they work fine when talking to the API server.
* Updating the kubeconfig files on the kubernetes-master nodes, however, does not make the charm checks pass, although it does fix kubectl from root's command line on those nodes. So perhaps the address used by the charm's kubectl checks is passed in from somewhere else, or the kubeconfig is cached somehow?
* Easy remediations, in my mind, could be any of the following: make the load balancer listen on both IPv4 AND IPv6, add an option to explicitly disable IPv6 for the entire deployment in the charm bundle, or make the kubernetes-master checks use only the IPv4 address like the kubernetes-worker checks do.
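
For reference, this is roughly how I verified both sides of the mismatch. The unit names match the juju status below; /root/.kube/config is my assumption about where the charm keeps root's kubeconfig on the master units, so adjust if yours differs:
```
# Which sockets is nginx on the load balancer actually bound to?
# (Here it only showed 0.0.0.0:443, i.e. IPv4 only.)
juju ssh kubeapi-load-balancer/0 'sudo ss -tlnp | grep ":443"'

# What listen directives did the charm render?
juju ssh kubeapi-load-balancer/0 'sudo grep -R "listen" /etc/nginx/sites-enabled/'

# Which server address does the master's kubeconfig point at?
# (/root/.kube/config is an assumed path -- adjust as needed.)
juju ssh kubernetes-master/0 'sudo grep "server:" /root/.kube/config'
```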

Here's my juju status, in case it helps.
```
Model Controller Cloud/Region Version SLA Timestamp
tabby tst-vcenter-dc1 tst-vcenter/dc1 2.7.0 unsupported 14:24:25-06:00

App Version Status Scale Charm Store Rev OS Notes
containerd active 5 containerd jujucharms 53 ubuntu
easyrsa 3.0.1 active 1 easyrsa jujucharms 295 ubuntu
etcd 3.3.15 active 3 etcd jujucharms 485 ubuntu
flannel 0.11.0 active 5 flannel jujucharms 466 ubuntu
kubeapi-load-balancer 1.14.0 active 1 kubeapi-load-balancer jujucharms 701 ubuntu exposed
kubernetes-master 1.17.0 waiting 2 kubernetes-master jujucharms 788 ubuntu
kubernetes-worker 1.17.0 active 3 kubernetes-worker jujucharms 623 ubuntu exposed

Unit Workload Agent Machine Public address Ports Message
easyrsa/0* active idle 0 2620:72:0:8064:250:56ff:fe07:aede Certificate Authority connected.
etcd/0* active idle 1 2620:72:0:8064:250:56ff:fe03:7ee2 2379/tcp Healthy with 3 known peers
etcd/1 active idle 2 2620:72:0:8064:250:56ff:fe1b:2ede 2379/tcp Healthy with 3 known peers
etcd/2 active idle 3 2620:72:0:8064:250:56ff:fe1a:b17d 2379/tcp Healthy with 3 known peers
kubeapi-load-balancer/0* active idle 4 2620:72:0:8064:250:56ff:fe23:9cd4 443/tcp Loadbalancer ready.
kubernetes-master/0 waiting idle 5 2620:72:0:8064:250:56ff:fe06:a110 6443/tcp Waiting for kube-system pods to start
  containerd/4 active idle 2620:72:0:8064:250:56ff:fe06:a110 Container runtime available
  flannel/4 active idle 2620:72:0:8064:250:56ff:fe06:a110 Flannel subnet 10.1.70.1/24
kubernetes-master/1* waiting idle 6 2620:72:0:8064:250:56ff:fe1d:7f88 6443/tcp Waiting for kube-system pods to start
  containerd/3 active idle 2620:72:0:8064:250:56ff:fe1d:7f88 Container runtime available
  flannel/3 active idle 2620:72:0:8064:250:56ff:fe1d:7f88 Flannel subnet 10.1.65.1/24
kubernetes-worker/0* active idle 7 2620:72:0:8064:250:56ff:fe04:a62c 80/tcp,443/tcp Kubernetes worker running.
  containerd/1 active idle 2620:72:0:8064:250:56ff:fe04:a62c Container runtime available
  flannel/1 active idle 2620:72:0:8064:250:56ff:fe04:a62c Flannel subnet 10.1.66.1/24
kubernetes-worker/1 active idle 8 2620:72:0:8064:250:56ff:fe38:4329 80/tcp,443/tcp Kubernetes worker running.
  containerd/2 active idle 2620:72:0:8064:250:56ff:fe38:4329 Container runtime available
  flannel/2 active idle 2620:72:0:8064:250:56ff:fe38:4329 Flannel subnet 10.1.78.1/24
kubernetes-worker/2 active idle 9 2620:72:0:8064:250:56ff:fe18:d028 80/tcp,443/tcp Kubernetes worker running.
  containerd/0* active idle 2620:72:0:8064:250:56ff:fe18:d028 Container runtime available
  flannel/0* active idle 2620:72:0:8064:250:56ff:fe18:d028 Flannel subnet 10.1.59.1/24
```

Revision history for this message
Chris Privitere (cprivite) wrote :

One follow-up: the reason updating the kubeconfig file manually doesn't work is that it gets automatically reset by Juju (or something). So even if I put the IPv4 address in there, it gets reset back to the IPv6 address the next time the checks run.
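
A minimal sketch of how to watch the reset happen, assuming /root/.kube/config is the kubeconfig the charm maintains for root and using an illustrative IPv4 address for the load balancer:
```
# Note the server address the charm wrote (an IPv6 URL in this deployment).
juju ssh kubernetes-master/0 'sudo grep "server:" /root/.kube/config'

# Point it at the load balancer's IPv4 address instead (illustrative address).
juju ssh kubernetes-master/0 \
  "sudo sed -i 's|server: .*|server: https://192.168.5.220:443|' /root/.kube/config"

# Check again after the next update-status run: the server line is back to
# the IPv6 address, so the manual edit does not stick.
juju ssh kubernetes-master/0 'sudo grep "server:" /root/.kube/config'
```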

Revision history for this message
Tomas (toronz2) wrote :

Hit this bug today; the charm doesn't configure NGINX to listen on IPv6:

```
[...]
server {
    listen 443 ssl http2;
    server_name _;
[...]
```

I managed to get around it by creating a second NGINX config file that listens exclusively on IPv6:

```
$ sudo cat /etc/nginx/sites-available/apilb-ipv6
upstream target_service2 {
  server 192.168.5.209:6443;
  server 192.168.5.212:6443;

}

server {
    listen [::]:443 ipv6only=on ssl http2;
    server_name _;

    access_log /var/log/nginx.access.log;
    error_log /var/log/nginx.error.log;

    ssl on;
    ssl_session_cache builtin:1000 shared:SSL:10m;
    ssl_certificate /srv/kubernetes/server.crt;
    ssl_certificate_key /srv/kubernetes/server.key;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;

    location / {
      proxy_buffering off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-Proto-Version $http2;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $http_connection;
      proxy_set_header X-Stream-Protocol-Version $http_x_stream_protocol_version;

      add_header X-Stream-Protocol-Version $upstream_http_x_stream_protocol_version;

      proxy_pass https://target_service2;
      proxy_read_timeout 600;
    }
}
```

With this in place, you just symlink it into sites-enabled and reload the nginx systemd service.
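
Roughly, assuming the default nginx layout on the unit:
```
# Enable the IPv6-only site and reload nginx.
sudo ln -s /etc/nginx/sites-available/apilb-ipv6 /etc/nginx/sites-enabled/apilb-ipv6
sudo nginx -t            # sanity-check the combined configuration first
sudo systemctl reload nginx
```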

George Kraft (cynerva)
Changed in charm-kubeapi-load-balancer:
importance: Undecided → Medium
status: New → Triaged
no longer affects: charm-kubernetes-master
Cory Johns (johnsca)
Changed in charm-kubeapi-load-balancer:
assignee: nobody → Cory Johns (johnsca)
milestone: none → 1.19
status: Triaged → In Progress
Revision history for this message
Cory Johns (johnsca) wrote :
tags: added: review-needed
Cory Johns (johnsca)
tags: removed: review-needed
Changed in charm-kubeapi-load-balancer:
status: In Progress → Fix Committed
Changed in charm-kubeapi-load-balancer:
status: Fix Committed → Fix Released