Pods can't communicate with kube-apiserver in multi-master cluster

Bug #1742420 reported by Grzegorz Bialas
This bug affects 3 people
Affects: Magnum
Status: Fix Released
Importance: Undecided
Assigned to: Spyros Trigazis

Bug Description

I have deployed a k8s cluster using magnum, with 3 masters and 3 nodes:

[DEV]root@ucsubu3000:~/.kube# kubectl get nodes
NAME                             STATUS                     AGE   VERSION
k8s-test-xtlwldidn5vg-master-0   Ready,SchedulingDisabled   36m   v1.7.7
k8s-test-xtlwldidn5vg-master-1   Ready,SchedulingDisabled   36m   v1.7.7
k8s-test-xtlwldidn5vg-master-2   Ready,SchedulingDisabled   36m   v1.7.7
k8s-test-xtlwldidn5vg-minion-0   Ready                      34m   v1.7.7
k8s-test-xtlwldidn5vg-minion-1   Ready                      34m   v1.7.7
k8s-test-xtlwldidn5vg-minion-2   Ready                      34m   v1.7.7

But it looks like there is a problem with pods authenticating to the kube-apiserver.

In the coredns logs I see:

E0110 10:11:07.239014 5 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0110 10:11:07.239326 5 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)
E0110 10:11:07.239455 5 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Namespace: the server has asked for the client to provide credentials (get namespaces)
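
The same failure can be reproduced by hand from inside any pod whose image ships a shell and curl (a sketch; the pod name is hypothetical, and https://10.254.0.1:443 is the kubernetes service VIP that also shows up in the dashboard log below):

kubectl exec -it some-debug-pod -- sh
# inside the pod: present the mounted service-account token to the apiserver
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer $TOKEN" https://10.254.0.1:443/api/v1/namespaces
# on an affected cluster this returns 401 Unauthorized instead of a namespace list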

In the dashboard logs I see:

[DEV]root@ucsubu3000:~/.kube# kubectl -n kube-system logs kubernetes-dashboard-3804488410-xwxcv
Using HTTP port: 9090
Creating API server client for https://10.254.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md

When I deploy with helm, I see this in the tiller logs:

[tiller] 2018/01/10 10:44:07 preparing install for
[storage] 2018/01/10 10:44:07 getting release "muddled-woodpecker.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "muddled-woodpecker.v1": the server has asked for the client to provide credentials (get configmaps muddled-woodpecker.v1)
[tiller] 2018/01/10 10:44:07 info: generated name muddled-woodpecker is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "guilded-fish.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "guilded-fish.v1": the server has asked for the client to provide credentials (get configmaps guilded-fish.v1)
[tiller] 2018/01/10 10:44:07 info: generated name guilded-fish is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "foiled-bunny.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "foiled-bunny.v1": the server has asked for the client to provide credentials (get configmaps foiled-bunny.v1)
[tiller] 2018/01/10 10:44:07 info: generated name foiled-bunny is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "yodeling-antelope.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "yodeling-antelope.v1": the server has asked for the client to provide credentials (get configmaps yodeling-antelope.v1)
[tiller] 2018/01/10 10:44:07 info: generated name yodeling-antelope is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "anxious-kiwi.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "anxious-kiwi.v1": the server has asked for the client to provide credentials (get configmaps anxious-kiwi.v1)
[tiller] 2018/01/10 10:44:07 info: generated name anxious-kiwi is taken. Searching again.
[tiller] 2018/01/10 10:44:07 warning: No available release names found after 5 tries
[tiller] 2018/01/10 10:44:07 failed install prepare step: no available release name found
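
Which service account tiller runs as can be checked like this (a sketch; tiller-deploy is the default deployment name helm init creates, and the command prints nothing if tiller runs as the namespace's default service account):

kubectl -n kube-system get deployment tiller-deploy \
    -o jsonpath='{.spec.template.spec.serviceAccountName}'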

Bartosz Bezak (bbezak)
Changed in magnum:
status: New → Confirmed
Grzegorz Bialas (gbialas) wrote :

I have opened a similar bug on the kubernetes github (https://github.com/kubernetes/kubernetes/issues/58071), but it was closed with the comment: "sounds like something you need to take up with the magnum team (especially the credentials references). they can escalate to this repository if needed."

Spyros Trigazis (strigazi) wrote :

I managed to reproduce it, but couldn't find an obvious solution; I'll investigate more.

Coredns should be able to authenticate with the api server.

Spyros Trigazis (strigazi) wrote :

Hi,

I think that the problem can be solved by adding:
 --insecure-bind-address=127.0.0.1

to the API server config, like so [1].
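
A minimal sketch of the resulting line in /etc/kubernetes/apiserver on the Fedora Atomic images (the rest of the KUBE_API_ADDRESS contents vary per deployment):

# /etc/kubernetes/apiserver
KUBE_API_ADDRESS="--bind-address=0.0.0.0 --insecure-bind-address=127.0.0.1"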

I have included it in this patch:
https://review.openstack.org/#/c/533593

@gbialas, can you confirm it?

[1] https://review.openstack.org/#/c/533593/10/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-master.sh@26

Changed in magnum:
assignee: nobody → Spyros Trigazis (strigazi)
status: Confirmed → In Progress
Grzegorz Bialas (gbialas) wrote :

Hi,
unfortunately that doesn't solve the problem.
I have added "--insecure-bind-address=127.0.0.1" to /etc/kubernetes/apiserver and rebooted all master nodes.

coredns is in a semi-working state:

192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "AAAA IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.559159ms
192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "A IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.825283ms
192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "AAAA IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.570759ms
192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "A IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.817016ms
E0131 10:37:18.376444 6 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)
E0131 10:37:18.376558 6 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Namespace: the server has asked for the client to provide credentials (get namespaces)
E0131 10:37:18.377639 6 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
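
A lookup like the failing ones above can be reproduced with a throwaway pod (a sketch; assumes a pullable busybox image and the default cluster.local domain):

kubectl run -it --rm dnstest --image=busybox --restart=Never -- \
    nslookup kubernetes.default.svc.cluster.local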

If some additional steps or tests are needed, let me know.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/542742

OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/533593
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=2329cb7fb4d197e49d6c07d37b2f7ec14a11c880
Submitter: Zuul
Branch: master

commit 2329cb7fb4d197e49d6c07d37b2f7ec14a11c880
Author: Spyros Trigazis <email address hidden>
Date: Mon Jan 15 11:16:02 2018 +0100

    k8s: Fix kubelet, add RBAC and pass e2e tests

    Due to several small connected patches for the
    fedora atomic driver, this patch includes 4 smaller patches.

    Patch 1:
    k8s: Do not start kubelet and kube-proxy on master

    Patch [1] misses the removal of kubelet and kube-proxy from
    enable-services-master.sh, and therefore they are started if
    they exist in the image or the script will fail.

    https://review.openstack.org/#/c/533593/
    Closes-Bug: #1726482

    Patch 2:
    k8s: Set require-kubeconfig when needed

    As of kubernetes 1.8 [1], --require-kubeconfig is deprecated,
    and in kubernetes 1.9 it is removed.

    Add --require-kubeconfig only for k8s <= 1.8.

    [1] https://github.com/kubernetes/kubernetes/issues/36745

    Closes-Bug: #1718926

    https://review.openstack.org/#/c/534309/

    Patch 3:
    k8s_fedora: Add RBAC configuration

    * Make certificates and kubeconfigs compatible
      with NodeAuthorizer [1].
    * Add CoreDNS roles and rolebindings.
    * Create the system:kube-apiserver-to-kubelet ClusterRole.
    * Bind the system:kube-apiserver-to-kubelet ClusterRole to
      the kubernetes user.
    * Remove creation of the kube-system namespace; it is
      created by default.
    * Update client cert generation in the conductor to match
      kubernetes' requirements.
    * Add --insecure-bind-address=127.0.0.1 to work on
      multi-master too. The controller manager on each
      node needs to contact the apiserver (on the same node)
      on 127.0.0.1:8080

    [1] https://kubernetes.io/docs/admin/authorization/node/

    Closes-Bug: #1742420
    Depends-On: If43c3d0a0d83c42ff1fceffe4bcc333b31dbdaab
    https://review.openstack.org/#/c/527103/

    Patch 4:
    k8s_fedora: Update coredns config to pass e2e

    To pass the e2e conformance tests, coredns needs to
    be configured with POD-MODE set to verified. Otherwise,
    pods won't be resolvable [1].

    [1] https://github.com/coredns/coredns/tree/master/plugin/kubernetes

    https://review.openstack.org/#/c/528566/
    Closes-Bug: #1738633

    Change-Id: Ibd5245ca0f5a11e1d67a2514cebb2ffe8aa5e7de
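
For reference, the system:kube-apiserver-to-kubelet ClusterRole and binding from Patch 3 look roughly like this (a sketch following the upstream node-authorizer docs; the exact manifests shipped by the patch may differ):

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: system:kube-apiserver-to-kubelet
rules:
- apiGroups: [""]
  resources: ["nodes/proxy", "nodes/stats", "nodes/log", "nodes/spec", "nodes/metrics"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubernetes
EOF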

Changed in magnum:
status: In Progress → Fix Released
pengyuan (ypnuaa037) wrote :

Hi, Grzegorz Bialas
I had this problem too. I searched a lot for solutions; most of them were about secret tokens and iptables rules, and none of them solved the problem.
But I felt it was related to the network, so after I changed the flannel backend to vxlan (the default is udp), the problem was solved.
If you are using flannel networking, you can try it.
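
A sketch of how the flannel backend can be inspected and switched (assumptions: etcdctl v2 syntax and the /atomic.io/network etcd prefix used on the Fedora Atomic images; flannel's stock default is /coreos.com/network, and the Network CIDR shown is a placeholder, so keep whatever value the cluster already uses):

# show the network config that flanneld reads from etcd
etcdctl get /atomic.io/network/config
# switch the backend type to vxlan, then restart flanneld on every node
etcdctl set /atomic.io/network/config '{"Network": "10.100.0.0/16", "Backend": {"Type": "vxlan"}}'
sudo systemctl restart flanneld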

OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (stable/queens)

Reviewed: https://review.openstack.org/542742
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=eb92701e05bb57e4d608e5bc66a69ed33c82c76e
Submitter: Zuul
Branch: stable/queens

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 6.1.0

This issue was fixed in the openstack/magnum 6.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 7.0.0

This issue was fixed in the openstack/magnum 7.0.0 release.
