Pods can't communicate with kube-apiserver in multi-master cluster

Bug #1742420 reported by Grzegorz Bialas
This bug affects 3 people
Affects: Magnum
Status: Fix Released
Importance: Undecided
Assigned to: Spyros Trigazis

Bug Description

I have deployed a k8s cluster using magnum, with 3 masters and 3 nodes:

[DEV]root@ucsubu3000:~/.kube# kubectl get nodes
NAME                             STATUS                     AGE   VERSION
k8s-test-xtlwldidn5vg-master-0   Ready,SchedulingDisabled   36m   v1.7.7
k8s-test-xtlwldidn5vg-master-1   Ready,SchedulingDisabled   36m   v1.7.7
k8s-test-xtlwldidn5vg-master-2   Ready,SchedulingDisabled   36m   v1.7.7
k8s-test-xtlwldidn5vg-minion-0   Ready                      34m   v1.7.7
k8s-test-xtlwldidn5vg-minion-1   Ready                      34m   v1.7.7
k8s-test-xtlwldidn5vg-minion-2   Ready                      34m   v1.7.7

But it looks like there is a problem with pods authenticating to the kube-apiserver.

In the coredns logs I see:

E0110 10:11:07.239014 5 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
E0110 10:11:07.239326 5 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)
E0110 10:11:07.239455 5 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Namespace: the server has asked for the client to provide credentials (get namespaces)
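
The same failure can be reproduced by hand from inside any pod whose image ships a shell and curl (a sketch; the pod name is hypothetical, and https://10.254.0.1:443 is the kubernetes service VIP that also shows up in the dashboard log below):

kubectl exec -it some-debug-pod -- sh
# inside the pod: present the mounted service-account token to the apiserver
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer $TOKEN" https://10.254.0.1:443/api/v1/namespaces
# on an affected cluster this returns 401 Unauthorized instead of a namespace list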

In the dashboard logs I see:

[DEV]root@ucsubu3000:~/.kube# kubectl -n kube-system logs kubernetes-dashboard-3804488410-xwxcv
Using HTTP port: 9090
Creating API server client for https://10.254.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md

When I deploy with helm, I see this in the tiller logs:

[tiller] 2018/01/10 10:44:07 preparing install for
[storage] 2018/01/10 10:44:07 getting release "muddled-woodpecker.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "muddled-woodpecker.v1": the server has asked for the client to provide credentials (get configmaps muddled-woodpecker.v1)
[tiller] 2018/01/10 10:44:07 info: generated name muddled-woodpecker is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "guilded-fish.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "guilded-fish.v1": the server has asked for the client to provide credentials (get configmaps guilded-fish.v1)
[tiller] 2018/01/10 10:44:07 info: generated name guilded-fish is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "foiled-bunny.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "foiled-bunny.v1": the server has asked for the client to provide credentials (get configmaps foiled-bunny.v1)
[tiller] 2018/01/10 10:44:07 info: generated name foiled-bunny is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "yodeling-antelope.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "yodeling-antelope.v1": the server has asked for the client to provide credentials (get configmaps yodeling-antelope.v1)
[tiller] 2018/01/10 10:44:07 info: generated name yodeling-antelope is taken. Searching again.
[storage] 2018/01/10 10:44:07 getting release "anxious-kiwi.v1"
[storage/driver] 2018/01/10 10:44:07 get: failed to get "anxious-kiwi.v1": the server has asked for the client to provide credentials (get configmaps anxious-kiwi.v1)
[tiller] 2018/01/10 10:44:07 info: generated name anxious-kiwi is taken. Searching again.
[tiller] 2018/01/10 10:44:07 warning: No available release names found after 5 tries
[tiller] 2018/01/10 10:44:07 failed install prepare step: no available release name found
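
Which service account tiller runs as can be checked like this (a sketch; tiller-deploy is the default deployment name helm init creates, and the command prints nothing if tiller runs as the namespace's default service account):

kubectl -n kube-system get deployment tiller-deploy \
    -o jsonpath='{.spec.template.spec.serviceAccountName}'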

Bartosz Bezak (bbezak)
Changed in magnum:
status: New → Confirmed
Grzegorz Bialas (gbialas) wrote :

I have opened a similar bug on the kubernetes github (https://github.com/kubernetes/kubernetes/issues/58071), but it was closed with the comment: "sounds like something you need to take up with the magnum team (especially the credentials references). they can escalate to this repository if needed."

Spyros Trigazis (strigazi) wrote :

I managed to reproduce it, but couldn't find an obvious solution; I'll investigate more.

Coredns should be able to authenticate with the api server.

Spyros Trigazis (strigazi) wrote :

Hi,

I think that the problem can be solved by adding:
 --insecure-bind-address=127.0.0.1

to the API server config, like so [1].
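
A minimal sketch of the resulting line in /etc/kubernetes/apiserver on the Fedora Atomic images (the rest of the KUBE_API_ADDRESS contents vary per deployment):

# /etc/kubernetes/apiserver
KUBE_API_ADDRESS="--bind-address=0.0.0.0 --insecure-bind-address=127.0.0.1"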

I have included it in this patch:
https://review.openstack.org/#/c/533593

@gbialas, can you confirm it?

[1] https://review.openstack.org/#/c/533593/10/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-master.sh@26

Changed in magnum:
assignee: nobody → Spyros Trigazis (strigazi)
status: Confirmed → In Progress
Grzegorz Bialas (gbialas) wrote :

Hi,
unfortunately that doesn't solve the problem.
I have added "--insecure-bind-address=127.0.0.1" to /etc/kubernetes/apiserver and rebooted all master nodes.

coredns is in a semi-working state:

192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "AAAA IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.559159ms
192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "A IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.825283ms
192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "AAAA IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.570759ms
192.168.3.110 - [31/Jan/2018:10:37:18 +0000] "A IN harping-swan-redis. udp 36 false 512" SERVFAIL qr,rd,ra 36 2.817016ms
E0131 10:37:18.376444 6 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Service: the server has asked for the client to provide credentials (get services)
E0131 10:37:18.376558 6 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Namespace: the server has asked for the client to provide credentials (get namespaces)
E0131 10:37:18.377639 6 reflector.go:214] github.com/coredns/coredns/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
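
A lookup like the failing ones above can be reproduced with a throwaway pod (a sketch; assumes a pullable busybox image and the default cluster.local domain):

kubectl run -it --rm dnstest --image=busybox --restart=Never -- \
    nslookup kubernetes.default.svc.cluster.local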

If some additional steps or tests are needed, let me know.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/542742

OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/533593
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=2329cb7fb4d197e49d6c07d37b2f7ec14a11c880
Submitter: Zuul
Branch: master

commit 2329cb7fb4d197e49d6c07d37b2f7ec14a11c880
Author: Spyros Trigazis <email address hidden>
Date: Mon Jan 15 11:16:02 2018 +0100

    k8s: Fix kubelet, add RBAC and pass e2e tests

    Due to several small connected patches for the
    fedora atomic driver, this patch includes 4 smaller patches.

    Patch 1:
    k8s: Do not start kubelet and kube-proxy on master

    Patch [1] misses the removal of kubelet and kube-proxy from
    enable-services-master.sh, and therefore they are started if
    they exist in the image or the script will fail.

    https://review.openstack.org/#/c/533593/
    Closes-Bug: #1726482

    Patch 2:
    k8s: Set require-kubeconfig when needed

    As of kubernetes 1.8 [1], --require-kubeconfig is deprecated,
    and in kubernetes 1.9 it is removed.

    Add --require-kubeconfig only for k8s <= 1.8.

    [1] https://github.com/kubernetes/kubernetes/issues/36745

    Closes-Bug: #1718926

    https://review.openstack.org/#/c/534309/

    Patch 3:
    k8s_fedora: Add RBAC configuration

    * Make certificates and kubeconfigs compatible
      with NodeAuthorizer [1].
    * Add CoreDNS roles and rolebindings.
    * Create the system:kube-apiserver-to-kubelet ClusterRole.
    * Bind the system:kube-apiserver-to-kubelet ClusterRole to
      the kubernetes user.
    * Remove creation of the kube-system namespace; it is
      created by default.
    * Update client cert generation in the conductor to match
      kubernetes' requirements.
    * Add --insecure-bind-address=127.0.0.1 to work on
      multi-master too. The controller manager on each
      node needs to contact the apiserver (on the same node)
      on 127.0.0.1:8080

    [1] https://kubernetes.io/docs/admin/authorization/node/

    Closes-Bug: #1742420
    Depends-On: If43c3d0a0d83c42ff1fceffe4bcc333b31dbdaab
    https://review.openstack.org/#/c/527103/

    Patch 4:
    k8s_fedora: Update coredns config to pass e2e

    To pass the e2e conformance tests, coredns needs to
    be configured with POD-MODE set to verified. Otherwise,
    pods won't be resolvable [1].

    [1] https://github.com/coredns/coredns/tree/master/plugin/kubernetes

    https://review.openstack.org/#/c/528566/
    Closes-Bug: #1738633

    Change-Id: Ibd5245ca0f5a11e1d67a2514cebb2ffe8aa5e7de
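
For reference, the system:kube-apiserver-to-kubelet ClusterRole and binding from Patch 3 look roughly like this (a sketch following the upstream node-authorizer docs; the exact manifests shipped by the patch may differ):

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: system:kube-apiserver-to-kubelet
rules:
- apiGroups: [""]
  resources: ["nodes/proxy", "nodes/stats", "nodes/log", "nodes/spec", "nodes/metrics"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubernetes
EOF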

Changed in magnum:
status: In Progress → Fix Released
pengyuan (ypnuaa037) wrote :

Hi, Grzegorz Bialas
I had this problem too. I searched a lot for solutions; most of them were about secret tokens and iptables rules, and none of them solved the problem.
But I felt it was related to the network, so after I changed the flannel backend to vxlan (the default is udp), the problem was solved.
If you are using flannel networking, you can try it.
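
A sketch of how the flannel backend can be inspected and switched (assumptions: etcdctl v2 syntax and the /atomic.io/network etcd prefix used on the Fedora Atomic images; flannel's stock default is /coreos.com/network, and the Network CIDR shown is a placeholder, so keep whatever value the cluster already uses):

# show the network config that flanneld reads from etcd
etcdctl get /atomic.io/network/config
# switch the backend type to vxlan, then restart flanneld on every node
etcdctl set /atomic.io/network/config '{"Network": "10.100.0.0/16", "Backend": {"Type": "vxlan"}}'
sudo systemctl restart flanneld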

OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (stable/queens)

Reviewed: https://review.openstack.org/542742
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=eb92701e05bb57e4d608e5bc66a69ed33c82c76e
Submitter: Zuul
Branch: stable/queens

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 6.1.0

This issue was fixed in the openstack/magnum 6.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 7.0.0

This issue was fixed in the openstack/magnum 7.0.0 release.
