Certs problem with all worker nodes

Bug #1816387 reported by yen
This bug affects 1 person

Affects: EasyRSA Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

My cluster had been running fine for two weeks. Then, all of a sudden and with no obvious trigger, this CA problem popped up on all worker nodes.

```
ychuang@C02W80YAG8WL:~/kubernetes$ kubectl get nodes
NAME        STATUS     ROLES    AGE   VERSION
bubnicki    NotReady   <none>   19d   v1.13.2
karpinski   NotReady   <none>   31d   v1.13.1
```
In the worker node's /var/log/syslog:

```
Feb 17 16:15:42 Bubnicki kubelet.daemon[12294]: I0217 16:15:42.543124 12294 log.go:172] http: TLS handshake error from 192.168.5.132:49796: remote error: tls: bad certificate
```
I searched online and found this post: https://devops.stackexchange.com/questions/1765/new-kubernetes-cluster-remote-error-tls-bad-certificate

It says this error means the client's certificate is invalid because one of the following holds:

- the server doesn't trust the client's signing certificate authority,
- the client doesn't trust the server's signing certificate authority, or
- the certificate's DN doesn't match the hostname.
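A quick way to rule these cases out from a node is to verify each leaf certificate against the CA and inspect the names on the serving cert. A minimal sketch, assuming the CDK default layout under /root/cdk (the exact file names on a given worker may differ):

```
# Cases 1 and 2: does the CA actually sign the client cert?
openssl verify -CAfile /root/cdk/ca.crt /root/cdk/client.crt

# Case 3: do the subject and SANs on the serving cert match the host?
openssl x509 -in /root/cdk/server.crt -noout -text | grep -A1 'Subject Alternative Name'
```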

Below are my kube-apiserver start parameters.

```
root@kubernetesBM:/root# ps -ef | grep kube-apiserver | tr " " "\n"
/snap/kube-apiserver/656/kube-apiserver
--advertise-address=172.29.100.185
--min-request-timeout=300
--etcd-cafile=/root/cdk/etcd/client-ca.pem
--etcd-certfile=/root/cdk/etcd/client-cert.pem
--etcd-keyfile=/root/cdk/etcd/client-key.pem
--etcd-servers=https://172.29.100.185:2379
--storage-backend=etcd3
--tls-cert-file=/root/cdk/server.crt
--tls-private-key-file=/root/cdk/server.key
--insecure-bind-address=127.0.0.1
--insecure-port=8080
--audit-log-maxbackup=9
--audit-log-maxsize=100
--audit-log-path=/root/cdk/audit/audit.log
--audit-policy-file=/root/cdk/audit/audit-policy.yaml
--basic-auth-file=/root/cdk/basic_auth.csv
--client-ca-file=/root/cdk/ca.crt
--requestheader-allowed-names=client
--requestheader-client-ca-file=/root/cdk/ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--service-account-key-file=/root/cdk/serviceaccount.key
--token-auth-file=/root/cdk/known_tokens.csv
--authorization-mode=RBAC,Node
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction
--allow-privileged
--enable-aggregator-routing
--kubelet-certificate-authority=/root/cdk/ca.crt
--kubelet-client-certificate=/root/cdk/client.crt
--kubelet-client-key=/root/cdk/client.key
--kubelet-preferred-address-types=[InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP]
--proxy-client-cert-file=/root/cdk/client.crt
--proxy-client-key-file=/root/cdk/client.key
--service-cluster-ip-range=10.152.183.0/24
--logtostderr
--v=4
```
I have checked the validity dates on all my certs and they all seem fine.

```
root@kubernetesBM:/root# openssl x509 -in /root/cdk/ca.crt -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            d8:71:a7:10:4a:f0:20:b0
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = 172.29.100.186
        Validity
            Not Before: Jan 16 20:16:29 2019 GMT
            Not After : Jan 13 20:16:29 2029 GMT
```
Can someone point out what else I should check, and how do I fix this issue?

More info.

```
root@Bubnicki:/root/.kube# curl -v https://172.29.100.185:6443/api/v1/nodes --cert /root/cdk/server.crt --key /root/cdk/server.key
* Trying 172.29.100.185...
* TCP_NODELAY set
* Connected to 172.29.100.185 (172.29.100.185) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS handshake, CERT verify (15):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=kubernetes-master_0
* start date: Jan 16 20:26:12 2019 GMT
* expire date: Jan 13 20:26:12 2029 GMT
* subjectAltName: host "172.29.100.185" matched cert's IP address!
* issuer: CN=172.29.100.186
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5563237f0900)
> GET /api/v1/nodes HTTP/2
> Host: 172.29.100.185:6443
> User-Agent: curl/7.58.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 403
< audit-id: 0de66be1-b16a-42f2-9aaf-39ef12bae5d9
< content-type: application/json
< x-content-type-options: nosniff
< content-length: 308
< date: Mon, 18 Feb 2019 09:49:37 GMT
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "nodes is forbidden: User \"kubernetes-worker_3\" cannot list resource \"nodes\" in API group \"\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "kind": "nodes"
  },
  "code": 403
* Connection #0 to host 172.29.100.185 left intact
```
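Worth noting: in the trace above the TLS handshake completes and curl reports "SSL certificate verify ok", so the 403 is an RBAC authorization failure for the user "kubernetes-worker_3" taken from the cert, not a certificate problem. A quick way to confirm what that user may do (a sketch; run on the master with admin credentials):

```
kubectl auth can-i list nodes --as=kubernetes-worker_3
```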

Revision history for this message
yen (antigenius0910) wrote :

So I created a service account and bound it to the user "kubernetes-worker_3". After the binding, curl to the api-server works.

One simple question: can anyone tell me how to restart kubelet the CDK way?

```
root@Bubnicki:~# curl -v https://172.29.100.185:6443/api/v1/nodes --cert /root/cdk/server.crt --key /root/cdk/server.key

{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/nodes",
    "resourceVersion": "4518487"
  },
...
```
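The exact binding isn't shown above; a minimal sketch of a role and binding that would grant this access (the names here are hypothetical, not what was actually run):

```
# Hypothetical names: a role that can read nodes, bound to the cert's user
kubectl create clusterrole node-reader --verb=get,list,watch --resource=nodes
kubectl create clusterrolebinding worker3-node-reader \
    --clusterrole=node-reader --user=kubernetes-worker_3
```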

Revision history for this message
yen (antigenius0910) wrote :

I didn't know what parameters to use to start kubelet, so I rebooted the worker nodes.

It turns out they both came back to Ready after the reboot. I don't know how to reproduce this issue so far.

```
NAME        STATUS   ROLES    AGE   VERSION
bubnicki    Ready    <none>   20d   v1.13.3
karpinski   Ready    <none>   32d   v1.13.3
```

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

To restart kubelet on all workers:

```
juju run --application kubernetes-worker -- sudo systemctl restart snap.kubelet.daemon.service
```
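To confirm the service came back afterwards, the same pattern should work (a sketch):

```
juju run --application kubernetes-worker -- systemctl is-active snap.kubelet.daemon.service
```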

Revision history for this message
yen (antigenius0910) wrote :

Thank you @Tim.

I already tore down the cluster, but I will give it a try if I see the problem again. Thank you so much for your help!
