We're unsure whether this is directly related to test_dns_provider itself. It seems to be an issue with kube-apiserver getting into a funky state, but we're not sure how to narrow it down from there.
var/log/syslog:Jun 19 04:17:07 ip-172-31-42-92 kube-apiserver.daemon[26472]: logging error output: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"no endpoints available for service \\\"kube-state-metrics\\\"\",\"reason\":\"ServiceUnavailable\",\"code\":503}\n"
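For reference, one quick way to confirm what the apiserver is complaining about is to read the Endpoints object for the service directly; if it has no ready addresses, the 503 ServiceUnavailable is the expected proxy behaviour. A minimal sketch with the Python kubernetes client (the service name comes from the log; the kube-system namespace is an assumption, as is the rest of the scaffolding):

    # Sketch: check whether the kube-state-metrics service has any ready
    # endpoints. Assumes a loadable kubeconfig for the affected cluster.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    ep = v1.read_namespaced_endpoints("kube-state-metrics", "kube-system")
    subsets = ep.subsets or []
    ready = [a.ip for s in subsets for a in (s.addresses or [])]
    not_ready = [a.ip for s in subsets for a in (s.not_ready_addresses or [])]
    print("ready:", ready or "none")
    print("not ready:", not_ready or "none")
    # Empty lists on both counts would mean the apiserver has nothing to
    # route the service traffic to, matching the 503 above.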
---
The metrics server seems to have some issues as well; these may just be due to kube-apiserver being unable to reach it.
pod-logs/kube-system-metrics-server-v0.3.6-74c87686d-v4s4j-metrics-server-nanny.log:ERROR: logging before flag.Parse: I0619 04:55:53.602595 1 nanny_lib.go:108] Resources are not within the expected limits, updating the deployment. Actual: {Limits:map[] Requests:map[]} Expected: {Limits:map[cpu:{i:{value:0 scale:0} d:{Dec:0xc420407ce0} s: Format:DecimalSI} memory:{i:{value:0 scale:0} d:{Dec:0xc420407e90} s: Format:BinarySI}] Requests:map[cpu:{i:{value:0 scale:0} d:{Dec:0xc420407ce0} s: Format:DecimalSI} memory:{i:{value:0 scale:0} d:{Dec:0xc420407e90} s: Format:BinarySI}]}
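The nanny here appears to be the addon-resizer sidecar: it computes expected CPU/memory for metrics-server (scaled by cluster size) and patches the deployment whenever the actual container resources differ. The "Actual: {Limits:map[] Requests:map[]}" part says the container spec it read back had no resources set at all. A rough way to see the "Actual" side of that comparison, sketched with the Python client (the deployment name is inferred from the pod name above; the container name is an assumption):

    # Sketch: dump the metrics-server container's actual resource
    # requests/limits, i.e. the "Actual" side of the nanny's comparison.
    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    dep = apps.read_namespaced_deployment("metrics-server-v0.3.6", "kube-system")
    for c in dep.spec.template.spec.containers:
        if c.name == "metrics-server":  # assumed container name
            print("limits:", c.resources.limits, "requests:", c.resources.requests)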
---
We're also seeing nginx errors like this on the kubeapi-load-balancer unit.
var/log/nginx.error.log:2020/06/19 04:40:40 [error] 20694#20694: *1351 no live upstreams while connecting to upstream, client: 3.87.109.158, server: _, request: "GET /api/v1/endpoints?limit=500&resourceVersion=0 HTTP/1.1", upstream: "https://target_service/api/v1/endpoints?limit=500&resourceVersion=0", host: "3.81.51.24:443"
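"no live upstreams" means nginx has (at least momentarily) marked every kube-apiserver backend in its upstream block as failed, so it had nowhere to send the request; it's a symptom of the apiservers being unreachable rather than an independent fault. One way to separate "LB problem" from "backend problem" is to probe each apiserver directly and then via the LB. A sketch (the addresses/ports are placeholders based on the logs, and /healthz may return 401/403 without credentials, but any HTTP response still proves TCP/TLS reachability):

    # Sketch: probe kube-apiserver backends directly and via the LB.
    import requests
    import urllib3

    urllib3.disable_warnings()  # self-signed certs; fine for a reachability probe

    targets = [
        "https://172.31.39.158:6443",  # assumed apiserver backend address/port
        "https://3.81.51.24:443",      # the LB host from the nginx log
    ]
    for url in targets:
        try:
            r = requests.get(url + "/healthz", verify=False, timeout=5)
            print(url, "->", r.status_code)
        except requests.RequestException as exc:
            print(url, "-> error:", exc)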
---
Here's a link to the artifacts from this run:
https://oil-jenkins.canonical.com/artifacts/aae83527-ae89-44e6-a57a-93a0974b1263/index.html
The kube-state-metrics errors are a red herring.
test_dns_provider timed out after 15 minutes. The test waits for the coredns pods to be removed, but it looks like one coredns pod is stuck in Terminating. The node hosting that pod is in Unknown status.
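The wait itself is simple: roughly, the test polls until no coredns pods remain and gives up after 15 minutes, so a pod that never finishes terminating guarantees the timeout. A rough equivalent of that loop in Python (the label selector and namespace are assumptions; the real test's selection logic may differ):

    # Sketch: wait up to 15 minutes for all coredns pods to be removed.
    import time
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    deadline = time.time() + 15 * 60
    while True:
        pods = v1.list_namespaced_pod(
            "kube-system", label_selector="k8s-app=kube-dns"
        ).items
        if not pods:
            break  # all coredns pods gone; test would pass
        if time.time() > deadline:
            # A pod stuck Terminating on a node in Unknown status stays in
            # this list indefinitely, which is exactly the failure seen here.
            raise TimeoutError(f"coredns pod still present: {pods[0].metadata.name}")
        time.sleep(10)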
Kubelet on that node is failing to communicate with kube-apiserver due to "use of closed network connection" errors:
E0619 04:57:09.410568 14620 server.go:269] Authorization error (user=system:kube-apiserver, verb=get, resource=nodes, subresource=metrics) %!(EXTRA *url.Error=Post https://172.31.39.158:443/apis/authorization.k8s.io/v1/subjectaccessreviews: write tcp 172.31.39.53:57246->172.31.39.158:443: use of closed network connection)
E0619 04:57:11.237701 14620 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "ip-172-31-39-53.ec2.internal": an error on the server ("") has prevented the request from succeeding (get nodes ip-172-31-39-53.ec2.internal)
E0619 04:57:11.237742 14620 kubelet_node_status.go:389] Unable to update node status: update node status exceeds retry count
This is a known issue in Kubernetes/Golang:
https://github.com/kubernetes/kubernetes/issues/87615
https://github.com/golang/go/issues/39750