dns-provider=kube-dns fails in K8s 1.20

Bug #1921436 reported by Nikolay Vinogradov
This bug affects 1 person
Affects                          Status         Importance   Assigned to     Milestone
CDK Addons                       Triaged        Medium       Unassigned
Charmed Kubernetes Testing       Fix Released   Medium       Mateo Florido
Kubernetes Control Plane Charm   Fix Released   Medium       Mateo Florido

Bug Description

Hi team,

test_dns_provider never finishes on K8s 1.20: the charm fails to enable kube-dns, so the test waits indefinitely for the coredns pods to be removed:

============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.0.2, py-1.9.0, pluggy-0.13.1 -- /home/ubuntu/k8s-validation/.tox/py3/bin/python
cachedir: .tox/py3/.pytest_cache
metadata: {'Python': '3.8.5', 'Platform': 'Linux-5.4.0-51-generic-x86_64-with-glibc2.29', 'Packages': {'pytest': '6.0.2', 'py': '1.9.0', 'pluggy': '0.13.1'}, 'Plugins': {'flaky': '3.7.0', 'metadata': '1.10.0', 'asyncio': '0.14.0', 'html': '2.1.1'}}
rootdir: /home/ubuntu/k8s-validation, configfile: pytest.ini
plugins: flaky-3.7.0, metadata-1.10.0, asyncio-0.14.0, html-2.1.1
collecting ... collected 1 item

jobs/integration/validation.py::test_dns_provider
-------------------------------- live log setup --------------------------------
unknown facade CAASModelOperator
unexpected facade CAASModelOperator found, unable to decipher version to use
unknown delta type: id
Waiting for coredns pods to be removed

2021-03-25-18:42:17 root ERROR [localhost] STDERR follows:
None
Traceback (most recent call last):
  File "/usr/local/bin/fce", line 11, in <module>
    load_entry_point('foundationcloudengine', 'console_scripts', 'fce')()
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/main.py", line 168, in entry_point
    sys.exit(main(sys.argv[1:]))
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/main.py", line 159, in main
    opts.func(opts)
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/validate.py", line 41, in validate_main
    layer.validate(tests=args.validators)
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/layers/baselayer.py", line 200, in validate
    validator.run()
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/layers/baselayer.py", line 430, in run
    self.run_inner()
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/layers/kubernetes.py", line 205, in run_inner
    self.run_tests(controller, model)
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/layers/kubernetes.py", line 234, in run_tests
    local(cmd, output_mode="live", env=my_env)
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/remotehelpers.py", line 209, in local
    return run_cmd(cmd, target_machine, **kwargs)
  File "/home/ubuntu/deployment/cpe-foundation/foundationcloudengine/foundationcloudengine/remotehelpers.py", line 147, in run_cmd
    raise subprocess.CalledProcessError(ps.returncode, cmd, output=out, stderr=err)

In the kubernetes-master Juju logs we see this:

2021-03-25 19:29:39 INFO juju-log Invoking reactive handler: reactive/kubernetes_master.py:1424:configure_cdk_addons
2021-03-25 19:30:03 WARNING update-status Resource: "/v1, Resource=services", GroupVersionKind: "/v1, Kind=Service"
2021-03-25 19:30:45 WARNING update-status Name: "kube-dns", Namespace: "kube-system"
2021-03-25 19:30:49 INFO juju-log Checking if snap.kube-apiserver.daemon is active (0 / 6)
2021-03-25 19:34:52 WARNING update-status
2021-03-25 19:35:35 WARNING update-status cmd ['/snap/cdk-addons/5273/kubectl', '--kubeconfig', '/root/cdk/cdk_addons_kubectl_config', 'apply', '-f', '/root/snap/cdk-addons/5273/addons/kube
2021-03-25 19:35:56 WARNING update-status Resource: "/v1, Resource=services", GroupVersionKind: "/v1, Kind=Service"
2021-03-25 19:35:56 WARNING update-status Name: "kube-dns", Namespace: "kube-system"
2021-03-25 19:35:56 WARNING update-status for: "/root/snap/cdk-addons/5273/addons/kube-dns.yaml": Service "kube-dns" is invalid: spec.clusterIPs[0]: Invalid value: "dns_server": must be a valid IP address, (e.g. 10.9.8.7 or 2001:db8::ffff)
2021-03-25 19:35:56 WARNING update-status Error from server (BadRequest): error when creating "/root/snap/cdk-addons/5273/addons/kube-dns.yaml": Deployment in version "v1" cannot be handled as a Deployment: v1.Deployment.Spec: v1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Requests: Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ory_limit"},"request|..., bigger context ...|"resources":{"limits":{"memory":"dns_memory_limit"},"requests":{"cpu":"100m","memory":"70Mi"}},"secu|...
2021-03-25 19:35:56 INFO juju-log Addons are not ready yet.

Please fix.

Nikolay Vinogradov (nikolay.vinogradov) wrote :

I could reproduce the DNS issue just by running 'juju config kubernetes-master dns-provider=kube-dns'; the coredns pods keep running after that. Also, in juju status:

kubernetes-master/0* waiting idle 9 10.254.9.237 6443/tcp Waiting to retry addon deployment

and in its logs I see the errors mentioned in the bug description.

George Kraft (cynerva)
summary: - test_dns_provider fails to switch to kube-dns in K8s 1.20
+ dns-provider=kube-dns fails in K8s 1.20
George Kraft (cynerva) wrote :

This is broken in cdk-addons 1.20 due to an upstream change in the kube-dns template[1]. We need to either remove support for kube-dns, or update cdk-addons to handle the new template properly.
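The two errors in the Juju log suggest that the literal placeholder strings "dns_server" and "dns_memory_limit" reached the API server instead of real values. As a rough illustration only (not the actual cdk-addons fix), a pre-apply check could be sketched as below. The helper name find_unrendered_values is hypothetical; the quantity regular expression is copied from the error message in the log, and the "valid" sample values are the ones that appear in that same log.

```python
import ipaddress
import re

# Quantity pattern copied from the API server error message in the Juju log.
QUANTITY_RE = re.compile(r"^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$")


def find_unrendered_values(cluster_ip, memory_limit):
    """Return a list of problems found in the two values (hypothetical helper)."""
    problems = []
    try:
        # Raises ValueError for anything that is not an IPv4 or IPv6 address.
        ipaddress.ip_address(cluster_ip)
    except ValueError:
        problems.append(f"clusterIP {cluster_ip!r} is not a valid IP address")
    if not QUANTITY_RE.match(memory_limit):
        problems.append(f"memory limit {memory_limit!r} is not a valid quantity")
    return problems


# Values taken from the log: the placeholders fail, the example values pass.
print(find_unrendered_values("dns_server", "dns_memory_limit"))
print(find_unrendered_values("10.9.8.7", "70Mi"))
```

A check like this would only flag the symptom early; it does not say how the new upstream template should be rendered.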

We didn't catch this ourselves because the latest version of test_dns_provider passes when it should not. You're running an older version of the test that rightly hangs/fails when kube-dns does not deploy properly. A more recent change broke the waits after switching to kube-dns[2]. The CoreDNS pods have the k8s-app=kube-dns label, and no other labels, so the test incorrectly recognizes them as kube-dns pods.
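To illustrate the label problem described above: since both providers' pods carry k8s-app=kube-dns, a test needs some other signal to tell them apart. One possible approach, sketched here under assumptions, is to look at container images. The helper pods_running_image is hypothetical, the pod dictionaries are assumed to have the shape of items from 'kubectl get pods -o json', and the image names in the sample data are made up for the example.

```python
def pods_running_image(pods, image_fragment):
    """Return names of pods with a container image containing image_fragment.

    Hypothetical helper; `pods` is assumed to be a list of pod objects shaped
    like the items returned by `kubectl get pods -o json`.
    """
    names = []
    for pod in pods:
        containers = pod.get("spec", {}).get("containers", [])
        if any(image_fragment in c.get("image", "") for c in containers):
            names.append(pod["metadata"]["name"])
    return names


# Made-up sample data: both pods carry the same label, only the image differs.
sample_pods = [
    {
        "metadata": {"name": "coredns-abc", "labels": {"k8s-app": "kube-dns"}},
        "spec": {"containers": [{"image": "registry.example/coredns:x"}]},
    },
    {
        "metadata": {"name": "kube-dns-xyz", "labels": {"k8s-app": "kube-dns"}},
        "spec": {"containers": [{"image": "registry.example/kubedns:x"}]},
    },
]
print(pods_running_image(sample_pods, "coredns"))
```

Whether image names are a reliable discriminator in a real deployment is an assumption that would need checking against the actual manifests.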

However we decide to fix this in cdk-addons/kubernetes-master, we will also need to fix test_dns_provider to handle kube-dns appropriately.

[1]: https://github.com/kubernetes/kubernetes/pull/93836/files#diff-f85dd769196d98806fedc6e257d743cd229840dac33527d55eea23634baea84c
[2]: https://github.com/charmed-kubernetes/jenkins/blob/1776749f449a242ae1fe1d8d926f4793d5fee120/jobs/integration/validation.py#L1756-L1757

Changed in cdk-addons:
importance: Undecided → Medium
Changed in charmed-kubernetes-testing:
importance: Undecided → Medium
Changed in charm-kubernetes-master:
importance: Undecided → Medium
Changed in cdk-addons:
status: New → Triaged
Changed in charmed-kubernetes-testing:
status: New → Triaged
Changed in charm-kubernetes-master:
status: New → Triaged
Changed in charm-kubernetes-master:
status: Triaged → In Progress
assignee: nobody → Mateo Florido (mateoflorido)
milestone: none → 1.26
Adam Dyess (addyess) wrote :
Changed in charmed-kubernetes-testing:
status: Triaged → Fix Committed
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Changed in charmed-kubernetes-testing:
milestone: none → 1.26
assignee: nobody → Mateo Florido (mateoflorido)
Kevin W Monroe (kwmonroe) wrote :
Adam Dyess (addyess)
Changed in charmed-kubernetes-testing:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released