Comment 8 for bug 1915042

Revision history for this message
George Kraft (cynerva) wrote :

Thanks. It looks like the test is mostly failing in the same spot, after the CoreDNS charm is deployed.

I'm not able to confirm exactly what's happening, but the most likely culprit is that the test does not wait for the CoreDNS charm's pods to come up before trying to verify DNS. (Waiting for "active" status on the charm is not enough.)

I recommend the following fixes:
1. Replace validate-dns pod's image[1] with something that has host/nslookup/dig baked in (busybox maybe?) so we don't have to run `apt install` as a start command. That command creates a lot of race conditions - what if we kubectl exec into the pod before it's done?
2. Change restartPolicy[2] to Always.
3. Add retries to verify_dns_resolution and verify_no_dns_resolution[3].
4. After deploying CoreDNS, before verifying DNS[4], call wait_for_pods_ready.

[1]: https://github.com/charmed-kubernetes/jenkins/blob/9f180e2be0d209a6b82be93bda8f9623cd133bf8/jobs/integration/templates/validate-dns-spec.yaml#L9
[2]: https://github.com/charmed-kubernetes/jenkins/blob/9f180e2be0d209a6b82be93bda8f9623cd133bf8/jobs/integration/templates/validate-dns-spec.yaml#L12
[3]: https://github.com/charmed-kubernetes/jenkins/blob/9f180e2be0d209a6b82be93bda8f9623cd133bf8/jobs/integration/validation.py#L1709-L1724
[4]: https://github.com/charmed-kubernetes/jenkins/blob/9f180e2be0d209a6b82be93bda8f9623cd133bf8/jobs/integration/validation.py#L1811-L1812