cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query

Bug #1765405 reported by Frode Nordahl on 2018-04-19
This bug affects 1 person
Affects: OpenStack nova-cloud-controller charm
Importance: Low
Assigned to: Unassigned

Bug Description

I came across this situation when deploying nova-cloud-controller to a container that had its DNS search domain set up incorrectly.

Could the charm have handled this differently? Should it use FQDNs instead, for example?

2018-04-19 12:58:37 INFO juju-log cloud-compute:27: Listing cell, 'cell1'
2018-04-19 12:58:39 DEBUG cloud-compute-relation-changed Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
2018-04-19 12:58:41 DEBUG cloud-compute-relation-changed Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed Traceback (most recent call last):
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1150, in <module>
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed main()
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1144, in main
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed hooks.execute(sys.argv)
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed self._hooks[hook_name]()
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 663, in compute_changed
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed ssh_compute_add(key, rid=rid, unit=unit)
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 1001, in ssh_compute_add
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed if ns_query(short):
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 478, in ns_query
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed answers = dns.resolver.query(address, rtype)
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 1132, in query
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed raise_on_no_answer, source_port)
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 947, in query
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed raise NoNameservers(request=request, errors=errors)
2018-04-19 12:58:42 DEBUG cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query clever-troll. IN A: Server 127.0.0.53 UDP port 53 answered SERVFAIL
2018-04-19 12:58:42 ERROR juju.worker.uniter.operation runhook.go:113 hook "cloud-compute-relation-changed" failed: exit status 1

Sean Feole (sfeole) wrote :

I hit this on arm64 hardware, queens+bionic:

2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed Traceback (most recent call last):
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1164, in <module>
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed main()
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1157, in main
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed hooks.execute(sys.argv)
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 801, in execute
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed self._hooks[hook_name]()
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 666, in compute_changed
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed ssh_compute_add(key, rid=rid, unit=unit)
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 1005, in ssh_compute_add
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed if ns_query(short):
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 478, in ns_query
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed answers = dns.resolver.query(address, rtype)
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 1132, in query
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed raise_on_no_answer, source_port)
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 947, in query
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed raise NoNameservers(request=request, errors=errors)
2018-05-09 14:08:41 DEBUG cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query node-moingo. IN A: Server 127.0.0.53 UDP port 53 answered SERVFAIL

Changed in charm-nova-cloud-controller:
status: New → Triaged
importance: Undecided → Low
Andrew McLeod (admcleod) wrote :

I've just hit this with juju 2.4 beta 2.

I would like to point out that the failure is on nova-cloud-controller: it is trying to resolve the hostname of the machine that nova-compute is on. In my case:

DEBUG cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query node-jaeger. IN A: Server 127.0.0.53 UDP port 53 answered SERVFAIL

Either the IP address rather than the hostname should be passed over the relation, or there is a DNS misconfiguration.

Andrew McLeod (admcleod) wrote :

I've checked an existing, working deployment, which also passes the hostname over the relation, so DNS configuration is the issue.

Andrew McLeod (admcleod) wrote :

In the existing deployment, the DNS config in the LXD container (nova-cloud-controller) looks as follows:

ubuntu@juju-d1a6e2-2-lxd-2:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.245.168.6
search maas

In the 'broken' one, it is using a local resolver and therefore presumably doesn't forward queries to the MAAS DNS server, which would respond for the hostname of the nova-compute machine.

/etc/resolv.conf on the broken instance contains only:
nameserver 127.0.0.53
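
The difference between the two files can be checked mechanically. The following is only a rough sketch, not anything the charm does: the function names `parse_resolv_conf` and `short_names_at_risk` are made up for illustration, and the rule "only loopback nameservers and no search domain" is an assumption drawn from the two files quoted above.

```python
# Contents of the two resolv.conf files quoted in this comment.
WORKING = """\
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
nameserver 10.245.168.6
search maas
"""

BROKEN = """\
nameserver 127.0.0.53
"""


def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] in ("search", "domain"):
            # A later search/domain line replaces an earlier one.
            search = fields[1:]
    return nameservers, search


def short_names_at_risk(text):
    """True when only loopback resolvers are listed and no search domain is set.

    Assumption: this mirrors the 'broken' container described above; it is a
    heuristic, not a definitive test for working name resolution.
    """
    nameservers, search = parse_resolv_conf(text)
    only_loopback = bool(nameservers) and all(
        ns.startswith("127.") for ns in nameservers
    )
    return only_loopback and not search


if __name__ == "__main__":
    print("working at risk:", short_names_at_risk(WORKING))
    print("broken at risk:", short_names_at_risk(BROKEN))
```

On a real unit the text would come from reading /etc/resolv.conf; the strings here are just the two examples from this report.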

---

host and ping both fail to resolve the hostname, and dig succeeds only partially (the short name returns SERVFAIL):

ubuntu@juju-6406ff-2-lxd-2:/etc$ host node-jaeger
Host node-jaeger not found: 2(SERVFAIL)
ubuntu@juju-6406ff-2-lxd-2:/etc$ ping node-jaeger
ping: node-jaeger: Temporary failure in name resolution
ubuntu@juju-6406ff-2-lxd-2:/etc$ dig node-jaeger

; <<>> DiG 9.11.3-1ubuntu1-Ubuntu <<>> node-jaeger
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13450
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;node-jaeger. IN A

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Thu May 17 19:49:05 UTC 2018
;; MSG SIZE rcvd: 40

dig node-jaeger.maas succeeds:

ubuntu@juju-6406ff-2-lxd-2:/etc$ dig node-jaeger.maas

; <<>> DiG 9.11.3-1ubuntu1-Ubuntu <<>> node-jaeger.maas
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58641
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;node-jaeger.maas. IN A

;; ANSWER SECTION:
node-jaeger.maas. 30 IN A 10.245.168.44

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Thu May 17 19:50:56 UTC 2018
;; MSG SIZE rcvd: 61
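
Since the short name fails while the fully qualified name resolves, one way a caller could cope is to retry with a search domain appended. This is a hedged sketch only: `candidate_names` and `resolve_first` are hypothetical helpers, `lookup` is an injected callable (in the charm it might wrap `ns_query`, but that is an assumption), and the caller has to supply the search domain, such as "maas", itself.

```python
def candidate_names(hostname, search_domains):
    """Return the names to try: the name as given, then qualified variants."""
    names = [hostname]
    # Only qualify bare names; a name containing a dot is left alone.
    if "." not in hostname:
        for domain in search_domains:
            domain = domain.strip(".")
            if domain:
                names.append("{}.{}".format(hostname, domain))
    return names


def resolve_first(hostname, search_domains, lookup):
    """Return (name, address) for the first candidate that resolves.

    `lookup` is any callable taking a name and returning an address (or a
    falsy value). Exceptions are treated as "did not resolve"; real code
    should catch something narrower, e.g. dns.exception.DNSException.
    """
    for name in candidate_names(hostname, search_domains):
        try:
            address = lookup(name)
        except Exception:
            continue
        if address:
            return name, address
    return None, None


if __name__ == "__main__":
    # Fake lookup that mimics the dig results shown above.
    def fake_lookup(name):
        if name == "node-jaeger.maas":
            return "10.245.168.44"
        raise RuntimeError("SERVFAIL for " + name)

    print(resolve_first("node-jaeger", ["maas"], fake_lookup))
```

This only works around the symptom; the thread's conclusion is that the container's DNS configuration is what needs fixing.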

Andrew McLeod (admcleod) wrote :

This is a wider problem affecting juju in general:

https://bugs.launchpad.net/juju/+bug/1771885

David Ames (thedac) wrote :

See comment 14 of bug #1771885; this appears to be a cloud-init bug.
https://bugs.launchpad.net/maas/+bug/1771885/comments/14

David Ames (thedac) wrote :

This is resolved [0]. See specifically [1]: remove the DNS settings from the default (PXE) network in MAAS.

[0] https://bugs.launchpad.net/maas/+bug/1771885
[1] https://bugs.launchpad.net/juju/+bug/1771885/comments/22

Changed in charm-nova-cloud-controller:
status: Triaged → Invalid