MaaS DNS stops resolving

Bug #1396667 reported by John George
This bug affects 5 people
Affects    Status       Importance    Assigned to    Milestone
MAAS       Won't Fix    High          Unassigned

Bug Description

+++-==============-============-============-=================================
ii maas 1.5.4+bzr229 all MAAS server all-in-one metapackag
ii maas-cli 1.5.4+bzr229 all MAAS command line API tool
ii maas-cluster-c 1.5.4+bzr229 all MAAS server cluster controller
ii maas-common 1.5.4+bzr229 all MAAS server common files
ii maas-dhcp 1.5.4+bzr229 all MAAS DHCP server
ii maas-dns 1.5.4+bzr229 all MAAS DNS server
ii maas-region-co 1.5.4+bzr229 all MAAS server complete region contr
ii maas-region-co 1.5.4+bzr229 all MAAS Server minimum region contro
ii python-django- 1.5.4+bzr229 all MAAS server Django web framework
ii python-maas-cl 1.5.4+bzr229 all MAAS python API client
ii python-maas-pr 1.5.4+bzr229 all MAAS server provisioning librarie

While running tests with Juju 1.20.13 that bring Juju environments up and down repeatedly, Juju starts failing to bootstrap after several iterations with "Could not resolve hostname".

DNS starts resolving again after setting DHCP and DNS to "Unmanaged" and then back to "Manage DHCP and DNS". I noted the following differences in /etc/bind/maas:

zone.30.0.10.in-addr.arpa
1c1
< ; Zone file modified: 2014-11-26 04:48:31.861169.
---
> ; Zone file modified: 2014-11-26 06:13:33.521922.
7c7
< 0000001937 ; serial
---
> 0000001946 ; serial
zone.maas
1c1
< ; Zone file modified: 2014-11-26 04:48:31.804047.
---
> ; Zone file modified: 2014-11-26 06:13:33.439643.
7c7
< 0000001937 ; serial
---
> 0000001946 ; serial
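
The diff above shows only the modification comment and the serial changing. A minimal Python 3 sketch for pulling those two values out of a zone file, so that copies taken before and after a toggle can be compared, might look like this. It assumes the two line formats visible in the diff ("; Zone file modified: ..." and "<digits> ; serial"); the function name zone_stamp is hypothetical and is not part of MAAS.

```python
import re


def zone_stamp(zone_text):
    """Return (modified, serial) parsed from zone file text.

    Either element is None when the corresponding line is missing.
    The patterns follow the two line shapes shown in the diff only.
    """
    modified = re.search(
        r"^; Zone file modified: (.+?)\.?\s*$", zone_text, re.MULTILINE
    )
    serial = re.search(r"^\s*(\d+)\s*;\s*serial\s*$", zone_text, re.MULTILINE)
    return (
        modified.group(1) if modified else None,
        int(serial.group(1)) if serial else None,
    )


def serial_advanced(before_text, after_text):
    """True when the serial in after_text is strictly larger than in before_text."""
    before = zone_stamp(before_text)[1]
    after = zone_stamp(after_text)[1]
    return before is not None and after is not None and after > before
```

Reading /etc/bind/maas/zone.maas before and after the toggle and passing both texts to serial_advanced would confirm that the zone was rewritten, which is all the diff itself demonstrates.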

Even after the "Unmanaged" to "Manage DHCP and DNS" change, the next test run hits the same issue:

juju --show-log bootstrap -e maas-kvm-trusty-new --constraints mem=2G arch=amd64
2014-11-26 13:48:03 INFO juju.cmd supercommand.go:37 running juju [1.20.13-trusty-amd64 gc]
2014-11-26 13:48:08 INFO juju.environs.bootstrap bootstrap.go:48 bootstrapping environment "maas-kvm-trusty-new"
2014-11-26 13:48:08 INFO juju.environs.tools tools.go:87 reading tools with major.minor version 1.20
2014-11-26 13:48:08 INFO juju.environs.tools tools.go:95 filtering tools by version: 1.20.13
2014-11-26 13:48:08 INFO juju.environs.tools tools.go:98 filtering tools by series: trusty
2014-11-26 13:48:08 INFO juju.environs.tools tools.go:101 filtering tools by architecture: amd64
2014-11-26 13:48:08 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-11-26 13:48:08 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-11-26 13:48:08 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-11-26 13:48:08 INFO juju.utils http.go:59 hostname SSL verification enabled
2014-11-26 13:48:08 INFO juju.environs.bootstrap bootstrap.go:60 newest version: 1.20.13
2014-11-26 13:48:08 INFO juju.environs.bootstrap bootstrap.go:88 picked bootstrap tools version: 1.20.13
Launching instance
2014-11-26 13:48:10 WARNING juju.provider.maas environ.go:441 picked arbitrary tools &{1.20.13-trusty-amd64 https://swift.canonistack.canonical.com/v1/AUTH_526ad877f3e3464589dc1145dfeaac60/juju-dist/proposed/tools/releases/juju-1.20.13-trusty-amd64.tgz 5c15458309840997eb973eccb003dc0fcea7581fa456acbee7a2f600b8f679ed 8118601}
 - /MAAS/api/1.0/nodes/node-0990f14a-7434-11e4-b72f-525400c43ce5/
Waiting for address
Attempting to connect to juju-qa-maas-node-23.maas:22
Attempting to connect to juju-qa-maas-node-23.maas:22
2014-11-26 14:28:11 ERROR juju.provider.common bootstrap.go:136 bootstrap failed: waited for 40m0s without being able to connect: ssh: Could not resolve hostname juju-qa-maas-node-23.maas: Name or service not known

The full run log is attached.

Note in the log that, early on, successful bootstraps report the IP address they are attempting to connect to, for example:
    Waiting for address
    Attempting to connect to juju-qa-maas-node-1.maas:22
    Attempting to connect to juju-qa-maas-node-1.maas:22
    Attempting to connect to 10.0.30.152:22
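
To catch the point at which a node name stops resolving during a long test run, a small probe can be run from the Juju client machine between iterations. The following is a Python 3 sketch only; the function name resolve_ipv4 is hypothetical, and the resolver parameter exists so the lookup can be replaced in tests.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses for hostname, or [] if it does not resolve.

    A failed lookup (socket.gaierror) corresponds to the
    "Name or service not known" error seen in the bootstrap log.
    """
    try:
        infos = resolver(hostname, 22, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("juju-qa-maas-node-23.maas") after each environment teardown would give an empty list as soon as the record disappears, without waiting for the 40 minute bootstrap timeout.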

Tags: sts juju
John George (jog) wrote :
Christian Reis (kiko) wrote :

<jog> kiko, I don't see the DNS issue on 1.7.0

Changed in maas:
status: New → Triaged
importance: Undecided → High
Revision history for this message
John George (jog) wrote :

I do not see this issue with 1.7.0+bzr3299-0u

Andres Rodriguez (andreserl) wrote :

This issue is not present in 1.7.0.

John George (jog)
tags: added: juju
Christian Reis (kiko)
Changed in maas:
milestone: none → 1.5.5
Revision history for this message
Leonardo Borda (lborda) wrote :

We have seen this issue on another deployed site.

Leo

tags: added: cts
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

We are experiencing a similar issue on MAAS 1.5.4. We tried to work around it by switching from "Unmanaged" to "Manage DHCP and DNS", but there is still no DNS resolution.

We validated that the background celery task 'write_dns_zone_config' is being called correctly and that the NodeGroup information is correct. We also forced its execution by running:

from maasserver.dns import change_dns_zones
from maasserver.models import NodeGroup

change_dns_zones(NodeGroup.objects.all())

Please also consider, as an additional detail, that before 'destroy-environment' we had been deploying a lot of LXC containers. Could this be related to the problem?

We also found that /etc/bind/maas/zone.maas has no CNAME entries written. We ran this hand-made script:

from maasserver.models import NodeGroup

for x in NodeGroup.objects.all():
    print "NodeGroup", x.id, x.created, x.updated, x.manages_dns(), x.get_managed_interfaces()
    print "Leases", x.dhcplease_set.all()
    for y in x.node_set.all():
        print y, y.ip_addresses(), y.macaddress_set.all()

Here is the output: https://pastebin.canonical.com/124579/. We noticed that no leases are available for the nodes.
Could exhaustion of the dynamic IP address range be related to the problem?

As a workaround, we edited /etc/maas/templates/dns/zone.template, manually adding the CNAME entries,
and re-triggered the write_dns_zone_config task. Resolution now works as expected and we can bootstrap from Juju without issues.
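
To check quickly whether a generated zone file contains any CNAME entries at all, the record types can be counted outside of MAAS. This is a Python 3 sketch under stated assumptions: it only recognises single-line records of the form "name [ttl] IN TYPE ...", it ignores records that inherit their name from the previous line, and the function name count_record_types is hypothetical.

```python
import re
from collections import Counter

# Matches "name [ttl] IN TYPE ..." at the start of a line.
RECORD = re.compile(r"^\S+\s+(?:\d+\s+)?IN\s+([A-Z]+)\b")


def count_record_types(zone_text):
    """Count resource record types (A, CNAME, NS, ...) in zone file text."""
    counts = Counter()
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0]  # drop trailing comments
        match = RECORD.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

With the contents of /etc/bind/maas/zone.maas as input, count_record_types(text)["CNAME"] returning 0 would match the symptom described above.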

Matt Rae (mattrae) wrote :

Noting that in MAAS 1.5, DNS records appear to get dropped when the DHCP leases in the DHCP range are exhausted. This happens when deploying LXC containers to nodes with Juju. We believe that 1.7 fixes this issue because of this bug: https://bugs.launchpad.net/maas/+bug/1314267
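
To judge how close a cluster is to exhausting its dynamic range, the number of free addresses can be estimated from the range bounds and the list of leased addresses. This Python 3 sketch uses only the standard library; the function name and the example addresses are hypothetical, and how the leases are obtained (for example from the dhcplease_set query shown earlier) is left to the reader.

```python
import ipaddress


def free_dynamic_addresses(range_low, range_high, leased):
    """Return how many addresses in [range_low, range_high] are not leased.

    leased is an iterable of IPv4 address strings; addresses outside the
    range and duplicates are ignored.
    """
    low = int(ipaddress.IPv4Address(range_low))
    high = int(ipaddress.IPv4Address(range_high))
    if high < low:
        raise ValueError("range_high must not be below range_low")
    in_range = {
        int(ipaddress.IPv4Address(ip))
        for ip in leased
        if low <= int(ipaddress.IPv4Address(ip)) <= high
    }
    return (high - low + 1) - len(in_range)
```

A result of 0 means every address in the dynamic range is leased, which is the condition this comment associates with dropped DNS records.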

tags: added: sts
removed: cts
Andres Toomsalu (andres-active) wrote :

Experiencing the same symptoms (MAAS DNS resolution not working) with MAAS 1.7.6:

manager@maas:~$ dpkg -l maas*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=======================================-========================-========================-===================================================================================
ii maas 1.7.6+bzr3376-0ubuntu2~1 all MAAS server all-in-one metapackage
ii maas-cli 1.7.6+bzr3376-0ubuntu2~1 all MAAS command line API tool
ii maas-cluster-controller 1.7.6+bzr3376-0ubuntu2~1 all MAAS server cluster controller
ii maas-common 1.7.6+bzr3376-0ubuntu2~1 all MAAS server common files
ii maas-dhcp 1.7.6+bzr3376-0ubuntu2~1 all MAAS DHCP server
ii maas-dns 1.7.6+bzr3376-0ubuntu2~1 all MAAS DNS server
ii maas-proxy 1.7.6+bzr3376-0ubuntu2~1 all MAAS Caching Proxy
ii maas-region-controller 1.7.6+bzr3376-0ubuntu2~1 all MAAS server complete region controller
ii maas-region-controller-min 1.7.6+bzr3376-0ubuntu2~1 all MAAS Server minimum region controller

Switching the interface from "Managed" to "Unmanaged" and back to "Managed DNS & DHCP" did fix resolving for a moment, until running juju bootstrap again. At that point resolution stopped working again.

Thiago Martins (martinx) wrote :

I am running:

maas 2.0.0+bzr5189-0ubuntu1~16.04.1

My MAAS nodes do not resolve names; "ping google.com" fails on my nodes.

My nodes' /etc/resolv.conf points to the MAAS IP.

My nodes can "ping 8.8.8.8" and the MAAS server.

I tried upgrading and rebooting the MAAS server; that didn't fix it.

Thiago Martins (martinx) wrote :

Found the problem!

In the file /etc/bind/maas/named.conf.options.inside.maas I have:

---
forwarders {
  10.0.0.1;
};
---

However, that 10.0.0.1 is blocking requests from MAAS, which affects nearly everything inside MAAS itself!
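
A forwarder that silently drops queries can be spotted by reading the forwarders block and sending each address a plain DNS query. The following Python 3 sketch uses only the standard library. The function names and the default test name ubuntu.com are assumptions, the parser only picks up IPv4 addresses, and a True result only means the forwarder returned a reply with RCODE 0 to this one query.

```python
import re
import socket
import struct


def parse_forwarders(options_text):
    """Return the IPv4 addresses listed in the first forwarders { ... } block."""
    block = re.search(r"forwarders\s*\{([^}]*)\}", options_text)
    if not block:
        return []
    return re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", block.group(1))


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def forwarder_answers(ip, name="ubuntu.com", timeout=2.0, query_id=0x1234):
    """True if ip replies over UDP port 53 with a matching ID and RCODE 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, query_id), (ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    if len(reply) < 4:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and (flags & 0x000F) == 0
```

Running forwarder_answers on each address returned by parse_forwarders, from the MAAS server itself, would show whether a forwarder such as 10.0.0.1 is answering requests that originate there.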

Andres Rodriguez (andreserl) wrote :

We believe this issue is no longer present in the latest versions of MAAS (2.x). As such, I'm marking this as Won't Fix, as it was originally reported against 1.7.

Changed in maas:
status: Triaged → Won't Fix