Domain name should be checked for duplicate against maas_internal_domain

Bug #1960571 reported by Tessa
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Jack Lloyd-Walters

Bug Description

When creating a domain or changing the name for an existing one, MAAS should include the maas_internal_domain when checking for duplicates.
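
A minimal sketch of the requested check, written as plain Python rather than the actual MAAS code. The function name `validate_domain_name`, the `DuplicateDomainError` class, and the normalization rules (case-insensitive, trailing dot ignored) are assumptions made for illustration; the real fix may be structured differently.

```python
class DuplicateDomainError(ValueError):
    """Raised when a proposed domain name collides with an existing zone."""


def normalize(name):
    # Assumption: compare names case-insensitively and ignore a trailing dot.
    return name.strip().rstrip(".").lower()


def validate_domain_name(name, existing_names, internal_domain):
    """Return the normalized name, or raise if it would duplicate a zone.

    `existing_names` stands in for the domains already defined, and
    `internal_domain` for the value of the maas_internal_domain setting.
    """
    candidate = normalize(name)
    if candidate == normalize(internal_domain):
        raise DuplicateDomainError(
            f"{name!r} is already used as the MAAS internal domain"
        )
    if candidate in {normalize(existing) for existing in existing_names}:
        raise DuplicateDomainError(f"a domain named {name!r} already exists")
    return candidate
```

With this sketch, `validate_domain_name("internal.domain", ["maas"], "internal.domain")` raises `DuplicateDomainError`, which is the situation described in the original report below.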

Original bug description:
-------------------------

this feels related to this old bug: https://bugs.launchpad.net/maas/+bug/1683047

however, in this case, maas is just creating two references to my base named zone for my hosts:

```
$ head /var/snap/maas/current/bind/named.conf.maas

include "/var/snap/maas/18199/bind/named.conf.rndc.maas";

# Authoritative Zone declarations.
zone "internal.domain" {
    type master;
    file "/var/snap/maas/18199/bind/zone.internal.domain";
};
zone "internal.domain" {
    type master;
    file "/var/snap/maas/18199/bind/zone.internal.domain";
```

this of course causes bind9 not to start, and go into a tailspin constantly restarting itself. if I manually edit the config, then it'll start, but after a little while maas will overwrite it and the problem begins again. and unlike the old bug, it isn't caused by overlapping entries, there's only one zone defined in the dns config part of the interface, so I can't really figure out why it's injecting it twice into the bind config.
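
For anyone hitting the same symptom, a rough way to confirm it is to scan the generated file for zone names declared more than once. This is only a sketch: the regular expression assumes each zone statement starts on its own line in the form `zone "name" {`, as in the excerpt above, and `find_duplicate_zones` is a made-up helper, not part of MAAS or BIND.

```python
import re
from collections import Counter

# Matches lines such as: zone "internal.domain" {
ZONE_RE = re.compile(r'^[ \t]*zone[ \t]+"([^"]+)"', re.MULTILINE)


def find_duplicate_zones(conf_text):
    """Return the sorted zone names that are declared more than once."""
    counts = Counter(name.lower() for name in ZONE_RE.findall(conf_text))
    return sorted(name for name, count in counts.items() if count > 1)


# Excerpt taken from the named.conf.maas output quoted above.
SAMPLE = '''
# Authoritative Zone declarations.
zone "internal.domain" {
    type master;
    file "/var/snap/maas/18199/bind/zone.internal.domain";
};
zone "internal.domain" {
    type master;
    file "/var/snap/maas/18199/bind/zone.internal.domain";
'''

print(find_duplicate_zones(SAMPLE))  # prints ['internal.domain']
```

To run it against a real system you would read the file contents into a string first; the path shown in the excerpt is specific to this deployment.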

Björn Tillenius (bjornt) wrote :

Hi Tessa,

right, this looks like a different issue, since no subnets are involved here.

Could you please provide the output of 'maas $profile domains read', as well as the MAAS logs (regiond.log, rackd.log and maas.log)?

Changed in maas:
status: New → Incomplete
Tessa (unit3) wrote :

maas $profile domains read:
```
Success.
Machine-readable output follows:
[
    {
        "authoritative": true,
        "ttl": null,
        "is_default": true,
        "name": "internal.domain",
        "id": 0,
        "resource_record_count": 10,
        "resource_uri": "/MAAS/api/2.0/domains/0/"
    }
]

```

I'm not going to include all of the region/rack/maas logs, as they contain sensitive info about our systems. but here's some excerpts that deal with the issue:

`named.log`:
```
14-Feb-2022 23:20:16.932 found 32 CPUs, using 32 worker threads
14-Feb-2022 23:20:16.932 using 32 UDP listeners per interface
14-Feb-2022 23:20:17.066 using up to 524288 sockets
14-Feb-2022 23:20:17.070 loading configuration from '/var/snap/maas/18199/bind/named.conf'
14-Feb-2022 23:20:17.070 /var/snap/maas/current/bind/named.conf.maas:8: zone 'internal.domain': already exists previous definition: /var/snap/maas/current/bind/named.conf.maas:4
14-Feb-2022 23:20:17.073 loading configuration: failure
14-Feb-2022 23:20:17.073 exiting (due to fatal error)
```

`regiond.log`:
```
2022-02-14 23:21:15 maasserver.region_controller: [critical] Failed configuring DNS; killing and restarting
        Traceback (most recent call last):
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
            self._runCallbacks()
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
            return _cancellableInlineCallbacks(gen)
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
            _inlineCallbacks(None, g, status)
        --- <exception caught here> ---
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
            result = g.send(result)
          File "/snap/maas/18199/lib/python3.8/site-packages/maasserver/region_controller.py", line 222, in _checkSerial
            raise DNSReloadError(
        maasserver.region_controller.DNSReloadError: Failed to reload DNS; timeout or rdnc command failed.

2022-02-14 23:21:16 maasserver.region_controller: [critical] Failed to kill and restart DNS.
        Traceback (most recent call last):
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
            self._startRunCallbacks(fail)
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
            self._runCallbacks()
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/18199/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
            _inlineCallbacks(r, g, status)
        --- <exception ...
```

Tessa (unit3) wrote :

oh yuck I keep forgetting that Launchpad still doesn't support basic markdown. well, I hope that's readable.

Tessa (unit3) wrote :

any thoughts on this? I can't build boxes with maas right now, because the boxes get no DNS, and so can't install initial packages when commissioning or installing.

Alberto Donato (ack)
Changed in maas:
milestone: none → 3.2
status: Incomplete → Triaged
importance: Undecided → High
milestone: 3.2 → none
status: Triaged → New
importance: High → Undecided
no longer affects: maas/3.1
Alberto Donato (ack) wrote :

From the pasted `domains read` output, it seems you set the domain name to "internal.domain".

Did you by chance also set the MAAS internal domain to the same value?

You can check it with:
  maas $profile maas get-config name=maas_internal_domain

(and change it with set-config)

If both have the same name, they'll cause two entries to be created.
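
The comparison can also be scripted. A small sketch, assuming you pass it only the JSON array printed by `domains read` (without the "Success." preamble) and the value returned by `get-config`; `domains_matching_internal` is a hypothetical helper, not a MAAS command.

```python
import json


def domains_matching_internal(domains_json, internal_domain):
    """Return names of domains that are identical to the internal domain."""

    def norm(name):
        # Assumption: ignore case and a trailing dot when comparing.
        return name.rstrip(".").lower()

    target = norm(internal_domain)
    return [
        domain["name"]
        for domain in json.loads(domains_json)
        if norm(domain["name"]) == target
    ]


# Trimmed version of the `domains read` output pasted earlier in this thread.
DOMAINS = '[{"name": "internal.domain", "id": 0, "is_default": true}]'

print(domains_matching_internal(DOMAINS, "internal.domain"))
# prints ['internal.domain']
```

An empty list means no configured domain clashes with the internal one.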

Changed in maas:
status: New → Incomplete
Tessa (unit3) wrote :

hey Alberto,

we didn't create an extra zone, we just set the maas internal domain name, and then edited the DNS zone in the DNS settings. under the DNS settings, there's only a single zone listed. so I'm still confused as to how there's duplicates in the BIND config, or how we should be managing this to avoid that situation.

note that in our previous maas deployment, which is still running v3.0, we had these separated by accident. the internal domain in the UI is just listed as "maas", and can't be deleted, but when I use the cli query you listed, it shows as something different entirely, which doesn't even appear in the DNS settings. so I'm not sure what to make of that at all. why does the DNS panel not match what that cli query returns?

in fact, searching the latest docs on maas.io, I don't even see any references to the internal domain, its use and management seem fully obscured. it looks like there was a reference to it back in the v2.9 docs, but the page those search results link to (https://maas.io/docs/snap/2.9/ui/maas-communication) 404s, and there doesn't seem to be a comparable section in the 3.1 docs. No clue why the site search returns results for pages that don't exist, but it may be a factor in why this info is so difficult to track down.

what I'd love to see is some docs explaining what the internal domain is used for, and how to manage a maas setup where you just have a single domain for all your devices.

Alberto Donato (ack) wrote :

Hi Tessa,

The first two entries from the bind config file are:
 1) the default domain, the one you see when you run `maas $profile domains read`, which defaults to "maas" but in your deployment has been changed to "internal.domain", which is perfectly valid
 2) the internal maas domain, which defaults to "maas-internal" but in your config has also been changed to "internal.domain". That's the one I was referring to from the maas_internal_domain config.

The problem is that you can't have them both set to the same domain, as they're different zones.

For clarity, this is what the default bind config looks like:

# Authoritative Zone declarations.
zone "maas" {
    type master;
    file "/var/snap/maas/x1/bind/zone.maas";
};
zone "maas-internal" {
    type master;
    file "/var/snap/maas/x1/bind/zone.maas-internal";
};

So, changing the name of either the default domain from the UI/API, or the internal one from the CLI config should fix your issue.

Alberto Donato (ack)
summary: - maas 3.1 creates duplicate named entries for primary maas zone
+ Domain name should be checked for duplicate against
+ maas_internal_domain
description: updated
Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Medium
milestone: none → next
Tessa (unit3) wrote :

Yep, that seems to do it. Thanks Alberto!

Changed in maas:
milestone: next → 3.3.0
Changed in maas:
assignee: nobody → Jack Lloyd-Walters (lloydwaltersj)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.3.0 → 3.3.0-beta1
Changed in maas:
status: Fix Committed → Fix Released