Comment 1 for bug 1997191

David A. Desrosiers (setuid) wrote :

I can confirm that this is still an issue on 3.5-beta, 3.4 and 3.3, all from PPA (not snap) on Jammy. I haven't rebuilt this machine to test 3.3 and earlier on Focal or Bionic.

I see about 7,000 of these per hour in the logs. MAAS appears to be working fine and can commission and deploy machines, but it fails to configure DNS for some reason.

Errors are as follows:

2024-03-06 05:28:08 maasserver.region_controller: [critical] Failed configuring DNS.
 Traceback (most recent call last):
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
     current.result = callback( # type: ignore[misc]
   File "/usr/lib/python3/dist-packages/maasserver/region_controller.py", line 403, in _onDNSReloadFailure
     failure.trap(DNSReloadError)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
     self.raiseException()
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
     raise self.value.with_traceback(self.tb)
   File "/usr/lib/python3/dist-packages/maasserver/region_controller.py", line 345, in _checkSerial
     answers, _, _ = yield self.dnsResolver.lookupAuthority(
 twisted.internet.defer.TimeoutError: [Query(b'maas', 6, 1)]

It repeats 2 times every second or so, indefinitely.
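The `Query(b'maas', 6, 1)` in the traceback is an SOA (type 6) query in class IN (1) for the name `maas`. To help triage, the same query can be sent by hand to see whether anything answers. This is only a minimal sketch using the Python standard library: the function names are made up for illustration, and the server address to test against is an assumption, since the traceback does not show which address the MAAS resolver actually queries.

```python
import socket
import struct

SOA_TYPE = 6   # matches the "6" in Query(b'maas', 6, 1)
IN_CLASS = 1   # matches the "1" in Query(b'maas', 6, 1)


def build_soa_query(name, query_id=0x1234):
    """Build a raw DNS query packet asking for the SOA record of `name`."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.encode("ascii").split(b".") if label]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", SOA_TYPE, IN_CLASS)


def soa_query_answered(server, name="maas", timeout=2.0):
    """Return True if `server` sends back any reply to the SOA query.

    `server` is whatever address you want to test (an assumption on the
    caller's part); this does not parse the answer section.
    """
    query = build_soa_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:
            # Timeout or another socket error (for example, port unreachable).
            return False
    # A reply carries the same 16-bit id as the query.
    return data[:2] == query[:2]
```

Calling `soa_query_answered("192.168.120.2")` (or any other address bind might be listening on) returning `False` would be consistent with the timeout above, but it does not by itself say why nothing answered.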

I do have to run an altered bind config because the one MAAS deploys is incorrect. Mine looks like the following. I have to copy this in after initially installing maas, and before the first restart of those services so the configuration is used:

options {
    directory "/var/cache/bind";
    listen-on-v6 { none; };
    listen-on { !192.168.120.1; 192.168.120.0/22; };
    include "/etc/bind/maas/named.conf.options.inside.maas";
};
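For anyone reading the `listen-on` line: as far as I understand BIND's address match lists, elements are checked in order and the first one that matches decides, with a leading `!` meaning "reject". The sketch below models only that first-match behaviour for the two elements in my config; it is not BIND's implementation, and the function and variable names are invented for illustration.

```python
import ipaddress

# The listen-on list from the config, as (negated, network) pairs in order.
LISTEN_ON = [
    (True, ipaddress.ip_network("192.168.120.1/32")),   # !192.168.120.1
    (False, ipaddress.ip_network("192.168.120.0/22")),  # 192.168.120.0/22
]


def would_listen(address, match_list=LISTEN_ON):
    """Return True if the first matching element accepts `address`."""
    addr = ipaddress.ip_address(address)
    for negated, network in match_list:
        if addr in network:
            return not negated
    # Nothing matched, so the address is not in the list at all.
    return False
```

Under that reading, `would_listen("192.168.120.1")` is `False` while `would_listen("192.168.120.2")` is `True`, so bind is kept off the .1 address but accepts the rest of the /22.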

I tried firewalling the link-layer scan, on the assumption that MAAS sees 192.168.120.1 (the MAAS IP, which is also the address of the virsh network that lives on the host) and detects an overlap between the MAAS IP and the managed subnet (192.168.120.0/22), but that didn't stop the flood of these messages.

My topology is as follows:

1. Baremetal host on LAN IP 192.168.4.20, onto which I install maas 3.x from PPA (not snaps).

2. On that host, there is a _pre-existing_ KVM/virsh network, using virbr0 which forwards to bond0 on the baremetal host. There are several KVM machines, pre-created on the host that MAAS will commission and manage. There are virsh autostart KVMs that come up when the machine is booted.

3. The same baremetal host also has LXD deployed on it that also uses virbr0 to reach the LAN. There are autostart-enabled containers that come up when the machine is booted, including the squid proxy that EVERYTHING on this host talks to.

MAAS itself is configured with a 192.168.120.1 IP address from the existing virsh network. The rest of the /22 is for MAAS commissioned nodes, LXD containers and Juju-deployed machines and units that consume that IP space.

This all worked perfectly from MAAS 2.4 until very recent 3.4 builds, and now 3.5 suffers from the same issue (among others).

It's not clear why this is spewing these errors all the time now, whether a recent change or something else, but the physical and logical configuration of the server has not changed in years. It's locked down intentionally to result in 100% reproducible builds, and is fully scripted to remove/purge maas and reinstall it exactly the same way every single time.

In the last hour, there have been 1,989 of these DNS errors in the log, for a brand-new, clean install.
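For reference, a count like this can be reproduced with a few lines of Python. This is a rough sketch that assumes the log lines look exactly like the first line of the error quoted earlier (timestamp, then `maasserver.region_controller: [critical] Failed configuring DNS.`); the function name is made up, and it takes any iterable of lines so that no particular log path is assumed.

```python
import re
from collections import Counter

# Matches the first line of each error, capturing "YYYY-MM-DD HH".
ERROR_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} "
    r"maasserver\.region_controller: \[critical\] Failed configuring DNS\."
)


def count_dns_failures_per_hour(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH' to the number of failures."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (for example `count_dns_failures_per_hour(open(path))`, with `path` being wherever your region controller log lives) gives one bucket per hour; the traceback lines that follow each error are ignored because they do not match the pattern.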

The defined networks are as follows:

[
    {
        "name": "subnet-1",
        "ip": "192.168.4.0",
        "netmask": "255.255.252.0",
        "vlan_tag": 0,
        "description": "192.168.4.0/22",
        "default_gateway": "192.168.4.1",
        "dns_servers": [],
        "resource_uri": "/MAAS/api/2.0/networks/subnet-1/"
    },
    {
        "name": "subnet-2",
        "ip": "192.168.120.0",
        "netmask": "255.255.252.0",
        "vlan_tag": 0,
        "description": "192.168.120.0/22",
        "default_gateway": "192.168.120.1",
        "dns_servers": [],
        "resource_uri": "/MAAS/api/2.0/networks/subnet-2/"
    }
]

That 'maas' subnet looks like:

    {
        "name": "192.168.120.0/22",
        "description": "",
        "vlan": {
            "vid": 0,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "secondary_rack": null,
            "id": 3,
            "space": "undefined",
            "primary_rack": "pbhwrf",
            "name": "untagged",
            "fabric_id": 2,
            "fabric": "fabric-2",
            "resource_uri": "/MAAS/api/2.0/vlans/3/"
        },
        "cidr": "192.168.120.0/22",
        "rdns_mode": 2,
        "gateway_ip": "192.168.120.1",
        "dns_servers": [],
        "allow_dns": true,
        "allow_proxy": true,
        "active_discovery": false,
        "managed": true,
        "disabled_boot_architectures": [],
        "id": 2,
        "space": "undefined",
        "resource_uri": "/MAAS/api/2.0/subnets/2/"
    }

In virsh, that network is defined as:

<network>
  <name>maas</name>
  <uuid>c1a6ebd8-b53f-4f10-81fc-6a3220b9c2e9</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:4c:d6:c3'/>
  <domain name='maas'/>
  <ip address='192.168.120.1' netmask='255.255.252.0'>
  </ip>
</network>
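As a sanity check on the addressing above, the values from the MAAS network list and the virsh definition can be compared with Python's `ipaddress` module. This is just a sketch using the addresses already quoted in this comment; the variable names are mine.

```python
import ipaddress

# Values copied from the MAAS network list and the virsh <ip> element.
lan_subnet = ipaddress.ip_network("192.168.4.0/255.255.252.0")
maas_subnet = ipaddress.ip_network("192.168.120.0/255.255.252.0")
virsh_iface = ipaddress.ip_interface("192.168.120.1/255.255.252.0")

# The virsh bridge address sits in the same /22 that MAAS manages.
same_network = virsh_iface.network == maas_subnet

# The LAN subnet and the MAAS subnet do not overlap each other.
subnets_overlap = lan_subnet.overlaps(maas_subnet)

# The MAAS IP (also the subnet gateway) is inside the managed subnet.
maas_ip_in_managed_subnet = virsh_iface.ip in maas_subnet
```

Here `same_network` and `maas_ip_in_managed_subnet` come out `True` and `subnets_overlap` comes out `False`, which matches how I described the topology: the only "overlap" is the MAAS IP living inside its own managed /22.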

I can attach logs or any other data needed to help triage this.

Thanks!