Uncaught exception when configuring DNS

Bug #1997191 reported by Alexsander de Souza
This bug affects 2 people
Affects   Status         Importance   Assigned to           Milestone
MAAS      Fix Released   High         Christian Grabowski

Bug Description

2022-11-20 15:36:59 maasserver.region_controller: [critical] Failed configuring DNS; killing and restarting
    Traceback (most recent call last):
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 661, in callback
        self._startRunCallbacks(result)
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
        self._runCallbacks()
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
        current.result = callback( # type: ignore[misc]
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
        current_context.run(_inlineCallbacks, r, gen, status)
    --- <exception caught here> ---
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
        result = current_context.run(gen.send, result)
      File "/snap/maas/24982/lib/python3.10/site-packages/maasserver/region_controller.py", line 299, in _checkSerial
        raise DNSReloadError(
    maasserver.region_controller.DNSReloadError: Failed to reload DNS; serial mismatch on domains maas

2022-11-20 15:37:01 maasserver.region_controller: [critical] Failed configuring DNS.
    Traceback (most recent call last):
    --- <exception caught here> ---
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
        current.result = callback( # type: ignore[misc]
      File "/snap/maas/24982/lib/python3.10/site-packages/maasserver/region_controller.py", line 342, in _onDNSReloadFailure
        failure.trap(DNSReloadError)
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
        self.raiseException()
      File "/snap/maas/24982/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
        raise self.value.with_traceback(self.tb)
      File "/snap/maas/24982/lib/python3.10/site-packages/maasserver/region_controller.py", line 284, in _checkSerial
        answers, _, _ = yield self.dnsResolver.lookupAuthority(
    twisted.internet.defer.TimeoutError: [Query(b'maas', 6, 1)]
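
Reading aid for the second traceback: the frame in _onDNSReloadFailure calls failure.trap(DNSReloadError), and twisted's Failure.trap re-raises any failure whose type is not in the list it is given. A lookup timeout therefore passes straight through the handler and is logged as an unhandled error. The sketch below imitates that behaviour with the standard library only. The names trap, on_dns_reload_failure, outcome and LookupTimeout are hypothetical stand-ins, not MAAS or twisted code, and the handler body is an assumption about the general shape rather than the real implementation.

```python
class DNSReloadError(Exception):
    """Stand-in for maasserver.region_controller.DNSReloadError."""


class LookupTimeout(Exception):
    """Stand-in for twisted.internet.defer.TimeoutError."""


def trap(exc, *expected):
    """Imitate Failure.trap: return the matching type, else re-raise."""
    for exc_type in expected:
        if isinstance(exc, exc_type):
            return exc_type
    raise exc


def on_dns_reload_failure(exc):
    """Hypothetical handler that only knows how to deal with DNSReloadError."""
    trap(exc, DNSReloadError)
    return "retry"


def outcome(exc):
    """Report what the handler does with a given exception."""
    try:
        return on_dns_reload_failure(exc)
    except Exception as err:
        return "unhandled " + type(err).__name__


print(outcome(DNSReloadError("serial mismatch on domains maas")))
print(outcome(LookupTimeout("[Query(b'maas', 6, 1)]")))
```

Under these assumptions, the first call is handled and the second escapes the handler, which matches the order of the two log entries above.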

summary: - Uncatched exception when configuring DNS
+ Uncaught exception when configuring DNS
Changed in maas:
assignee: nobody → Christian Grabowski (cgrabowski)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.3.0 → 3.3.0-beta3
status: Fix Committed → Fix Released
David A. Desrosiers (setuid) wrote :

I can confirm that this is still an issue on 3.5-beta, 3.4 and 3.3, all from PPA (not snap) on Jammy. I haven't rebuilt this machine to test 3.3 and earlier on Focal or Bionic.

I see about 7,000 of these per hour in the logs. MAAS appears to be working fine and can commission and deploy machines, but it fails to configure DNS for some reason.

Errors are as follows:

2024-03-06 05:28:08 maasserver.region_controller: [critical] Failed configuring DNS.
 Traceback (most recent call last):
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
     current.result = callback( # type: ignore[misc]
   File "/usr/lib/python3/dist-packages/maasserver/region_controller.py", line 403, in _onDNSReloadFailure
     failure.trap(DNSReloadError)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
     self.raiseException()
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
     raise self.value.with_traceback(self.tb)
   File "/usr/lib/python3/dist-packages/maasserver/region_controller.py", line 345, in _checkSerial
     answers, _, _ = yield self.dnsResolver.lookupAuthority(
 twisted.internet.defer.TimeoutError: [Query(b'maas', 6, 1)]
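
For readers decoding the last line of the traceback: assuming the repr follows twisted's Query(name, type, class) order, the numbers 6 and 1 are the standard DNS codes (RFC 1035) for an SOA record in the IN class, which fits a serial check issued through lookupAuthority. The helper below is a hypothetical illustration, not part of MAAS or twisted, and its tables list only a few common codes.

```python
# Subset of DNS record type and class codes from RFC 1035 (AAAA from RFC 3596).
RECORD_TYPES = {1: "A", 2: "NS", 5: "CNAME", 6: "SOA", 12: "PTR", 15: "MX", 16: "TXT", 28: "AAAA"}
RECORD_CLASSES = {1: "IN", 3: "CH", 4: "HS"}


def describe_query(name, type_code, class_code):
    """Render a (name, type, class) triple in zone-file style."""
    rtype = RECORD_TYPES.get(type_code, "TYPE%d" % type_code)
    rclass = RECORD_CLASSES.get(class_code, "CLASS%d" % class_code)
    return "%s %s %s" % (name.decode("ascii"), rclass, rtype)


print(describe_query(b"maas", 6, 1))
```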

This repeats about twice every second, indefinitely.

I do have to run an altered bind config because the one MAAS deploys is incorrect. Mine looks like the following. I have to copy this in after initially installing maas, and before the first restart of those services so the configuration is used:

options {
    directory "/var/cache/bind";
    listen-on-v6 { none; };
    listen-on { !192.168.120.1; 192.168.120.0/22; };
    include "/etc/bind/maas/named.conf.options.inside.maas";
};
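
To make the listen-on line easier to reason about, here is a rough sketch of how a BIND address match list is generally evaluated: elements are checked in order, the first one that matches decides, and a leading "!" negates the result. This is an approximation written with Python's ipaddress module, not BIND's own code, and first_match and LISTEN_ON are hypothetical names. Whether the MAAS serial check actually queries 192.168.120.1 is not stated in this report, so the sketch only shows what the list itself accepts.

```python
import ipaddress

# The listen-on match list from the configuration above.
LISTEN_ON = ["!192.168.120.1", "192.168.120.0/22"]


def first_match(address, match_list):
    """Return True or False from the first matching element, None if none match."""
    addr = ipaddress.ip_address(address)
    for element in match_list:
        negated = element.startswith("!")
        network = ipaddress.ip_network(element.lstrip("!"), strict=False)
        if addr in network:
            return not negated
    return None


for candidate in ("192.168.120.1", "192.168.120.2", "192.168.4.20"):
    print(candidate, first_match(candidate, LISTEN_ON))
```

Under this reading, the negated first element excludes 192.168.120.1 even though that address lies inside 192.168.120.0/22, while other addresses in the /22 are accepted.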

I tried firewalling the link-layer scan, on the assumption that MAAS sees 192.168.120.1 (the MAAS IP, which is also on the virsh network that lives on the host) and detects an overlap between the MAAS IP and the managed subnet (192.168.120.0/22), but that didn't stop the flood of these messages.

My topology is as follows:

1. Baremetal host on LAN IP 192.168.4.20, onto which I install maas 3.x from PPA (not snaps).

2. On that host, there is a _pre-existing_ KVM/virsh network, using virbr0 which forwards to bond0 on the baremetal host. There are several KVM machines, pre-created on the host that MAAS will commission and manage. There are virsh autostart KVMs that come up when the machine is booted.

3. The same baremetal host also has LXD deployed on it that also uses virbr0 to reach the LAN. There are autostart-enabled containers that come up when the machine is booted, including the squid proxy that EVERYTHING on this host talks to.

MAAS itself is configured with a 192.168.120.1 IP address from the existing virsh network. The rest of the /22 is for MAAS-commissioned nodes, LXD containers, and Juju-deployed machines and units that consume that IP space.

This all worked perfectly, and has since MAAS 2.4, until very recent 3.4 builds and now 3.5, which suffers from the same issue (among others).

It's not clear why this is spewing these errors all the time now, whether a rec...
