Failed to reload DNS; serial mismatch on domains maas

Bug #2062107 reported by Alexsander de Souza
This bug affects 1 person
Affects  Status         Importance  Assigned to          Milestone
MAAS     Fix Committed  High        Christian Grabowski
3.3      Fix Committed  High        Christian Grabowski
3.4      Fix Released   High        Christian Grabowski
3.5      Fix Committed  High        Christian Grabowski

Bug Description

MAAS 3.4.1 SNAP, about 130 physical machines enlisted.

After MAAS starts, everything works fine, but after a few commissioning operations it begins to fail to configure DNS, as seen below:

2024-04-17 21:08:27 maasserver.region_controller: [critical] Failed configuring DNS; killing and restarting
 Traceback (most recent call last):
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 661, in callback
     self._startRunCallbacks(result)
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
     self._runCallbacks()
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
     current.result = callback( # type: ignore[misc]
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
     current_context.run(_inlineCallbacks, r, gen, status)
 --- <exception caught here> ---
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
     result = current_context.run(gen.send, result)
   File "/snap/maas/34087/lib/python3.10/site-packages/maasserver/region_controller.py", line 360, in _checkSerial
     raise DNSReloadError(
 maasserver.region_controller.DNSReloadError: Failed to reload DNS; serial mismatch on domains maas
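
For readers unfamiliar with this error, here is a minimal sketch of what a serial check of this kind might look like. It is not the MAAS implementation: `check_serial`, `lookup_serial` and `retries` are hypothetical names, and the local `DNSReloadError` class is only a stand-in for the exception named in the traceback. The idea is that after a reload every domain should report the expected SOA serial, and a zone that BIND failed to load never will.

```python
class DNSReloadError(Exception):
    """Stand-in for the exception named in the traceback (not the MAAS class)."""


def check_serial(expected_serial, domains, lookup_serial, retries=3):
    """Raise DNSReloadError if any domain never reports expected_serial.

    lookup_serial is a hypothetical callable: domain -> serial, or None
    when the zone cannot be resolved (for example, it was not loaded).
    """
    not_matching = set(domains)
    for _ in range(retries):
        for domain in list(not_matching):
            if lookup_serial(domain) == expected_serial:
                not_matching.discard(domain)
        if not not_matching:
            return
    raise DNSReloadError(
        "Failed to reload DNS; serial mismatch on domains %s"
        % ", ".join(sorted(not_matching))
    )


# Only one zone is served; "maas" is missing, as in the BIND logs of this report.
served = {"52.245.10.in-addr.arpa": 159655}
try:
    check_serial(159655, ["maas", "52.245.10.in-addr.arpa"], served.get)
except DNSReloadError as error:
    message = str(error)
    print(message)
```

In this sketch a missing zone file is indistinguishable from a stale serial, which would be consistent with the error appearing while BIND reports "file not found" for the same zone.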

The BIND logs:

17-Apr-2024 17:04:21.147 zone 23.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.23.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 23.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 52.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 135.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.135.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 135.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 18.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.18.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 133.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 24.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 58.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 61.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 50.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 2.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.2.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 8.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.8.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 8.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 63.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 134.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.134.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 134.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 42.35.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.42.35.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 42.35.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 18.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 2.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 115.92.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.115.92.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 22.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.22.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 22.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 115.92.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 5.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.5.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 56.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 14.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.14.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 14.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 5.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone maas/IN: loading from master file /var/snap/maas/34087/bind/zone.maas failed: file not found
17-Apr-2024 17:04:21.147 zone maas/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 6.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.6.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 6.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 60.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 3.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.3.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 3.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 7.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.7.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 7.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 20.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.20.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 20.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 1.1.e.d.0.a.9.e.a.b.3.5.1.e.d.f.ip6.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.1.1.e.d.0.a.9.e.a.b.3.5.1.e.d.f.ip6.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 1.1.e.d.0.a.9.e.a.b.3.5.1.e.d.f.ip6.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone f.6.8.9.0.7.4.7.2.e.8.c.2.4.d.f.ip6.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.f.6.8.9.0.7.4.7.2.e.8.c.2.4.d.f.ip6.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone f.6.8.9.0.7.4.7.2.e.8.c.2.4.d.f.ip6.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 25.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.25.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 25.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 54.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 48.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 1.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.1.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 1.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 49.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.147 zone 129.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.129.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.147 zone 129.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.147 zone 51.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.151 zone 53.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.151 zone maas-internal/IN: loading from master file /var/snap/maas/34087/bind/zone.maas-internal failed: file not found
17-Apr-2024 17:04:21.151 zone maas-internal/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 10.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.10.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 10.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 16.1.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.16.1.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 16.1.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 10.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.10.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 10.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 128.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.128.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 128.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 131.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.131.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 131.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 57.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.151 zone 59.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.151 zone 62.245.10.in-addr.arpa/IN: loaded serial 159655
17-Apr-2024 17:04:21.151 zone 9.245.10.in-addr.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.9.245.10.in-addr.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 9.245.10.in-addr.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 zone 9.1.a.7.6.5.2.9.2.8.0.2.2.4.d.f.ip6.arpa/IN: loading from master file /var/snap/maas/34087/bind/zone.9.1.a.7.6.5.2.9.2.8.0.2.2.4.d.f.ip6.arpa failed: file not found
17-Apr-2024 17:04:21.151 zone 9.1.a.7.6.5.2.9.2.8.0.2.2.4.d.f.ip6.arpa/IN: not loaded due to errors.
17-Apr-2024 17:04:21.151 all zones loaded
17-Apr-2024 17:04:21.151 running
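
With this many lines it is hard to see at a glance which zones are affected. A small sketch that summarizes a log like the one above, assuming only the two line shapes quoted here ("loaded serial N" and "not loaded due to errors"); `summarize_bind_log` is a hypothetical helper, not part of MAAS or BIND:

```python
import re

# These patterns match the two BIND log line shapes quoted in this report.
LOADED = re.compile(r"zone (\S+)/IN: loaded serial (\d+)")
FAILED = re.compile(r"zone (\S+)/IN: not loaded due to errors")


def summarize_bind_log(lines):
    """Return (loaded, failed): zone -> serial for loaded zones, set of failed zones."""
    loaded, failed = {}, set()
    for line in lines:
        match = LOADED.search(line)
        if match:
            loaded[match.group(1)] = int(match.group(2))
            continue
        match = FAILED.search(line)
        if match:
            failed.add(match.group(1))
    return loaded, failed


sample = [
    "17-Apr-2024 17:04:21.147 zone 52.245.10.in-addr.arpa/IN: loaded serial 159655",
    "17-Apr-2024 17:04:21.147 zone maas/IN: loading from master file /var/snap/maas/34087/bind/zone.maas failed: file not found",
    "17-Apr-2024 17:04:21.147 zone maas/IN: not loaded due to errors.",
]
loaded, failed = summarize_bind_log(sample)
print("loaded:", loaded)
print("failed:", sorted(failed))
```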

The missing zones cause all commissioning and deployment operations to fail, as DNS becomes unresponsive to cloud-init:

ubuntu@weavile:~$ dig 10-1-10-0--23.maas-internal @10.1.10.2

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> 10-1-10-0--23.maas-internal @10.1.10.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57093
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: c8e431476c2b1e8601000000662060f159f3407f03fc48a6 (good)
;; QUESTION SECTION:
;10-1-10-0--23.maas-internal. IN A

;; Query time: 0 msec
;; SERVER: 10.1.10.2#53(10.1.10.2) (UDP)
;; WHEN: Wed Apr 17 19:53:21 EDT 2024
;; MSG SIZE rcvd: 84

Alexsander de Souza (alexsander-souza) wrote :

DNS response when operational (just after startup)

dig 10-1-10-0--23.maas-internal @10.1.10.2

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> 10-1-10-0--23.maas-internal @10.1.10.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64488
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 8a4c9ea673d3e745010000006620619cfa2c2e5fbac9f5ce (good)
;; QUESTION SECTION:
;10-1-10-0--23.maas-internal. IN A

;; ANSWER SECTION:
10-1-10-0--23.maas-internal. 15 IN A 10.1.10.2
10-1-10-0--23.maas-internal. 15 IN A 10.1.10.3
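
The difference between the two responses is the status field in the header (SERVFAIL when broken, NOERROR when operational). A rough sketch for pulling that field out of captured dig output, for example in a watchdog script; `dig_status` is a hypothetical helper and it assumes the header format shown in this report:

```python
import re

# Matches the ";; ->>HEADER<<- ... status: X, ..." line printed by dig.
HEADER_STATUS = re.compile(r"->>HEADER<<-.*?status: (\w+)")


def dig_status(output):
    """Return the status word from dig output, or None if no header line is present."""
    match = HEADER_STATUS.search(output)
    return match.group(1) if match else None


failing = ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57093"
healthy = ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64488"
print(dig_status(failing), dig_status(healthy))
```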

Alexsander de Souza (alexsander-souza) wrote :

boot command line:

set default="0"
set timeout=0

menuentry 'Ephemeral' {
    echo 'Booting under MAAS direction...'
    linux (http,10.1.10.2:5248)/images/ubuntu/amd64/ga-22.04/jammy/candidate/boot-kernel nomodeset ro root=squash:http://10.1.10.2:5248/images/ubuntu/amd64/ga-22.04/jammy/candidate/squashfs ip=::::ficet:BOOTIF ip6=off overlayroot=tmpfs overlayroot_cfgdisk=disabled cc:\{'datasource_list': ['MAAS']\}end_cc cloud-config-url=http://10-1-10-0--23.maas-internal:5248/MAAS/metadata/latest/by-id/xnbkhw/?op=get_preseed log_host=10.1.10.2 log_port=5247 --- BOOTIF=01-${net_default_mac}
    initrd (http,10.1.10.2:5248)/images/ubuntu/amd64/ga-22.04/jammy/candidate/boot-initrd
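
Note that the kernel and initrd are fetched by IP address, while cloud-config-url uses a maas-internal hostname, so that URL depends on the DNS being healthy. A small sketch that extracts the host from a command line like the one above; `cloud_config_host` is a hypothetical helper and the command line is shortened for the example:

```python
import re
from urllib.parse import urlsplit


def cloud_config_host(cmdline):
    """Return (hostname, port) from the cloud-config-url= argument, or None if absent."""
    match = re.search(r"cloud-config-url=(\S+)", cmdline)
    if match is None:
        return None
    parts = urlsplit(match.group(1))
    return parts.hostname, parts.port


# Shortened copy of the kernel command line quoted in this comment.
cmdline = (
    "ro root=squash:http://10.1.10.2:5248/images/ubuntu/amd64/ga-22.04/jammy/candidate/squashfs "
    "cloud-config-url=http://10-1-10-0--23.maas-internal:5248/MAAS/metadata/latest/by-id/xnbkhw/?op=get_preseed "
    "log_host=10.1.10.2 log_port=5247"
)
host = cloud_config_host(cmdline)
print(host)
```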

Alexsander de Souza (alexsander-souza) wrote :

about the scenario:

- single MAAS region+rack controller
- ProLiant DL360 Gen9, 20 cores, 128 GB RAM
- ~130 machines
- 5 managed VLANs
- PostgreSQL 14 running on the same machine

At times the DB experienced high load, with over 400% CPU utilization.

Changed in maas:
assignee: nobody → Christian Grabowski (cgrabowski)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
milestone: none → 2.8.9
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.8.9 → 3.6.0
Changed in maas:
importance: Critical → High