Server reboot will make subnet entries disappear from zone.maas-internal

Bug #2001546 reported by Joao Andre Simioni
This bug affects 1 person
Affects   Status         Importance   Assigned to           Milestone
MAAS      Fix Released   High         Christian Grabowski
3.3       Fix Released   Undecided    Christian Grabowski

Bug Description

MAAS Version: 3.3.0~rc1-13127-g.e6737625f - SNAP
OS Version: Jammy

Problem:
--------

If the server is rebooted, MAAS will lose the subnet entries in /var/snap/maas/current/bind/zone.maas-internal

Before reboot:
; Zone file modified: 2023-01-03 16:41:58.535436.
$TTL 15
@ IN SOA maas-internal. nobody.example.com. (
              0000000022 ; serial
              600 ; Refresh
              1800 ; Retry
              604800 ; Expire
              15 ; NXTTL
              )

@ 15 IN NS maas.
192-168-45-0--24 15 IN A 192.168.45.15
192-168-122-0--24 15 IN A 192.168.122.127

After reboot:
; Zone file modified: 2023-01-03 16:45:25.257343.
$TTL 15
@ IN SOA maas-internal. nobody.example.com. (
              0000000023 ; serial
              600 ; Refresh
              1800 ; Retry
              604800 ; Expire
              15 ; NXTTL
              )

@ 15 IN NS maas.
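
The before/after comparison above can be automated. The following is a minimal sketch, not part of MAAS: it assumes address records in the zone file always have the `name TTL IN A|AAAA address` shape shown in this report, and the function names are made up for illustration.

```python
import re

# Matches lines such as "192-168-45-0--24 15 IN A 192.168.45.15".
# Assumption: records always follow this "name TTL IN type address" layout.
RECORD_RE = re.compile(r"^(\S+)\s+(\d+)\s+IN\s+(A|AAAA)\s+(\S+)$")


def address_records(zone_text):
    """Return the set of (name, type, address) tuples for A/AAAA records."""
    records = set()
    for line in zone_text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match:
            name, _ttl, rtype, address = match.groups()
            records.add((name, rtype, address))
    return records


def lost_records(before_text, after_text):
    """Records present in before_text but absent from after_text, sorted."""
    return sorted(address_records(before_text) - address_records(after_text))


# Record sections copied from the zone files quoted in this report.
BEFORE = """\
@ 15 IN NS maas.
192-168-45-0--24 15 IN A 192.168.45.15
192-168-122-0--24 15 IN A 192.168.122.127
"""
AFTER = """\
@ 15 IN NS maas.
"""

if __name__ == "__main__":
    for name, rtype, address in lost_records(BEFORE, AFTER):
        print(f"lost: {name} {rtype} {address}")
```

In practice the two texts would be read from copies of /var/snap/maas/current/bind/zone.maas-internal taken before and after the reboot.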

Messages seen:
--------------

During enlisting / commissioning / deployment, the following messages are seen:

[ 66.898299] cloud-init[789]: 2023-01-03 16:55:43,848 - main.py[WARNING]: retrieving url 'http://192-168-45-0--24.maas-internal:5248/MAAS/metadata/latest/by-id/c6c8cp/?op=get_preseed' failed: HTTPConnectionPool(host='192-168-45-0--24.maas-internal', port=5248): Max retries exceeded with url: /MAAS/metadata/latest/by-id/c6c8cp/?op=get_preseed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0d126111c0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

[ 95.597697] cloud-init[1571]: Can not apply stage config, no datasource found! Likely bad things to come!
[ 95.597771] cloud-init[1571]: ------------------------------------------------------------
[ 95.597788] cloud-init[1571]: Traceback (most recent call last):
[ 95.597801] cloud-init[1571]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 582, in main_modules
[ 95.597814] cloud-init[1571]: init.fetch(existing="trust")
[ 95.597933] cloud-init[1571]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 434, in fetch
[ 95.598443] cloud-init[1571]: return self._get_data_source(existing=existing)
[ 95.598598] cloud-init[1571]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 325, in _get_data_source
[ 95.598783] cloud-init[1571]: (ds, dsname) = sources.find_source(
[ 95.598867] cloud-init[1571]: File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 957, in find_source
[ 95.598953] cloud-init[1571]: raise DataSourceNotFoundException(msg)
[ 95.599121] cloud-init[1571]: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: ()
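
The hostname that fails to resolve is derived from the subnet CIDR. The sketch below reproduces that naming so the expected record can be predicted for any subnet. The mapping is an assumption inferred from the records quoted in this report (`.` and `:` become `-`, and the prefix length is appended after `--`); it is not taken from the MAAS source, and the function names are hypothetical.

```python
import ipaddress


def internal_label(cidr):
    """Build the zone label for a subnet, e.g. 192.168.45.0/24 -> 192-168-45-0--24.

    Assumption: this mapping is inferred from the zone files in this
    report, not from the MAAS implementation.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    address = str(network.network_address).replace(".", "-").replace(":", "-")
    return f"{address}--{network.prefixlen}"


def internal_fqdn(cidr, zone="maas-internal"):
    """Full name a deploying machine would try to resolve for this subnet."""
    return f"{internal_label(cidr)}.{zone}"


if __name__ == "__main__":
    print(internal_fqdn("192.168.45.0/24"))
    print(internal_fqdn("fd42:287b:18ec:cc23::/64"))
```

If the name printed for a subnet has no matching A or AAAA record in the zone, machines on that subnet would be expected to hit the name resolution failure shown above.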

Workaround:
-----------

Adding / Removing a subnet will force MAAS to regenerate the zone file, bringing back the entries.

Alberto Donato (ack) wrote :

Could you please provide the exact steps that are causing the issue for you?

Does restarting maas cause the internal zone file to disappear?

Changed in maas:
status: New → Incomplete
Joao Andre Simioni (jasimioni) wrote :

Hi Alberto,

The zone file is not missing - only the subnet entries are missing in the file:

192-168-45-0--24 15 IN A 192.168.45.15
192-168-122-0--24 15 IN A 192.168.122.127

Restarting MAAS behaves inconsistently. I ran some additional tests: when running "snap restart maas", the zone file is deleted and recreated without the subnet entries. Sometimes, after around 1 minute, the file is recreated with the entries, so it seems that some job 'fixes' the file after a while.

But the reboot always triggers it, and the file is fixed after I add/remove a subnet.

Below is my reproducer:

$ lxc launch ubuntu:jammy LP2001546 -c limits.memory=8GB -c limits.cpu=4 --vm
$ lxc exec LP2001546 bash

# cd /tmp
# apt update && apt install postgresql -y
# sudo -u postgres psql -c "create user maas with password 'maas';"
# sudo -u postgres psql -c "create database maas owner maas;"
# snap install maas --channel=3.3/candidate
# maas init region+rack --database-uri postgres://maas:maas@localhost
# maas createadmin --username admin --password admin --email admin@admin

# cat /var/snap/maas/current/bind/zone.maas-internal
; Zone file modified: 2023-01-04 13:35:01.268740.
$TTL 15
@ IN SOA maas-internal. nobody.example.com. (
              0000000006 ; serial
              600 ; Refresh
              1800 ; Retry
              604800 ; Expire
              15 ; NXTTL
              )

@ 15 IN NS maas.
10-203-24-0--24 15 IN A 10.203.24.76
fd42-287b-18ec-cc23----64 15 IN AAAA fd42:287b:18ec:cc23:216:3eff:feed:71ca

# reboot

$ lxc exec LP2001546 bash

# cat /var/snap/maas/current/bind/zone.maas-internal
; Zone file modified: 2023-01-04 13:36:48.607347.
$TTL 15
@ IN SOA maas-internal. nobody.example.com. (
              0000000009 ; serial
              600 ; Refresh
              1800 ; Retry
              604800 ; Expire
              15 ; NXTTL
              )

@ 15 IN NS maas.
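
To measure whether, and how quickly, the entries come back after "snap restart maas" (the "around 1 minute" observation above), a polling loop along these lines could be used. This is only a sketch: the helper names, timeout, and interval are arbitrary choices, and it inspects the zone file alone, which may not reflect what BIND actually serves.

```python
import time


def has_address_records(zone_text):
    """True if any line looks like a 'name TTL IN A|AAAA address' record."""
    for line in zone_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[-3] == "IN" and fields[-2] in ("A", "AAAA"):
            return True
    return False


def wait_for_records(read_text, timeout=120.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll read_text() until address records appear.

    Returns the seconds waited, or None if the timeout expires first.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if has_address_records(read_text()):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

On the MAAS host, `read_text` could be something like `lambda: pathlib.Path("/var/snap/maas/current/bind/zone.maas-internal").read_text()`.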

Christian Grabowski (cgrabowski) wrote :

This does appear to be a bug, in that the zonefile should be rewritten and BIND should be reloaded upon restart. However, as of 3.3, we dynamically update records in BIND once an initial zonefile is written. These records are persisted in <zonefile>.jnl instead of the zonefile itself, so they won't appear in the file, but they should still be returned by dig and other DNS requests.
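
Since the zone file alone is no longer authoritative in 3.3, checking name resolution is a more reliable test than reading the file. The sketch below uses the system resolver via `socket.getaddrinfo`, so it is only meaningful on a machine whose resolver points at the MAAS DNS server; it does not inspect the .jnl file. The function name and the injectable `lookup` parameter are illustrative choices.

```python
import socket


def resolve(hostname, port=5248, lookup=socket.getaddrinfo):
    """Return the sorted addresses a name resolves to, or [] on failure.

    lookup defaults to the system resolver and can be replaced for testing.
    """
    try:
        results = lookup(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Covers errors such as "[Errno -3] Temporary failure in name resolution".
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({sockaddr[0] for *_rest, sockaddr in results})


if __name__ == "__main__":
    name = "192-168-45-0--24.maas-internal"
    addresses = resolve(name)
    print(f"{name}: {addresses if addresses else 'does not resolve'}")
```

An empty result for a subnet's name would indicate that the record is missing from what BIND serves, regardless of what the zone file shows.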

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → High
assignee: nobody → Christian Grabowski (cgrabowski)
milestone: none → 3.3.0
status: Triaged → In Progress
Changed in maas:
milestone: 3.3.0 → 3.4.0
Changed in maas:
status: In Progress → Fix Committed
Horst Renner (hore23) wrote :

I have a very similar problem.

The difference is that the entries in the zonefile disappear while enlisting a node. Somewhere around "50-maas-01-commissioning" the zonefile becomes empty.

Afterwards the node fails with:

cloud-init[]: request to http://10-11-12-0--24.maas-internal:5248/MAAS/metadata/2012-03-01/ failed. (...) Name or service not known

Before that, the following correct record was present:

10-11-12-0--24 15 IN A 10.11.12.1

Additionally, the network configuration of the controller gets broken: all devices and VLANs are gone in the WebUI. A reboot of the controller node fixes this.

The controller has one network interface with 2 tagged VLANs.
MAAS version is 3.3.0-13159-g.1c22f7beb

Any help appreciated

Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.0-beta1
Alberto Donato (ack)
Changed in maas:
status: Fix Committed → Fix Released