Machines fail to commission using the 3.0 snap due to a possible DNS issue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Christian Grabowski |
3.0 | Fix Released | Critical | Björn Tillenius |
Bug Description
As seen during this release test run: https:/
Machines are set to commission but are never recognized by MAAS:
2021-05-25-13:35:32 foundationcloud
2021-05-25-13:35:32 root DEBUG [localhost]: maas root machines read
2021-05-25-13:35:43 foundationcloud
...
...
2021-05-25-14:05:20 root DEBUG [localhost]: maas root machines read
2021-05-25-14:05:24 foundationcloud
Looking into the MAAS regiond logs, the only thing that stands out is regiond complaining about failing to reload DNS shortly before the nodes are commissioned:
2021-05-25 13:33:43 maasserver.
Traceback (most recent call last):
File "/snap/
File "/snap/
File "/snap/
return _cancellableInl
File "/snap/
--- <exception caught here> ---
File "/snap/
result = g.send(result)
File "/snap/
raise DNSReloadError(
Then it goes into a loop of failing and restarting:
2021-05-25 13:34:13 maasserver.
Traceback (most recent call last):
File "/snap/
File "/snap/
File "/snap/
File "/snap/
--- <exception caught here> ---
File "/snap/
result = result.
File "/snap/
return g.throw(self.type, self.value, self.tb)
File "/snap/
state = yield self.ensureServ
File "/snap/
result = g.send(result)
File "/snap/
raise ServiceActionEr
2021-05-25 13:34:14 maasserver.
Traceback (most recent call last):
File "/snap/
File "/snap/
File "/snap/
return _cancellableInl
File "/snap/
--- <exception caught here> ---
File "/snap/
result = g.send(result)
File "/snap/
raise DNSReloadError(
This persisted until the test run was killed by the 30-minute commissioning timeout.
From named.log:
25-May-2021 14:07:03.124 starting BIND 9.16.1-Ubuntu (Stable Release) <id:d497c32>
25-May-2021 14:07:03.124 running on Linux x86_64 5.4.0-56-generic #62-Ubuntu SMP Mon Nov 23 19:20:19 UTC 2020
25-May-2021 14:07:03.124 built with '--build=
25-May-2021 14:07:03.124 running as: named -c /var/snap/
25-May-2021 14:07:03.124 compiled by GCC 9.3.0
25-May-2021 14:07:03.124 compiled with OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
25-May-2021 14:07:03.124 linked to OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020
25-May-2021 14:07:03.124 compiled with libxml2 version: 2.9.10
25-May-2021 14:07:03.124 linked to libxml2 version: 20910
25-May-2021 14:07:03.124 compiled with json-c version: 0.13.1
25-May-2021 14:07:03.124 linked to json-c version: 0.13.1
25-May-2021 14:07:03.124 compiled with zlib version: 1.2.11
25-May-2021 14:07:03.124 linked to zlib version: 1.2.11
25-May-2021 14:07:03.124 -------
25-May-2021 14:07:03.124 BIND 9 is maintained by Internet Systems Consortium,
25-May-2021 14:07:03.124 Inc. (ISC), a non-profit 501(c)(3) public-benefit
25-May-2021 14:07:03.124 corporation. Support and training for BIND 9 are
25-May-2021 14:07:03.124 available at https:/
25-May-2021 14:07:03.124 -------
25-May-2021 14:07:03.124 found 20 CPUs, using 20 worker threads
25-May-2021 14:07:03.124 using 20 UDP listeners per interface
25-May-2021 14:07:03.136 using up to 21000 sockets
25-May-2021 14:07:03.140 loading configuration from '/var/snap/
25-May-2021 14:07:03.144 /var/snap/
25-May-2021 14:07:03.144 loading configuration: failure
25-May-2021 14:07:03.144 exiting (due to fatal error)
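The named.log excerpt shows that loading the configuration failed, but the specific error line is cut off. Since duplicate zones are suspected further down in this report, one quick check is to scan the generated named configuration for zone names that are declared more than once. The following is a minimal sketch under that assumption; the helper name, the regular expression, and the sample configuration text are all hypothetical and are not taken from the failing system.

```python
import re
from collections import Counter

# Matches the start of a BIND zone statement such as: zone "example.com" {
ZONE_RE = re.compile(r'^\s*zone\s+"([^"]+)"', re.MULTILINE)


def duplicate_zone_declarations(config_text):
    """Return the sorted zone names that are declared more than once."""
    names = ZONE_RE.findall(config_text)
    # DNS names are case-insensitive, so compare them in lower case.
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up sample configuration, for illustration only.
sample = '''
zone "maas" { type master; file "zone.maas"; };
zone "0.0.10.in-addr.arpa" { type master; file "zone.0.0.10.in-addr.arpa"; };
zone "0.0.10.in-addr.arpa" { type master; file "zone.0.0.10.in-addr.arpa"; };
'''

print(duplicate_zone_declarations(sample))
```

BIND's own named-checkconf tool is the authoritative way to validate a configuration file; the scan above only narrows the search to one suspected cause.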
Related branches
- Björn Tillenius: Approve
- Diff: 118 lines (+48/-29), 3 files modified:
src/maasserver/dns/tests/test_zonegenerator.py (+43/-26)
src/maasserver/dns/zonegenerator.py (+1/-3)
src/provisioningserver/dns/zoneconfig.py (+4/-0)
- MAAS Lander: Approve
- Björn Tillenius: Approve
- Diff: 118 lines (+48/-29), 3 files modified:
src/maasserver/dns/tests/test_zonegenerator.py (+43/-26)
src/maasserver/dns/zonegenerator.py (+1/-3)
src/provisioningserver/dns/zoneconfig.py (+4/-0)
description: | updated |
Changed in maas: | |
status: | New → Triaged |
importance: | Undecided → Critical |
milestone: | none → 3.0.0-rc2 |
tags: | added: cdo-qa cdo-release-blocker foundations-engine |
Changed in maas: | |
assignee: | nobody → Björn Tillenius (bjornt) |
Changed in maas: | |
assignee: | Björn Tillenius (bjornt) → Christian Grabowski (cgrabowski) |
Changed in maas: | |
milestone: | none → next |
status: | In Progress → Fix Committed |
Changed in maas: | |
status: | Fix Committed → Fix Released |
milestone: | next → none |
I would suspect the issue is in ZoneGenerator._gen_reverse_zones(). If we grab the subnets and domains from the database dump, hopefully we can trigger the error (duplicate zones) in a unit test.
See also dns_update_all_zones(), which is the one that ultimately calls ZoneGenerator.as_list().
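To illustrate the kind of collision such a unit test would look for, here is a small, self-contained sketch. It does not use MAAS code: naive_reverse_zone() and duplicate_zones() are hypothetical helpers, and the /24-aligned naming rule is an assumption chosen only to show how two distinct subnets can end up with the same reverse zone name. The subnet values are invented, not taken from the database dump.

```python
import ipaddress
from collections import Counter


def naive_reverse_zone(cidr):
    """Hypothetical helper: name the /24-aligned reverse zone of an IPv4 subnet.

    This is not MAAS's implementation. It only shows how a naming rule can
    map more than one subnet onto a single zone name.
    """
    network = ipaddress.ip_network(cidr)
    octets = str(network.network_address).split(".")[:3]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def duplicate_zones(cidrs):
    """Return the sorted zone names produced by more than one subnet."""
    counts = Counter(naive_reverse_zone(cidr) for cidr in cidrs)
    return sorted(name for name, count in counts.items() if count > 1)


# Invented subnets: the two /25 networks share the same /24 reverse zone.
subnets = ["10.0.0.0/25", "10.0.0.128/25", "192.168.1.0/24"]
print(duplicate_zones(subnets))  # ['0.0.10.in-addr.arpa']
```

A real regression test would instead feed the subnets and domains from the dump to ZoneGenerator and assert that the resulting zone names are unique.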