@adam, I happened to have some cycles to look around.

There are two MAAS servers, which manage various architectures (ARM64 and PPC64le). Both MAAS servers are running version 2.8.1; one appears to be installed from the snap, the other from the PPA.

MAAS name: maas MAAS
MAAS version: 2.8.1 (8567-g.c4825ca06)

MAAS name: Shared POWER MAAS
MAAS version: 2.8.1 (8567-g.c4825ca06-0ubuntu1~18.04.1)

Regardless, I'm assuming the bits in each of these builds are the same.

The ARM64, HWE MAAS is using the snap. I see the following failure when systems deploy:

[ OK ] Started LSB: automatic crash report generation.
[ OK ] Finished Set console scheme.
[ OK ] Created slice system-getty.slice.
[ OK ] Started Getty on tty1.
[ OK ] Reached target Login Prompts.
[ OK ] Started System Logging Service.
[ OK ] Started Login Service.
[ OK ] Started Unattended Upgrades Shutdown.
Starting Authorization Manager...
[ OK ] Started Authorization Manager.
[ OK ] Started Accounts Service.
[ OK ] Started Dispatcher daemon for systemd-networkd.
[ OK ] Started Snap Daemon.
Starting Wait until snapd is fully seeded...
[ OK ] Finished Pollinate to seed…seudo random number generator.
Starting OpenBSD Secure Shell server...
[FAILED] Failed to start OpenBSD Secure Shell server.
See 'systemctl status ssh.service' for details.
[ OK ] Stopped OpenBSD Secure Shell server.
Starting OpenBSD Secure Shell server...
[FAILED] Failed to start OpenBSD Secure Shell server.
See 'systemctl status ssh.service' for details.
[ OK ] Stopped OpenBSD Secure Shell server.
Starting OpenBSD Secure Shell server...
[FAILED] Failed to start OpenBSD Secure Shell server.
See 'systemctl status ssh.service' for details.
[ OK ] Stopped OpenBSD Secure Shell server.
Starting OpenBSD Secure Shell server...

Ubuntu 20.04.1 LTS ubuntu ttyAMA0

ubuntu login:
Mounting Mount unit for snapd, revision 8543...
[ OK ] Mounted Mount unit for snapd, revision 8543.
[ OK ] Stopped Snap Daemon.
Starting Snap Daemon...
[ OK ] Started Snap Daemon.
Mounting Mount unit for core18, revision 1883...
[ OK ] Mounted Mount unit for core18, revision 1883.
Mounting Mount unit for lxd, revision 16563...
[ OK ] Mounted Mount unit for lxd, revision 16563.
[ OK ] Listening on Socket unix for snap application lxd.daemon.
Starting Service for snap application lxd.activate...
[ OK ] Finished Service for snap application lxd.activate.
[ OK ] Finished Wait until snapd is fully seeded.
Starting Apply the settings specified in cloud-config...
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Finished Update UTMP about System Runlevel Changes.
[ 164.032142] cloud-init[3533]: Can not apply stage config, no datasource found! Likely bad things to come!
[ 164.032748] cloud-init[3533]: ------------------------------------------------------------
[ 164.033240] cloud-init[3533]: Traceback (most recent call last):
[ 164.033683] cloud-init[3533]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 485, in main_modules
[ 164.034087] cloud-init[3533]:     init.fetch(existing="trust")
[ 164.034467] cloud-init[3533]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 350, in fetch
[ 164.034903] cloud-init[3533]:     return self._get_data_source(existing=existing)
[ 164.035323] cloud-init[3533]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 255, in _get_data_source
[ 164.035696] cloud-init[3533]:     (ds, dsname) = sources.find_source(self.cfg,
[ 164.036136] cloud-init[3533]:   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 780, in find_source
[ 164.036566] cloud-init[3533]:     raise DataSourceNotFoundException(msg)
[ 164.036961] cloud-init[3533]: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: ()
[ 164.037411] cloud-init[3533]: ------------------------------------------------------------
[FAILED] Failed to start Apply the …ngs specified in cloud-config.
See 'systemctl status cloud-config.service' for details.
Starting Execute cloud user/final scripts...
[ 165.929953] cloud-init[3546]: Can not apply stage final, no datasource found! Likely bad things to come!
[ 165.930547] cloud-init[3546]: ------------------------------------------------------------
[ 165.930989] cloud-init[3546]: Traceback (most recent call last):
[ 165.931413] cloud-init[3546]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 485, in main_modules
[ 165.931834] cloud-init[3546]:     init.fetch(existing="trust")
[ 165.932240] cloud-init[3546]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 350, in fetch
[ 165.932677] cloud-init[3546]:     return self._get_data_source(existing=existing)
[ 165.933114] cloud-init[3546]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 255, in _get_data_source
[ 165.933453] cloud-init[3546]:     (ds, dsname) = sources.find_source(self.cfg,
[ 165.933872] cloud-init[3546]:   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 780, in find_source
[ 165.934292] cloud-init[3546]:     raise DataSourceNotFoundException(msg)
[ 165.934723] cloud-init[3546]: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: ()
[ 165.935160] cloud-init[3546]: ------------------------------------------------------------
[FAILED] Failed to start Execute cloud user/final scripts.
See 'systemctl status cloud-final.service' for details.
[ OK ] Reached target Cloud-init target.

In the MAAS logs, I see that the region controller appears to be in an "importing" state, while the rack controller appears to have connection problems with its region (rack and region are on the same system):

==> /var/log/maas/rackd.log <==
2020-07-21 23:49:10 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:12 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:13 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:14 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:14 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:16 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:17 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:18 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:18 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).
2020-07-21 23:49:19 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).

==> /var/log/maas/regiond.log <==
2020-07-21 23:47:29 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-18.04/bionic/20200717/boot-kernel HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:47:40 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-19.10-lowlatency/eoan/20200611/boot-initrd HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:47:42 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-19.10-lowlatency/eoan/20200611/boot-kernel HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:47:49 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2020-07-21 23:48:03 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-19.10-lowlatency/eoan/20200611/squashfs HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:48:13 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-19.10/eoan/20200611/boot-initrd HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:48:15 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-19.10/eoan/20200611/boot-kernel HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:48:19 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2020-07-21 23:48:27 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-20.04-lowlatency/focal/20200720/boot-initrd HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2020-07-21 23:48:28 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/ga-20.04-lowlatency/focal/20200720/boot-kernel HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)

Meanwhile, the second MAAS server appears to have a different issue entirely.
There, bind9 appears to be crashing when MAAS tries to start it:

Aug 10 16:26:41 maas-dev systemd[1]: bind9.service: Start request repeated too quickly.
Aug 10 16:26:41 maas-dev systemd[1]: bind9.service: Failed with result 'exit-code'.
Aug 10 16:26:41 maas-dev systemd[1]: Failed to start BIND Domain Name Server.
Aug 10 16:26:41 maas-dev sh[11339]: 2020-08-10 16:26:41 maasserver.region_controller: [critical] Failed to kill and restart DNS.
Aug 10 16:26:41 maas-dev sh[11339]: #011Traceback (most recent call last):
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
Aug 10 16:26:41 maas-dev sh[11339]: #011 self._startRunCallbacks(fail)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
Aug 10 16:26:41 maas-dev sh[11339]: #011 self._runCallbacks()
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
Aug 10 16:26:41 maas-dev sh[11339]: #011 current.result = callback(current.result, *args, **kw)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
Aug 10 16:26:41 maas-dev sh[11339]: #011 _inlineCallbacks(r, g, deferred)
Aug 10 16:26:41 maas-dev sh[11339]: #011--- ---
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
Aug 10 16:26:41 maas-dev sh[11339]: #011 result = result.throwExceptionIntoGenerator(g)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
Aug 10 16:26:41 maas-dev sh[11339]: #011 return g.throw(self.type, self.value, self.tb)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/provisioningserver/utils/service_monitor.py", line 443, in killService
Aug 10 16:26:41 maas-dev sh[11339]: #011 state = yield self.ensureService(name)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
Aug 10 16:26:41 maas-dev sh[11339]: #011 result = result.throwExceptionIntoGenerator(g)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
Aug 10 16:26:41 maas-dev sh[11339]: #011 return g.throw(self.type, self.value, self.tb)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/provisioningserver/utils/service_monitor.py", line 733, in _ensureService
Aug 10 16:26:41 maas-dev sh[11339]: #011 yield self._performServiceAction(service, action)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
Aug 10 16:26:41 maas-dev sh[11339]: #011 result = g.send(result)
Aug 10 16:26:41 maas-dev sh[11339]: #011 File "/usr/lib/python3/dist-packages/provisioningserver/utils/service_monitor.py", line 572, in _performServiceAction
Aug 10 16:26:41 maas-dev sh[11339]: #011 raise ServiceActionError(error_msg)
Aug 10 16:26:41 maas-dev sh[11339]: #011provisioningserver.utils.service_monitor.ServiceActionError: Service 'bind9' failed to start: Job for bind9.service failed because the control process exited with error code.
Aug 10 16:26:41 maas-dev sh[11339]: #011See "systemctl status bind9.service" and "journalctl -xe" for details.
Aug 10 16:26:41 maas-dev sh[11339]: #011

I'm still investigating and can still get you those sos reports if desired. I just wanted to get some info into this bug so that we can at least get an idea of what the problem may be.
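
In case it helps with triage, here is a rough Python sketch for tallying the "Region not available" entries from rackd.log per minute, which might show whether the refused connections come in bursts or are continuous. The function name and the regex are my own and purely illustrative; the regex assumes the line layout matches the rackd.log excerpt quoted in this comment.

```python
import re
from collections import Counter

# Assumed layout, taken from the rackd.log excerpt in this comment:
# "<date> <HH:MM:SS> <logger>: [<level>] <message>"
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} "
    r"(?P<logger>\S+): \[(?P<level>\w+)\] (?P<message>.*)$"
)


def count_region_unavailable(lines):
    """Count 'Region not available' log entries, grouped by minute."""
    per_minute = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("message").startswith("Region not available"):
            per_minute[match.group("minute")] += 1
    return per_minute


# Sample input copied from the logs above: two rackd entries, one regiond entry.
sample = [
    "2020-07-21 23:49:10 provisioningserver.rpc.clusterservice: [info] "
    "Region not available: Connection was refused by other side: 111: "
    "Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).",
    "2020-07-21 23:49:12 provisioningserver.rpc.clusterservice: [info] "
    "Region not available: Connection was refused by other side: 111: "
    "Connection refused. (While requesting RPC info at http://localhost:5240/MAAS).",
    "2020-07-21 23:47:49 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 "
    "--> 200 OK (referrer: -; agent: "
    "provisioningserver.rpc.clusterservice.ClusterClientService)",
]

print(dict(count_region_unavailable(sample)))
# prints {'2020-07-21 23:49': 2}
```

To run it against a real log, pass an open file object for rackd.log to count_region_unavailable() instead of the sample list.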