This has been happening to us recently with MAAS 3.3.4. It's a sporadic failure. I'm seeing the following in regiond.log: 2023-09-08 19:12:21 twisted.internet.protocol.Factory: [info] RegionServer conne ction established (HOST:IPv6Address(type='TCP', host='::ffff:10.1.16.3', port=52 52, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.1.16.3', port=58048, flowInfo=0, scopeID=0)) 2023-09-08 19:12:21 maasserver.rpc.regionservice: [info] Rack controller authent icated from '::ffff:10.1.16.3:58044'. 2023-09-08 19:12:21 maasserver.rpc.regionservice: [info] Rack controller authent icated from '::ffff:10.1.16.3:58048'. 2023-09-08 19:12:23 maasserver.ipc: [info] Worker pid:29451 registered RPC conne ction to ('rsfc68', '10.1.16.3', 5252). 2023-09-08 19:12:25 maasserver.ipc: [info] Worker pid:29451 registered RPC conne ction to ('rsfc68', '10.1.16.3', 5252). 2023-09-08 19:12:25 maasserver.dhcp: [info] Successfully configured DHCPv4 on ra ck controller 'weavile (rsfc68)'. 2023-09-08 19:12:26 maasserver.dhcp: [info] Successfully configured DHCPv6 on ra ck controller 'weavile (rsfc68)'. 2023-09-08 19:12:39 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService ) 2023-09-08 19:13:02 maasserver.models.node: [info] hoggus: Turning on netboot fo r node 2023-09-08 19:13:02 maasserver.models.node: [info] hoggus: Turning ephemeral dep loy off for node 2023-09-08 19:15:21 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'weavile' (rsfc68). 2023-09-08 19:15:21 maasserver.regiondservices.active_discovery: [info] Active n etwork discovery: Unable to initiate network scanning on any rack controller. Ve rify that the rack controllers are started and have connected to the region. 2023-09-08 19:15:42 maasserver.models.signals.power: [critical] Failed to update power state of machine after state transition. Traceback (most recent call last): File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks current.result = callback( # type: ignore[misc] File "/snap/maas/28521/lib/python3.10/site-packages/maasserver/models/node.py", line 6052, in cb_power_control d = getClientFromIdentifiers(client_idents) File "/snap/maas/28521/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 128, in wrapper return func(*args, **kwargs) File "/snap/maas/28521/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 60, in wrapper return maybeDeferred(func, *args, **kwargs) --- --- File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks current.result = callback( # type: ignore[misc] File "/snap/maas/28521/lib/python3.10/site-packages/maasserver/models/signals/power.py", line 46, in eb_error failure.trap(Node.DoesNotExist, UnknownPowerType, PowerProblem) File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap self.raiseException() File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException raise self.value.with_traceback(self.tb) File "/snap/maas/28521/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 190, in maybeDeferred result = f(*args, **kwargs) File "/snap/maas/28521/lib/python3.10/site-packages/maasserver/rpc/__init__.py", line 25, in getClientFromIdentifiers raise exceptions.NoConnectionsAvailable( provisioningserver.rpc.exceptions.NoConnectionsAvailable: Unable to connect to any rack controller rsfc68; no connections available. 2023-09-08 19:17:01 RegionServer,5851,::ffff:10.1.20.3: [info] RegionServer connection lost (HOST:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=44146, flowInfo=0, scopeID=0)) 2023-09-08 19:17:01 RegionServer,5852,::ffff:10.1.20.3: [info] RegionServer connection lost (HOST:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=44152, flowInfo=0, scopeID=0)) 2023-09-08 19:17:01 maasserver.ipc: [info] Worker pid:29452 lost RPC connection to ('rsfc68', '10.1.20.3', 5253). 2023-09-08 19:17:01 maasserver.ipc: [info] Worker pid:29452 lost RPC connection to ('rsfc68', '10.1.20.3', 5253). 2023-09-08 19:17:21 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=53994, flowInfo=0, scopeID=0)) 2023-09-08 19:17:21 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.1.20.3', port=54008, flowInfo=0, scopeID=0)) Based on the context around the Python traceback, it looks to me like the rack controller is OK, goes away, and then comes back; but in the brief time that it's non-responsive, the region controller fails. I'm also seeing a lot of Python tracebacks in rackd.log. I'll try to attach rackd.log and regiond.log to another comment; LP is flaking out when I try to attach it to this one.