The customer is using the interim version we provided (3.2.10~alpha3-12060-g.ee175c971-0ubuntu1), and around 30 days after a restart they start to see RPC problems again.

The MAAS environment has two servers: pdx01-m01-c33-cpu-01 and pdx01-m01-c34-cpu-01.

Restarting maas-rackd and maas-regiond on both servers brings the system back to normal.

Sos reports are available here:
https://drive.google.com/drive/folders/1u01dldSwTUoEYb6s3dz3n-539MlLEwSf?usp=drive_link

These logs are seen in maas.log:

2023-11-02T12:43:32+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-11-02T13:04:14+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: message repeated 15 times: [ [error] Can't update service statuses, no RPC connection to region.]
2023-11-02T13:06:00+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-11-02T13:13:15+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: message repeated 6 times: [ [error] Can't update service statuses, no RPC connection to region.]
2023-11-02T13:14:29+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-11-02T13:59:57+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-11-02T14:04:14+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: message repeated 3 times: [ [error] Can't update service statuses, no RPC connection to region.]
2023-11-02T14:05:44+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-11-02T14:16:49+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: message repeated 7 times: [ [error] Can't update service statuses, no RPC connection to region.]
2023-12-04T20:03:50+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-12-29T18:08:20+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-12-30T06:07:42+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-12-30T06:09:40+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-12-30T06:10:53+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-12-30T06:14:34+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2023-12-30T06:15:02+00:00 pdx01-m01-c34-cpu-01 maas.dhcp.probe: [error] Can't initiate DHCP probe; no RPC connection to region.
2023-12-30T06:15:53+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2024-01-03T06:29:23+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2024-01-03T08:53:39+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.
2024-01-03T08:54:32+00:00 pdx01-m01-c34-cpu-01 maas.dhcp.probe: [error] Can't initiate DHCP probe; no RPC connection to region.
2024-01-03T08:54:32+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: [error] Can't update service statuses, no RPC connection to region.

In regiond.log, different types of logs can be seen:

2023-12-04 20:02:47 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-05 03:51:46 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-05 21:34:47 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-06 16:27:34 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-07 16:04:18 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-29 18:06:49 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-30 06:04:48 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-30 11:02:20 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2023-12-30 21:25:47 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2024-01-03 06:30:17 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2024-01-05 06:40:47 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2024-01-05 08:12:30 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
2024-01-05 08:41:16 -: [critical] Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!

These messages usually precede the point where the system starts to fail. Below we can see that the "Error configuring DHCPv4" messages start to appear, until they eventually occur every minute. This is when the customer identifies that a restart is needed.

2024-01-03 06:29:55 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 06:29:55 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 06:30:17 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 06:30:17 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 06:30:30 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 06:30:30 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 06:31:06 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 06:31:06 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 06:31:06 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 06:31:24 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 06:31:59 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 06:32:44 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 06:33:01 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:07:47 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 08:08:01 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:51:32 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 08:51:41 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:51:41 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:55:39 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 08:55:48 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:55:48 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:57:16 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:57:16 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:57:16 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:58:16 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:58:16 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:58:16 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:59:05 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:59:05 maasserver.dhcp: [critical] Error configuring DHCPv6 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)': Connection was closed cleanly.
2024-01-03 08:59:05 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 08:59:51 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 09:00:00 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:12'.
2024-01-03 09:00:45 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':
2024-01-03 09:00:52 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c34-cpu-01 (xfhrbn)':

This is the full stack of these errors:

2024-01-05 11:11:11 maasserver.dhcp: [critical] Error configuring DHCPv4 on rack controller 'pdx01-m01-c33-cpu-01 (cgbctk)':
Traceback (most recent call last):
--- ---
  File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 875, in configure_dhcp
    yield client(
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/common.py", line 145, in _global_intercept_errback
    failure.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/protocols/amp.py", line 1994, in _massageError
    error.trap(RemoteAmpError)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 439, in trap
    self.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.internet.defer.CancelledError:

2024-01-05 11:11:17 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:10.217.0.11', port=5250, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.217.0.131', port=60518, flowInfo=0, scopeID=0))
2024-01-05 11:11:17 maasserver.rpc.regionservice: [info] Rack controller 'None' disconnected.
2024-01-05 11:11:17 RegionServer,91772,::ffff:10.217.0.131: [info] RegionServer connection lost (HOST:IPv6Address(type='TCP', host='::ffff:10.217.0.11', port=5250, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.217.0.131', port=60518, flowInfo=0, scopeID=0))
2024-01-05 11:11:27 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'pdx01-m01-c33-cpu-01 (cgbctk)'.
2024-01-05 11:11:27 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
    _inlineCallbacks(r, g, status)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1464, in _inlineCallbacks
    status.deferred.errback()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
    self._runCallbacks()
--- ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/maasserver/rack_controller.py", line 281, in
    d.addErrback(lambda f: f.trap(NoConnectionsAvailable))
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 439, in trap
    self.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 957, in configure_dhcp
    raise ipv4_exc
  File "/usr/lib/python3/dist-packages/maasserver/dhcp.py", line 875, in configure_dhcp
    yield client(
twisted.internet.defer.CancelledError:

I noticed that MAAS 3.3 includes some rework of the connectivity between region and rack controllers. Could that change this behavior, such that upgrading would be a recommended way to resolve this issue?

Thanks
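
In case it helps with triage, below is a minimal sketch (not part of MAAS) for tallying the "no RPC connection to region" errors per day from maas.log, so the build-up after a restart can be compared across the sos reports. The function name, the regular expressions, and the choice to count a syslog "message repeated N times" line as N occurrences are my own assumptions for illustration.

```python
import re
from collections import Counter

# Assumed maas.log line layout, based on the excerpts in this report:
#   <ISO timestamp> <hostname> <logger>: <message>
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+\s+(\S+)\s+(\S+):\s+(.*)$")
# syslog collapses duplicates into "message repeated N times: [ ... ]"
REPEAT_RE = re.compile(r"message repeated (\d+) times:")
NEEDLE = "no RPC connection to region"


def count_rpc_errors_per_day(lines):
    """Return a Counter mapping 'YYYY-MM-DD' to the number of RPC errors.

    Hypothetical helper: a "message repeated N times" line is counted
    as N occurrences, every other matching line as one.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or NEEDLE not in line:
            continue
        day, _host, _logger, message = match.groups()
        repeat = REPEAT_RE.search(message)
        counts[day] += int(repeat.group(1)) if repeat else 1
    return counts


# Three lines copied from the maas.log excerpt in this report.
SAMPLE = [
    "2023-11-02T12:43:32+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: "
    "[error] Can't update service statuses, no RPC connection to region.",
    "2023-11-02T13:04:14+00:00 pdx01-m01-c34-cpu-01 maas.service_monitor_service: "
    "message repeated 15 times: [ [error] Can't update service statuses, "
    "no RPC connection to region.]",
    "2023-12-30T06:15:02+00:00 pdx01-m01-c34-cpu-01 maas.dhcp.probe: "
    "[error] Can't initiate DHCP probe; no RPC connection to region.",
]

for day, total in sorted(count_rpc_errors_per_day(SAMPLE).items()):
    print(day, total)
```

To run it against a real log, pass an open file object instead of SAMPLE, for example count_rpc_errors_per_day(open(path)) where path points at the maas.log extracted from a sos report.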