Hi guys,

Looking at the logs on #3, and based on the times specified there, I see the following:

1. At 10:28:38 I see a traceback, which leads me to believe there are DB connection issues:

2019-02-25 10:28:38 maasserver: [error] ################################
Exception: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
################################
2019-02-25 10:28:38 maasserver: [error] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 236, in _set_autocommit
    self.connection.autocommit = autocommit
psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 235, in handle_uncaught_exception
    raise exc from exc.__cause__
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 297, in get_response
    with transaction.atomic():
  File "/usr/lib/python3/dist-packages/django/db/transaction.py", line 184, in __enter__
    connection.set_autocommit(False, force_begin_transaction_with_broken_autocommit=True)
  File "/usr/lib/python3/dist-packages/django/db/backends/base/base.py", line 411, in set_autocommit
    self._set_autocommit(autocommit)
  File "/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 236, in _set_autocommit
    self.connection.autocommit = autocommit
  File "/usr/lib/python3/dist-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python3/dist-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 236, in _set_autocommit
    self.connection.autocommit = autocommit
django.db.utils.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
2019-02-25 10:28:38 regiond: [info] 10.100.2.2 GET /MAAS/rpc/ HTTP/1.1 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2019-02-25 10:28:38 regiond: [info] 10.10.101.74 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)

2. At 10:28:39 I see the API PUT for DNS; as per #3, I assume this is the update for 10.100.3.2. However, also look at the following failure for ListNodePowerParameters, which leads me to believe there are still DB connection issues:

2019-02-25 10:28:39 regiond: [info] 127.0.0.1 GET /MAAS/api/2.0/dnsresources/ HTTP/1.1 --> 200 OK (referrer: -; agent: Python-urllib/3.6)
2019-02-25 10:28:39 regiond: [info] 127.0.0.1 PUT /MAAS/api/2.0/dnsresources/1/ HTTP/1.1 --> 200 OK (referrer: -; agent: Python-urllib/3.6)
2019-02-25 10:28:39 provisioningserver.rpc.common: [critical] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the b'ListNodePowerParameters' command.
[maas-vhost3:pid=8159:cmd=ListNodePowerParameters:ask=36]
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 267, in run
    self._asyncioEventloop.run_forever()
  File "/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 290, in run
    f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/protocols/amp.py", line 1171, in checkKnownErrors
    key = error.trap(*command.allErrors)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 359, in trap
    self.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 385, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 885, in callInContext
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 234, in wrapper
    result = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 756, in call_within_transaction
    return func_outside_txn(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 563, in retrier
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/contextlib.py", line 51, in inner
    with self._recreate_cm():
  File "/usr/lib/python3/dist-packages/django/db/transaction.py", line 184, in __enter__
    connection.set_autocommit(False, force_begin_transaction_with_broken_autocommit=True)
  File "/usr/lib/python3/dist-packages/django/db/backends/base/base.py", line 411, in set_autocommit
    self._set_autocommit(autocommit)
  File "/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 236, in _set_autocommit
    self.connection.autocommit = autocommit
  File "/usr/lib/python3/dist-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python3/dist-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 236, in _set_autocommit
    self.connection.autocommit = autocommit
django.db.utils.InterfaceError: connection already closed

3. At 10:29:13 I see debug messages:

2019-02-25 10:29:13 provisioningserver.rpc.common: [debug] [RPC <- received] AmpBox({b'_ask': b'28', b'_command': b'Ping'})
2019-02-25 10:29:13 provisioningserver.rpc.common: [debug] [RPC -> responding] AmpBox({b'_answer': b'28'})
2019-02-25 10:29:13 provisioningserver.rpc.common: [debug] [RPC <- received] AmpBox({b'_ask': b'2c', b'_command': b'Ping'})

So the one thing I'm noticing here is that between 10:28:39 and 10:29:13 there are a lot of tracebacks that imply database connection issues. After 10:29:13, however, things seem to stabilize and I no longer see any relevant tracebacks.

The above tells me that:
1. failover happens;
2. vhost3 is trying to reconnect to the database (e.g. the database is failing over from one place to another);
3. the DNS zone seems to have been updated while there are still database connection issues.

So my question is: in what situation does res_maas_region_hostname execute the update? Is pacemaker configured to wait for regiond to be fully connected, with no connection issues, before res_maas_region_hostname is executed?

Normally, I would expect pacemaker to wait for a service to be fully up before executing a command that affects that service. In this case, I would expect pacemaker to confirm that regiond is up and fully connected to the DB before executing res_maas_region_hostname.

Can you please confirm what happens here? (Next I'll investigate other logs.)
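To illustrate the kind of gating I'd expect, here is a minimal Python sketch under stated assumptions. It is NOT how pacemaker or res_maas_region_hostname currently behave; `wait_until_stable` and the `probe` callable are hypothetical names. The sketch requires several consecutive successful probes rather than a single one, because at 10:28:38 one /MAAS/rpc/ request got a 500 while another got a 200 OK in the same second, so one success does not mean regiond is stably connected to the DB.

```python
import time
from typing import Callable


def wait_until_stable(
    probe: Callable[[], bool],
    required_successes: int = 3,
    interval: float = 5.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Hypothetical readiness gate (illustration only, not MAAS/pacemaker code).

    Returns True once `probe` succeeds `required_successes` times in a row.
    A False result or any exception resets the streak. Returns False if
    `timeout` seconds pass first, so the caller can skip the DNS update.
    """
    deadline = clock() + timeout
    streak = 0
    while clock() < deadline:
        try:
            ok = bool(probe())
        except Exception:
            # e.g. "server closed the connection unexpectedly" during failover
            ok = False
        streak = streak + 1 if ok else 0
        if streak >= required_successes:
            return True
        sleep(interval)
    return False
```

The probe could be, for example, a function that opens a psycopg2 connection to the MAAS database and runs `SELECT 1`, or one that requests the region API and checks for a 200. Which probe is appropriate, and how it would be tied into the pacemaker resource ordering, is an assumption on my part; the `sleep` and `clock` parameters are only there so the gate can be exercised without waiting.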