[2.5] FATAL: remaining connection slots are reserved for non-replication superuser connections

Bug #1799871 reported by Andres Rodriguez
This bug affects 1 person
Affects: MAAS
Status: Expired
Importance: Critical
Assigned to: Blake Rouse

Bug Description

I had a MAAS with 2 region/rack controllers + 2 rack controllers, and 2 physical machines, 1 of them deployed as a pod. All of a sudden I started seeing this issue.

After I noticed this issue, I also noticed that my secondary region/rack was dead. These logs are from the primary region/rack.

Lastly, I manually restarted regiond on the primary region/rack and things resolved themselves.

Also, max_connections is set to 200.
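
For context on that number: PostgreSQL holds back `superuser_reserved_connections` slots out of `max_connections` (3 by default), and this FATAL message is what a non-superuser role such as `maas` gets once only the reserved slots remain. Below is a minimal sketch of that arithmetic, assuming the default reserve of 3 (the actual value on this server was not reported). The function names are illustrative and not part of MAAS or PostgreSQL.

```python
def regular_slots(max_connections, superuser_reserved=3):
    """Slots usable by non-superuser roles (assumes the default reserve of 3)."""
    return max_connections - superuser_reserved


def is_exhausted(open_connections, max_connections, superuser_reserved=3):
    """True when a further non-superuser connection would be refused."""
    return open_connections >= regular_slots(max_connections, superuser_reserved)


if __name__ == "__main__":
    # With max_connections = 200, ordinary roles can hold 200 - 3 slots.
    print(regular_slots(200))
    print(is_exhausted(197, 200))
```

Under that assumption, the `maas` role starts being refused once ordinary connections occupy every slot outside the reserve.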

2018-10-25 03:55:05 provisioningserver.rpc.common: [critical] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the b'GetSyslogConfiguration' command. [maas00:pid=12227:cmd=GetSyslogConfiguration:ask=bb38]
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 267, in run
     self._asyncioEventloop.run_forever()
   File "/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 290, in run
     f(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 500, in errback
     self._startRunCallbacks(fail)
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
     self._runCallbacks()
 --- <exception caught here> ---
   File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/protocols/amp.py", line 1171, in checkKnownErrors
     key = error.trap(*command.allErrors)
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 359, in trap
     self.raiseException()
   File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 385, in raiseException
     raise self.value.with_traceback(self.tb)
   File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
     result = inContext.theWork()
   File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
     inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
     return func(*args,**kw)
   File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 885, in callInContext
     return func(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 234, in wrapper
     result = func(*args, **kwargs)
   File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 755, in call_within_transaction
     with connected(), post_commit_hooks:
   File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
     return next(self.gen)
   File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 684, in connected
     connection.ensure_connection()
   File "/usr/lib/python3/dist-packages/django/db/backends/base/base.py", line 213, in ensure_connection
     self.connect()
   File "/usr/lib/python3/dist-packages/django/db/utils.py", line 94, in __exit__
     six.reraise(dj_exc_type, dj_exc_value, traceback)
   File "/usr/lib/python3/dist-packages/django/utils/six.py", line 685, in reraise
     raise value.with_traceback(tb)
   File "/usr/lib/python3/dist-packages/django/db/backends/base/base.py", line 213, in ensure_connection
     self.connect()
   File "/usr/lib/python3/dist-packages/django/db/backends/base/base.py", line 189, in connect
     self.connection = self.get_new_connection(conn_params)
   File "/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
     connection = Database.connect(**conn_params)
   File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 130, in connect
     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
 django.db.utils.OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections

Changed in maas:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Blake Rouse (blake-rouse)
milestone: none → 2.5.0rc1
description: updated
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Further information I found:

1. The primary regiond/rackd is running.
2. The secondary regiond/rackd is dead.
3. Deployed Ubuntu; it worked.
4. Deployed a custom OS (ESXi); it worked.
5. After being idle for a while, the machine started displaying these issues again.

Andres Rodriguez (andreserl) wrote :

Also, the weird thing is that I can use MAAS over the API just fine.

Andres Rodriguez (andreserl) wrote :

The only other error I see is this:

==> /var/log/maas/rackd.log <==
2018-10-25 04:47:05 provisioningserver.rackdservices.external: [critical] Failed to get external services configurations.
 Traceback (most recent call last):
 Failure: twisted.protocols.amp.UnhandledCommand: (b'UNHANDLED', 'Unknown Error [maas00:pid=19353:cmd=GetDNSConfiguration:ask=5c]')

Andres Rodriguez (andreserl) wrote :

database_host: localhost
database_name: maasdb
database_pass: *******
database_port: 5432
database_user: maas
maas_url: http://**.**.**.**:5240/MAAS

Blake Rouse (blake-rouse) wrote :

$ ps auxf | grep 'maasdb 10.90.90.3' | wc -l
168

Your .3 regiond is using almost all the connections and I cannot access it over SSH, so I think something has gone really wrong with that machine in general.
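
The same count can be broken down per client host rather than grepping for one address at a time. This is a rough sketch, assuming the backend process titles in `ps` output look like `postgres: [cluster: ]user database host(port) state`. That shape is an assumption about this server's PostgreSQL version and packaging, and `count_backends_by_host` is a hypothetical helper, not a MAAS tool.

```python
import re
from collections import Counter

# Assumed process-title shape: "postgres: [cluster: ]user database host(port) state".
BACKEND_RE = re.compile(
    r"postgres:\s+(?:\S+:\s+)?(?P<user>\S+)\s+(?P<db>\S+)\s+(?P<host>[^\s(]+)\(\d+\)"
)


def count_backends_by_host(ps_lines, database="maasdb"):
    """Count PostgreSQL backends for one database, grouped by client host."""
    counts = Counter()
    for line in ps_lines:
        match = BACKEND_RE.search(line)
        # Lines that are not client backends (or use a local socket) are skipped.
        if match and match.group("db") == database:
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    import subprocess

    # Reads the live process list; requires a host running PostgreSQL.
    output = subprocess.run(
        ["ps", "auxww"], capture_output=True, text=True, check=True
    ).stdout
    for host, total in count_backends_by_host(output.splitlines()).most_common():
        print(host, total)
```

Comparing each host's total against max_connections (200 here) shows how much of the pool a single region controller is holding.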

Changed in maas:
status: Triaged → Incomplete
Changed in maas:
milestone: 2.5.0rc1 → 2.5.0
Changed in maas:
milestone: 2.5.0 → 2.5.x
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Incomplete → Expired