Unable to get RPC connection for cluster 'maas'

Bug #1350925 reported by Michael Reed on 2014-07-31
This bug affects 8 people
Affects: MAAS
Importance: Critical
Assigned to: Unassigned

Bug Description

I get the following error whenever I try to save a change to a node. It appears in the web GUI, under the power section of the node edit page:

The cluster controller for this node is not responding; power type validation is not available. Unable to get RPC connection for cluster 'maas'

Restarting MAAS "occasionally" clears this up, but most of the time I have to remove and then reinstall MAAS.

 uname -a
Linux ms05-01-avaton 3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@ms05-01-avaton:/var/lib/maas/boot-resources$ dpkg -l | grep maas
ii maas 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS server all-in-one metapackage
ii maas-cli 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS command line API tool
ii maas-cluster-controller 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS server cluster controller
ii maas-common 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS server common files
ii maas-dhcp 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS DHCP server
ii maas-dns 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS DNS server
ii maas-enlist 0.4+bzr38-0ubuntu1 amd64 MAAS enlistment tool
ii maas-region-controller 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS server complete region controller
ii maas-region-controller-min 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS Server minimum region controller
ii maas-samba 1.6+bzr2247-0ubuntu1~windows1 all MAAS Samba server
ii maas-test 0.1+bzr147-0ubuntu1 all Utility to test hardware compatibility with MAAS
ii python-django-maas 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS server Django web framework
ii python-maas-client 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS python API client
ii python-maas-provisioningserver 1.6.0+bzr2539-0ubuntu1~beta6~ppa1 all MAAS server provisioning libraries

Michael Reed (mreed8855) wrote :
Julian Edwards (julian-edwards) wrote :

This is a legitimate error and happens when the cluster controller is down. Next time it happens, run "sudo service maas-pserv status" on your cluster controller host to check (and restart it if it's down).

If it's down, can you please reply here with its log file?
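
For anyone who wants to script the check suggested above, here is a minimal sketch in Python. It assumes the status command prints an upstart-style line such as "maas-pserv start/running, process 26810" (the form reported later in this thread); the function names are made up for illustration and are not part of MAAS.

```python
import re
import subprocess

# Upstart-style status line, e.g. "maas-pserv start/running, process 26810"
# or "maas-pserv stop/waiting" when the job is not running.
_STATUS_RE = re.compile(
    r"^(?P<job>\S+)\s+(?P<goal>\w+)/(?P<state>[\w-]+)"
    r"(?:,\s+process\s+(?P<pid>\d+))?\s*$"
)


def parse_upstart_status(line):
    """Split one status line into job, goal, state and pid (or None)."""
    match = _STATUS_RE.match(line.strip())
    if match is None:
        return None
    pid = match.group("pid")
    return {
        "job": match.group("job"),
        "goal": match.group("goal"),
        "state": match.group("state"),
        "pid": int(pid) if pid is not None else None,
    }


def is_running(line):
    """True only when the line parses and reports the 'running' state."""
    status = parse_upstart_status(line)
    return status is not None and status["state"] == "running"


def pserv_is_running(job="maas-pserv"):
    """Run `service <job> status` and interpret the first output line.

    Assumption: the `service` command exists on this host and the job is
    managed by upstart. Other init systems print a different format.
    """
    result = subprocess.run(
        ["service", job, "status"], capture_output=True, text=True
    )
    lines = result.stdout.splitlines()
    return bool(lines) and is_running(lines[0])
```

Note that a "running" result only shows the process exists. As later comments in this thread report, the RPC error can appear while maas-pserv is still running.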

Changed in maas:
status: New → Incomplete
mahmoh (mahmoh) wrote :

MAAS 1.6.0 from maas-maintainers/testing: I initially added an incorrect "ILO4 ..." manual entry, then deleted it and restarted the server. After the restart I tried to add the correct "Moonshot HP iLo ..." entry, and this message appeared in the UI:

***
Power type
The cluster controller for this node is not responding; power type validation is not available. Unable to get RPC connection for cluster 'maas'
- Moonshot HP iLO

Power parameters
Unknown parameter(s): power_address, power_pass, node_id, power_user.
***

Changed in maas:
status: Incomplete → Confirmed
mahmoh (mahmoh) wrote :

^ maas-pserv appeared to still be running; restarting the process appeared to "fix" the problem.

Julian Edwards (julian-edwards) wrote :

Something odd is happening on the region side of things. Gavin, any ideas with this?

ERROR 2014-08-02 08:19:42,265 twisted Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/__init__.py", line 294, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/__init__.py", line 206, in call_with_lock
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/async.py", line 145, in call_within_transaction
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/rpc/regionservice.py", line 364, in update
    self._do_collect(cursor)
  File "/usr/lib/python2.7/dist-packages/maasserver/rpc/regionservice.py", line 477, in _do_collect
    cursor.execute(self._collect_statement)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python2.7/dist-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 51, in execute
    return self.cursor.execute(sql)
django.db.utils.OperationalError: deadlock detected
DETAIL: Process 1934 waits for ShareLock on transaction 3135; blocked by process 1935.
Process 1935 waits for ShareLock on transaction 3134; blocked by process 1934.
HINT: See server log for query details.

Changed in maas:
status: Confirmed → Triaged
importance: Undecided → Critical
Gavin Panella (allenap) wrote :

I think that's related to bug 1351482, i.e. fallout from switching to SERIALIZABLE isolation.
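
For context, transactions under SERIALIZABLE isolation in PostgreSQL can be aborted by the server, and a common general pattern is to retry the whole transaction. The sketch below illustrates that pattern only; it is not the change made in MAAS. `TransientDatabaseError` is a stand-in defined here for the example (real code would catch the driver's or Django's exception, such as the `OperationalError` in the traceback above), and `retry_transaction` is a hypothetical helper.

```python
import time


class TransientDatabaseError(Exception):
    """Stand-in for a deadlock or serialization failure in this sketch."""


def retry_transaction(func, attempts=3, delay=0.0,
                      retry_on=(TransientDatabaseError,)):
    """Call func() and retry it when a transient database error is raised.

    func must run the whole transaction from the start, because the
    database has already rolled back the failed attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Linear back-off so competing transactions do not collide
            # again immediately.
            time.sleep(delay * attempt)
```

The retried callable has to be safe to run more than once, which is why the sketch takes the whole transaction as a single function.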

tz (fdat) wrote :

ii maas 1.5+bzr2252-0ubuntu1
ii maas-cli 1.5+bzr2252-0ubuntu1
ii maas-cluster-controller 1.5+bzr2252-0ubuntu1
ii maas-common 1.5+bzr2252-0ubuntu1
ii maas-dhcp 1.5+bzr2252-0ubuntu1
ii maas-dns 1.5+bzr2252-0ubuntu1
ii maas-region-controller 1.5+bzr2252-0ubuntu1
ii maas-region-controller-min 1.5+bzr2252-0ubuntu1
ii python-django-maas 1.5+bzr2252-0ubuntu1
ii python-maas-client 1.5+bzr2252-0ubuntu1
ii python-maas-provisioningserver 1.5+bzr2252-0ubuntu1
--

sudo service maas-pserv status:
maas-pserv start/running, process 26810

--

pserv.log:
2014-08-12 17:03:51+0000 [-] Starting factory <HTTPClientFactory: http://localhost/MAAS/rpc/>
2014-08-12 17:03:51+0000 [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://localhost/MAAS/rpc/>
2014-08-12 17:05:16+0000 [-] Starting factory <HTTPClientFactory: http://localhost/MAAS/rpc/>
2014-08-12 17:05:16+0000 [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://localhost/MAAS/rpc/>

maas.log:
ERROR 2014-08-12 16:58:35,430 maasserver Unable to get RPC connection for cluster 'master'
ERROR 2014-08-12 16:58:35,434 maasserver Unable to get RPC connection for cluster 'master'

Julian Edwards (julian-edwards) wrote :

Michael, can you try with the release candidate build and see if you can reproduce?

Changed in maas:
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
mahmoh (mahmoh) wrote :

@mreed: could you please confirm this is no longer the case? I've seen this recently in the stable branch release as of two weeks ago. Thank you.

Changed in maas:
status: Expired → Incomplete
tags: added: hs-arm64
rikoder (erick-openchill) wrote :

Just had this bug on a fully upgraded 14.04 (amd64).
I confirm restarting maas-pserv fixed the problem for me.

Ivan Zoratti (izoratti) wrote :

 I am having this same problem at the moment, with MAAS 1.7 on 14.04.1 LTS.

I fixed it by adding the hostname (MAAS) in the /etc/hosts file.
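
To check whether a host name is already listed before editing the file, something like the following sketch could be used. It only parses the text of a hosts file; the helper names and the sample entries in the usage note are hypothetical, and the thread does not establish why the missing entry caused the RPC error.

```python
def hosts_file_names(text):
    """Return every host name and alias listed in hosts-file text."""
    names = set()
    for raw in text.splitlines():
        # Drop comments, then skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the address; the rest are names and aliases.
        names.update(name.lower() for name in fields[1:])
    return names


def hosts_file_has_name(text, hostname):
    """True when hostname appears as a name or alias (case-insensitive)."""
    return hostname.lower() in hosts_file_names(text)
```

Usage, with a made-up address and host name: `hosts_file_has_name("192.0.2.10 maas-server", "maas-server")` is true, and the real file can be checked by passing `open("/etc/hosts").read()` as the text.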

Ortiz (araujo-ortiz) on 2015-01-11
Changed in maas:
status: Incomplete → Fix Released