maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller

Bug #1681390 reported by Tytus Kurek
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

There is a MAAS (2.1.3) installation on two bare-metal nodes in HA mode (no load balancing):

- hrzagt1-f1-sc-support-01.hrzagt1.sc.in.XYZ (ctwya7): region and rack controller
- hrzagt1-f1-sc-support-02.hrzagt1.sc.in.XYZ (ssdcba): region and rack controller

Attached are configuration files from both nodes ("conf" directory).

This setup initially worked (all services showed green status in the MAAS GUI, and I was able to enlist and commission nodes), but over time something broke and nodes started to fail commissioning. The following symptoms were observed:

1) The following error messages were found in the log files of the primary region controller (ctwya7):

2017-04-10 07:27:13 maasserver.rpc.regionservice: [info] Rack controller 'ssdcba' disconnected.
2017-04-10 07:27:13 RegionServer,10,::ffff:10.24.122.3: [info] RegionServer connection lost (HOST:IPv6Address(TCP, '::ffff:10.24.122.2', 5250) PEER:IPv6Address(TCP, '::ffff:10.24.122.3', 39596))
2017-04-10 07:27:13 maasserver.models.signals.power: [critical] Failed to update power state of machine after state transition.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
    _inlineCallbacks(r, g, deferred)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1174, in _inlineCallbacks
    deferred.errback()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 434, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/maasserver/models/signals/power.py", line 52, in eb_error
    Node.DoesNotExist, UnknownPowerType, PowerProblem)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 342, in trap
    self.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 368, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2111, in confirm_power_driver_operable
    missing_packages = yield power_driver_check(client, power_type)
twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:27:13 maasserver.models.signals.power: [critical] Failed to update power state of machine after state transition.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
    _inlineCallbacks(r, g, deferred)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1174, in _inlineCallbacks
    deferred.errback()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 434, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/maasserver/models/signals/power.py", line 52, in eb_error
    Node.DoesNotExist, UnknownPowerType, PowerProblem)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 342, in trap
    self.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 368, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2111, in confirm_power_driver_operable
    missing_packages = yield power_driver_check(client, power_type)
twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:27:15 twisted.python.log: [info] ::ffff:10.24.110.127 - - [10/Apr/2017:07:27:15 +0000] "GET /MAAS/rpc/ HTTP/1.0" 200 2104 "-" "provisioningserver.rpc.clusterservice.ClusterClientService"
2017-04-10 07:27:17 twisted.python.log: [info] ::1 - - [10/Apr/2017:07:27:16 +0000] "GET /MAAS/ HTTP/1.1" 200 5301 "http://10.24.110.100/MAAS/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
2017-04-10 07:27:19 twisted.python.log: [info] ::1 - - [10/Apr/2017:07:27:18 +0000] "GET /MAAS/ HTTP/1.1" 200 5301 "http://10.24.110.100/MAAS/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
2017-04-10 07:27:23 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'hrzagt1-f2-sc-support-02' (ssdcba).
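
To get an overview of which components are failing in a log like the one above, the critical and error entries can be counted per logger. The following is a minimal sketch, assuming the "timestamp logger: [level] message" layout seen in these excerpts; the `LOG_LINE` pattern and the `summarize` helper are illustrative names, not part of MAAS.

```python
import re
from collections import Counter

# Matches entries such as:
# 2017-04-10 07:27:13 maasserver.rpc.regionservice: [info] Rack controller 'ssdcba' disconnected.
# Traceback lines do not start with a timestamp and are skipped.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logger>[^ ]+?): \[(?P<level>[a-z]+)\] ?(?P<msg>.*)$"
)


def summarize(lines, levels=("critical", "error")):
    """Count log entries per (logger, level) for the given levels."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("logger"), match.group("level"))] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "2017-04-10 07:27:13 maasserver.rpc.regionservice: [info] Rack controller 'ssdcba' disconnected.",
    "2017-04-10 07:27:13 maasserver.models.signals.power: [critical] Failed to update power state of machine after state transition.",
    "Traceback (most recent call last):",
    "2017-04-10 07:27:23 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'hrzagt1-f2-sc-support-02' (ssdcba).",
]

for (logger, level), count in sorted(summarize(sample).items()):
    print(f"{count:4d}  {level:8s}  {logger}")
```

Running the same function over the full files in /var/log/maas/ would show whether the failures cluster around the rack controller disconnects.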

2) The secondary rack controller (ssdcba) reports in the MAAS GUI that it is missing a connection to one region controller (see the "ssdcba.png" file).

3) The following error messages were found in the log files of the secondary rack controller (ssdcba):

2017-04-10 07:43:05 ClusterClient,client: [info] ClusterClient connection lost (HOST:IPv6Address(TCP, '::ffff:10.24.122.3', 40474) PEER:IPv6Address(TCP, '::ffff:10.24.122.2', 5250))
2017-04-10 07:43:05 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:05 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:05 provisioningserver.rackdservices.ntp: [critical] Failed to update NTP configuration.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 434, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
    _inlineCallbacks(r, g, deferred)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/rackdservices/ntp.py", line 71, in _getConfiguration
    GetTimeConfiguration, system_id=client.localIdent)
twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:05 provisioningserver.utils.services: [critical] Failed to update and/or record network interface configuration: Connection to the other side was lost in a non-clean fashion.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
    _inlineCallbacks(r, g, deferred)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1174, in _inlineCallbacks
    deferred.errback()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 434, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 342, in trap
    self.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 368, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 556, in _configureNetworkDiscovery
    self.getDiscoveryState)
twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 ClusterClient,client: [info] ClusterClient connection lost (HOST:IPv6Address(TCP, '::ffff:10.24.123.3', 60238) PEER:IPv6Address(TCP, '::ffff:10.24.123.2', 5251))
2017-04-10 07:43:06 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:06 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:06 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:06 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 provisioningserver.rackdservices.service_monitor_service: [critical] Failed to monitor services and update region.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 434, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1184, in gotResult
    _inlineCallbacks(r, g, deferred)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/rackdservices/service_monitor_service.py", line 94, in _updateRegion
    services = yield self._buildServices(services)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/rackdservices/service_monitor_service.py", line 106, in _buildServices
    status, status_info = yield state.getStatusInfo(service)
twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 provisioningserver.rackdservices.dhcp_probe_service: [critical] Unable to probe for DHCP servers.

(UNABLE TO OBTAIN TRACEBACK FROM EVENT)
2017-04-10 07:43:06 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:06 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:06 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
2017-04-10 07:43:06 twisted.internet.defer: [critical] Unhandled error in Deferred:
2017-04-10 07:43:06 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
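
The "connection lost" entries above name two region RPC endpoints in their PEER fields (10.24.122.2 port 5250 and 10.24.123.2 port 5251). A basic TCP reachability check from the rack controller can help rule out plain network problems. This is only a sketch: `can_connect` and `check_endpoints` are hypothetical helpers, and a successful TCP connection does not prove that the RPC layer is healthy, nor does this report establish a network cause.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to its reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


# Region RPC endpoints taken from the PEER fields in the log above.
ENDPOINTS = [("10.24.122.2", 5250), ("10.24.123.2", 5251)]
```

Calling `check_endpoints(ENDPOINTS)` on ssdcba would return a dictionary keyed by endpoint, with `True` for every endpoint that accepted a connection.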

Additional information:

1. Contents of /var/log/maas/*

Attached ("logs" directory).

2. Output from dpkg -l '*maas*'|cat

ctwya7:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=============================================
un maas <none> <none> (no description available)
ii maas-cli 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

ssdcba:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=============================================
un maas <none> <none> (no description available)
ii maas-cli 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region controller API service for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.3+bzr5573-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)
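
The two listings can be compared mechanically rather than by eye. Below is a minimal sketch, assuming the whitespace-separated `dpkg -l` row layout shown above; `installed_packages` and `compare` are illustrative helpers. Reading the full listings in this report, the only difference among installed ("ii") packages appears to be maas-region-controller, which is present on ctwya7 but not listed on ssdcba. Whether that is related to the disconnects is not established here.

```python
def installed_packages(dpkg_output):
    """Return {name: version} for rows whose status column is 'ii' (installed)."""
    packages = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            packages[fields[1]] = fields[2]
    return packages


def compare(a, b):
    """Return (only in a, only in b, same name but different version)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatched = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, mismatched


# Abbreviated rows copied from the two listings above.
ctwya7 = """\
un maas <none> <none> (no description available)
ii maas-region-api 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region Controller for MAAS
ii maas-rack-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Rack Controller for MAAS
"""
ssdcba = """\
un maas <none> <none> (no description available)
ii maas-region-api 2.1.3+bzr5573-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-rack-controller 2.1.3+bzr5573-0ubuntu1~16.04.1 all Rack Controller for MAAS
"""

only_first, only_second, mismatched = compare(
    installed_packages(ctwya7), installed_packages(ssdcba)
)
print("only on ctwya7:", only_first)
print("only on ssdcba:", only_second)
print("version mismatch:", mismatched)
```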

3. Instructions on how we can re-create your bug and what the expected result should be.

How to re-create: https://docs.ubuntu.com/maas/2.1/en/manage-ha
Expected result: secondary rack controller (ssdcba) does not get disconnected

Changed in maas:
milestone: none → 2.2.0rc2
status: New → Incomplete
Changed in maas:
importance: Undecided → Critical
Changed in maas:
milestone: 2.2.0rc2 → 2.2.0rc3
Changed in maas:
status: Incomplete → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released