MSCM power drivers throwing EOFError intermittently

Bug #1544757 reported by Sean Feole
This bug affects 1 person
Affects  Status        Importance  Assigned to    Milestone
MAAS     Fix Released  Medium      Newell Jensen
1.10     Fix Released  Medium      Newell Jensen
1.9      Fix Released  Medium      Newell Jensen

Bug Description

Problem Description:

The MAAS power drivers for the HP Moonshot can no longer query the node power status.

The MAAS logs show many errors like the ones below across all of the systems.

==> /var/log/maas/maas.log <==
Feb 11 17:22:11 maas-devel maas.power: [ERROR] ms10-39-mcdivitt: Failed to refresh power state:
Feb 11 17:22:13 maas-devel maas.power: [INFO] ms10-05-avaton: Power state has changed from error to off.
Feb 11 17:22:14 maas-devel maas.power: [INFO] ms10-18n2-slayton: Power state has changed from error to off.
Feb 11 17:23:40 maas-devel maas.power: [ERROR] Power state could not be queried:
Feb 11 17:23:40 maas-devel maas.power: [ERROR] ms10-01-avaton.1ss: Failed to refresh power state:
Feb 11 17:23:42 maas-devel maas.power: [INFO] ms10-18n3-slayton: Power state has changed from error to off.
Feb 11 17:27:25 maas-devel maas.power: [ERROR] Power state could not be queried:
Feb 11 17:27:25 maas-devel maas.power: [ERROR] ms10-39-mcdivitt: Failed to refresh power state:
Feb 11 17:27:26 maas-devel maas.power: [ERROR] Power state could not be queried:
Feb 11 17:27:26 maas-devel maas.power: [ERROR] ms10-18n1-slayton: Failed to refresh power state:

This does not appear to break the actual commissioning or deployment of hosts. It only breaks the power status lights on the web UI page, which flicker between gray, red, and green, and adds entries like the following to the event log:

Failed to query node's BMC - Power state could not be queried: Thu, 11 Feb. 2016 17:27:26
Failed to query node's BMC - Power state could not be queried: Thu, 11 Feb. 2016 16:34:28

I have tried removing the host and adding it back into MAAS, and restarting maas-clusterd. This was an upgrade from MAAS 1.8.3 to 1.9, not a fresh install of 1.9.

maas:
  Installed: 1.9.0+bzr4533-0ubuntu1~trusty1
  Candidate: 1.9.0+bzr4533-0ubuntu1~trusty1
  Version table:
 *** 1.9.0+bzr4533-0ubuntu1~trusty1 0
        500 http://ppa.launchpad.net/maas/stable/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

MAAS installed from ppa:maas/stable

ii maas 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS server all-in-one metapackage
ii maas-cli 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS command line API tool
ii maas-cluster-controller 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS server cluster controller
ii maas-common 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS server common files
ii maas-dhcp 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS DHCP server
ii maas-dns 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS DNS server
ii maas-proxy 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS Caching Proxy
ii maas-region-controller 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS server complete region controller
ii maas-region-controller-min 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS Server minimum region controller
ii python-django-maas 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS server Django web framework
ii python-maas-client 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS python API client
ii python-maas-provisioningserver 1.9.0+bzr4533-0ubuntu1~trusty1 all MAAS server provisioning libraries

Tags: hyperscale

Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Changed in maas:
milestone: none → 2.0.0
Sean Feole (sfeole) wrote :

Additional crashes from the MAAS clusterd.log:

==> /var/log/maas/maas.log <==
Feb 12 12:48:10 maas-devel maas.power: [ERROR] Power state could not be queried:
Feb 12 12:48:10 maas-devel maas.power: [ERROR] ms10-03-avaton.1ss: Failed to refresh power state:

==> /var/log/maas/clusterd.log <==
2016-02-12 12:48:10-0500 [ClusterClient,client] Failed to refresh power state.
 Traceback (most recent call last):
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 423, in errback
     self._startRunCallbacks(fail)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
     self._runCallbacks()
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1155, in gotResult
     _inlineCallbacks(r, g, deferred)
 --- <exception caught here> ---
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
     result = result.throwExceptionIntoGenerator(g)
   File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
     return g.throw(self.type, self.value, self.tb)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/power/query.py", line 126, in get_power_state
     system_id, hostname, power_type, context)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
     result = result.throwExceptionIntoGenerator(g)
   File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
     return g.throw(self.type, self.value, self.tb)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/drivers/power/__init__.py", line 246, in query
     self.power_query, system_id, context)
   File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
     result = context.call(ctx, function, *args, **kwargs)
   File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
     return func(*args,**kw)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/drivers/power/mscm.py", line 57, in power_query
     return power_state_mscm(host, username, password, node_id)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/drivers/hardware/mscm.py", line 184, in power_state_mscm
     power_state = mscm.get_node_power_state(node_id)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/drivers/hardware/mscm.py", line 143, in get_node_power_state
     power_state = self._run_cli_command("show node power %s" % node_id)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/drivers/hardware/mscm.py", line 74, in _run_cli_command
     self.host, username=self.username, password=self.password)
   File "/usr/lib/python2.7/dist-packages/paramiko/client.py", line 306, in connect
     t.start_client()
 ...

Newell Jensen (newell-jensen) wrote : Re: [Bug 1544757] Re: Maas 1.9 Power Drivers for HP Moonshot are no longer stable

This is not an additional error. It is the issue I found and mentioned in
our meeting this morning.


Newell Jensen (newell-jensen) wrote : Re: Maas 1.9 Power Drivers for HP Moonshot are no longer stable

The above stack trace is very intermittent. The error was always present, but the power templates never caught the exception being thrown from the hardware driver.
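To illustrate the kind of handling described above, here is a minimal sketch of catching an intermittent EOFError at the point where the driver runs its SSH command. It is not the fix that landed in MAAS: `PowerQueryError`, `query_with_retry`, and the `attempts` and `delay` parameters are hypothetical names chosen for this example, and `run_command` stands in for whatever callable actually talks to the chassis.

```python
import time


class PowerQueryError(Exception):
    """Hypothetical error raised when a power query cannot complete."""


def query_with_retry(run_command, attempts=3, delay=0.0):
    """Call run_command(), retrying when it raises EOFError.

    run_command is any zero-argument callable that returns the raw
    power-state output. attempts and delay are illustrative defaults,
    not values taken from MAAS.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return run_command()
        except EOFError as error:
            # Intermittent failure from the SSH layer: remember it,
            # pause briefly, and try again.
            last_error = error
            time.sleep(delay)
    # Every attempt failed, so surface one well-defined error
    # instead of letting the raw EOFError escape to the caller.
    raise PowerQueryError(
        "power query failed after %d attempts: %r" % (attempts, last_error))
```

With a wrapper like this, a caller would pass in a callable that issues the "show node power" command and would only need to handle `PowerQueryError`, rather than an unhandled exception propagating up through the Twisted callbacks.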

Changed in maas:
status: New → Confirmed
importance: Undecided → Medium
summary: - Maas 1.9 Power Drivers for HP Moonshot are no longer stable
+ MSCM power drivers throwing EOFError intermittently
Changed in maas:
status: Confirmed → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released