Intermittent power command issues with ResponseNeverReceived

Bug #2060710 reported by Alan Baghumian
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
MAAS      New      Undecided    Unassigned

Bug Description

Hello MAAS Team!

We have multiple deployments of MAAS (all running the 3.3.5 snap at the moment) in different locations, and we primarily use Redfish as the power driver.

These environments span different subnets and different hardware platform types (all x86-64/amd64); however, they all show intermittent power action failures, with the following traceback visible in the rackd logs:

2024-03-29 10:15:22 provisioningserver.rpc.power: [critical] e5439e68-host: Power on failed.
 Traceback (most recent call last):
 --- <exception caught here> ---
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
     current.result = callback( # type: ignore[misc]
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/rpc/power.py", line 242, in eb_cancelled
     failure.trap(CancelledError)
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
     self.raiseException()
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
     raise self.value.with_traceback(self.tb)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/rpc/power.py", line 292, in change_power_state
     yield perform_power_driver_change(
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/drivers/power/__init__.py", line 372, in perform_power
     yield power_func(system_id, context)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/drivers/power/redfish.py", line 262, in power_on
     url, node_id, headers = yield self.process_redfish_context(context)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/drivers/power/redfish.py", line 197, in process_redfish_context
     node_id = yield self.get_node_id(url, headers)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/drivers/power/redfish.py", line 218, in get_node_id
     systems, _ = yield self.redfish_request(b"GET", uri, headers)
 twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]

The workaround is to retry the power command, which then succeeds.

This makes us believe there may be a bug causing this problem.
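Until the root cause is found, the manual retry workaround could in principle be automated on the client side. The following is a minimal synchronous sketch, not MAAS code: `retry_on` and `TransientError` are hypothetical names, and `TransientError` merely stands in for `twisted.web._newclient.ResponseNeverReceived`. The real Redfish driver is built on Twisted Deferreds, so an actual fix would need a Deferred-based equivalent; this only illustrates the retry policy (retry a bounded number of times on a specific transient exception, let everything else propagate).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for twisted.web._newclient.ResponseNeverReceived."""


def retry_on(
    func: Callable[[], T],
    retry_exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
) -> T:
    """Call func(), making at most `attempts` calls in total.

    Only the listed exception types trigger a retry; any other exception
    propagates immediately. After the final failed attempt, the last
    transient exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_exceptions:
            if attempt == attempts:
                raise
            # Back off linearly before the next attempt.
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")


# Usage: a hypothetical power-on call that fails once, then succeeds.
outcomes = iter([TransientError("unexpected eof while reading"), "powered on"])


def flaky_power_on() -> str:
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


print(retry_on(flaky_power_on, (TransientError,), attempts=3, delay=0.1))
```

Catching only the named exception keeps genuine errors (bad credentials, an unknown node ID) from being silently retried.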

Please let me know if you need additional details or logs and I will be more than happy to assist.

Best,
Alan

Eline Maaike De Weerd (emdw) wrote:

Hi Alan,

Thanks for the bug report. If you have full logs, it would be great if you could provide those, as they might be helpful. Given the SSL error, could you also share more information about the setup? Perhaps something is going awry with the certificate.
Thanks!

Alan Baghumian (alanbach) wrote:

Hi Eline,

I shared the SOS reports from some of the affected machines internally. I suspect the issue is not with the certificates, since a retry works, but I can gather more information on that aspect as well.
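One way to gather that information is to check whether TLS handshakes to a BMC fail intermittently or consistently: a certificate misconfiguration would typically fail every time, whereas an occasional "unexpected eof while reading" points more toward the BMC or the network path. Below is a hypothetical standalone sketch using only the Python standard library; it is not part of MAAS, and `tally_attempts` and `tls_handshake_probe` are illustrative names. Disabling certificate verification is an assumption made so that only transport-level failures are counted (many BMCs use self-signed certificates).

```python
import collections
import socket
import ssl
from typing import Callable


def tally_attempts(probe: Callable[[], None], attempts: int) -> collections.Counter:
    """Call probe() `attempts` times and count outcomes by name.

    A successful call is counted as "ok"; a failure is counted under its
    exception class name, so intermittent failures show up as a mix.
    """
    outcomes: collections.Counter = collections.Counter()
    for _ in range(attempts):
        try:
            probe()
        except OSError as exc:
            # ssl.SSLError (including SSLEOFError), timeouts and connection
            # errors are all OSError subclasses.
            outcomes[type(exc).__name__] += 1
        else:
            outcomes["ok"] += 1
    return outcomes


def tls_handshake_probe(host: str, port: int = 443, timeout: float = 10.0) -> Callable[[], None]:
    """Return a probe that performs one TLS handshake against host:port."""
    context = ssl.create_default_context()
    # Assumption: skip verification so certificate problems are not counted;
    # check_hostname must be disabled before verify_mode is set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    def probe() -> None:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake immediately by default.
            with context.wrap_socket(sock, server_hostname=host):
                pass

    return probe
```

For example, `tally_attempts(tls_handshake_probe("bmc.example.internal"), 50)` (hypothetical hostname) would report how many of 50 handshakes succeeded and which error types occurred for the rest.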

Best,
Alan
