1.8b1 Failed deployment/release timeout
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Fix Released | Critical | Raphaël Badin | |
Bug Description
Every so often MAAS incorrectly marks a node as failing to deploy (or to release).
The node event log claims that it failed to power on (via AMT), even though the PXE curtin request was received and recorded.
Log from a failed deployment:
PXE Request - power off
Failed to power on node - Timed out Thu, 09 Apr. 2015 04:53:33
Node changed status - From 'Deploying' to 'Failed deployment' Thu, 09 Apr. 2015 04:53:33
Installation complete - Node disabled netboot Thu, 09 Apr. 2015 04:53:28
PXE Request - curtin install Thu, 09 Apr. 2015 04:52:00
PXE Request - curtin install Thu, 09 Apr. 2015 04:51:58
Powering node on Thu, 09 Apr. 2015 04:51:33
Log from a successful deployment:
Node changed status - From 'Deploying' to 'Deployed' Wed, 08 Apr. 2015 13:20:54
Installation complete - Node disabled netboot Wed, 08 Apr. 2015 13:20:19
PXE Request - curtin install Wed, 08 Apr. 2015 13:18:46
Node powered on Wed, 08 Apr. 2015 13:18:43
Powering node on Wed, 08 Apr. 2015 13:18:20
Log from a failed release:
Node powered off Thu, 09 Apr. 2015 09:21:03
Failed to power off node - Timed out Thu, 09 Apr. 2015 09:20:41
Node changed status - From 'Releasing' to 'Releasing failed' Thu, 09 Apr. 2015 09:19:40
Powering node off Thu, 09 Apr. 2015 09:18:41
Shouldn't a node that issues a PXE request, completes installation etc. be marked as powered on?
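To make the question concrete, here is a minimal sketch of the check it implies: if the event log records a PXE request or a completed installation at or after the last "Powering node on" event, the node must have powered on. This is not MAAS code. The names `parse_event`, `booted_after_power_on`, and `BOOT_EVIDENCE` are hypothetical, and the event strings are copied from the failed deployment log above.

```python
import re
from datetime import datetime

# Matches "<description> <timestamp>", e.g.
# "Powering node on Thu, 09 Apr. 2015 04:51:33".
EVENT_RE = re.compile(
    r"^(?P<desc>.+?)\s+"
    r"(?P<stamp>\w{3}, \d{2} \w{3}\. \d{4} \d{2}:\d{2}:\d{2})$"
)

# Assumption: these event prefixes can only appear if the node booted.
BOOT_EVIDENCE = ("PXE Request", "Installation complete")


def parse_event(line):
    """Return (description, datetime), or None if the line has no timestamp."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%a, %d %b. %Y %H:%M:%S")
    return match.group("desc"), stamp


def booted_after_power_on(lines):
    """True if boot evidence was logged at or after the last power-on attempt."""
    events = [e for e in map(parse_event, lines) if e is not None]
    power_on_times = [t for d, t in events if d.startswith("Powering node on")]
    if not power_on_times:
        return False
    last_power_on = max(power_on_times)
    return any(
        d.startswith(BOOT_EVIDENCE) and t >= last_power_on for d, t in events
    )


FAILED_DEPLOYMENT = [
    "PXE Request - power off",  # no timestamp in the report; skipped by the parser
    "Failed to power on node - Timed out Thu, 09 Apr. 2015 04:53:33",
    "Node changed status - From 'Deploying' to 'Failed deployment' Thu, 09 Apr. 2015 04:53:33",
    "Installation complete - Node disabled netboot Thu, 09 Apr. 2015 04:53:28",
    "PXE Request - curtin install Thu, 09 Apr. 2015 04:52:00",
    "PXE Request - curtin install Thu, 09 Apr. 2015 04:51:58",
    "Powering node on Thu, 09 Apr. 2015 04:51:33",
]

print(booted_after_power_on(FAILED_DEPLOYMENT))
```

For the failed deployment log this reports that the node did boot, which contradicts the "Failed to power on node" event recorded for the same deployment.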
Related branches
- Gavin Panella (community): Approve
Diff: 15 lines (+2/-2), 1 file modified: src/provisioningserver/rpc/power.py (+2/-2)
tags: added: landscape power
Changed in maas:
importance: Undecided → Critical
status: New → Triaged
milestone: none → 1.8.0
summary:
- 1.8b1 Failed deployment timeout powering on AMT
+ 1.8b1 Failed deployment/release timeout
description: updated
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Maybe retries are succeeding but MAAS doesn't notice? It's odd
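A rough sketch of that hypothesis, assuming the power driver is a retry loop that sends a command and then queries the state: if the last retry takes effect only after its query, the loop reports a timeout even though the node ends up in the wanted state. One extra query before giving up would catch that case. This is illustrative only and not the change made in `power.py`. `change_power_state`, `send_command`, `query_state`, and the `waits` values are all hypothetical.

```python
import time


def change_power_state(send_command, query_state, wanted,
                       waits=(0, 1, 2), sleep=time.sleep):
    """Try to reach `wanted`; query once more before reporting a timeout."""
    for wait in waits:
        sleep(wait)
        send_command()
        if query_state() == wanted:
            return True
    # Final check: a retry that succeeded late should not count as a failure.
    sleep(waits[-1] if waits else 0)
    return query_state() == wanted
```

With a slow BMC, the final query is the difference between "Failed to power off node - Timed out" followed by "Node powered off" 22 seconds later, as in the release log above, and a clean success.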