Failed commissioning leaves machine completely cut off from MAAS

Bug #1730524 reported by Jeff Lane 
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Lee Trager
Milestone: none

Bug Description

By design, MAAS changes the maas user's password in the BMC to a randomly generated one every time a node is enlisted, commissioned or re-commissioned. There is a chance that a failed commissioning run can leave the node completely cut off from MAAS, where the only options to recover the node are either to set a default user and password in the BMC and modify MAAS to use that combo instead of the default maas:$RANDOMPASSWORD combo, OR to delete the node and repeat enlistment and commissioning.

What happens in this case is this:

MAAS powers the node on to commission
Commissioning ephemeral boots and begins doing its thing.
Ephemeral sets the new password for the maas user in the BMC
Ephemeral does other stuff
Something breaks
Ephemeral fails to update MAAS with new maas user password
MAAS marks the node as Failed Commissioning and shows Power Error because it no longer has the current BMC password.

At this point MAAS is no longer able to talk to the node at all. The only way to recover is, as stated above, to manually set a password in the BMC and update MAAS to match, or to delete the node and start over from scratch.
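
The sequence above can be illustrated with a small simulation. This is only a sketch of the ordering problem, not MAAS code: FakeBMC, FakeMAAS and commission() are hypothetical stand-ins invented for illustration, and the real ephemeral environment talks to the BMC and to MAAS through mechanisms not shown here.

```python
import secrets


class FakeBMC:
    """Hypothetical stand-in for a BMC holding the maas user's password."""

    def __init__(self, password):
        self.password = password


class FakeMAAS:
    """Hypothetical stand-in for the power credentials MAAS has on record."""

    def __init__(self, password):
        self.stored_password = password

    def can_control_power(self, bmc):
        # Power control only works if MAAS knows the BMC's current password.
        return self.stored_password == bmc.password


def commission(bmc, maas, fail_before_report):
    """Model the order of operations described in the bug report."""
    new_password = secrets.token_hex(8)
    bmc.password = new_password          # ephemeral sets the new BMC password
    if fail_before_report:               # "something breaks"
        return "Failed Commissioning"
    maas.stored_password = new_password  # ephemeral reports the password to MAAS
    return "Ready"


bmc = FakeBMC("initial")
maas = FakeMAAS("initial")
status = commission(bmc, maas, fail_before_report=True)
print(status, maas.can_control_power(bmc))  # Failed Commissioning False
```

Once the failure lands between the two assignments, the BMC and MAAS disagree about the password and nothing in the model can bring them back in sync without manual intervention.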

To fix this, either:

1: Change the behaviour to ONLY create passwords during enlistment and subsequently only on user demand, rather than re-create passwords every time a node is commissioned.
2: At least have the Ephemeral IMMEDIATELY update MAAS with the new maas user password BEFORE it attempts anything else.

Idea 1 fixes the problem. Idea 2 is more of a band-aid and could still leave the system in an uncontrollable state.
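
Idea 1 could be sketched roughly as follows. This is a minimal illustration under assumptions, not the fix that was actually released: next_bmc_password(), the phase names and the user_requested flag are all hypothetical and do not correspond to any MAAS API.

```python
import secrets


def next_bmc_password(current_password, phase, user_requested=False):
    """Return the password the BMC should hold after the given phase.

    Hypothetical policy: generate a new password only at enlistment or
    when the user explicitly asks for one; otherwise keep the current one.
    """
    if phase == "enlist" or user_requested:
        return secrets.token_hex(8)
    return current_password


enlisted = next_bmc_password(None, "enlist")
after_commission = next_bmc_password(enlisted, "commission")
after_request = next_bmc_password(enlisted, "commission", user_requested=True)

print(after_commission == enlisted)  # True: commissioning leaves it untouched
```

Because commissioning never touches the password under this policy, a failed commissioning run cannot leave MAAS holding a stale credential. Idea 2 would instead keep the rotation and only move the report to MAAS earlier, which still leaves a window between setting the BMC password and reporting it.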

I discovered this issue while trying to root-cause a bug with apt proxies during commissioning:
https://bugs.launchpad.net/maas/+bug/1730456

Jeff Lane  (bladernr)
description: updated
Lee Trager (ltrager)
Changed in maas:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Lee Trager (ltrager)
Changed in maas:
milestone: none → next
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
milestone: next → none