MAAS boots inaccessible nodes when there's no SSH key registered for a given user

Bug #986185 reported by Diogo Matsubara
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: High
Assigned to: Jeroen T. Vermeulen
Milestone: (none listed)

Bug Description

Consider the following situation:
1. User installs the MAAS server from scratch
2. User doesn't add an SSH key to their account
3. User enlists and commissions a new node
4. MAAS boots that node and sets it to Ready

After step 4, the node will talk to the MAAS server to get an SSH key. Since there's no SSH key for the user, the node will be inaccessible over SSH. This is confusing because the node is in the Ready state but no one can use it.

Tags: api ui appserver

Diogo Matsubara (matsubara) wrote :

I fixed the docs in https://wiki.ubuntu.com/ServerTeam/MAAS/Juju to instruct the user to add an SSH key, which makes this bug less of a problem.

Diogo Matsubara (matsubara) wrote :

Setting to high since we have a workaround that mitigates this problem

Changed in maas:
importance: Critical → High
Francis J. Lacoste (flacoste) wrote :

That's not true for Juju access. Juju takes care of making sure the SSH keys of the user running juju are available on the nodes. That only affects Nodes started using 'Start node' (which isn't working anyway).

description: updated
Julian Edwards (julian-edwards) wrote :

Should we prevent use of "start node" where the user has no ssh keys, or just warn? I'm not sure of the implications of outright prevention.

tags: added: api appserver ui
C de-Avillez (hggdh2) wrote :

Actually, not only when you press 'start node'. You can also just reboot the target machine, and it will go thru the install process -- and end up unusable.

Additionally, I did not know we intended MAAS to be usable only with Juju.

Jeroen T. Vermeulen (jtv) wrote :

A small correction. As far as I'm aware, the node does not retrieve an ssh key for the user from the MAAS server after step 4. A Ready node is not “ready for use”; it's “ready for allocation” to a user, who will get SSH access once they acquire the node. In the development branch, clicking “Start node” on the node's page will allocate it to you.

So I think it's that point of acquisition where we should check for SSH keys. Especially given that the problem may happen to a different user from the admin who commissioned the node.

Jeroen T. Vermeulen (jtv) wrote :

Discussed a solution for the remaining problem with Raphaël: generalize node actions so that they support conditional disabling like we do with the Delete Node button. Make the Delete Node button an action. Likewise, conditionally disable the Start Node button if you don't have an SSH key set up. In that situation, the Start Node button will be disabled and a tooltip will tell you why.
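A minimal sketch of that idea, not the actual MAAS code. All names here (`User`, `Node`, `NodeAction`, `inhibit`, `button_state`) are hypothetical, and the condition used for Delete Node (disabled while the node has an owner) is an assumption made only for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    name: str
    ssh_keys: List[str] = field(default_factory=list)


@dataclass
class Node:
    hostname: str
    status: str = "Ready"
    owner: Optional[User] = None


class NodeAction:
    """One button on the node page (hypothetical base class)."""

    display = "Action"

    def __init__(self, node: Node, user: User):
        self.node = node
        self.user = user

    def inhibit(self) -> Optional[str]:
        """Return the reason this action is disabled, or None if allowed."""
        return None

    def execute(self) -> None:
        raise NotImplementedError


class StartNode(NodeAction):
    display = "Start node"

    def inhibit(self) -> Optional[str]:
        # The check happens at the point of acquisition, for the acquiring user.
        if not self.user.ssh_keys:
            return "You have no SSH key registered, so you could not log in."
        return None

    def execute(self) -> None:
        reason = self.inhibit()
        if reason is not None:
            raise PermissionError(reason)
        # "Start node" means acquire-and-start: allocate the node to the user.
        self.node.owner = self.user
        self.node.status = "Allocated"


class DeleteNode(NodeAction):
    display = "Delete node"

    def inhibit(self) -> Optional[str]:
        # Assumed condition, for illustration only.
        if self.node.owner is not None:
            return "This node is allocated to a user."
        return None

    def execute(self) -> None:
        reason = self.inhibit()
        if reason is not None:
            raise PermissionError(reason)
        self.node.status = "Deleted"


def button_state(action: NodeAction) -> dict:
    """What the UI needs to render the button: label, enabled flag, tooltip."""
    reason = action.inhibit()
    return {
        "label": action.display,
        "enabled": reason is None,
        "tooltip": reason or "",
    }
```

With this shape, the page would call `button_state()` on each action to decide whether to grey out the button and what tooltip to show, and `execute()` repeats the same check so that a request bypassing the UI is refused as well.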

Changed in maas:
assignee: nobody → Jeroen T. Vermeulen (jtv)
status: Triaged → In Progress
Julian Edwards (julian-edwards) wrote :

I've seen some mention of this affecting commissioning - I don't see how that can be a problem as we don't expect people to SSH to a machine when it's commissioning. Was this mentioned in error?

Jeroen T. Vermeulen (jtv) wrote :

Bug 987961 describes essentially the same problem, but in terms of an admin adding (and thus commissioning) a node while not having any SSH keys set up. The bug we're discussing now describes it in terms of keys being installed after commissioning. I marked the two as duplicates.

As you say, why should they expect to be able to SSH into the node? The fact that commissioning leaves a node in a state where that is possible just adds to the confusion and encourages dangerous practice. Once Ready, the node could be allocated and wiped while the admin was logged in or, God forbid, even running services on it.

I think the confusion over the Start Node button also contributed. We now know that in MAAS terms, it was expected to acquire-and-start, not just start.

That combination of SSH access and Ready state may have caused a mistaken impression that a Ready node, after starting, was “ready to use” rather than “available for allocation.” At least, that is the confusion I was concerned about when we discussed that terminology.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released