Removing a lost unit (due to a redeployed server that became a new machine) releases the server in MAAS, bringing down the new machine

Bug #1948427 reported by Jose Guedez
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Juju: 2.8.7
MAAS: 2.9.0~rc3

It looks like juju maps two different machines to the same physical HW in MAAS, so actions on one machine (or on units in that machine) affect the other, with major impact.

Steps to reproduce:

1-Deploy a juju unit U/1 to a new server in MAAS (this deploys the server and creates juju machine N1).
2-The server HW fails; all juju agents in N1 show as lost in juju status.
3-The server HW is repaired (with a new motherboard, for example) and re-commissioned in MAAS without issues.
4-Another unit U/2 is deployed in juju; juju deploys the original server in MAAS again, creating a new juju machine N2.
5-Remove the stale unit U/1. This needs --force, because otherwise nothing happens:
 juju remove-unit --force U/1

Outcome:
* U/1 disappears from juju status, and the server is released in MAAS
* The server apparently kept the same ID internally, so two juju machines referred to the same ID (N1 in lost state, N2 working) => MAAS shuts down the hardware (potentially wiping the storage in it)
* All units in juju machine N2 become lost as well. This can have a major impact on services running there, is hard to recover from, and can potentially cause data loss depending on the MAAS configuration.
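
The steps and outcome above can be modelled with a toy sketch. This is only an illustration of the reported behaviour under the assumption stated in the report (both juju machines end up pointing at the same MAAS ID); `FakeMAAS`, `juju_machines` and the ID `node-a` are hypothetical names and do not mirror Juju or MAAS internals.

```python
# Toy model only: hypothetical names, not Juju or MAAS code.
class FakeMAAS:
    def __init__(self):
        self.deployed = set()  # IDs of servers currently deployed

    def deploy(self, system_id):
        self.deployed.add(system_id)

    def release(self, system_id):
        # In the report, releasing shuts the hardware down
        # (and may wipe storage, depending on MAAS configuration).
        self.deployed.discard(system_id)


maas = FakeMAAS()
juju_machines = {}  # juju machine name -> MAAS ID (assumed mapping)

# Step 1: U/1 is deployed, creating N1 on the server.
maas.deploy("node-a")
juju_machines["N1"] = "node-a"

# Steps 2-4: the server fails, is repaired and re-commissioned,
# then deployed again as N2. N1 is still recorded as lost.
maas.deploy("node-a")
juju_machines["N2"] = "node-a"

# Step 5: force-removing U/1 empties N1, so its server is released.
maas.release(juju_machines.pop("N1"))

# N2 is still known to juju, but its hardware is no longer deployed.
n2_still_running = juju_machines["N2"] in maas.deployed
print(n2_still_running)  # prints False
```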

Andrea Ieri (aieri) wrote :

(thinking aloud) Once you have two machines in the same model with different ids, it's already too late. Could juju perhaps notice the attempt to redeploy an already existing machine under a new id and refuse to proceed?
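
A rough sketch of what such a refusal could look like, assuming juju can compare the provider-side ID of a newly provisioned machine against the ones it already tracks. `register_machine`, `DuplicateInstanceError` and the IDs are hypothetical and only illustrate the idea; this is not how Juju is implemented.

```python
# Hypothetical guard: refuse to record a second machine for the same server.
class DuplicateInstanceError(Exception):
    """Raised when a provider ID is already tracked under another machine id."""


def register_machine(machines, machine_id, instance_id):
    # machines maps juju machine id -> provider-side ID (assumed structure).
    for existing_id, existing_instance in machines.items():
        if existing_instance == instance_id and existing_id != machine_id:
            raise DuplicateInstanceError(
                f"{instance_id} is already tracked as machine {existing_id}"
            )
    machines[machine_id] = instance_id


machines = {}
register_machine(machines, "1", "node-a")      # original deployment
try:
    register_machine(machines, "2", "node-a")  # same server, new machine id
    refused = False
except DuplicateInstanceError:
    refused = True
```

With a guard like this, the operator would have to clean up the lost machine before the repaired server could be reused, instead of ending up with two records for one server.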

James Troup (elmo) wrote :

Subscribing field-high as this could very easily cause data loss.

Felipe Reyes (freyes) wrote :

When juju deploys a MAAS machine, it sets the "owner-data". Some of the keys stored there are juju-controller-uuid, juju-machine-id, juju-model-uuid and juju-units-deployed; this is something that could/should be checked before releasing a MAAS node.

Let's say host baremetal01 was originally deployed as machine-1 with ubuntu/0 allocated, and then baremetal01 gets redeployed as machine-2 with new-ubuntu/0. When "juju remove-unit ubuntu/0" is run and juju attempts to remove baremetal01 (since no more units are allocated to machine-1), it should check against the MAAS owner data of baremetal01 that juju-machine-id=="machine-1".
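
A minimal sketch of that check, assuming the owner data of the node is available as a plain dict. The key names come from the comment above; `safe_to_release`, the placeholder UUIDs and the exact value format ("machine-1") are assumptions for illustration, not Juju's actual code.

```python
def safe_to_release(owner_data, controller_uuid, model_uuid, machine_id):
    """Return True only if the node's owner data still names this machine."""
    expected = {
        "juju-controller-uuid": controller_uuid,
        "juju-model-uuid": model_uuid,
        "juju-machine-id": machine_id,
    }
    return all(owner_data.get(key) == value for key, value in expected.items())


# baremetal01 after being redeployed as machine-2 (placeholder UUIDs).
baremetal01_owner_data = {
    "juju-controller-uuid": "controller-uuid-placeholder",
    "juju-model-uuid": "model-uuid-placeholder",
    "juju-machine-id": "machine-2",
    "juju-units-deployed": "new-ubuntu/0",
}

# Removing ubuntu/0 empties machine-1, but the node now belongs to machine-2.
release_for_machine_1 = safe_to_release(
    baremetal01_owner_data,
    "controller-uuid-placeholder",
    "model-uuid-placeholder",
    "machine-1",
)
release_for_machine_2 = safe_to_release(
    baremetal01_owner_data,
    "controller-uuid-placeholder",
    "model-uuid-placeholder",
    "machine-2",
)
```

Here `release_for_machine_1` is False, so the release would be skipped and only the stale machine record dropped, while `release_for_machine_2` is True.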

Changed in juju:
importance: Undecided → High
status: New → Triaged
tags: added: maas-provider remove-unit