Comment 0 for bug 1417874

Stuart Bishop (stub) wrote : Impossible to cleanly remove a node from a cluster

For charms needing to manage a clustered service (such as MongoDB, Cassandra, Redis, Swift) it is impossible to safely destroy a unit. The departing node on the doomed unit must be decommissioned to avoid potential data loss and manual repair of the cluster. Unfortunately, juju provides no suitable hooks to do this. The peer relation-departed and relation-broken hooks are supposed to support this use case, but do not.

The relation-departed hook cannot be used to decommission the departing node, because it is impossible to tell whether the unit running the hook contains the doomed node or not. For example, in a 3-unit service (cassandra/0, cassandra/1, cassandra/2), if we drop unit 1, the following hooks are fired:

cassandra/0's peer relation-departed hook with $REMOTE_UNIT==cassandra/1
cassandra/1's peer relation-departed hook with $REMOTE_UNIT==cassandra/0
cassandra/1's peer relation-departed hook with $REMOTE_UNIT==cassandra/2
cassandra/2's peer relation-departed hook with $REMOTE_UNIT==cassandra/1

When any of these hooks runs, there is not enough context to tell whether it is the local unit or $REMOTE_UNIT that is being destroyed. The hooks cannot tell which node needs to be decommissioned and safely removed from the cluster.
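
The ambiguity can be illustrated with a small simulation. This is only a sketch of the hook pattern listed above, not juju code: the function name departed_hooks and the (local, remote) tuples are made up for illustration, and it assumes a hook sees nothing beyond its own unit name and $REMOTE_UNIT.

```python
def departed_hooks(units, doomed):
    """Return (local_unit, remote_unit) pairs for the peer relation-departed
    hooks fired when `doomed` is dropped, following the example above.

    Hypothetical helper for illustration; it models only what each hook
    can observe: its own unit name and $REMOTE_UNIT.
    """
    survivors = [u for u in units if u != doomed]
    # Each surviving unit sees the doomed unit depart.
    hooks = [(s, doomed) for s in survivors]
    # The doomed unit sees every surviving unit depart.
    hooks += [(doomed, s) for s in survivors]
    return hooks


units = ["cassandra/0", "cassandra/1", "cassandra/2"]

# Hook contexts observed when dropping unit 1 versus dropping unit 0.
drop_1 = set(departed_hooks(units, "cassandra/1"))
drop_0 = set(departed_hooks(units, "cassandra/0"))

# Contexts that occur in both scenarios: a hook running with one of these
# cannot tell which of the two units is the one being destroyed.
ambiguous = drop_1 & drop_0
print(sorted(ambiguous))
```

The two contexts printed, (cassandra/0, cassandra/1) and (cassandra/1, cassandra/0), occur both when unit 1 is dropped and when unit 0 is dropped, so neither hook can decide which node to decommission.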

The relation-broken hook cannot be used to decommission the departing node either, as it is run after the relation-departed hooks. The relation-departed hooks are responsible for revoking access from departing units, so the relation-broken hook cannot safely remove its node from the cluster as by this point the rest of the cluster is refusing to talk to it.

Without new features, I think charm authors are forced to use one of the following workarounds:
   - Require the operator to manually decommission nodes before dropping a unit.
   - Require the operator to manually repair the cluster after dropping a unit.
   - Keep access open to departing units indefinitely and decommission the node in relation-broken, rather than have relation-departed keep the cluster secure.

For a fix, I think we require a new hook that is run on the departing unit before the relation-departed hooks are fired. Because relation-departed is the only point at which the remaining units can revoke access from the doomed unit, decommissioning must happen before then. If decommissioning is attempted in relation-departed or relation-broken, access rights will likely have already been removed by some or all of the remaining units.
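
The ordering argument can be sketched as a toy model. Everything here is an assumption made for illustration: the step names "decommission" and "relation-departed" stand in for whatever the real hooks would do, the proposed new hook has no name yet, and the model simplifies by assuming decommissioning succeeds only while every remaining unit still grants access.

```python
def decommission_succeeds(steps, units, doomed):
    """Replay `steps` in order and report whether the doomed unit could
    decommission its node while the rest of the cluster still talks to it.

    Toy model, not juju behaviour: "relation-departed" makes every
    surviving unit revoke access; "decommission" needs access from all.
    """
    survivors = [u for u in units if u != doomed]
    # Initially every surviving unit grants access to the doomed unit.
    access = {s: True for s in survivors}
    for step in steps:
        if step == "relation-departed":
            # Surviving units revoke access from the departing unit.
            for s in survivors:
                access[s] = False
        elif step == "decommission":
            return all(access.values())
    return False  # decommission was never attempted


units = ["cassandra/0", "cassandra/1", "cassandra/2"]

# Today: decommissioning in relation-broken runs after access is revoked.
current = decommission_succeeds(
    ["relation-departed", "decommission"], units, "cassandra/1")

# Proposed: a new (as yet unnamed) hook decommissions before departure.
proposed = decommission_succeeds(
    ["decommission", "relation-departed"], units, "cassandra/1")

print(current, proposed)
```

In this model only the proposed order lets the doomed unit leave cleanly, which is the reason the new hook would have to fire before relation-departed rather than alongside or after it.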