Activity log for bug #1417874

Date Who What changed Old value New value Message
2015-02-04 05:10:20 Stuart Bishop bug added bug
2015-02-05 14:10:42 Curtis Hovey tags charms feature hooks
2015-02-05 14:11:03 Curtis Hovey juju-core: status New Triaged
2015-02-05 14:11:05 Curtis Hovey juju-core: importance Undecided Medium
2015-02-05 14:22:11 Curtis Hovey tags charms feature hooks canonical-is charms feature hooks
2015-02-12 17:33:09 Jorge Niedbalski tags canonical-is charms feature hooks canonical-is charms cts feature hooks
2015-03-03 22:53:28 Curtis Hovey juju-core: milestone 1.24-alpha1
2015-04-21 13:54:13 Curtis Hovey juju-core: milestone 1.24-alpha1
2015-05-25 11:18:27 Stuart Bishop summary Impossible to cleanly remove a node from a cluster Impossible to cleanly remove a unit from a relation
2015-05-25 11:47:48 Stuart Bishop description
Old value:
For charms needing to manage a clustered service (such as MongoDB, Cassandra, Redis, Swift) it is impossible to safely destroy a unit. The departing node on the doomed unit must be decommissioned to avoid potential data loss and manual repair of the cluster. Unfortunately, juju provides no suitable hooks to do this. The peer relation-departed and relation-broken hooks are supposed to support this use case, but do not.
The relation-departed hook cannot be used to decommission the departing node, because it is impossible to tell whether the unit running the hook contains the doomed node. For example, in a 3 unit service (cassandra/0, cassandra/1, cassandra/2), if we drop unit 1 the following hooks are fired:
- cassandra/0's peer relation-departed hook with $REMOTE_UNIT==cassandra/1
- cassandra/1's peer relation-departed hook with $REMOTE_UNIT==cassandra/0
- cassandra/1's peer relation-departed hook with $REMOTE_UNIT==cassandra/2
- cassandra/2's peer relation-departed hook with $REMOTE_UNIT==cassandra/1
When any of these hooks are run, there is not enough context to tell whether it is the local unit or $REMOTE_UNIT that is being destroyed. The hooks cannot tell which node needs to be decommissioned and safely removed from the cluster.
The relation-broken hook cannot be used to decommission the departing node either, as it is run after the relation-departed hooks. The relation-departed hooks are responsible for revoking access from departing units, so the relation-broken hook cannot safely remove its node from the cluster: by this point the rest of the cluster is refusing to talk to it.
Without new features, I think charm authors are forced to use one of the following workarounds:
- Require the operator to manually decommission nodes before dropping a unit
- Require the operator to manually repair the cluster after dropping a unit
- Keep access open to departing units indefinitely and decommission the node in relation-broken, rather than have relation-departed keep the cluster secure.
For a fix, I think we require a new hook that is run on the departing unit before the relation-departed hooks are fired. Because relation-departed is the only point at which units can revoke access from the doomed unit, decommissioning must happen before then. If decommissioning is attempted in relation-departed or relation-broken, access rights will likely have already been removed by some or all of the remaining units.
New value:
A relation-departed hook cannot be used by a charm to perform cleanup, as the remote service may have already run its relation-departed hook and revoked access. From the documentation, "this should be used to remove all references to the remote unit, because there's no guarantee that it's still part of the system".
The situation is worse for a peer relation. In addition to the above catch-22, the unit running the relation-departed hook has no idea whether it is the unit leaving the service or whether the remote unit is.
So as a concrete example, it is impossible for the Cassandra charm to automatically decommission a node before it is removed. The peer relation-departed hook cannot decommission the node because the charm has no idea which unit is actually being dropped. And even if it did, the decommissioning process would fail, as it takes time and the other units in the cluster will have revoked its access before it completed. Instead, the operator is required to manually decommission nodes before dropping the unit. Failing to do this requires lengthy cleanup operations, and data stored at replication factor 1 will be lost.
Before the relation-departed hooks are run, another hook needs to be run on the departing unit to provide it with the opportunity it needs. relation-departing seems the obvious choice.
2015-08-06 13:26:50 Edward Hope-Morley tags canonical-is charms cts feature hooks canonical-is charms feature hooks sts
2015-11-09 14:11:10 Mario Splivalo bug added subscriber Mario Splivalo
2016-02-08 19:43:08 Jorge Niedbalski tags canonical-is charms feature hooks sts canonical-is charms feature hooks sts sts-needs-review
2016-07-12 11:27:38 Stuart Bishop bug added subscriber The Canonical Sysadmins
2016-08-11 14:32:02 Jorge Niedbalski tags canonical-is charms feature hooks sts sts-needs-review canonical-is charms feature hooks sts-rfe
2016-08-11 15:22:28 Jorge Niedbalski tags canonical-is charms feature hooks sts-rfe canonical-is charms feature hooks sts sts-rfe
2016-08-11 15:32:19 Jorge Niedbalski summary Impossible to cleanly remove a unit from a relation [RFE] Impossible to cleanly remove a unit from a relation
2016-10-17 13:17:37 Anastasia juju-core: status Triaged Won't Fix
2016-10-20 12:36:51 Anastasia bug task added juju
2016-10-20 12:37:00 Anastasia juju: status New Triaged
2016-10-20 12:37:05 Anastasia juju: importance Undecided Wishlist
2017-03-27 06:07:57 Ian Booth juju: milestone 2.2-beta3
2017-03-27 06:08:04 Ian Booth juju: importance Wishlist High
2017-04-28 15:27:35 Canonical Juju QA Bot juju: milestone 2.2-beta3 2.2-beta4
2017-05-11 18:22:59 Canonical Juju QA Bot juju: milestone 2.2-beta4 2.2-rc1
2017-05-31 02:00:58 Tim Penhey juju: milestone 2.2-rc1
2018-03-27 21:00:06 Dmitrii Shcherbakov bug added subscriber Dmitrii Shcherbakov
2018-03-27 21:05:11 Dmitrii Shcherbakov bug watch added https://github.com/juju/docs/issues/2357
2019-11-27 08:46:06 Sandor Zeestraten bug added subscriber Sandor Zeestraten
2020-03-24 15:22:11 Achilleas Anagnostopoulos juju: assignee Achilleas Anagnostopoulos (achilleasa)
2020-03-24 15:22:18 Achilleas Anagnostopoulos juju: milestone 2.8-beta1
2020-03-24 15:22:26 Achilleas Anagnostopoulos juju: status Triaged In Progress
2020-03-30 15:24:14 Achilleas Anagnostopoulos juju: status In Progress Fix Committed
2020-03-31 11:22:57 Dominique Poulain bug added subscriber Dominique Poulain
2020-04-28 12:15:51 Vladimir Grevtsev bug added subscriber Vladimir Grevtsev
2020-06-04 00:41:06 Harry Pidcock juju: status Fix Committed Fix Released
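
The ambiguity reported in the 2015-05-25 description entry can be illustrated with a small simulation. This is only a sketch of the hook sequence as the description lists it, not Juju code; the function name peer_departed_hooks is hypothetical and exists purely for illustration.

```python
def peer_departed_hooks(units, departing):
    """Return (local_unit, remote_unit) pairs for the peer relation-departed
    hooks that fire when `departing` is dropped, following the sequence
    given in the bug description. Illustrative model only, not Juju's API."""
    remaining = [u for u in units if u != departing]
    # Each remaining unit sees the departing unit as $REMOTE_UNIT.
    seen_by_remaining = [(u, departing) for u in remaining]
    # The departing unit sees every remaining unit as $REMOTE_UNIT.
    seen_by_departing = [(departing, u) for u in remaining]
    return sorted(seen_by_remaining + seen_by_departing)


units = ["cassandra/0", "cassandra/1", "cassandra/2"]
drop_1 = peer_departed_hooks(units, "cassandra/1")
drop_0 = peer_departed_hooks(units, "cassandra/0")

for local, remote in drop_1:
    print(f"{local}: relation-departed with REMOTE_UNIT={remote}")

# Hook contexts (local unit, $REMOTE_UNIT) that occur in both scenarios.
# A hook that only knows this pair cannot tell which unit is leaving.
ambiguous = sorted(set(drop_1) & set(drop_0))
print(ambiguous)
```

In this model the pair (cassandra/0, cassandra/1) appears both when cassandra/1 is dropped and when cassandra/0 is dropped, which is why the description argues that the hook lacks the context to decide which node to decommission.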