[RFE] reinstall a single subordinate unit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
A unit may for various reasons[*] end up in a corrupt state, and getting back to a healthy one is generally non-trivial. In such a situation restarting from scratch (i.e. reinstalling the unit) is the safest course of action, but if the affected unit is a subordinate there are only suboptimal options:
1. remove and re-add the principal unit
2. remove and re-add the machine
3. remove and re-add the relation to the principal unit
Choosing option 1 means (potentially) reinstalling several healthy charms, and may require some manual cleanup work afterwards.
Choosing option 2 may simply be impossible in production: if the subordinate is hosted on a bare-metal OpenStack compute node, it is not desirable to have to migrate instances and recreate Ceph OSDs for what amounts to a configuration-management issue.
Choosing option 3 has the benefit of limiting the scope of the change to a single application, but it causes a cloud-wide reinstallation of all the related subordinate units, potentially triggering dozens of removals/reinstallations.
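For reference, option 3 amounts to the following juju CLI steps (application names here are hypothetical placeholders, e.g. an "nrpe" subordinate related to a "nova-compute" principal):

```shell
# Option 3: remove and re-add the relation between the subordinate
# and its principal. NOTE: this affects EVERY related subordinate
# unit in the model, not just the corrupt one.
juju remove-relation nrpe nova-compute

# Wait until all nrpe units have been removed cloud-wide
# (watch `juju status nrpe`), then restore the relation:
juju add-relation nrpe nova-compute
```

This cloud-wide blast radius is exactly the problem the requested single-unit reinstall mechanism would avoid.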
For the reasons above, I think it would be very beneficial to have a mechanism to trigger a clean reinstallation of a single subordinate unit that does not affect any other (healthy) unit.
[*] as a concrete example: we've had a faulty bcache layer cause corruption in the state/uniter and .unit-state.db files of a subordinate installed on a compute node. Removing and re-adding the relation was the only way out, but it triggered changes on all the other compute nodes of that cloud.
Changed in juju:
status: New → Triaged
importance: Undecided → Wishlist
This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.