[wishlist] Action to 'purge-osd' and 'set-osd-out' needed for fully-charmed disk lifecycle management
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph Monitor Charm | Triaged | Wishlist | Unassigned |
Bug Description
When replacing an OSD, it is best practice to purge the OSD (Luminous and later) or remove it from the crush, osd, and auth maps (pre-Luminous) after setting it out.
First, we need the ability to set a single OSD out/down. The ceph-osd charm's osd-out action can set all of a node's OSDs out, but either ceph-mon or ceph-osd needs the ability to take only a single failing disk out of the cluster, ahead of a planned replacement or in response to a failure.
Secondly, the OSD will need to be purged/removed from the maps so that the ceph-osd charm's add-disk action can be used once the disk has been replaced.
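If both actions existed, the per-OSD flow might look like the sketch below. Neither set-osd-out nor purge-osd is implemented; the action names and the osd= parameter are hypothetical, taken only from this wishlist. DRY_RUN=1 prints the juju commands instead of executing them, so the flow can be reviewed on a machine without juju:

```shell
# Hypothetical sketch: 'set-osd-out' and 'purge-osd' do not exist yet;
# names and the 'osd=' parameter follow this wishlist, not a real API.
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

charm_osd_out_and_purge() {
    osd_id="$1"; mon_unit="${2:-ceph-mon/0}"
    # Take just this one OSD out of the cluster...
    run juju run-action --wait "$mon_unit" set-osd-out osd="$osd_id"
    # ...then (after rebalancing completes) purge it from the maps.
    run juju run-action --wait "$mon_unit" purge-osd osd="$osd_id"
}

# Review the commands before running them for real:
DRY_RUN=1 charm_osd_out_and_purge 26
```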
Here's a typical process today:
1. juju ssh ceph-mon/0 sudo ceph osd out $OSD_NAME (e.g. osd.26)
2. juju ssh -t ceph-mon/0 sudo watch ceph status
* Wait for this to show HEALTH_OK and no "recovery/backfill" lines
3. juju ssh ceph-mon/0 sudo ceph osd purge $OSD_ID --yes-i-
* Note: ceph osd purge is a Luminous-only command. The pre-Luminous fallback is the three-step removal (ceph osd crush remove; ceph auth del; ceph osd rm), as noted in [1].
4. juju run-action --wait $OSD_UNIT zap-disk devices=
* Be VERY SURE before running this, as it completely destroys the data on the drive. If the disk has already been added back into the cluster under the same ID as before (i.e. osd.26 both before and after), do not run this command; instead, use LVM commands to remove the VG and PV directly.
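The four manual steps above can be collected into one sketch. This assumes a Luminous or later cluster (so ceph osd purge is available); the OSD id, device path, and unit names in the usage line are illustrative, not values from this report. DRY_RUN=1 prints the juju commands instead of executing them:

```shell
# Sketch of today's manual replacement flow; illustrative values only.
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

wait_health_ok() {
    # Poll cluster health until HEALTH_OK; "$1" can override the status
    # command so the loop is exercisable without a live cluster.
    check="${1:-juju ssh ceph-mon/0 sudo ceph health}"
    until $check 2>/dev/null | grep -q HEALTH_OK; do
        sleep 30
    done
}

replace_osd() {
    osd_id="$1"; device="$2"; osd_unit="$3"
    # 1. Take the single OSD out so data rebalances off it.
    run juju ssh ceph-mon/0 sudo ceph osd out "osd.$osd_id"
    # 2. Wait for recovery/backfill to finish (skipped in dry-run).
    [ "${DRY_RUN:-0}" = "1" ] || wait_health_ok
    # 3. Purge the OSD from the crush, osd, and auth maps (Luminous+).
    run juju ssh ceph-mon/0 sudo ceph osd purge "$osd_id" --yes-i-really-mean-it
    # 4. Wipe the drive so the ceph-osd add-disk action can reuse it.
    run juju run-action --wait "$osd_unit" zap-disk devices="$device"
}

# Review the commands before running them for real:
DRY_RUN=1 replace_osd 26 /dev/sdq ceph-osd/3
```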
[1] http://
Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → Wishlist