Comment 4 for bug 1698154

Trent Lloyd (lathiat) wrote :

The currently committed scheme for keeping track of which osd-devices were already formatted is likely not a sufficient safeguard and needs further improvement. The reasons are:

(1) Device letters change (e.g. when a new disk is added but comes up earlier in the device list order). This is very common on production hardware, not a corner case. Devices are also not guaranteed to come up in the same order. It would be easy for an existing OSD to get pushed onto a device name that is not listed or was not previously initialised, and accidentally get re-initialised.

(2) New OSDs are added to the host manually with ceph commands (in a way this is similar to the above). This has happened on production deployments when the add-disk action was broken. However, even if you use the add-disk action, the device may not be listed in the osd-devices config option if the action was taken before the charm stored the list of previously initialised osd-devices (the action uses osdize(), so new deployments would populate it, but old deployments won't). Even if we added code to populate this value on existing clusters from the osd-devices config option, OSDs that are in use but not in the config option won't be added to the previously initialised list and thus are still vulnerable to being reformatted.

(3) This is slightly outside the charm's use case, but it is something I can still see people doing, and it would be ideal to avoid data loss from it: moving an existing OSD from one machine to another to recover it. This is explicitly supported and generally recommended as a perfectly reasonable action within the Ceph community.

In all of the above situations, even with the improved commit, data loss could still occur. Plus a fix is still required for already deployed clusters.

An improvement on this situation would be to have the stored osd-devices list hold the OSD FSID (stored in the OSDPATH/fsid file), the device ID path (e.g. /dev/disk/by-id), or similar instead of the block device path. However, that won't help with the OSD movement scenario or the manual ceph-disk initialise scenario.
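To illustrate the device ID path idea, here is a minimal sketch (not the charm's code) of mapping a block device such as /dev/sdb to the stable names that point at it under /dev/disk/by-id. The function name stable_device_ids and its by_id_dir parameter are hypothetical, introduced only for this example; it assumes the by-id directory contains symlinks to the kernel device nodes, as udev normally provides.

```python
import os


def stable_device_ids(dev_path, by_id_dir="/dev/disk/by-id"):
    """Return the sorted by-id link names that resolve to dev_path.

    Hypothetical helper: the name and parameters are illustrative only.
    An empty list means no stable identifier was found.
    """
    target = os.path.realpath(dev_path)
    try:
        names = os.listdir(by_id_dir)
    except OSError:
        # Directory missing or unreadable: no stable name available.
        return []
    return sorted(
        name
        for name in names
        if os.path.realpath(os.path.join(by_id_dir, name)) == target
    )
```

Storing one of these names instead of /dev/sdX would survive device letter reordering, though as noted it would not cover moved or manually initialised OSDs.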

Otherwise, one way we could potentially handle this better, and which would automatically cover already deployed OSDs, is to use the Ceph cluster "fsid". Ceph stores this value, which is unique to the current Ceph cluster, on the OSD in the "OSDPATH/ceph_fsid" file. This would be a much better way to determine whether the OSD is part of the current cluster. I am unsure whether there is an easy way to read this value without first mounting the OSD, though.
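As a rough sketch of the ceph_fsid check, assuming the OSD data directory is already mounted (the open question above), something like the following could decide whether an OSD belongs to the current cluster. The function names read_ceph_fsid and osd_belongs_to_cluster are hypothetical, and how the charm obtains the current cluster fsid is left out.

```python
import os


def read_ceph_fsid(osd_path):
    """Return the cluster fsid recorded in osd_path/ceph_fsid, or None.

    Hypothetical helper; assumes osd_path is a mounted OSD data directory.
    """
    try:
        with open(os.path.join(osd_path, "ceph_fsid")) as f:
            return f.read().strip() or None
    except OSError:
        # File missing or unreadable: treat as "unknown".
        return None


def osd_belongs_to_cluster(osd_path, cluster_fsid):
    """True only if the OSD records the same cluster fsid as cluster_fsid."""
    recorded = read_ceph_fsid(osd_path)
    if recorded is None:
        return False
    return recorded.lower() == cluster_fsid.strip().lower()
```

Note that a False result here only means membership could not be confirmed. It should not by itself be taken as permission to reformat, since an unreadable file and a foreign OSD look the same to this check.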

Lastly, as a minor note, one place where storing the OSD UUID instead is likely to go wrong in future is bluestore conversions, which require reformatting the OSD and would give it a new UUID. That workflow may need to be considered. Bluestore and Luminous also format and partition the disk differently (in particular LVM may be used instead of GPT in some cases) and this may change or influence any code designed to detect this information.

Having said all of that, I think that this option should be removed and instead replaced with a juju action to one-shot reformat OSDs.

At first glance it seems like this option would be used mostly in testing, but looking at its use in actual deployments, I think it is often used because an environment may be redeployed a couple of times during pre-deployment testing. Having this option turned on long term in production is dangerous (even with the safeguards). I think a juju action, with instructions given in the status output, would replace it sufficiently and prevent the likely reality that the option is just left on in production (as we have seen in most deployments now). I don't think it's reasonable to expect users to remember to turn the option off.

Even if we kept the option for test/development scenarios, by adding a juju action to complement it, we could ensure that osd-reformat is set to False by default in real deployment scenarios. So I think this action is really needed either way.

The majority of example bundles (e.g. openstack-base) and actual production bundles I have looked at currently set this option to yes.