Ceph OSD Charm

Comment 5 for bug 1698154

Felipe Reyes (freyes) wrote on 2018-05-29: Re: [Bug 1698154] Re: race condition causes reformat of osd - ceph-osd processes and charm config-changed hook upon boot

On Tue, May 29, 2018 at 05:57:56AM -0000, Trent Lloyd wrote:
> The currently committed scheme to keep track of which osd-devices were
> already formatted is likely not a sufficient safe guard and needs
> further improvement. Reasons are
>
> (1) Device letters change (i.e. when adding a new disk, but it comes up
> earlier in the device list order). This is very common on production
> hardware, not a side case. Devices are also not guaranteed to come up in
> the same order. It would be easy for an existing OSD to get pushed off
> into a device name that is not listed or previously initialised and
> accidentally get re-initialised.

If a user is concerned about this, they should use /dev/disk/{by-path,by-id} paths in the osd-devices option. I don't believe we have any restriction on the usage of those paths; we just check for the '/dev/' prefix to decide whether the argument is a directory or a disk device.
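A rough sketch of that prefix check, for illustration only. This is not the charm's actual code: the function name, the assumption that osd-devices is a whitespace-separated string, and the example device paths are all hypothetical.

```python
def classify_osd_devices(osd_devices):
    """Split an osd-devices value into (devices, directories).

    Assumption: the option is a whitespace-separated string, and anything
    starting with '/dev/' is treated as a disk device while everything
    else is treated as a directory.
    """
    devices, directories = [], []
    for entry in osd_devices.split():
        if entry.startswith('/dev/'):
            devices.append(entry)
        else:
            directories.append(entry)
    return devices, directories


# Hypothetical values: a stable by-id path, a kernel name, and a directory.
devices, directories = classify_osd_devices(
    '/dev/disk/by-id/wwn-0x5000c500a1b2c3d4 /dev/sdb /srv/osd')
```

Under a check like this, a /dev/disk/by-id or /dev/disk/by-path entry passes exactly like /dev/sdb does, which is why the stable paths should be usable as-is.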

>
> (2) New OSDs are added to the host but manually with ceph commands (in a
> way this is similar to the above). This has happened on production
> deployments when the add-disk action command was broken.

We could add an action to mark a device as osdized, allowing an operator to merge the charm's view with what the system is actually running.
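A minimal sketch of what such an action might do, assuming the charm keeps a plain list of already-osdized device paths. The function name and both arguments are hypothetical; no such action exists in the charm as far as this comment is concerned.

```python
def reconcile_osdized(osdized, running):
    """Merge the charm's recorded devices with those the system reports.

    osdized: device paths the charm has recorded as already initialised.
    running: device paths an operator reports as backing live OSDs.
    Returns (merged, newly_marked), both sorted for stable output.
    """
    newly_marked = sorted(set(running) - set(osdized))
    merged = sorted(set(osdized) | set(running))
    return merged, newly_marked


# Hypothetical example: /dev/sdc was added by hand with ceph commands.
merged, newly_marked = reconcile_osdized(['/dev/sdb'], ['/dev/sdb', '/dev/sdc'])
```

Running it twice with the same input marks nothing new the second time, so the operation is safe to repeat.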

> However even if you use the add-disk action, the device may not be
> listed in the osd-devices config option if the action was taken
> before the charm stored the list of previously initialised osd-devices
> (the action uses osdize(), so on new deployments it would populate it -
> but old deployments won't).

As I mentioned in comment #3, the upgrade-charm hook needs to populate the osd-devices list.
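Roughly, the upgrade step could look like the sketch below. It assumes the osd-devices config value is a whitespace-separated string and that the recorded list is a plain list of paths; the function name is made up for illustration and is not the charm's API.

```python
def populate_osdized_on_upgrade(osd_devices_config, osdized):
    """Seed the recorded list from the osd-devices config on upgrade.

    Entries under /dev/ that are not yet recorded are appended in config
    order; directory entries and already-recorded devices are skipped.
    """
    merged = list(osdized)
    for entry in osd_devices_config.split():
        if entry.startswith('/dev/') and entry not in merged:
            merged.append(entry)
    return merged


# Hypothetical old deployment with nothing recorded yet.
seeded = populate_osdized_on_upgrade('/dev/sdb /dev/sdc /srv/osd', [])
```

This only covers devices that are named in the config option.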

> Even if we added code to populate this value on existing clusters from
> the osd-devices config option, OSDs that are in use but not in the
> config option won't be added to the previously initialised list and
> thus are still vulnerable to being reformatted.

osd-reformat is not an option anyone should have set in this case. In conjunction with the action I suggested above, a user could get to a state where the charm is in sync with the running system.

>
> (3) This is slightly out of use case for the charms, but something I can
> still see people doing and it would be ideal to avoid data loss from is
> also moving an existing OSD from one machine to another to recover it,
> which is something explicitly supported and generally recommended as a
> perfectly reasonable action within the Ceph community.

In this case the user should have osd-reformat set to false (or "").

[...]
> Having said all of that, I think that this option should be removed and instead replaced with a juju action to one-shot reformat OSDs.

This has been discussed, but the lack of support in bundles for running actions at the end of a deployment makes it not an option.
