Being unable to find usable space when osd-journal / bluestore-db / bluestore-wal is set should fail and enter a blocked state
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ceph OSD Charm | Triaged | Wishlist | Unassigned | |
Bug Description
Currently, if you set osd-journal, bluestore-db or bluestore-wal to an incorrect value such as "nvme0n1" instead of "/dev/nvme0n1", the charm silently ignores the value and initializes the OSD without a journal/db/wal device.
This has resulted in multiple production deployments not using their NVMe devices. The problem was not noticed until months into the deployment, when cluster load became high enough for performance problems to appear. Erasing and re-initializing all of the OSDs to correct this is an intensive and currently manual process.
In this example the cause is an incorrectly set value (forgetting the /dev/ prefix). There are other situations where the same thing could happen, such as the device being valid but the wrong device, or the device having no room left for further partitions (perhaps during OSD expansion).
I believe the current logic is designed to let you specify multiple potential devices that may not exist on all nodes, so that the charm can pick whichever one is valid on a given node, or create partitions across multiple devices as space runs out.
I propose that if such a config option is set and no usable device is found during device initialization, the charm should not initialize that OSD device and should enter a blocked state instead.
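A minimal sketch of the proposed check, to make the intended behaviour concrete. This is not the charm's actual code: the names `OPTIONS`, `is_block_device` and `check_aux_devices` are hypothetical, and it assumes each option holds a space-separated list of device paths. It only covers the "no such device" case; detecting a device with no room left for further partitions would need an additional check.

```python
import os
import stat

# Config options that point at auxiliary (journal/db/wal) devices.
OPTIONS = ("osd-journal", "bluestore-db", "bluestore-wal")


def is_block_device(path):
    """Return True if path is an absolute path to an existing block device."""
    if not os.path.isabs(path):
        # Catches values such as "nvme0n1" that lack the /dev/ prefix.
        return False
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False


def check_aux_devices(config, is_usable=is_block_device):
    """Return a (status, message) pair for the given charm config.

    An unset option is fine. A set option must name at least one usable
    device on this node; otherwise the result is "blocked" (hypothetical
    status strings, assumed here for illustration).
    """
    problems = []
    for option in OPTIONS:
        value = (config.get(option) or "").strip()
        if not value:
            continue  # option not set, nothing to validate
        candidates = value.split()
        if not any(is_usable(dev) for dev in candidates):
            problems.append(
                "{}: no usable device among {}".format(option, candidates)
            )
    if problems:
        return "blocked", "; ".join(problems)
    return "active", ""
```

With this shape, a list of candidate devices where only some exist on a node would still pass, which preserves the multi-device behaviour described above. Only the case where an option is set and nothing matches would block OSD initialization.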
Changed in charm-ceph-osd: | |
status: | New → Confirmed |
tags: | added: sts |
Changed in charm-ceph-osd: | |
status: | Confirmed → New |
Changed in charm-ceph-osd: | |
status: | New → Triaged |
importance: | Undecided → Wishlist |