Being unable to find usable space when osd-journal / bluestore-db / bluestore-wal is set should fail and enter a blocked state

Bug #1833029 reported by Trent Lloyd
This bug affects 1 person
Affects: Ceph OSD Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Currently, if you set osd-journal, bluestore-db or bluestore-wal to an incorrect value such as "nvme0n1" instead of "/dev/nvme0n1", the charm silently ignores it and initializes the OSD without the intended journal/db/wal device.

This has resulted in multiple production deployments not using their NVMe devices. The problem was not noticed until months into the deployment, when cluster load became high enough for performance problems to appear. Erasing and re-initializing all of the OSDs to correct this is an intensive and currently manual process.

In this example the issue is caused by an incorrectly set variable (forgetting the /dev/ prefix). However, there are other situations where this could arise, such as the device being valid but the wrong device, or the device having no room left for further partitions (perhaps during OSD expansion).

I believe the current logic is designed to let you specify multiple potential devices that may not exist on all nodes, so that the charm can find a valid device among them or create partitions across multiple devices as space runs out.

I propose that if such a config option is set and no such devices are found during device initialization, the charm should not initialize that device and should enter a blocked state instead.
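The proposed check could look roughly like the following. This is a minimal sketch, not the charm's actual code: the function names (is_block_device, usable_devices, check_aux_devices) are hypothetical, and it assumes each option holds a space-separated list of device paths. It only checks that a candidate is an absolute path to a block device; checking for free space for further partitions is left out.

```python
import os
import stat
from typing import Callable, List, Optional

# Option names taken from the bug report.
AUX_OPTIONS = ("osd-journal", "bluestore-db", "bluestore-wal")


def is_block_device(path: str) -> bool:
    """Return True if path is an absolute path to an existing block device."""
    if not os.path.isabs(path):
        return False  # e.g. "nvme0n1" instead of "/dev/nvme0n1"
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False  # path does not exist or cannot be inspected


def usable_devices(configured: str,
                   is_usable: Callable[[str], bool] = is_block_device) -> List[str]:
    """Filter a space-separated device list (an assumption) to usable entries."""
    return [dev for dev in configured.split() if is_usable(dev)]


def check_aux_devices(config: dict,
                      is_usable: Callable[[str], bool] = is_block_device) -> Optional[str]:
    """Return a blocked-state message, or None if every set option resolves.

    An unset option is fine. An option that is set but yields no usable
    device is reported instead of being silently ignored.
    """
    problems = []
    for option in AUX_OPTIONS:
        value = (config.get(option) or "").strip()
        if not value:
            continue  # option not set: nothing to validate
        if not usable_devices(value, is_usable):
            problems.append("{}={!r}".format(option, value))
    if problems:
        return "No usable device found for: " + ", ".join(problems)
    return None


if __name__ == "__main__":
    message = check_aux_devices({"bluestore-db": "nvme0n1"})
    if message is not None:
        # In a charm this is where the unit would be put into a blocked
        # state and OSD initialization skipped; here the message is printed.
        print(message)
```

Listing several candidates still works as before, since one usable device per option is enough. Only the case where an option is set and nothing matches produces a message, which the charm could then surface as its blocked status.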

Tags: sts
Trent Lloyd (lathiat)
Changed in charm-ceph-osd:
status: New → Confirmed
tags: added: sts
Trent Lloyd (lathiat)
Changed in charm-ceph-osd:
status: Confirmed → New
James Page (james-page)
Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Wishlist