[add-disk action, bcache] charm tries to add bcache OSD again
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | New | Undecided | Unassigned |
Bug Description
The charm tries to re-initialize a disk that is already active under bcache when that disk was previously added with the add-disk action.
[Steps to reproduce]
1) Deploy ceph-osd charm using openstack-
2) Add a bcache disk
juju add-storage ceph-osd/0 cache-devices=
3) Add a new OSD disk
juju add-storage ceph-osd/0 osd-devices=
4) Due to Bug #1985884, the added OSD didn't use bcache, so remove and re-add it
juju run-action --wait ceph-osd/0 remove-disk osd-devices=
juju run-action ceph-osd/0 add-disk osd-devices=
5) The added OSD has its reweight set to 0 from remove-disk, so we need to fix that
juju ssh ceph-mon/0 sudo ceph osd crush reweight osd.N 1
6) Confirm Ceph is happy and everything is as expected
7) Add another disk
juju add-storage ceph-osd/0 osd-devices=
[Result]
When re-scanning the disks, the charm sees /dev/bcache0 as already processed by the unit, but not /dev/vdd, the underlying disk. It then tries, and fails, to re-add the device because it is already in use (see the sketch after the log excerpt).
2022-08-12 07:42:04 INFO unit.ceph-
2022-08-12 07:42:05 INFO unit.ceph-
2022-08-12 07:42:05 WARNING unit.ceph-
2022-08-12 07:42:05 INFO unit.ceph-
2022-08-12 07:42:05 WARNING unit.ceph-
2022-08-12 07:42:05 WARNING unit.ceph-
2022-08-12 07:42:05 WARNING unit.ceph-
The charm retries this over and over again.
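
A minimal sketch of how the duplicate check could resolve a bcache device to its underlying disk before comparing, assuming lsblk(8) is available on the unit; resolve_underlying() is a hypothetical helper, not existing charm code:

import json
import subprocess

def resolve_underlying(device):
    """Return the physical devices beneath `device` (e.g. /dev/bcache0),
    following bcache/LVM/crypt layers via the inverse lsblk tree."""
    out = subprocess.check_output(
        ['lsblk', '--json', '--inverse', '--paths', '-o', 'NAME,TYPE', device])
    tree = json.loads(out)['blockdevices']
    leaves = []

    def walk(nodes):
        for node in nodes:
            children = node.get('children')
            if children:
                walk(children)
            else:
                # Bottom of the inverse tree is a physical disk.
                leaves.append(node['name'])

    walk(tree)
    return leaves

# resolve_underlying('/dev/bcache0') would return the backing disk
# (e.g. /dev/vdd) plus the cache device, so the "already processed"
# check can compare against /dev/vdd rather than /dev/bcache0.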
[Fix]
This same kind of issue has recurred on multiple occasions. The charm really needs to grow an awareness of multi-disk interactions and, at runtime, parse and understand the full block device tree, including LVM, bcache, Vault encryption, db/wal devices, etc. (along the lines of the lsblk walk sketched above).
The charm does appear to make some attempt to track this as 'osd-aliases'; however, bcache device names are not stable and will change and reorder on boot. The charm really needs to actively associate OSDs with their underlying devices at runtime, as sketched below.
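
A minimal sketch of such a runtime association, assuming ceph-volume(8) is installed on the unit and `ceph-volume lvm list` supports JSON output; osd_device_map() and device_in_use() are hypothetical helpers, not existing charm code:

import json
import subprocess

def osd_device_map():
    """Map OSD id -> underlying devices, as reported by ceph-volume at
    runtime rather than from names recorded at deploy time."""
    out = subprocess.check_output(
        ['ceph-volume', 'lvm', 'list', '--format', 'json'])
    report = json.loads(out)
    return {
        osd_id: [dev for entry in entries for dev in entry.get('devices', [])]
        for osd_id, entries in report.items()
    }

def device_in_use(device):
    """True if `device` already backs an OSD, regardless of whichever
    bcache/LVM alias it happened to have on the boot it was created."""
    return any(device in devs for devs in osd_device_map().values())

Because the mapping is rebuilt from live state on every call, it would survive bcache device names reordering across reboots.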