Existing filestore OSDs specified as folder paths in osd-devices will be erroneously half-initialised as BlueStore if bluestore=true
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | Fix Released | Undecided | dongdong tao |
Bug Description
Similar in principle to bug #1841010.
This bug concerns Ceph deployments with bluestore set to true on the ceph-osd charm.
When osd-devices specifies a series of folder paths, which are in turn XFS-formatted bcache devices mounted via UUID and containing FileStore data, the config-changed hook will attempt to re-initialise the OSDs as BlueStore. This fails, but not before the ceph-disk utility updates the 'type' file in the OSD hierarchy to 'bluestore', preventing the OSD from starting up again.
Attempting to start the OSDs after this happens produces the error below. Manually changing the 'type' file back to 'filestore' allows the OSDs to start without apparent data corruption; however, the charm is left in a bad state, as setting bluestore=false will trigger another, potentially disruptive, config-changed hook run.
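The manual remediation described above can be sketched as a small helper (a hedged illustration only; `restore_filestore_type` is a hypothetical name, and the OSD data directory must be supplied by the operator):

```python
from pathlib import Path


def restore_filestore_type(osd_dir):
    """Revert a half-converted OSD's 'type' file back to 'filestore'.

    osd_dir is the OSD data directory, e.g. /var/lib/ceph/osd/ceph-9
    for the osd.9 seen in the logs. Returns True if the file was
    changed, False if it was already something other than 'bluestore'.
    """
    type_file = Path(osd_dir) / 'type'
    if type_file.read_text().strip() == 'bluestore':
        type_file.write_text('filestore\n')
        return True
    return False
```

After running this, the OSD can be started again with systemd; the charm state itself still needs the protections discussed next.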
The following protections are therefore required:
1) The charm should take no action if the specified directory or block device contains data of any kind. This would allow setting bluestore=false without risk of the 'type' file being updated again.
2) The Ceph ceph-disk tool should ideally be given similar protections. This will likely require filing an upstream bug to work out why the 'type' file is updated even when the directory already contains a FileStore OSD structure.
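Protection (1) could take roughly the following shape in the charm (a minimal sketch; `is_pristine_osd_dir` is a hypothetical helper name, not existing charm code):

```python
import os


def is_pristine_osd_dir(path):
    """Return True only if the path is absent or an empty directory.

    Guard for protection (1): the charm should refuse to run
    'ceph-disk prepare' against a path that already holds any data,
    such as an existing FileStore hierarchy with 'type' and 'magic'
    files, regardless of the bluestore config value.
    """
    return not os.path.exists(path) or not os.listdir(path)
```

The charm's osdize path would then bail out early, e.g. `if not is_pristine_osd_dir(path): return`, before ever invoking ceph-disk.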
--- logs follow ---
Aug 22 01:46:11 openstack systemd[1]: Starting Ceph object storage daemon osd.9...
Aug 22 01:46:11 openstack systemd[1]: Started Ceph object storage daemon osd.9.
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676121 7fab52d5de00 -1 bluestore(
failed to open /var/lib/
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676154 7fab52d5de00 -1 bluestore(
failed to open /var/lib/
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676173 7fab52d5de00 -1 bluestore(
failed to open /var/lib/
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676196 7fab52d5de00 -1 bluestore(
failed to open /var/lib/
Aug 22 01:46:11 openstack ceph-osd[1424270]: starting osd.9 at - osd_data /var/lib/
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.697743 7fab52d5de00 -1 bdev(0x560426880d80 /var/lib/
en got: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.698421 7fab52d5de00 -1 bluestore(
failed to open /var/lib/
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.698472 7fab52d5de00 -1 bdev(0x560426880fc0 /var/lib/
en got: (2) No such file or directory
Aug 22 01:46:11 openstack-4 ceph-osd[1424270]: 2019-08-22 01:46:11.698482 7fab52d5de00 -1 osd.9 0 OSD:init: unable to mount object store
Aug 22 01:46:11 openstack-4 ceph-osd[1424270]: 2019-08-22 01:46:11.698503 7fab52d5de00 -1 ** ERROR: osd init failed: (2) No such file or directory
Aug 22 01:46:11 openstack-4 systemd[1]: ceph-osd@9.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 01:46:11 openstack-4 systemd[1]: ceph-osd@9.service: Unit entered failed state.
Aug 22 01:46:11 openstack-4 systemd[1]: ceph-osd@9.service: Failed with result 'exit-code'.
tags: added: canonical-bootstack
tags: added: uosci
Changed in charm-ceph-osd:
  milestone: none → 19.10
Changed in charm-ceph-osd:
  status: Fix Committed → Fix Released
Detailed bug analysis:
The charm invoked ceph-disk as: ['sudo', '-u', 'ceph', 'ceph-disk', 'prepare', '--data-dir', '/srv/ceph/ceph1', '--bluestore'].
Following the ceph-disk flow: ceph-disk only changed the 'type' flag to 'bluestore' and then refused to continue, because it found that the path had already been initialised by an OSD; ceph-disk checks the 'magic' file to see whether a path is already initialised.
So ceph-disk applied its own protection correctly.
But the osd charm did not: it went ahead and allowed ceph-disk to initialise an existing OSD dir.
osd-261 (charm revision 261) did not record directory-based OSD paths under the 'osd-devices' key in the charm's db.
osd-291 (revision 291) has the protection code below to check whether a dir path already exists in 'osd-devices':
----
osd_devices = db.get('osd-devices', [])
if path in osd_devices:
    log('Device {} already processed by charm,'
        ' skipping'.format(path))
    return
----
Since we are upgrading from osd-261 to osd-291, the protection above will inevitably fail the first time the config-changed hook runs, because the db does not yet contain the paths.
But once the db has 'osd-devices' set correctly, subsequent config-changed hooks should no longer try to re-initialise the existing dir.
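The upgrade gap described above could be closed with a one-time migration along these lines (a sketch under stated assumptions: `seed_processed_dirs` is a hypothetical name, and `db` mirrors the get/set interface of the charm's kv store):

```python
import os


def seed_processed_dirs(db, configured_dirs):
    """Hypothetical one-time upgrade migration.

    Mark any configured directory that already contains an OSD
    structure (detected via the 'magic' file, as ceph-disk does) as
    processed in the charm's kv db, so the first config-changed hook
    after upgrading from osd-261 does not try to re-initialise it.
    """
    processed = db.get('osd-devices', [])
    for path in configured_dirs:
        if path not in processed and \
                os.path.exists(os.path.join(path, 'magic')):
            processed.append(path)
    db.set('osd-devices', processed)
    return processed
```

Run once on upgrade-charm, this would make the osd-291 `if path in osd_devices` guard effective from the very first config-changed hook.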