Existing filestore OSDs specified as folder paths in osd-devices will be erroneously half-initialised as BlueStore if bluestore=true

Bug #1841021 reported by James Hebden
This bug affects 1 person
Affects: Ceph OSD Charm
Status: Fix Released
Importance: Undecided
Assigned to: dongdong tao

Bug Description

Similar in principle to #1841010.

This bug concerns Ceph deployments with bluestore set to true on the ceph-osd charm.
When osd-devices specifies a series of folder paths that are mount points for XFS-formatted bcache devices (mounted via UUID) containing FileStore data, the config-changed hook will attempt to re-initialise the OSDs as BlueStore. This fails, but not before the ceph-disk utility updates the 'type' file in the OSD hierarchy to bluestore, preventing the OSD from starting up again.

Attempting to start OSDs after this happens produces the error shown in the logs below. Manually updating the 'type' file back to 'filestore' allows the OSDs to start without apparent data corruption. However, the charm is left in a bad state, as setting bluestore=false will trigger another potentially disruptive run of the config-changed hook.
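The manual workaround could be scripted roughly as follows. This is only a sketch, not something taken from the charm or from ceph-disk: the function name restore_filestore_type is hypothetical, and the check for a missing 'block' entry is an assumption based on the log output, where BlueStore fails because <osd dir>/block does not exist.

```python
from pathlib import Path


def restore_filestore_type(osd_dir):
    """Rewrite <osd_dir>/type to 'filestore' if it was flipped to 'bluestore'.

    Returns True if the file was changed, False otherwise.
    Hypothetical helper: the name and the checks are illustrative only.
    """
    osd_path = Path(osd_dir)
    type_file = osd_path / "type"

    # Nothing to do if there is no 'type' file at all.
    if not type_file.is_file():
        return False

    # Only touch directories that currently claim to be BlueStore.
    if type_file.read_text().strip() != "bluestore":
        return False

    # Assumption: a directory with a 'block' entry may be a genuine
    # BlueStore OSD, so leave it alone.
    block = osd_path / "block"
    if block.exists() or block.is_symlink():
        return False

    type_file.write_text("filestore\n")
    return True
```

For example, restore_filestore_type('/var/lib/ceph/osd/ceph-9') would be run with the OSD stopped, before starting ceph-osd@9 again.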

The following protections are therefore required:
1) The charm should take no action if the specified directory or block device contains data of any kind. This would allow setting bluestore=false without risk of again updating the 'type' file.
2) The Ceph ceph-disk tool should ideally also be given similar protections. This will likely require filing an upstream bug to work out why type is being updated when the directory contains a filestore OSD structure already.
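Protection 1 for the directory case might look something like the sketch below. The helper name dir_contains_data is hypothetical and this is not the charm's actual code; it only covers directories, since checking a block device for existing data would need a different approach.

```python
import os


def dir_contains_data(path):
    """Return True if `path` is an existing directory with at least one entry.

    Hypothetical guard: a caller would skip any (re-)initialisation when
    this returns True, whatever the value of the bluestore option.
    """
    if not os.path.isdir(path):
        return False
    # scandir is lazy, so a large OSD directory is not listed in full.
    with os.scandir(path) as entries:
        return next(entries, None) is not None
```

With a guard of this kind in front of the prepare step, changing bluestore between true and false would leave a populated OSD directory untouched.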

--- logs follow ---

Aug 22 01:46:11 openstack systemd[1]: Starting Ceph object storage daemon osd.9...
Aug 22 01:46:11 openstack systemd[1]: Started Ceph object storage daemon osd.9.
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676121 7fab52d5de00 -1 bluestore(/var/lib/ceph/osd/ceph-9/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-9/block: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676154 7fab52d5de00 -1 bluestore(/var/lib/ceph/osd/ceph-9/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-9/block: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676173 7fab52d5de00 -1 bluestore(/var/lib/ceph/osd/ceph-9/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-9/block: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.676196 7fab52d5de00 -1 bluestore(/var/lib/ceph/osd/ceph-9/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-9/block: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: starting osd.9 at - osd_data /var/lib/ceph/osd/ceph-9 /var/lib/ceph/osd/ceph-9/journal
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.697743 7fab52d5de00 -1 bdev(0x560426880d80 /var/lib/ceph/osd/ceph-9/block) open o$
en got: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.698421 7fab52d5de00 -1 bluestore(/var/lib/ceph/osd/ceph-9/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-9/block: (2) No such file or directory
Aug 22 01:46:11 openstack ceph-osd[1424270]: 2019-08-22 01:46:11.698472 7fab52d5de00 -1 bdev(0x560426880fc0 /var/lib/ceph/osd/ceph-9/block) open o$
en got: (2) No such file or directory
Aug 22 01:46:11 openstack-4 ceph-osd[1424270]: 2019-08-22 01:46:11.698482 7fab52d5de00 -1 osd.9 0 OSD:init: unable to mount object store
Aug 22 01:46:11 openstack-4 ceph-osd[1424270]: 2019-08-22 01:46:11.698503 7fab52d5de00 -1 ** ERROR: osd init failed: (2) No such file or directory
Aug 22 01:46:11 openstack-4 systemd[1]: ceph-osd@9.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 01:46:11 openstack-4 systemd[1]: ceph-osd@9.service: Unit entered failed state.
Aug 22 01:46:11 openstack-4 systemd[1]: ceph-osd@9.service: Failed with result 'exit-code'.

Xav Paice (xavpaice)
tags: added: canonical-bootstack
Revision history for this message
dongdong tao (taodd) wrote :

Detailed bug analysis:

The charm did call ceph-disk, with the command: 'sudo', '-u', 'ceph', 'ceph-disk', 'prepare', '--data-dir', '/srv/ceph/ceph1', '--bluestore'

I followed the ceph-disk flow. ceph-disk only changed the 'type' flag to bluestore and then failed to continue, because it found that the path had already been initialized by an OSD. ceph-disk checks the 'magic' file to see whether the path has already been initialized.
So ceph-disk did its protection very well.

The osd charm, however, did not protect well: it went ahead and let ceph-disk try to initialize an existing OSD dir.

osd-261 did not record the dir path in the db under 'osd-devices' for a directory-based OSD,
and osd-291 has the protection code below to check whether a dir path is already listed in 'osd-devices':
----
osd_devices = db.get('osd-devices', [])
if path in osd_devices:
    log('Device {} already processed by charm,'
        ' skipping'.format(path))
    return
----
Since we are upgrading from osd-261 to osd-291, the above protection is bound to fail the first time the 'config-changed' hook runs.

However, as the db should now have 'osd-devices' set correctly, the config-changed hook should not try to re-initialize the existing dir again as it did the first time.
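To illustrate why the db-only check misses on the first hook run after an upgrade, here is a minimal sketch. It is not the merged fix: should_skip is a hypothetical name, a plain dict stands in for the charm's key-value store, and the fallback on the 'magic' file simply mirrors the check that ceph-disk is described as doing above.

```python
import os


def should_skip(path, db):
    """Decide whether an osd-devices entry should be left alone.

    `db` is a plain dict standing in for the charm's key-value store
    (an assumption made for illustration).
    """
    # Check 1: the charm's own record, as in the snippet quoted above.
    # After an upgrade from a revision that never recorded directory
    # paths, this list does not contain `path`, so the check misses.
    if path in db.get('osd-devices', []):
        return True

    # Check 2 (assumption): fall back to the on-disk state and treat a
    # directory that already has a 'magic' file as an initialized OSD.
    if os.path.exists(os.path.join(path, 'magic')):
        return True

    return False
```

With an empty db and an existing FileStore directory, check 1 alone returns False, which is the upgrade case described here; check 2 is one possible way to catch it.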

Changed in charm-ceph-osd:
assignee: nobody → dongdong tao (taodd)
status: New → In Progress
Revision history for this message
dongdong tao (taodd) wrote :
Ryan Beisner (1chb1n)
tags: added: uosci
Changed in charm-ceph-osd:
milestone: none → 19.10
Revision history for this message
Edward Hope-Morley (hopem) wrote :

ceph-osd patch submitted with charms.ceph sync - https://review.opendev.org/#/c/681342/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)

Fix proposed to branch: master
Review: https://review.opendev.org/684293

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-osd (master)

Change abandoned by Edward Hope-Morley (<email address hidden>) on branch: master
Review: https://review.opendev.org/681342
Reason: abandoning since dongdong is going to submit his own patch

Revision history for this message
dongdong tao (taodd) wrote :

charm-recheck

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.opendev.org/684293
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=777256051058c5201e8d9035aabcd8d85ce12a79
Submitter: Zuul
Branch: master

commit 777256051058c5201e8d9035aabcd8d85ce12a79
Author: taodd <email address hidden>
Date: Tue Sep 24 18:19:38 2019 +0800

    Sync charms.ceph to get fix

    Change-Id: Ib3d4b79690eb5931b4f0680b937590b317a91427
    Closes-Bug: #1841021

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (stable/19.07)

Fix proposed to branch: stable/19.07
Review: https://review.opendev.org/685661

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (stable/19.07)

Reviewed: https://review.opendev.org/685661
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=0ea4b78d696b1fdf18b30e7fb4e9e5d7ac899f1d
Submitter: Zuul
Branch: stable/19.07

commit 0ea4b78d696b1fdf18b30e7fb4e9e5d7ac899f1d
Author: taodd <email address hidden>
Date: Tue Sep 24 18:19:38 2019 +0800

    Sync charms.ceph to get fix

    Change-Id: Ib3d4b79690eb5931b4f0680b937590b317a91427
    Closes-Bug: #1841021
    (cherry picked from commit 777256051058c5201e8d9035aabcd8d85ce12a79)
    Signed-off-by: taodd <email address hidden>

David Ames (thedac)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released