TL;DR: for testing purposes, it is possible to prepare and activate an OSD with the BlueStore backend on top of regular files by passing a directory path to ceph-disk. BlueStore handles this correctly; I have tested it in my virtual environment.
In this case, one of the two open(2) calls or the posix_fadvise(2) call in KernelDevice::open(const string& p) failed for the path "/srv/osd/block".
http://man7.org/linux/man-pages/man2/open.2.html
http://man7.org/linux/man-pages/man2/posix_fadvise.2.html
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L64
Since I was not able to reproduce it in my environment using the same ceph-disk prepare and ceph-disk activate calls, I assume that something is wrong with the directory specified in the charm or with the file system it resides on.
open(2) returns EINVAL if:
EINVAL The filesystem does not support the O_DIRECT flag. See NOTES for more information.
EINVAL Invalid value in flags.
EINVAL O_TMPFILE was specified in flags, but neither O_WRONLY nor O_RDWR was specified.
posix_fadvise(2) returns EINVAL if:
EINVAL An invalid value was specified for advice.
Two things we need to do:
1. Modify osdize_dir to pass filestore as the default, since we already use the same approach for the block device code path.
2. Get an environment where the issue is triggered. After it is triggered, collect the contents of /proc/mounts, the output of `stat /srv/osd` and an output of /srv/osd/block.
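For item 2, the relevant part of /proc/mounts is the entry for the file system that holds /srv/osd, because its type and options determine whether O_DIRECT is supported. Below is a minimal sketch. It assumes the usual /proc/mounts layout (device, mount point, type, options, ...) with octal escapes such as \040. The helper names are hypothetical:

```python
import os
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    # /proc/mounts encodes space, tab, newline and backslash as \040, \011, \012, \134.
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def mount_entry_for(path, mount_lines):
    """Return (device, mount_point, fstype, options) of the mount holding `path`.

    `mount_lines` is an iterable of lines in /proc/mounts format. The longest
    mount point that is a prefix of the resolved path wins. On equal length
    the later line wins, since it is the mount stacked on top.
    """
    target = os.path.realpath(path)
    best = None
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mount_point, fstype, options = (_unescape(f) for f in fields[:4])
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fstype, options)
    return best
```

On the affected unit this would be run as `with open("/proc/mounts") as f: print(mount_entry_for("/srv/osd", f))`. The result could then be compared with the output of `stat /srv/osd`.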
The rest of the analysis is below.
---
BlueStore is the new default object store, and this is the ceph-disk commit that makes the change:
https://github.com/ceph/ceph/commit/5cfe4cfa13a
Now, as for our code:
unit-ceph-3: 14:56:53 INFO unit.ceph/3.juju-log mon:0: osdize dir cmd: ['sudo', '-u', 'ceph', 'ceph-disk', 'prepare', '--data-dir', u'/srv/osd']
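The command above passes no object store flag, so ceph-disk falls back to its default, which is now bluestore. Below is a minimal sketch for item 1 that builds the command with filestore as the default. It assumes the installed ceph-disk accepts a --filestore/--bluestore flag; passing None keeps today's command for releases where that is not the case. osdize_dir_cmd is a hypothetical helper, not the charm's actual function:

```python
def osdize_dir_cmd(path, objectstore="filestore"):
    """Build the ceph-disk prepare command for a directory-backed OSD.

    Hypothetical helper. objectstore=None reproduces the command in the log
    above (no flag, so ceph-disk uses its own default).
    """
    if objectstore not in (None, "filestore", "bluestore"):
        raise ValueError("unknown objectstore: %r" % (objectstore,))
    cmd = ["sudo", "-u", "ceph", "ceph-disk", "prepare"]
    if objectstore is not None:
        # Assumes ceph-disk accepts --filestore / --bluestore.
        cmd.append("--" + objectstore)
    cmd += ["--data-dir", path]
    return cmd
```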
Even as of jewel, data_dir doesn't really do anything in ceph-disk:
➜ ceph_disk git:(jewel) grep -RiP 'data.*?dir'
main.py: '--data-dir',
main.py: help='verify that DATA is a dir',
main.py: help='path to OSD data (a disk block device or directory)',
main.py: LOG.debug('Data dir %s already exists', path)
main.py: LOG.debug('Preparing osd data dir %s', path)
main.py: raise Error('data path for directory does not exist',
main.py: LOG.debug('%s osd.%s data dir is ready at %s', cluster, osd_id, path)
It used to mean something in hammer:
➜ src git:(hammer) grep -iP 'data.*?dir' ceph-disk
LOG.debug('Data dir %s already exists', path)
LOG.debug('Preparing osd data dir %s', path)
if args.data_dir:
raise Error('data path for directory does not exist', args.data)
if args.data_dir:
raise Error('data path is not a directory', args.data)
LOG.debug('%s osd.%s data dir is ready at %s', cluster, osd_id, path)
'--data-dir',
help='verify that DATA is a dir',
help='path to OSD data (a disk block device or directory)',
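Judging by the two raise Error lines, hammer's --data-dir made prepare fail early when the path was missing or was not a directory. The sketch below is reconstructed only from those messages, not from hammer's actual control flow, where the checks sit in different branches of the prepare code. check_data_dir and the Error stand-in are hypothetical:

```python
import os


class Error(Exception):
    """Stand-in for ceph-disk's own Error exception."""


def check_data_dir(data, data_dir):
    """Approximate what hammer enforced when --data-dir was given."""
    if not data_dir:
        return  # without the flag, no directory-specific checks are made
    if not os.path.exists(data):
        raise Error('data path for directory does not exist', data)
    if not os.path.isdir(data):
        raise Error('data path is not a directory', data)
```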
Either way, ceph-disk in Luminous uses the following class hierarchy:
PrepareSpace <- PrepareData <- PrepareFilestoreData
PrepareSpace <- PrepareData <- PrepareBluestoreData
Here is some code showing how detecting a directory translates into creating files that will actually store data, regardless of the object store type (filestore or bluestore):
PrepareSpace (https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2169) has a 'def prepare' that acts differently depending on the file type (regular or block special):
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2300-L2308
If stat.S_ISDIR -> self.type = self.FILE:
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2863-L2871
prepare -> prepare_file:
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2941-L2953
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2300-L2308
The type of a file is checked via a regular stat(2) call:
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2873-L2877
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2863-L2871
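Below is a simplified model of the hierarchy and dispatch described above. It shows that a directory always lands in the FILE branch, whichever object store subclass is used. It follows the hierarchy as drawn earlier. The method bodies are placeholders, and the bare S_ISBLK test is a simplification of ceph-disk's own device check:

```python
import os
import stat


class Error(Exception):
    """Stand-in for ceph-disk's Error exception."""


class PrepareSpace(object):
    NONE = "none"
    FILE = "file"
    DEVICE = "device"


class PrepareData(PrepareSpace):
    def __init__(self, data):
        self.data = data
        self.set_type()

    def set_type(self):
        # A regular stat(2) of the data path picks the code path.
        mode = os.stat(self.data).st_mode
        if stat.S_ISDIR(mode):
            self.type = self.FILE
        elif stat.S_ISBLK(mode):
            self.type = self.DEVICE
        else:
            raise Error("not a dir or block device", self.data)

    def prepare(self):
        if self.type == self.DEVICE:
            return self.prepare_device()
        if self.type == self.FILE:
            return self.prepare_file()
        raise Error("unexpected dev type %s" % self.type)

    # Placeholders: the real methods partition a device or populate a directory.
    def prepare_device(self):
        return "prepare_device"

    def prepare_file(self):
        return "prepare_file"


class PrepareFilestoreData(PrepareData):
    pass


class PrepareBluestoreData(PrepareData):
    pass
```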
activate_dir is also selected based on stat(2):
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L3582-L3620
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L3763-L3769
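The same stat(2)-based decision on the activate side can be sketched as follows. The sketch assumes block devices go to a mount-based activation path and directories go to activate_dir. choose_activation is a hypothetical name, and the bare S_ISBLK test is again a simplification:

```python
import os
import stat


def choose_activation(path):
    """Return the name of the activation routine a path would be routed to."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "mount_activate"
    if stat.S_ISDIR(mode):
        return "activate_dir"
    raise ValueError("%s is not a directory or block device" % path)
```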
Now, from BlueStore's own perspective, this case is handled as well:
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L5125-L5127
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L4953-L5048
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L4997
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L5008
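As far as I can tell from the linked BlueStore code, the behaviour is as follows. When no block device is configured and bluestore_block_create is enabled, mkfs creates a regular file named block in the OSD directory. If that file is empty, mkfs truncates it to bluestore_block_size. Below is a Python sketch of that behaviour, assuming those option semantics and a 10 GiB default size. setup_block_file is a hypothetical name. The file it produces is the kind of regular file that KernelDevice::open later opens with O_DIRECT:

```python
import os
import stat

# Assumed Luminous default for bluestore_block_size (10 GiB).
DEFAULT_BLOCK_SIZE = 10 * 1024 ** 3


def setup_block_file(osd_dir, size=DEFAULT_BLOCK_SIZE, name="block"):
    """Create <osd_dir>/<name> as a regular file and size it if it is empty.

    Sketch only: an existing non-empty file is left as it is. The file is
    sparse because ftruncate() does not allocate blocks.
    """
    path = os.path.join(osd_dir, name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        st = os.fstat(fd)
        if stat.S_ISREG(st.st_mode) and st.st_size == 0:
            os.ftruncate(fd, size)
    finally:
        os.close(fd)
    return path
```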
----
And a practical test: http://paste.ubuntu.com/25396295/
----
Looking more at the error messages:
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889232 ffffa4066000 -1 bdev(0xaaab03430780 /srv/osd/block) open open got: (22) Invalid argument
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889438 ffffa4066000 -1 bluestore(/srv/osd) mkfs failed, (22) Invalid argument
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889510 ffffa4066000 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889907 ffffa4066000 -1 ** ERROR: error creating empty object store in /srv/osd: (22) Invalid argument
This error is important: "open open got: (22) Invalid argument"
That's what was called:
bdev(0xaaab03430780 /srv/osd/block)
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L4099-L4104
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlockDevice.h#L82-L145
virtual int open(const std::string& path) = 0;
This pure virtual function is overridden in "class KernelDevice : public BlockDevice":
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.h#L106
The dout prefix of our error is "bdev(" << this << " " << path << ")", i.e. the address of this KernelDevice object and the path argument:
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L34
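Combining the prefix with the message would explain the doubled word in "open open got". The first "open" would be the function name, and the second is part of the message text. The sketch below rebuilds and parses such a line. It assumes the message is composed as prefix + function name + " open got: " + "(errno) strerror", which matches the log above. The helper names are hypothetical:

```python
import errno
import os
import re


def cpp_strerror(r):
    """Format an errno the way the log shows it: '(22) Invalid argument'."""
    e = abs(r)
    return "(%d) %s" % (e, os.strerror(e))


def kernel_device_error(addr, path, func, r):
    """dout prefix "bdev(<this> <path>) " + function name + " open got: " + error."""
    return "bdev(%s %s) %s open got: %s" % (addr, path, func, cpp_strerror(r))


_BDEV_ERROR = re.compile(
    r"bdev\((?P<addr>0x[0-9a-f]+) (?P<path>\S+)\) (?P<func>\w+) open got: \((?P<code>\d+)\)"
)


def parse_kernel_device_error(line):
    """Return (path, function, errno name) from such a log line, or None."""
    m = _BDEV_ERROR.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return m.group("path"), m.group("func"), errno.errorcode.get(code, str(code))
```

The parsed result maps (22) to EINVAL. It does not tell us which call inside KernelDevice::open produced it.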
There are three places that can output such an error:
fd_direct = ::open(path.c_str(), O_RDWR | O_DIRECT);
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L73
fd_buffered = ::open(path.c_str(), O_RDWR);
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L79
r = posix_fadvise(fd_buffered, 0, 0, POSIX_FADV_RANDOM);
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L93
One of those failed with EINVAL.
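To find out which call fails, one option is to repeat the three calls against the same file on the affected host. Below is a minimal Python sketch; it is Linux-only because it relies on os.O_DIRECT and os.posix_fadvise. probe_kernel_device_open is a hypothetical helper. Unlike KernelDevice::open, it keeps going after a failure so that each call is reported. Opening with O_RDWR needs write permission, so run it as the ceph user or as root:

```python
import errno
import os


def _errname(code):
    return errno.errorcode.get(code, str(code))


def probe_kernel_device_open(path):
    """Repeat KernelDevice::open's three calls and report each outcome.

    Returns (step, error) pairs in call order. error is None on success,
    an errno name such as 'EINVAL' on failure, or 'SKIPPED' when
    posix_fadvise cannot run because the buffered open failed.
    """
    results = []
    fd_direct = fd_buffered = None
    try:
        try:
            fd_direct = os.open(path, os.O_RDWR | os.O_DIRECT)
            results.append(("open(O_RDWR|O_DIRECT)", None))
        except OSError as e:
            results.append(("open(O_RDWR|O_DIRECT)", _errname(e.errno)))
        try:
            fd_buffered = os.open(path, os.O_RDWR)
            results.append(("open(O_RDWR)", None))
        except OSError as e:
            results.append(("open(O_RDWR)", _errname(e.errno)))
        if fd_buffered is None:
            results.append(("posix_fadvise(POSIX_FADV_RANDOM)", "SKIPPED"))
        else:
            try:
                # Same arguments as KernelDevice: whole file, random access advice.
                os.posix_fadvise(fd_buffered, 0, 0, os.POSIX_FADV_RANDOM)
                results.append(("posix_fadvise(POSIX_FADV_RANDOM)", None))
            except OSError as e:
                results.append(("posix_fadvise(POSIX_FADV_RANDOM)", _errname(e.errno)))
    finally:
        for fd in (fd_direct, fd_buffered):
            if fd is not None:
                os.close(fd)
    return results
```

For example: `for step, err in probe_kernel_device_open("/srv/osd/block"): print(step, err or "ok")`. Suppose EINVAL appears only on the O_DIRECT step. That would point to the first open(2) case quoted from the man page, a file system that does not support O_DIRECT.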