Comment 2 for bug 1713099

Dmitrii Shcherbakov (dmitriis) wrote : Re: ceph + pike - ceph-disk prepare fails when directory specified rather than block device

TL;DR: for testing purposes it is possible to prepare and activate an OSD with the bluestore backend on top of regular files by passing a directory path to ceph-disk. BlueStore handles this properly; I have tested it in my virtual environment.

In this case, one of the two open(2) calls or the posix_fadvise(2) call in KernelDevice::open(const string& p) failed with EINVAL for the path "/srv/osd/block".

http://man7.org/linux/man-pages/man2/open.2.html
http://man7.org/linux/man-pages/man2/posix_fadvise.2.html
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L64

Given that I was not able to reproduce it in my environment using the same ceph-disk prepare and ceph-disk activate calls, I assume that something is wrong with the directory specified in the charm or with the file system it resides on.

open(2) returns EINVAL if:

       EINVAL The filesystem does not support the O_DIRECT flag. See NOTES
              for more information.

       EINVAL Invalid value in flags.

       EINVAL O_TMPFILE was specified in flags, but neither O_WRONLY nor
              O_RDWR was specified.

posix_fadvise(2) returns EINVAL if:
       EINVAL An invalid value was specified for advice.

2 things we need to do:

1. Modify osdize_dir to pass filestore by default, as we already do in the block device code path

2. Get an environment where the issue is triggered and, after it is triggered, collect the contents of /proc/mounts and the output of `stat` for /srv/osd and for /srv/osd/block.
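To make the second item easier to act on, here is a minimal sketch of how the relevant /proc/mounts entry could be picked out for a given path. The helper name find_mount is hypothetical, it does not resolve symlinks, and it ignores the octal escaping /proc/mounts uses for spaces in mount points.

```python
import os


def find_mount(path, mounts_text):
    """Return (device, mount_point, fstype) for the mount that contains path.

    mounts_text is the raw content of /proc/mounts. When several mount
    points are prefixes of path, the longest one wins. Returns None if
    nothing matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        device, mount_point, fstype = fields[:3]
        # "/" becomes the prefix "/", "/srv" becomes "/srv/", so that
        # "/srvx" is not treated as living under "/srv".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = (device, mount_point, fstype)
    return best
```

On the affected unit this would be called with the content of /proc/mounts and "/srv/osd/block"; the third element of the result is the file system type, which is the interesting part given the O_DIRECT note above.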

The rest of the analysis is below.

---

bluestore is the new default objectstore and this is the commit for ceph-disk that does the change:

https://github.com/ceph/ceph/commit/5cfe4cfa13a

Now, as for our code:

unit-ceph-3: 14:56:53 INFO unit.ceph/3.juju-log mon:0: osdize dir cmd: ['sudo', '-u', 'ceph', 'ceph-disk', 'prepare', '--data-dir', u'/srv/osd']
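For comparison, a sketch of how the command in the log line above could be assembled with an explicit object store choice. This is not the charm's actual code: the function name is hypothetical, and it assumes the installed ceph-disk accepts the --filestore and --bluestore flags (older releases may reject them).

```python
def build_prepare_dir_cmd(path, bluestore=False):
    """Build a ceph-disk prepare command for a directory-backed OSD.

    Filestore is selected unless bluestore=True is passed explicitly.
    """
    cmd = ['sudo', '-u', 'ceph', 'ceph-disk', 'prepare']
    # Assumption: these flags exist in the installed ceph-disk.
    cmd.append('--bluestore' if bluestore else '--filestore')
    cmd += ['--data-dir', path]
    return cmd
```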

Even as of jewel data_dir doesn't really do anything in ceph-disk:

➜ ceph_disk git:(jewel) grep -RiP 'data.*?dir'
main.py: '--data-dir',
main.py: help='verify that DATA is a dir',
main.py: help='path to OSD data (a disk block device or directory)',
main.py: LOG.debug('Data dir %s already exists', path)
main.py: LOG.debug('Preparing osd data dir %s', path)
main.py: raise Error('data path for directory does not exist',
main.py: LOG.debug('%s osd.%s data dir is ready at %s', cluster, osd_id, path)

It used to mean something in hammer:

➜ src git:(hammer) grep -iP 'data.*?dir' ceph-disk
        LOG.debug('Data dir %s already exists', path)
        LOG.debug('Preparing osd data dir %s', path)
            if args.data_dir:
                raise Error('data path for directory does not exist', args.data)
            if args.data_dir:
                raise Error('data path is not a directory', args.data)
    LOG.debug('%s osd.%s data dir is ready at %s', cluster, osd_id, path)
        '--data-dir',
        help='verify that DATA is a dir',
        help='path to OSD data (a disk block device or directory)',

Either way, ceph-disk in Luminous uses the following class hierarchy:

PrepareSpace <- PrepareData <- PrepareFilestoreData

PrepareSpace <- PrepareData <- PrepareBluestoreData

Some code relevant to how directory detection translates into creating the files that actually store data, regardless of the object store type (filestore or bluestore):

PrepareSpace
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2169
has 'def prepare', which acts differently depending on the file type (regular or block special):
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2300-L2308

If stat.S_ISDIR -> self.type = self.FILE
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2863-L2871

prepare -> prepare_file
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2941-L2953
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2300-L2308

Type of a file is checked via regular stat(2):
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2873-L2877
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L2863-L2871
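The stat(2)-based dispatch described above can be sketched roughly as follows. This is a simplified illustration, not the ceph-disk code itself: the function name and the string constants are made up, and the real code has more cases.

```python
import os
import stat

FILE = 'file'      # data will live in regular files under a directory
DEVICE = 'device'  # data will live on a block device


def classify_data_path(path):
    """Decide how a data path would be treated, based on its stat mode."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return FILE
    if stat.S_ISBLK(mode):
        return DEVICE
    raise ValueError('%s is neither a directory nor a block device' % path)
```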

activate_dir is also performed based upon stat(2):
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L3582-L3620
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph_disk/main.py#L3763-L3769

Now, from BlueStore's own perspective, a regular file is also handled:
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L5125-L5127
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L4953-L5048
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L4997
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L5008

----

And a practical test:
http://paste.ubuntu.com/25396295/

----

Looking more at the error messages:

unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889232 ffffa4066000 -1 bdev(0xaaab03430780 /srv/osd/block) open open got: (22) Invalid argument
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889438 ffffa4066000 -1 bluestore(/srv/osd) mkfs failed, (22) Invalid argument
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889510 ffffa4066000 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
unit-ceph-3: 14:57:01 INFO unit.ceph/3.mon-relation-changed 2017-08-25 14:57:01.889907 ffffa4066000 -1  ** ERROR: error creating empty object store in /srv/osd: (22) Invalid argument

This error is important: "open open got: (22) Invalid argument"
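The number in parentheses is the errno value. A small sketch for decoding such a line; the regular expression is an assumption based only on the message format shown here, and the function name is hypothetical.

```python
import errno
import os
import re

# Matches e.g. "bdev(0xaaab03430780 /srv/osd/block) open open got: (22) Invalid argument"
BDEV_ERROR_RE = re.compile(
    r'bdev\((?P<addr>0x[0-9a-f]+) (?P<path>[^)]+)\) .*\((?P<code>\d+)\) (?P<msg>.+)$')


def decode_bdev_error(line):
    """Extract the device path and errno from a bdev error line, or None."""
    m = BDEV_ERROR_RE.search(line)
    if m is None:
        return None
    code = int(m.group('code'))
    return {
        'path': m.group('path'),
        'errno': code,
        'name': errno.errorcode.get(code, 'unknown'),  # symbolic name
        'text': os.strerror(code),                     # libc description
    }
```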

This is the device object that reported it:

bdev(0xaaab03430780 /srv/osd/block)

https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L4099-L4104
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlockDevice.h#L82-L145

virtual int open(const std::string& path) = 0;

This is a pure virtual function, overridden in "class KernelDevice : public BlockDevice"
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.h#L106

The dout prefix of our error, "bdev(" << this << " " << path << ")", prints the address of this KernelDevice object and the path argument:
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L34

There are 3 places that can produce such an error:

fd_direct = ::open(path.c_str(), O_RDWR | O_DIRECT);
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L73

fd_buffered = ::open(path.c_str(), O_RDWR);
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L79

r = posix_fadvise(fd_buffered, 0, 0, POSIX_FADV_RANDOM);
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L93

One of those failed with EINVAL.
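To narrow down which of the three it was, the same calls can be replayed against the path from Python on an affected machine. This is a diagnostic sketch only: the function name is hypothetical, and unlike KernelDevice::open it does not stop at the first failure but reports the outcome of every step.

```python
import errno
import os


def _errname(exc):
    """Symbolic errno name for an OSError, e.g. 'EINVAL'."""
    return errno.errorcode.get(exc.errno, str(exc.errno))


def probe_kernel_device_open(path):
    """Replay open(O_RDWR|O_DIRECT), open(O_RDWR) and posix_fadvise on path.

    Returns a dict mapping each step to None on success, to the errno
    name on failure, or to 'skipped' if the step could not be attempted.
    """
    results = {}
    o_direct = getattr(os, 'O_DIRECT', 0)  # O_DIRECT is Linux-specific
    fd_direct = fd_buffered = None
    try:
        try:
            fd_direct = os.open(path, os.O_RDWR | o_direct)
            results['open_direct'] = None
        except OSError as e:
            results['open_direct'] = _errname(e)

        try:
            fd_buffered = os.open(path, os.O_RDWR)
            results['open_buffered'] = None
        except OSError as e:
            results['open_buffered'] = _errname(e)

        if fd_buffered is None:
            results['posix_fadvise'] = 'skipped'
        else:
            try:
                os.posix_fadvise(fd_buffered, 0, 0, os.POSIX_FADV_RANDOM)
                results['posix_fadvise'] = None
            except OSError as e:
                results['posix_fadvise'] = _errname(e)
    finally:
        for fd in (fd_direct, fd_buffered):
            if fd is not None:
                os.close(fd)
    return results
```

Running it as the ceph user against /srv/osd/block should show 'EINVAL' for exactly the step that fails; if that step is open_direct, the file system most likely does not support O_DIRECT.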