ceph + pike - ceph-disk prepare fails when directory specified rather than block device (lxd, zfs)

Bug #1713099 reported by Andrew McLeod
Affects: Ceph OSD Charm | Status: Won't Fix | Importance: Undecided | Assigned to: Chris MacNaughton
Affects: OpenStack Ceph Charm (Retired) | Status: Won't Fix | Importance: Undecided | Assigned to: Chris MacNaughton

Bug Description

Charm config (with xenial-pike on arm64, although I suspect this is not restricted to arm64):

 ceph:
    annotations:
      gui-x: '750'
      gui-y: '500'
    charm: cs:~openstack-charmers-next/ceph
    num_units: 3
    options:
      fsid: 5a791d94-980b-11e4-b6f6-3c970e8b1cf7
      monitor-secret: AQAi5a9UeJXUExAA+By9u+GPhl8/XiUQ4nwI3A==
      osd-devices: /srv/osd
      use-direct-io: False
      source: cloud:xenial-pike/proposed

error summary:

2017-08-25 15:03:46.288447 ffff78497000 -1 bdev(0xaaab0f0d8780 /srv/osd/block) open open got: (22) Invalid argument
2017-08-25 15:03:46.288509 ffff78497000 -1 bluestore(/srv/osd) mkfs failed, (22) Invalid argument
2017-08-25 15:03:46.288523 ffff78497000 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2017-08-25 15:03:46.288681 ffff78497000 -1 ** ERROR: error creating empty object store in /srv/osd: (22) Invalid argument

More complete log:

https://pastebin.canonical.com/196755/

Revision history for this message
Andrew McLeod (admcleod) wrote :

Accidentally logged against libvirt.

affects: libvirt (Ubuntu) → charm-ceph
Ryan Beisner (1chb1n)
tags: added: openstack pike uosci
Changed in charm-ceph:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Changed in charm-ceph-osd:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

TL;DR: it is possible to prepare and activate an OSD with the bluestore backend on top of regular files by passing a directory path to ceph-disk for testing purposes. BlueStore handles that properly, which I have verified in my virtual environment.

In this case, one of the two calls, open(2) or posix_fadvise(2), failed in KernelDevice::open(const string& p) for the given path "/srv/osd/block".

http://man7.org/linux/man-pages/man2/open.2.html
http://man7.org/linux/man-pages/man2/posix_fadvise.2.html
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/KernelDevice.cc#L64

Given that I was not able to reproduce this in my environment using the same ceph-disk prepare and ceph-disk activate calls, I am assuming that something is wrong with the directory specified in the charm or with the file system it resides on.

open(2) EINVAL returns if:

       EINVAL The filesystem does not support the O_DIRECT flag. See NOTES
              for more information.

       EINVAL Invalid value in flags.

       EINVAL O_TMPFILE was specified in flags, but neither O_WRONLY nor
              O_RDWR was specified.

posix_fadvise(2)
       EINVAL An invalid value was specified for advice.
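
Since the suspect cause is the O_DIRECT case, a quick probe on an affected host can confirm it. This is a minimal sketch, assuming Linux and Python 3; the probe file name and the /srv/osd path are illustrative, not taken from the report:

    # Probe whether the filesystem backing the OSD directory accepts O_DIRECT.
    # On ZFS the open(2) call fails with EINVAL, which would match the
    # "bdev ... open got: (22) Invalid argument" error above.
    import errno
    import os

    def supports_o_direct(directory):
        probe = os.path.join(directory, 'odirect-probe')
        try:
            fd = os.open(probe, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
        except OSError as exc:
            if exc.errno == errno.EINVAL:
                return False
            raise
        os.close(fd)
        os.unlink(probe)
        return True

    print(supports_o_direct('/srv/osd'))  # expected: False on a ZFS-backed path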

Two things we need to do:

1. Modify osdize_dir to pass filestore as the default, as we already use the same approach for the block-device code path (see the sketch after this list).

2. Get an environment where the issue is triggered and collect the contents of /proc/mounts, the output of `stat /srv/osd`, and the same for /srv/osd/block after the issue is triggered.
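
A rough sketch of item 1; the function name comes from the charm log quoted in this bug, but the signature and body here are assumptions for illustration, not the charm's actual code:

    # Assumed shape of the directory code path: explicitly request filestore
    # unless bluestore was asked for, since ceph-disk defaults to bluestore
    # from Luminous onwards and bluestore's block file needs O_DIRECT.
    import subprocess

    def osdize_dir(path, bluestore=False):
        cmd = ['sudo', '-u', 'ceph', 'ceph-disk', 'prepare']
        if not bluestore:
            cmd.append('--filestore')
        cmd.extend(['--data-dir', path])
        subprocess.check_call(cmd)

With --filestore passed, ceph-disk prepare sets up a filestore OSD in the directory instead of creating a bluestore block file that requires O_DIRECT.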

The rest of the analysis is below.

---

BlueStore is the new default object store, and this is the ceph-disk commit that makes that change:

https://github.com/ceph/ceph/commit/5cfe4cfa13a

Now, as for our code:

unit-ceph-3: 14:56:53 INFO unit.ceph/3.juju-log mon:0: osdize dir cmd: ['sudo', '-u', 'ceph', 'ceph-disk', 'prepare', '--data-dir', u'/srv/osd']

Even as of Jewel, data_dir doesn't really do anything in ceph-disk:

➜ ceph_disk git:(jewel) grep -RiP 'data.*?dir'
main.py: '--data-dir',
main.py: help='verify that DATA is a dir',
main.py: help='path to OSD data (a disk block device or directory)',
main.py: LOG.debug('Data dir %s already exists', path)
main.py: LOG.debug('Preparing osd data dir %s', path)
main.py: raise Error('data path for directory does not exist',
main.py: LOG.debug('%s osd.%s data dir is ready at %s', cluster, osd_id, path)

It used to mean something in Hammer:

➜ src git:(hammer) grep -iP 'data.*?dir' ceph-disk
        LOG.debug('Data dir %s already exists', path)
        LOG.debug('Preparing osd data dir %s', path)
            if args.data_dir:
                raise Error('data path for directory does not exist', args.data)
            if args.data_dir:
                raise Error('data path is not a directory', args.data)
    LOG.debug('%s osd.%s data dir is ready at %s', cluster, osd_id, path)
        '--data-dir',
        help='verify that DATA is a dir',
        help='path to OSD data (a disk block device or directory)',

Either way, ceph-disk in Luminous uses the following class hierarchy:

PrepareSpace <- PrepareData <- PrepareFilestoreData

Prepar...


Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

That makes perfect sense, Dmitrii, as the use-direct-io config flag is set to False. What this means is that Ceph with directories on bluestore will _not_ work in LXD containers, which is unfortunate.

Revision history for this message
Andrew McLeod (admcleod) wrote :

It seems likely that this is a ZFS issue rather than specifically an LXD issue - the backing store I used for LXD was ZFS, which is known not to support O_DIRECT.

I am re-testing this with directory-backed LXD now.

Revision history for this message
Andrew McLeod (admcleod) wrote :

Re-testing has confirmed that this bug is only visible when the LXD storage backend is set to ZFS; it does not occur when it is 'dir'.

summary: ceph + pike - ceph-disk prepare fails when directory specified rather
- than block device
+ than block device (lxd, zfs)
tags: added: lxd zfs
Revision history for this message
James Page (james-page) wrote :

I think we need to accept this as a limitation of using bluestore OSDs in LXD containers with directory-based OSDs.

This is mainly a testing use case, so we can work around it - any real bluestore testing needs to be done on hardware (this applies to production use as well).

Changed in charm-ceph-osd:
status: New → Won't Fix
Changed in charm-ceph:
status: New → Won't Fix
status: Won't Fix → Triaged
status: Triaged → Won't Fix