unable to create pools before OSD's are up and running

Bug #1774279 reported by Simon Monette on 2018-05-30
This bug affects 6 people
Affects                   Importance  Assigned to
OpenStack ceph-mon charm  Medium      Unassigned
OpenStack ceph-osd charm  Medium      Unassigned
ceph (Ubuntu)             Medium      Unassigned

Bug Description

When we deploy our OpenStack bundle with gnocchi in it, most of the time the charms end up in an error state.

After looking at the logs, it seems gnocchi tries to create its pool in Ceph after ceph-mon is ready but before ceph-radosgw is ready.

The logs on ceph-radosgw are filled with the error messages:
2018-05-30 21:09:25.739133 7f48374f4e80 0 pidfile_write: ignore empty --pid-file
2018-05-30 21:09:25.748857 7f48374f4e80 -1 auth: unable to find a keyring on /etc/ceph/keyring.rados.gateway: (2) No such file or directory

The file "/etc/ceph/keyring.rados.gateway" is indeed not present.
On a deployment that worked, it was present.

Also, on ceph-mon no other pools are ever created, no matter how long we wait.
# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    488G     488G      325M         0.07
POOLS:
    NAME                    ID     USED     %USED     MAX AVAIL     OBJECTS
    default.rgw.buckets     1      0        0         154G          0

The only time the deployment worked was when the ceph-radosgw units were ready before gnocchi.

We then tried to deploy the same bundle, but without adding the gnocchi units to the machines.

This time ceph-radosgw did not have any issue creating its keyring, and all pools were visible in ceph-mon.

We then added the gnocchi units to the machines, after all the Ceph charms were deployed.
The OpenStack deployment continued and worked without issue on every try.

There may be a race condition in how gnocchi handles its relations.

Environment:
All the latest released charms on Xenial/Queens
gnocchi-7
ceph-radosgw-257

James Page (james-page) wrote :

I think this is less about gnocchi, and more about ceph-mon (which does the actual pool creation). In later Ceph releases, pool creation fails if no OSDs are present at the point in time the pool is created. It's possible to work around this to some extent by setting the expected-osd-count configuration option on ceph-mon, but even this has some chance of racing.

I think we need to evolve the ceph-mon/ceph-osd relation to allow ceph-mon to determine when OSDs are up and usable; at this point pools can be created.
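
A minimal sketch of the kind of readiness check this implies, not the charm's actual code. It assumes the monitor can obtain JSON shaped like the output of `ceph osd stat --format json`, with `num_up_osds` and `num_in_osds` counters; those field names and the optional `osdmap` nesting are assumptions that should be checked against the deployed Ceph release. The function name `osds_ready` is hypothetical.

```python
import json


def osds_ready(osd_stat_json, expected_osds):
    """Return True once at least expected_osds OSDs are both up and in.

    osd_stat_json is assumed to be the text of `ceph osd stat --format json`.
    """
    stat = json.loads(osd_stat_json)
    # Assumption: some releases nest the counters under "osdmap";
    # accept both the flat and the nested layout.
    stat = stat.get("osdmap", stat)
    up = int(stat.get("num_up_osds", 0))
    osds_in = int(stat.get("num_in_osds", 0))
    return min(up, osds_in) >= expected_osds
```

Pool creation requests from clients would then be deferred until this returns True, rather than being attempted as soon as the monitors have quorum.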

FWIW this behaviour appears to have been introduced in newer Ceph releases.

summary: - gnocchi send requests to ceph to early
+ unable to create pools before OSD's are up and running
affects: charm-gnocchi → charm-ceph-mon
Changed in charm-ceph-mon:
status: New → Triaged
Changed in charm-ceph-osd:
status: New → Triaged
Changed in charm-ceph-mon:
importance: Undecided → Medium
Changed in charm-ceph-osd:
importance: Undecided → Medium
James Page (james-page) wrote :

I'm still digging into the bug. My original claim that pools can't be created without OSDs appears to be untrue, so I'm trying to figure out why the broker request actually fails.

James Page (james-page) wrote :

Reproduced:

2018-07-11 11:18:54 INFO juju-log client:5: Creating pool 'glance' (replicas=3)
2018-07-11 11:18:54 DEBUG client-relation-changed Error ERANGE: pg_num 200 size 3 would mean 1200 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)
2018-07-11 11:18:55 ERROR juju-log client:5: Command '['ceph', '--id', 'admin', 'osd', 'pool', 'create', 'glance', '200']' returned non-zero exit status 34.
2018-07-11 11:18:55 ERROR juju-log client:5: Unexpected error occurred while processing requests: {'api-version': 1, 'request-id': '2a919b75-84fc-11e8-ba20-fa163e7e0b1a', 'ops': [{'group': 'images', 'name': 'glance', 'weight': 5, 'replicas': 3, 'pg_num': None, 'group-namespace': None, 'op': 'create-pool'}]}
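
The arithmetic in the ERANGE line can be sketched as follows. This is an illustration of the check as the message describes it, not Ceph's source. The figure of 600 PG replicas already accounted for by other pools is an assumption chosen to reproduce the 1200 total in the log; the log itself does not say where the remainder comes from. The function name `check_pg_limit` is hypothetical.

```python
def check_pg_limit(pg_num, size, existing_pg_replicas,
                   mon_max_pg_per_osd, num_in_osds):
    """Return (allowed, projected_total, limit) for a pool create request."""
    # Each placement group is stored `size` times, so a new pool adds
    # pg_num * size PG replicas to the cluster-wide total.
    projected = existing_pg_replicas + pg_num * size
    # The cap scales with the number of OSDs the monitor counts as "in".
    limit = mon_max_pg_per_osd * num_in_osds
    return projected <= limit, projected, limit


# Values from the log line; existing_pg_replicas=600 is an assumption.
ok, projected, limit = check_pg_limit(200, 3, 600, 200, 3)
print(ok, projected, limit)  # False 1200 600
```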

James Page (james-page) wrote :

One possible workaround is to set mon_max_pg_per_osd to something stupidly high; this will allow the pool to be created without any OSDs actually being in. The OSD pool config will still be right-sized based on the expected OSD count (if used).

James Page (james-page) wrote :

For example:

juju config ceph-mon config-flags="{'global': {'mon max pg per osd': 100000}}"

James Page (james-page) wrote :

This works around the issue; we need a more complete solution but this at least gets automated deployments with vault based dmcrypt keys running again.

Simon Monette (simon-monette) wrote :

OK, thanks. We will try this workaround in one of our deployments to see if we can reproduce the behavior.

James Page (james-page) wrote :

Raising Ubuntu task; it would be nice if we could tell the ceph-mon cluster how many OSDs there *will* be, rather than it defaulting to '3' if there are no OSDs currently checked in.

This will require a new configuration option and adoption by the ceph project.

For a charm based deployment, the value of 'in' osds is unreliable until the cloud is fully deployed; having this trip hazard is non-ideal.
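
A rough sketch of the proposal, under stated assumptions. The floor of 3 is modelled here as a simple minimum on the 'in' count, which is an assumption about how the default is applied. The `expected_osds` parameter is hypothetical and stands in for the new configuration option suggested above; no such option is named in this report.

```python
def effective_osd_count(num_in_osds, expected_osds=None, floor=3):
    """OSD count used when computing the PG limit.

    expected_osds is a hypothetical operator-supplied hint for how many
    OSDs the cluster will have once the deployment has finished.
    """
    candidates = [num_in_osds, floor]
    if expected_osds is not None:
        candidates.append(expected_osds)
    return max(candidates)


# Early in a deployment no OSDs are 'in' yet.
print(effective_osd_count(0))                    # 3
print(effective_osd_count(0, expected_osds=12))  # 12
```

With the hint in place, the limit during early deployment would reflect the eventual cluster size instead of the default.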

James Page (james-page) on 2018-07-11
Changed in charm-ceph-mon:
milestone: none → 18.08
Changed in charm-ceph-osd:
milestone: none → 18.08
Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Medium

Change abandoned by Seyeong Kim (<email address hidden>) on branch: master
Review: https://review.openstack.org/581392
Reason: Need discussion about fundamental behaviour.

James Page (james-page) on 2018-09-12
Changed in charm-ceph-mon:
milestone: 18.08 → 18.11
Changed in charm-ceph-osd:
milestone: 18.08 → 18.11
James Page (james-page) on 2018-11-20
Changed in charm-ceph-mon:
milestone: 18.11 → 19.04
Changed in charm-ceph-osd:
milestone: 18.11 → 19.04
David Ames (thedac) on 2019-04-17
Changed in charm-ceph-mon:
milestone: 19.04 → 19.07
Changed in charm-ceph-osd:
milestone: 19.04 → 19.07
David Ames (thedac) on 2019-08-12
Changed in charm-ceph-mon:
milestone: 19.07 → 19.10
Changed in charm-ceph-osd:
milestone: 19.07 → 19.10