unable to create pools before OSD's are up and running
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ceph Monitor Charm | Triaged | Medium | Unassigned | |
Ceph OSD Charm | Triaged | Medium | Unassigned | |
ceph (Ubuntu) | Triaged | Medium | Unassigned | |
Bug Description
When we deploy our OpenStack bundle with gnocchi in it, most of the time the charms end up in an error state.
After looking at the logs, it seems gnocchi tries to create its pool in Ceph after ceph-mon is ready but before ceph-radosgw is ready.
The logs on ceph-radosgw are filled with the error messages:
2018-05-30 21:09:25.739133 7f48374f4e80 0 pidfile_write: ignore empty --pid-file
2018-05-30 21:09:25.748857 7f48374f4e80 -1 auth: unable to find a keyring on /etc/ceph/
The file "/etc/ceph/
On a deployment that worked, it was present.
Also, on ceph-mon no other pools are ever created, no matter how long we wait.
# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
488G 488G 325M 0.07
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
default.
The only time the deployment worked was when the ceph-radosgw units were ready before gnocchi.
We then tried to deploy the same bundle, but without adding the gnocchi units to the machines.
This time ceph-radosgw did not have any issue creating its keyring, and all pools were visible in ceph-mon.
We then added the gnocchi units to the machines, after all the ceph charms were deployed.
The OpenStack deployment continued and worked without issue, on every try.
There may be a race condition in how gnocchi handles its relations.
Environment:
All the latest released charms on Xenial/Queens
gnocchi-7
ceph-radosgw-257
Changed in charm-ceph-mon: | |
milestone: | none → 18.08 |
Changed in charm-ceph-osd: | |
milestone: | none → 18.08 |
Changed in ceph (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → Medium |
Changed in charm-ceph-mon: | |
milestone: | 18.08 → 18.11 |
Changed in charm-ceph-osd: | |
milestone: | 18.08 → 18.11 |
Changed in charm-ceph-mon: | |
milestone: | 18.11 → 19.04 |
Changed in charm-ceph-osd: | |
milestone: | 18.11 → 19.04 |
Changed in charm-ceph-mon: | |
milestone: | 19.04 → 19.07 |
Changed in charm-ceph-osd: | |
milestone: | 19.04 → 19.07 |
Changed in charm-ceph-mon: | |
milestone: | 19.07 → 19.10 |
Changed in charm-ceph-osd: | |
milestone: | 19.07 → 19.10 |
Changed in charm-ceph-mon: | |
milestone: | 19.10 → 20.01 |
Changed in charm-ceph-osd: | |
milestone: | 19.10 → 20.01 |
Changed in charm-ceph-mon: | |
milestone: | 20.01 → 20.05 |
Changed in charm-ceph-osd: | |
milestone: | 20.01 → 20.05 |
Changed in charm-ceph-mon: | |
milestone: | 20.05 → 20.08 |
Changed in charm-ceph-osd: | |
milestone: | 20.05 → 20.08 |
Changed in charm-ceph-mon: | |
milestone: | 20.08 → none |
Changed in charm-ceph-osd: | |
milestone: | 20.08 → none |
I think this is less about gnocchi, and more about ceph-mon (which does the actual pool creation). In later Ceph releases, pool creation fails if no OSDs are present at the point in time the pool is created. It's possible to work around this to some extent by setting the expected-osd-count configuration option on ceph-mon, but even this has some chance of racing.
I think we need to evolve the ceph-mon/ceph-osd relation to allow ceph-mon to determine when OSDs are up and usable; at this point pools can be created.
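To make the idea concrete, here is a minimal sketch of gating pool creation on the number of OSDs that report as up. It is not the charm's actual code: the function names, the timeout, and the polling interval are all hypothetical. It assumes `ceph osd stat --format json` returns a `num_up_osds` counter, either at the top level or nested under an `osdmap` key depending on the Ceph release.

```python
import json
import subprocess
import time


def count_up_osds(osd_stat):
    """Return the number of up OSDs from a parsed `ceph osd stat` dict."""
    # Assumption: some Ceph releases nest the counters under "osdmap",
    # others expose them at the top level. Accept both layouts.
    stats = osd_stat.get("osdmap", osd_stat)
    return int(stats.get("num_up_osds", 0))


def osds_ready(osd_stat, expected_osd_count):
    """True once at least expected_osd_count OSDs are up."""
    # A threshold of zero would let pools be created with no OSDs at all,
    # which is the failure described above, so treat it as "not ready".
    return expected_osd_count > 0 and count_up_osds(osd_stat) >= expected_osd_count


def wait_for_osds(expected_osd_count, timeout=600, interval=10):
    """Poll the cluster until enough OSDs are up or the timeout expires.

    Hypothetical helper: timeout and interval are illustrative defaults.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        raw = subprocess.check_output(
            ["ceph", "osd", "stat", "--format", "json"]
        )
        if osds_ready(json.loads(raw), expected_osd_count):
            return True
        time.sleep(interval)
    return False
```

A caller would only process pending pool requests after `wait_for_osds(n)` returns `True`. Polling like this still leaves a window in which an OSD goes down between the check and the pool creation, so it narrows the race rather than removing it.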
FWIW this behaviour appears to have been introduced in newer Ceph releases.