Ceph-mon units are stuck on waiting state after charm upgrade
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph Monitor Charm | Invalid | High | Unassigned |
Ceph OSD Charm | Fix Released | High | James Page | 20.08
Bug Description
After a charm upgrade from 18.05 to 19.10, the ceph-mon units are stuck in "waiting" state with the message:
"Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count"
It seems that all the relations between ceph-mon and the ceph-osd units are returning bootstrapped-osds = 0.
However, the OSDs are running and cluster health is not affected.
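One way to confirm the mismatch (a sketch, not part of the original report; the unit name is taken from the status output below) is to ask the monitor daemon directly, since it knows the real OSD count regardless of the charm relation data:

```shell
# Sketch: verify the OSDs really are up even though the charm relation
# reports bootstrapped-osds = 0. Unit name ceph-mon/0 is from this report.
JUJU=juju
# Fall back to printing the commands when run outside the cluster (dry run).
command -v "$JUJU" >/dev/null 2>&1 || JUJU="echo juju"

status=$($JUJU run --unit ceph-mon/0 'ceph osd stat')
printf '%s\n' "$status"
$JUJU run --unit ceph-mon/0 'ceph health'
```

If the `ceph osd stat` count matches expected-osd-count while the charm still waits, the problem is in the relation data, not the cluster.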
$ juju status ceph-mon
Model Controller Cloud/Region Version SLA Timestamp
openstack juju-controller1 prodmaas 2.7.0 unsupported 10:46:57+01:00
App Version Status Scale Charm Store Rev OS Notes
ceph-mon 10.2.11 waiting 3 ceph-mon local 1 ubuntu
filebeat 6.8.6 active 3 filebeat jujucharms 25 ubuntu
nrpe-lxd active 3 nrpe jujucharms 60 ubuntu
telegraf active 3 telegraf jujucharms 29 ubuntu
Unit Workload Agent Machine Public address Ports Message
ceph-mon/0 waiting idle 4/lxd/0 10.116.178.19 Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count (36)
filebeat/6 active idle 10.116.178.19 Filebeat ready.
nrpe-lxd/0 active idle 10.116.178.19 icmp,5666/tcp ready
telegraf/12 active idle 10.116.178.19 9103/tcp Monitoring ceph-mon/0
ceph-mon/1 waiting idle 7/lxd/0 10.116.178.21 Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count (36)
filebeat/16 active idle 10.116.178.21 Filebeat ready.
nrpe-lxd/2 active idle 10.116.178.21 icmp,5666/tcp ready
telegraf/14 active idle 10.116.178.21 9103/tcp Monitoring ceph-mon/1
ceph-mon/2* waiting idle 10/lxd/0 10.116.178.22 Monitor bootstrapped but waiting for number of OSDs to reach expected-osd-count (36)
filebeat/24 active idle 10.116.178.22 Filebeat ready.
nrpe-lxd/9 active idle 10.116.178.22 icmp,5666/tcp ready
telegraf/21 active idle 10.116.178.22 9103/tcp Monitoring ceph-mon/2
$ juju status ceph-osd
Unit Workload Agent Machine Public address Ports Message
ceph-osd/0 active idle 3 213.173.196.139 Unit is ready (3 OSD)
nrpe-physical/29 active idle 213.173.196.139 ready
ceph-osd/1 active idle 4 213.173.196.193 Unit is ready (3 OSD)
nrpe-physical/13 active idle 213.173.196.193 icmp,5666/tcp ready
ceph-osd/2 active idle 5 213.173.196.138 Unit is ready (3 OSD)
nrpe-physical/9 active idle 213.173.196.138 icmp,5666/tcp ready
ceph-osd/3 active idle 6 213.173.196.192 Unit is ready (3 OSD)
nrpe-physical/8 active idle 213.173.196.192 icmp,5666/tcp ready
ceph-osd/4 active idle 7 213.173.196.140 Unit is ready (3 OSD)
nrpe-physical/33 active idle 213.173.196.140 ready
ceph-osd/5 active idle 8 213.173.196.190 Unit is ready (3 OSD)
nrpe-physical/18 active idle 213.173.196.190 icmp,5666/tcp ready
ceph-osd/6 active idle 9 213.173.196.141 Unit is ready (3 OSD)
nrpe-physical/11 active idle 213.173.196.141 icmp,5666/tcp ready
ceph-osd/7 active idle 10 213.173.196.142 Unit is ready (3 OSD)
nrpe-physical/14 active idle 213.173.196.142 ready
ceph-osd/8 active idle 11 213.173.196.143 Unit is ready (3 OSD)
nrpe-physical/7 active idle 213.173.196.143 ready
ceph-osd/9* active idle 12 213.173.196.200 Unit is ready (3 OSD)
nrpe-physical/21 active idle 213.173.196.200 ready
ceph-osd/10 active idle 13 213.173.196.135 Unit is ready (3 OSD)
nrpe-physical/3 active idle 213.173.196.135 ready
ceph-osd/11 active idle 14 213.173.196.136 Unit is ready (3 OSD)
nrpe-physical/1 active idle 213.173.196.136 icmp,5666/tcp ready
Changed in charm-ceph-mon:
importance: Undecided → Medium
Changed in charm-ceph-osd:
importance: Undecided → Medium
tags: added: charm-upgrade
Changed in charm-ceph-mon:
status: Incomplete → New
Changed in charm-ceph-osd:
status: Incomplete → New
Changed in charm-ceph-mon:
importance: Medium → High
Changed in charm-ceph-osd:
importance: Medium → High
Changed in charm-ceph-mon:
status: Incomplete → Invalid
Changed in charm-ceph-osd:
milestone: none → 20.08
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
tags: added: openstack-upgrade
Workaround:
To check the bootstrapped-osds relation data, run:
juju run --unit ceph-mon/0 'relation-list -r $(relation-ids osd) | xargs -I{} sh -c '\''echo {}; relation-get -r $(relation-ids osd) - {}; echo'\'''
For the affected units it will show bootstrapped-osds: "0"
To fix it, first get the relation ID:
juju run --unit ceph-mon/0 'relation-ids osd' (for example osd:46)
Then run on all ceph-osd units (making sure the OSD count is of course the correct one):
juju run --application ceph-osd "relation-set -r RELATION bootstrapped-osds=3"
Check on the monitor (with the first command) and, if some OSDs still do not report the correct data, run manually per unit:
juju run --unit ceph-osd/8 "relation-set -r RELATION bootstrapped-osds=3"
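The per-unit fix above can be automated with a small loop. This is a hypothetical sketch, not from the original report; it assumes 12 ceph-osd units (ceph-osd/0..11) with 3 OSDs each, as shown in the status output, and the example relation id osd:46 (substitute the real output of `relation-ids osd`):

```shell
# Hypothetical automation of the per-unit workaround. Assumptions (from the
# status output in this report): units ceph-osd/0..11, 3 OSDs per unit, and
# relation id osd:46 (replace with the output of `relation-ids osd`).
RELATION="osd:46"
OSDS_PER_UNIT=3
JUJU=juju
# Fall back to a dry run that only prints the commands when juju is absent.
command -v "$JUJU" >/dev/null 2>&1 || JUJU="echo juju"

out=$(
  for i in $(seq 0 11); do
    $JUJU run --unit "ceph-osd/$i" \
      "relation-set -r $RELATION bootstrapped-osds=$OSDS_PER_UNIT"
  done
)
printf '%s\n' "$out"
```

After the loop finishes, re-run the relation-data check from the first workaround command to confirm each unit now reports the expected bootstrapped-osds value.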