[cold-boot] ceph-osd processes not running
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | New | Undecided | Unassigned |
Bug Description
After a full cloud outage, all ceph-osd processes failed to start; note that the ceph-mons were running in LXD containers on the same infrastructure, so it's possible that they were not running, or not in quorum, during the attempted start of the OSD daemons.
Unfortunately it's a bit hard to tell why they did not start:
2020-10-24 22:17:29.302775 7f4971114fc0 1 osd.9 15943 check_osdmap_
2020-10-24 22:17:34.851216 7f4971114fc0 0 osd.9 15943 load_pgs
2020-10-24 22:17:38.684364 7f4971114fc0 0 osd.9 15943 load_pgs opened 147 pgs
2020-10-24 22:17:38.684619 7f4971114fc0 0 osd.9 15943 using weightedpriority op queue with priority op cut off at 64.
2020-10-24 22:17:38.685875 7f4971114fc0 -1 osd.9 15943 log_to_monitors {default=true}
That's the last log message for each OSD.
The ceph-mons did not start until the following day:
2020-10-25 18:38:03.830361 7fac531900c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2020-10-25 18:38:03.830397 7fac531900c0 0 ceph version 12.2.13 (584a20eb0237c6
So this just looks like a startup race in a hyperconverged deployment architecture.
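If the race hypothesis is right, one possible mitigation (an untested sketch, not a verified fix; it assumes a standard package install with a `ceph-osd@.service` unit and the `ceph` CLI plus a usable keyring on the OSD hosts) would be a systemd drop-in that delays OSD start until the monitors answer a quorum query:

```ini
# /etc/systemd/system/ceph-osd@.service.d/wait-for-quorum.conf
# Sketch only: block OSD startup until the mons respond to a quorum query,
# retrying for up to ~5 minutes (30 attempts x 10s) before giving up.
[Service]
ExecStartPre=/bin/sh -c 'for i in $(seq 1 30); do timeout 10 ceph quorum_status >/dev/null 2>&1 && exit 0; sleep 10; done; exit 1'
```

After `systemctl daemon-reload`, a cold boot would have the OSDs wait for the mons in their LXD containers to form quorum rather than starting against an unreachable cluster. The trade-off is that if quorum never forms within the retry window, the units still fail and need a restart, so this narrows the race window rather than eliminating it.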
tags: added: cold-boot