[cold-boot] ceph-osd processes not running

Bug #1902028 reported by James Page
This bug affects 1 person
Affects: Ceph OSD Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After a full cloud outage, all ceph-osd processes failed to start up. Note that the ceph-mons were running in LXD containers on the same infrastructure, so it's possible that they were not running or not in quorum when the OSD daemons attempted to start.

Unfortunately it's a bit hard to tell why they did not start:

2020-10-24 22:17:29.302775 7f4971114fc0 1 osd.9 15943 check_osdmap_features require_osd_release 0 -> ^L
2020-10-24 22:17:34.851216 7f4971114fc0 0 osd.9 15943 load_pgs
2020-10-24 22:17:38.684364 7f4971114fc0 0 osd.9 15943 load_pgs opened 147 pgs
2020-10-24 22:17:38.684619 7f4971114fc0 0 osd.9 15943 using weightedpriority op queue with priority op cut off at 64.
2020-10-24 22:17:38.685875 7f4971114fc0 -1 osd.9 15943 log_to_monitors {default=true}

That's the last log message for each OSD.

ceph-mons did not start until the following day:

2020-10-25 18:38:03.830361 7fac531900c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2020-10-25 18:38:03.830397 7fac531900c0 0 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable), process ceph-mon, pid 255
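
To put a number on the gap, here is a minimal sketch that subtracts the last osd.9 log timestamp from the first ceph-mon log timestamp. Both values are copied from the excerpts in this report; the function and constant names are illustrative only.

```python
from datetime import datetime

# Timestamps copied verbatim from the log excerpts in this report.
OSD_LAST_LOG = "2020-10-24 22:17:38.685875"   # osd.9: log_to_monitors {default=true}
MON_FIRST_LOG = "2020-10-25 18:38:03.830361"  # ceph-mon: set uid:gid to 64045:64045

# Format of the timestamp prefix used in these Ceph log lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def log_gap(earlier: str, later: str):
    """Return the time elapsed between two log timestamps as a timedelta."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)


gap = log_gap(OSD_LAST_LOG, MON_FIRST_LOG)
print(gap)  # 20:20:25.144486
```

That is a little over 20 hours between the last OSD log line and the first monitor start line.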

So this just looks like a startup race in a hyperconverged deployment architecture: the OSDs attempted to start before any ceph-mon was up.

Tags: cold-boot