All monitors crash after OSD host reboot

Bug #1682424 reported by George Shuklin
This bug affects 2 people
Affects: ceph (Ubuntu)
Status: Expired
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

All three monitors crashed repeatedly after a host with a few OSDs was rebooted.

Trace from monitor:
...
    -2> 2017-04-13 12:34:39.204681 7fd2857aa700 5 -- op tracker -- seq: 22, time: 2017-04-13 12:34:39.204681, event: osdmap:prepare_update, op: osd_boot(osd.21 booted 0 features 576460752032874495 v2022)
    -1> 2017-04-13 12:34:39.204693 7fd2857aa700 5 -- op tracker -- seq: 22, time: 2017-04-13 12:34:39.204692, event: osdmap:prepare_boot, op: osd_boot(osd.21 booted 0 features 576460752032874495 v2022)
     0> 2017-04-13 12:34:39.213266 7fd2857aa700 -1 mon/OSDMonitor.cc: In function 'bool OSDMonitor::prepare_boot(MonOpRequestRef)' thread 7fd2857aa700 time 2017-04-13 12:34:39.204709
mon/OSDMonitor.cc: 2105: FAILED assert(osdmap.get_uuid(from) == m->sb.osd_fsid)

 ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f6c17cc260]
 2: (OSDMonitor::prepare_boot(std::shared_ptr<MonOpRequest>)+0x1bd2) [0x55f6c1477e82]
 3: (OSDMonitor::prepare_update(std::shared_ptr<MonOpRequest>)+0x28b) [0x55f6c14aaa1b]
 4: (PaxosService::dispatch(std::shared_ptr<MonOpRequest>)+0xb4f) [0x55f6c145b84f]
 5: (PaxosService::C_RetryMessage::_finish(int)+0x58) [0x55f6c145ce38]
 6: (C_MonOp::finish(int)+0x82) [0x55f6c14250c2]
 7: (Context::complete(int)+0x9) [0x55f6c14241a9]
 8: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x1fb) [0x55f6c142a8db]
 9: (Paxos::finish_round()+0x287) [0x55f6c14512b7]
 10: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0xe19) [0x55f6c1452499]
 11: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x250) [0x55f6c1452cc0]
 12: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xa38) [0x55f6c141e5c8]
 13: (Monitor::_ms_dispatch(Message*)+0x554) [0x55f6c141edc4]
 14: (Monitor::ms_dispatch(Message*)+0x23) [0x55f6c1441e93]
 15: (DispatchQueue::entry()+0xf2b) [0x55f6c18c1fab]
 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x55f6c17b25ad]
 17: (()+0x76fa) [0x7fd28de0e6fa]
 18: (clone()+0x6d) [0x7fd28c0c8b5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Affected version: 10.2.6-0ubuntu0.16.04.1
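
For readers unfamiliar with the assertion in the trace: it compares the UUID the monitor's OSD map holds for the booting OSD (`osdmap.get_uuid(from)`) with the `osd_fsid` carried in the boot message's superblock (`m->sb.osd_fsid`), and the monitor aborts when they differ. The toy Python model below illustrates only that comparison. `ToyOSDMap`, `check_boot`, and the UUID values are hypothetical names and data chosen for illustration, not Ceph code, and the sketch makes no claim about why the two values differed on this cluster.

```python
# Toy model of the invariant named in the failed assertion.
# Everything here is hypothetical and for illustration; it is not Ceph code.
import uuid


class ToyOSDMap:
    """Maps an OSD id to the UUID the monitor has recorded for it."""

    def __init__(self):
        self._uuids = {}

    def set_uuid(self, osd_id, osd_uuid):
        self._uuids[osd_id] = osd_uuid

    def get_uuid(self, osd_id):
        return self._uuids[osd_id]


def check_boot(osdmap, osd_id, superblock_fsid):
    """Mimic the assertion: the recorded UUID must equal the booting OSD's fsid."""
    assert osdmap.get_uuid(osd_id) == superblock_fsid, (
        f"osd.{osd_id}: map uuid differs from superblock osd_fsid"
    )
    return True


# Made-up UUIDs standing in for the recorded and reported identities.
recorded = uuid.UUID(int=1)
different = uuid.UUID(int=2)

osdmap = ToyOSDMap()
osdmap.set_uuid(21, recorded)

# Same identity: the check passes.
matching_ok = check_boot(osdmap, 21, recorded)

# Different identity: the check fails, which in the monitor means an abort.
try:
    check_boot(osdmap, 21, different)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

In this model a mismatch raises `AssertionError`. In the trace above, the equivalent failure takes down the monitor process itself, which would be consistent with all three monitors crashing each time the same boot message was retried.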

George Shuklin (george-shuklin) wrote :
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
James Page (james-page) wrote :

The upstream ceph bug was marked as "Can't reproduce".

@george-shuklin - do you still see this issue? 10.2.6 is quite a few point releases behind the current package version in 16.04.

Changed in ceph (Ubuntu):
status: Confirmed → Incomplete
importance: Undecided → Low
George Shuklin (george-shuklin) wrote :

If only you had asked me about this a year earlier. Just 6 months ago I gave up hope that this bug would make any progress, and I tore down the lab installation that I had kept specifically to reproduce this crash.

Unfortunately, no, I don't have it anymore.

Launchpad Janitor (janitor) wrote :

[Expired for ceph (Ubuntu) because there has been no activity for 60 days.]

Changed in ceph (Ubuntu):
status: Incomplete → Expired