Active ceph-mgr crashes on receiving report from a non-active mgr

Bug #1955345 reported by Ponnuvel Palaniyappan
This bug affects 1 person
Affects               Status        Importance  Assigned to            Milestone
Ubuntu Cloud Archive  Fix Released  High        Ponnuvel Palaniyappan
  Ussuri              Fix Released  Undecided   Unassigned
ceph (Ubuntu)         Fix Released  High        Ponnuvel Palaniyappan
  Focal               Fix Released  High        Ponnuvel Palaniyappan

Bug Description

[Impact]
When the active ceph-mgr crashes, a standby ceph-mgr takes over and becomes
the active mgr. The new active mgr can hit the same issue and crash in turn, and because systemd restarts the previously crashed ceph-mgr, the cycle can continue indefinitely.

This can affect cluster stability and usability, because ceph-mgr handles a number of essential operations (modules that control or change Ceph cluster behaviour, metrics, etc.).

[Test Plan]
Deploy and operate a Ceph cluster normally.
Increase the log level of mgr to 20.
Observe that MMgrReport messages sent from non-active mgrs are ignored (no crash).
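
The steps above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes the `ceph` CLI is available with admin credentials and that the centralized config option `debug_mgr` controls the mgr log level. The function names are hypothetical. The exact log message emitted for an ignored report is not quoted in this bug, so the search pattern is left to the caller.

```python
import re
import subprocess
from typing import Iterable, List


def debug_mgr_command(level: int = 20) -> List[str]:
    """Build the CLI call that sets the mgr debug level.

    Assumption: the log level is controlled by the `debug_mgr` option.
    """
    return ["ceph", "config", "set", "mgr", "debug_mgr", str(level)]


def raise_mgr_log_level(level: int = 20) -> None:
    """Run the command against a live cluster; raises if the CLI call fails."""
    subprocess.run(debug_mgr_command(level), check=True)


def count_matches(log_lines: Iterable[str], pattern: str) -> int:
    """Count log lines matching a caller-supplied regular expression."""
    rx = re.compile(pattern)
    return sum(1 for line in log_lines if rx.search(line))
```

After calling `raise_mgr_log_level()`, the active mgr's log lines can be passed to `count_matches` with whatever message the patched build logs for ignored reports, alongside a check that the active mgr did not change during the run.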

[Where problems could occur]
The fix might not resolve the issue, in which case the mgr would continue to crash as before.
The fix might also cause reports from active mgrs to be ignored incorrectly.

[Other Info]
Upstream main bug: https://tracker.ceph.com/issues/48022
Octopus backport PR: https://github.com/ceph/ceph/pull/43861
Octopus backport bug: https://tracker.ceph.com/issues/53198

This has already been fixed and is available in Pacific,
so a backport was needed only for Octopus.

Tags: sts
Changed in ceph (Ubuntu):
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
status: New → In Progress
description: updated
Changed in ceph (Ubuntu Focal):
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
status: New → In Progress
Changed in ceph (Ubuntu):
importance: Undecided → High
Changed in ceph (Ubuntu Focal):
importance: Undecided → High
tags: added: sts
Ponnuvel Palaniyappan (pponnuvel) wrote :

Attaching debdiff for Focal.

Changed in cloud-archive:
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
assignee: Ponnuvel Palaniyappan (pponnuvel) → nobody
importance: Undecided → High
status: New → In Progress
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
Brian Murray (brian-murray) wrote : Proposed package upload rejected

An upload of ceph to focal-proposed has been rejected from the upload queue for the following reason: "The patch d/p/bug1955345.patch doesn't actually appear in the debdiff or d/p/series file.".

Ponnuvel Palaniyappan (pponnuvel) wrote :

This has been superseded by https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1964802
which tracks a newer Octopus release (15.2.16) that also contains the commits fixing this.

Changed in ceph (Ubuntu):
status: In Progress → Fix Released
Changed in ceph (Ubuntu Focal):
status: In Progress → Fix Released
Changed in cloud-archive:
status: In Progress → Fix Released