mgr: relax "pending_service_map.epoch > service_map.epoch" assert

Bug #2019293 reported by dongdong tao
This bug affects 1 person

Affects          Status         Importance   Assigned to
ceph (Ubuntu)    Invalid        Undecided    Unassigned
  Focal          Fix Released   Undecided    Unassigned

Bug Description

[Impact]

This issue has been observed with the Ceph Octopus (15.2.x) packages in Ubuntu.
During mgr fail-over, the "pending_service_map.epoch > service_map.epoch" assert is triggered if the newly active mgr unexpectedly receives two consecutive service map updates.
The upstream fix relaxes the assert condition so that the new active mgr tolerates multiple service map updates during fail-over.
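The failure mode can be illustrated with a small Python model. This is a sketch, not Ceph's C++ code: ServiceMap, MgrModel and got_service_map are hypothetical names, the model tracks only epochs, and the relaxed branch (adopt any incoming map that is not older than the pending one) is an assumed shape of the upstream fix that should be checked against the actual PR diff.

```python
from dataclasses import dataclass


@dataclass
class ServiceMap:
    # Hypothetical stand-in for Ceph's ServiceMap; only the epoch matters here.
    epoch: int


class MgrModel:
    """Toy model of a newly active mgr receiving service maps from the monitor."""

    def __init__(self, relaxed: bool):
        self.relaxed = relaxed
        self.pending = ServiceMap(epoch=0)  # epoch 0: no map received yet

    def got_service_map(self, service_map: ServiceMap) -> None:
        if self.pending.epoch == 0:
            # First map after taking over: adopt it.
            self.pending = ServiceMap(service_map.epoch)
        elif self.relaxed and self.pending.epoch <= service_map.epoch:
            # Relaxed check (assumed shape of the fix): a further map arrived
            # during fail-over that is not older than ours, so adopt it
            # instead of treating it as fatal.
            self.pending = ServiceMap(service_map.epoch)
        elif self.pending.epoch <= service_map.epoch:
            # Strict check: mirrors the original
            # "pending_service_map.epoch > service_map.epoch" assert.
            raise AssertionError(
                f"pending epoch {self.pending.epoch} is not > map epoch {service_map.epoch}"
            )


if __name__ == "__main__":
    # Two consecutive service map updates, as in the fail-over scenario.
    for relaxed in (False, True):
        mgr = MgrModel(relaxed=relaxed)
        mgr.got_service_map(ServiceMap(epoch=10))
        try:
            mgr.got_service_map(ServiceMap(epoch=11))
            print(f"relaxed={relaxed}: ok, pending epoch {mgr.pending.epoch}")
        except AssertionError as exc:
            print(f"relaxed={relaxed}: crash ({exc})")
```

With relaxed=False the second map raises AssertionError, matching the crash; with relaxed=True the model adopts epoch 11 and carries on. A map older than the pending one is ignored in both modes.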

[Test Case]

1. Deploy a Ceph 15.2.16 cluster.

2. Upgrade it to 15.2.17 and inject multiple service maps into the monitor.

3. Stop the active mgr.

4. Observe that the newly active mgr hits the assert.

[Potential Regression]
The new active mgr may have to process multiple service maps, which could slow down fail-over slightly; this is still much better than a crash.

[Other info]

Upstream bug tracker: https://tracker.ceph.com/issues/51835
Upstream PR: https://github.com/ceph/ceph/pull/45984
This fix needs to be backported to Octopus.

dongdong tao (taodd)
description: updated
dongdong tao (taodd) wrote :
description: updated
dongdong tao (taodd)
tags: added: sts
tags: added: sts-sru-needed
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "focal debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Lucas Kanashiro (lucaskanashiro) wrote :

Thanks for the patch, dongdong. I'd like to ask you to add some DEP-3 headers [1] to your patch, such as Origin (link to the upstream commit), Bug-Ubuntu (link to this bug), and Bug (link to the upstream bug). Could you please add them?

[1] https://dep-team.pages.debian.net/deps/dep3/
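As an illustration of the requested headers, the snippet below holds a hypothetical DEP-3 header for this backport plus a simplified Python check that the Origin, Bug and Bug-Ubuntu fields are present. The Bug-Ubuntu URL follows Launchpad's usual bugs.launchpad.net/bugs/<number> form, the PR URL stands in for Origin only because the upstream commit hash isn't given here, and the parser is a rough approximation of DEP-3 (for example, it matches field names case-sensitively), not a full implementation.

```python
# Fields requested in the review above.
REQUESTED_FIELDS = ("Origin", "Bug", "Bug-Ubuntu")

# Hypothetical header. The review asked for the upstream *commit* URL in
# Origin; the PR URL is only a placeholder here.
EXAMPLE_HEADER = """\
Description: mgr: relax "pending_service_map.epoch > service_map.epoch" assert
Origin: upstream, https://github.com/ceph/ceph/pull/45984
Bug: https://tracker.ceph.com/issues/51835
Bug-Ubuntu: https://bugs.launchpad.net/bugs/2019293
---
"""


def dep3_fields(patch_text: str) -> dict:
    """Collect "Field: value" header lines from the top of a patch.

    Simplified: stops at the first blank line or the first line starting
    with "---"; lines starting with whitespace continue the previous field.
    """
    fields = {}
    last = None
    for line in patch_text.splitlines():
        if not line.strip() or line.startswith("---"):
            break
        if line[0] in " \t" and last is not None:
            fields[last] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    return fields


def missing_fields(patch_text: str) -> list:
    """Return the requested fields that the patch header lacks, in order."""
    present = dep3_fields(patch_text)
    return [name for name in REQUESTED_FIELDS if name not in present]


if __name__ == "__main__":
    print(missing_fields(EXAMPLE_HEADER))  # prints []
```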

Dave Jones (waveform) wrote :

A question about the patch: it doesn't apply cleanly against the current focal version of ceph. Specifically, the parts that add the patch to the packaging (the changes directly under debian/) are fine, but there are other changes under src/test which appear to re-create pre-existing files (and which wouldn't be acceptable anyway, as they would touch the orig tarball content).

In addition to the changes requested by Lucas, could you upload another version without the changes under src/test?

Dave Jones (waveform) wrote :

Unsubscribing ~ubuntu-sponsors; please re-subscribe once comments 3 and 4 above are addressed.

dongdong tao (taodd)
tags: removed: patch
dongdong tao (taodd) wrote :

New debdiff uploaded; please take a look.

James Page (james-page) wrote :

Upload made to UNAPPROVED queue for SRU team review in focal.

Changed in ceph (Ubuntu):
status: New → Invalid
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello dongdong, or anyone else affected,

Accepted ceph into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.17-0ubuntu0.20.04.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed verification-needed-focal
dongdong tao (taodd) wrote :

I've tested the proposed package and it fixes the issue.

Test steps:
1. Deploy a Ceph 15.2.16 cluster.

2. Upgrade it to 15.2.17 and inject multiple service maps into the monitor.

3. Stop the active mgr.

4. Observe that the newly active mgr no longer hits the assert.

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Andreas Hasenack (ahasenack) wrote :

This verification is ok, but the SRU is blocked on the ambiguous verification of #1996010.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 15.2.17-0ubuntu0.20.04.5

---------------
ceph (15.2.17-0ubuntu0.20.04.5) focal; urgency=medium

  * d/p/bluestore-leak-fix.patch: Fix leak in bluestore cache (LP: #1996010).
  * d/p/bail-after-error.patch: Bail after exception in mon (LP: #1969000).
  * d/p/relax-epoch.patch: Relax epoch-based assertions (LP: #2019293).

 -- Luciano Lo Giudice <email address hidden>  Fri, 22 Sep 2023 09:21:41 +0100

Changed in ceph (Ubuntu Focal):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
