[SRU] Active scrub blocks upmap balancer

Bug #1911900 reported by Ponnuvel Palaniyappan on 2021-01-15
This bug affects 1 person
Affects                  Status  Importance  Assigned to  Milestone
Ubuntu Cloud Archive             Medium      Unassigned
  Ussuri                         Medium      Unassigned
ceph (Ubuntu)                    Medium      Unassigned
  Bionic                         Medium      Unassigned
  Focal                          Medium      Unassigned
  Groovy                         Medium      Unassigned
  Hirsute                        Medium      Unassigned

Bug Description

[Impact]

When scrubs are in progress, the balancer stops because of the bug [0], and the logs show:

<timestamp> calc_pg_upmaps abort due to max <= 0
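To check whether a cluster is hitting this bug, the logs can be searched for the message above. This is a minimal sketch, assuming a package-based install where the balancer (which runs inside ceph-mgr) logs to /var/log/ceph/ceph-mgr.<id>.log; count_upmap_aborts is a hypothetical helper name, not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of the balancer abort message.
# Assumes package-based logging to /var/log/ceph/ceph-mgr.<id>.log;
# containerised (cephadm) deployments may log elsewhere.

ABORT_MSG='calc_pg_upmaps abort due to max <= 0'

# Print the number of lines containing ABORT_MSG across the given files.
# Missing files are ignored; with no file arguments, stdin is read instead.
count_upmap_aborts() {
    # -F matches the message as a fixed string rather than a regex.
    # "|| true" keeps the exit status 0 when grep prints a count of 0.
    cat -- "$@" 2>/dev/null | grep -cF -- "$ABORT_MSG" || true
}
```

For example, `count_upmap_aborts /var/log/ceph/ceph-mgr.*.log` prints 0 when none of the logs contain the abort message.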

Deep-scrubs are typically run in maintenance windows and can take a few hours. If balancing is paused for that whole period, it only starts once the deep-scrub has finished, which can then affect client I/O performance.

This bug was introduced in Octopus, so the upstream fix for [0] only needs to be backported to Octopus. It has been fixed in the upstream master branch [1].

[Test Case]

In an Octopus Ceph cluster that holds some data (enough for balancing to be noticeable), take down one or more OSDs to introduce "unbalanced" objects.
Make sure the Ceph balancer module is enabled and active (which should be the default in Octopus).
Perform some I/O so that data goes to the remaining OSDs.

Then start deep-scrubbing and re-add the OSDs that were taken down so that balancing starts. Without the fix, the balancer stops while the deep-scrubs are running and the logs show "calc_pg_upmaps abort due to max <= 0".
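The test case steps could be scripted roughly as follows. This is only a sketch: POOL and OSD_ID are hypothetical placeholders, the OSD is "taken down" by marking it out with `ceph osd out` (one of several ways to do this), and deep-scrubs are requested per OSD with `ceph osd deep-scrub`.

```shell
#!/bin/sh
# Sketch of the [Test Case]. POOL and OSD_ID are hypothetical placeholders:
# use an existing test pool and an OSD that can safely be marked out.
POOL=${POOL:-testpool}
OSD_ID=${OSD_ID:-0}

# Steps 1-3: mark an OSD out, confirm the balancer is active, then write
# data so that it lands only on the remaining OSDs.
prepare_imbalance() {
    ceph osd out "$OSD_ID" || return 1
    ceph balancer status || return 1   # expect "active": true and mode "upmap"
    rados bench -p "$POOL" 60 write --no-cleanup || return 1
}

# Step 4: ask every OSD to deep-scrub its PGs, then mark the OSD back in so
# that the balancer has work to do while the scrubs are running.
scrub_then_rebalance() {
    for osd in $(ceph osd ls); do
        ceph osd deep-scrub "$osd" || return 1
    done
    ceph osd in "$OSD_ID"
}
```

Run prepare_imbalance first, let the cluster settle, then run scrub_then_rebalance and watch `ceph balancer status` and the logs.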

[Regression potential]

Low. This fixes a regression and restores the behaviour of previously correct code.

If anything goes wrong, the balancer module might not function properly, leaving the cluster unbalanced and potentially requiring manual balancing.

[Other Info]

The fix has been accepted upstream and backported to Octopus; see [0] and [1].

[0] https://tracker.ceph.com/issues/48309
[1] https://github.com/ceph/ceph/pull/38337

For Hirsute, James is working on a snapshot of Pacific and that should include the fix for this.

description: updated
tags: added: seg sts
Changed in cloud-archive:
status: New → In Progress
summary: - Active scrub blocks upmap balancer
+ [SRU] Active scrub blocks upmap balancer
description: updated
no longer affects: ceph (Ubuntu Hirsute)
no longer affects: ceph (Ubuntu Groovy)
no longer affects: ceph (Ubuntu Focal)
affects: ceph (Ubuntu) → focal (Ubuntu)
affects: focal (Ubuntu) → ceph (Ubuntu)
tags: added: sts-sru-needed
Changed in cloud-archive:
importance: Undecided → Medium
Changed in ceph (Ubuntu Bionic):
importance: Undecided → Medium
Changed in ceph (Ubuntu Focal):
importance: Undecided → Medium
Changed in ceph (Ubuntu Groovy):
importance: Undecided → Medium
Changed in ceph (Ubuntu Hirsute):
importance: Undecided → Medium

Attaching debdiff for Focal/Octopus.

Changed in ceph (Ubuntu Bionic):
status: New → In Progress

The attachment "focal-octopus.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
description: updated
Changed in ceph (Ubuntu Hirsute):
status: New → Triaged
Changed in ceph (Ubuntu Groovy):
status: New → Triaged
Changed in ceph (Ubuntu Focal):
status: New → Triaged
Changed in cloud-archive:
status: In Progress → Invalid

Thanks, Corey, for catching the unintended change in the previous debdiff. I've corrected it and attached a new debdiff here.

Corey Bryant (corey.bryant) wrote :

Thanks very much Ponnuvel. I've pushed these changes to the groovy and focal package branches. The builds are failing for unrelated 15.2.8 point release updates and I've asked Chris to take a look at those. Once the package gets uploaded to groovy/focal we'll need to subscribe ubuntu-sru to this bug.

Hello Ponnuvel, or anyone else affected,

Accepted ceph into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.8-0ubuntu0.20.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Groovy):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-groovy
Changed in ceph (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed-focal
Łukasz Zemczak (sil2100) wrote :

Hello Ponnuvel, or anyone else affected,

Accepted ceph into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.8-0ubuntu0.20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Hello Ponnuvel, or anyone else affected,

Accepted ceph into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ussuri-proposed
  sudo apt-get update

Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ussuri-needed to verification-ussuri-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ussuri-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ussuri-needed

Focal packages for ceph 15.2.8 are installed and verified. I have attached the steps used.

tags: added: verification-done-focal
removed: verification-needed-focal

Ussuri packages are installed and verified (15.2.8-0ubuntu0.20.04.1~cloud0).

In the tests for focal and ussuri, rados bench was used to drive some I/O, and randomly chosen PGs were repeatedly deep-scrubbed in a loop (ceph pg deep-scrub <pg>) to keep scrubbing running at all times.
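A scrub loop like the one described above might look like this. It is a sketch, not the script attached to the bug: it assumes the plain-text output of `ceph pg ls` has the PG id in its first column, and the helper names list_pgids, pick_random and scrub_loop are hypothetical.

```shell
#!/bin/sh
# Sketch: keep deep-scrubs running by scrubbing a random PG in a loop.
# Assumes the plain-text `ceph pg ls` output has the PG id (e.g. 2.1f) in
# its first column; header and note lines are dropped by the regex below.

# Read `ceph pg ls` output on stdin and print only the PG ids.
list_pgids() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }'
}

# Print one randomly chosen line from stdin (nothing if stdin is empty).
pick_random() {
    awk 'BEGIN { srand() }
         { line[NR] = $0 }
         END { if (NR > 0) print line[int(rand() * NR) + 1] }'
}

# Deep-scrub a random PG COUNT times (0 = forever), sleeping INTERVAL
# seconds between requests.
scrub_loop() {
    count=${1:-0}
    interval=${2:-30}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        pg=$(ceph pg ls | list_pgids | pick_random)
        if [ -n "$pg" ]; then
            ceph pg deep-scrub "$pg"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `scrub_loop 0 30` requests a deep-scrub of a random PG every 30 seconds until interrupted, while rados bench drives I/O in another terminal.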

tags: added: verification-ussuri-done
removed: verification-ussuri-needed
Corey Bryant (corey.bryant) wrote :

@Ponnuvel, by any chance would you be able to verify this on groovy?

Corey Bryant (corey.bryant) wrote :

This is fixed in hirsute by 16.1.0.

Changed in ceph (Ubuntu Hirsute):
status: Triaged → Fix Released

The same steps were followed to install and test the groovy ceph packages (15.2.8-0ubuntu0.20.10.1).

tags: added: verification-done-groovy
removed: verification-needed-groovy
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 15.2.8-0ubuntu0.20.10.1

---------------
ceph (15.2.8-0ubuntu0.20.10.1) groovy; urgency=medium

  [ Chris MacNaughton ]
  * New upstream point release (LP: #1912355):
    - d/cephadm.install, d/librgw-dev.install, d/librgw2.install: Upstream
      point release removes files that were being installed.
    - d/rules: Remove installation of /etc/sudoers.d/cephadm as it is
      removed upstream.
  * d/p/disable-log-slow-requests.patch: Remove logging every slow request
    details to monitors (LP: #1909162).

  [ Ponnuvel Palaniyappan ]
  * d/p/bug1911900-fix-scrub-blocking-balancer.patch:
    Prevent scrub from stopping balancer (LP: #1911900)

 -- Ponnuvel Palaniyappan <email address hidden> Thu, 04 Feb 2021 11:18:13 +0000

Changed in ceph (Ubuntu Groovy):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 15.2.8-0ubuntu0.20.04.1

---------------
ceph (15.2.8-0ubuntu0.20.04.1) focal; urgency=medium

  [ Chris MacNaughton ]
  * New upstream point release (LP: #1912355):
    - d/rules,cephadm.install,librgw-dev.install,librgw2.install: Drop files
      no longer included in point release.
  * d/p/disable-log-slow-requests.patch: Remove logging every slow request
    details to monitors (LP: #1909162).

  [ Ponnuvel Palaniyappan ]
  * d/p/bug1911900-fix-scrub-blocking-balancer.patch:
    Prevent scrub from stopping balancer (LP: #1911900)

 -- Ponnuvel Palaniyappan <email address hidden> Thu, 04 Feb 2021 11:28:51 +0000

Changed in ceph (Ubuntu Focal):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

This bug was fixed in the package ceph - 15.2.8-0ubuntu0.20.04.1~cloud0
---------------

 ceph (15.2.8-0ubuntu0.20.04.1~cloud0) bionic-ussuri; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 ceph (15.2.8-0ubuntu0.20.04.1) focal; urgency=medium
 .
   [ Chris MacNaughton ]
   * New upstream point release (LP: #1912355):
     - d/rules,cephadm.install,librgw-dev.install,librgw2.install: Drop files
       no longer included in point release.
   * d/p/disable-log-slow-requests.patch: Remove logging every slow request
     details to monitors (LP: #1909162).
 .
   [ Ponnuvel Palaniyappan ]
   * d/p/bug1911900-fix-scrub-blocking-balancer.patch:
     Prevent scrub from stopping balancer (LP: #1911900)

Edward Hope-Morley (hopem) wrote :

Hi Pon, if you still need a Bionic SRU for this one, can you attach a debdiff for bionic? Thanks.

Changed in ceph (Ubuntu Bionic):
status: In Progress → New
Changed in cloud-archive:
assignee: Ponnuvel Palaniyappan (pponnuvel) → nobody

The regression that this patch fixes was not introduced in (or backported to) Luminous upstream, so it does not affect Bionic (also confirmed by checking the latest Bionic source in Ubuntu).

Changed in ceph (Ubuntu Bionic):
status: New → Invalid