[SRU] bluefs doesn't compact log file

Bug #1914911 reported by dongdong tao
This bug affects 1 person
Affects               Status        Importance  Assigned to            Milestone
Ubuntu Cloud Archive  Fix Released  Undecided   Unassigned
  Queens              Fix Released  High        Unassigned
ceph (Ubuntu)         Fix Released  Undecided   Ponnuvel Palaniyappan
  Bionic              Fix Released  High        Ponnuvel Palaniyappan

Bug Description

[Impact]

For certain types of workload, bluefs might never compact its log file, which causes the log file to grow slowly to a huge size (in some cases bigger than 1TB on a 1.5T device).

The bluefs perf counters captured when this issue happened give more detail, e.g.:
"bluefs": {
    "gift_bytes": 811748818944,
    "reclaim_bytes": 0,
    "db_total_bytes": 888564350976,
    "db_used_bytes": 867311747072,
    "wal_total_bytes": 0,
    "wal_used_bytes": 0,
    "slow_total_bytes": 0,
    "slow_used_bytes": 0,
    "num_files": 11,
    "log_bytes": 866545131520,
    "log_compactions": 0,
    "logged_bytes": 866542977024,
    "files_written_wal": 2,
    "files_written_sst": 3,
    "bytes_written_wal": 32424281934,
    "bytes_written_sst": 25382201
}

This bug could eventually cause the OSD to crash and then fail to restart, because it cannot get through the bluefs replay phase at boot time.
We might see the log message below when trying to restart the OSD:
bluefs mount failed to replay log: (5) Input/output error

As the counters show, log_compactions is 0, which means the log has never been compacted, and the log file size (log_bytes) is already 800+G. After compaction, the log file size should be reduced to around 1G.
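
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the fix: the function name check_bluefs_log and the 1 GiB threshold are assumptions (the threshold is taken from the "around 1G" figure above), and the sample values are copied from the counters in this report. On a live system the same dict could come from the JSON printed by "ceph daemon osd.<id> perf dump".

```python
import json

GIB = 1024 ** 3


def check_bluefs_log(perf_dump, max_log_bytes=GIB):
    """Summarise the bluefs log state from a perf dump dict.

    max_log_bytes is an assumed threshold, not a Ceph setting.
    """
    bluefs = perf_dump["bluefs"]
    log_bytes = bluefs["log_bytes"]
    db_used = bluefs["db_used_bytes"]
    never_compacted = bluefs["log_compactions"] == 0
    return {
        "log_gib": log_bytes / GIB,
        # Share of the used DB space that is taken by the log itself.
        "log_share_of_db_used": log_bytes / db_used if db_used else 0.0,
        "never_compacted": never_compacted,
        "suspect": never_compacted and log_bytes > max_log_bytes,
    }


# Subset of the counters quoted in this bug report.
sample = json.loads("""
{
  "bluefs": {
    "db_total_bytes": 888564350976,
    "db_used_bytes": 867311747072,
    "log_bytes": 866545131520,
    "log_compactions": 0
  }
}
""")

report = check_bluefs_log(sample)
print(report)
```

With the counters above, the log accounts for almost all of db_used_bytes, which is what makes this state easy to spot.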

[Test Case]

Deploy a test ceph cluster (Luminous 12.2.13, which has the bug) and drive I/O. Compaction doesn't get triggered often when most of the I/O is reads, so fill up the cluster initially with lots of writes and then switch to a read-heavy workload (no writes). The problem should then occur. Small OSDs are OK, as we are only interested in filling up the OSD and growing the bluefs log.
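
One possible way to drive the two phases is rados bench. This report does not say which tool was actually used, so the sketch below is an assumption: the pool name "testpool" and both durations are hypothetical, and the code only builds and prints the command lines rather than running them.

```python
import shlex


def rados_bench_cmd(pool, seconds, mode, keep_objects=False):
    """Build a 'rados bench' command line as an argv list."""
    if mode not in ("write", "seq", "rand"):
        raise ValueError("mode must be write, seq or rand")
    cmd = ["rados", "bench", "-p", pool, str(seconds), mode]
    if keep_objects:
        if mode != "write":
            raise ValueError("--no-cleanup only applies to write runs")
        # Keep the written objects so the read phase has data to read.
        cmd.append("--no-cleanup")
    return cmd


def reproduction_plan(pool="testpool", fill_seconds=3600, read_seconds=86400):
    """Phase 1 fills the pool with writes, phase 2 issues reads only."""
    return [
        rados_bench_cmd(pool, fill_seconds, "write", keep_objects=True),
        rados_bench_cmd(pool, read_seconds, "rand"),
    ]


for cmd in reproduction_plan():
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

While the read phase runs, log_bytes and log_compactions in the bluefs perf counters show whether the log keeps growing without being compacted.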

[Where problems could occur]

This fix has been part of all upstream releases since Mimic, so it has had quite a good amount of "runtime".
The changes ensure that compaction happens more often, which is not expected to cause any problems.
I can't see any real problems.

[Other Info]
 - It's only needed for Luminous (Bionic). All newer releases already have this fix.
 - Upstream master PR: https://github.com/ceph/ceph/pull/17354
 - Upstream Luminous PR: https://github.com/ceph/ceph/pull/34876/files

Changed in ceph (Ubuntu):
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
Changed in ceph (Ubuntu Bionic):
status: New → In Progress
Changed in ceph (Ubuntu Bionic):
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
tags: added: sts-sru-needed
summary: - bluefs doesn't compact log file
+ [SRU] bluefs doesn't compact log file
description: updated
tags: added: seg
description: updated
Ponnuvel Palaniyappan (pponnuvel) wrote :

Attached debdiff for bionic (fixed the previous patch which had additional unnecessary changes).

James Page (james-page) wrote :

Hi Pon

The patch in #3 appears to be malformed in some way:

patching file debian/changelog
patching file debian/patches/bug1914911.patch
patch: **** malformed patch at line 67: diff -Nru ceph-12.2.13/debian/patches/series ceph-12.2.13/debian/patches/series

Also, it would make sponsoring slightly easier if you left the changelog entry as UNRELEASED.

Also, please feel free to raise merge proposals against the git repository rather than attaching patches to bug reports!

https://code.launchpad.net/~ubuntu-server-dev/ubuntu/+source/ceph/+git/ceph

Changed in ceph (Ubuntu Bionic):
importance: Undecided → High
Changed in ceph (Ubuntu):
status: New → Invalid
Changed in cloud-archive:
status: New → Invalid
Ponnuvel Palaniyappan (pponnuvel) wrote :

Hi James,

I manually edited the debdiff because running debdiff <old-dsc> <new-dsc> produced a lot of cruft that isn't relevant to the patch (the manual edit probably caused the issue). Perhaps the cruft is not a problem after all?

I created the patch again and attached it here (it does contain the cruft I noted) and also left the release as UNRELEASED. Please verify if this is OK.

I wasn't aware that I could propose changes against git repos for SRUs. I can try that route next time :)

Thanks!

James Page (james-page) wrote :

Great thanks!

James Page (james-page) wrote :

I've committed this to the git repo - test packages soon.

James Page (james-page) wrote :

Updated to bionic UNAPPROVED for SRU team review.

Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello dongdong, or anyone else affected,

Accepted ceph into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/12.2.13-0ubuntu0.18.04.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
dongdong tao (taodd) wrote :

Verified the bionic-proposed ceph package; I can confirm that bluefs compaction is performed even with a very low workload.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Corey Bryant (corey.bryant) wrote :

Hello dongdong, or anyone else affected,

Accepted ceph into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 12.2.13-0ubuntu0.18.04.7

---------------
ceph (12.2.13-0ubuntu0.18.04.7) bionic; urgency=medium

  * d/p/bug1914911.patch: cherry pick fix to ensure more regular compaction
    of the bluefs log (LP: #1914911).

 -- Ponnuvel Palaniyappan <email address hidden> Fri, 26 Mar 2021 09:35:30 +0000

Changed in ceph (Ubuntu Bionic):
status: Fix Committed → Fix Released
Ponnuvel Palaniyappan (pponnuvel) wrote :

Verified that log compactions occur in bluefs with read/write I/O. Attaching test notes.

tags: added: verification-queens-done
removed: verification-queens-needed
James Page (james-page) wrote :

UCA/queens update already released.

Mathew Hodson (mhodson)
Changed in ceph (Ubuntu):
status: Invalid → Fix Released
Changed in cloud-archive:
status: Invalid → Fix Released