[SRU] the leak in bluestore_cache_other mempool

Bug #1996010 reported by dongdong tao
This bug affects 2 people
Affects                Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive   New           Undecided   Unassigned
  Ussuri               Fix Released  Undecided   Unassigned
  Wallaby              Fix Released  Undecided   Unassigned
  Xena                 Fix Released  Undecided   Unassigned
  Yoga                 Fix Released  Undecided   Unassigned
ceph (Ubuntu)          Fix Released  Undecided   Unassigned
  Focal                Fix Released  Undecided   Unassigned
  Jammy                Fix Released  Undecided   Unassigned
  Kinetic              Fix Released  Undecided   Unassigned
  Lunar                Fix Released  Undecided   Unassigned

Bug Description

[Impact]

This issue has been observed since Ceph Octopus 15.2.16.
BlueStore's onode cache can end up completely disabled because of an entry leak in the bluestore_cache_other mempool.

The log below shows that the cache's maximum size has become 0:
------
2022-10-25T00:47:26.562+0000 7f424f78e700 30 bluestore.MempoolThread(0x564a9dae2a68) _resize_shards max_shard_onodes: 0 max_shard_buffer: 8388608
-------

The dump_mempools output shows that bluestore_cache_other had consumed the vast majority of the cache due to the leak, while only 3 onodes (2 of them pinned) were in the cache:
---------------
"bluestore_cache_onode": {
    "items": 3,
    "bytes": 1848
},
"bluestore_cache_meta": {
    "items": 13973,
    "bytes": 111338
},
"bluestore_cache_other": {
    "items": 5601156,
    "bytes": 224152996
},
"bluestore_Buffer": {
    "items": 1,
    "bytes": 96
},
"bluestore_Extent": {
    "items": 20,
    "bytes": 960
},
"bluestore_Blob": {
    "items": 8,
    "bytes": 832
},
"bluestore_SharedBlob": {
    "items": 8,
    "bytes": 896
},
--------------

This can cause I/O to experience high latency, as the 0-sized cache significantly increases the need to fetch metadata from RocksDB or even from disk.
It also significantly increases the chance of hitting the race condition in Onode::put [2], which crashes OSDs, especially in a large cluster.
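The leak is visible directly in the dump_mempools figures above: bluestore_cache_other holds millions of entries while the onodes, extents, and blobs actually in the cache account for almost none of those bytes. A minimal sketch of that sanity check (plain Python; the sample data is taken from this report, and the helper name and 100x ratio threshold are my own illustration, not part of any Ceph tooling):

```python
import json

# Sample taken from the dump_mempools output quoted in this report.
SAMPLE = """
{
  "bluestore_cache_onode": {"items": 3, "bytes": 1848},
  "bluestore_cache_meta": {"items": 13973, "bytes": 111338},
  "bluestore_cache_other": {"items": 5601156, "bytes": 224152996},
  "bluestore_Extent": {"items": 20, "bytes": 960},
  "bluestore_Blob": {"items": 8, "bytes": 832}
}
"""

def suspect_other_leak(pools, ratio=100.0):
    """Flag a likely bluestore_cache_other leak: its byte count dwarfs
    what all the other cached structures combined account for."""
    other = pools["bluestore_cache_other"]["bytes"]
    accounted = sum(p["bytes"] for name, p in pools.items()
                    if name != "bluestore_cache_other")
    return other > ratio * accounted

pools = json.loads(SAMPLE)
print(suspect_other_leak(pools))  # prints: True
```

With this report's figures, bluestore_cache_other's ~224 MB is roughly 2000x the bytes held by every other listed pool combined, so the check trips easily.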

[Test Case]

1. Deploy a 15.2.16 ceph cluster

2. Create enough rbd images to spread all over the OSDs

3. Stress them with a fio 4k randwrite workload in parallel until each OSD has enough onodes in its cache (more than 60k onodes; you'll see bluestore_cache_other exceed 1 GB):

   fio --name=randwrite --rw=randwrite --ioengine=rbd --bs=4k --direct=1 --numjobs=1 --size=100G --iodepth=16 --clientname=admin --pool=bench --rbdname=test

4. Shrink pg_num to a very low number so that there is roughly 1 PG per OSD, and wait for the shrink to finish.

5. Enable debug_bluestore=20/20; a 0-sized onode cache can be observed by grepping for max_shard_onodes. The leaked bluestore_cache_other mempool can also be observed via "ceph daemon osd.id dump_mempools".
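The log check in step 5 can be automated by matching the _resize_shards debug line quoted in [Impact]. A small sketch (plain Python; the helper name is my own, and the sample line is the one from this report):

```python
import re

# Debug line format as quoted earlier in this report.
LOG_LINE = ("2022-10-25T00:47:26.562+0000 7f424f78e700 30 "
            "bluestore.MempoolThread(0x564a9dae2a68) _resize_shards "
            "max_shard_onodes: 0 max_shard_buffer: 8388608")

PATTERN = re.compile(r"_resize_shards max_shard_onodes: (\d+)")

def onode_cache_disabled(line):
    """Return True if this debug line reports a 0-sized onode cache shard."""
    m = PATTERN.search(line)
    return bool(m) and int(m.group(1)) == 0

print(onode_cache_disabled(LOG_LINE))  # prints: True
```

Running this over an OSD log (e.g. line by line) picks out exactly the condition the test case is looking for: a shard onode limit that has collapsed to zero.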

[Potential Regression]
The patch corrects the apparently wrong allocation unit (AU) calculation for the bluestore_cache_other pool; it should not introduce any regression.

[Other Info]
The patch [1] has been backported upstream to Pacific and Quincy, but not to Octopus.
Pacific will get it in 16.2.11, which is still pending.
Quincy already has it in 17.2.4.

We'll need to backport this fix to Octopus.

[1]https://github.com/ceph/ceph/pull/46911

[2]https://tracker.ceph.com/issues/56382

dongdong tao (taodd)
description: updated
summary: - the leak in bluestore_cache_other mempool
+ [SRU] the leak in bluestore_cache_other mempool
tags: added: sts-sru-needed
tags: added: seg
Revision history for this message
dongdong tao (taodd) wrote :

This is the debdiff based on focal proposed 15.2.17

Revision history for this message
dongdong tao (taodd) wrote :
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "focal-15.2.17-debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
dongdong tao (taodd) wrote :

pacific debdiff uploaded based on focal-xena

affects: cloud-archive → xena
dongdong tao (taodd)
affects: xena → cloud-archive
no longer affects: cloud-archive/victoria
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu Focal):
status: New → Confirmed
Changed in ceph (Ubuntu):
status: New → Confirmed
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

It looks like only Quincy has a point release in-flight [1] that will pick up this fix; we should discuss whether it's better to pick up a Pacific point release for this rather than a specific patch, but we should prioritize this fix after the referenced point release.

1: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1998958

Revision history for this message
Robie Basak (racb) wrote :

This patch is in the general sponsorship queue, but it was added automatically and it's not clear to me if this is ready/wanted by the openstack team. Please could you clarify?

Changed in ceph (Ubuntu Lunar):
status: Confirmed → Fix Released
Changed in ceph (Ubuntu Kinetic):
status: New → Fix Released
Changed in ceph (Ubuntu Jammy):
status: New → Fix Released
Revision history for this message
Ponnuvel Palaniyappan (pponnuvel) wrote :

Newer point releases (both Ubuntu and Ubuntu Cloud Archive) have got the fixes for Pacific & Quincy:
Wallaby: 16.2.11-0ubuntu0.21.04.1~cloud0
xena: 16.2.11-0ubuntu0.21.10.1~cloud0
yoga: 17.2.5-0ubuntu0.22.04.3~cloud0
jammy: 17.2.5-0ubuntu0.22.04.3
kinetic: 17.2.5-0ubuntu0.22.10.3
lunar: 17.2.5-0ubuntu2
mantic: 17.2.6-0ubuntu1

SRU needed for Ussuri and Focal.

Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

As Robie mentioned in comment #8, it is not clear to me if this SRU to Focal will be handled by the OpenStack team or if you want help to get this landed. Could you please clarify that? In case the OpenStack team is going to handle this, please unsubscribe ~ubuntu-sponsors.

I just took a quick look and your debdiff in comment #1 is outdated, you need to rebase your changes against the latest version in focal-updates which is 15.2.17-0ubuntu0.20.04.4.

Revision history for this message
dongdong tao (taodd) wrote :

New debdiff file attached

dongdong tao (taodd)
tags: removed: patch
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Still waiting on the information request on comments #8 and #10.

Revision history for this message
dongdong tao (taodd) wrote :

I've removed the "patch" tag; I believe it should be handled by the OpenStack team as usual.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Unsubscribing sponsors too.

Revision history for this message
James Page (james-page) wrote :

Upload made to UNAPPROVED queue for SRU team review in focal.

Changed in ceph (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello dongdong, or anyone else affected,

Accepted ceph into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.17-0ubuntu0.20.04.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Focal):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello dongdong, or anyone else affected,

Accepted ceph into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ussuri-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ussuri-needed to verification-ussuri-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ussuri-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ussuri-needed
Revision history for this message
dongdong tao (taodd) wrote (last edit ):

I've tested the proposed package; it fixes the reported bluestore_cache_other mempool leak.

Testing steps:

1. Deploy a new ceph cluster with the proposed package.

2. Create enough rbd images to spread all over the OSDs

3. Stress them with a fio 4k randwrite workload in parallel until each OSD has enough onodes in its cache (more than 60k onodes; you'll see bluestore_cache_other exceed 1 GB):

   fio --name=randwrite --rw=randwrite --ioengine=rbd --bs=4k --direct=1 --numjobs=1 --size=100G --iodepth=16 --clientname=admin --pool=bench --rbdname=test

4. Shrink pg_num to a very low number so that there is roughly 1 PG per OSD, and wait for the shrink to finish.

5. Enable debug_bluestore=20/20; we can no longer observe a 0-sized onode cache by grepping for max_shard_onodes.

tags: added: verification-done verification-focal-done verification-ussuri-done
removed: verification-needed verification-needed-focal verification-ussuri-needed
tags: added: verification-done-focal
removed: verification-focal-done
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

@taodd, thanks for your verification. But please be more specific about:

- the actual package you tested (full Ubuntu version, and where it came from)
- the Ubuntu release you are testing, or if it's the cloud archive, or both.

You changed both the focal and ussuri tags, but didn't make it clear in comment #19 which one you were testing, or that you tested both at the same time. You just said that you tested "the proposed package", but there are two: focal, and ussuri.

It might be that you tested, say, the cloud archive version, and assumed the same would apply for focal, or the other way around. We need an unambiguous verification in order to release the package to updates.

Revision history for this message
dongdong tao (taodd) wrote :

I believe I've done it against the focal-proposed package.

Anyway, I've spent some time re-verifying the package.
I performed the testing with both the focal-proposed (15.2.17-0ubuntu0.20.04.5) and cloud-archive ussuri-proposed (15.2.17-0ubuntu0.20.04.5~cloud0) ceph packages, and the results look good to me.

Testing steps are:

1. Deploy two new ceph clusters, with the focal-proposed and ussuri-proposed ceph packages respectively.

2. Create enough rbd images to spread all over the OSDs

3. Stress them with a fio 4k randwrite workload in parallel until each OSD has enough onodes in its cache (more than 60k onodes; you'll see bluestore_cache_other exceed 1 GB):

   fio --name=randwrite --rw=randwrite --ioengine=rbd --bs=4k --direct=1 --numjobs=1 --size=100G --iodepth=16 --clientname=admin --pool=bench --rbdname=test

4. Shrink pg_num to a very low number so that there is roughly 1 PG per OSD, and wait for the shrink to finish.

5. Enable debug_bluestore=20/20; we can no longer observe a 0-sized onode cache by grepping for max_shard_onodes.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 15.2.17-0ubuntu0.20.04.5

---------------
ceph (15.2.17-0ubuntu0.20.04.5) focal; urgency=medium

   * d/p/bluestore-leak-fix.patch: Fix leak in bluestore cache (LP: #1996010).
   * d/p/bail-after-error.patch: Bail after exception in mon (LP: #1969000).
   * d/p/relax-epoch.patch: Relax epoch-based assertions (LP: #2019293).

 -- Luciano Lo Giudice <email address hidden> Fri, 22 Sep 2023 09:21:41 +0100

Changed in ceph (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package ceph - 15.2.17-0ubuntu0.20.04.5~cloud0
---------------

 ceph (15.2.17-0ubuntu0.20.04.5~cloud0) bionic-ussuri; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 ceph (15.2.17-0ubuntu0.20.04.5) focal; urgency=medium
 .
    * d/p/bluestore-leak-fix.patch: Fix leak in bluestore cache (LP: #1996010).
    * d/p/bail-after-error.patch: Bail after exception in mon (LP: #1969000).
    * d/p/relax-epoch.patch: Relax epoch-based assertions (LP: #2019293).
