xfs_btree_cur leak

Bug #1327360 reported by Michael S. Fischer on 2014-06-06
This bug affects 1 person
Affects            Importance   Assigned to
linux (Ubuntu)     Medium       Unassigned
Precise            Medium       Joseph Salisbury

Bug Description

There appears to be a kernel memory leak of xfs_btree_cur objects in recent Precise kernels (3.2.0-45 and 3.2.0-63 are confirmed affected). The slab can grow without bound; we've seen it grow larger than 32GB, as reported by slabtop. The affected hosts have XFS mounted on / (root filesystem).

We have another host running 3.2.0-38 in which we do not see this problem (it has a 37TB XFS filesystem mounted, but not on root).

Upgrading to the latest 3.8.0-41 kernel via linux-generic-lts-raring seems to resolve the issue.

Michael S. Fischer (otterley) wrote :

 $ uname -a
Linux REDACTED 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 26695461 26695461 208 39 2 : tunables 0 0 0 : slabdata 684499 684499 0

--

$ uname -a
Linux REDACTED 3.8.0-41-generic #60~precise1-Ubuntu SMP Fri May 16 00:18:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 936 936 208 39 2 : tunables 0 0 0 : slabdata 24 24 0
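
To put the two slabinfo lines above in terms of memory, here is a minimal Python sketch that parses a `/proc/slabinfo` data line and estimates the cache's footprint. It assumes the slabinfo 2.1 column layout and a 4 KiB page size (typical on x86_64, but not stated in this report); the helper names are illustrative only.

```python
from typing import NamedTuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86_64


class SlabStat(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int
    objperslab: int
    pagesperslab: int
    num_slabs: int


def parse_slabinfo_line(line: str) -> SlabStat:
    """Parse one data line of /proc/slabinfo (version 2.1 layout).

    Columns: name active_objs num_objs objsize objperslab pagesperslab
             : tunables limit batchcount sharedfactor
             : slabdata active_slabs num_slabs sharedavail
    """
    fields = line.split()
    slabdata = fields.index("slabdata")
    return SlabStat(
        name=fields[0],
        active_objs=int(fields[1]),
        num_objs=int(fields[2]),
        objsize=int(fields[3]),
        objperslab=int(fields[4]),
        pagesperslab=int(fields[5]),
        num_slabs=int(fields[slabdata + 2]),
    )


def object_bytes(stat: SlabStat) -> int:
    """Bytes held by active objects alone."""
    return stat.active_objs * stat.objsize


def slab_bytes(stat: SlabStat, page_size: int = PAGE_SIZE) -> int:
    """Bytes of pages backing the cache, including per-slab slack."""
    return stat.num_slabs * stat.pagesperslab * page_size


leaky = parse_slabinfo_line(
    "xfs_btree_cur 26695461 26695461 208 39 2 : tunables 0 0 0 "
    ": slabdata 684499 684499 0"
)
healthy = parse_slabinfo_line(
    "xfs_btree_cur 936 936 208 39 2 : tunables 0 0 0 : slabdata 24 24 0"
)
print(leaky.name, slab_bytes(leaky), slab_bytes(healthy))
```

Under those assumptions, the 3.2.0-45 sample works out to roughly 5.2 GiB of slab pages, against 192 KiB on the 3.8.0-41 host.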

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1327360

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Michael S. Fischer (otterley) wrote :

See http://www.redhat.com/archives/dm-devel/2012-July/msg00015.html for a confirmation of this issue on mainline 3.5-rc4 (since fixed).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key precise
Joseph Salisbury (jsalisbury) wrote :

Strange, I see that the commit that fixes this bug was applied to mainline twice. Once in v3.5-rc4 and then again in v3.6-rc1:

76d0953 xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
079da28 xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near

$ git describe --contains 76d0953
v3.5-rc4~7^2~3

$ git describe --contains 079da28
v3.6-rc1~42^2~36

At any rate, neither was cc'd to stable, so the fix will not be in Precise.

I'll build a Precise test kernel with a cherry-pick of commit 079da28.
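
For readers unfamiliar with the `git describe --contains` output quoted above: the name is the nearest tag that contains the commit, followed by a chain of `~N` (N first-parent steps back) and `^N` (take the Nth parent) suffixes leading from that tag to the commit. A small Python sketch that splits such a name apart; the function name is hypothetical and not part of git.

```python
import re


def parse_describe_contains(name: str):
    """Split 'tag~7^2~3' into ('tag', [('~', 7), ('^', 2), ('~', 3)]).

    Git ref names cannot contain '~' or '^', so the tag is everything
    before the first of those characters.
    """
    match = re.fullmatch(r"(?P<tag>[^~^]+)(?P<path>(?:[~^]\d*)*)", name.strip())
    if match is None:
        raise ValueError(f"unrecognised revision name: {name!r}")
    steps = [
        (op, int(count) if count else 1)  # a bare '~' or '^' means 1
        for op, count in re.findall(r"([~^])(\d*)", match.group("path"))
    ]
    return match.group("tag"), steps


for rev in ("v3.5-rc4~7^2~3", "v3.6-rc1~42^2~36"):
    tag, steps = parse_describe_contains(rev)
    print(rev, "-> first contained in", tag, "via", steps)
```

Read this way, the two commits quoted above landed in separate merges, one before v3.5-rc4 and one before v3.6-rc1.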

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a Precise test kernel with a cherry-pick of commit 079da28. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1327360/

Can you test this kernel and see if it resolves this bug? If it does, we can send an SRU request to the kernel-team mailing list for inclusion in Precise.

Thanks! We'd appreciate some time, as we have largely dealt with the
issue by upgrading to the 3.8.0 backport kernel, and I'm going on
vacation next week. I'll see if one of my colleagues can get back to
you.

Michael S. Fischer (otterley) wrote :

The patched kernel did not seem to help.

$ uname -rv
3.2.0-65-generic #98~lp1327360v1 SMP Fri Jun 13 21:24:49 UTC 2014
$ uptime
 18:02:53 up 19:30, 1 user, load average: 1.62, 1.71, 1.68
$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 161382 161382 208 39 2 : tunables 0 0 0 : slabdata 4138 4138 0

Compare to a similar host in the pool:

$ uname -rv
3.8.0-42-generic #62~precise1-Ubuntu SMP Wed Jun 4 22:04:18 UTC 2014
$ uptime
 18:01:59 up 5 days, 16:26, 1 user, load average: 13.43, 11.69, 10.94
$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 936 936 208 39 2 : tunables 0 0 0 : slabdata 24 24 0
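
One way to compare the two hosts above is to normalise the object count by uptime. The Python sketch below does that; it assumes the cache was near empty at boot, so the figure is only an average since boot, and it handles just the two `uptime` formats shown here (`up H:MM` and `up N days, H:MM`). Function names are illustrative.

```python
import re


def uptime_hours(uptime_line: str) -> float:
    """Extract hours of uptime from an `uptime` output line.

    Handles only 'up H:MM' and 'up N day(s), H:MM'; other forms
    (for example 'up 12 min') are rejected.
    """
    match = re.search(r"up\s+(?:(\d+)\s+days?,\s+)?(\d+):(\d+)", uptime_line)
    if match is None:
        raise ValueError(f"unsupported uptime format: {uptime_line!r}")
    days = int(match.group(1) or 0)
    hours = int(match.group(2))
    minutes = int(match.group(3))
    return days * 24 + hours + minutes / 60


def objs_per_hour(active_objs: int, hours_up: float) -> float:
    """Average number of live objects accumulated per hour since boot."""
    return active_objs / hours_up


patched_32 = objs_per_hour(
    161382, uptime_hours(" 18:02:53 up 19:30, 1 user, load average: 1.62, 1.71, 1.68")
)
lts_38 = objs_per_hour(
    936, uptime_hours(" 18:01:59 up 5 days, 16:26, 1 user, load average: 13.43, 11.69, 10.94")
)
print(f"3.2.0-65 test kernel: {patched_32:.0f} objs/hour")
print(f"3.8.0-42:             {lts_38:.1f} objs/hour")
```

A single sample cannot distinguish a leak from a large steady-state cache, so sampling the same host twice some hours apart gives a more reliable rate than this since-boot average.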

Joseph Salisbury (jsalisbury) wrote :

Ok, re-reading the mail archive, it looks like the following commit is needed:

e3a746f5 xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near

I'll backport this commit and build another test kernel.

Joseph Salisbury (jsalisbury) wrote :

I built a Precise test kernel with a cherry-pick of commit e3a746f5. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1327360/

Can you test this kernel and see if it resolves this bug?

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Michael S. Fischer (otterley) wrote :

I ran this kernel on a relatively busy server over the past 5 days and it looks like the second version of your patched kernel has closed the memory leak and resolved the bug.

Joseph Salisbury (jsalisbury) wrote :

Does the patched kernel continue to help? If so, I can request that commit e3a746f5 be included in upstream 3.2, in which case it will be included in Precise through the normal stable update process.

Michael S. Fischer (otterley) wrote :

Yes, it did continue to help. I'd suggest it be included for future
stable updates.

Changed in linux (Ubuntu Precise):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

This patch was requested and accepted in upstream 3.2.62. Precise now has the 3.2.62 upstream updates as of Ubuntu-3.2.0-68.102, which is in the -proposed repository.

Would it be possible for you to test this latest kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

Changed in linux (Ubuntu Precise):
status: In Progress → Fix Committed
Joseph Salisbury (jsalisbury) wrote :

Can you apply the latest updates and confirm if this bug is fixed or not?

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released