xfs_btree_cur leak

Bug #1327360 reported by Michael S. Fischer
This bug affects 1 person
Affects         Status        Importance  Assigned to       Milestone
linux (Ubuntu)  In Progress   Medium      Unassigned
Precise         Fix Released  Medium      Joseph Salisbury

Bug Description

There appears to be a kernel memory leak of xfs_btree_cur objects in recent Precise kernels (3.2.0-45 and 3.2.0-63 are confirmed affected). The slab cache grows without bound; we have seen it exceed 32GB in slabtop. The affected hosts have XFS mounted on / (root filesystem).

We have another host running 3.2.0-38 in which we do not see this problem (it has a 37TB XFS filesystem mounted, but not on root).

Upgrading to the latest 3.8.0-41 kernel via linux-generic-lts-raring seems to resolve the issue.

Michael S. Fischer (otterley) wrote :

$ uname -a
Linux REDACTED 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 26695461 26695461 208 39 2 : tunables 0 0 0 : slabdata 684499 684499 0

--

$ uname -a
Linux REDACTED 3.8.0-41-generic #60~precise1-Ubuntu SMP Fri May 16 00:18:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 936 936 208 39 2 : tunables 0 0 0 : slabdata 24 24 0
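To put the two slabinfo samples above into bytes, here is a minimal Python sketch. It assumes the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then the tunables and slabdata groups) and a 4 KiB page size; the names `SlabInfo`, `parse_slabinfo_line`, and `slab_bytes` are hypothetical helpers, not part of any existing tool.

```python
from typing import NamedTuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


class SlabInfo(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int
    objperslab: int
    pagesperslab: int
    num_slabs: int


def parse_slabinfo_line(line: str) -> SlabInfo:
    """Parse one data line of /proc/slabinfo (assumed 2.x column layout)."""
    fields = line.split()
    # The values after the "slabdata" keyword are:
    # active_slabs, num_slabs, sharedavail.
    slabdata = fields.index("slabdata")
    return SlabInfo(
        name=fields[0],
        active_objs=int(fields[1]),
        num_objs=int(fields[2]),
        objsize=int(fields[3]),
        objperslab=int(fields[4]),
        pagesperslab=int(fields[5]),
        num_slabs=int(fields[slabdata + 2]),
    )


def slab_bytes(info: SlabInfo) -> int:
    """Memory held by the cache: slabs * pages per slab * page size."""
    return info.num_slabs * info.pagesperslab * PAGE_SIZE


# The two samples quoted in this comment.
LEAKING = ("xfs_btree_cur 26695461 26695461 208 39 2 : tunables 0 0 0 "
           ": slabdata 684499 684499 0")
HEALTHY = ("xfs_btree_cur 936 936 208 39 2 : tunables 0 0 0 "
           ": slabdata 24 24 0")

if __name__ == "__main__":
    for label, line in (("3.2.0-45", LEAKING), ("3.8.0-41", HEALTHY)):
        info = parse_slabinfo_line(line)
        print(f"{label}: {info.num_objs} objects, "
              f"{slab_bytes(info) / 2**20:.1f} MiB")
```

Under those assumptions the 3.2.0-45 sample works out to roughly 5.2 GiB held by xfs_btree_cur alone, against 192 KiB on 3.8.0-41.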

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1327360

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Michael S. Fischer (otterley) wrote :

See http://www.redhat.com/archives/dm-devel/2012-July/msg00015.html for a confirmation of this issue on mainline 3.5-rc4 (since fixed).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key precise
Joseph Salisbury (jsalisbury) wrote :

Strange, I see that the commit that fixes this bug was applied to mainline twice. Once in v3.5-rc4 and then again in v3.6-rc1:

76d0953 xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
079da28 xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near

$ git describe --contains 76d0953
v3.5-rc4~7^2~3

$ git describe --contains 079da28
v3.6-rc1~42^2~36
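For readers unfamiliar with the output format above: `git describe --contains` prints a tag followed by a revision suffix, so `v3.5-rc4~7^2~3` reads as "from tag v3.5-rc4, go back 7 first-parent commits, take the second parent, then go back 3 more". A minimal Python sketch for pulling out just the tag follows; the helper name `containing_tag` is hypothetical and it only handles strings of this shape.

```python
import re


def containing_tag(describe_output: str) -> str:
    """Return the tag portion of `git describe --contains` output.

    Git ref names cannot contain '~' or '^', so everything before the
    first of those characters is the tag name.
    """
    return re.split(r"[~^]", describe_output.strip(), maxsplit=1)[0]


if __name__ == "__main__":
    for out in ("v3.5-rc4~7^2~3", "v3.6-rc1~42^2~36"):
        print(out, "->", containing_tag(out))
```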

At any rate, neither was cc'd to stable, so the fix will not be in Precise.

I'll build a Precise test kernel with a cherry-pick of commit 079da28.

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a Precise test kernel with a cherry-pick of commit 079da28. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1327360/

Can you test this kernel and see if it resolves this bug? If it does, we can send an SRU request to the kernel-team mailing list for inclusion in Precise.

Michael S. Fischer (otterley) wrote : Re: [Bug 1327360] Re: xfs_btree_cur leak

Thanks! We'd appreciate some time, as we have largely dealt with the
issue by upgrading to the 3.8.0 backport kernel, and I'm going on
vacation next week. I'll see if one of my colleagues can get back to
you.

Michael S. Fischer (otterley) wrote :

The patched kernel did not seem to help.

$ uname -rv
3.2.0-65-generic #98~lp1327360v1 SMP Fri Jun 13 21:24:49 UTC 2014
$ uptime
 18:02:53 up 19:30, 1 user, load average: 1.62, 1.71, 1.68
$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 161382 161382 208 39 2 : tunables 0 0 0 : slabdata 4138 4138 0

Compare to a similar host in the pool:

$ uname -rv
3.8.0-42-generic #62~precise1-Ubuntu SMP Wed Jun 4 22:04:18 UTC 2014
$ uptime
 18:01:59 up 5 days, 16:26, 1 user, load average: 13.43, 11.69, 10.94
$ sudo grep xfs_btree_cur /proc/slabinfo
xfs_btree_cur 936 936 208 39 2 : tunables 0 0 0 : slabdata 24 24 0
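One rough way to read the comparison above is as an average object count per hour of uptime. The Python sketch below does that arithmetic; `objects_per_hour` is a hypothetical helper, and averaging since boot is only a crude heuristic that assumes the cache started near empty. Sampling /proc/slabinfo at intervals would give a better picture.

```python
def objects_per_hour(num_objs: int, days: int, hours: int, minutes: int) -> float:
    """Average slab objects accumulated per hour of uptime (crude heuristic)."""
    uptime_hours = days * 24 + hours + minutes / 60
    if uptime_hours <= 0:
        raise ValueError("uptime must be positive")
    return num_objs / uptime_hours


if __name__ == "__main__":
    # Figures taken from the two hosts quoted in this comment.
    patched_v1 = objects_per_hour(161382, days=0, hours=19, minutes=30)
    backport = objects_per_hour(936, days=5, hours=16, minutes=26)
    print(f"3.2.0-65 lp1327360v1: {patched_v1:.0f} objects/hour")
    print(f"3.8.0-42:             {backport:.1f} objects/hour")
```

On these figures the first test kernel averages thousands of xfs_btree_cur objects per hour, while the 3.8.0-42 host stays in single digits after more than five days under a heavier load.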

Joseph Salisbury (jsalisbury) wrote :

Ok, re-reading the mail archive, it looks like the following commit is needed:

e3a746f5 xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near

I'll backport this commit and build another test kernel.

Joseph Salisbury (jsalisbury) wrote :

I built a Precise test kernel with a cherry-pick of commit e3a746f5. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1327360/

Can you test this kernel and see if it resolves this bug?

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Michael S. Fischer (otterley) wrote :

I ran this kernel on a relatively busy server over the past 5 days and it looks like the second version of your patched kernel has closed the memory leak and resolved the bug.

Joseph Salisbury (jsalisbury) wrote :

Does the patched kernel continue to help? If so, I can request that commit e3a746f5 be included in upstream 3.2, in which case it will be included in Precise through the normal stable update process.

Michael S. Fischer (otterley) wrote :

Yes, it did continue to help. I'd suggest it be included for future
stable updates.

Changed in linux (Ubuntu Precise):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

This patch was requested and accepted in upstream 3.2.62. Precise now has the 3.2.62 upstream updates as of Ubuntu-3.2.0-68.102, which is in the -proposed repository.

Would it be possible for you to test this latest kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

Changed in linux (Ubuntu Precise):
status: In Progress → Fix Committed
Joseph Salisbury (jsalisbury) wrote :

Can you apply the latest updates and confirm whether this bug is fixed?

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released