memory leak in hv_storvsc (3.13.0-63-generic)

Bug #1499203 reported by Oskar Liljeblad
This bug affects 1 person
Affects         Status        Importance  Assigned to       Milestone
linux (Ubuntu)  Fix Released  High        Joseph Salisbury
Trusty          Fix Released  High        Joseph Salisbury

Bug Description

The Slab and SUnreclaim values in /proc/meminfo keep increasing. On one server they reached 85% of physical memory after 14 days, though on most other servers they increase more slowly. I checked /proc/slabinfo and almost all allocations were in kmalloc-512. So I enabled "slub_debug=U,kmalloc-512" on one server, and after only 24h of uptime 11% of the memory was used by kmalloc-512 and unreclaimable. With debugging enabled I could see the following in /sys/kernel/slab/kmalloc-512/alloc_calls:

521294 storvsc_queuecommand+0x359/0x790 [hv_storvsc] age=161922/955116/20882927 pid=1-41545

All other counters were below 2000. In /sys/kernel/slab/kmalloc-512/free_calls I see the following:

516823 <not-available> age=4315783846 pid=0
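
For anyone who wants to repeat this check on another machine, the following is a minimal sketch of ranking call sites from the alloc_calls or free_calls files. It assumes each line has the shape "<count> <call site> age=... pid=...", as in the two lines quoted above; the names parse_calls and read_calls are hypothetical helpers and not part of any kernel tooling, and the files are only present when the cache is booted with user tracking (slub_debug=U).

```python
import re
from typing import List, NamedTuple


class CallSite(NamedTuple):
    count: int  # number of tracked objects attributed to this call site
    site: str   # symbol+offset/size [module], or "<not-available>"


# Assumed line shape, taken from the samples in this report:
#   "<count> <call site> age=<...> pid=<...>"
_LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+age=")


def parse_calls(text: str) -> List[CallSite]:
    """Parse alloc_calls/free_calls text and sort by count, largest first."""
    sites = []
    for line in text.splitlines():
        match = _LINE_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed shape
            sites.append(CallSite(int(match.group(1)), match.group(2)))
    return sorted(sites, key=lambda s: s.count, reverse=True)


def read_calls(cache: str = "kmalloc-512", kind: str = "alloc_calls") -> List[CallSite]:
    """Read one tracking file for a slab cache (usually requires root)."""
    with open(f"/sys/kernel/slab/{cache}/{kind}") as handle:
        return parse_calls(handle.read())
```

Calling read_calls() and looking at the first few entries should show whether a single call site dominates the cache, as storvsc_queuecommand does here.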

The hv_storvsc module is for Hyper-V. We are (unfortunately) running Hyper-V 6.3.9600.16384 with Microsoft System Center 2012 R2 Update rollup 3 for all the servers with this issue.

Kernels are stock linux-image-3.13.0-63-generic, 3.13.0-63.103, x86_64, from Ubuntu 14.04 LTS. /proc/version_signature contains:

  Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25

No output from lspci -vnvn. The problem described above happens on both single and multicore virtual machines. The CPUs in the hypervisors are E5-2630 v2 @ 2.60GHz. Let me know if you need more info or if I can do more debugging.

Regards,

Oskar Liljeblad
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 24 00:31 seq
 crw-rw---- 1 root audio 116, 33 Sep 24 00:31 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.13
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg:
 [59081.977909] systemd-udevd[26480]: starting version 204
 [59124.051974] init: systemd-logind main process (756) killed by TERM signal
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2014-09-09 (380 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
MachineType: Microsoft Corporation Virtual Machine
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 hyperv_fb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-63-generic.efi.signed root=UUID=f4d228d6-2eee-40fc-bf3f-633e46fa8301 ro slub_debug=U,kmalloc-512
ProcVersionSignature: Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-63-generic N/A
 linux-backports-modules-3.13.0-63-generic N/A
 linux-firmware 1.127.15
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.13.0-63-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

WifiSyslog:
 Sep 24 02:06:19 adm-backup1 dhclient: message repeated 1447 times: [ DHCPREQUEST of 10.40.128.9 on eth0 to 192.0.2.253 port 67 (xid=0x429dad4)]
 Sep 24 02:06:37 adm-backup1 dhclient: DHCPREQUEST of 10.40.128.9 on eth0 to 255.255.255.255 port 67 (xid=0x429dad4)
 Sep 24 02:06:37 adm-backup1 dhclient: DHCPACK of 10.40.128.9 from 192.0.2.253
 Sep 24 02:06:37 adm-backup1 dhclient: bound to 10.40.128.9 -- renewal in 44877 seconds.
_MarkForUpload: True
dmi.bios.date: 11/26/2012
dmi.bios.vendor: Microsoft Corporation
dmi.bios.version: Hyper-V UEFI Release v1.0
dmi.board.asset.tag: None
dmi.board.name: Virtual Machine
dmi.board.vendor: Microsoft Corporation
dmi.board.version: Hyper-V UEFI Release v1.0
dmi.chassis.asset.tag: 6126-4244-1659-0314-3158-3955-44
dmi.chassis.type: 3
dmi.chassis.vendor: Microsoft Corporation
dmi.chassis.version: Hyper-V UEFI Release v1.0
dmi.modalias: dmi:bvnMicrosoftCorporation:bvrHyper-VUEFIReleasev1.0:bd11/26/2012:svnMicrosoftCorporation:pnVirtualMachine:pvrHyper-VUEFIReleasev1.0:rvnMicrosoftCorporation:rnVirtualMachine:rvrHyper-VUEFIReleasev1.0:cvnMicrosoftCorporation:ct3:cvrHyper-VUEFIReleasev1.0:
dmi.product.name: Virtual Machine
dmi.product.version: Hyper-V UEFI Release v1.0
dmi.sys.vendor: Microsoft Corporation

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1499203

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Revision history for this message
Oskar Liljeblad (oskar) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Oskar Liljeblad (oskar) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Oskar Liljeblad (oskar) wrote : ProcEnviron.txt

apport information

Revision history for this message
Oskar Liljeblad (oskar) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Oskar Liljeblad (oskar) wrote : ProcModules.txt

apport information

Revision history for this message
Oskar Liljeblad (oskar) wrote : UdevDb.txt

apport information

Revision history for this message
Oskar Liljeblad (oskar) wrote : UdevLog.txt

apport information

Oskar Liljeblad (oskar)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key kernel-hyper-v
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'd like to see if this is a regression. Would it be possible for you to test the Ubuntu-3.13.0-59 kernel? It can be downloaded from:
https://launchpad.net/~ubuntu-security/+archive/ubuntu/ppa/+build/7723132

You would need to install the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Changed in linux (Ubuntu Trusty):
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Medium → High
Changed in linux (Ubuntu Trusty):
status: New → Confirmed
Revision history for this message
Oskar Liljeblad (oskar) wrote : Re: [Bug 1499203] Re: memory leak in hv_storvsc (3.13.0-63-generic)

Hi! I've installed the 3.13.0-59 kernel below on one of the (most)
troubled machines. We should be able to see results in less than a
day or so. I will get back to you!

Thanks

Oskar

On Wednesday, September 30, 2015 at 17:46, Joseph Salisbury wrote:
> I'd like to see if this is a regression. Would it be possible for you to test the Ubuntu-3.13.0-59 kernel? It can be downloaded from:
> https://launchpad.net/~ubuntu-security/+archive/ubuntu/ppa/+build/7723132
>
> You would need to install the linux-image and linux-image-extra .deb
> packages.
>
> Thanks in advance!

Revision history for this message
Oskar Liljeblad (oskar) wrote :

After a couple of days of uptime with 3.13.0-59 it seems there are no leaks, at
least not like before. There are no kmalloc-512 allocations by hv_storvsc at
all. Total Slab memory is 3.63% of MemTotal, and SUnreclaim is 0.41%. (4GB
MemTotal.)
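
Percentages like these can be derived from /proc/meminfo. The following is a minimal sketch, assuming the usual "Key:   value kB" line layout; parse_meminfo and slab_share are hypothetical helper names, not existing tools.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def slab_share(values: Dict[str, int]) -> Dict[str, float]:
    """Return Slab and SUnreclaim as a percentage of MemTotal."""
    total = values["MemTotal"]
    return {key: 100.0 * values[key] / total for key in ("Slab", "SUnreclaim")}


def read_slab_share(path: str = "/proc/meminfo") -> Dict[str, float]:
    """Convenience wrapper that reads the live file."""
    with open(path) as handle:
        return slab_share(parse_meminfo(handle.read()))
```

Sampling read_slab_share() at intervals and comparing the SUnreclaim figure over time is one way to tell a steady leak from normal cache growth.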

Regards,

Oskar Liljeblad

On Wednesday, September 30, 2015 at 20:28, Oskar Liljeblad wrote:
> Hi! I've installed the 3.13.0-59 kernel below on one of the (most)
> troubled machines. We should be able to see results in less than a
> day or so. I will get back to you!
>
> Thanks
>
> Oskar

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

To see if it is the cause of this issue, I built a test kernel with a revert of commit 97b2591. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1499203/

Can you test this kernel and see if it resolves this bug? If it does not, we would have to perform a kernel bisect to identify the commit that caused this regression.

Thanks in advance!

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Tuesday, October 06, 2015 at 21:24, Joseph Salisbury wrote:
> To see if it is the cause of this issue, I built a test kernel with a
> revert of commit 97b2591. The test kernel can be downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1499203/
>
> Can you test this kernel and see if it resolves this bug? If it does
> not, we would have to perform a kernel bisect to identify the commit
> that caused this regression.
>
> Thanks in advance!

The test kernel is up and running. I will let you know in a couple of
days.

Regards,

Oskar

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Wednesday, October 07, 2015 at 08:57, Oskar Liljeblad wrote:
> > To see if it is the cause of this issue, I built a test kernel with a
> > revert of commit 97b2591. The test kernel can be downloaded from:
> >
> > http://kernel.ubuntu.com/~jsalisbury/lp1499203/
> >
> > Can you test this kernel and see if it resolves this bug? If it does
> > not, we would have to perform a kernel bisect to identify the commit
> > that caused this regression.
> >
> > Thanks in advance!
>
> The test kernel is up and running. I will let you know in a couple of
> days.

The 3.13.0-66.107~lp1445195Commit97b2591Reverted kernel seems to work just
fine. No memory leaks as far as I can see.

Regards,

Oskar Liljeblad

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Friday, October 09, 2015 at 06:59, Oskar Liljeblad wrote:
> > > To see if it is the cause of this issue, I built a test kernel with a
> > > revert of commit 97b2591. The test kernel can be downloaded from:
> > >
> > > http://kernel.ubuntu.com/~jsalisbury/lp1499203/
[..]
> The 3.13.0-66.107~lp1445195Commit97b2591Reverted kernel seem to work just
> fine. No memory leaks as far as I can see.

By the way, I had to downgrade the kernel above to 3.13.0-65.106 on one
server because of some strange IO lockup issues. I'm afraid this won't be
of much help, but I'm writing it anyway.
It started 1 minute after boot with the new kernel:

Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106544] BUG: unable to handle kernel NULL pointer dereference at (null)
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106592] IP: [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106624] PGD 1f72db067 PUD 1fa753067 PMD 0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106659] Oops: 0000 [#1] SMP
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106684] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106848] CPU: 1 PID: 1286 Comm: mongod Not tainted 3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106884] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106923] task: ffff8801f722c800 ti: ffff8801f72ce000 task.ti: ffff8801f72ce000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106950] RIP: 0010:[<ffffffff81206c5b>] [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.106986] RSP: 0018:ffff8801f72cfe78 EFLAGS: 00010246
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107006] RAX: 0000000000000000 RBX: ffff8801f775e300 RCX: 0000000040000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107032] RDX: 0000000001000000 RSI: 0000000000000000 RDI: ffffffff81c72e80
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107058] RBP: ffff8801f72cfea0 R08: 0000000000000000 R09: 0000000000000001
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107084] R10: ffff8801f775ece1 R11: 0000000000000293 R12: 0000000000000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107110] R13: ffff8801f775ece1 R14: ffff8801f775ee40 R15: ffff8801f775e3b0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107137] FS: 00007f23b299f700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107190] CR2: 0000000000000000 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107224] Stack:
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107235] ffff8801f775e300 0000000000000010 ffff8801f775ece1 ffff8801f775ee40
Oct 13 00:06:16 af-mdbdrs2 kernel: [ 66.107270] ffff880036927a40 ffff8801f72cfee8 fffff...

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I think this bug may be due to the same commit as in bug 1495983. I built a test kernel for that bug. Can you test the Trusty test kernel and see if it resolves this bug as well? It can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1495983/patched-kernel/trusty/trusty-with-use-small-sg_tablesizeANDCommit7e5ec368/

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Actually, that is a 32-bit test kernel and that bug is specific to 32-bit. So you can disregard my last request in comment #16.

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Tuesday, October 20, 2015 at 19:29, Joseph Salisbury wrote:
> Actually that is a 32 bit test kernel and that bug is specific to 32
> bit. So you can disregard by last request in comment #16.

Any update on this?

Thanks

Oskar

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest mainline kernel, to see if the memory leak happens there as well? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Tuesday, December 08, 2015 at 18:53, Joseph Salisbury wrote:
> Would it be possible for you test test the latest mainline kernel, to
> see if the memory leak happens there as well? It can be downloaded
> from:
>
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/

It is running now - I will let you know in the next few days!

Regards,

Oskar

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Wednesday, December 09, 2015 at 13:17, Oskar Liljeblad wrote:
> > Would it be possible for you test test the latest mainline kernel, to
> > see if the memory leak happens there as well? It can be downloaded
> > from:
> >
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/
>
> It is running now - I will let you know in the next few days!

No memory leaks in 4.4.0-040400rc4.201512061930 so far!
I think it is safe to say that this version does not have the leak issue.

Regards,

Oskar

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for testing! I'm going to review the commits in mainline to see if anything sticks out as the fix. Otherwise, we can perform a "Reverse" bisect to identify the exact commit that fixes this.

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Friday, December 11, 2015 at 09:58, Oskar Liljeblad wrote:
> > > Would it be possible for you test test the latest mainline kernel, to
> > > see if the memory leak happens there as well? It can be downloaded
> > > from:
> > >
> > > http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/
> >
> > It is running now - I will let you know in the next few days!
>
> No memory leaks in 4.4.0-040400rc4.201512061930 so far!
> I think it is safe to say that this version does not have the leak issue.

It seems the leaks are fixed in 3.13.0-79 as well.

Regards,

Oskar Liljeblad

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Oskar. Would you say this bug is now resolved? If so, I can close it.

Revision history for this message
Oskar Liljeblad (oskar) wrote :

On Tuesday, March 01, 2016 at 17:41, Joseph Salisbury wrote:
> Thanks for the update, Oskar. Would you say this bug is now resolved?
> If so, I can close it.

Yeah, so far so good, feel free to close it!

Regards,

Oskar

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Released