iovec array memory leak

Bug #588293 reported by Dick Tump on 2010-06-01
This bug affects 6 people
Affects             Importance  Assigned to
qemu-kvm (Ubuntu)   Medium      Serge Hallyn
Lucid               Medium      Serge Hallyn
Maverick            Medium      Serge Hallyn

Bug Description

Binary package hint: qemu-kvm

Ubuntu release: 10.04 LTS
Package version: 0.12.3+noroms-0ubuntu9

There seems to be a huge memory leak in qemu/kvm 0.12.3. After a while, a virtual machine uses much more memory on the host system than is actually in use on the guest system. A guest system with only 512 MB assigned can end up using multiple GB of host memory.

This bug is also discussed here:
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599

Would it be possible to fix this bug in Ubuntu? It makes KVM a bit useless currently, because running 2 GB of VMs can cost as much as 16 GB of memory.

============
Impact: A memory leak causes long-running (measured in only a few days)
kvm sessions to invoke the OOM killer. This makes many kvm or cloud
applications impossible.

The issue is addressed by the upstream patch with commit id
7eb58a6c556c3880e6712cbf6d24d681261c5095. A backport of the
patch is attached, and can be seen in
https://code.edge.launchpad.net/~serge-hallyn/ubuntu/lucid/qemu-kvm/memleak-fix2/

To reproduce: simply start a kvm instance, ensure that aio is being
used, and let it run.

Regression potential: the patch simply ensures that already-allocated
memory is freed in an error path that would otherwise leak it.
============


Mathias Gug (mathiaz) wrote :

Could you outline what type of guests are causing the leak? What are the guests doing?

Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Dick Tump (dicktump) wrote :

The guests are running Debian Lenny with the default 2.6.26 kernel or Ubuntu 10.04 LTS with the default 2.6.32 kernel.

They are doing different things, but mostly webserving tasks.

The strange thing is, one of the virtual servers is running an Exim/Spamassassin based spam checking gateway. And the memory usage of that specific virtual server grows a lot faster than the others. Data is partly saved on a ramdisk on this virtual machine. So maybe it's related to how much I/O there is on this ramdisk.

David Weber (wb-munzinger) wrote :

Same problem here. Also, the servers which write a lot are more affected.

As far as I can see, this commit should fix it:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=7eb58a6c556c3880e6712cbf6d24d681261c5095

Ubuntu Server Team, can you please test and backport this?

Dustin Kirkland  (kirkland) wrote :

Hi Serge,

Assigning this to you. Should be really straightforward to fix. Ping me on IRC, and I'll walk you through it.

Thanks!

Changed in qemu-kvm (Ubuntu):
status: Incomplete → Triaged
assignee: nobody → Serge Hallyn (serge-hallyn)
Dustin Kirkland  (kirkland) wrote :

Bug #591610 is possibly a duplicate.

description: updated
Dustin Kirkland  (kirkland) wrote :

Marking Maverick task fix-released, as this should be in the qemu-kvm 0.12.4 upload that was just pushed to Maverick.

Changed in qemu-kvm (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Serge Hallyn (serge-hallyn)
milestone: none → lucid-updates
Changed in qemu-kvm (Ubuntu Maverick):
status: Triaged → Fix Released
Dustin Kirkland  (kirkland) wrote :

I have also sponsored Serge's change:
qemu-kvm_0.12.3+noroms-0ubuntu9.1_source.changes

This should be available in lucid-proposed eventually. Please test there and let us know if this fixes your problem, and then we can get it promoted to lucid-updates.

Thanks.

Changed in qemu-kvm (Ubuntu Lucid):
status: Triaged → Fix Committed
Chris Jones (cmsj) wrote :

Dustin: has that package been uploaded? I don't see it yet

David Weber (wb-munzinger) wrote :

I'm not familiar with Ubuntu package management, but it seems there are some problems uploading the package to lucid:

https://code.launchpad.net/~serge-hallyn/ubuntu/lucid/qemu-kvm/memleak-fix2/+merge/27644

Hope this gets fixed soon, I have to restart some virtual machines every few days to keep the host running.

Dustin Kirkland  (kirkland) wrote :

It's still in the queue at:

https://edge.launchpad.net/ubuntu/lucid/+queue?queue_state=1&queue_text=

The bzr branch isn't working, but that's no matter.

We just need to get an archive admin sru team member to accept the package for proposed.

Dustin

John Dong (jdong) wrote :

ACK from sru team

Accepted into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed

Thanks for the patch :)

I've installed the patched version on two production nodes here. I will have the results in a few days probably.

Chris Jones (cmsj) wrote :

I filed bug #591610 which may well be a dupe of this bug. I ran a test overnight (as described in that bug) and I still see the same significant leaking as before

Dick Tump (dicktump) wrote :

One of the virtual machines with 2 GB RAM is already using 3.5 GB RAM here, so the leak still exists. I had hoped that this was the fix...

Has anyone already tested qemu 0.12.4? If this didn't fix it, the bug might still exist in the new qemu version.

Chris Jones (cmsj) wrote :

Dick: I rebuilt 0.12.4 from debian and then maverick to test for my bug (mentioned in comment #14) and it completely fixed the leaking, I was able to leave my IO tests running all day without any sign of leakage.

David Weber (wb-munzinger) wrote :

I now found the right commit

http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=012d4869c1eb195e83f159ed7b2bced33f37f960

A virtual machine that has been running with this for about an hour still uses no more memory than when it was started.

Attached is a patched version of kvm for amd64.
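
A check like the one described here, comparing a guest's host memory use after some run time with its use at start, can be scripted. The following is only a minimal sketch: it assumes RSS samples in kB have been collected one per line (for example from the VmRSS field of /proc/<pid>/status), and the function name rss_growth_kb is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: read one RSS sample (in kB) per line on stdin and
# print the difference between the last and the first sample.
# A value that keeps growing across runs hints at a leak; a value near
# zero (or negative) does not.
rss_growth_kb() {
    awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR > 0) print last - first }'
}
```

For instance, feeding it the samples 1000, 1500 and 4000 reports a growth of 3000 kB between the first and the last measurement.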

David Weber (wb-munzinger) wrote :
Serge Hallyn (serge-hallyn) wrote :

Chris Jones suggested
   spew -g --raw --statistics -t -v 5G foo
in a guest to reproduce the memory leak, and this worked for me. So now I've
 1. confirmed the leak in the stock lucid package
 2. built my own package with the commit David suggested applied, and confirmed no memory leak there

I've not yet re-installed the package currently in lucid-proposed to confirm that I still
see the memory leak there. Assuming I do, then I will push my changes to my bzr
tree and re-submit for sponsorship. If I do not, then I'll request further confirmation
that the bug still exists in lucid-proposed.

David, thanks for pointing to the precise commit id!

Serge Hallyn (serge-hallyn) wrote :

I cannot reproduce this bug with the qemu-kvm package from lucid-proposed.

Can anyone confirm that the bug still exists?

David, can you paste your /etc/apt/sources.list and the output of 'dpkg -l | grep qemu' and 'uname -a',
and tell me what you're running in the guests (both distro/version and workloads)?

Chris Jones (cmsj) wrote :

Serge: I ran a test with the (first?) -proposed fix and I was still able to see leaking, assuming I didn't get something wrong with downgrading from 0.12.4 to the -proposed packages.

Serge Hallyn (serge-hallyn) wrote :

Thanks, Chris. Then I'll propose the branch with the new patch identified by David
for merging. FWIW it is at

https://code.launchpad.net/~serge-hallyn/ubuntu/lucid/qemu-kvm/memleak-fix

Serge Hallyn (serge-hallyn) wrote :

David and Chris, could you please try to reproduce with the packages in
ppa:serge-hallyn/virt, which was created with the exact tree which I would
like to propose for merge into lucid-proposed? You can try it by doing
(as root):

cat >> /etc/apt/sources.list.d/serge-hallyn-virt.list << EOF
deb http://ppa.launchpad.net/serge-hallyn/virt/ubuntu lucid main
EOF

apt-get update
apt-get upgrade

I then ran
 spew -g --raw --statistics -t -v 5G foo
in two guests at the same time, while doing
 while true; do
  # print the resident set size of every running kvm process
  for p in $(pidof kvm); do
   grep VmRSS "/proc/$p/status"
  done
  sleep 10
  echo
 done
on the host. Neither guest exceeded its assigned memory size with this
package on my system.

Please let me know if it fixes the memory leak for you.
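
The host-side check described in this comment can also be turned into a pass/fail test against the memory assigned to a guest. This is a rough sketch rather than part of the fix: the function names rss_kb and exceeds_assigned are hypothetical, the assigned size (512 MB below) is supplied by hand rather than read from the kvm command line, and a kvm process normally carries some overhead above the guest's RAM, so the threshold is only an indicator.

```shell
#!/bin/sh
# Hypothetical helper: print the VmRSS value (in kB) from a
# /proc/<pid>/status-style file. Prints nothing if the field is absent.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Hypothetical helper: succeed (exit status 0) if the RSS recorded in the
# given status file is larger than the assigned guest memory, given in MB.
exceeds_assigned() {
    status_file=$1
    assigned_mb=$2
    rss=$(rss_kb "$status_file")
    [ -n "$rss" ] && [ "$rss" -gt $((assigned_mb * 1024)) ]
}

# Example use on a host: flag every kvm process above an assumed 512 MB.
for p in $(pidof kvm 2>/dev/null); do
    if exceeds_assigned "/proc/$p/status" 512; then
        echo "pid $p uses more than 512 MB on the host"
    fi
done
```

Run periodically (for example from the loop shown above), this prints a line only for guests whose host memory use has gone past the chosen limit.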

David Weber (wb-munzinger) wrote :

Serge: The package from your ppa fixes the problem for me!

Serge Hallyn (serge-hallyn) wrote :

Thanks, David! I resubmitted the source branch for merge into lucid
proposed. Will update if/when it gets merged.

Chris Jones (cmsj) wrote :

Serge: I've re-run my testing and I indeed see no leakage with your packages dated the 23rd :)

summary: - Memory leak
+ Memory leak in qemu
summary: - Memory leak in qemu
+ iovec array memory leak

Accepted qemu-kvm into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Chris Jones (cmsj) wrote :

Colin: I've run the same tests I ran earlier, against the packages in lucid-proposed and this bug is fixed for my test case (4 VMs doing all the IO they can eat), thanks!

Nikolai Bochev (n-bochev) wrote :

Any time frame on this hitting the updates? I ain't eager to put proposed on my live servers for *any* reason. I did some testing on a test server and it seems the leak is fixed with the package from proposed.

Martin Pitt (pitti) on 2010-07-05
tags: added: verification-done
removed: verification-needed
David Weber (wb-munzinger) wrote :

The proposed package now works on one of our production servers. The bug is fixed and so far no other problems have appeared! So please pull it into lucid-updates.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu-kvm - 0.12.3+noroms-0ubuntu9.2

---------------
qemu-kvm (0.12.3+noroms-0ubuntu9.2) lucid-proposed; urgency=low

  * Previous patch did not fix memleak for everyone. Appending another
    separate memleak fix patch. This package (tried out in
    ppa:serge-hallyn/virt/ubuntu) was confirmed by David Weber to fix
    his memory leak. (LP: #588293)
 -- Serge Hallyn <email address hidden> Mon, 21 Jun 2010 11:55:23 -0500

Changed in qemu-kvm (Ubuntu Lucid):
status: Fix Committed → Fix Released