vsftpd causes a vmalloc space leak in Lucid

Bug #720095 reported by Peter Matulis on 2011-02-16
This bug affects 6 people

Affects                  Importance  Assigned to
linux (Ubuntu)           Medium      Unassigned
linux (Ubuntu Lucid)     Medium      Stefan Bader

Bug Description

SRU justification:

Impact: With network namespace support (CONFIG_NET_NS) enabled, it is possible to create new namespaces much faster than they can be cleaned up. This can lead to memory pressure, and in the case of vsftpd it is easy to bring down the server simply by connecting to it repeatedly.

Fix: Upstream fixed the issue with a long series of changes that make cleanup quicker. The sheer number of changes makes them unsuitable for an SRU, so it was decided that the safest approach for a 2.6.32-based kernel is to turn the feature off (it was considered experimental until 2.6.37 anyway).

Testcase: See the report below.

---

A simple stress test conducted on a KVM guest running a standard, up-to-date Lucid with vsftpd shows memory being used up continuously until the OOM killer starts to protect the system (after ~12 min on my system). If the test is terminated before that point, the memory is freed only after several hours. If vsftpd is stopped, the memory is freed (after ~45 min on my system).

This does not occur with the 2.6.35 kernel (LTS backported kernel).

The test is started in this way:

$ for i in 1 2 3 4 5 6 7 8 ; do ./feedftp $i >/dev/null & done

What is observed during the test is that /proc/vmallocinfo grows continually with lines like the following being added:

0xffffe8ffff800000-0xffffe8ffffa00000 2097152 pcpu_get_vm_areas+0x0/0x790 vmalloc
0xffffe8ffffa00000-0xffffe8ffffc00000 2097152 pcpu_get_vm_areas+0x0/0x790 vmalloc
0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x790 vmalloc
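To track this growth over time, a small Python 3 sketch like the one below can total the pcpu_get_vm_areas entries in /proc/vmallocinfo. It assumes the column layout shown above (address range, size in bytes, caller, type); the name pcpu_usage is illustrative only, and on many kernels the file is readable only by root.

```python
import re
from typing import Iterable, Tuple

# Size (bytes) and caller columns of a /proc/vmallocinfo line such as
# 0xffffe8ffff800000-0xffffe8ffffa00000 2097152 pcpu_get_vm_areas+0x0/0x790 vmalloc
LINE_RE = re.compile(r"^0x[0-9a-fA-F]+-0x[0-9a-fA-F]+\s+(\d+)\s+(\S+)")


def pcpu_usage(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (entry count, total bytes) for areas allocated via pcpu_get_vm_areas."""
    count = 0
    total = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or not match.group(2).startswith("pcpu_get_vm_areas"):
            continue  # not a vmalloc entry, or allocated by some other caller
        count += 1
        total += int(match.group(1))
    return count, total
```

On a live system, calling pcpu_usage(open("/proc/vmallocinfo")) every few seconds during the test should show the count rising in 2 MiB steps if the growth described above is happening.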

Attached:

- test script (see feedftp)
- tarball containing the proc file at various times during the test (see vmallocinfo.32.tar)
- dmesg output showing OOM Killer at work (see dmesg-oom.32.txt)

May be related: https://bugs.launchpad.net/ubuntu/+source/vsftpd/+bug/682865

=========

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-28-server (not installed)
Regression: No
Reproducible: Yes
ProcVersionSignature: User Name 2.6.32-28.55-server 2.6.32.27+drm33.12
Uname: Linux 2.6.32-28-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg: [ 12.360149] eth0: no IPv6 routers present
Date: Wed Feb 16 14:00:19 2011
Lspci: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
MachineType: Bochs Bochs
PciMultimedia:

ProcCmdLine: root=UUID=7bbf58be-9c5f-4113-ad1b-4611fd131d33 ro
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 01/01/2007
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2007:svnBochs:pnBochs:pvr:cvnBochs:ct1:cvr:
dmi.product.name: Bochs
dmi.sys.vendor: Bochs

tags: added: kernel-key
description: updated
Andy Whitcroft (apw) wrote :

From the report it seems that this was broken in v2.6.32 and possibly fixed in v2.6.35. The commit below sounds plausible and might be worth looking at:

  commit 02b709df817c0db174f249cc59e5f7fd01b64d92
  Author: Nick Piggin <email address hidden>
  Date: Mon Feb 1 22:25:57 2010 +1100

    mm: purge fragmented percpu vmap blocks

    Improve handling of fragmented per-CPU vmaps. We previously don't free
    up per-CPU maps until all its addresses have been used and freed. So
    fragmented blocks could fill up vmalloc space even if they actually had
    no active vmap regions within them.

    Add some logic to allow all CPUs to have these blocks purged in the case
    of failure to allocate a new vm area, and also put some logic to trim
    such blocks of a current CPU if we hit them in the allocation path (so
    as to avoid a large build up of them).

    Christoph reported some vmap allocation failures when using the per CPU
    vmap APIs in XFS, which cannot be reproduced after this patch and the
    previous bug fix.

    Cc: <email address hidden>
    Cc: <email address hidden>
    Tested-by: Christoph Hellwig <email address hidden>
    Signed-off-by: Nick Piggin <email address hidden>
    --
    Signed-off-by: Linus Torvalds <email address hidden>

summary: - vsftpd causes memory leak in Lucid
+ vsftpd causes a vmalloc space leak in Lucid
Stefan Bader (smb) wrote :

It seems we already have this commit in Lucid.

Stefan Bader (smb) wrote :

It may be interesting if you had a spare disk and could do a bare metal installation of Lucid to repeat that test there.

Stefan Bader (smb) on 2011-02-16
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → Medium
status: New → Confirmed

Hi, we see this issue on bare-metal installations too, as you suggested: on a Dell M605 (AMD processor) and on an IBM x3650 (Intel).

It may also help to know that we tested on SUSE with kernel 2.6.32-24, and the bug does not occur there.

Jef Goupil (jef-goupil-ec) wrote :

Hi, I am working with Walter Richards. During our testing, we realized that it takes several hours to reclaim the memory after the FTP connections are stopped.
On our 32GB Dell server, it took 5 hours to fill all the memory (until the OOM killer kicked in at 2.1GB of free memory). Then it took 8 hours before the memory started to be freed, and another 7 hours to get the 30GB back completely...

description: updated
Stefan Bader (smb) wrote :

Ok, this is not really conclusive yet, but when playing with this locally it seems that no actual file transfer is needed for those memory allocations to grow. Just looping through a connect followed immediately by a disconnect made those 2M chunks grow. So I can concentrate on that area from here on.
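A minimal Python 3 sketch of such a connect/disconnect loop follows. It is not the attached feedftp script (which is not reproduced here); the function name connect_disconnect and its parameters are hypothetical, and it should only be pointed at a test server you control.

```python
import socket


def connect_disconnect(host: str, port: int, count: int, timeout: float = 5.0) -> int:
    """Open and immediately close `count` TCP connections; return how many succeeded.

    No FTP commands are sent: per the observation above, connecting and
    disconnecting alone was enough to make the 2M chunks grow.
    """
    succeeded = 0
    for _ in range(count):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                succeeded += 1  # established; the socket is closed when the block exits
        except OSError:
            pass  # refused or timed out; keep looping
    return succeeded
```

Running several copies in parallel, as the original test does with eight feedftp instances, raises the connection rate.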

Stefan Bader (smb) wrote :

So this is what I have found out so far: whenever a client connects to vsftpd, it forks off a process to handle the connection. This is done in a way that also duplicates the network namespace (in addition to the process namespace). This can be observed by the fact that every time it happens there is a message "lo: Disabled Privacy Extensions" (which is a slightly odd thing to report, as that is the default for lo). Cloning the network namespace also sets up the SNMP MIB structures, and those are allocated by the per-CPU (pcpu) allocator.

The main problem seems to be that in Lucid, those structures are cleaned up for each interface separately by putting it onto a work queue. This seems to be rather slow, so while the test case runs, the system creates new namespaces faster than it can clean them up. Even after the test is stopped, the cleanup takes a while, and because of the way vmalloc pcpu areas are handled, the memory can stick around even longer: the areas are not used exclusively by network namespaces, so something else may use part of an area, and the area cannot be freed until its last user is gone.
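The backlog argument can be illustrated with a toy model: if namespaces are created at a constant rate and torn down at a lower constant rate, the backlog grows linearly while the test runs and drains only at the cleanup rate afterwards. The Python 3 sketch below assumes fixed integer rates per second, which are hypothetical, and it ignores the vmalloc fragmentation effect, which can delay freeing even further.

```python
from typing import List


def simulate_backlog(create_per_s: int, cleanup_per_s: int,
                     test_seconds: int, total_seconds: int) -> List[int]:
    """Namespaces awaiting cleanup at the end of each simulated second.

    New namespaces arrive only while the test runs; cleanup proceeds at a
    fixed rate for the whole period and the backlog never goes below zero.
    """
    backlog = 0
    history = []
    for second in range(total_seconds):
        if second < test_seconds:
            backlog += create_per_s
        backlog = max(0, backlog - cleanup_per_s)
        history.append(backlog)
    return history
```

Multiplying the backlog by the pcpu memory each namespace holds would give the pinned memory; that per-namespace figure is not stated in this report.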

Between 2.6.32 and 2.6.35 there was a series of changes that allow the cleanup of network namespaces to be batched. The comment on one of them indicated that cleaning up 4000 namespaces would have taken more than 7 minutes before and could be reduced to 44 seconds (at the price of increased CPU load). I was able to backport all required changes, and this seems to avoid the build-up in the vmalloc area (although I am not sure, it felt like connects and disconnects were slower). I am still reluctant to go that road because the required changes are fairly big, and the more that changes, the higher the chance of picking up a regression. There is also the question of whether the test case models realistic usage. To clarify, this is not a real leak. It is a combination of specially allocated memory and a slow, complicated policy for freeing those allocations.
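For scale, the figures quoted above work out as follows. This Python 3 sketch treats "more than 7 minutes" as exactly 420 seconds, so the unbatched per-namespace time is a lower bound; the numbers come from the upstream measurement, not from the hardware in this bug.

```python
def per_namespace_ms(total_seconds: float, namespaces: int) -> float:
    """Average cleanup time per namespace, in milliseconds."""
    return total_seconds * 1000 / namespaces


NAMESPACES = 4000
before = per_namespace_ms(7 * 60, NAMESPACES)  # unbatched: at least 105 ms each
after = per_namespace_ms(44, NAMESPACES)       # batched: about 11 ms each
speedup = before / after                       # roughly 9.5x, a lower bound

print(f"unbatched: >= {before:.0f} ms per namespace, <= {1000 / before:.1f} namespaces/s")
print(f"batched:   {after:.0f} ms per namespace, about {1000 / after:.1f} namespaces/s")
print(f"speedup:   >= {speedup:.1f}x")
```

Averages like these do not capture the pcpu area fragmentation, so they only bound how fast cleanup can keep up.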

Jef Goupil (jef-goupil-ec) wrote :

Hi,
our production servers have more than 250000 vsftpd connections per day:
# zgrep CONNECT vsftpd.log.3.gz|wc -l
 260210
which is about 3 connections per second. Each connection may have up to 200 file transfers. The test case produces about 15 connections per second. That is 5 times more than our real load, but we can see the problem with only 3 connections per second.
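The zgrep | wc -l count above can be turned into a rate with a short Python 3 sketch. It assumes, as the comment does, that one rotated log covers one day (86400 seconds); the function name is hypothetical. With 260210 connections per day this gives about 3.01 connections per second, so the roughly 15 per second generated by the test case is about five times higher.

```python
import gzip


def connections_per_second(log_path: str, seconds: float = 86400.0) -> float:
    """Count lines containing 'CONNECT' in a gzipped log and divide by `seconds`.

    Mirrors `zgrep CONNECT <log> | wc -l`; assumes the log spans `seconds`.
    """
    with gzip.open(log_path, "rt", errors="replace") as f:
        count = sum(1 for line in f if "CONNECT" in line)
    return count / seconds
```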

Btw, the "Disabled Privacy Extensions" messages can be avoided by disabling IPv6 (ipv6.disable=1 on the grub command line).
Thanks

Stefan Bader (smb) wrote :

The privacy extension message was more of a side note. It has been removed in current code (probably because of the limited use).

So the test case does make sense. I was also beginning to wonder whether this could be seen as a security issue, since someone could bring down the server just by connecting many times. It probably takes a while. Still...

Unfortunately I am not really happy with the backports I mentioned before. They really seem to lower the rate of connects, so that alone could be the reason the vmalloc space no longer fills up faster than it gets released. I think I have to do a few more experiments. I am not sure it can be done, but I would rather avoid making changes that are too large.

Stefan Bader (smb) wrote :

Unfortunately this got somewhat sidetracked by other issues I was looking at, and I did not find any small change that improves things. I guess I need to reach out to upstream for help. Meanwhile, there is the series of patches I backported from 2.6.35 that allows network namespaces to be cleaned up in batches. My impression is that it slows down the rate at which the test tasks connect, but the same seems to happen on 2.6.35. Maybe that is something one can live with.
To give other people a chance to look at this, I have prepared kernel packages and put them at:
http://people.canonical.com/~smb/lucid-netnsbp
Meanwhile, I will try to get some help...

Jef Goupil (jef-goupil-ec) wrote :

Hi,
I installed the following files from your site and rebooted:

/root/linux-headers-2.6.32-31-server_2.6.32-31.60+netnsbp1_amd64.deb
/root/linux-headers-2.6.32-31_2.6.32-31.60+netnsbp1_all.deb
/root/linux-image-2.6.32-31-server_2.6.32-31.60+netnsbp1_amd64.deb
/root/linux-libc-dev_2.6.32-31.60+netnsbp1_amd64.deb
/root/linux-tools-2.6.32-31_2.6.32-31.60+netnsbp1_amd64.deb
/root/linux-tools-common_2.6.32-31.60+netnsbp1_all.deb

Then my FTP tests worked properly without using up any free memory... The pcpu entries in /proc/vmallocinfo (grep pcpu /proc/vmallocinfo) stayed at around 5 lines (only 2 new ones). Before, they were constantly growing.

It seems to be fixed in this kernel version.

Thanks a lot!

Jef Goupil (jef-goupil-ec) wrote :

Hi,
Sorry, I forgot to mention the netns process: it was using a lot of CPU during my tests...:

  PID USER      PR  NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
   37 root      20   0      0     0     0 D   15  0.0   0:51.89 netns
 7218 root      20   0  33464  1328   888 D    1  0.0   0:00.03 vsftpd
 7220 root      20   0  33464  1328   888 D    1  0.0   0:00.03 vsftpd
 7221 root      20   0  33464  1328   888 D    1  0.0   0:00.03 vsftpd

I think this is what you were afraid of: the slowdown. I am not sure what the effect would be on my production servers.

Thanks!

Stefan Bader (smb) wrote :

Yes, cleanup is now done in batches, which takes more CPU and could also affect lock contention. That, and the fact that it requires backporting several patches that may have effects we don't know of, makes me a bit reluctant about the changes.

Stefan Bader (smb) wrote :

After trying various backport approaches, none of which were really satisfying, it was decided that the safest way to go is to turn off support for network namespaces. While this can have some impact on use cases that try to containerize the network, the feature was too immature to be turned on in the first place. People who need network namespaces on Lucid should use the LTS backport kernel.

Changed in linux (Ubuntu Lucid):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → Medium
status: New → Fix Committed
Stefan Bader (smb) wrote :

The problem only occurs on Lucid with network namespaces turned on, so this is not valid for Maverick and later.

Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: Confirmed → Invalid
Stefan Bader (smb) on 2011-03-30
description: updated

I think I've been experiencing this bug on a production VMware guest server running Lucid, with vsftpd being connected to frequently by client machines.

The thing is, this bug shows as "Fix Committed", and the implication I get from comment #18 is that the current production (i.e. non-backport) kernel has netns disabled. I'm fully up to date, with kernel 2.6.32-30-server, and still seeing elevated netns CPU usage and a general slowdown of other activity, which I believe is related.

Is there something we need to do specifically to ensure netns is disabled?

Excerpts from Rachel Greenham's message of Sat Apr 16 11:25:10 UTC 2011:
> I think I've been experiencing this bug on a production vmware guest
> server running Lucid with vsftp being connected to frequently by client
> machines.
>
> The thing is, this bug shows as being "fix committed" - and the
> implication I get from comment #18 is that the current production (ie:
> not backport) kernel has netns disabled. I'm all up to date, with
> kernel 2.6.32-30-server, and still seeing elevated netns cpu usage, and
> a general slowdown of other activity which I believe is related.
>
> Is there something we need to do specifically to ensure netns is
> disabled?

Rachel, Fix Committed means that the developers have it in their tree
but it hasn't been released yet. Presumably this means that netns will
be disabled in the next lucid kernel update.
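One way to see whether the running kernel has network namespace support compiled in is to look for CONFIG_NET_NS in its build configuration, for example with grep CONFIG_NET_NS /boot/config-$(uname -r). The Python 3 sketch below performs the same check; it assumes the config file is installed under /boot, as Ubuntu kernel packages do, and the function names are hypothetical. Once the fix from this bug is released, that kernel's config should report CONFIG_NET_NS as not set.

```python
import os
from typing import Optional


def net_ns_setting(config_text: str) -> Optional[bool]:
    """True if CONFIG_NET_NS=y, False if explicitly not set, None if not mentioned."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_NET_NS=y":
            return True
        if line == "# CONFIG_NET_NS is not set":
            return False
    return None


def running_kernel_net_ns() -> Optional[bool]:
    """Check /boot/config-<release> for the running kernel; None if unreadable."""
    path = "/boot/config-" + os.uname().release  # assumed Ubuntu-style location
    try:
        with open(path) as f:
            return net_ns_setting(f.read())
    except OSError:
        return None
```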

Quite right, I should have noticed that. :-) I may just need to be patient then, although this is a production machine that has been experiencing real problems since go-live, with users connecting in volume, so patience may not be an option. I had been considering embedding a Java FTP server into our application instead, and now that I think of it, doing so has other advantages too. :-) Late reply because commenting on this bug didn't subscribe me to it as I expected. Will subscribe now.

Accepted linux into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Steve Conklin (sconklin) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-lucid

Applied successfully to two test instances; I can confirm the netns process is absent, and nothing seems to be broken. :-) My problem only shows up on a production server under load, though, and while I see elevated netns CPU usage most of the time, it only becomes a problem intermittently, so it may take a little longer to install this there and see whether it really helps.

tags: added: verification-done
removed: verification-needed-lucid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-32.62

---------------
linux (2.6.32-32.62) lucid-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #767370

  [ Stefan Bader ]

  * (config) Disable CONFIG_NET_NS
    - LP: #720095

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon/kms: Fix retrying ttm_bo_init() after it failed
    once."
    - LP: #736234
  * Revert "drm/radeon: fall back to GTT if bo creation/validation in VRAM
    fails."
    - LP: #736234
  * x86: pvclock: Move scale_delta into common header
  * KVM: x86: Fix a possible backwards warp of kvmclock
  * KVM: x86: Fix kvmclock bug
  * cpuset: add a missing unlock in cpuset_write_resmask()
    - LP: #736234
  * keyboard: integer underflow bug
    - LP: #736234
  * RxRPC: Fix v1 keys
    - LP: #736234
  * ixgbe: fix for 82599 erratum on Header Splitting
    - LP: #736234
  * mm: fix possible cause of a page_mapped BUG
    - LP: #736234
  * powerpc/kdump: CPUs assume the context of the oopsing CPU
    - LP: #736234
  * powerpc/kdump: Use chip->shutdown to disable IRQs
    - LP: #736234
  * powerpc: Use more accurate limit for first segment memory allocations
    - LP: #736234
  * powerpc/pseries: Add hcall to read 4 ptes at a time in real mode
    - LP: #736234
  * powerpc/kexec: Speedup kexec hash PTE tear down
    - LP: #736234
  * powerpc/crashdump: Do not fail on NULL pointer dereferencing
    - LP: #736234
  * powerpc/kexec: Fix orphaned offline CPUs across kexec
    - LP: #736234
  * netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
    - LP: #736234
  * nfsd: wrong index used in inner loop
    - LP: #736234
  * r8169: use RxFIFO overflow workaround for 8168c chipset.
    - LP: #736234
  * Staging: comedi: jr3_pci: Don't ioremap too much space. Check result.
    - LP: #736234
  * net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules,
    CVE-2011-1019
    - LP: #736234
    - CVE-2011-1019
  * ip6ip6: autoload ip6 tunnel
    - LP: #736234
  * Linux 2.6.32.33
    - LP: #736234
  * drm/radeon: fall back to GTT if bo creation/validation in VRAM fails.
    - LP: #652934, #736234
  * drm/radeon/kms: Fix retrying ttm_bo_init() after it failed once.
    - LP: #652934, #736234
  * drm: fix unsigned vs signed comparison issue in modeset ctl ioctl,
    CVE-2011-1013
    - LP: #736234
    - CVE-2011-1013
  * Linux 2.6.32.33+drm33.15
    - LP: #736234
  * econet: Fix crash in aun_incoming(). CVE-2010-4342
    - LP: #736394
    - CVE-2010-4342
  * igb: only use vlan_gro_receive if vlans are registered, CVE-2010-4263
    - LP: #737024
    - CVE-2010-4263
  * irda: prevent integer underflow in IRLMP_ENUMDEVICES, CVE-2010-4529
    - LP: #737823
    - CVE-2010-4529
  * hwmon/f71882fg: Set platform drvdata to NULL later
    - LP: #742056
  * mtd: add "platform:" prefix for platform modalias
    - LP: #742056
  * libata: no special completion processing for EH commands
    - LP: #742056
  * MIPS: MTX-1: Make au1000_eth probe all PHY addresses
    - LP: #742056
  * x86/mm: Handle mm_fault_error() in kernel space
    - LP: #742056
  * ftrace: Fix memory leak with function graph and cpu hotplug
    - LP: #742056
  * x86: Fix panic when ...

Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released
Serge Hallyn (serge-hallyn) wrote :

LXC is now unusable on Lucid.

Serge Hallyn (serge-hallyn) wrote :

Actually, this doesn't make sense to me. CLONE_NEWNET requires privilege, so this isn't something a random user can exploit. So what is the value in turning netns support off in the kernel, as opposed to just stopping vsftpd from using it? (The attached debdiff is untested but should suffice. I'll test it if it will be considered IN PLACE of turning off CONFIG_NET_NS.)
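The privilege requirement can be observed directly. The Python 3.12+ sketch below (os.unshare and os.CLONE_NEWNET were added in 3.12) forks a child that tries to create a new network namespace, so the caller's own namespace is never changed. Without CAP_SYS_ADMIN the call is expected to fail with EPERM; on a kernel built without CONFIG_NET_NS it is expected to fail with a different error (typically EINVAL). The function name is hypothetical, and this is not vsftpd's code.

```python
import os


def try_new_netns() -> str:
    """Report whether a child process can unshare(CLONE_NEWNET).

    Returns "created", "denied" (PermissionError, e.g. no CAP_SYS_ADMIN),
    "unavailable" (another OSError, e.g. kernel without CONFIG_NET_NS) or
    "unsupported" (this Python or OS has no os.unshare).
    """
    if not hasattr(os, "unshare") or not hasattr(os, "CLONE_NEWNET"):
        return "unsupported"
    pid = os.fork()
    if pid == 0:  # child: attempt the unshare, then exit without returning
        code = 2
        try:
            os.unshare(os.CLONE_NEWNET)
            code = 0
        except PermissionError:
            code = 1
        except OSError:
            code = 2
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    return {0: "created", 1: "denied"}.get(code, "unavailable")
```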

Stefan Metzmacher (metze) wrote :

Fixing vsftpd looks like a much better fix for this!

On 01/06/11 17:08, Stefan Metzmacher wrote:
> Fixing vsftpd looks like a much better fix for this

It would seem at first sight to be simpler; but presumably the problem
was that there are bugs in the implementation in the Lucid kernel (and
upstream) that won't necessarily *only* impact vsftpd users, although we
were the ones who first reported it. Certainly from my practical point
of view I'd have been happy with a simple vsftpd update to remove the
problem. :-)

The bug being in the kernel, and backporting the fix to it being deemed
too complicated (see nearer the top of this bug report thread) the
decision was therefore to disable the feature.

To those who depend on the feature, i.e. LXC users (aside: I hadn't
heard of that! After googling I may want to use it now!), given the
feature is buggy in the Lucid (and upstream) kernel *anyway*, maybe
the appropriate action is to use the Maverick backport kernel?

--
Rachel

On 02/06/11 14:32, Rachel Greenham wrote:
> On 01/06/11 17:08, Stefan Metzmacher wrote:
>> Fixing vsftpd looks like a much better fix for this

Also presumably disabling it in vsftpd will hurt people who want to use
that in an lxc setting without providing an easily-applied solution.

--
Rachel

Stefan Metzmacher (metze) wrote :

As far as I understand it, the problem comes from creating a new network namespace with every clone() syscall.

In an LXC setup, only the startup process creates a new network namespace, and only once.

I can't see why vsftpd (without CLONE_NEWNET) wouldn't run within an already established LXC session.

Alex Bligh (ubuntu-alex-org) wrote :

The released resolution broke a production environment here: See #844185

I propose this is instead fixed by disabling it in vsftpd.

Stefan Bader (smb) wrote :

On 07.09.2011 12:16, Alex Bligh wrote:
> The released resolution broke a production environment here: See #844185
>
> I propose this is instead fixed by disabling it in vsftpd.
>
The problem is that nobody can say that vsftpd was or is the only vector that
allows someone to DoS a system through something that involves network namespaces.

If netns is essential, it is probably a better solution to move to the
LTS-backports kernel, which is newer and does not have those memory cleanup issues.

Alex Bligh (ubuntu-alex-org) wrote :

That is sadly not an option. The LTS-backport kernel has a spectacular and easily repeatable Oops when namespaces are used. See #843892.

The kernel does not guarantee (it certainly didn't in 2.6.32) that namespaces can be created and deleted instantly and without undue system pressure. It seems to me that the bug is in applications that assume this can be done. As an example, in 2.6.32, creating 1000 interfaces and then deleting them takes a very long time on deletion due to RCU synchronization issues. If a userspace program did this, prompted by an external user, our response would not be to disable creation and deletion of interfaces. Rather, we'd fix the userspace program not to do it.
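The RCU point can be illustrated with a toy cost model: if every deletion waits for its own grace period (as a synchronize_rcu() call does), 1000 deletions cost 1000 waits, whereas a batched teardown can let many deletions share one wait. The Python 3 sketch assumes a hypothetical 20 ms grace period and counts nothing but those waits; real grace periods depend on kernel configuration and load.

```python
import math


def teardown_wait_ms(n_items: int, grace_period_ms: int, batch_size: int = 1) -> int:
    """Total time spent waiting for RCU grace periods, assuming one wait per batch."""
    if n_items < 0 or batch_size < 1:
        raise ValueError("need n_items >= 0 and batch_size >= 1")
    return math.ceil(n_items / batch_size) * grace_period_ms
```

Under those assumptions, deleting 1000 interfaces one at a time spends 20 seconds just waiting, against 20 ms when all of them share a single batch.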

Alex Bligh (ubuntu-alex-org) wrote :

Note also that if vsftpd continues to use clone(CLONE_NEWNET) (i.e. network namespaces), it is likely to suffer from #843892 anyway, so not using network namespaces will give you a stability increase. (NB: I have not tested vsftpd against the bug in #843892, but as you can see from its description, it is hardly difficult to hit.)
