kernel does not support limiting swap usage (memory.memsw.limit_in_bytes missing)

Bug #1348688 reported by Brian Candler on 2014-07-25
This bug affects 4 people
Affects: linux (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

(Sorry I'm not sure exactly what package to report this against - kernel perhaps? libvirt is what I was using to replicate the problem)

Host platform: ubuntu 14.04 amd64, Mac Mini, 16GB RAM.

Short version: create an LXC domain with memtune > swap_hard_limit set in the XML:

<domain type='lxc'>
  <name>gold-lxc-20140717</name>
  <uuid>b2a02d49-bb1e-4aec-81d1-58910892780e</uuid>
  <memory unit='KiB'>327680</memory>
  <currentMemory unit='KiB'>327680</currentMemory>
  <memtune>
    <swap_hard_limit unit='KiB'>131072</swap_hard_limit>
  </memtune>
  ...

(full version at end of this report)

Now try to start it:

$ virsh -c lxc: start gold-lxc-20140717
error: Failed to start domain gold-lxc-20140717
error: internal error: guest failed to start: Unable to write to '/sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/memory.memsw.limit_in_bytes': No such file or directory

This matters because otherwise the LXC memory limit applies only to real RAM used. A guest that exceeds the limit can still use as much swap space as it likes, so it is effectively unlimited (and can happily DoS the swap disk).

Long version:

I created an ubuntu 14.04 i386 VM image using python-vmbuilder, loopback-mounted it with qemu-nbd, and rsync'd it to create a root filesystem for an LXC guest.

Then defined a guest using libvirt XML and started it using "virsh -c lxc: start <domain>" (as per XML at end but without the <memtune> section). It starts successfully, networking is fine, I can get a console etc.

Now, the libvirt XML description says the guest's memory limit is 320MB:

  <memory unit='KiB'>327680</memory>
  <currentMemory unit='KiB'>327680</currentMemory>

and indeed the cgroups setting has been set:

$ cat /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/memory.limit_in_bytes
335544320

However inside the guest I can happily allocate as much memory as I like, up to just under 4GB, which is the limit for a 32-bit guest.

Here's the test program I ran in the guest (usemem.c):

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char *p;
    int i,j;
    int ok=0, fail=0;
    for (i=0; i<4096; i++) {
        p = malloc(1024*1024);
        if (p) {
            ok++;
            for (j=0; j<1024*1024; j++)
                p[j] = rand();
        }
        else
            fail++;
    }
    fprintf(stderr, "Done: %d ok, %d fail\n", ok, fail);
    sleep(600);
    return fail ? 1 : 0;
}

Result from running:

Done: 4076 ok, 20 fail

View from the host:

nsrc@kit1:~/workshop-kit$ ps auxwww | grep usemem | grep -v grep
nsrc 10506 96.1 1.3 4192152 224776 ? S+ 14:41 0:55 ./usemem

$ cat /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/memory.limit_in_bytes
335544320
$ cat /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/memory.max_usage_in_bytes
335544320
$ cat /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/memory.usage_in_bytes
292331520

You can see there's definitely 4GB in use by this process, and yet the cgroup thinks less than 280MB is in use, which is below the 320MB limit.
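
For readers checking the figures, here is a minimal sketch of the unit conversions behind the numbers quoted above. The helper names bytes_to_mib and kib_to_bytes are hypothetical and not part of any tool used in this report; the arithmetic is integer (rounded down).

```shell
#!/bin/sh
# Convert a byte count (as shown by the cgroup v1 memory files) to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Convert a KiB value (as used in the libvirt domain XML) to bytes.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

kib_to_bytes 327680      # <memory unit='KiB'> from the XML   -> 335544320
bytes_to_mib 335544320   # memory.limit_in_bytes              -> 320
bytes_to_mib 292331520   # memory.usage_in_bytes              -> 278
```

So the cgroup limit matches the 320MB in the XML exactly, and the reported usage is about 278MB, even though ps shows a virtual size of roughly 4GB for the process.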

However if you look at swap usage in the host while the memory suck program is running:

$ free
             total used free shared buffers cached
Mem: 16338300 3066952 13271348 1684 135512 1489268
-/+ buffers/cache: 1442172 14896128
Swap: 16678908 3971248 12707660

and after it has terminated:

$ free
             total used free shared buffers cached
Mem: 16338300 2774440 13563860 1684 135544 1489188
-/+ buffers/cache: 1149708 15188592
Swap: 16678908 5484 16673424

i.e. the LXC guest used nearly 4GB of swap, and then gave it up when it terminated.
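
As a rough sketch of how the "nearly 4GB" figure follows from the two free listings above: the helper swap_used_kib is a hypothetical name, and it assumes the column layout shown in this report, where "used" is the third field of the Swap: line.

```shell
#!/bin/sh
# Print the "used" column of the Swap: line from `free` output read on stdin.
swap_used_kib() {
    awk '$1 == "Swap:" { print $3 }'
}

during=3971248   # swap used while usemem was running (first listing)
after=5484       # swap used after it terminated (second listing)
delta=$(( during - after ))
echo "$delta KiB, about $(( delta / 1024 )) MiB"
# prints: 3965764 KiB, about 3872 MiB
```

On a live system the values could be captured with `free | swap_used_kib` before and after the test run.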

Additional info:

cgroup view from inside the guest:

$ cat /proc/self/cgroup
11:name=systemd:/
10:hugetlb:/
9:perf_event:/machine/gold-lxc-20140717.libvirt-lxc
8:blkio:/machine/gold-lxc-20140717.libvirt-lxc
7:freezer:/machine/gold-lxc-20140717.libvirt-lxc
6:devices:/machine/gold-lxc-20140717.libvirt-lxc
5:memory:/machine/gold-lxc-20140717.libvirt-lxc
4:cpuacct:/machine/gold-lxc-20140717.libvirt-lxc
3:cpu:/machine/gold-lxc-20140717.libvirt-lxc
2:cpuset:/machine/gold-lxc-20140717.libvirt-lxc

cgroup settings visible in the host:

$ ls /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/
cgroup.clone_children memory.limit_in_bytes
cgroup.event_control memory.max_usage_in_bytes
cgroup.procs memory.move_charge_at_immigrate
memory.failcnt memory.numa_stat
memory.force_empty memory.oom_control
memory.kmem.failcnt memory.pressure_level
memory.kmem.limit_in_bytes memory.soft_limit_in_bytes
memory.kmem.max_usage_in_bytes memory.stat
memory.kmem.slabinfo memory.swappiness
memory.kmem.tcp.failcnt memory.usage_in_bytes
memory.kmem.tcp.limit_in_bytes memory.use_hierarchy
memory.kmem.tcp.max_usage_in_bytes notify_on_release
memory.kmem.tcp.usage_in_bytes tasks
memory.kmem.usage_in_bytes

And here's the full XML as promised:

<domain type='lxc'>
  <name>gold-lxc-20140717</name>
  <uuid>b2a02d49-bb1e-4aec-81d1-58910892780e</uuid>
  <memory unit='KiB'>327680</memory>
  <currentMemory unit='KiB'>327680</currentMemory>
  <memtune>
    <swap_hard_limit unit='KiB'>131072</swap_hard_limit>
  </memtune>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64'>exe</type>
    <init>/sbin/init</init>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/lib/libvirt/libvirt_lxc</emulator>
    <filesystem type='mount' accessmode='passthrough'>
      <source dir='/data1/lxc/gold-20140717/rootfs'/>
      <target dir='/'/>
    </filesystem>
    <interface type='bridge'>
      <mac address='52:54:5d:00:0a:88'/>
      <source bridge='br-lan'/>
    </interface>
    <console type='pty'>
      <target type='lxc' port='0'/>
    </console>
  </devices>
  <seclabel type='none'/>
</domain>

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: libvirt-bin 1.2.2-0ubuntu13.1.1
ProcVersionSignature: Ubuntu 3.13.0-32.57-generic 3.13.11.4
Uname: Linux 3.13.0-32-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
Date: Fri Jul 25 14:29:54 2014
InstallationDate: Installed on 2014-07-16 (8 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.libvirt.qemu.conf: [inaccessible: [Errno 13] Permission denied: '/etc/libvirt/qemu.conf']

Serge Hallyn (serge-hallyn) wrote :

Indeed, this is simply not built into the kernels (at least up to and including Utopic).

affects: libvirt (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
summary: - LXC does not limit swap usage (memory.memsw.limit_in_bytes missing)
+ kernel does not support limiting swap usage (memory.memsw.limit_in_bytes
+ missing)
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.19 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-rc4-vivid/

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Medium → High
Brian Candler (b-candler) wrote :

After updating to that kernel (under 14.04), the problem is not fixed.

$ uname -a
Linux kit1 3.19.0-031900rc4-generic #201501112135 SMP Sun Jan 11 21:36:48 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

* The LXC instance still does not start if the <memtune><swap_hard_limit unit='KiB'>131072</swap_hard_limit></memtune> setting is included.

$ virsh -c lxc: start gold-lxc-20140717
error: Failed to start domain gold-lxc-20140717
error: internal error: guest failed to start: Unable to write to '/sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/memory.memsw.limit_in_bytes': No such file or directory

$ ls /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/
cgroup.clone_children memory.kmem.tcp.failcnt memory.oom_control
cgroup.event_control memory.kmem.tcp.limit_in_bytes memory.pressure_level
cgroup.procs memory.kmem.tcp.max_usage_in_bytes memory.soft_limit_in_bytes
memory.failcnt memory.kmem.tcp.usage_in_bytes memory.stat
memory.force_empty memory.kmem.usage_in_bytes memory.swappiness
memory.kmem.failcnt memory.limit_in_bytes memory.usage_in_bytes
memory.kmem.limit_in_bytes memory.max_usage_in_bytes memory.use_hierarchy
memory.kmem.max_usage_in_bytes memory.move_charge_at_immigrate notify_on_release
memory.kmem.slabinfo memory.numa_stat tasks

(Note it does not include memory.memsw.limit_in_bytes)

* After reverting to kernel 3.13.0-43 and repeating the same lxc start, which fails in the same way:

$ ls /sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc/
cgroup.clone_children memory.kmem.tcp.failcnt memory.oom_control
cgroup.event_control memory.kmem.tcp.limit_in_bytes memory.pressure_level
cgroup.procs memory.kmem.tcp.max_usage_in_bytes memory.soft_limit_in_bytes
memory.failcnt memory.kmem.tcp.usage_in_bytes memory.stat
memory.force_empty memory.kmem.usage_in_bytes memory.swappiness
memory.kmem.failcnt memory.limit_in_bytes memory.usage_in_bytes
memory.kmem.limit_in_bytes memory.max_usage_in_bytes memory.use_hierarchy
memory.kmem.max_usage_in_bytes memory.move_charge_at_immigrate notify_on_release
memory.kmem.slabinfo memory.numa_stat tasks

Looks identical to me.
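
Rather than comparing the directory listings by eye, the check above could be scripted roughly as follows. This is only a sketch: has_memsw_limit is a hypothetical helper name, and the path is the cgroup v1 directory from this report, so it would need adjusting for another domain.

```shell
#!/bin/sh
# Return 0 if the given cgroup v1 memory directory exposes the combined
# memory+swap limit file, non-zero otherwise.
has_memsw_limit() {
    [ -e "$1/memory.memsw.limit_in_bytes" ]
}

# Directory taken from the report above.
cg=/sys/fs/cgroup/memory/machine/gold-lxc-20140717.libvirt-lxc
if has_memsw_limit "$cg"; then
    echo "memory.memsw.limit_in_bytes present in $cg"
else
    echo "memory.memsw.limit_in_bytes missing in $cg"
fi
```

The same function can be run under each kernel to confirm that the result does not change between them.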

tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Brian Candler (b-candler) wrote :

First you need to convince me that this is a kernel bug.

How come libvirt's LXC driver is trying to use a /sys API that doesn't even exist in the very latest mainline kernel?

I would guess that either libvirt assumes the kernel is built with some option that the Ubuntu kernel hasn't been built with; or it assumes some patch has been applied; or it's trying to use some old API which no longer exists.

None of those cases would count as a kernel bug.

OTOH, I don't see anything at
http://libvirt.org/sources/virshcmdref/html-single/#sect-memtune
which implies any special options are required.

There are requirements for namespaces to be compiled in at
https://libvirt.org/drvlxc.html

I encountered this same issue. I am not sure whether a CONFIG_* option is not enabled or something else is the cause; however, I was able to remedy the issue by adding 'swapaccount=1' to the kernel command line. That in turn enabled the memory.memsw.* files in cgroups. Hope this helps you.

Sincerely yours,
LH
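
To see whether the swapaccount=1 workaround described above is active on a running system, something like the following sketch may help. The function name cmdline_has_swapaccount is hypothetical; /proc/cmdline is the standard place where the running kernel's boot parameters are exposed.

```shell
#!/bin/sh
# Return 0 if the given kernel command line string contains swapaccount=1
# as a separate, space-delimited parameter.
cmdline_has_swapaccount() {
    case " $1 " in
        *" swapaccount=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if cmdline_has_swapaccount "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "swapaccount=1 is set on the running kernel"
else
    echo "swapaccount=1 is not set on the running kernel"
fi
```

On Ubuntu the parameter is commonly added via GRUB_CMDLINE_LINUX in /etc/default/grub followed by update-grub and a reboot, but that is an assumption about the boot setup and not something stated in this thread.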

petre (petrem.) wrote :

The standard kernel config for 16.04 has:

...
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
# CONFIG_MEMCG_SWAP_ENABLED is not set
...

I've not re-compiled with CONFIG_MEMCG_SWAP_ENABLED set to see if it fixes the issue.
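
A small sketch for pulling the relevant options out of a kernel config file, for anyone wanting to repeat this check on another release. The function name memcg_swap_config is hypothetical, and the /boot/config-$(uname -r) location is the usual Ubuntu convention, assumed here and not taken from the thread. As far as I understand the kernel's Kconfig help, CONFIG_MEMCG_SWAP builds the feature in while CONFIG_MEMCG_SWAP_ENABLED only controls whether it is on by default, which would be consistent with the swapaccount=1 boot parameter enabling it without a rebuild.

```shell
#!/bin/sh
# Print the MEMCG swap-accounting lines (set or explicitly unset) from a
# kernel config file given as the first argument.
memcg_swap_config() {
    grep -E '^(# )?CONFIG_MEMCG_SWAP(_ENABLED)?( is not set|=)' "$1"
}

# Assumed location of the running kernel's config on Ubuntu.
memcg_swap_config "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "no matching config lines found"
```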
