Wrong memory.stat values in container root cgroup

Bug #1631406 reported by hda_launchpad
This bug affects 1 person
Affects:      linux (Ubuntu)
Status:       Confirmed
Importance:   Medium
Assigned to:  Unassigned

Bug Description

1) Ubuntu 16.04.1 LTS

2) linux-image-4.4.0-41-generic. The bug was first observed and tested on 4.4.0-38 and happens on all versions from 4.4.0-38 to 4.4.0-41, and possibly others. I see this bug on totally different xenial LTS machines.

3) What you expected to happen:
cat /sys/fs/cgroup/memory/lxc/containername/memory.stat should display correct information

4) What happened instead
cat /sys/fs/cgroup/memory/lxc/containername/memory.stat shows incorrect information
cat /sys/fs/cgroup/memory/memory.stat run inside the container shows different information
lxc exec containername -- cat /sys/fs/cgroup/memory/memory.stat shows information totally different from the two above. Meanwhile, cat /sys/fs/cgroup/memory/lxc/containername/memory.usage_in_bytes is always correct.
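
A minimal way to see the mismatch side by side (containername is a placeholder for an actual container; the rss/cache lines are only examples of fields that diverge):

# host view of the container's root memory cgroup
grep -E '^(total_)?(rss|cache) ' /sys/fs/cgroup/memory/lxc/containername/memory.stat
# the same cgroup as seen from inside the container
lxc exec containername -- grep -E '^(total_)?(rss|cache) ' /sys/fs/cgroup/memory/memory.stat
# usage_in_bytes, which stays correct
cat /sys/fs/cgroup/memory/lxc/containername/memory.usage_in_bytes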

5) mount | grep cgroup output:
tmpfs on /sys/fs/cgroup type tmpfs (rw,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event,release_agent=/run/cgmanager/agents/cgm-release-agent.perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids,release_agent=/run/cgmanager/agents/cgm-release-agent.pids)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)

6) I was able to reproduce the bug on another machine, where memory.stat for the container had been correct, after removing these packages, though it makes no sense at all:
mountall:amd64 plymouth-theme-ubuntu-text:amd64 plymouth:amd64 upstart:amd64
Installing them back didn't help.

7) memory.stat information for programs running inside containers is correct. Only memory.stat for the container's root cgroup is incorrect.
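
A rough sketch of the check behind this observation, assuming the container's init (systemd) has created child cgroups under the container's root memory cgroup; containername is again a placeholder:

root=/sys/fs/cgroup/memory/lxc/containername
# the container root cgroup's own (wrong-looking) counter
awk '/^total_rss /{print $2}' "$root"/memory.stat
# sum of total_rss over the direct child cgroups, which looks sane
awk '/^total_rss /{sum += $2} END {print sum}' "$root"/*/memory.stat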

Bug graphical representation:
https://cloud.githubusercontent.com/assets/6484506/19012936/9f19ff7a-87b2-11e6-9e68-889659663249.png
The first chart is based on cat /sys/fs/cgroup/memory/lxc/containername/memory.usage_in_bytes and cat /sys/fs/cgroup/memory/lxc/containername/memory.memsw.usage_in_bytes. The second chart is based on the incorrect cat /sys/fs/cgroup/memory/lxc/containername/memory.stat.
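
Roughly, the comparison behind the two charts looks like this (containername is a placeholder; in a healthy memory cgroup, usage_in_bytes is approximately total_rss + total_cache from memory.stat):

cg=/sys/fs/cgroup/memory/lxc/containername
# what the first chart plots
cat "$cg"/memory.usage_in_bytes
# what the second chart is derived from; here the two numbers diverge
awk '/^(total_rss|total_cache) /{sum += $2} END {print sum}' "$cg"/memory.stat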

Possibly the same problem (these bugs are quite old):
https://issues.apache.org/jira/browse/MESOS-758
https://groups.google.com/forum/#!topic/linux.kernel/EwiYK53Itk8

Bug original sources:
https://github.com/firehol/netdata/issues/1019
https://github.com/lxc/lxd/issues/2422

hda_launchpad (hda-me)
description: updated
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1631406

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
hda_launchpad (hda-me) wrote :

I don't use apport; apport is uninstalled on all my machines. And, as I already said in the bug description, I see this on different machines with totally different hardware.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds. Please test the latest v4.8 kernel [0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8
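
For reference, a rough outline of the usual mainline test procedure (only a sketch; the exact .deb file names must be taken from the page at [0]):

mkdir ~/mainline-v4.8 && cd ~/mainline-v4.8
# download the linux-headers ..._all.deb, linux-headers ...-generic ..._amd64.deb
# and linux-image ...-generic ..._amd64.deb files from [0] into this directory, then:
sudo dpkg -i *.deb
sudo reboot
uname -r    # should report the 4.8 mainline version after the reboot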

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key xenial
Revision history for this message
hda_launchpad (hda-me) wrote :

I will check tomorrow with the mainline kernel on a test machine. However, I'm not sure the ZFS modules will build against it (the containers run on a ZFS storage backend).

Revision history for this message
hda_launchpad (hda-me) wrote :

Unfortunately, the ZFS DKMS modules failed to build against the mainline kernel.
