Kernel memory resource leak on hosts running containers performing sudo tasks

Bug #1861792 reported by Stig Telfer
This bug affects 4 people
Affects        Status        Importance  Assigned to      Milestone
kolla-ansible  Fix Released  High        Michal Nasiadka
Rocky          Fix Released  High        Michal Nasiadka
Stein          Fix Released  High        Michal Nasiadka
Train          Fix Released  High        Michal Nasiadka
Ussuri         Fix Released  High        Michal Nasiadka

Bug Description

There appears to be an issue with an interaction between systemd, systemd-logind and cgroupfs that leads to the following symptoms:

In systemd journald, regular logs of this form:

systemd[1]: Failed to start Session c314331 of user root.

In /proc/slabinfo, ever-increasing growth in the inode_cache slab:

# grep inode_cache /proc/slabinfo
inode_cache 3396340 3397020 592 55 8 : tunables 0 0 0 : slabdata 61764 61764 0

In /sys/fs/cgroup/systemd/user.slice/user-0.slice, large numbers of directories are being generated:

# ls /sys/fs/cgroup/systemd/user.slice/user-0.slice | wc -l
555543

Ultimately this is a resource leak that may result in memory exhaustion.
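
The two symptoms above can be tracked over time with a small script. This is only a sketch: it assumes the /proc/slabinfo column order shown above (name, active objects, total objects, object size) and estimates memory as total objects times object size, which ignores per-slab overhead. The function names are made up for illustration.

```python
import os

def parse_slab_line(line):
    """Split one /proc/slabinfo data line into (name, active_objs, num_objs, objsize)."""
    fields = line.split()
    return fields[0], int(fields[1]), int(fields[2]), int(fields[3])

def slab_bytes(line):
    """Rough estimate of memory held by a slab cache: num_objs * objsize."""
    _, _, num_objs, objsize = parse_slab_line(line)
    return num_objs * objsize

def count_session_scopes(path="/sys/fs/cgroup/systemd/user.slice/user-0.slice"):
    """Count session-*.scope directories present under a user slice."""
    return sum(
        1
        for entry in os.scandir(path)
        if entry.is_dir()
        and entry.name.startswith("session-")
        and entry.name.endswith(".scope")
    )

if __name__ == "__main__":
    # The inode_cache line quoted in this report.
    sample = ("inode_cache 3396340 3397020 592 55 8 : tunables 0 0 0 "
              ": slabdata 61764 61764 0")
    print(slab_bytes(sample), "bytes (estimate)")
```

For the line quoted above this estimate comes to about 2 GB held in inode_cache. Reading the live /proc/slabinfo normally requires root.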

The issue appears to be that systemd is requested to create a cgroup for a PID that is not found. Here's an strace of systemd in action:

open("/sys/fs/cgroup/systemd/user.slice/user-0.slice/session-c327326.scope/cgroup.procs", O_WRONLY|O_NOCTTY|O_CLOEXEC) = 60
fcntl(60, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fstat(60, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe636abd000
write(60, "28114\n", 6) = -1 ESRCH (No such process)

systemd does not handle this failure well: it does not clean up the cgroup state it has created up to this point. This causes the growing number of files and directories in cgroupfs (and the growth in inode_cache).

Systemd gets the instruction to attach a new process to a cgroup via a dbus message from systemd-logind. Here is an example from dbus-monitor --system:

method call time=1580407532.821483 sender=:1.2 -> destination=org.freedesktop.systemd1 serial=3148388 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=StartTransientUnit
   string "session-c313601.scope"
   string "fail"
   array [
      struct {
         string "Slice"
         variant string "user-0.slice"
      }
      struct {
         string "Description"
         variant string "Session c313601 of user root"
      }
      struct {
         string "After"
         variant array [
               string "systemd-logind.service"
            ]
      }
      struct {
         string "After"
         variant array [
               string "systemd-user-sessions.service"
            ]
      }
      struct {
         string "SendSIGHUP"
         variant boolean true
      }
      struct {
         string "PIDs"
         variant array [
               uint32 32682
            ]
      }
      struct {
         string "TasksMax"
         variant uint64 18446744073709551615
      }
   ]
   array [
   ]

The PID referenced is not found. Systemd-logind appears to be relaying a CreateSession notification from another source on dbus. It appears this other source is transient, and is attached to dbus for the duration of a sudo command.
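
To check captured messages against the host, the PIDs can be pulled out of the dbus-monitor text and probed with signal 0. This is a rough sketch that assumes the text layout shown above; extract_pids and pid_exists are illustrative names, not part of any tool.

```python
import os
import re

def extract_pids(dump):
    """Return the uint32 values listed under the "PIDs" property of a dbus-monitor dump."""
    pids = []
    in_pids = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith("string "):
            # A new property name (or array item) starts; track whether it is "PIDs".
            in_pids = stripped == 'string "PIDs"'
        elif in_pids:
            match = re.fullmatch(r"uint32 (\d+)", stripped)
            if match:
                pids.append(int(match.group(1)))
    return pids

def pid_exists(pid):
    """Probe a PID in the caller's PID namespace without sending a real signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH, the same error systemd receives when writing to cgroup.procs.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

Running pid_exists on the host for each extracted PID at capture time would show whether the PID is visible in the host namespace, although short-lived processes may already have exited by then.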

At the time the cgroup creation attempt occurs, the only process being exec'd appears to be a periodic neutron-rootwrap polling command, which is executed from within the Neutron Kolla containers.

My theory is that references are made on dbus to processes in a different PID namespace from the host. These messages make their way to systemd on the host, which then cannot act accordingly as it is in a different process namespace. Systemd failing to clean up is a secondary issue to the root cause.
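
One way to test this theory is to compare the PID namespace of a process with that of the observer. A minimal sketch, assuming a Linux /proc with the ns/pid links available; reading another user's process may need root, and the helper names are invented here.

```python
import os

def pid_namespace(pid):
    """Return the PID namespace identifier of a process, in the form 'pid:[<inode>]'."""
    return os.readlink(f"/proc/{pid}/ns/pid")

def same_pid_namespace(pid_a, pid_b):
    """True if both processes live in the same PID namespace."""
    return pid_namespace(pid_a) == pid_namespace(pid_b)
```

On the host, comparing "self" with the host-side PID of a process inside a Neutron container should return False if the container runs in its own PID namespace, which means a PID reported from inside the container has no meaning to systemd on the host.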

I see this on latest CentOS 7.7 with Rocky deployed via Kayobe and Kolla-Ansible. I believe it also affects Stein.

Radosław Piliszek (yoctozepto) wrote :

Could you paste your kernel and systemd versions? I am not seeing this behavior on Stein.

Stig Telfer (stigtelfer) wrote :

Thanks Radoslaw -

Kernel is 3.10.0-957.10.1.el7.x86_64
Systemd is systemd-219-62.el7_6.5.x86_64

Stig Telfer (stigtelfer) wrote :

Apologies, wrong data. Should have been:

kernel-3.10.0-1062.9.1.el7.x86_64
systemd-219-67.el7_7.2.x86_64

Radosław Piliszek (yoctozepto) wrote :

Hmm, now afraid to do an upgrade. ;D

Could you verify whether regular sudo usage also triggers this behavior? I.e., log in as a regular user and then try sudoing to root.

Stig Telfer (stigtelfer) wrote :

Regular sudo commands in the container don't appear to make it happen. It appears to be only commands exec'd by the Neutron processes - some cgroup / process group connection.

I applied this patch from mnasiadka - https://review.opendev.org/#/c/707598/ - and it has stopped the resource leakage. What it does is restrict the mount of /var/run, so that there is no shared dbus connection between container and host.
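
To see which containers still share the host's /run, the output of docker inspect can be scanned for wide bind mounts. This is a sketch only: it assumes the Mounts entries carry Type, Source and Destination fields, and the container name in the usage example is hypothetical.

```python
import json

# Host paths whose bind mount exposes the host dbus socket to the container.
HOST_RUN_PATHS = {"/run", "/var/run"}

def wide_run_mounts(inspect_json):
    """Return (container name, source, destination) for bind mounts of the whole /run tree."""
    found = []
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            source = mount.get("Source", "").rstrip("/")
            destination = mount.get("Destination", "").rstrip("/")
            if mount.get("Type") == "bind" and source in HOST_RUN_PATHS:
                found.append((container.get("Name", ""), source, destination))
    return found
```

For example, the text produced by docker inspect neutron_openvswitch_agent could be passed to wide_run_mounts; an empty list would suggest the container no longer sees the host's /run.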

I'm now checking if it has caused any side-effect issues.

Radosław Piliszek (yoctozepto) wrote :

I wonder why it's still not common - I can't reproduce it. Your logic seems fine but I can't wrap my head around why it has not hit me.
Could you list those neutron services?

Changed in kolla-ansible:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Michal Nasiadka (mnasiadka)
Changed in kolla-ansible:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/707375
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=227008cf68aa68f340d95703e85355ae81585506
Submitter: Zuul
Branch: master

commit 227008cf68aa68f340d95703e85355ae81585506
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 12 13:39:33 2020 +0100

    Change /run bind mount for neutron/openvswitch

    Currently we have a very wide /run mount for all Neutron/OVS services,
    which allows sudo/rootwrap to contact with the hosts dbus - all symptoms
    are documented in the related bug.

    Since we use tcp connections to OVS from Neutron agents - removing
    bind mounts.

    Closes-Bug: #1861792

    Change-Id: Ifee4bec7b2e9ef4e2d624b1411f1a9e6332325c6

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/708905

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/708905
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=c4ad080d9a0a79caadac39ecc1a2891219741797
Submitter: Zuul
Branch: stable/train

commit c4ad080d9a0a79caadac39ecc1a2891219741797
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 12 13:39:33 2020 +0100

    Change /run bind mount for neutron/openvswitch

    Currently we have a very wide /run mount for all Neutron/OVS services,
    which allows sudo/rootwrap to contact with the hosts dbus - all symptoms
    are documented in the related bug.

    Since we use tcp connections to OVS from Neutron agents - removing
    bind mounts.

    Closes-Bug: #1861792

    Change-Id: Ifee4bec7b2e9ef4e2d624b1411f1a9e6332325c6
    (cherry picked from commit 227008cf68aa68f340d95703e85355ae81585506)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/709115

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/709115
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8be74eb8cb8c177dd2c2ac23ea9b8a4edccb6e87
Submitter: Zuul
Branch: stable/stein

commit 8be74eb8cb8c177dd2c2ac23ea9b8a4edccb6e87
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 12 13:39:33 2020 +0100

    Change /run bind mount for neutron/openvswitch

    Currently we have a very wide /run mount for all Neutron/OVS services,
    which allows sudo/rootwrap to contact with the hosts dbus - all symptoms
    are documented in the related bug.

    Since we use tcp connections to OVS from Neutron agents - removing
    bind mounts.

    Closes-Bug: #1861792

    Change-Id: Ifee4bec7b2e9ef4e2d624b1411f1a9e6332325c6
    (cherry picked from commit 227008cf68aa68f340d95703e85355ae81585506)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/709225

OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/rocky)

Change abandoned by Michal Nasiadka (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/709225

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/707598
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=4d856317e39ef2c1ac05f7575b7110506c503f9f
Submitter: Zuul
Branch: stable/rocky

commit 4d856317e39ef2c1ac05f7575b7110506c503f9f
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 12 13:39:33 2020 +0100

    Change /run bind mount for neutron/openvswitch

    Currently we have a very wide /run mount for all Neutron/OVS services,
    which allows sudo/rootwrap to contact with the hosts dbus - all symptoms
    are documented in the related bug.

    Since we use tcp connections to OVS from Neutron agents - removing
    bind mounts.

    Closes-Bug: #1861792

    Change-Id: Ifee4bec7b2e9ef4e2d624b1411f1a9e6332325c6
    (cherry picked from commit 227008cf68aa68f340d95703e85355ae81585506)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.2.1

This issue was fixed in the openstack/kolla-ansible 7.2.1 release.

fxpester (a-yurtaykin) wrote :

Just hit this on our setup; the fix for neutron works, thanks.
But we also got the same problem with cinder-volume containers running on compute nodes.

It looks like applying a similar workaround to the cinder-volume container does not affect ceph,
but if you use lvm_backend it hurts a lot.

Our workaround for now:
  - "{% if enable_cinder_backend_lvm | bool %}/run/:/run/:shared{% endif %}"
