xenial: leftover scope units for Kubernetes transient mounts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd (Ubuntu) | Invalid | Undecided | Unassigned |
Xenial | In Progress | Medium | Mauricio Faria de Oliveira |
Bug Description
[Impact]
When running Kubernetes on Xenial, a scope unit for the transient
mounts used by a pod (e.g., a secret volume mount) is left over,
together with its associated cgroup directories, after the pod
completes, almost every time such a pod is created:
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…  (leftover scope unit; name truncated in the original report)
/sys/…  (its associated cgroup directories; paths truncated)
This problem becomes noticeable with Kubernetes CronJobs as time
goes by, since a CronJob repeatedly recreates pods to run its task.
Over time, the leftover units (and their associated cgroup
directories) pile up to a significant number and start to cause
problems for other components that scan /sys/fs/cgroup/:
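The growth can be tracked with a one-off count; a sketch (the grep pattern matches the unit description shown in the listings above):

```shell
#!/bin/sh
# Count leftover Kubernetes transient-mount scope units on the host.
# grep -c prints 0 and exits nonzero when nothing matches, hence the
# trailing "|| true" so the script still succeeds on a clean host.
systemctl list-units --type=scope --no-legend \
    | grep -c 'Kubernetes transient mount for' || true
```

Running this periodically (e.g., from cron) makes the linear growth under a CronJob easy to see.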
- Kubelet CPU/Memory usage linearly increases using CronJob [1]
and systemd commands time out, breaking things like Ansible:
- failed: [...] (item=apt-…) => {"msg": "Unable to disable service
  apt-daily-…: Failed to execute operation: Connection timed out\n"}
The problem seems to be related to empty cgroup notification
on the legacy/classic hierarchy; it doesn't happen on hybrid
or unified hierarchies.
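Which hierarchy a host runs can be checked from /sys/fs/cgroup; a sketch based on the standard mount layout (pure unified v2, hybrid with a cgroup2 "unified" mount, or legacy v1):

```shell
#!/bin/sh
# Detect the cgroup hierarchy mode; the leak described above only
# reproduces on the legacy (v1) hierarchy.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "unified (cgroup v2) hierarchy"
elif [ -d /sys/fs/cgroup/unified ]; then
    echo "hybrid hierarchy"
else
    echo "legacy (cgroup v1) hierarchy"
fi
```

On the Xenial VM from the test case below this prints the legacy case.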
The fix is upstream systemd commit d8fdc62037b5 ("core: use
an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification").
That patch is already in progress/review in bug 1846787 [2],
and is present in Bionic and later, so only Xenial still needs it.
[Test Case]
Create K8s pods with secret volume mounts (example below)
on Xenial with the HWE/4.15 kernel, and check this after the pod completes:
$ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
(With the fix applied, there are zero units reported -- see comment #1)
Steps:
-----
Create a Xenial VM:
$ uvt-simplestreams-libvirt sync release=xenial arch=amd64
$ uvt-kvm create --memory 8192 --cpu 8 --disk 8 vm-xenial release=xenial arch=amd64
Install the HWE/4.15 kernel and MicroK8s on it:
$ uvt-kvm wait vm-xenial
$ uvt-kvm ssh vm-xenial
$ sudo apt update
$ sudo apt install linux-image-generic-hwe-16.04
$ sudo reboot
$ uvt-kvm wait vm-xenial
$ uvt-kvm ssh vm-xenial
$ sudo snap install microk8s --channel=1.16/stable --classic
$ sudo snap alias microk8s.kubectl kubectl
$ sudo usermod -a -G microk8s $USER
$ exit
Check package versions:
$ uvt-kvm ssh vm-xenial
$ lsb_release -cs
xenial
$ uname -rv
4.15.0-…  (exact version truncated in the original report)
$ snap list microk8s
Name Version Rev Tracking Publisher Notes
microk8s v1.16.0 920 1.16 canonical✓ classic
$ dpkg -s systemd | grep ^Version:
Version: 229-4ubuntu21.22
Create a pod with a secret/volume:
$ cat <<EOF > pod-with-secret.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-secret
spec:
  containers:
  - name: container
    image: debian:stretch
    args: ["/bin/true"]
    volumeMounts:
    - name: secret
      mountPath: /secret  # illustrative path; original value truncated
  volumes:
  - name: secret
    secret:
      secretName: secret-for-pod
  restartPolicy: Never  # value assumed; truncated in original
EOF
$ kubectl create secret generic secret-for-pod --from-
Notice it leaves a transient scope unit running even after the pod completes:
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 30s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…  (unit name truncated)
And more transient scope units are left running
each time the pod is recreated (e.g., as a CronJob would do):
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…
run-…
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…
run-…
run-…
$ kubectl delete pods pod-with-secret
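The create/check/delete cycle above can be automated; a sketch (assumes the pod-with-secret.yaml from the steps above, and uses a fixed sleep as a crude stand-in for waiting until the pod reaches Completed):

```shell
#!/bin/sh
# Repro loop sketch: recreate the pod several times and count the
# leftover "Kubernetes transient mount" scope units after each cycle.
for i in 1 2 3; do
    kubectl create -f pod-with-secret.yaml
    sleep 60   # crude wait for the pod to complete
    kubectl delete pods pod-with-secret
    leftover=$(systemctl list-units --type=scope --no-legend \
        | grep -c 'Kubernetes transient mount for')
    echo "iteration $i: $leftover leftover scope unit(s)"
done
```

On an unfixed Xenial host the count grows by roughly one per iteration; with the fix it stays at zero.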
Repeating the test with a CronJob:
$ cat <<EOF > cronjob-with-secret.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronjob-with-secret
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: container
            image: debian:stretch
            args: ["/bin/true"]
            volumeMounts:
            - name: secret
              mountPath: /secret  # illustrative path; original value truncated
          volumes:
          - name: secret
            secret:
              secretName: secret-for-pod
          restartPolicy: Never  # value assumed; truncated in original
EOF
$ kubectl create secret generic secret-for-pod --from-
$ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl create -f cronjob-with-secret.yaml
cronjob.batch/cronjob-with-secret created
(wait ~5 minutes)
$ kubectl get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob-with-secret   */1 * * * *   (remaining columns truncated)
$ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…
run-…
run-…
run-…
$
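Until a fixed systemd is installed, the leftover units can be cleaned up by hand; a sketch (assumption: stopping these scopes is safe once the owning pods have completed and been deleted, since it only releases the stale cgroups):

```shell
#!/bin/sh
# Cleanup sketch: stop each leftover "Kubernetes transient mount"
# scope unit found in the systemctl listing.
systemctl list-units --type=scope --no-legend \
    | grep 'Kubernetes transient mount for' \
    | awk '{print $1}' \
    | while read -r unit; do
          echo "stopping $unit"
          sudo systemctl stop "$unit"
      done
```

This is a workaround only; the proper fix is the backported commit referenced above.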
[1] https:/
[2] https:/
Changed in systemd (Ubuntu):
  status: New → Incomplete
Changed in systemd (Ubuntu Xenial):
  status: New → In Progress
  importance: Undecided → Medium
  assignee: nobody → Mauricio Faria de Oliveira (mfo)
  description: updated
Changed in systemd (Ubuntu):
  status: Incomplete → Invalid
tags: added: sts
The systemd fix commit from LP bug 1846787 has been verified
to resolve the problem with test packages in ppa:mfo/sf219578 [1]:
no scope units are left over after the pods complete.
It has also been verified by another user on a different K8s setup,
with the same result.
$ dpkg -s systemd | grep ^Version:
Version: 229-4ubuntu21.22+test20191008b1
With simple Pod:
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 35s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 8s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 5s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
With CronJob:
$ kubectl create -f cronjob-with-secret.yaml
cronjob.batch/cronjob-with-secret created
< wait a few minutes >
$ kubectl get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob-with-secret */1 * * * * False 0 24s 5m52s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete cronjobs cronjob-with-secret
[1] https://launchpad.net/~mfo/+archive/ubuntu/sf219578