Activity log for bug #1847512

Date Who What changed Old value New value Message
2019-10-09 18:25:14 Mauricio Faria de Oliveira bug added bug
2019-10-09 18:25:25 Mauricio Faria de Oliveira nominated for series Ubuntu Xenial
2019-10-09 18:25:25 Mauricio Faria de Oliveira bug task added systemd (Ubuntu Xenial)
2019-10-09 18:25:33 Mauricio Faria de Oliveira systemd (Ubuntu): status New Incomplete
2019-10-09 18:25:46 Mauricio Faria de Oliveira systemd (Ubuntu Xenial): status New In Progress
2019-10-09 18:25:49 Mauricio Faria de Oliveira systemd (Ubuntu Xenial): importance Undecided Medium
2019-10-09 18:25:52 Mauricio Faria de Oliveira systemd (Ubuntu Xenial): assignee Mauricio Faria de Oliveira (mfo)
2019-10-09 18:26:39 Mauricio Faria de Oliveira description

New value:

[Impact]

When running Kubernetes on Xenial there's a leftover scope unit for the transient mounts used by a pod (e.g., a secret volume mount), together with its associated cgroup dirs, after the pod completes, almost every time such a pod is created:

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/(...)/volumes/kubernetes.io~secret/(...)

    /sys/fs/cgroup/devices/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/pids/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/blkio/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/memory/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/cpu,cpuacct/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/systemd/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope

This problem becomes noticeable with Kubernetes CronJobs as time goes by, since the cronjob task is repeatedly run in newly created pods. Over time, the leftover units (and associated cgroup directories) pile up to a significant amount and start to cause problems for other components that scan /sys/fs/cgroup/:

 - Kubelet CPU/memory usage increases linearly when using a CronJob [1]
 - systemd commands time out, breaking things like Ansible:
   failed: [...] (item=apt-daily-upgrade.service) => {[...]
     "msg": "Unable to disable service apt-daily-upgrade.service:
     Failed to execute operation: Connection timed out\n"}

The problem seems to be related to empty-cgroup notifications on the legacy/classic cgroup hierarchy; it doesn't happen on the hybrid or unified hierarchies.

The fix is upstream systemd commit d8fdc62037b5 ("core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification"). That patch is already in progress/review in bug 1846787 [2] and is present in Bionic and later, so only Xenial needs it.
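The hierarchy dependence noted above can be checked on the test machine. This is a rough sketch added here (not part of the original report) that only relies on the filesystem type mounted at /sys/fs/cgroup; the sample outputs are what a default Xenial (pure legacy cgroup v1) setup is expected to show:

    # tmpfs here means a v1 (legacy or hybrid) layout; a unified setup
    # mounts cgroup2 directly at /sys/fs/cgroup and reports cgroup2fs:
    $ stat -fc %T /sys/fs/cgroup/
    tmpfs

    # A hybrid setup additionally has a cgroup2 mount (at /sys/fs/cgroup/unified
    # on later releases); on Xenial's pure legacy setup this prints nothing:
    $ findmnt -t cgroup2
    $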
[Test Case]

Create K8s pods with secret volume mounts (example below) on a Xenial/4.15 kernel, and check this after the pod completes:

    $ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'

(With the fix applied, there are zero units reported.)

Steps:
------

Create a Xenial VM:

    $ uvt-simplestreams-libvirt sync release=xenial arch=amd64
    $ uvt-kvm create --memory 8192 --cpu 8 --disk 8 vm-xenial release=xenial arch=amd64

Install the HWE/4.15 kernel and MicroK8s on it:

    $ uvt-kvm wait vm-xenial
    $ uvt-kvm ssh vm-xenial
    $ sudo apt update
    $ sudo apt install linux-image-4.15.0-65-generic
    $ sudo reboot

    $ uvt-kvm wait vm-xenial
    $ uvt-kvm ssh vm-xenial
    $ sudo snap install microk8s --channel=1.16/stable --classic
    $ sudo snap alias microk8s.kubectl kubectl
    $ sudo usermod -a -G microk8s $USER
    $ exit

Check package versions:

    $ uvt-kvm ssh vm-xenial

    $ lsb_release -cs
    xenial

    $ uname -rv
    4.15.0-65-generic #74~16.04.1-Ubuntu SMP Wed Sep 18 09:51:44 UTC 2019

    $ snap list microk8s
    Name      Version  Rev  Tracking  Publisher   Notes
    microk8s  v1.16.0  920  1.16      canonical✓  classic

    $ dpkg -s systemd | grep ^Version:
    Version: 229-4ubuntu21.22

Create a pod with a secret/volume:

    $ cat <<EOF > pod-with-secret.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-with-secret
    spec:
      containers:
      - name: container
        image: debian:stretch
        args: ["/bin/true"]
        volumeMounts:
        - name: secret
          mountPath: /secret
      volumes:
      - name: secret
        secret:
          secretName: secret-for-pod
      restartPolicy: Never
    EOF

    $ kubectl create secret generic secret-for-pod --from-literal=key=value

Notice that a transient scope unit is left running even after the pod completes:

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    $

    $ kubectl create -f pod-with-secret.yaml

    $ kubectl get pods
    NAME              READY   STATUS      RESTARTS   AGE
    pod-with-secret   0/1     Completed   0          30s

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/7baa3896-a4ef-4c11-a2c2-09f94ca565f7/volumes/kubernetes.io~secret/default-token-24k4f
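To quantify how the leftovers accumulate while repeating the steps below, the scope units and their cgroup directories can be counted. This is a convenience sketch added here (not in the original report); the system.slice path assumes the legacy-hierarchy layout shown under [Impact]:

    # Number of leftover transient-mount scope units:
    $ systemctl list-units --type=scope --no-legend | grep -c 'Kubernetes transient mount for'

    # Number of matching leftover cgroup directories on one controller:
    $ find /sys/fs/cgroup/systemd/system.slice -maxdepth 1 -type d -name 'run-r*.scope' | wc -l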
More transient scope units are left running each time the pod is created again (e.g., by a cronjob):

    $ kubectl delete pods pod-with-secret
    $ kubectl create -f pod-with-secret.yaml

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/7baa3896-a4ef-4c11-a2c2-09f94ca565f7/volumes/kubernetes.io~secret/default-token-24k4f
    run-rb947fb640fbc41cf9a50b1ceb4ccbf78.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/b61d553c-e50c-4dca-905a-d82e3bc3c3a4/volumes/kubernetes.io~secret/secret

    $ kubectl delete pods pod-with-secret
    $ kubectl create -f pod-with-secret.yaml

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/7baa3896-a4ef-4c11-a2c2-09f94ca565f7/volumes/kubernetes.io~secret/default-token-24k4f
    run-ra5caa6aa3bb0426795ce991f178649f3.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/6a74c2cd-3029-4f30-8dc2-131de44d6625/volumes/kubernetes.io~secret/secret
    run-rb947fb640fbc41cf9a50b1ceb4ccbf78.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/b61d553c-e50c-4dca-905a-d82e3bc3c3a4/volumes/kubernetes.io~secret/secret

    $ kubectl delete pods pod-with-secret

Repeating the test with a CronJob:

    $ cat <<EOF > cronjob-with-secret.yaml
    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: cronjob-with-secret
    spec:
      schedule: "*/1 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              nodeSelector:
                kubernetes.io/hostname: sf219578xt
              containers:
              - name: container
                image: debian:stretch
                args: ["/bin/true"]
                volumeMounts:
                - name: secret
                  mountPath: /secret
              volumes:
              - name: secret
                secret:
                  secretName: secret-for-pod
              restartPolicy: OnFailure
    EOF

    $ kubectl create secret generic secret-for-pod --from-literal=key=value

    $ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    $

    $ kubectl create -f cronjob-with-secret.yaml
    cronjob.batch/cronjob-with-secret created

    (wait ~5 minutes)

    $ kubectl get cronjobs
    NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    cronjob-with-secret   */1 * * * *   False     0        42s             5m54s

    $ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r022aebe0c9944f6fbd6cd989a2c2b819.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/9df44dc4-de5b-4586-9948-2930f6bc47fa/volumes/kubernetes.io~secret/default-token-24k4f
    run-r2123bea060344165b7b13320d68f1fd5.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/5e95c6ef-f04f-47c5-b479-d3a5b1830106/volumes/kubernetes.io~secret/secret
    run-rb8605acad9e54c3d965b2cba965b593b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/933f41b5-a566-432e-86bb-897549675403/volumes/kubernetes.io~secret/secret
    run-rbbaa670a270a41238d019e08a1aba400.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/2e942cf5-3b1d-48f2-8758-5f1875dc05f7/volumes/kubernetes.io~secret/default-token-24k4f
    $

[1] https://github.com/kubernetes/kubernetes/issues/64137
[2] https://bugs.launchpad.net/bugs/1846787
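For completeness, a minimal verification sketch (not part of the original test case) for after a systemd build containing commit d8fdc62037b5 is installed from xenial-proposed; /run/systemd/cgroups-agent is the AF_UNIX/SOCK_DGRAM socket path used upstream, so adjust if the backport differs:

    # After upgrading systemd and rebooting, PID 1 should own the agent socket:
    $ test -S /run/systemd/cgroups-agent && echo "cgroup agent socket present"

    # Scopes leaked before the upgrade can be stopped explicitly
    # (systemctl accepts glob patterns for loaded units):
    $ sudo systemctl stop 'run-r*.scope'

    # Re-running the pod and cronjob tests should now leave nothing behind:
    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    $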
2019-10-09 18:26:46 Mauricio Faria de Oliveira systemd (Ubuntu): status Incomplete Invalid
2019-10-09 18:53:50 Mauricio Faria de Oliveira description updated (the Test Case now reads "With the fix applied, there are zero units reported -- see comment #1"; otherwise identical to the 18:26:39 value above)
2019-10-09 18:57:13 Mauricio Faria de Oliveira marked as duplicate 1846787
2019-10-10 14:51:57 Mauricio Faria de Oliveira tags sts