xenial: leftover scope units for Kubernetes transient mounts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd (Ubuntu) | Invalid | Undecided | Unassigned |
Xenial | In Progress | Medium | Mauricio Faria de Oliveira |
Bug Description
[Impact]
When running Kubernetes on Xenial, a scope unit for the transient
mounts used by a pod (e.g., a secret volume mount) is left over,
together with its associated cgroup directories, after the pod
completes, almost every time such a pod is created:
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…  (leftover scope unit; name truncated in the original report)
/sys/…  (its associated cgroup directories; paths truncated)
This problem becomes noticeable with Kubernetes CronJobs as time
goes by, since a CronJob repeatedly recreates pods to run its task.
Over time, the leftover units (and their associated cgroup
directories) pile up to a significant number and start to cause
problems for other components that scan /sys/fs/cgroup/:
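The growth can be tracked with a one-off count; a sketch (the grep pattern matches the unit description shown in the listings above):

```shell
#!/bin/sh
# Count leftover Kubernetes transient-mount scope units on the host.
# grep -c prints 0 and exits nonzero when nothing matches, hence the
# trailing "|| true" so the script still succeeds on a clean host.
systemctl list-units --type=scope --no-legend \
    | grep -c 'Kubernetes transient mount for' || true
```

Running this periodically (e.g., from cron) makes the linear growth under a CronJob easy to see.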
- Kubelet CPU/Memory usage linearly increases using CronJob [1]
and systemd commands time out, breaking things like Ansible:
- failed: [...] (item=apt-…) => {"msg": "Unable to disable service
  apt-daily-…: Failed to execute operation: Connection timed out\n"}
The problem seems to be related to empty cgroup notification
on the legacy/classic hierarchy; it doesn't happen on hybrid
or unified hierarchies.
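Which hierarchy a host runs can be checked from /sys/fs/cgroup; a sketch based on the standard mount layout (pure unified v2, hybrid with a cgroup2 "unified" mount, or legacy v1):

```shell
#!/bin/sh
# Detect the cgroup hierarchy mode; the leak described above only
# reproduces on the legacy (v1) hierarchy.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "unified (cgroup v2) hierarchy"
elif [ -d /sys/fs/cgroup/unified ]; then
    echo "hybrid hierarchy"
else
    echo "legacy (cgroup v1) hierarchy"
fi
```

On the Xenial VM from the test case below this prints the legacy case.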
The fix is upstream systemd commit d8fdc62037b5 ("core: use
an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification").
That patch is already in progress/review in bug 1846787 [2],
and is present in Bionic and later, so only Xenial still needs it.
[Test Case]
Create K8s pods with secret volume mounts (example below)
on Xenial with the HWE/4.15 kernel, and check this after the pod completes:
$ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
(With the fix applied, there are zero units reported -- see comment #1)
Steps:
-----
Create a Xenial VM:
$ uvt-simplestreams-libvirt sync release=xenial arch=amd64
$ uvt-kvm create --memory 8192 --cpu 8 --disk 8 vm-xenial release=xenial arch=amd64
Install the HWE/4.15 kernel and MicroK8s on it:
$ uvt-kvm wait vm-xenial
$ uvt-kvm ssh vm-xenial
$ sudo apt update
$ sudo apt install linux-image-generic-hwe-16.04
$ sudo reboot
$ uvt-kvm wait vm-xenial
$ uvt-kvm ssh vm-xenial
$ sudo snap install microk8s --channel=1.16/stable --classic
$ sudo snap alias microk8s.kubectl kubectl
$ sudo usermod -a -G microk8s $USER
$ exit
Check package versions:
$ uvt-kvm ssh vm-xenial
$ lsb_release -cs
xenial
$ uname -rv
4.15.0-…  (exact version truncated in the original report)
$ snap list microk8s
Name Version Rev Tracking Publisher Notes
microk8s v1.16.0 920 1.16 canonical✓ classic
$ dpkg -s systemd | grep ^Version:
Version: 229-4ubuntu21.22
Create a pod with a secret/volume:
$ cat <<EOF > pod-with-secret.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-secret
spec:
  containers:
  - name: container
    image: debian:stretch
    args: ["/bin/true"]
    volumeMounts:
    - name: secret
      mountPath: /secret  # illustrative path; original value truncated
  volumes:
  - name: secret
    secret:
      secretName: secret-for-pod
  restartPolicy: Never  # value assumed; truncated in original
EOF
$ kubectl create secret generic secret-for-pod --from-
Notice it leaves a transient scope unit running even after the pod completes:
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 30s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…  (unit name truncated)
And more transient scope units are left running
each time the pod is recreated (e.g., as a CronJob would do):
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…
run-…
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…
run-…
run-…
$ kubectl delete pods pod-with-secret
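The create/check/delete cycle above can be automated; a sketch (assumes the pod-with-secret.yaml from the steps above, and uses a fixed sleep as a crude stand-in for waiting until the pod reaches Completed):

```shell
#!/bin/sh
# Repro loop sketch: recreate the pod several times and count the
# leftover "Kubernetes transient mount" scope units after each cycle.
for i in 1 2 3; do
    kubectl create -f pod-with-secret.yaml
    sleep 60   # crude wait for the pod to complete
    kubectl delete pods pod-with-secret
    leftover=$(systemctl list-units --type=scope --no-legend \
        | grep -c 'Kubernetes transient mount for')
    echo "iteration $i: $leftover leftover scope unit(s)"
done
```

On an unfixed Xenial host the count grows by roughly one per iteration; with the fix it stays at zero.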
Repeating the test with a CronJob:
$ cat <<EOF > cronjob-with-secret.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronjob-with-secret
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: container
            image: debian:stretch
            args: ["/bin/true"]
            volumeMounts:
            - name: secret
              mountPath: /secret  # illustrative path; original value truncated
          volumes:
          - name: secret
            secret:
              secretName: secret-for-pod
          restartPolicy: Never  # value assumed; truncated in original
EOF
$ kubectl create secret generic secret-for-pod --from-
$ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl create -f cronjob-with-secret.yaml
cronjob.batch/cronjob-with-secret created
(wait ~5 minutes)
$ kubectl get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob-with-secret   */1 * * * *   (remaining columns truncated)
$ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
run-…
run-…
run-…
run-…
$
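Until a fixed systemd is installed, the leftover units can be cleaned up by hand; a sketch (assumption: stopping these scopes is safe once the owning pods have completed and been deleted, since it only releases the stale cgroups):

```shell
#!/bin/sh
# Cleanup sketch: stop each leftover "Kubernetes transient mount"
# scope unit found in the systemctl listing.
systemctl list-units --type=scope --no-legend \
    | grep 'Kubernetes transient mount for' \
    | awk '{print $1}' \
    | while read -r unit; do
          echo "stopping $unit"
          sudo systemctl stop "$unit"
      done
```

This is a workaround only; the proper fix is the backported commit referenced above.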
[1] https:/
[2] https:/
Changed in systemd (Ubuntu):
  status: New → Incomplete
Changed in systemd (Ubuntu Xenial):
  status: New → In Progress
  importance: Undecided → Medium
  assignee: nobody → Mauricio Faria de Oliveira (mfo)
  description: updated
Changed in systemd (Ubuntu):
  status: Incomplete → Invalid
tags: added: sts
The systemd fix commit from LP bug 1846787 has been verified
to resolve the problem with test packages in ppa:mfo/sf219578 [1]:
no scope units are left over after the pods complete.
It has also been verified by another user on a different K8s setup,
with the same result.
$ dpkg -s systemd | grep ^Version:
Version: 229-4ubuntu21.22+test20191008b1
With simple Pod:
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 35s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 8s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-secret 0/1 Completed 0 5s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
$ kubectl create -f pod-with-secret.yaml
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete pods pod-with-secret
With CronJob:
$ kubectl create -f cronjob-with-secret.yaml
cronjob.batch/cronjob-with-secret created
< wait a few minutes >
$ kubectl get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob-with-secret */1 * * * * False 0 24s 5m52s
$ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
$
$ kubectl delete cronjobs cronjob-with-secret
[1] https://launchpad.net/~mfo/+archive/ubuntu/sf219578