kubectl snap creates thousands of hanging systemd scope units

Bug #1928806 reported by Przemyslaw Lal
This bug affects 1 person
Affects                         Status      Importance  Assigned to  Milestone
Kubernetes Control Plane Charm  Incomplete  Undecided   Unassigned
snapd                           New         Undecided   Unassigned

Bug Description

We found over 8.5k snap kubectl systemd scope units on each of our Kubernetes master nodes. This causes 100% CPU usage spikes from the systemd and /sbin/init processes, hosing the entire cluster.

$ sudo systemctl list-units --type scope | grep snap | wc -l
8643

Typical entries look like these:
  snap.kubectl.kubectl.001b8133-30f5-42c3-90e4-39da4229cbef.scope loaded active running snap.kubectl.kubectl.001b8133-30f5-42c3-90e4-39da4229cbef.scope
  snap.kubectl.kubectl.001e4ffa-17c6-49ea-8fd2-1d7d07fb0e2f.scope loaded active running snap.kubectl.kubectl.001e4ffa-17c6-49ea-8fd2-1d7d07fb0e2f.scope
  snap.kubectl.kubectl.0022dadb-f7b5-456f-ac51-98cfcd128035.scope loaded active running snap.kubectl.kubectl.0022dadb-f7b5-456f-ac51-98cfcd128035.scope
  snap.kubectl.kubectl.0026e64c-8794-47d9-b58c-a05759814dbd.scope loaded active running snap.kubectl.kubectl.0026e64c-8794-47d9-b58c-a05759814dbd.scope
  snap.kubectl.kubectl.002d028f-b30c-4ce7-b6b9-4c72ef74f308.scope loaded active running snap.kubectl.kubectl.002d028f-b30c-4ce7-b6b9-4c72ef74f308.scope
  snap.kubectl.kubectl.003584be-b585-48b9-8385-fca6a642433a.scope loaded active running snap.kubectl.kubectl.003584be-b585-48b9-8385-fca6a642433a.scope
  snap.kubectl.kubectl.0042ab28-ac94-476b-877d-9cde2d5e2312.scope loaded active running snap.kubectl.kubectl.0042ab28-ac94-476b-877d-9cde2d5e2312.scope
  snap.kubectl.kubectl.00537acd-ce13-4a9c-a70d-059d7d1088dc.scope loaded active running snap.kubectl.kubectl.00537acd-ce13-4a9c-a70d-059d7d1088dc.scope

Please note that all of them are in the active/running state.

After manually stopping them using this one-liner:
sudo systemctl list-units --type scope | grep kubectl | awk '{print $1}' | xargs sudo systemctl stop

The number goes down to expected values:
$ sudo systemctl list-units --type scope | grep snap | wc -l
11

And the system becomes much snappier again. The increased load caused by this issue leads to transient failures in communication between API servers and kubelets, resulting in errors similar to this: [0]. We then end up restarting kubelets, which is the only way to restore connectivity between kubelets and API servers.
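
A slightly more careful variant of that one-liner (a sketch only, untested at this scale) stops just the kubectl scopes that report zero tasks, so any scope still tracking a live process is left alone:

for unit in $(systemctl list-units --type scope --no-legend | awk '/snap\.kubectl\./ {print $1}'); do
  # Only stop scopes whose cgroup no longer contains any tasks.
  tasks=$(systemctl show "$unit" -p TasksCurrent --value)
  [ "$tasks" = "0" ] && sudo systemctl stop "$unit"
done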

Additionally, we see a lot of similarities with this bug [1] reported for etcdctl. Both kubectl and the etcdctl from that bug run as snaps and leave thousands of systemd scope units behind, slowing down the system.

Versions:
kubernetes-master charm: 1.18.15 (charm revision 895)
Ubuntu: 18.04.5 LTS
kubectl snap: 1.18.15 1.18/stable
$ snap --version
snap 2.50
snapd 2.50
series 16
ubuntu 18.04
kernel 5.4.0-1046-azure

[0] https://github.com/kubernetes/kubernetes/issues/87615
[1] https://bugs.launchpad.net/charm-etcd/+bug/1926185

Revision history for this message
Przemyslaw Lal (przemeklal) wrote :
description: updated
Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

We just found the issue on another Kubernetes cluster; this time there were 13299 leaked systemd scopes on one of the masters, making the system even more unresponsive. The other two masters were at ~8k and were slightly more responsive, so there appears to be a correlation between the number of leaked scopes and general system responsiveness.

Revision history for this message
George Kraft (cynerva) wrote :

Please share output of `juju status`. In LP:1926185 this appeared to be a weird interaction between snaps and LXD. We need to know more about the topology of the deployment here, and in particular if LXD is in use.

Changed in charm-kubernetes-master:
status: New → Incomplete
Revision history for this message
George Kraft (cynerva) wrote :

A crashdump would also help a lot:

sudo snap install juju-crashdump --classic --channel edge
juju-crashdump -a debug-layer -a config

Revision history for this message
Calvin Hartwell (calvinh) wrote :

@George if you need more customer-specific or deployment-specific information which I cannot put onto the bug, you can reach out to me or bootstack directly on MM.

Revision history for this message
George Kraft (cynerva) wrote :

Let's keep the discussion on here if we can. I do need that info - our team can't assist you without it. Please send it to me through a secure channel and I will forward it to the rest of my team.

Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

LXD is indeed in use on the kubernetes-master units.

I shared "juju status" output and juju crashdump links in Mattermost chat.

Revision history for this message
George Kraft (cynerva) wrote :

Przemyslaw sent me the status output and crashdump. Thanks! I'm looking into this now.

Changed in charm-kubernetes-master:
status: Incomplete → New
Revision history for this message
George Kraft (cynerva) wrote :

The crashdump seems to be missing some info - none of the debug actions are present, and /var/log/syslog is missing on the kubernetes-master units.

I had a go at reproducing this with the info I have, but so far haven't had any luck. I reproduced the deployment as closely as I could (same cloud, ubuntu series, charm revs, k8s version, unit placement with lxd, etc) but I'm not encountering the leaked scopes.

Please send me the following info from one of the affected kubernetes-master units:
/var/log/syslog
Output of `sudo snap list`
Output of `sudo lxc version`

Changed in charm-kubernetes-master:
status: New → Incomplete
Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

$ sudo snap list
Name                     Version  Rev    Tracking       Publisher   Notes
canonical-livepatch      9.6.1    98     latest/stable  canonical✓  -
cdk-addons               1.18.10  4715   1.18/stable    canonical✓  in-cohort
core                     16-2.50  11081  latest/stable  canonical✓  core
kube-apiserver           1.18.15  1978   1.18/stable    canonical✓  in-cohort
kube-controller-manager  1.18.15  1836   1.18/stable    canonical✓  in-cohort
kube-proxy               1.18.15  1830   1.18/stable    canonical✓  classic,in-cohort
kube-scheduler           1.18.15  1805   1.18/stable    canonical✓  in-cohort
kubectl                  1.18.15  1778   1.18/stable    canonical✓  classic,in-cohort

$ sudo lxc version
Client version: 3.0.3
Server version: 3.0.3

Syslog: https://private-fileshare.canonical.com/~przemeklal/lp1928806_syslog.tar.xz

George Kraft (cynerva)
Changed in charm-kubernetes-master:
status: Incomplete → New
Revision history for this message
George Kraft (cynerva) wrote :

Thanks. Sadly, I'm not seeing any log messages that look relevant. Just messages when the scopes are created:

May 21 08:10:14 hostname systemd[1]: Started snap.kubectl.kubectl.e7703a7f-4b23-477e-b48b-c96a92ce543a.scope.
May 21 08:10:15 hostname systemd[1]: Started snap.kubectl.kubectl.8162f275-9a19-4d5a-901d-43d579b76091.scope.
May 21 08:10:17 hostname systemd[1]: Started snap.kubectl.kubectl.a9cb8bad-b6b2-4b13-aaa6-3d766bb8a53a.scope.

But no corresponding messages for stopping scopes, attempting to stop scopes, apparmor denials, or anything like that.

Revision history for this message
George Kraft (cynerva) wrote :

Can you send output of `dpkg -l`? Looking to see what version of systemd is in your environment and if it differs from mine.

Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

Systemd version is 237-3ubuntu10.42.

Full `dpkg -l` output here:
https://private-fileshare.canonical.com/~przemeklal/dpkg_-l_lp1928806.txt

Revision history for this message
George Kraft (cynerva) wrote :

Thanks - same systemd version in my test environment.

Well, I'm out of ideas. I'm not seeing anything weird happening at the charm level, and so far I haven't been able to find any sort of log message, version mismatch, or anything like that to indicate why this might be happening. But from what I can tell, the issue appears to lie squarely between snapd and systemd.

Adding snapd to this issue. I need someone with more experience at the snap level to look at this.

Changed in charm-kubernetes-master:
status: New → Incomplete
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

Can you grab the output of `systemctl status` for one of the snap.kubectl.kubectl.*.scope units?

Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

No problem:

● snap.kubectl.kubectl.56cd74f3-ee72-4e70-8fcd-81f9133b0d10.scope
   Loaded: loaded (/run/systemd/transient/snap.kubectl.kubectl.56cd74f3-ee72-4e70-8fcd-81f9133b0d10.scope; transient)
Transient: yes
   Active: active (running) since Mon 2021-05-24 18:34:18 UTC; 19h ago
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/snap.kubectl.kubectl.56cd74f3-ee72-4e70-8fcd-81f9133b0d10.scope

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

Revision history for this message
Michael Vogt (mvo) wrote :

The fact that this "scope" has no tasks but is marked as "active (running)" is quite interesting. I assume this is also the case for all the other scopes?

I suspect this is the root cause of this issue.

Fwiw, I was not able to reproduce this here in any way, neither with artificial test-snapd-sh commands nor with running kubectl from the snap. It seems there is still something missing.

I wonder if there is any specific kubectl command or plugin that is triggering this issue. Are there any special commands being run?

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Is kubectl doing anything with cgroups that regular "simple" snaps do not?

Revision history for this message
Przemyslaw Lal (przemeklal) wrote (last edit ):

@mvo: Yes, all leftover scopes look like the one I pasted above.

This should give you an idea of what's executed on the kubernetes-master unit:
https://private-fileshare.canonical.com/~przemeklal/lp1928806_kubectl_examples.txt

Not a very scientific way to do it, but this is a list of kubectl processes that showed up in ps output over the span of ~5 minutes.

Please note that not every kubectl execution ends up as a hanging systemd scope; in fact, in this case none of these calls increased the count of hanging scopes.

Can you please advise if there's any way to capture a process that was executed in systemd's transient scope?
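
One rough way (a sketch, assuming the cgroup v1 name=systemd hierarchy used on 18.04) would be to poll the kubectl scopes' cgroups and log any PIDs seen while they are alive:

# Sketch: watch for kubectl scopes and record any processes found in their cgroups.
while sleep 1; do
  for d in /sys/fs/cgroup/systemd/system.slice/snap.kubectl.kubectl.*.scope; do
    [ -d "$d" ] || continue
    while read -r pid; do
      echo "$(date -Is) ${d##*/} pid=$pid cmd=$(tr '\0' ' ' < /proc/$pid/cmdline 2>/dev/null)"
    done < "$d/cgroup.procs"
  done
done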

Revision history for this message
George Kraft (cynerva) wrote :

> Is kubectl doing anything with cgroups that regular "simple" snaps do not?

Not that I'm aware of. It's a golang client to a web API.

Kubelet and containerd certainly do things with cgroups, but kubectl itself should not.

Revision history for this message
Ian Johnson (anonymouse67) wrote :

Hmm, interestingly I note that I happen to have one of these leftover transient units on my machine:

$ systemctl status snap.zoom-client.hook.configure.16523059-a244-423d-a1a3-40a56bd63587.scope
● snap.zoom-client.hook.configure.16523059-a244-423d-a1a3-40a56bd63587.scope
     Loaded: loaded (/run/systemd/transient/snap.zoom-client.hook.configure.16523059-a244-423d-a1a3-40a56bd63587.scope; transient)
  Transient: yes
     Active: active (running) since Tue 2021-07-20 05:34:19 CDT; 2 days ago
      Tasks: 0 (limit: 38185)
     Memory: 106.7M
     CGroup: /system.slice/snap.zoom-client.hook.configure.16523059-a244-423d-a1a3-40a56bd63587.scope

Jul 20 05:34:19 ubuntu systemd[1]: Started snap.zoom-client.hook.configure.16523059-a244-423d-a1a3-40a56bd63587.scope.

While mine is for a hook rather than an app, it's the same invocation from the snapd side of things, so it's equivalent. This is on Hirsute with systemd 247.3-3ubuntu3.4.

I had a poke at the underlying cgroup; here are the results if someone wants to take a look. AFAICT the cgroup exists because systemd created it and hasn't destroyed it; the issue is that systemd should have destroyed it when the processes in it exited.
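
The poke boils down to something like this (a sketch; paths assume the v1 name=systemd cgroup hierarchy, and differ on a unified cgroup v2 setup):

UNIT=snap.zoom-client.hook.configure.16523059-a244-423d-a1a3-40a56bd63587.scope
# PIDs systemd still tracks in the scope's cgroup (none here, matching "Tasks: 0" above):
cat /sys/fs/cgroup/systemd/system.slice/$UNIT/cgroup.procs
# What systemd itself reports for the unit:
systemctl show "$UNIT" -p ActiveState,SubState,TasksCurrent,ControlGroup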

I'm inclined to say that this is a systemd bug and not a snapd bug. The code for how we create the transient unit can be found here: https://github.com/snapcore/snapd/blob/master/sandbox/cgroup/tracking.go#L216-L319

Revision history for this message
Chris Johnston (cjohnston) wrote :

Looks like a repro:

root@juju-be253b-7-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 21:49:25 UTC 2021
0
root@juju-be253b-7-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json & done; for i in {1..20}; do echo 'Reloading...'; sudo systemctl daemon-reload; done
[1] 10320
[2] 10321
[3] 10322
[4] 10323
[5] 10324
[6] 10325
[7] 10326
[8] 10327
[9] 10328
[10] 10329
[11] 10330
[12] 10331
[13] 10332
[14] 10333
[15] 10334
[16] 10335
[17] 10336
[18] 10349
[19] 10351
...
root@juju-be253b-7-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 21:51:36 UTC 2021
27
root@juju-be253b-7-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json & done; for i in {1..20}; do echo 'Reloading...'; sudo systemctl daemon-reload; done
...
root@juju-be253b-7-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 21:52:31 UTC 2021
68
root@juju-be253b-7-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json & done; for i in {1..20}; do echo 'Reloading...'; sudo systemctl daemon-reload; done
...
root@juju-be253b-7-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 21:54:51 UTC 2021
100
root@juju-be253b-7-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json & done; for i in {1..20}; do echo 'Reloading...'; sudo systemctl daemon-reload; done
...
root@juju-be253b-7-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 21:56:32 UTC 2021
145
root@juju-be253b-7-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 22:05:18 UTC 2021
145
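
The leak here appears to hinge on `systemctl daemon-reload` running while the transient scopes are exiting. The same idea wrapped up as a small script (a sketch only, using the same kubectl call as above; the before/after counts make the leak visible):

#!/bin/bash
# Sketch: count snap scopes, fire off 100 short-lived kubectl runs, race them
# against repeated daemon-reloads, then count again.
before=$(systemctl list-units --type scope | grep -c snap)
for i in {1..100}; do
  /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json >/dev/null 2>&1 &
done
for i in {1..20}; do
  sudo systemctl daemon-reload
done
wait
after=$(systemctl list-units --type scope | grep -c snap)
echo "leaked snap scopes: $((after - before))"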

Revision history for this message
Chris Johnston (cjohnston) wrote :

Possible fix:
root@juju-be253b-8-lxd-5:~# sudo add-apt-repository ppa:ddstreet/lp1934147

 More info: https://launchpad.net/~ddstreet/+archive/ubuntu/lp1934147
Press [ENTER] to continue or Ctrl-c to cancel adding it.
...
root@juju-be253b-8-lxd-5:~# sudo apt install systemd=237-3ubuntu10.49~bug1934147v20210711b1 libsystemd0=237-3ubuntu10.49~bug1934147v20210711b1 libnss-systemd=237-3ubuntu10.49~bug1934147v20210711b1 libpam-systemd=237-3ubuntu10.49~bug1934147v20210711b1 libudev1=237-3ubuntu10.49~bug1934147v20210711b1 systemd-sysv=237-3ubuntu10.49~bug1934147v20210711b1 udev=237-3ubuntu10.49~bug1934147v20210711b1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'sudo apt autoremove' to remove it.
Suggested packages:
  systemd-container
The following packages will be DOWNGRADED:
  libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
0 upgraded, 0 newly installed, 7 downgraded, 0 to remove and 13 not upgraded.
Need to get 5,201 kB of archives.
After this operation, 735 kB disk space will be freed.
Do you want to continue? [Y/n] y
...
root@juju-be253b-8-lxd-5:~# dpkg -l | grep systemd
ii libnss-systemd:amd64 237-3ubuntu10.49~bug1934147v20210711b1 amd64 nss module providing dynamic user and group name resolution
ii libpam-systemd:amd64 237-3ubuntu10.49~bug1934147v20210711b1 amd64 system and service manager - PAM module
ii libsystemd0:amd64 237-3ubuntu10.49~bug1934147v20210711b1 amd64 systemd utility library
ii networkd-dispatcher 1.7-0ubuntu3.3 all Dispatcher service for systemd-networkd connection status changes
ii python3-systemd 234-1build1 amd64 Python 3 bindings for systemd
ii systemd 237-3ubuntu10.49~bug1934147v20210711b1 amd64 system and service manager
ii systemd-sysv 237-3ubuntu10.49~bug1934147v20210711b1 amd64 system and service manager - SysV links
root@juju-be253b-8-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 22:12:10 UTC 2021
0
root@juju-be253b-8-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json & done; for i in {1..20}; do echo 'Reloading...'; sudo systemctl daemon-reload; done
[1] 4679
[2] 4680
[3] 4681
[4] 4682
[5] 4683
[6] 4684
[7] 4685
[8] 4686
...
root@juju-be253b-8-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 22:13:08 UTC 2021
0
root@juju-be253b-8-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n kube-system -o json & done; for i in {1..20}; do echo 'Reloading...'; sudo systemctl daemon-reload; done
root@juju-be253b-8-lxd-5:~# date ; systemctl list-units --type scope | grep snap | wc -l
Thu Jul 22 22:13:59 UTC 2021
0
root@juju-be253b-8-lxd-5:~# for i in {1..100}; do /snap/bin/kubectl --kubeconfig=/root/.kube/config get secrets -n k...


Revision history for this message
Ian Johnson (anonymouse67) wrote :

Given your test of that systemd from the PPA, I'm inclined to say this bug is a duplicate of that one. Thanks for that.
