etcdctl snap created thousands of systemd scope processes

Bug #1926185 reported by Dariusz Smigiel
This bug affects 2 people
Affects     Status      Importance  Assigned to  Milestone
Etcd Charm  Incomplete  Undecided   Unassigned

Bug Description

While investigating unexplained CPU spikes, we noticed over 24k long-running etcdctl scope units.

root@juju-2b11a1-4-lxd-10:~# systemctl status snap.etcd.etcdctl.0027c2a3-7e0f-4739-afd6-106eff7ffcb2.scope
● snap.etcd.etcdctl.0027c2a3-7e0f-4739-afd6-106eff7ffcb2.scope
   Loaded: loaded
Transient: yes
   Active: active (running) since Wed 2021-04-14 17:03:36 UTC; 1 weeks 4 days ago
    Tasks: 0
   Memory: 408.0K
      CPU: 17ms

root@juju-2b11a1-4-lxd-10:/var/lib/lxcfs/cgroup/devices/lxc/juju-2b11a1-4-lxd-10/system.slice# ls | wc -l
24178

Running the command below timed out:
systemctl list-units --type=scope
Failed to list units: Connection timed out
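
Since `systemctl list-units` timed out here, one way to count the leftover scopes is to read the cgroup directory directly, as the `ls | wc -l` above does. The following is only a sketch: `count_etcdctl_scopes` is a hypothetical helper name, and the directory argument is assumed to be the system.slice cgroup path on the affected container.

```shell
# Count etcdctl scope entries in a cgroup directory without querying systemd.
# count_etcdctl_scopes is a hypothetical helper; the directory is supplied by the caller.
count_etcdctl_scopes() {
    dir="$1"
    # Only look at direct children whose names follow the snap scope pattern.
    find "$dir" -mindepth 1 -maxdepth 1 -name 'snap.etcd.etcdctl.*.scope' \
        | wc -l | tr -d ' '
}
```

Unlike a plain `ls | wc -l`, this counts only the etcdctl scopes and ignores other units in the same slice.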

To mitigate this, we had to stop them manually:
for i in `ls | grep etcdctl`; do echo $i; systemctl stop $i; done
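
The loop above depends on the current directory and on unquoted `ls` output. A slightly more defensive variant might look like the sketch below. `stop_etcdctl_scopes` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the directory argument is assumed to be the same system.slice cgroup path used earlier.

```shell
# Stop every etcdctl scope that has an entry in the given cgroup directory.
# stop_etcdctl_scopes and DRY_RUN are hypothetical; DRY_RUN=1 only prints the commands.
stop_etcdctl_scopes() {
    dir="$1"
    for path in "$dir"/snap.etcd.etcdctl.*.scope; do
        [ -e "$path" ] || continue      # the glob matched nothing
        unit=$(basename "$path")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl stop $unit"
        else
            echo "$unit"
            systemctl stop "$unit"
        fi
    done
}
```

Setting `DRY_RUN=1` before calling the function on the cgroup directory would list what would be stopped before anything is touched.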

ubuntu@juju-2b11a1-6-lxd-14:~$ snap list
Name  Version    Rev    Tracking       Publisher   Notes
core  16-2.49.1  10908  latest/stable  canonical✓  core
etcd  3.1.10     81     3.1/stable     canonical✓  -
ubuntu@juju-2b11a1-6-lxd-14:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial

  etcd:
    charm: cs:etcd-544
    series: xenial
    os: ubuntu
    charm-origin: jujucharms
    charm-name: etcd
    charm-rev: 544
    can-upgrade-to: cs:etcd-569

description: updated
George Kraft (cynerva) wrote :

I can reproduce this, but it appears to be specific to LXD containers running xenial. I could not reproduce this in bionic or focal.

Given that xenial support ends in just a few days, I do not think we will be fixing this. I recommend upgrading your deployment to bionic or focal. See instructions: https://ubuntu.com/kubernetes/docs/upgrading#upgrading-the-machines-series

Changed in charm-etcd:
status: New → Won't Fix
Chris Johnston (cjohnston) wrote :

I have a bionic cluster with etcd 3.4.5. One node has zero of these scopes, one node has seven of these scopes, the third node has 73 scopes. I'm unsure how to reproduce this, just happened to notice it while looking into something else.

George Kraft (cynerva)
Changed in charm-etcd:
status: Won't Fix → New
George Kraft (cynerva) wrote :

Chris, can you provide more details about your bionic deployment? Can you show `juju status`? Are your etcd nodes deployed to LXD?

Changed in charm-etcd:
status: New → Incomplete
Chris Johnston (cjohnston) wrote :

juju status: https://pastebin.canonical.com/p/RTb6StDQDD/

Yes, etcd is deployed in containers.

Changed in charm-etcd:
status: Incomplete → New
George Kraft (cynerva) wrote :

I'm still unable to reproduce this in a 15-hour test on Bionic.

Marking this as incomplete since we just don't have enough information to act on here. We need either reproduction steps, or details that show us what is causing the problem. I'm not really sure where to dig for those details - maybe we will see something in syslog, if you can share that.

Changed in charm-etcd:
status: New → Incomplete
Chris Johnston (cjohnston) wrote :