jujud agent consumes too much CPU when "idle"

Bug #1982893 reported by Simon Déziel
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | High | Unassigned | |

Bug Description

# Summary

On an empty machine (no applications or units), jujud consumes a significant amount of CPU time while doing no useful work.

# Steps to reproduce

1) Create an empty machine:

```
juju add-machine --series jammy
```

2) Let it sit idle for a while.
3) Check the CPU usage of jujud on the empty machine:

```
juju ssh <machine #>
top -bcn1 -o 'TIME' | head -n 20
```

```
ubuntu@r01-amd64-06:~$ top -bcn1 -o 'TIME' | head -n 20
top - 21:15:17 up 3 days, 20:12, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.0 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 1.5 st
MiB Mem : 3920.9 total, 2792.9 free, 289.9 used, 838.1 buff/cache
MiB Swap: 3920.0 total, 3920.0 free, 0.0 used. 3381.8 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    791 root 20 0 852788 89560 74964 S 0.0 2.2 148:28.90 /var/lib/juju/tools/machine-165/jujud machine --data-dir /var/lib+
    459 root 20 0 716364 17592 10724 S 0.0 0.4 5:38.84 /run/lxd_agent/lxd-agent
    483 root rt 0 289480 27264 9076 S 0.0 0.7 1:04.96 /sbin/multipathd -d -s
    742 root 20 0 82836 3904 3556 S 0.0 0.1 0:26.74 /usr/sbin/irqbalance --foreground
    748 root 20 0 1169920 39384 20488 S 0.0 1.0 0:22.72 /usr/lib/snapd/snapd
     43 root 20 0 0 0 0 S 0.0 0.0 0:14.78 [kcompactd0]
    692 systemd+ 20 0 16248 7984 6972 S 0.0 0.2 0:14.47 /lib/systemd/systemd-networkd
      1 root 20 0 166432 11772 8256 S 0.0 0.3 0:12.26 /sbin/init
 111820 root 20 0 0 0 0 I 0.0 0.0 0:11.72 [kworker/1:2-cgroup_destroy]
   1412 root 39 19 0 0 0 S 0.0 0.0 0:10.43 [arc_reap]
    698 systemd+ 20 0 25392 12336 8312 S 0.0 0.3 0:10.30 /lib/systemd/systemd-resolved
   1414 root 39 19 0 0 0 S 0.0 0.0 0:09.86 [dbuf_evict]
     14 root 20 0 0 0 0 I 0.0 0.0 0:08.67 [rcu_sched]
```

The output above shows that the empty VM has been up for less than 4 days, and during that time jujud
consumed ~148 minutes of CPU time while doing essentially no useful work, since the VM has no workload.
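To put the ~148 minutes into perspective, here is a small sketch that converts the `TIME+` and uptime values from the `top` output into an average CPU share. It assumes `TIME+` is in top's `minutes:seconds.hundredths` form and that jujud has been running for the whole uptime; the helper names are illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// topTimeToSeconds converts a top TIME+ value such as "148:28.90"
// (minutes:seconds.hundredths) into seconds.
func topTimeToSeconds(s string) (float64, error) {
	minStr, secStr, ok := strings.Cut(s, ":")
	if !ok {
		return 0, fmt.Errorf("unexpected TIME+ value %q", s)
	}
	minutes, err := strconv.Atoi(minStr)
	if err != nil {
		return 0, err
	}
	seconds, err := strconv.ParseFloat(secStr, 64)
	if err != nil {
		return 0, err
	}
	return float64(minutes)*60 + seconds, nil
}

// avgCPUPercent returns the average share of a single CPU that was used,
// as a percentage, given total CPU seconds and elapsed wall-clock seconds.
func avgCPUPercent(cpuSeconds, elapsedSeconds float64) float64 {
	return 100 * cpuSeconds / elapsedSeconds
}

func main() {
	// Values copied from the top output above.
	cpu, err := topTimeToSeconds("148:28.90")
	if err != nil {
		panic(err)
	}
	uptime := float64(3*24*3600 + 20*3600 + 12*60) // "up 3 days, 20:12"
	fmt.Printf("%.0f s of CPU over %.0f s of uptime = %.2f%% of one CPU\n",
		cpu, uptime, avgCPUPercent(cpu, uptime))
}
```

This prints `8909 s of CPU over 331920 s of uptime = 2.68% of one CPU`. If jujud was started some time after boot, the uptime overstates its lifetime and the real average is higher.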

https://gist.github.com/stgraber/28b6e704cea2a317667ce95bea100b9f contains `health-check` reports for jujud and for lxd (which juju needlessly activates due to https://bugs.launchpad.net/juju/+bug/1934176).

It would be nice for jujud to minimize its CPU/energy consumption, especially when there is no valuable work to do.

# Additional information:

```
ubuntu@r01-amd64-06:~$ lsb_release -rd
Description: Ubuntu 22.04 LTS
Release: 22.04
ubuntu@r01-amd64-06:~$ /var/lib/juju/tools/machine-165/jujud version
2.9.32-ubuntu-amd64
```

Stéphane Graber (stgraber) wrote:

Given our renewed interest in power efficiency, reducing jujud's CPU usage, combined with no longer needlessly enabling the LXD daemon, should save a meaningful amount of CPU time and power across a large number of cloud instances.

Joseph Phillips (manadart) wrote:

https://bugs.launchpad.net/juju/+bug/1934176 mentioned above will be fixed in 2.9.33+

John A Meinel (jameinel) wrote:

I did a plain `juju add-machine` for an LXD container and then ran:
```
juju_cpu_profile 3000 > cpu.pprof
```

This runs the Go CPU profiler for 3000 seconds and then writes the collected samples to a profile. Unfortunately, the result of a simple `top` isn't particularly revealing:
```
Showing top 10 nodes out of 50
      flat  flat%   sum%        cum   cum%
   20670ms 40.83% 40.83%    20670ms 40.83%  runtime.futex
   14460ms 28.57% 69.40%    14460ms 28.57%  runtime.epollwait
    1130ms  2.23% 71.63%     1130ms  2.23%  runtime.nanotime (inline)
    1030ms  2.03% 73.67%    29930ms 59.13%  runtime.findrunnable
     980ms  1.94% 75.60%    15490ms 30.60%  runtime.netpoll
     840ms  1.66% 77.26%      840ms  1.66%  runtime.(*mcache).prepareForSweep
     710ms  1.40% 78.66%      830ms  1.64%  runtime.pidleget
     660ms  1.30% 79.97%     2940ms  5.81%  runtime.checkTimers
     580ms  1.15% 81.11%      920ms  1.82%  runtime.selectgo
     570ms  1.13% 82.24%      570ms  1.13%  runtime.lock2
```
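The percentages in a pprof listing are relative to the total sampled CPU time, so the total (whose header line was not included in the paste) can be recovered from any single row as `flat * 100 / flat%`. A minimal sketch, assuming the profile covered the full 3000 seconds passed to `juju_cpu_profile`; because the percentages are rounded to two decimals, the result is approximate.

```go
package main

import "fmt"

// totalSampledMs recovers the total sampled CPU time from one pprof row,
// using flat% = 100 * flat / total.
func totalSampledMs(flatMs, flatPercent float64) float64 {
	return flatMs * 100 / flatPercent
}

func main() {
	const profileSeconds = 3000.0 // duration passed to juju_cpu_profile
	// runtime.futex row: flat = 20670ms, flat% = 40.83%.
	total := totalSampledMs(20670, 40.83)
	fmt.Printf("total sampled: %.1f s (%.2f%% of one CPU over %.0f s)\n",
		total/1000, 100*total/1000/profileSeconds, profileSeconds)
}
```

This prints `total sampled: 50.6 s (1.69% of one CPU over 3000 s)`; using the `runtime.epollwait` row instead gives nearly the same total, which is a quick consistency check on the rounding.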

The SVG graph isn't much clearer either. It also suggests that the time is spent in the Go runtime itself, waiting on a futex or in epoll.

John A Meinel (jameinel) wrote:
Changed in juju:
status: New → Triaged
importance: Undecided → High
Changed in juju:
milestone: none → 2.9.34
Changed in juju:
milestone: 2.9.34 → 2.9.35
Changed in juju:
milestone: 2.9.35 → none
Simon Déziel (sdeziel) wrote:

Quick update: this is still an issue on Juju 3.1.2 (amd64), where an empty machine constantly burns ~5% CPU.
