cups-browsed is using an excessive amount of CPU
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cups-browsed (Ubuntu) | Fix Released | High | Unassigned |
Lunar | Fix Released | High | Unassigned |
Bug Description
[ Impact ]
Some time after booting, cups-browsed suddenly starts to occupy part or all of one CPU core. This slows down other processes on the system, drains battery power, and makes the CPU fan noisy.
No local printers need to be set up on one's machine, but printers must be available in the local network.
A typical trigger for this bug is a printer suddenly disappearing from the network, for example when a laptop that shares a printer gets suspended by closing the lid. In that case the shared printer most probably disappears without the laptop's Avahi sending out a "disappeared" notification.
cups-browsed does not recover from the failure: once it has failed, it consumes CPU and stops working until it is restarted, which for most users happens with the next boot.
The problem was introduced with the transition from cups-browsed 1.x to 2.x (in Ubuntu 23.04). cups-browsed gained a multi-threading feature so that it can create several local queues at a time, which helps especially when many printers are available in the network.
The bug is in the error handling. If cups-browsed fails to access a remote printer in a sub-thread, it sets a flag to tell the main thread to stop the update loop. The main thread fails to reset the flag once it has stopped the loop, so every further update loop during the rest of the life of cups-browsed gets stopped immediately and no printers are updated at all. Because the updates were not performed, they are still needed, so the loop is called again immediately, ending up in an infinite busy loop.
These access errors happen especially if a remote printer goes away without any DNS-SD/Avahi notification about its disappearance.
So the bug not only causes CPU load, cups-browsed also ceases to work completely.
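The failure mode described above can be illustrated with a small single-threaded model. This is only a sketch and not the actual cups-browsed code (which is C and multi-threaded): the function name, the `stop_flag` variable, and the `max_passes` cap (added so that the demonstration terminates) are all assumptions made for illustration.

```python
def buggy_update_loops(printers, unreachable, max_passes=1000):
    """Toy model of the bug: a stop flag that is set once and never reset.

    All names here are hypothetical; max_passes only exists so that the
    demonstration terminates instead of spinning forever.
    """
    pending = list(printers)   # printers whose local queue still needs an update
    updated = []
    stop_flag = False          # set when an access to a remote printer fails
    passes = 0
    wasted_passes = 0

    # The update loop is called again as long as updates are still needed.
    while pending and passes < max_passes:
        passes += 1
        if stop_flag:
            # Flag was never reset: the pass stops immediately, nothing is
            # updated, so updates are still needed and the loop runs again.
            wasted_passes += 1
            continue
        for printer in list(pending):
            if printer in unreachable:
                stop_flag = True   # models the sub-thread signalling the error
                break
            pending.remove(printer)
            updated.append(printer)

    return updated, pending, wasted_passes
```

In this model, with printers `["a", "gone", "b"]` and `"gone"` unreachable, only `"a"` is ever updated; every pass after the first does no work, which corresponds to the busy loop and the stalled updates described above.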
[ Test Plan ]
This bug is not easy to reproduce, but at least for everyone who reported it here it occurs again and again. So everyone already suffering from it is asked to test the proposed SRU package.
To try to reproduce it, one ideally takes two computers: one running Ubuntu 23.04 with the affected cups-browsed (the client) and one running any Linux and sharing printers by means of CUPS queues, Printer Applications, or the ippeveprinter utility (the server).
Some ways to try to trigger the failure on the client:
- Suspend the server, either by closing its laptop lid or by selecting the "Suspend" function in its desktop's menus.
- On the server, start a Printer Application or ippeveprinter manually (this way no systemd watchdog applies to it). Then hard-kill its process with "kill -9 ...".
- If the server is connected to the local network only by wired Ethernet, unplug its Ethernet cable.
- If the server is connected to the local network only by Wi-Fi, switch it into flight mode.
All these methods should make the shared printer(s) on the server go away without being properly de-registered from Avahi on the server, so that no notification is broadcast into the local network. The client's cups-browsed therefore does not remove the corresponding local print queue and keeps maintaining it, sooner or later failing to access the printer and then getting stuck as described above.
Anyone who is suffering from this bug could also simply install the proposed package and observe the system. If the CPU load from cups-browsed does not appear again after some days, consider the fix verified.
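For testers who want a number rather than an impression, the CPU share of the cups-browsed process can be estimated from two samples of /proc/<pid>/stat taken some seconds apart. The following is a minimal sketch under the assumption of a Linux system with /proc mounted; the helper names are made up for this example and are not part of any package.

```python
import os


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' before splitting the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # After the command name the fields start with "state" (field 3),
    # so utime (field 14) and stime (field 15) are at index 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_share(ticks_before, ticks_after, interval_s, ticks_per_s):
    """Fraction of one CPU core used between two samples."""
    return (ticks_after - ticks_before) / ticks_per_s / interval_s


def read_ticks(pid):
    """Read the current CPU tick count of a running process."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())
```

To use it, call `read_ticks(pid)` twice with a known delay in between and pass both values to `cpu_share()`, using `os.sysconf("SC_CLK_TCK")` as `ticks_per_s`. A result that stays close to 1.0 while no printers are being added or removed would match the symptom described in this report.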
[ Where problems could occur ]
The fix does nothing more than remove the mentioned flag and instead mark the remote printer as disappeared. This way the update loop is not stopped but finishes normally. This is not a problem: the remote printers are independent of each other, so there is no reason to skip updating printers because one printer failed.
After the update loop has completed, the next update loop removes the local queue for the faulty printer, as it is marked as disappeared.
If cups-browsed gets notified about a disappeared printer by Avahi, it also marks it as disappeared so that the queue gets removed in the next update loop. So now we do the same with faulty printers, which simply do not answer an IPP request.
As the regular procedure when a remote printer gets shut down works correctly, we do not actually expect regressions here.
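The behaviour after the fix can be modelled in the same simplified way. Again this is only a sketch and not the real cups-browsed code: the state names ("needs_update", "ok", "disappeared"), the function name, and the `max_passes` safety cap are assumptions chosen for illustration.

```python
def fixed_update_loops(printers, unreachable, max_passes=1000):
    """Toy model of the fix: no stop flag, faulty printers are marked
    as disappeared and their queue is removed in the next pass.

    State names and max_passes are hypothetical, for illustration only.
    """
    queues = {name: "needs_update" for name in printers}
    passes = 0

    # Run another update loop while any queue still needs attention.
    while passes < max_passes and any(s != "ok" for s in queues.values()):
        passes += 1
        for name in list(queues):
            state = queues[name]
            if state == "disappeared":
                # Marked in an earlier pass: remove the local queue now.
                del queues[name]
            elif state == "needs_update":
                if name in unreachable:
                    # Access failed: mark it, but keep updating the others.
                    queues[name] = "disappeared"
                else:
                    queues[name] = "ok"

    return sorted(queues), passes
```

In this model, with printers `["a", "gone", "b"]` and `"gone"` unreachable, the first pass updates `"a"` and `"b"` and marks `"gone"`, the second pass removes the queue for `"gone"`, and then the loop ends because nothing is left to do.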
[ Original Description ]
It could be a problem with the network, but I'm seeing cups-browsed appearing to keep one CPU core busy. According to systemctl, it has used 51 minutes of CPU time since being started 6 hours ago (the laptop has been suspended a few times in that time):
Active: active (running) since Thu 2023-05-04 08:41:36 CEST; 6h ago
Main PID: 68281 (cups-browsed)
Tasks: 4 (limit: 18785)
Memory: 7.0M
CPU: 51min 32.735s
This seems surprising, since I'd only expect it to be doing something when printers are advertised over mDNS.
It may be unrelated, but I'm also seeing warnings like the following in the journal:
May 04 14:26:11 lrrr cups-browsed[
May 04 14:26:11 lrrr cups-browsed[
May 04 14:26:11 lrrr cups-browsed[
May 04 14:26:11 lrrr cups-browsed[
May 04 14:38:28 lrrr cups-browsed[
May 04 14:38:28 lrrr cups-browsed[
May 04 14:38:28 lrrr cups-browsed[
May 04 14:38:28 lrrr cups-browsed[
May 04 14:38:28 lrrr cups-browsed[
May 04 14:38:39 lrrr cups-browsed[
May 04 14:50:48 lrrr cups-browsed[
May 04 14:50:48 lrrr cups-browsed[
May 04 14:50:48 lrrr cups-browsed[
May 04 14:50:51 lrrr cups-browsed[
May 04 14:50:51 lrrr cups-browsed[
May 04 14:50:51 lrrr cups-browsed[
May 04 14:50:51 lrrr cups-browsed[
May 04 14:50:51 lrrr cups-browsed[
May 04 14:52:24 lrrr cups-browsed[
May 04 14:52:24 lrrr cups-browsed[
May 04 14:52:24 lrrr cups-browsed[
May 04 14:52:25 lrrr cups-browsed[
These errors seem to be generated when GLib's g_source_remove() function is called with an ID for a source that doesn't exist. This could indicate that cups-browsed is losing track of an idle or timeout function.
ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: cups-browsed 2.0~rc1-0ubuntu1
ProcVersionSign
Uname: Linux 6.2.0-20-generic x86_64
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
Date: Thu May 4 15:10:03 2023
InstallationDate: Installed on 2017-09-02 (2070 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Alpha amd64 (20170901)
MachineType: LENOVO 20HRCTO1WW
Papersize: a4
PpdFiles:
Error: command ['fgrep', '-H', '*NickName', '/etc/cups/
grep: /etc/cups/
grep: /etc/cups/
grep: /etc/cups/
grep: /etc/cups/
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: cups-browsed
UpgradeStatus: Upgraded to lunar on 2023-03-29 (36 days ago)
dmi.bios.date: 11/24/2022
dmi.bios.release: 1.57
dmi.bios.vendor: LENOVO
dmi.bios.version: N1MET72W (1.57 )
dmi.board.
dmi.board.name: 20HRCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.
dmi.ec.
dmi.modalias: dmi:bvnLENOVO:
dmi.product.family: ThinkPad X1 Carbon 5th
dmi.product.name: 20HRCTO1WW
dmi.product.sku: LENOVO_
dmi.product.
dmi.sys.vendor: LENOVO
Changed in cups-browsed (Ubuntu):
importance: Undecided → High
Changed in cups-browsed (Ubuntu):
status: Incomplete → Confirmed
description: updated
Changed in cups-browsed (Ubuntu Lunar):
status: New → In Progress
importance: Undecided → High
Status changed to 'Confirmed' because the bug affects multiple users.