[nvidia] gnome-shell eats 100% cpu when seconds display is on and screen is locked

Bug #1814125 reported by laulau
This bug affects 3 people
Affects              | Status    | Importance | Assigned to     | Milestone
gnome-shell (Ubuntu) | Won't Fix | Undecided  | Daniel van Vugt |
mutter (Ubuntu)      | Won't Fix | Undecided  | Daniel van Vugt |

Bug Description

On my laptop (Xiaomi Notebook Pro), Ubuntu 18.10 amd64, standard Ubuntu desktop (gnome-shell), no application running:
just activate the seconds display in GNOME Tweaks and let the screen lock itself.
The fan spins fast; simply moving the mouse makes the fan slow down.
I wrote a simple script that appends a top snapshot to a log file every minute; it shows that gnome-shell consumes 100% CPU when locked.
Workaround: with the seconds display disabled, there is no more CPU usage / fan noise when locked.
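The reporter's logging script is not included in the report. Here is a minimal Python sketch of the same idea, assuming procps `top` is on the PATH. `log_snapshots`, its parameters and the `cpu.log` path are illustrative and are not taken from the reporter's setup:

```python
import subprocess
import time
from datetime import datetime


def log_snapshots(log_path, iterations, interval_s=60.0, command=("top", "-b", "-n", "1")):
    """Append `iterations` timestamped snapshots of `command` output to log_path.

    Illustrative helper (not from the report); defaults to one batch-mode `top` run.
    """
    for i in range(iterations):
        # Run one snapshot and capture its text output.
        result = subprocess.run(list(command), capture_output=True, text=True, check=False)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"=== {datetime.now().isoformat(timespec='seconds')} ===\n")
            log.write(result.stdout)
            log.write("\n")
        # Wait between snapshots, but not after the last one.
        if i < iterations - 1:
            time.sleep(interval_s)
```

Calling `log_snapshots("cpu.log", iterations=60)` and then letting the screen lock would record an hour of one-per-minute snapshots.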

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: xorg 1:7.7+19ubuntu8
ProcVersionSignature: Ubuntu 4.18.0-13.14-generic 4.18.17
Uname: Linux 4.18.0-13-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 415.27 Thu Dec 20 17:25:03 CST 2018
 GCC version: gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)
ApportVersion: 2.20.10-0ubuntu13.1
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Thu Jan 31 16:19:16 2019
DistUpgraded: 2018-12-30 21:42:17,971 DEBUG Running PostInstallScript: './xorg_fix_proprietary.py'
DistroCodename: cosmic
DistroVariant: ubuntu
DkmsStatus:
 nvidia, 415.27, 4.18.0-13-generic, x86_64: installed
 virtualbox, 5.2.18, 4.18.0-13-generic, x86_64: installed
ExtraDebuggingInterest: Yes, if not too technical
GraphicsCard:
 Intel Corporation UHD Graphics 620 [8086:5917] (rev 07) (prog-if 00 [VGA controller])
   Subsystem: Xiaomi UHD Graphics 620 [1d72:1701]
   Subsystem: Xiaomi Mi Notebook Pro [GeForce MX150] [1d72:1701]
InstallationDate: Installed on 2018-06-08 (236 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
MachineType: Timi TM1701
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-13-generic root=UUID=174e8747-a737-411b-bfaf-4e6795426fea ro quiet splash nouveau.runpm=0 vt.handoff=1
SourcePackage: xorg
Symptom: display
UpgradeStatus: Upgraded to cosmic on 2018-12-30 (31 days ago)
dmi.bios.date: 10/13/2017
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: XMAKB5R0P0502
dmi.board.asset.tag: Any
dmi.board.name: TM1701
dmi.board.vendor: Timi
dmi.board.version: MP
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Timi
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnINSYDECorp.:bvrXMAKB5R0P0502:bd10/13/2017:svnTimi:pnTM1701:pvr:rvnTimi:rnTM1701:rvrMP:cvnTimi:ct10:cvrChassisVersion:
dmi.product.family: Timibook
dmi.product.name: TM1701
dmi.sys.vendor: Timi
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.95-1
version.libgl1-mesa-dri: libgl1-mesa-dri 18.2.2-0ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx 18.2.2-0ubuntu1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.1-3ubuntu2.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:18.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20171229-1ubuntu1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.15-3

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

As per bug 779039#c14, this commit https://git.gnome.org/browse/mutter/commit/?id=383ba566bd7c2a76d0856015a66e47caedef06b6 makes gnome-shell constantly use ~80% CPU on one of my four cores (I am running a Skylake Core i5) when running something like glxgears.

Reverting the patch brings CPU usage down to around 3% when running glxgears.
In addition, the commit raises my CPU temperature by ~5 degrees.

GPU temperature and usage don't seem to be affected, only CPU usage.
I am using an NVIDIA Kepler card.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

I forgot to mention that I had asked an NVIDIA developer about this, and he said it was due to the overhead of many Xlib calls.

Revision history for this message
In , apemax (apemax) wrote :

I can also confirm this issue: gnome-shell hits around 80% CPU usage on one core when running glxgears. It also causes stuttering in all the games I have tried. Reverting the patch mentioned by Hussam brings the CPU usage back down to normal levels.

Revision history for this message
In , Alec (susie-cumming) wrote :

I have also encountered this issue on the Nvidia proprietary driver. Reverting the patch fixes the problem.

Revision history for this message
In , DeadMetaler (dead-666) wrote :

I made a screenshot of CPU usage before and after applying
Call-coglxlibrenderersetthreadedswapwaitenabled.patch
http://storage4.static.itmages.ru/i/17/0411/h_1491948046_9928347_1598f98b62.png

I also noticed that this patch improved responsiveness when moving windows. But now the GNOME Shell animations lag even more: without the load on the CPU, the drop in FPS has become visible. This was all tested on the proprietary NVIDIA driver + GTX 650.

Revision history for this message
In , Alec (susie-cumming) wrote :

(In reply to Maxim from comment #4)
> I make a screenshot of cpu usage before and after applying
> Call-coglxlibrenderersetthreadedswapwaitenabled.patch
> http://storage4.static.itmages.ru/i/17/0411/h_1491948046_9928347_1598f98b62.png
>
>
> I also noticed that this patch have improved responsiveness when moving
> windows. But now, the animations of the Gnome Shell to lag even more. Now
> without the load on the CPU, the decrease in FPS became visible. This is all
> tested on a proprietary Nvidia + 650GTX.

I have also noticed this behaviour.

Revision history for this message
In , Promolecule (promolecule) wrote :

I am seriously wondering why no devs have noticed this yet. This bug seems to affect literally everyone who uses an NVIDIA gpu.

Revision history for this message
In , Fau-l (fau-l) wrote :

I've been experiencing sluggish animations in gnome-shell after updating to v3.24.1 on ArchLinux on my laptop with dedicated Nvidia GPU (NVS 5100M); no integrated GPU is present.

The previous Gnome version did not show this behaviour.

I reverted the commit:
https://git.gnome.org/browse/mutter/commit?id=383ba566bd7c2a76d0856015a66e47caedef06b6
as mentioned by somebody on the Arch Forums and this fixed the issue for me.

Revision history for this message
In , Fau-l (fau-l) wrote :

(In reply to Frank from comment #7)
> I've been experiencing sluggish animations in gnome-shell after updating to
> v3.24.1 on ArchLinux on my laptop with dedicated Nvidia GPU (NVS 5100M); no
> integrated GPU is present.
>
> The previous Gnome version did not show this behaviour.
>
> I reverted the commit:
> https://git.gnome.org/browse/mutter/commit?id=383ba566bd7c2a76d0856015a66e47caedef06b6
> as mentioned by somebody on the Arch Forums and this fixed the issue for me.

Sorry, I forgot to mention that I use the closed-source NVIDIA driver 340.xx and run GNOME on Xorg.

Revision history for this message
In , Léo (leeo97one) wrote :

Same for me, this is a really annoying issue.

Revision history for this message
In , Promolecule (promolecule) wrote :

I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and the latest mutter release the high CPU usage is finally gone. The animations are also back to normal: animations and window movements are very smooth again. I am not sure whether the NVIDIA release or mutter included a fix, but I would assume that NVIDIA has somehow fixed the locking issue.

Revision history for this message
In , Fau-l (fau-l) wrote :

(In reply to Peet from comment #10)
> I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> latest mutter release the high cpu usage is finally gone. Also the
> animations are back to normal as in having very smooth animations and window
> movements, whatever. I am not sure if it's the NVIDIA release or mutter
> which has included a fix but I would assume that NVIDIA has somehow fixed
> the locking issue.

The update to mutter-3.24.2 (ArchLinux) has not resolved the issue for me.

Revision history for this message
In , Promolecule (promolecule) wrote :

(In reply to Frank from comment #11)
> (In reply to Peet from comment #10)
> > I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> > latest mutter release the high cpu usage is finally gone. Also the
> > animations are back to normal as in having very smooth animations and window
> > movements, whatever. I am not sure if it's the NVIDIA release or mutter
> > which has included a fix but I would assume that NVIDIA has somehow fixed
> > the locking issue.
>
> The update to mutter-3.24.2 (ArchLinux) has not resolved the issue for me.

Since this is NVIDIA-related, have you tried upgrading to the latest NVIDIA release? I am currently using 381.22.

Revision history for this message
In , Fau-l (fau-l) wrote :

I use the nvidia-340.102 legacy driver, which is up to date.

Revision history for this message
In , Alec (susie-cumming) wrote :

(In reply to Peet from comment #10)
> I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> latest mutter release the high cpu usage is finally gone. Also the
> animations are back to normal as in having very smooth animations and window
> movements, whatever. I am not sure if it's the NVIDIA release or mutter
> which has included a fix but I would assume that NVIDIA has somehow fixed
> the locking issue.

The Mutter 3.24.2 and Nvidia 381.22 updates have not fixed the issue for me.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

The nvidia changelog says:
"Disabled OpenGL threaded optimizations by default, initially enabled in 378.09, due to various reports of instability."
Could this be related?

Revision history for this message
In , Black-inc (black-inc) wrote :

(In reply to alecdc272 from comment #14)
> (In reply to Peet from comment #10)
> > I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> > latest mutter release the high cpu usage is finally gone. Also the
> > animations are back to normal as in having very smooth animations and window
> > movements, whatever. I am not sure if it's the NVIDIA release or mutter
> > which has included a fix but I would assume that NVIDIA has somehow fixed
> > the locking issue.
>
> The Mutter 3.24.2 and Nvidia 381.22 updates have not fixed the issue for me

Can confirm, same setup, did not fix issue for me either

Revision history for this message
In , Promolecule (promolecule) wrote :

This was my recent upgrade:

[2017-05-12 00:46] [ALPM] upgraded dialog (1:1.3_20170131-1 -> 1:1.3_20170509-1)
[2017-05-12 00:46] [ALPM] upgraded nvidia-utils (378.13-6 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded libproxy (0.4.13-2 -> 0.4.15-1)
[2017-05-12 00:46] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded gnome-shell (3.24.1+2+g45c2627d4-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded gnome-shell-extensions (3.24.1+1+gfbf3cf3-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded konsole (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded lib32-nvidia-utils (378.13-3 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded v4l-utils (1.12.3-1 -> 1.12.5-1)
[2017-05-12 00:46] [ALPM] upgraded lib32-v4l-utils (1.12.3-1 -> 1.12.5-1)
[2017-05-12 00:46] [ALPM] upgraded libcdio-paranoia (10.2+0.94+1-1 -> 10.2+0.94+1-2)
[2017-05-12 00:46] [ALPM] upgraded libkexiv2 (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded libkomparediff2 (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded libreoffice-fresh (5.3.2-3 -> 5.3.3-1)
[2017-05-12 00:46] [ALPM] upgraded nvidia-dkms (378.13-6 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded okteta (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded okular (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded openvpn (2.4.1-2 -> 2.4.2-1)
[2017-05-12 00:46] [ALPM] upgraded qt4 (4.8.7-19 -> 4.8.7-20)
[2017-05-12 00:46] [ALPM] upgraded sudo (1.8.19.p2-1 -> 1.8.20-1)
[2017-05-12 00:46] [ALPM] upgraded wildmidi (0.4.0-1 -> 0.4.1-1)

As I said before, only a few packages seem relevant to me, for example:

[2017-05-12 00:46] [ALPM] upgraded nvidia-utils (378.13-6 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded gnome-shell (3.24.1+2+g45c2627d4-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded lib32-nvidia-utils (378.13-3 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded nvidia-dkms (378.13-6 -> 381.22-1)
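The relevant lines above were picked out by hand. Here is a small Python sketch that does the same filtering on pacman's log format, as shown in the excerpt. The choice of keywords (any package containing `nvidia`, plus exactly `mutter` and `gnome-shell`) is an assumption chosen to reproduce the hand-picked list. `relevant_upgrades` is an illustrative name.

```python
import re

# Matches lines like:
# [2017-05-12 00:46] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) \((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def relevant_upgrades(lines, substrings=("nvidia",), exact=("mutter", "gnome-shell")):
    """Return (package, old, new) for upgrades of the packages suspected in this thread.

    The keyword defaults are an assumption; gnome-shell-extensions is deliberately
    not matched, mirroring the hand-filtered list.
    """
    hits = []
    for line in lines:
        m = UPGRADE_RE.match(line.strip())
        if not m:
            continue
        pkg = m.group("pkg")
        if pkg in exact or any(s in pkg for s in substrings):
            hits.append((pkg, m.group("old"), m.group("new")))
    return hits
```

Run over `/var/log/pacman.log` (for example `relevant_upgrades(open("/var/log/pacman.log"))`), it would list every upgrade of those packages, not only the ones from this session.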

Revision history for this message
In , Léo (leeo97one) wrote :

Not resolved for me either.

[2017-05-12 17:32] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
[2017-05-12 17:32] [ALPM] upgraded gnome-shell (3.24.1+2+g45c2627d4-1 -> 3.24.2-1)
[2017-05-12 17:32] [ALPM] upgraded nvidia (378.13-6 -> 381.22-1)

Revision history for this message
In , Promolecule (promolecule) wrote :

I am not sure if this can make a difference but, @Léo could you try using nvidia-dkms instead of the nvidia package?

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

Hi, I'm a Manjaro user. I was redirected here from the Manjaro Forum, and I think I have the issue described here; if so, I have something to add.

When I was using GNOME 3.22 the environment worked perfectly, but since the update to GNOME 3.24 I have had stability problems with it, and I think Mutter is to blame.

I have two computers. My main computer is a desktop with an ASUS P5K motherboard, an NVIDIA GTX 1050 GPU and an Intel Core 2 Quad Q8300 CPU. I use version 375.66-1 of the proprietary driver from the Manjaro repos, Linux 4.9 and Intel microcode.

My second computer is an old Toshiba Satellite Pro P200 laptop with an Intel Core 2 Duo T7300 CPU and an ATI Mobility Radeon HD 2600 GPU, identified as AMD® Rv630 by the free driver stack (Linux 4.9 LTS and Mesa 17.0.5). I use Intel microcode here too.

The problem is that the environment sometimes crashes randomly. It recovers perfectly after crashing, but it is very annoying when it happens.

I couldn't catch any log or message showing the exact error. The only thing that is clear to me is that the bug always occurs when I have a number of windows spread across several virtual desktops.

And yes, the bug occurs on Mesa too, if it's the same problem I'm responding to.

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

I forgot to say that Wayland looks much more stable than Xorg, but the bug happens on both servers.

Revision history for this message
In , Daniel Boles (dboles) wrote :

(In reply to Eduardo Medina from comment #20)
> Well, the problem I have is sometimes the environment crashes randomly. It
> recovers perfect after crashing, but when the bug occurs is very annoying.
>
> I couldn't catch any log or message to know what is the exact error, the
> only thing I have clear is the bug runs always when I have an amount of
> windows spreaded through some virtual desktops.
>
> Yes, the bug occurs on Mesa too if it's the same problem I'm responding.

Is your bug _only_ crashing? If so, it's quite possibly not the same or related. As in the title, this report is about high CPU usage, and none of the other reporters have mentioned any crashing following that. Do you log high CPU usage before these crashes occur?

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

Created attachment 352150
CPU log from Eduardo Medina's old Toshiba laptop

I ran this command:
$ while true; do ps -eo pcpu,pid,user,args | sort -k 1 -r | head -50 >> logfileToshiba.txt; echo "\n" >> logfileToshiba.txt; sleep 1; done
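The loop above sorts the `%CPU` column as text (`sort -k 1 -r`, without `-n`). Because of that, the order can depend on column padding and locale. Below is a minimal Python sketch of the same snapshot with an explicit numeric sort. It assumes Linux procps `ps`, and the function names are illustrative:

```python
import subprocess


def top_cpu_processes(ps_output, limit=50):
    """Parse `ps -eo pcpu,pid,user,args` output; return rows sorted by %CPU, highest first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)  # args may contain spaces, so split at most 3 times
        if len(parts) < 4:
            continue
        pcpu, pid, user, args = parts
        rows.append((float(pcpu), int(pid), user, args))
    # Sort on the parsed number rather than on the text of the column.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def snapshot(limit=50):
    """Take one snapshot of the running system (Linux procps `ps` assumed)."""
    out = subprocess.run(["ps", "-eo", "pcpu,pid,user,args"],
                         capture_output=True, text=True, check=True).stdout
    return top_cpu_processes(out, limit)
```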

It seems the problem is systemd-coredump: it shows high CPU usage at roughly the same times that GNOME Shell crashed. You can see it in the final part of the big file inside the ZIP.

Well, it seems I have to report this to Manjaro or systemd.

Sorry for the inconvenience.

Revision history for this message
In , Florian-muellner (florian-muellner) wrote :

That's definitely a different issue. This bug is about high CPU usage caused by mutter/gnome-shell on NVIDIA GPUs.

A crash is of course a bug too, but it's not a matter of CPU usage (you could say that the CPU usage of a crashed process is too low, namely 0%...)

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

I see. I'm collecting all the data I can to file another bug.

From coredumpctl and journalctl I see something related to libc.

Revision history for this message
In , Daniel Boles (dboles) wrote :

(In reply to Eduardo Medina from comment #23)
> It seems the problem is systemd-coredump, it appears with a high CPU usage
> more or less in the same times than GNOME Shell crashed.

This doesn't mean systemd-coredump is the culprit; quite the opposite. Its significant CPU usage coinciding with a crash is to be expected, because it has to dump the core image of the crashed process so it can be used for debugging if needed.

Revision history for this message
In , Noxum (noxum) wrote :

Thank you, Daniel. This specific info destroys my idea:
> Moving GL commands into a new thread is non-trivial (you need to manually set
> up the context so that the commands will work)

The rest of your comments do not apply. The threaded swap wait is a code path used only with the NVIDIA proprietary driver, emulating the Intel extension GLX_INTEL_swap_event. Changing only that does not affect the nouveau code path or any other driver's. This specific path already contains a glFinish, so I was talking about moving that one, not adding another.

Revision history for this message
In , Noxum (noxum) wrote :

PS:
Daniel, in general I completely agree with you: currently mutter uses too many glFinish and glxWaitVideoSync calls, which are both blocking. To my knowledge, the proprietary driver needs exactly one glFinish per frame for proper operation, so my idea was about using that one as efficiently as possible.
A web search to re-find the mentioned article showed that many GL developers were hitting the same behaviour of the proprietary driver and used the same solution I was thinking of.
I didn't look into why mutter uses glxWaitVideoSync so often to wait a little here, wait a little there.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Just a reminder to all:

If you have frame rate problems then please see bug 789186.

This bug is about high CPU usage only.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

For those of you who still find reverting commit 383ba566bd7c2a76d0856015a66e47caedef06b6 helps to reduce CPU, can you please note the Nvidia driver version you are using? I've tested some theories today and retested drivers 340 and 390 but can't find any problem. Or at least can't find any problem that's unique to Nvidia.

When testing for this bug please take care to not move the mouse at all. Just let rendering proceed untouched. High CPU from moving the mouse is a whole different family of issues (https://gitlab.gnome.org/GNOME/mutter/issues/283) so please don't move the mouse or discuss that here.

Please also note if the bug affects Xorg or Wayland sessions.
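For measuring a single process while leaving the mouse untouched, one option is to read its CPU time directly instead of eyeballing `top`. Below is a minimal Python sketch that assumes Linux procfs: `utime` and `stime` are fields 14 and 15 of `/proc/<pid>/stat`, counted in clock ticks. The helper names are illustrative. 100% means one full core, the same scale `top` uses per process by default.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so only split what comes after the last ')'.
    rest = stat_line.rpartition(")")[2].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """CPU usage of one process as a percentage of a single core over the interval."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def sample_pid(pid, interval_s=5.0):
    """Measure a live process (e.g. gnome-shell) over interval_s seconds."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, os.sysconf("SC_CLK_TCK"))
```

For example, `sample_pid(<gnome-shell pid>, 30)` run while glxgears is open and nothing is touched gives one averaged figure for the whole 30 seconds.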

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Correction: Please DON'T note if the bug affects Xorg or Wayland sessions.

It's rather obvious from 383ba566bd7c2a76d0856015a66e47caedef06b6 that this discussion should be about Xorg sessions only.

Revision history for this message
laulau (olaulau) wrote :
Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

And remember, this bug is about high CPU only. Other issues should not be discussed here.

Those simply wanting smoother performance for Nvidia should use this instead:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/281

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This might be a different issue, but your kernel log is also getting flooded with error messages about the touchpad(?)

https://launchpadlibrarian.net/409195699/CurrentDmesg.txt

summary: - gnome-shell eats 100% cpu when seconds display is on and screen is
- locked
+ [nvidia] gnome-shell eats 100% cpu when seconds display is on and screen
+ is locked
affects: xorg (Ubuntu) → gnome-shell (Ubuntu)
tags: added: nvidia
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

If you have any gnome-shell extensions installed, please try uninstalling them and tell us if the problem persists.

Changed in gnome-shell (Ubuntu):
status: New → Incomplete
Revision history for this message
laulau (olaulau) wrote :

It seems that the bug disappears with all GNOME extensions uninstalled.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thanks. In that case I am going to declare this bug invalid for 'gnome-shell'. But we can reopen it for a different component if you can find out which gnome-shell extension was causing the problem.

Changed in gnome-shell (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
laulau (olaulau) wrote :

The bug still happens if I enable the seconds display, still with no GNOME extensions installed.

Here is an extract of a top running in the background when the screen auto-locks and the CPU is burning (watch -n1 "top -b -n 1 >> cpu.log"):

top - 12:02:08 up 36 min, 1 user, load average: 0.97, 0.78, 0.62
Tasks: 273 total, 2 running, 271 sleeping, 0 stopped, 0 zombie
%Cpu(s): 17.9 us, 4.8 sy, 0.0 ni, 77.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7860.6 total, 2552.6 free, 2671.6 used, 2636.4 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 4761.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3427 laulau    20   0 3335356 308544  98700 R  83.3   3.8   7:55.70 gnome-shell
 4147 laulau    20   0 2371472 461776 152652 S  11.1   5.7   1:43.68 firefox
 1214 message+  20   0   24216   6244   3836 S   5.6   0.1   0:01.80 dbus-daemon
 1354 root      20   0  248388   9456   6404 S   5.6   0.1   0:00.73 polkitd
 1505 root      20   0  263212   8256   7012 S   5.6   0.1   0:00.06 gdm3
 4350 laulau    20   0 1698980 171616  84856 S   5.6   2.1   0:24.28 WebExtensions
 4476 laulau    20   0  199236  28424  13388 S   5.6   0.4   0:00.30 chrome-gnome-sh
 4723 laulau    20   0 2010052 489476 206920 S   5.6   6.1   7:09.37 Web Content
 4978 laulau    20   0  313968  27492  21004 S   5.6   0.3   0:00.17 update-notifier
 9080 laulau    20   0   26308   3852   3232 R   5.6   0.0   0:00.03 top
    1 root      20   0  195224   9552   6788 S   0.0   0.1   0:03.25 systemd
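When such a log grows over many snapshots, a small parser can pull out one process's `%CPU` from each snapshot. Below is a minimal Python sketch that assumes the default `top -b` column layout shown above, with COMMAND as the last column. `process_cpu` is an illustrative name.

```python
def process_cpu(top_output, command):
    """Return the %CPU value of every row whose COMMAND equals `command`.

    Assumes `top -b` batch output with its default columns; other lines in the
    log (such as "cpu MHz" lines) are skipped because they do not fit the header.
    """
    header = None
    values = []
    for line in top_output.splitlines():
        if line.startswith("top -"):
            header = None  # a new snapshot starts; wait for its column header
            continue
        fields = line.split()
        if fields[:1] == ["PID"]:
            header = fields
            continue
        if header is None or not fields:
            continue
        # COMMAND is the last column and may contain spaces ("Web Content"),
        # so split at most len(header) - 1 times.
        row = line.split(None, len(header) - 1)
        if len(row) != len(header):
            continue
        record = dict(zip(header, row))
        if record.get("COMMAND") == command:
            values.append(float(record["%CPU"]))
    return values
```

Applied to `cpu.log` with `command="gnome-shell"`, this yields one value per snapshot, which makes the jump at lock time easy to spot.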

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK. Does the bug still happen if:

(a) Firefox is not running; or

(b) You disable your Nvidia GPU (in the BIOS?) and just use the Intel one?

Changed in gnome-shell (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
laulau (olaulau) wrote :

Disabled the NVIDIA GPU in nvidia-settings (and rebooted):
 without any application: OK
 with just an empty Firefox private browsing window: OK
 with a normal Firefox (many tabs and plugins): OK
 with many apps running: OK
Enabled the NVIDIA GPU, rebooted:
 without any application: KO

When the computer is idle, locking it manually immediately makes the fan blow; unlocking it calms the fan down.

Extract from my script logs (I added the CPU frequency, FYI):

cpu MHz : 700.142
cpu MHz : 700.025
cpu MHz : 700.346
cpu MHz : 700.285
cpu MHz : 700.037
cpu MHz : 700.094
cpu MHz : 700.109
cpu MHz : 700.109
top - 12:01:18 up 54 min, 1 user, load average: 0.16, 0.46, 0.67
Tasks: 263 total, 1 running, 262 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 1.5 sy, 0.0 ni, 97.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7860.6 total, 5831.8 free, 963.5 used, 1065.4 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 6620.2 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  245 root     -51   0       0      0      0 S   6.2   0.0   0:04.87 irq/129-nvidia
 2568 laulau    20   0  375224  87192  49688 S   6.2   1.1   0:20.95 Xorg
 2772 laulau    20   0 3265100 286588  97372 S   6.2   3.6  27:57.77 gnome-shell
 6954 laulau    20   0   26296   3764   3140 R   6.2   0.0   0:00.02 top
    1 root      20   0  195328   9612   6764 S   0.0   0.1   0:03.46 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd

cpu MHz : 3400.000
cpu MHz : 3400.341
cpu MHz : 3401.565
cpu MHz : 3401.177
cpu MHz : 3399.907
cpu MHz : 3400.164
cpu MHz : 3403.125
cpu MHz : 3400.692
top - 12:01:28 up 54 min, 1 user, load average: 0.29, 0.48, 0.67
Tasks: 262 total, 2 running, 260 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 8.8 sy, 0.0 ni, 87.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7860.6 total, 5830.9 free, 964.3 used, 1065.4 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 6619.3 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2772 laulau    20   0 3265100 286948  97372 R  93.8   3.6  28:06.30 gnome-shell
    1 root      20   0  195328   9612   6764 S   0.0   0.1   0:03.46 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
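The `cpu MHz` lines in the two snapshots sum up the change: the cores sit near 700 MHz while unlocked and near 3400 MHz once locked. Below is a small Python sketch for averaging those lines from one `/proc/cpuinfo` snapshot. It assumes x86 Linux, because other architectures may not report `cpu MHz`, and the helper names are illustrative:

```python
def mean_cpu_mhz(lines):
    """Average the 'cpu MHz : <value>' lines from one /proc/cpuinfo snapshot."""
    values = []
    for line in lines:
        key, sep, value = line.partition(":")
        # /proc/cpuinfo pads the key with tabs, so compare the stripped key.
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return sum(values) / len(values) if values else None


def current_mean_mhz(path="/proc/cpuinfo"):
    """Read the live value; returns None if the file has no 'cpu MHz' lines."""
    with open(path) as f:
        return mean_cpu_mhz(f)
```

On the two excerpts above, this gives about 700.1 MHz before locking and about 3400.9 MHz after.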

So the problem lies between NVIDIA and gnome-shell on my configuration, combined with the fact that I want to display seconds in the top bar!

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK. Since the problem seems to be in the Nvidia-415 driver, and that driver is not part of Ubuntu then this bug is not a valid Ubuntu bug right now. Sorry.

You can potentially make this a valid Ubuntu bug if you install (downgrade) to a supported version of the Nvidia driver like version 390.

  1. Uninstall the nvidia driver version 415.
  2. sudo apt install nvidia-driver-390

Then if the bug still occurs in a supported version of the driver we can reopen this bug.

affects: gnome-shell (Ubuntu) → ubuntu
Changed in ubuntu:
status: Incomplete → Invalid
summary: - [nvidia] gnome-shell eats 100% cpu when seconds display is on and screen
- is locked
+ [nvidia-415] gnome-shell eats 100% cpu when seconds display is on and
+ screen is locked
Revision history for this message
laulau (olaulau) wrote : Re: [nvidia-415] gnome-shell eats 100% cpu when seconds display is on and screen is locked

OK. I just removed the graphics PPA, removed nvidia 415, then installed nvidia 390 and rebooted.
Same again: gnome-shell eats 100% CPU when the screen is locked and seconds are displayed!

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK, thanks.

Please run this command:

  dpkg -l > allpackages.txt
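Once that file exists, a quick way to confirm which NVIDIA driver packages are actually installed is a small filter over it. Installed packages have status `ii`, as opposed to, for example, `rc` (removed, configuration files left). Below is a sketch that assumes the standard `dpkg -l` column layout; `installed_packages` is an illustrative name.

```python
def installed_packages(dpkg_output, prefix=""):
    """Return {name: version} for installed ('ii') rows of `dpkg -l` output.

    Header lines ("Desired=...", "||/ Name ...", "+++-===...") are skipped
    because their first column is not a status code.
    """
    found = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].startswith(prefix):
            found[fields[1]] = fields[2]
    return found
```

For example, `installed_packages(open("allpackages.txt").read(), "nvidia")` should show only the 390 packages after the downgrade.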

and send us the resulting file 'allpackages.txt'.

affects: ubuntu → gnome-shell (Ubuntu)
Changed in gnome-shell (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
laulau (olaulau) wrote :
summary: - [nvidia-415] gnome-shell eats 100% cpu when seconds display is on and
- screen is locked
+ [nvidia] gnome-shell eats 100% cpu when seconds display is on and screen
+ is locked
Changed in gnome-shell (Ubuntu):
status: Incomplete → New
Revision history for this message
In , Noxum (noxum) wrote :

Ok, nvidia driver 418.43, gnome-shell 3.31.91 + mutter 3.31.91 freshly built without any additional patches.
Running only glxgears in a small window, not anything else open, not touching anything, gnome-shell cpu usage spikes to 85%.

Revision history for this message
In , Noxum (noxum) wrote :

Even just the small circle animation in the g-c-c bluetooth pane puts gnome-shell to constant 85% cpu usage.

Revision history for this message
In , Noxum (noxum) wrote :

https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-KDE-High-CPU-Fix
gnome-shell-3.32.0+mutter-3.32.0
Setting __GL_MaxFramesAllowed=1 returns cpu usage to normal when running glxgears or opening the bluetooth pane. Mystery solved?
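As far as I understand, `__GL_` variables are read by the NVIDIA GL library inside each process. If that holds, the setting only changes gnome-shell's behaviour when gnome-shell itself starts with it. Setting it only for glxgears would not be enough. One common approach, assumed here rather than taken from this thread, is a line in `/etc/environment` followed by a new login. The Python sketch below checks whether a running process actually received the variable by reading `/proc/<pid>/environ`, which is NUL-separated. The helper names are illustrative.

```python
def env_from_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_has_setting(pid, name="__GL_MaxFramesAllowed", expected="1"):
    """Check whether a running process (e.g. gnome-shell) started with name=expected.

    Reading another process's environ requires the same user (or root).
    Note: this is the environment at process start, not later changes.
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        return env_from_environ(f.read()).get(name) == expected
```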

Revision history for this message
In , Léo (leeo97one) wrote :

Interesting, so GNOME is also affected by this glXSwapBuffers misimplementation?
Furthermore, setting this environment variable also fixes some big freezes that I had with GNOME Shell (or Mutter ?) 3.32.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

The fix in comment 81 sounds the same as what this does:
https://gitlab.gnome.org/GNOME/mutter/merge_requests/281

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Although not entirely the same.

Mutter was already trying to do things right (compared to KDE), so if __GL_MaxFramesAllowed=1 helps you, that suggests to me the issue is missing frame notifications from the backend/driver (COGL_FRAME_EVENT_SYNC/COGL_FRAME_EVENT_COMPLETE). If they are unsupported or absent, clutter_stage_cogl_schedule_update returns early and the clutter master clock would revert to a dumb fallback throttling method (which is the same as KDE's).

This also explains why reverting 383ba566b would seem to help some people. It's a workaround, but not a fix for the real problem.

I guess what we need now is a developer who can reproduce the problem (I can't) to find out why COGL_FRAME_EVENT_SYNC/COGL_FRAME_EVENT_COMPLETE events aren't arriving. Note: Arriving 150ms+ late is counted the same as not arriving, so that's possible too but would be extra strange.
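Below is a deliberately simplified Python model of the decision described in these comments. It is not mutter's code; every name in it (`FrameState`, `next_update_time`, `fallback_interval_s`) is hypothetical. When usable frame events exist, the next update is aligned to the refresh cycle after the last presentation. When they are missing, or arrive 150 ms or more late, which the comment treats the same as missing, scheduling falls back to a fixed timer that knows nothing about the display. The model approximates lateness as the time since the last event.

```python
from dataclasses import dataclass
from typing import Optional

LATE_THRESHOLD_S = 0.150  # per the comment: events 150 ms+ late count as missing


@dataclass
class FrameState:
    refresh_interval_s: float             # e.g. 1/60 for a 60 Hz display
    last_presentation_s: Optional[float]  # time of the last SYNC/COMPLETE event, if any
    fallback_interval_s: float            # hypothetical fixed throttle used without events


def next_update_time(state: FrameState, now_s: float):
    """Hypothetical model: return (time of next update, which path was taken)."""
    last = state.last_presentation_s
    if last is None or now_s - last >= LATE_THRESHOLD_S:
        # No usable frame events: throttle on a plain timer instead.
        return now_s + state.fallback_interval_s, "fallback"
    # Frame events available: aim for the next refresh after the last presentation.
    elapsed = now_s - last
    intervals = int(elapsed // state.refresh_interval_s) + 1
    return last + intervals * state.refresh_interval_s, "presentation"
```

In this model, the difference between the two paths is only where the timing comes from. That matches the suggestion that missing events, rather than the throttling itself, are the thing to investigate.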

Revision history for this message
In , Noxum (noxum) wrote :

Leo, I can confirm that setting __GL_MaxFramesAllowed=1 also mitigates gnome-shell freezes in startup animations. The icon zoom animation would sometimes freeze for some seconds when starting an application from the dash. Just as additional info.

Daniel, I got around to doing some testing on a desktop system, and there the high CPU usage is indeed not noticeable. So maybe this only affects hybrid graphics setups using PRIME Sync?

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Yes, it sounds like a good theory that COGL_FRAME_EVENT_SYNC/COGL_FRAME_EVENT_COMPLETE are missing/invalid in hybrid/PRIME setups. Someone should test that. Maybe me when I get time :)

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

A quicker fix for people might just be to use:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363.patch
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363

That should avoid the offending spinning and high CPU usage. It should also work better than reverting 383ba566bd7c2a76d0856015a66e47caedef06b6, because doing the revert has bad consequences for the responsiveness and throughput of mutter/gnome-shell's main loop.

If 363.patch works for you, that's great. But even that's not the optimal long term fix I would like to figure out still.

Revision history for this message
In , Léo (leeo97one) wrote :

As a PRIME (NVIDIA+Intel) user, I would be happy to help you test but I don't know how to trace the events.

Revision history for this message
In , Noxum (noxum) wrote :

Doing some more testing on the desktop system, this is getting weirder and weirder. My desktop is also affected by this, just not when running glxgears; I bought a Bluetooth dongle to get the circle animation.
gnome-shell cpu usage:
Threaded swap wait enabled - Desktop:
glxgears - normal, no cpu spikes
g-c-c bluetooth animation - 85% cpu
dragging windows - 85% cpu

Threaded swap wait enabled - notebook/PRIME:
glxgears - normal, spiking to 85% cpu
g-c-c bluetooth animation - 85% cpu
dragging windows - 85% cpu

Now setting __GL_MaxFramesAllowed=1 on both systems:
desktop: no change! cpu usage still high (?)
glxgears - normal, no cpu spikes
g-c-c bluetooth animation - 85% cpu
dragging windows - 85% cpu

notebook:
glxgears - normal, 10% cpu, no spikes
g-c-c bluetooth animation - normal, 10% cpu
dragging windows - okayish, 35% cpu

Next I'll check if the mentioned patch changes anything.
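The percentages in these comments are per-process CPU readings for gnome-shell; the thread does not say which tool produced them (the original report used top). For anyone who wants comparable numbers, here is a minimal sketch, assuming Linux and its /proc filesystem: it reads gnome-shell's accumulated user + system CPU ticks from /proc/<pid>/stat twice and turns the difference into a percentage of one core. All helper names (parse_cpu_ticks, sample_cpu and so on) are hypothetical and not part of any GNOME tool.

```python
import os
import time
from typing import List


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime, in clock ticks, from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces, so only
    # split what follows the last ')'. That remainder starts at field 3 (state),
    # which puts utime (field 14) at index 11 and stime (field 15) at index 12.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_delta: int, elapsed_s: float, clk_tck: int) -> float:
    """CPU usage as top shows it per process: 100% means one full core."""
    return 100.0 * (ticks_delta / clk_tck) / elapsed_s


def find_pids(name: str) -> List[int]:
    """PIDs whose /proc/<pid>/comm equals name (e.g. "gnome-shell")."""
    if not os.path.isdir("/proc"):
        return []  # not Linux: nothing to sample
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def read_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def sample_cpu(pid: int, interval_s: float = 5.0) -> float:
    """Average CPU usage of pid over interval_s seconds."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    start_ticks = read_ticks(pid)
    start = time.monotonic()
    time.sleep(interval_s)
    end_ticks = read_ticks(pid)
    return cpu_percent(end_ticks - start_ticks, time.monotonic() - start, clk_tck)


if __name__ == "__main__":
    # Start the animation under test (e.g. the bluetooth panel), then run this.
    for pid in find_pids("gnome-shell"):
        print(f"gnome-shell[{pid}]: {sample_cpu(pid):.1f}% CPU")
```

Run it while the animation under test is on screen. A reading of 100% corresponds to one core fully busy, so the 85% figures above mean gnome-shell is keeping most of a core occupied.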

In , Léo (leeo97one) wrote :

@Maik I think you should also try both with and without PRIME Synchronisation, just in case.

In , Hussam Al-Tayeb (hussam) wrote :

With mr363 and mr281 (minus the first commit of 281), moving the mouse spikes the CPU up to 3% and then settles down. Dragging windows around spikes CPU usage up to 10%, but it immediately goes back to 0% once the dragging stops.

In , Hussam Al-Tayeb (hussam) wrote :

With glxgears, I get spikes up to 3% without moving the mouse and 4% while moving the mouse. This is a massive improvement (mr281 + mr363). My desktop feels very fast!

In , Noxum (noxum) wrote :

PRIME Sync on/off changes:
__GL_MaxFramesAllowed=1 unset:
g-c-c bluetooth animation - 85% -> 30%
dragging windows - 85% -> 40%

__GL_MaxFramesAllowed=1 set:
g-c-c bluetooth animation - 10% -> 20%
dragging windows - 35% -> 45%

Entertaining but not really helping to shed light on this.

In , Noxum (noxum) wrote :

Now with mr363+mr281 applied, the results are mixed:
Desktop:
g-c-c bluetooth animation - 85% -> 9% Good!
dragging windows - 85% -> 75% slightly better

Notebook, PRIME Sync
__GL_MaxFramesAllowed=1 unset:
g-c-c bluetooth animation - 85% -> 85% no change
dragging windows - 85% -> 75% slightly better

__GL_MaxFramesAllowed=1 set:
g-c-c bluetooth animation - 10% -> 13%
dragging windows - 35% -> 30%

Overall, the desktop is snappy with those patches, but the PRIME Sync notebook is still sluggish. With the patches applied and PRIME Sync disabled, mutter now renders unthrottled (animations speed up), so CPU usage is high in general.
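Why unthrottled rendering translates directly into high CPU can be shown with a generic sketch. This is not mutter's code: make_vblank_wait, no_wait and run_frame_loop are hypothetical names, and a fixed 60 Hz sleep stands in for blocking on a real vblank or frame-completion event. The loop counts "frames" and reports its CPU time as a fraction of wall time.

```python
import time
from typing import Callable, Tuple


def make_vblank_wait(refresh_hz: float = 60.0) -> Callable[[], None]:
    """Hypothetical stand-in for blocking until the next vblank/frame event."""
    interval = 1.0 / refresh_hz
    next_vblank = time.monotonic()

    def wait() -> None:
        nonlocal next_vblank
        next_vblank += interval
        delay = next_vblank - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # the thread is idle here, not burning CPU

    return wait


def no_wait() -> None:
    """Models a frame event that never throttles: returns immediately."""


def run_frame_loop(wait_for_frame: Callable[[], None],
                   duration_s: float) -> Tuple[int, float]:
    """Return (frames drawn, CPU time / wall time) for a simple render loop."""
    frames = 0
    cpu_start = time.process_time()
    wall_start = time.monotonic()
    while time.monotonic() - wall_start < duration_s:
        frames += 1       # stand-in for painting one frame
        wait_for_frame()  # the throttling point
    wall = time.monotonic() - wall_start
    return frames, (time.process_time() - cpu_start) / wall


if __name__ == "__main__":
    for label, wait in (("throttled", make_vblank_wait(60.0)),
                        ("unthrottled", no_wait)):
        frames, cpu = run_frame_loop(wait, 0.5)
        print(f"{label}: {frames} frames, {cpu:.0%} of one core")
```

With the vblank wait, the loop draws roughly refresh_hz * duration_s frames and sits mostly idle. With no_wait, it draws as fast as the interpreter allows and keeps a core busy, which is the same pairing of sped-up animations and high CPU described in the comment above.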

In , Noxum (noxum) wrote :

On the desktop system with a single NVIDIA GPU, I can only second Hussam: it's a blast when using mutter-3.32 plus both patches. Snappy, fast, great user experience.
Great work Daniel, thank you.
In the PRIME Sync case, mutter obviously derails somehow, taking different code paths.

In , Daniel van Vugt (vanvugt) wrote :

Maik,

Great to hear. But please don't use window dragging as a CPU test. Even without this bug (e.g. on Intel GPUs), dragging windows is very heavy on the CPU. That's a separate bug (not yet reported upstream?). The fixes required for high CPU usage when dragging windows are:

  * Closure of https://gitlab.gnome.org/GNOME/mutter/issues/283 and
  * Something like https://gitlab.gnome.org/GNOME/mutter/merge_requests/270 but better.

And I don't think we need to worry about __GL_MaxFramesAllowed=1. That's what mr281 does.
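For context on __GL_MaxFramesAllowed: our understanding (an assumption; check the NVIDIA driver README) is that it caps how many frames an application may have queued ahead of the display, and Daniel notes in this thread that mr281 only reduces output latency. The sketch below models such a cap generically with a counting semaphore: a fast producer (standing in for the compositor) submits frames, and a simulated display retires one per refresh. The function peak_queue_depth and its parameters are hypothetical; this is not the driver's or mutter's implementation.

```python
import threading
import time
from collections import deque


def peak_queue_depth(max_frames_allowed: int, frames: int = 20,
                     refresh_hz: float = 240.0) -> int:
    """Largest number of submitted-but-not-yet-displayed frames observed."""
    slots = threading.Semaphore(max_frames_allowed)  # frames-in-flight cap
    pending = deque()
    lock = threading.Lock()

    def display() -> None:
        retired = 0
        while retired < frames:
            time.sleep(1.0 / refresh_hz)  # one refresh interval
            with lock:
                if pending:
                    pending.popleft()
                    retired += 1
                    slots.release()  # a slot frees up once the frame is shown

    scanout = threading.Thread(target=display)
    scanout.start()
    peak = 0
    for frame in range(frames):
        slots.acquire()  # blocks once max_frames_allowed frames are pending
        with lock:
            pending.append(frame)
            peak = max(peak, len(pending))
    scanout.join()
    return peak


if __name__ == "__main__":
    for n in (1, 2, 3):
        # A new frame waits behind every frame already queued, so a deeper
        # queue means more output latency.
        print(f"max frames allowed {n}: peak queue depth {peak_queue_depth(n)}")
```

With a cap of 1, the producer can never get more than one frame ahead of the display, which is the latency benefit attributed to mr281 here.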

---

Hussam,

I'm very glad you find those patches work, and I would like mutter to get both. But for this bug I think it's worth pointing out that you only need mr363. Once you have that, this bug is fixed, and adding mr281 only reduces output latency (which is nice, but not relevant here).

In , Daniel van Vugt (vanvugt) wrote :

I almost forgot, dragging windows is extra expensive with the Nvidia driver:

  https://launchpad.net/bugs/1799679

So everyone please don't use window dragging as a CPU test for this bug. It has its own causes not related to this bug.

In , Daniel van Vugt (vanvugt) wrote :

Random thought: This might be the real root cause:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216.patch

Can anyone experiencing this bug please try that fix alone?

In , Noxum (noxum) wrote :

Just to set the record straight:
- mr363 fixes this bug on single GPU systems
- mr363 does not have any effect on hybrid GPU systems using PRIME sync
- __GL_MaxFramesAllowed=1 is a workaround on PRIME systems currently.

So all testing is now done on PRIME sync with __GL_MaxFramesAllowed unset.
I'm sorry, applying mr216 does not have any effect on my PRIME Sync system.
g-c-c bluetooth animation - 85% -> 85% no change.

In , Noxum (noxum) wrote :

Of course, this is now mutter-3.32.0+mr216+mr363+mr281.

In , Noxum (noxum) wrote :

Daniel, re-reading your comment, I think I might have misunderstood you. So I have now retested the following setup:
Single GPU, mutter-3.32+mr216 only (mr363+mr281 left out)
I can confirm that this setup also fixes this specific bug (high CPU while constantly rendering), but only this. (mr363 also fixes high CPU on mouse movement.)

In , Daniel van Vugt (vanvugt) wrote :

Thanks, that is very encouraging, because mr216 would explain the root cause here and why the bug randomly affects some Nvidia systems sometimes and not others.

But I would also be curious to hear what Hussam thinks about testing just

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216.patch

alone.

In , Daniel van Vugt (vanvugt) wrote :

Maybe I need to provide a backport for older mutter releases?

In , Daniel van Vugt (vanvugt) wrote :

My mistake. If you ever do experience the problem fixed by:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216

then the symptom for that is a frozen screen. Not this bug.

So the real fix for this bug is still:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363.patch
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363

Daniel van Vugt (vanvugt) wrote :

I think there's a reasonable chance this bug is the same as:
https://bugzilla.gnome.org/show_bug.cgi?id=781835

affects: gnome-shell → mutter
Changed in mutter (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in gnome-shell (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in mutter (Ubuntu):
status: New → In Progress
Changed in gnome-shell (Ubuntu):
status: New → In Progress
Changed in mutter:
importance: Unknown → Medium
status: Unknown → Confirmed
In , Daniel van Vugt (vanvugt) wrote :

And an extra, final nail in the coffin for this bug:

https://gitlab.gnome.org/GNOME/mutter/merge_requests/602

You only need !363 to solve this bug, though. !602 clarifies things further by showing that, after !363, it is now safe to revert commit 383ba566bd7c2a76d0856015a66e47caedef06b6 and more.

In , Noxum (noxum) wrote :

Thanks again, Daniel.
Unfortunately, I'm unable to apply MR602. I tried mutter 3.32.2+MR363, mutter 3.33.2+(rebased)MR363 and mutter-HEAD+(rebased)MR363.
Is there a prerequisite I'm missing?

In , Noxum (noxum) wrote :

Never mind, I didn't realize that MR602 includes MR363.

In , Noxum (noxum) wrote :

I have now tested on my Optimus system using PRIME Sync, with __GL_MaxFramesAllowed unset, using mutter/gnome-shell-3.33.2+MR602.
The good news: while MR363 previously had no effect on CPU usage during constant rendering on this system, MR602 now fixes it. CPU usage with the bluetooth-pane circle animation is ~12%.
The bad news, unrelated to this bug, is that without setting __GL_MaxFramesAllowed=1 on PRIME Sync, mutter stutters and intermittently freezes. On top of that, it is still laggy with or without that setting.

I have not tested this on the single-GPU desktop system so far, since GNOME 3.33 is quite a mess right now, with lots of API breakage and increased dependencies. Is it possible to rebase MR602 onto GNOME 3.32.x?

In , Noxum (noxum) wrote :

I tested mutter 3.32.2+MR281+520+576(minus one hunk)+602 on the single-GPU system, and it runs fine. No noticeable issues so far. Due to the removal of threaded wait, moving windows is also back to normal/expected CPU usage now.

For the remaining issues with PRIME sync where these patches don't have any effect, I updated the existing issue at:
https://gitlab.gnome.org/GNOME/gnome-shell/issues/1202

In , Marco Trevisan (Treviño) (3v1n0) wrote :
Changed in mutter:
status: Confirmed → Fix Released
tags: added: fixed-in-3.33.3 fixed-upstream
Daniel van Vugt (vanvugt) wrote :

Thank you for reporting this bug to Ubuntu.
Ubuntu 18.10 (cosmic) reached end-of-life on July 18, 2019.

See this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We appreciate that this bug may be old and you might not be interested in discussing it any more. But if you are then please upgrade to the latest Ubuntu version and re-test. If you then find the bug is still present in the newer Ubuntu version, please add a comment here telling us which new version it is in and change the bug status to Confirmed.

Changed in gnome-shell (Ubuntu):
status: In Progress → Won't Fix
Changed in mutter (Ubuntu):
status: In Progress → Won't Fix
Daniel van Vugt (vanvugt) wrote :

Although the linked upstream fix is coming in GNOME 3.34 for Ubuntu 19.10, I have still marked this bug as 'Won't Fix'. That's because I am not completely confident this is the right downstream bug for what was fixed upstream.

no longer affects: mutter
tags: removed: fixed-in-3.33.3 fixed-upstream