[nvidia] gnome-shell eats 100% cpu when seconds display is on and screen is locked

Bug #1814125 reported by laulau
This bug affects 3 people
Affects               Status     Importance  Assigned to       Milestone
gnome-shell (Ubuntu)  Won't Fix  Undecided   Daniel van Vugt
mutter (Ubuntu)       Won't Fix  Undecided   Daniel van Vugt

Bug Description

On my laptop (Xiaomi Notebook Pro), Ubuntu 18.10 amd64, standard Ubuntu desktop (gnome-shell), no applications running.
Just enable the seconds display in GNOME Tweaks and let the screen lock itself: the fan spins up fast, and simply moving the mouse makes the fan slow down again.
I wrote a simple script that appends a top snapshot to a log file every minute; it shows that gnome-shell consumes 100% CPU while the screen is locked.
Workaround: disabling the seconds display, and there is no more CPU usage / fan noise when locked.
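
For anyone reproducing this, a minimal sketch of the setup and of the kind of logging loop described above (the gsettings key is the command-line equivalent of the GNOME Tweaks switch; the file name, snapshot size and interval are illustrative, not the reporter's exact script):

  # enable the clock seconds display (same as the GNOME Tweaks toggle)
  gsettings set org.gnome.desktop.interface clock-show-seconds true

  # append a top snapshot to a log file once a minute
  while true; do
    date >> ~/cpu.log
    top -b -n 1 | head -20 >> ~/cpu.log
    sleep 60
  done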

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: xorg 1:7.7+19ubuntu8
ProcVersionSignature: Ubuntu 4.18.0-13.14-generic 4.18.17
Uname: Linux 4.18.0-13-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 415.27 Thu Dec 20 17:25:03 CST 2018
 GCC version: gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)
ApportVersion: 2.20.10-0ubuntu13.1
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Thu Jan 31 16:19:16 2019
DistUpgraded: 2018-12-30 21:42:17,971 DEBUG Running PostInstallScript: './xorg_fix_proprietary.py'
DistroCodename: cosmic
DistroVariant: ubuntu
DkmsStatus:
 nvidia, 415.27, 4.18.0-13-generic, x86_64: installed
 virtualbox, 5.2.18, 4.18.0-13-generic, x86_64: installed
ExtraDebuggingInterest: Yes, if not too technical
GraphicsCard:
 Intel Corporation UHD Graphics 620 [8086:5917] (rev 07) (prog-if 00 [VGA controller])
   Subsystem: Xiaomi UHD Graphics 620 [1d72:1701]
   Subsystem: Xiaomi Mi Notebook Pro [GeForce MX150] [1d72:1701]
InstallationDate: Installed on 2018-06-08 (236 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
MachineType: Timi TM1701
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-13-generic root=UUID=174e8747-a737-411b-bfaf-4e6795426fea ro quiet splash nouveau.runpm=0 vt.handoff=1
SourcePackage: xorg
Symptom: display
UpgradeStatus: Upgraded to cosmic on 2018-12-30 (31 days ago)
dmi.bios.date: 10/13/2017
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: XMAKB5R0P0502
dmi.board.asset.tag: Any
dmi.board.name: TM1701
dmi.board.vendor: Timi
dmi.board.version: MP
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Timi
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnINSYDECorp.:bvrXMAKB5R0P0502:bd10/13/2017:svnTimi:pnTM1701:pvr:rvnTimi:rnTM1701:rvrMP:cvnTimi:ct10:cvrChassisVersion:
dmi.product.family: Timibook
dmi.product.name: TM1701
dmi.sys.vendor: Timi
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.95-1
version.libgl1-mesa-dri: libgl1-mesa-dri 18.2.2-0ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx 18.2.2-0ubuntu1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.1-3ubuntu2.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:18.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20171229-1ubuntu1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.15-3

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

As per bug 779039#c14, this commit https://git.gnome.org/browse/mutter/commit/?id=383ba566bd7c2a76d0856015a66e47caedef06b6 makes gnome-shell constantly use ~80% CPU of one core out of 4 (I am running a Skylake Core i5) when running something like glxgears.

Reverting the patch brings down cpu usage to around 3% when running glxgears.
In addition, this makes my cpu temperature increase by ~5 degrees.

GPU temperature and usage don't seem to be affected. Only CPU usage is.
I am using an NVIDIA Kepler card.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

I forgot to mention that I had asked a nvidia developer about this and he said it was due to the overhead of many xlib calls.

Revision history for this message
In , apemax (apemax) wrote :

I can also confirm this issue: gnome-shell hits around 80% CPU usage on one core when running glxgears. It also causes stuttering in all games I have tried. Reverting the patch mentioned by Hussam brings the CPU usage back down to normal levels.

Revision history for this message
In , Alec (susie-cumming) wrote :

I have also encountered this issue on the Nvidia proprietary driver. Reverting the patch fixes the problem.

Revision history for this message
In , DeadMetaler (dead-666) wrote :

I made a screenshot of CPU usage before and after applying
Call-cogl_xlib_renderer_set_threaded_swap_wait_enabled.patch
http://storage4.static.itmages.ru/i/17/0411/h_1491948046_9928347_1598f98b62.png

I also noticed that this patch improved responsiveness when moving windows. But now the GNOME Shell animations lag even more: without the load on the CPU, the drop in FPS became visible. This was all tested on a GTX 650 with the proprietary NVIDIA driver.

Revision history for this message
In , Alec (susie-cumming) wrote :

(In reply to Maxim from comment #4)
> I make a screenshot of cpu usage before and after applying
> Call-coglxlibrenderersetthreadedswapwaitenabled.patch
> http://storage4.static.itmages.ru/i/17/0411/h_1491948046_9928347_1598f98b62.
> png
>
>
> I also noticed that this patch have improved responsiveness when moving
> windows. But now, the animations of the Gnome Shell to lag even more. Now
> without the load on the CPU, the decrease in FPS became visible. This is all
> tested on a proprietary Nvidia + 650GTX.

I have also noticed this behaviour

Revision history for this message
In , Promolecule (promolecule) wrote :

I am seriously wondering why no devs have noticed this yet. This bug seems to affect literally everyone who uses an NVIDIA gpu.

Revision history for this message
In , Fau-l (fau-l) wrote :

I've been experiencing sluggish animations in gnome-shell after updating to v3.24.1 on ArchLinux on my laptop with dedicated Nvidia GPU (NVS 5100M); no integrated GPU is present.

The previous Gnome version did not show this behaviour.

I reverted the commit:
https://git.gnome.org/browse/mutter/commit?id=383ba566bd7c2a76d0856015a66e47caedef06b6
as mentioned by somebody on the Arch Forums and this fixed the issue for me.

Revision history for this message
In , Fau-l (fau-l) wrote :

(In reply to Frank from comment #7)
> I've been experiencing sluggish animations in gnome-shell after updating to
> v3.24.1 on ArchLinux on my laptop with dedicated Nvidia GPU (NVS 5100M); no
> integrated GPU is present.
>
> The previous Gnome version did not show this behaviour.
>
> I reverted the commit:
> https://git.gnome.org/browse/mutter/
> commit?id=383ba566bd7c2a76d0856015a66e47caedef06b6
> as mentioned by somebody on the Arch Forums and this fixed the issue for me.

Sorry I forgot to mention that I use the closed source Nvidia driver 340.xx and run Gnome on xorg.

Revision history for this message
In , Léo (leeo97one) wrote :

Same for me, this is a really annoying issue.

Revision history for this message
In , Promolecule (promolecule) wrote :

I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and the latest mutter release the high CPU usage is finally gone. The animations are also back to normal, with very smooth animations and window movements. I am not sure whether it's the NVIDIA release or mutter that included a fix, but I would assume that NVIDIA has somehow fixed the locking issue.

Revision history for this message
In , Fau-l (fau-l) wrote :

(In reply to Peet from comment #10)
> I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> latest mutter release the high cpu usage is finally gone. Also the
> animations are back to normal as in having very smooth animations and window
> movements, whatever. I am not sure if it's the NVIDIA release or mutter
> which has included a fix but I would assume that NVIDIA has somehow fixed
> the locking issue.

The update to mutter-3.24.2 (ArchLinux) has not resolved the issue for me.

Revision history for this message
In , Promolecule (promolecule) wrote :

(In reply to Frank from comment #11)
> (In reply to Peet from comment #10)
> > I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> > latest mutter release the high cpu usage is finally gone. Also the
> > animations are back to normal as in having very smooth animations and window
> > movements, whatever. I am not sure if it's the NVIDIA release or mutter
> > which has included a fix but I would assume that NVIDIA has somehow fixed
> > the locking issue.
>
> The update to mutter-3.24.2 (ArchLinux) has not resolved the issue for me.

Since this is NVIDIA related, have you tried upgrading to the latest NVIDIA release? I am currently using 381.22.

Revision history for this message
In , Fau-l (fau-l) wrote :

I use the nvidia-340.102 legacy driver, which is up to date.

Revision history for this message
In , Alec (susie-cumming) wrote :

(In reply to Peet from comment #10)
> I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> latest mutter release the high cpu usage is finally gone. Also the
> animations are back to normal as in having very smooth animations and window
> movements, whatever. I am not sure if it's the NVIDIA release or mutter
> which has included a fix but I would assume that NVIDIA has somehow fixed
> the locking issue.

The Mutter 3.24.2 and Nvidia 381.22 updates have not fixed the issue for me

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

The nvidia changelog says:
"Disabled OpenGL threaded optimizations by default, initially enabled in 378.09, due to various reports of instability."
Could this be related?

Revision history for this message
In , Black-inc (black-inc) wrote :

(In reply to alecdc272 from comment #14)
> (In reply to Peet from comment #10)
> > I am on ArchLinux and it seems that with the new NVIDIA release (381.22) and
> > latest mutter release the high cpu usage is finally gone. Also the
> > animations are back to normal as in having very smooth animations and window
> > movements, whatever. I am not sure if it's the NVIDIA release or mutter
> > which has included a fix but I would assume that NVIDIA has somehow fixed
> > the locking issue.
>
> The Mutter 3.24.2 and Nvidia 381.22 updates have not fixed the issue for me

Can confirm; same setup, did not fix the issue for me either.

Revision history for this message
In , Promolecule (promolecule) wrote :

This was my recent upgrade:

[2017-05-12 00:46] [ALPM] upgraded dialog (1:1.3_20170131-1 -> 1:1.3_20170509-1)
[2017-05-12 00:46] [ALPM] upgraded nvidia-utils (378.13-6 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded libproxy (0.4.13-2 -> 0.4.15-1)
[2017-05-12 00:46] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded gnome-shell (3.24.1+2+g45c2627d4-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded gnome-shell-extensions (3.24.1+1+gfbf3cf3-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded konsole (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded lib32-nvidia-utils (378.13-3 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded v4l-utils (1.12.3-1 -> 1.12.5-1)
[2017-05-12 00:46] [ALPM] upgraded lib32-v4l-utils (1.12.3-1 -> 1.12.5-1)
[2017-05-12 00:46] [ALPM] upgraded libcdio-paranoia (10.2+0.94+1-1 -> 10.2+0.94+1-2)
[2017-05-12 00:46] [ALPM] upgraded libkexiv2 (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded libkomparediff2 (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded libreoffice-fresh (5.3.2-3 -> 5.3.3-1)
[2017-05-12 00:46] [ALPM] upgraded nvidia-dkms (378.13-6 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded okteta (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded okular (17.04.0-1 -> 17.04.1-1)
[2017-05-12 00:46] [ALPM] upgraded openvpn (2.4.1-2 -> 2.4.2-1)
[2017-05-12 00:46] [ALPM] upgraded qt4 (4.8.7-19 -> 4.8.7-20)
[2017-05-12 00:46] [ALPM] upgraded sudo (1.8.19.p2-1 -> 1.8.20-1)
[2017-05-12 00:46] [ALPM] upgraded wildmidi (0.4.0-1 -> 0.4.1-1)

As I said before, only a few packages seem relevant to me, for example:

[2017-05-12 00:46] [ALPM] upgraded nvidia-utils (378.13-6 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded gnome-shell (3.24.1+2+g45c2627d4-1 -> 3.24.2-1)
[2017-05-12 00:46] [ALPM] upgraded lib32-nvidia-utils (378.13-3 -> 381.22-1)
[2017-05-12 00:46] [ALPM] upgraded nvidia-dkms (378.13-6 -> 381.22-1)

Revision history for this message
In , Léo (leeo97one) wrote :

Not resolved for me either.

[2017-05-12 17:32] [ALPM] upgraded mutter (3.24.1+1+geb394f19d-1 -> 3.24.2-1)
[2017-05-12 17:32] [ALPM] upgraded gnome-shell (3.24.1+2+g45c2627d4-1 -> 3.24.2-1)
[2017-05-12 17:32] [ALPM] upgraded nvidia (378.13-6 -> 381.22-1)

Revision history for this message
In , Promolecule (promolecule) wrote :

I am not sure if this can make a difference, but @Léo could you try using nvidia-dkms instead of the nvidia package?

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

Hi, I'm a Manjaro user. I was redirected here from the Manjaro Forum and I think I have the issue explained here; if that's true, I have something to add.

When I was using GNOME 3.22 the environment worked perfectly, but since the update to GNOME 3.24 I have had stability problems with the environment, and I think Mutter is to blame.

I have two computers. One of them is my main computer, a desktop equipped with an ASUS P5K motherboard, an NVIDIA GTX 1050 as GPU and an Intel Core 2 Quad Q8300 as CPU. I use the proprietary blob driver version 375.66-1 from the Manjaro repos, Linux 4.9 and Intel microcode.

The second computer I have is an old Toshiba Satellite Pro P200 laptop, with an Intel Core 2 Duo T7300 and an ATI Mobility Radeon HD 2600 as GPU, identified as AMD RV630 by the free driver stack (Linux 4.9 LTS and Mesa 17.0.5). I use Intel microcode here too.

Well, the problem I have is that the environment sometimes crashes randomly. It recovers perfectly after crashing, but when the bug occurs it is very annoying.

I couldn't catch any log or message to identify the exact error; the only thing I'm sure of is that the bug always occurs when I have a number of windows spread across several virtual desktops.

Yes, the bug occurs on Mesa too, if it's the same problem I'm responding to.

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

I forgot to say that Wayland looks much more stable than Xorg, but the bug happens on both servers.

Revision history for this message
In , Daniel Boles (dboles) wrote :

(In reply to Eduardo Medina from comment #20)
> Well, the problem I have is sometimes the environment crashes randomly. It
> recovers perfect after crashing, but when the bug occurs is very annoying.
>
> I couldn't catch any log or message to know what is the exact error, the
> only thing I have clear is the bug runs always when I have an amount of
> windows spreaded through some virtual desktops.
>
> Yes, the bug occurs on Mesa too if it's the same problem I'm responding.

Is your bug _only_ crashing? If so, it's quite possibly not the same or related. As in the title, this report is about high CPU usage, and none of the other reporters have mentioned any crashing following that. Do you log high CPU usage before these crashes occur?

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

Created attachment 352150
CPU log from Eduardo Medina's old Toshiba laptop

I ran this command:
$ while true; do ps -eo pcpu,pid,user,args | sort -k 1 -r | head -50 >> logfileToshiba.txt; echo "\n" >> logfileToshiba.txt; sleep 1; done

It seems the problem is systemd-coredump; it appears with high CPU usage at roughly the same times that GNOME Shell crashed. You can see it in the final part of the big file inside the ZIP.

Well, it seems I have to report this to Manjaro or systemd.

Sorry for the inconvenience.

Revision history for this message
In , Florian-muellner (florian-muellner) wrote :

That's definitely a different issue. This bug is about high CPU usage caused by mutter/gnome-shell on Nvidia GPUs.

A crash is of course a bug too, but it's not a matter of CPU usage (you could say that the CPU usage of a crashed process is too low, namely 0% ...)

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

I see, I'm collecting all the data I can to file another bug.

From coredumpctl and journalctl I see something related to libc.

Revision history for this message
In , Daniel Boles (dboles) wrote :

(In reply to Eduardo Medina from comment #23)
> It seems the problem is systemd-coredump, it appears with a high CPU usage
> more or less in the same times than GNOME Shell crashed.

This doesn't mean systemd-coredump is the culprit - quite the opposite. Its using significant CPU coincident with a crash is to be expected, because it needs to dump the core image of the crashed process for debugging purposes.

Revision history for this message
In , Eduardo Medina (no-more-hopes) wrote :

Well, I think I'm following now the correct bug: https://bugzilla.gnome.org/show_bug.cgi?id=781799

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

Maybe a9f139cab66b532e83fca31d35f01b1b5650ca24 to 383ba566bd7c2a76d0856015a66e47caedef06b6 can be reverted? They are supposed to help in NVIDIA's situation but ended up causing more issues. And it was verified that reverting helps.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

Unless anyone has an objection, I am going to email NVIDIA with a link to this bug report (as there is obviously a bug in the driver causing this).

More notes: with 381.21, the high CPU issue is reduced, since they turned off GL threaded optimizations by default. But if you switch TTYs or hibernate/resume, gnome-shell again uses high CPU while constantly rendering, until I alt-F2 > r, and then it goes quiet again.

Revision history for this message
In , Promolecule (promolecule) wrote :

@Hussam Al-Tayeb Did you get any response from NVIDIA?

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

(In reply to Peet from comment #30)
> @Hussam Al-Tayeb Did you get any response from NVIDIA?

I emailed them on the 28th at <email address hidden> with an nvidia-bug-report.log.gz file and a full description of the bug and easy reproduction steps. They never replied.

They replied last year when I reported that a 13-year-old Linux computer game was not running well, and they even quickly fixed the bug. I thought this was worth a shot, but it seems they only care about computer games.

Revision history for this message
In , Promolecule (promolecule) wrote :

Well, yeah, my next GPU is definitely not going to be an NVIDIA card (GTX 1060 6GB). The Linux drivers are buggy as hell. But you can't really compare the performance to Intel HD Graphics or the recent AMD GPUs, so I guess I am stuck here.

Revision history for this message
In , Xbrunini (xbrunini) wrote :

Confirmed here too on two different computers with GNOME 3.24.3 on Fedora 26 / Linux 4.11 + NVIDIA 381.22 (GTX 560 and GTX 760). I also tried NVIDIA 375.66 on the same computers and see the same lag after updating to GNOME 3.24.

However, with integrated Intel graphics and Wayland on a MacBook everything works better than ever.

So since NVIDIA 381.22 and 375.66 work normally on GNOME 3.22, I think this bug lies more in GNOME 3.24 than in the NVIDIA proprietary driver.

Revision history for this message
In , Léo (leeo97one) wrote :

So, nobody in the GNOME community is able to do something about such an obvious performance issue?

Revision history for this message
In , Xbrunini (xbrunini) wrote :

Maybe everybody is using nouveau instead of NVIDIA. In my case I downgraded every PC with an NVIDIA card to GNOME 3.22. Someday perhaps NVIDIA will do a better job, instead of forcing the community to implement EGLStreams instead of GBM for the Wayland scenario. Nevertheless, this bug is related to Xorg use only.

Revision history for this message
In , F-isaac-0 (f-isaac-0) wrote :

*** Bug 785609 has been marked as a duplicate of this bug. ***

Revision history for this message
In , nicman23 (n-fit-8) wrote :

This also happens on Intel hardware, tested with a Sandy Bridge and a Bay Trail series laptop.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

In theory, commit 383ba566bd7c2 is correct. It will definitely reduce CPU usage because it's stopping things from running that are meant to be running. The real question is what does gnome-shell need to be running in idlers that's using so much CPU? That's the heart of the problem that needs more work.

Scrolling up, it appears this is mostly with the NVIDIA driver. And I can confirm from previous experience that yes the NVIDIA driver is very CPU intensive compared to Intel. It doesn't just use your GPU but also uses as much CPU as it can get.

But that's not an excuse for this bug... I know gnome-shell uses unreasonably high CPU [1] on Intel graphics too, and hope to find time to investigate it in detail during October. But even better would be if someone else could look sooner.

[1] https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/1696305

Revision history for this message
In , Jeckhackreg (jeckhackreg) wrote :

Confirming the bug, but I see slightly strange behaviour.
When gnome-shell is idle, I see an FPS drop in applications, i.e. Chromium, or games.
Symptoms: launch Chromium; scrolling a page is smooth. Wait 15 seconds and scrolling drops to 30 fps and doesn't go back up until I do something in the shell, i.e. start a new app or switch a few windows. The same happens in games like Terraria, Prison Architect, etc.
This almost looks like the CPU or GPU drops its frequency on idle and doesn't restore it.
In fact, for over a month I thought this was exactly the case with the NVIDIA blob.
But then I reverted that commit (383ba566bd7c2a76d0856015a66e47caedef06b6) and the problem was gone.
The only downside of this revert is that moving windows around the screen becomes not so smooth, like 30 fps sometimes, but at least my programs aren't affected anymore.

Arch Linux, i3 4130, GTX 960 with the proprietary NVIDIA driver.
P.S. I didn't have the CPU usage problem, only what I wrote above.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Frequency scaling is definitely something that exists, and can hurt frame rates. But I still suspect that reverting commit 383ba566 is the wrong answer.

It's possible the clutter frame clock is the problem. If it was to try and measure frame intervals based on past frames rather than calculating it from the monitor's expected refresh rate then you could get some unfortunate feedback resulting in artificially low frame rates. Just a theory (haven't looked at that code yet), but worth checking.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Someone should also check that glXWaitVideoSync() is being used correctly (introduced via commit 383ba566) ...

https://www.khronos.org/registry/OpenGL/extensions/SGI/GLX_SGI_video_sync.txt

Revision history for this message
In , Jeckhackreg (jeckhackreg) wrote :

Very interesting observation, please test and confirm:

1) Launch nvidia-settings and watch the PowerMizer frequencies while doing something in parallel, e.g. scrolling a Chromium page. When you start doing that, NVIDIA rightfully raises its frequency to max, but Chromium performance is still low, as is overall shell performance.

2) Set PowerMizer to "Prefer Maximum Performance", then try to scroll in Chromium and move some windows in the shell, and try different animations. Performance is absolutely great, and every animation in gnome-shell is smooth as well.

---
Could it be that mutter somehow conflicts with NVIDIA's PowerMizer setting (the setting, not the real frequency)? Maybe you were right about clutter's frame clock?

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

I experienced similar problems way back with Intel graphics actually. Rendering was only smooth while the CPU was being stressed. The solution in that case was to change the way Mir scheduled frames. This is what I was hinting at in comment #40.

My similar experience is documented here:

https://bugs.launchpad.net/mir/+bug/1388490

And it sounds like the problem with Nvidia seems to be related to the use of glXWaitVideoSync (introduced with commit 383ba566 ?):

https://www.khronos.org/registry/OpenGL/extensions/SGI/GLX_SGI_video_sync.txt

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

However, we are now off-topic. This bug is about CPU usage being too high.

Revision history for this message
In , Jeckhackreg (jeckhackreg) wrote :

Ok, I found it. Tested with and without the 383ba566 commit:

With the commit: nvidia drops frequencies and Chromium drops to 30 fps, and the shell starts to lag. When nvidia raises frequencies under load, it's 60 fps again. You can trigger load by switching/opening new windows. You can monitor the nvidia frequency in parallel.

Without the commit: nvidia drops frequencies, but Chromium is still smooth. gnome-shell fps drops to 30; you can clearly see it when moving the Chromium window around the screen.

This is tested with and without HW acceleration in chromium.

So, you are right.

>> However, we are now off-topic. This bug is about CPU usage being too high.

I see it now, thanks.

Revision history for this message
In , Jeckhackreg (jeckhackreg) wrote :
Revision history for this message
In , Yu Feng (rainwoodman) wrote :

Hi, did we get to the bottom of this? I hope the filing of another bug (NV only?) doesn't draw attention away from this issue.

I am using an Intel video card. I see high CPU usage from gnome-shell when hovering the mouse over the desktop background, or if I run glxgears. Both are close to 25%.

gnome-shell uses a few percent to zero if I am hovering the mouse over gnome-terminal or firefox.

Revision history for this message
In , Florian-muellner (florian-muellner) wrote :

(In reply to rainwoodman from comment #47)
> I am using an intel video card.

This issue is about the fallback code for drivers that don't support the INTEL_swap_event extension. The intel driver is clearly not one of them, so if you are seeing unusually high CPU usage, then that's a different issue unrelated to this bug.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

Adding
__GL_YIELD="USLEEP"
__GL_THREADED_OPTIMIZATIONS=0
to /etc/environment and then restarting stopped the CPU spikes. Using usleep supposedly indicates a client bug, according to the nvidia forums, but right now I have low CPU usage from gnome-shell no matter what I have open. Perhaps it can help someone else.
KDE's wiki also suggests using usleep.
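
A quick way (illustrative, not part of the original comment) to confirm that the variables actually reached the running gnome-shell process after logging back in:

  # dump gnome-shell's environment and look for the NVIDIA variables
  tr '\0' '\n' < /proc/$(pidof -s gnome-shell)/environ | grep __GL_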

Revision history for this message
In , Alvin (alvin) wrote :

commenting to follow the issue -- seeing `gnome-shell` taking `106+%` CPU pretty consistently

AMD graphics card

3.24 didn't have any issues like this

Revision history for this message
In , Adam Kosseck (tyderian) wrote :

Upgrading from Ubuntu 17.04 to 17.10 (running under Virtualbox) I also get this issue.

gnome-shell was using max CPU and the Wayland session was unusable. Switching to command-line console was functional, but any UI tasks were impossible.

I tried adding to /etc/environment:
__GL_YIELD="USLEEP"
__GL_THREADED_OPTIMIZATIONS=0

But this did not help.

Turning off the second monitor in the VM settings and enabling "3D acceleration" in the VM settings fixed it for me.

Note: Another 17.10 VM installed from scratch did not have this issue, despite having 3D acceleration disabled.

Revision history for this message
In , Florian-muellner (florian-muellner) wrote :

*** Bug 792643 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Daniel Boles (dboles) wrote :

*** Bug 789186 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Léo (leeo97one) wrote :

Is there seriously no way to fix it after a year? This is becoming a bit annoying. Or is it NVIDIA's fault? The only real workaround currently is to revert the commit...

Revision history for this message
In , Jeckhackreg (jeckhackreg) wrote :

https://bugzilla.gnome.org/show_bug.cgi?id=789186

Please check and comment there if you have the same problem.
Thanks in advance.

Revision history for this message
In , Kiren Pillay (kirenpillay1) wrote :

I worked out that if I first log into my windows partition, then do a warm boot into my Linux partition, the problem goes away.

I'm guessing there's some kind of initialization of the graphics card that Windows does that the Linux OSS driver doesn't do.

PS: I'm using Fedora, but the same should apply to Ubuntu.

Revision history for this message
In , Noxum (noxum) wrote :

Since this is still an issue, I looked into this and found out that this is reinventing the wheel. The same approach has already been taken by Mozilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=1197954
Most interesting to learn from that is:
>> what happens if we use the main thread X display?
>>If we use a separate X11 display, which physical monitor's vsync does this listen to?
>
>If we use the main X display, depending on the GLX implementation we can either:
>
>a) crash due to threading issues (mesa)
>b) block a lot (NVIDIA)
>
>Neither are preferable. An X11 display doesn't really correspond to a monitor, so nothing effectively changes with regards to what gets synced.
>
>In this case we synchronize with the default CRTC (at least on NVIDIA).

In the current implementation I can't see this using a separate display. Has this been taken into account?

Revision history for this message
In , RussianNeuroMancer (russianneuromancer) wrote :

As I understand it, this bug should be migrated to GNOME's GitLab, otherwise nobody will fix it. Is that correct?

Revision history for this message
In , Jonas Ådahl (jadahl) wrote :

All open bugs (e.g. this one) will eventually be migrated to gitlab.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

I am now testing nvidia-390 and gnome-shell/mutter 3.30 and reverting commit 383ba566bd7c2a76d0856015a66e47caedef06b6 does not seem to make any difference at all. There is high CPU usage still, so that seems to be caused by something else.

Is anyone able to find that reverting the infamous commit actually makes a difference with the current mutter code and more recent Nvidia driver? If not then this bug should be closed.

I feel this bug has become a general discussion about Nvidia performance when the Description at the top intended for it to be about that specific commit.

Revision history for this message
In , Noxum (noxum) wrote :

I just checked and while the situation seems to have somewhat improved, this is still far away from being resolved.
- GK208M using PRIME output
- Nvidia driver 410
- gnome-shell-3.30.1
- mutter-3.30.1

Previously, just wiggling the mouse raised gnome-shell CPU usage to 30-40%; this effect has vanished: no noticeable difference, about 4% CPU usage with or without threaded-swap-wait.
Moving a window (gnome-terminal) slowly around still puts gnome-shell at >80% CPU usage, as opposed to 30-40% without the threaded swap wait.
While people running a desktop might find this acceptable, it represents a critical regression for me since it affects battery life on a notebook.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

This bug should not be about general Nvidia performance when the original reporter was citing a specific commit.

I am analysing the performance of Nvidia right now, and certainly it is not good. But if we're no longer talking about commit 383ba566bd7c2a76d0856015a66e47caedef06b6 then this bug should probably be closed and replaced with new more current bug reports.

Revision history for this message
In , Noxum (noxum) wrote :

I don't understand what you're trying to tell me. This bug report is about high cpu usage when threaded swap wait is enabled, which is done by this specific commit.
And I just confirmed that this is still the case.

Revision history for this message
In , Noxum (noxum) wrote :

Or maybe you misunderstood me:
Yes, reverting that commit still makes a difference in cpu usage.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Yes, sorry. I misunderstood the statement about "no noticeable difference, about 4% cpu usage with or without threaded-swap-wait.", and then failed to notice your second mention of it with respect to window movement.

I did not include window movement in my testing because the original reporter only mentioned glxgears. I will now retest with window movement, which is also mentioned as a major problem for Nvidia users in https://gitlab.gnome.org/GNOME/mutter/merge_requests/168.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Nope. Reverting commit 383ba566bd7c2a76d0856015a66e47caedef06b6 makes no difference to window-moving performance for me (nvidia-390 with mutter/gnome-shell 3.30.2 master branches). So I can't endorse removing it myself.

Someone who does benefit from the change should propose a merge request here:
  https://gitlab.gnome.org/GNOME/mutter
and include performance stats of before and after.

Revision history for this message
In , Mateusz Mikuła (mati865) wrote :

With mutter versions 3.26-3.28, opening the windows preview (with the Windows key) for the first time after logging in, or after not using it for a long time, was really choppy on X11.
Reverting this commit or running Wayland made it smooth.

Somebody also said (I think it was on Reddit but I cannot find it) that rebooting from Windows makes this issue disappear, but it reappears after a full shutdown.

Revision history for this message
In , Noxum (noxum) wrote :

Daniel, what kind of system (desktop / Optimus notebook) and GPU (Fermi/Kepler/Maxwell/Pascal/Volta/Turing) were you using for testing? Based on that, I would do more tests to see whether this is system-specific.

While you're here, please allow me to ask a question about the specifics of this implementation. The current flow of gl commands on buffer swap currently is:

glFinish
start new thread --------------->glxWaitVideoSync
glXSwapBuffers
return
dostuffordont
glFinish

The impression I got from researching glXSwapInterval and glXSwapBuffers is that the NVIDIA implementation behaves differently from other ones.
I learned (correct me if I'm wrong) that with NVIDIA GLX, glXSwapBuffers is just put on the pipeline and only executed when glFinish is called. Since glFinish is blocking, I would expect it to block until the vsync happens, which would mean that
glXSwapInterval(1) + glXSwapBuffers + glFinish = glXWaitVideoSync
 so that the swapping logic for nvidia could be changed to:

start new thread --------------->glXSwapBuffers + glFinish
return
dostuffordont

Would that be worth experimenting with?
The reason I'm asking is, if you're following the discussions about Mozilla using the same approach mentioned in comment #57, a Nvidia dev mentioned "glxWaitVideoSync? They shouldn't use that old extension."

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Mateusz,

The windows preview performance is not an Nvidia problem, but one that affects all GPUs. The main fixes I know improved that are:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/105
  https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/73

But you need Gnome 3.30 to benefit from those.

Certainly some low-level rendering changes such as being described here will help, but the main fixes for the preview are nothing to do with any graphics driver. See the links above.

Similarly, the icon spring animation's poor performance has very little to do with the graphics driver. The fixes for that are general CPU fixes:

  https://gitlab.gnome.org/GNOME/gnome-shell/issues/349

---

Maik,

I am presently testing with a Quadro K620 (Maxwell?).

No, there should not be any calls to glFinish being used by default because that function stalls the pipeline and kills performance. I have checked a couple of times and I don't think mutter is using glFinish by default. You will find it in the source code but it's (hopefully) never called.

OpenGL in general, including the final glXSwapBuffers or eglSwapBuffers are already threaded by design. GL commands are only requests to the GPU to do the work /eventually/. Even after glXSwapBuffers returns that does not mean the GPU has finished rendering the last frame. So threading would be redundant. Also threading creates new synchronization problems that would take a lot of effort to solve, AND the nouveau driver is not thread safe and likely to crash (https://bugs.freedesktop.org/show_bug.cgi?id=92438).

So I think you're guessing there. The main lesson I have learned in most of a year working on mutter performance is that your initial guesses are usually wrong and the main causes of poor performance are much stranger than you imagined.

Revision history for this message
In , Noxum (noxum) wrote :

Daniel, I think you completely missed my point. This was about the current implementation of the threaded swap wait, which affects the NVIDIA proprietary driver only. So nouveau has nothing to do with it anyway; its not being thread-safe is already mentioned in comment #57. Never mind, anyway.
glFinish is always used as _cogl_winsys_wait_for_gpu in cogl-winsys-glx.c but IMHO not always in a sensible way.
My guess was based on an article where someone investigated and explained the implementation differences of swapbuffers/vsync in the different drivers (mesa/amd/nvidia), probably outdated. Still a guess though.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

(In reply to Daniel van Vugt from comment #62)
> This bug should not be about general Nvidia performance when the original
> reporter was citing a specific commit.
>
> I am analysing the performance of Nvidia right now, and certainly it is not
> good. But if we're no longer talking about commit
> 383ba566bd7c2a76d0856015a66e47caedef06b6 then this bug should probably be
> closed and replaced with new more current bug reports.

The issue with the commit is still relevant, but for me, adding
__GL_YIELD="USLEEP"
__GL_THREADED_OPTIMIZATIONS=0
to /etc/environment and restarting Xorg worked around the issue.
So it's either those environment variables or reverting the commit.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Maik,

Re-reading your comment #68, I still have much the same concerns:

1. If the Nvidia driver is consistently different to other drivers, then why can't I reproduce the problem using the Nvidia proprietary driver?

2. Moving GL commands into a new thread is non-trivial (you need to manually set up the context so that the commands will work), and it may cause crashes for nouveau users (https://bugs.freedesktop.org/show_bug.cgi?id=92438). I know this bug is not about nouveau but your suggested solution may be unacceptable because we have to write code that is compatible with nouveau.

3. Any real solution should not involve glFinish at all anyway. glFinish really hurts performance so the sooner we can get rid of that the better.

Also, I proposed a new fix yesterday that might help some people here:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/277

Revision history for this message
In , Noxum (noxum) wrote :

Thank you, Daniel. This specific info destroys my idea:
> Moving GL commands into a new thread is non-trivial (you need to manually set
> up the context so that the commands will work)

The rest of your comments do not apply. The threaded swap wait is an NVIDIA-proprietary-driver-only code path, emulating the Intel extension GLX_INTEL_swap_event. So changing only that does not influence the nouveau code path or any other driver's. This specific path already contains a glFinish, so I was talking about moving that, not adding another one.

Revision history for this message
In , Noxum (noxum) wrote :

PS:
Daniel, in general I completely agree with you: currently mutter uses too many glFinish and glXWaitVideoSync calls, which are both blocking. The proprietary driver, to my knowledge, needs exactly one glFinish per frame for proper operation, so my idea was about using that as efficiently as possible.
A web search trying to re-find the mentioned article showed that many GL devs were hitting the same behaviour of the proprietary driver and used the same solution I was thinking of.
I didn't look into why mutter uses glXWaitVideoSync so often to wait a little here, wait a little there.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Just a reminder to all:

If you have frame rate problems then please see bug 789186.

This bug is about high CPU usage only.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

For those of you who still find reverting commit 383ba566bd7c2a76d0856015a66e47caedef06b6 helps to reduce CPU, can you please note the Nvidia driver version you are using? I've tested some theories today and retested drivers 340 and 390 but can't find any problem. Or at least can't find any problem that's unique to Nvidia.

When testing for this bug please take care to not move the mouse at all. Just let rendering proceed untouched. High CPU from moving the mouse is a whole different family of issues (https://gitlab.gnome.org/GNOME/mutter/issues/283) so please don't move the mouse or discuss that here.

Please also note if the bug affects Xorg or Wayland sessions.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Correction: Please DON'T note if the bug affects Xorg or Wayland sessions.

It's rather obvious from 383ba566bd7c2a76d0856015a66e47caedef06b6 that this discussion should be about Xorg sessions only.

Revision history for this message
laulau (olaulau) wrote :
Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

And remember, this bug is about high CPU only. Other issues should not be discussed here.

Those simply wanting smoother performance for Nvidia should use this instead:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/281

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This might be a different issue, but your kernel log is also getting flooded with error messages about the touchpad(?)

https://launchpadlibrarian.net/409195699/CurrentDmesg.txt

summary: - gnome-shell eats 100% cpu when seconds display is on and screen is
- locked
+ [nvidia] gnome-shell eats 100% cpu when seconds display is on and screen
+ is locked
affects: xorg (Ubuntu) → gnome-shell (Ubuntu)
tags: added: nvidia
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

If you have any gnome-shell extensions installed, please try uninstalling them and tell us if the problem persists.
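
If you would rather rule extensions out without uninstalling them, one possible shortcut (assuming your GNOME version provides the disable-user-extensions key, which stock Ubuntu GNOME does) is:

  # temporarily turn off all user extensions
  gsettings set org.gnome.shell disable-user-extensions true

  # turn them back on afterwards
  gsettings set org.gnome.shell disable-user-extensions false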

Changed in gnome-shell (Ubuntu):
status: New → Incomplete
Revision history for this message
laulau (olaulau) wrote :

It seems that the bug disappears with all GNOME extensions uninstalled.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thanks. In that case I am going to declare this bug invalid for 'gnome-shell'. But we can reopen it for a different component if you can find out which gnome-shell extension was causing the problem.

Changed in gnome-shell (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
laulau (olaulau) wrote :

The bug still happens if I enable the seconds display, still with no GNOME extensions installed.

Here is an extract from a top running in the background when the screen auto-locks and the CPU is burning (watch -n1 "top -b -n 1 >> cpu.log"):

top - 12:02:08 up 36 min, 1 user, load average: 0.97, 0.78, 0.62
Tasks: 273 total, 2 running, 271 sleeping, 0 stopped, 0 zombie
%Cpu(s): 17.9 us, 4.8 sy, 0.0 ni, 77.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7860.6 total, 2552.6 free, 2671.6 used, 2636.4 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 4761.5 avail Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 3427 laulau 20 0 3335356 308544 98700 R 83.3 3.8 7:55.70 gnome-shell
 4147 laulau 20 0 2371472 461776 152652 S 11.1 5.7 1:43.68 firefox
 1214 message+ 20 0 24216 6244 3836 S 5.6 0.1 0:01.80 dbus-daemon
 1354 root 20 0 248388 9456 6404 S 5.6 0.1 0:00.73 polkitd
 1505 root 20 0 263212 8256 7012 S 5.6 0.1 0:00.06 gdm3
 4350 laulau 20 0 1698980 171616 84856 S 5.6 2.1 0:24.28 WebExtensions
 4476 laulau 20 0 199236 28424 13388 S 5.6 0.4 0:00.30 chrome-gnome-sh
 4723 laulau 20 0 2010052 489476 206920 S 5.6 6.1 7:09.37 Web Content
 4978 laulau 20 0 313968 27492 21004 S 5.6 0.3 0:00.17 update-notifier
 9080 laulau 20 0 26308 3852 3232 R 5.6 0.0 0:00.03 top
    1 root 20 0 195224 9552 6788 S 0.0 0.1 0:03.25 systemd

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK. Does the bug still happen if:

(a) Firefox is not running; or

(b) You disable your Nvidia GPU (in the BIOS?) and just use the Intel one?
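
For (b), one possible alternative to a BIOS change on Ubuntu (a sketch, assuming the nvidia-prime package that comes with Ubuntu's NVIDIA packaging is installed):

  sudo prime-select intel    # render the session on the Intel GPU
  # reboot, retest, then switch back with:
  sudo prime-select nvidia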

Changed in gnome-shell (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
laulau (olaulau) wrote :

Disabled the NVIDIA GPU in nvidia-settings (and rebooted):
 without any application: OK
 with just an empty Firefox private-browsing window: OK
 with a normal Firefox (many tabs & plugins): OK
 many apps running: OK
Enabled the NVIDIA GPU, rebooted:
 without any application: KO

The computer is calm; doing a manual lock leads to immediate fan blowing. Unlocking, and the fan calms down.

Extract from my script logs (I added the CPU frequency FYI):

cpu MHz : 700.142
cpu MHz : 700.025
cpu MHz : 700.346
cpu MHz : 700.285
cpu MHz : 700.037
cpu MHz : 700.094
cpu MHz : 700.109
cpu MHz : 700.109
top - 12:01:18 up 54 min, 1 user, load average: 0.16, 0.46, 0.67
Tasks: 263 total, 1 running, 262 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 1.5 sy, 0.0 ni, 97.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7860.6 total, 5831.8 free, 963.5 used, 1065.4 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 6620.2 avail Mem
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  245 root -51 0 0 0 0 S 6.2 0.0 0:04.87 irq/129-nvidia
 2568 laulau 20 0 375224 87192 49688 S 6.2 1.1 0:20.95 Xorg
 2772 laulau 20 0 3265100 286588 97372 S 6.2 3.6 27:57.77 gnome-shell
 6954 laulau 20 0 26296 3764 3140 R 6.2 0.0 0:00.02 top
    1 root 20 0 195328 9612 6764 S 0.0 0.1 0:03.46 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd

cpu MHz : 3400.000
cpu MHz : 3400.341
cpu MHz : 3401.565
cpu MHz : 3401.177
cpu MHz : 3399.907
cpu MHz : 3400.164
cpu MHz : 3403.125
cpu MHz : 3400.692
top - 12:01:28 up 54 min, 1 user, load average: 0.29, 0.48, 0.67
Tasks: 262 total, 2 running, 260 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 8.8 sy, 0.0 ni, 87.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7860.6 total, 5830.9 free, 964.3 used, 1065.4 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 6619.3 avail Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 2772 laulau 20 0 3265100 286948 97372 R 93.8 3.6 28:06.30 gnome-shell
    1 root 20 0 195328 9612 6764 S 0.0 0.1 0:03.46 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd

So the problem is between NVIDIA and gnome-shell on my configuration, plus the fact that I want to display seconds in the top bar!

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK. Since the problem seems to be in the Nvidia-415 driver, and that driver is not part of Ubuntu, this bug is not a valid Ubuntu bug right now. Sorry.

You can potentially make this a valid Ubuntu bug if you install (downgrade) to a supported version of the Nvidia driver like version 390.

  1. Uninstall the nvidia driver version 415.
  2. sudo apt install nvidia-driver-390

Then if the bug still occurs in a supported version of the driver we can reopen this bug.
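
A sketch of one way to do that downgrade, assuming the 415 driver came from the graphics-drivers PPA (adjust if it was installed differently):

  sudo apt install ppa-purge
  sudo ppa-purge ppa:graphics-drivers/ppa   # removes the PPA and reverts its packages to archive versions
  sudo apt install nvidia-driver-390
  sudo reboot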

affects: gnome-shell (Ubuntu) → ubuntu
Changed in ubuntu:
status: Incomplete → Invalid
summary: - [nvidia] gnome-shell eats 100% cpu when seconds display is on and screen
- is locked
+ [nvidia-415] gnome-shell eats 100% cpu when seconds display is on and
+ screen is locked
Revision history for this message
laulau (olaulau) wrote : Re: [nvidia-415] gnome-shell eats 100% cpu when seconds display is on and screen is locked

OK. I just removed the graphics PPA, removed nvidia 415, then installed nvidia 390 and rebooted.
Same again: gnome-shell eats 100% CPU when the screen is locked and seconds are displayed!

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK, thanks.

Please run this command:

  dpkg -l > allpackages.txt

and send us the resulting file 'allpackages.txt'.

affects: ubuntu → gnome-shell (Ubuntu)
Changed in gnome-shell (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
laulau (olaulau) wrote :
summary: - [nvidia-415] gnome-shell eats 100% cpu when seconds display is on and
- screen is locked
+ [nvidia] gnome-shell eats 100% cpu when seconds display is on and screen
+ is locked
Changed in gnome-shell (Ubuntu):
status: Incomplete → New
Revision history for this message
In , Noxum (noxum) wrote :

Ok, nvidia driver 418.43, gnome-shell 3.31.91 + mutter 3.31.91 freshly built without any additional patches.
Running only glxgears in a small window, with nothing else open and not touching anything, gnome-shell CPU usage spikes to 85%.

Revision history for this message
In , Noxum (noxum) wrote :

Even just the small circle animation in the g-c-c Bluetooth pane puts gnome-shell at a constant 85% CPU usage.

Revision history for this message
In , Noxum (noxum) wrote :

https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-KDE-High-CPU-Fix
gnome-shell-3.32.0+mutter-3.32.0
Setting __GL_MaxFramesAllowed=1 returns cpu usage to normal when running glxgears or opening the bluetooth pane. Mystery solved?
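
For anyone wanting to try this, one way to set the variable (an assumption about where to put it; any mechanism that gets it into gnome-shell's environment before login works) is to add it to /etc/environment and then log out and back in:

  __GL_MaxFramesAllowed=1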

Revision history for this message
In , Léo (leeo97one) wrote :

Interesting, so GNOME is also affected by this glXSwapBuffers misimplementation?
Furthermore, setting this environment variable also fixes some big freezes that I had with GNOME Shell (or Mutter ?) 3.32.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

The fix in comment 81 sounds the same as what this does:
https://gitlab.gnome.org/GNOME/mutter/merge_requests/281

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Although not entirely the same.

Mutter was already trying to do things right (compared to KDE) so if __GL_MaxFramesAllowed=1 helps you then that suggests to me the issue is missing frame notifications from the backend/driver (COGL_FRAME_EVENT_SYNC/COGL_FRAME_EVENT_COMPLETE). If they are unsupported or absent then clutter_stage_cogl_schedule_update returns early and the clutter master clock would revert to a dumb fallback throttling method (which is the same as KDE's).

This also explains why reverting 383ba566b would seem to help some people. It's a workaround, but not a fix for the real problem.

I guess what we need now is a developer who can reproduce the problem (I can't) to find out why COGL_FRAME_EVENT_SYNC/COGL_FRAME_EVENT_COMPLETE events aren't arriving. Note: Arriving 150ms+ late is counted the same as not arriving, so that's possible too but would be extra strange.

Revision history for this message
In , Noxum (noxum) wrote :

Leo, I can confirm that setting __GL_MaxFramesAllowed=1 also mitigates gnome-shell freezes in startup animations. The icon zoom animation would sometimes freeze for some seconds when starting an application from the dash. Just as additional info.

Daniel, I got around to doing some testing on a desktop system and there the high CPU usage is indeed not noticeable. So maybe this only affects hybrid graphics setups using PRIME Sync?

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Yes, it sounds like a good theory that COGL_FRAME_EVENT_SYNC/COGL_FRAME_EVENT_COMPLETE are missing/invalid in hybrid/PRIME setups. Someone should test that. Maybe me when I get time :)

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

A quicker fix for people might just be to use:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363.patch
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363

That should avoid the offending spinning and high CPU. And it should work better than reverting 383ba566bd7c2a76d0856015a66e47caedef06b6. Because doing the revert has bad consequences for the responsiveness and throughput of mutter/gnome-shell's main loop.

If 363.patch works for you, that's great. But even that's not the optimal long term fix I would like to figure out still.
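
If anyone wants to test it, a sketch of applying the merge request to a mutter source checkout (illustrative commands; any usual patch workflow works, and GitLab .patch files are git-am compatible):

  git clone https://gitlab.gnome.org/GNOME/mutter.git
  cd mutter
  curl -LO https://gitlab.gnome.org/GNOME/mutter/merge_requests/363.patch
  git am 363.patch

Then build and install mutter as usual.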

Revision history for this message
In , Léo (leeo97one) wrote :

As a PRIME (NVIDIA+Intel) user, I would be happy to help you test but I don't know how to trace the events.

Revision history for this message
In , Noxum (noxum) wrote :

Doing some more testing on the desktop system, this now gets weirder and weirder. My desktop is also affected by this, just not when running glxgears; I bought a bluetooth dongle to get the circle animation.
gnome-shell cpu usage:
Threaded swap wait enabled - Desktop:
glxgears - normal, no cpu spikes
g-c-c bluetooth animation - 85% cpu
dragging windows - 85%cpu

Threaded swap wait enabled - notebook/PRIME:
glxgears - normal, spiking to 85% cpu
g-c-c bluetooth animation - 85% cpu
dragging windows - 85% cpu

Now setting __GL_MaxFramesAllowed=1 on both systems:
desktop: no change! cpu usage still high (?)
glxgears - normal, no cpu spikes
g-c-c bluetooth animation - 85% cpu
dragging windows - 85%cpu

notebook:
glxgears - normal, 10% cpu, no spikes
g-c-c bluetooth animation - normal, 10% cpu
dragging windows - okayish, 35% cpu

Next I'll check if the mentioned patch changes anything.

Revision history for this message
In , Léo (leeo97one) wrote :

@Maik I think you should also try with or without PRIME Synchronisation, just in case.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

With mr363 and mr281 (minus the first commit of 281), moving the mouse spikes the CPU up to 3% and then settles down. dragging windows around spikes CPU usage up to 10% but then immediately goes back to 0% once the dragging stops.

Revision history for this message
In , Hussam Al-Tayeb (hussam) wrote :

With glxgears, I get spikes up to 3% without moving the mouse and 4% while moving the mouse. This is a massive improvement (mr281 + mr363). My desktop feels very fast!

Revision history for this message
In , Noxum (noxum) wrote :

PRIME Sync on/off changes:
__GL_MaxFramesAllowed=1 unset:
g-c-c bluetooth animation - 85% -> 30%
dragging windows - 85% -> 40%

__GL_MaxFramesAllowed=1 set:
g-c-c bluetooth animation - 10% -> 20%
dragging windows - 35% -> 45%

Entertaining but not really helping to shed light on this.

Revision history for this message
In , Noxum (noxum) wrote :

Now applied mr363+mr281, results are mixed:
Desktop:
g-c-c bluetooth animation - 85% -> 9% Good!
dragging windows - 85% -> 75% slightly better

Notebook, PRIME Sync
__GL_MaxFramesAllowed=1 unset:
g-c-c bluetooth animation - 85% -> 85% no change
dragging windows - 85% -> 75% slightly better

__GL_MaxFramesAllowed=1 set:
g-c-c bluetooth animation - 10% -> 13%
dragging windows - 35% -> 30%

Overall, desktop is snappy with those patches, PRIME sync notebook still sluggish. With patches + prime sync disabled, mutter will now render unthrottled (animation speeds up) so high cpu usage in general.

Revision history for this message
In , Noxum (noxum) wrote :

On the desktop system with single nvidia gpu, I can only second Hussam: it's a blast when using mutter-3.32 plus both patches. Snappy, fast, great user experience.
Great work Daniel, thank you.
In the prime sync case, mutter obviously somehow derails taking different code paths.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Maik,

Great to hear. But please don't use dragging windows as a CPU test. Even without this bug (e.g. Intel GPUs) dragging windows is very heavy on the CPU. That's a separate bug (not yet reported upstream?). The fixes required for high CPU usage when dragging windows are:

  * Closure of https://gitlab.gnome.org/GNOME/mutter/issues/283 and
  * Something like https://gitlab.gnome.org/GNOME/mutter/merge_requests/270 but better.

And I don't think we need to worry about __GL_MaxFramesAllowed=1. That's what mr281 does.

---

Hussam,

I'm very glad you find those patches work and I would like mutter to get both. But for this bug I think it's worth pointing out that you only need mr363. Once you have that, this bug is fixed; adding mr281 only reduces output latency (which is nice, but not relevant here).

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

I almost forgot, dragging windows is extra expensive with the Nvidia driver:

  https://launchpad.net/bugs/1799679

So everyone please don't use window dragging as a CPU test for this bug. It has its own causes not related to this bug.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Random thought: This might be the real root cause:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216.patch

Can anyone experiencing this bug please try that fix alone?

Revision history for this message
In , Noxum (noxum) wrote :

Just to put it straight:
- mr363 fixes this bug on single GPU systems
- mr363 does not have any effect on hybrid GPU systems using PRIME sync
- __GL_MaxFramesAllowed=1 is a workaround on PRIME systems currently.

So all testing is now done on PRIME sync with __GL_MaxFramesAllowed unset.
I'm sorry,
applying mr216 does not have any effect on my PRIME sync system.
g-c-c bluetooth animation - 85% -> 85% no change.

Revision history for this message
In , Noxum (noxum) wrote :

Of course, this is now mutter-3.32.0+mr216+mr363+mr281.

Revision history for this message
In , Noxum (noxum) wrote :

Daniel, re-reading your comment, I think I might have misunderstood you. So I have now retested the following setup:
Single GPU, mutter-3.32 + mr216 only (mr363 + mr281 left out).
I can confirm that this setup also fixes this specific bug (high CPU while constantly rendering), but only this. (mr363 also fixes high CPU on mouse movement.)

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Thanks, that is very encouraging because mr216 would explain the root cause here, and why the bug randomly affects some Nvidia systems sometimes and not others.

But I would also be curious to hear what Hussam thinks about testing just

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216.patch

alone.

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

Maybe I need to provide a backport for older mutter releases?

Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

My mistake. If you ever do experience the problem fixed by:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/216

then the symptom for that is a frozen screen. Not this bug.

So the real fix for this bug is still:

  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363.patch
  https://gitlab.gnome.org/GNOME/mutter/merge_requests/363

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I think there's a reasonable chance this bug is the same as:
https://bugzilla.gnome.org/show_bug.cgi?id=781835

affects: gnome-shell → mutter
Changed in mutter (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in gnome-shell (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in mutter (Ubuntu):
status: New → In Progress
Changed in gnome-shell (Ubuntu):
status: New → In Progress
Changed in mutter:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , Daniel van Vugt (vanvugt) wrote :

And an extra, final nail in the coffin for this bug:

https://gitlab.gnome.org/GNOME/mutter/merge_requests/602

Although you only need !363 to solve this bug. !602 clarifies things further by showing that after !363 it is now safe to revert commit 383ba566bd7c2a76d0856015a66e47caedef06b6, and more.

Revision history for this message
In , Noxum (noxum) wrote :

Thanks again, Daniel.
Unfortunately, I'm unable to apply MR602; I tried mutter 3.32.2+MR363, mutter 3.33.2+(rebased)MR363 and mutter-HEAD+(rebased)MR363.
Any prerequisite I'm missing?

Revision history for this message
In , Noxum (noxum) wrote :

Never mind, I didn't realize that MR602 includes MR363.

Revision history for this message
In , Noxum (noxum) wrote :

I now tested on my Optimus system using PRIME sync with __GL_MaxFramesAllowed unset using mutter/gnome-shell-3.33.2+MR602.
The good news is, while previously MR363 had no effect on CPU usage while constantly rendering on this system, MR602 now fixes it. CPU usage on bluetooth-pane circle is ~12%.
The bad news, unrelated to this bug is that without setting __GL_MaxFramesAllowed=1 on PRIME sync, mutter is stuttering and intermittently freezing. Let alone still being laggy with or without that setting.

I did not test this so far on the single gpu desktop system since Gnome 3.33 is quite a mess right now with lots of api breakage and increased dependencies. Is it possible to rebase MR602 to Gnome-3.32.x ?

Revision history for this message
In , Noxum (noxum) wrote :

Tested mutter 3.32.2 + MR281+520+576(-1 hunk)+602 on the single-GPU system; it runs fine. No noticeable issues so far. Due to the removal of the threaded swap wait, moving windows is also back to normal/expected CPU usage now.

For the remaining issues with PRIME sync where these patches don't have any effect, I updated the existing issue at:
https://gitlab.gnome.org/GNOME/gnome-shell/issues/1202

Revision history for this message
In , Marco Trevisan (Treviño) (3v1n0) wrote :
Changed in mutter:
status: Confirmed → Fix Released
tags: added: fixed-in-3.33.3 fixed-upstream
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thank you for reporting this bug to Ubuntu.
Ubuntu 18.10 (cosmic) reached end-of-life on July 18, 2019.

See this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We appreciate that this bug may be old and you might not be interested in discussing it any more. But if you are then please upgrade to the latest Ubuntu version and re-test. If you then find the bug is still present in the newer Ubuntu version, please add a comment here telling us which new version it is in and change the bug status to Confirmed.

Changed in gnome-shell (Ubuntu):
status: In Progress → Won't Fix
Changed in mutter (Ubuntu):
status: In Progress → Won't Fix
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Although the linked upstream fix is coming in GNOME 3.34 for Ubuntu 19.10, I have still marked this bug as 'Won't Fix'. That's because I am not completely confident this is the right downstream bug for what was fixed upstream.

no longer affects: mutter
tags: removed: fixed-in-3.33.3 fixed-upstream