Extreme jerkiness with kwin compositing on Nvidia binary driver after upgrading to 4.11.5.

Bug #1267977 reported by Michael Marley
This bug affects 1 person
Affects                  Status         Importance   Assigned to   Milestone
KDE Base Workspace       Fix Released   Medium
kde-workspace (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

On my Kubuntu 14.04 (Trusty) system with an Nvidia graphics card and version 331.20 of the binary driver, I began having a problem with very jerky/juddery graphics after updating from kde-workspace 4.11.4 to 4.11.5. (The rest of the system is on 4.12.) About every half-second or so, everything moving or animating on the screen jerks very noticeably. This occurs for both OpenGL things like glxgears and for 2D things like scrolling in Firefox.

Looking through the changelog for 4.11.5, it would seem that the issue probably would have been caused by this: https://git.reviewboard.kde.org/r/114162/. However, using the KWIN_USE_BUFFER_AGE=0 environment variable does not eliminate the jerkiness. It does, however, introduce tearing, unless I set "Tearing Prevention (VSync)" to "Full Scene Repaints."

In , Michael Marley (mamarley) wrote :

Reproducible: Always

Steps to Reproduce:
1. Get a system with an Nvidia graphics card and install a GLX_EXT_BUFFER_AGE-supporting driver version.
2. Install/upgrade to kwin 4.11.5.
3. Launch glxgears
Actual Results:
glxgears jerks noticeably about twice every second.

Expected Results:
glxgears should run smoothly without dropping any frames.

In , Thomas-luebking (thomas-luebking) wrote :

please provide the output of "qdbus org.kde.kwin /KWin supportInformation", "env | grep GL" and "grep -i triple /var/log/Xorg.0.log"

you may also blindly try
export __GL_YIELD="USLEEP"
kwin --replace &
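
The three diagnostics requested above can be gathered in one go. The following is a minimal sketch, not part of the original request: the function name collect_kwin_diag and the default output path are illustrative assumptions, and the guards only exist so that a missing tool or log does not abort the run.

```shell
# Sketch: write the three requested diagnostics into a single report file.
# collect_kwin_diag and /tmp/kwin-diag.txt are illustrative names.
collect_kwin_diag() {
    out=${1:-/tmp/kwin-diag.txt}
    {
        echo "== qdbus org.kde.kwin /KWin supportInformation =="
        if command -v qdbus >/dev/null 2>&1; then
            qdbus org.kde.kwin /KWin supportInformation 2>&1 || true
        else
            echo "qdbus not found"
        fi

        echo "== env | grep GL =="
        env | grep GL || echo "no GL-related variables set"

        echo "== grep -i triple /var/log/Xorg.0.log =="
        if [ -r /var/log/Xorg.0.log ]; then
            grep -i triple /var/log/Xorg.0.log || echo "no match"
        else
            echo "/var/log/Xorg.0.log not readable"
        fi
    } > "$out"
    # Print the report path so the caller knows what to attach.
    echo "$out"
}
```

Calling collect_kwin_diag without arguments writes the report to the default path; passing a path as the first argument writes it there instead.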

In , Michael Marley (mamarley) wrote :

I already had __GL_YIELD set to USLEEP and triple buffering turned on in my xorg.conf.

When I tried to run your first command, I got "qdbus: could not find a Qt installation of ''", but I think this is an unrelated issue with my system that I am troubleshooting now.

In , Michael Marley (mamarley) wrote :

I also tried all combinations of __GL_YIELD and triple-buffering settings, but none had any effect.

In , Thomas-luebking (thomas-luebking) wrote :

install the qtchooser package (there may also be "qdbus-qt4")

In , Michael Marley (mamarley) wrote :

Thanks, that worked. Here is the output: http://pastebin.kde.org/pgzbj8jy4

Michael Marley (mamarley) wrote :

There is also an upstream bug here: https://bugs.kde.org/show_bug.cgi?id=329821

In , Thomas-luebking (thomas-luebking) wrote :

it says: "KWin version: 4.11.4"?
Also the determined tearing prevention is frontbuffer re-usage (likely from "auto")

what happens if you disable buffer_age, set tearing prevention to none (cheap, full repaints - *not* front buffer copying nor automatic) and restart "kwin --replace&"?

In , Michael Marley (mamarley) wrote :

Yeah, sorry about that. I downgraded back to 4.11.4 to get rid of the jerking. If you want, I can upgrade back to 4.11.5 and run the command again.

On 4.11.5, if I disable buffer_age and set full repaints, I get smooth motion and no tearing.

In , Thomas-luebking (thomas-luebking) wrote :

what about jerkiness with no buffer_age and front buffer re-usage?

In , Michael Marley (mamarley) wrote :

With that combination, I get both tearing and jerkiness.

In , Michael Marley (mamarley) wrote :

Additionally, it seems that with those settings, the tearing and the jerkiness are "synchronized." At the same time glxgears jerks, the line of tearing appears on the konsole window I am dragging above it.

In , Thomas-luebking (thomas-luebking) wrote :

-> buffer_age on and tearing prevention to none?

In , Michael Marley (mamarley) wrote :

In that mode, 2D applications like Firefox are smooth but with lots of tearing. glxgears tears heavily in the middle 1/3 of the screen. It is jerky in the area that tears but smooth otherwise.

In , Thomas-luebking (thomas-luebking) wrote :

-> "only when cheap" (which should be the case for buffer_age)

In , Michael Marley (mamarley) wrote :

That setting produces the same results as Automatic. (Everything jerks twice a second, no tearing.)

In , Michael Marley (mamarley) wrote :

I just tried with the newly-released Nvidia 331.38 driver and the bug still occurs.

In , Thomas-luebking (thomas-luebking) wrote :

Random guess:
1. ensure that triple buffering is really enabled:
  grep -i triple /var/log/Xorg.0.log
2. next convince kwin about it
   export KWIN_TRIPLE_BUFFER=1
   kwin --replace &

The problem here is that full scene repaints do not seem to cause a problem, so it cannot be the swapping by itself (though you should ensure that flipping is enabled on the GL settings page of nvidia-settings); it has to be either frontbuffer reading or buffer_age.

What's even weirder is that you report tearing with frontbuffer reading, which can only have two potential causes:
1. no flipping (see above)
2. tearing in the client (activate the "show paint" effect to check; it's worthless for buffer_age or full scene repaints, though)

But either case would also apply to full scene repaints.

One last resort: try to disable blurring.
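
The two numbered steps above can be sketched as a small script. This is only an illustration under stated assumptions: xorg_mentions_triple_buffer is a hypothetical helper name, and a match from grep only shows that the X log mentions triple buffering at all, so the matching line still has to be read to confirm the option is actually enabled.

```shell
# Sketch of the two steps: check the X log, then tell kwin about it.
# xorg_mentions_triple_buffer is a hypothetical helper name.
xorg_mentions_triple_buffer() {
    log=${1:-/var/log/Xorg.0.log}
    # A match only proves the log mentions triple buffering, not its value.
    [ -r "$log" ] && grep -qi triple "$log"
}

if xorg_mentions_triple_buffer; then
    # Exporting the variable makes kwin skip its own runtime detection.
    export KWIN_TRIPLE_BUFFER=1
    echo "KWIN_TRIPLE_BUFFER=1 exported; now restart with: kwin --replace &"
else
    echo "no triple buffering entry found in the X log"
fi
```

The export only affects kwin instances started from the same shell afterwards, which is why the restart has to happen in that shell.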

In , Michael Marley (mamarley) wrote :

Thanks! The "export KWIN_TRIPLE_BUFFER=1" thing completely clears up the jerkiness!

I still consider this a bug though, because the jerkiness occurs even when I have triple buffering turned off in xorg.conf. Perhaps kwin should be able to automatically detect when triple buffering is taking place?

In , Thomas-luebking (thomas-luebking) wrote :

*sigh*

It does try to detect whether triple buffering is enabled and that is (in a way) crucial to know.
Unfortunately there's no "legal" way to know this, so it's measured at runtime and that used to work nicely in the past (and still does here)

-> Can you please check how much time buffer swapping takes during the detection?

To do so, you'd have to run "kdebugdialog --fullmode", filter for kwin (1212) and redirect all output to some file, e.g. /tmp/kwin.dbg

After a short time (500 screen updates), the file will contain a "Triple buffering detection" line that indicates whether triple buffering is assumed to be available and gives the mean blocking time of glSwapBuffers().

related bug #322060 and bug #329297
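
Once the debug output is being redirected, the detection line can be waited for instead of checked by hand. The sketch below assumes the example path /tmp/kwin.dbg from above and that KWIN_TRIPLE_BUFFER is not exported; the function name, the 60-second default timeout and the one-second polling interval are arbitrary illustrative choices.

```shell
# Sketch: poll the redirected debug output for the detection line.
# wait_for_triple_buffer_line, the timeout and the interval are assumptions.
wait_for_triple_buffer_line() {
    file=${1:-/tmp/kwin.dbg}
    timeout=${2:-60}
    waited=0
    while :; do
        # grep prints the matching line(s), so the result can be read directly.
        if [ -r "$file" ] && grep "Triple buffering detection" "$file"; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "no detection line in $file after ${timeout}s" >&2
    return 1
}
```

A non-zero exit status after the timeout means the line never appeared, which is itself worth reporting.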

In , Michael Marley (mamarley) wrote :

I tried this but I do not get any such message about triple buffering.

In , Thomas-luebking (thomas-luebking) wrote :

Sorry, I should have mentioned that you must *not* export KWIN_TRIPLE_BUFFER or the heuristic detection won't take place at all.

In , Michael Marley (mamarley) wrote :

I tried again just to make sure, but even when I comment out the export from my .profile and reboot, I still don't get anything about triple buffering detection in the debug output.

In , Thomas-luebking (thomas-luebking) wrote :

it requires 500 full repaints - if you did not enable buffer_age or full scene repaints or frontbuffer copying as tearing prevention, this can take quite a while.

Can you attach the generated file?

In , Michael Marley (mamarley) wrote :

I do have buffer_age enabled, and I waited at least 5 minutes before checking the file. I am going to have to go away in just a minute, but I will test it again when I get a chance.

In , Michael Marley (mamarley) wrote :

Created attachment 84748
KWin debugging output with KWIN_TRIPLE_BUFFER undefined

Sorry for the delay. Here is the output. I commented out the KWIN_TRIPLE_BUFFER export in my .profile, rebooted, and ran glxgears in fullscreen for about 30 seconds to make sure it had rendered enough frames.

In , Michael Marley (mamarley) wrote :

I have also noticed that after enabling KWIN_TRIPLE_BUFFER, sometimes I get more lag between the cursor and the window when dragging windows around the screen, especially if I drag the window in circles. It isn't that bad, but I thought you should know anyway.

In , Thomas-luebking (thomas-luebking) wrote :

something is fishy here.

a) please provide
- /var/log/Xorg.0.log
- glxinfo > my.glxinfo
- nvidia-settings -q all > my.nvsettings
- cat /proc/`pidof kwin`/environ > my.kwinenv

b) did you really redirect *all* level outputs for 1212/kwin in "kdebugdialog --fullmode"?
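
The four items requested under a) can be collected with a short script. This is a sketch under assumptions: the function names and the default directory are illustrative, and each step is skipped quietly when the tool or file is missing. Because /proc/<pid>/environ separates entries with NUL bytes, the sketch converts them to newlines so the resulting my.kwinenv is readable.

```shell
# Sketch: gather the four requested items into one directory.
# environ_to_lines, collect_attachments and ~/kwin-report are illustrative.

# /proc/<pid>/environ is NUL-separated; turn it into one entry per line.
environ_to_lines() {
    tr '\0' '\n' < "$1"
}

collect_attachments() {
    dir=${1:-$HOME/kwin-report}
    mkdir -p "$dir" || return 1
    [ -r /var/log/Xorg.0.log ] && cp /var/log/Xorg.0.log "$dir/"
    command -v glxinfo >/dev/null 2>&1 && glxinfo > "$dir/my.glxinfo"
    command -v nvidia-settings >/dev/null 2>&1 &&
        nvidia-settings -q all > "$dir/my.nvsettings"
    # pidof -s returns a single PID even if several kwin processes exist.
    pid=$(pidof -s kwin 2>/dev/null) &&
        environ_to_lines "/proc/$pid/environ" > "$dir/my.kwinenv"
    echo "$dir"
}
```

Running collect_attachments prints the directory that holds whatever could be collected, ready to be attached to the report.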

In , Michael Marley (mamarley) wrote :

Created attachment 84755
Xorg.0.log

In , Michael Marley (mamarley) wrote :

Created attachment 84756
my.glxinfo

In , Michael Marley (mamarley) wrote :

Created attachment 84757
my.kwinenv

In , Michael Marley (mamarley) wrote :

Created attachment 84758
my.nvsettings

In , Michael Marley (mamarley) wrote :

I did redirect all the output for "1212 kwin" to that file.

In , Thomas-luebking (thomas-luebking) wrote :

You're overriding FSAA (multisampling), and it's not re-overridden by an environment variable in kwin.
So unless you have an application profile for kwin in nvidia-settings that sets GLFSAAMode to 0x0, that will cause some significant GPU load.

In , Michael Marley (mamarley) wrote :

I do in fact have an application profile configured for kwin that turns AA and AF off. Sorry, I forgot to mention that.

In , Thomas-luebking (thomas-luebking) wrote :

grrrr... it's because the buffer_age patch exits the paint function early and completely bypasses the triple buffer detection.
Can/Do you want to try a patch?
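
The failure mode described here, an early exit that skips a later bookkeeping step, can be shown with a toy model. The sketch below is purely illustrative and is not kwin's code: every function and variable name is hypothetical, and the "profiling" is reduced to a counter so the effect of the early return is visible.

```shell
# Purely illustrative, NOT kwin's code: every name below is hypothetical.
detection_samples=0

# Stands in for the swap profiling that feeds triple buffer detection.
profile_swap() {
    detection_samples=$((detection_samples + 1))
}

# Buggy shape: the buffer_age path returns before the profiling step.
present_buggy() {
    if [ "$1" = buffer_age ]; then
        return 0    # early exit, profile_swap is never reached
    fi
    profile_swap
}

# Fixed shape: both paths fall through to the shared profiling step.
present_fixed() {
    if [ "$1" = buffer_age ]; then
        :           # partial repaint would go here
    else
        :           # full repaint would go here
    fi
    profile_swap
}

# Count how many profiling samples 500 buffer_age frames produce.
count_samples() {
    detection_samples=0
    i=0
    while [ "$i" -lt 500 ]; do
        "$1" buffer_age
        i=$((i + 1))
    done
    echo "$detection_samples"
}

echo "buggy: $(count_samples present_buggy) samples"
echo "fixed: $(count_samples present_fixed) samples"
```

In the buggy shape the counter stays at 0 however many frames are presented, so a detection that needs 500 samples never completes; in the fixed shape every frame contributes a sample.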

In , Michael Marley (mamarley) wrote :

Sure, I can try a patch.

In , Thomas-luebking (thomas-luebking) wrote :

See here:
https://git.reviewboard.kde.org/r/115306/

Sorry that it took so long (that's apparently why everyone says that early exits are evil ;-)

In , Michael Marley (mamarley) wrote :

OK, compiling now. This may take a while; my compile box is an old Core 2 Duo machine.

In , Michael Marley (mamarley) wrote :

Thanks, this patch works! After a few seconds, the log message indicates that kwin detected triple buffering and the jerkiness clears up.

However, I still am getting that lag I was talking about earlier when dragging windows. That only started happening after buffer_age was introduced. Should I file another bug for that?

In , Thomas-luebking (thomas-luebking) wrote :

Does the lagging ever occur during a session or only right after login?

In , Michael Marley (mamarley) wrote :

It can happen anytime, but doesn't always happen. A pretty reliable way to reproduce it for me is to drag a window around in large circles on the screen at a rate of about one circle per second. When I do that, the window lags quite a ways behind the mouse cursor. Curiously, I haven't noticed any lag when doing other things, such as dragging scrollbars or playing games.

In , Thomas-luebking (thomas-luebking) wrote :

Circular movement is just a good way to outpace systems, so that's not too special.

When this happens, does
- re-initiating a new drag
- restarting the compositor (Shift+Alt+F12 twice)
stop it?

Is there exceptionally high CPU load?

In , Michael Marley (mamarley) wrote :

During the dragging, kwin is using about 8% of the CPU and Xorg is using about 4% (both as measured by htop.) I don't even have to restart compositing to make the lag go away. If I stop dragging the window, it goes away immediately. Also, this didn't happen before buffer_age was introduced.

In , Thomas-luebking (thomas-luebking) wrote :

(In reply to comment #42)
> I don't even have to restart
> compositing to make the lag go away. If I stop dragging the window, it goes
> away immediately.

Just to be absolutely certain about this:
that means a subsequent drag does not show this symptom?

In , Michael Marley (mamarley) wrote :

Not unless I start dragging it in circles again.

In , Thomas-luebking (thomas-luebking) wrote :

That means it's permanent.
Try to toggle compositing off. Still laggy?
Toggle compositing on again. Still laggy?

In , Michael Marley (mamarley) wrote :

With compositing off, there is no lag. When I re-enable it, the lag comes back.

In , Thomas-luebking (thomas-luebking) wrote :

I guess it feels like moving through jelly, the window follows the mouse - it does not hang and then jump to the mouse position?

Dev note:
This would mean we load the swap buffer with too many swaps, which can basically have two causes:

1. misdetection/overridden refreshrate/MaxFPS
2. triplebuffer misdetection (assumed to be NOT available, while it indeed is)

According to the present debug output, the refresh rate is detected as 60Hz (which is supported by the nvidia-settings query), so it would have to be the triple buffer detection, except that comment #25 explicitly states that this occurred WITH KWIN_TRIPLE_BUFFER=1 ...

So there must be a third reason to overcommit frames (could be broken paint time calculation)

In , Michael Marley (mamarley) wrote :

My monitor does run at 60Hz and I do have KWIN_TRIPLE_BUFFER=1 set, so that sounds right.

And moving through jelly is a good description. It is still smooth, but just is smooth farther behind the mouse cursor than it was before buffer_age was introduced.

In , Michael Marley (mamarley) wrote :

I just discovered a reliable way to reproduce the lag at any time. If I run an application that is playing a video or doing any kind of continuous animation, it makes all window dragging lag in exactly the same way I described with the circular dragging earlier.

In , Thomas-luebking (thomas-luebking) wrote :

Git commit bb9f76e1aede42fcd51edf298e4d8a0b942ff6ac by Thomas Lübking.
Committed on 24/01/2014 at 21:29.
Pushed by luebking into branch 'KDE/4.11'.

merge buffer_age render into general render code

avoiding the blocking swapinterval detection causes
issues in the timing strategy and prevents protection
against CPU overload on the nvidia blob
FIXED-IN: 4.11.6
REVIEW: 115306

M +8 -10 kwin/eglonxbackend.cpp
M +8 -10 kwin/glxbackend.cpp

http://commits.kde.org/kde-workspace/bb9f76e1aede42fcd51edf298e4d8a0b942ff6ac

Rohan Garg (rohangarg) wrote :

Fix has been committed upstream; it'll be released with the KDE SC 4.11.6 packages in a few days.

Changed in kde-workspace (Ubuntu):
status: New → Fix Released
Rohan Garg (rohangarg) wrote :

You can also try out packages from Project Neon ( https://launchpad.net/~neon/+archive/ppa ) to check if the fix works for you.

Michael Marley (mamarley) wrote :

I already compiled my own KDE workspace package using the patch that Thomas Lübking supplied upstream and found that it works.

Changed in kdebase-workspace:
importance: Unknown → Medium
status: Unknown → Fix Released