[intrepid] 1.4 -> 1.5.2 bad performance regression

Bug #280671 reported by Oibaf on 2008-10-09
This bug affects 1 person
Affects              | Status       | Importance | Assigned to
X.Org X server       | Fix Released | Medium     |
xorg-server (Ubuntu) | Fix Released | High       | Unassigned

Bug Description

I noticed a bad performance regression with xorg-server 1.5.2 (intrepid) compared to 1.4 (hardy). For the detailed report, see:
http://lists.freedesktop.org/archives/xorg/2008-October/039277.html

There are two known problems:

1) EXA: Avoid excessive syncing in PutImage
Patch to fix slow EXA in 1.5.2:
http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=commitdiff;h=f4c33e2e64ce83c29c3bc79853e421247acfea11
The fix was included in 1.5.3.

A test package of xorg-server 1.5.3~git-1ubuntu0tormod4 can be found at:
https://edge.launchpad.net/~xorg-edgers/+archive

See comment #6 (https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/280671/comments/6) for gtkperf numbers showing the performance increase. No regressions so far with the 1.5.3~git-1ubuntu0tormod4 package.

2) Array-index based devPrivates implementation
According to the suggestions in the thread at
http://lists.freedesktop.org/archives/xorg/2008-October/039277.html
this patch, backported from 1.6, should fix the performance regression in 1.5:
http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078
This fix was applied to the 1.5 branch but was later reverted, because it appears to break the ABI (so it's not suitable for intrepid):
https://bugs.freedesktop.org/show_bug.cgi?id=16647#c23
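To illustrate why an array-index based scheme helps, here is a simplified Python model; it is not X server code. It assumes, as a simplification, that the 1.5 lookup walks a per-object list of (key, value) records comparing keys one at a time, whereas the array-index scheme gives every key a small integer so a lookup is a single indexed access. The class names and the `comparisons` counter are hypothetical instrumentation.

```python
class ListPrivates:
    """Per-object privates kept as a singly linked list of (key, value)
    records; lookup cost grows with the number of registered privates."""

    class _Record:
        __slots__ = ("key", "value", "next")

        def __init__(self, key, value, next_rec):
            self.key = key
            self.value = value
            self.next = next_rec

    def __init__(self):
        self.head = None
        self.comparisons = 0  # hypothetical counter: key comparisons made

    def set(self, key, value):
        # New records are prepended, so early registrations end up at the tail.
        self.head = self._Record(key, value, self.head)

    def lookup(self, key):
        rec = self.head
        while rec is not None:
            self.comparisons += 1
            if rec.key is key:
                return rec.value
            rec = rec.next
        return None


class ArrayPrivates:
    """Per-object privates kept in an array indexed by the key's integer
    index; lookup is one indexed access no matter how many keys exist."""

    def __init__(self, nkeys):
        self.slots = [None] * nkeys

    def set(self, key_index, value):
        self.slots[key_index] = value

    def lookup(self, key_index):
        return self.slots[key_index]


# Usage: 20 privates attached to one object, looking up the earliest one.
keys = [object() for _ in range(20)]
by_list = ListPrivates()
by_array = ArrayPrivates(len(keys))
for i, k in enumerate(keys):
    by_list.set(k, f"priv{i}")
    by_array.set(i, f"priv{i}")

print(by_list.lookup(keys[0]), by_array.lookup(0))  # priv0 priv0
print(by_list.comparisons)  # 20: the whole list was walked
```

With many privates attached to hot objects (screens, pixmaps, GCs), the list walk is paid on every lookup, which is consistent with dixLookupPrivate() showing up at the top of the profiles in this report.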

Created an attachment (id=17599)
Profile running Xorg-1.5

It looks like a lot of cycles are burned in dixLookupPrivate() with 1.5. CC'ing Eamon Walsh, who said he's going to fix this known regression.

Thanks a lot :)

PS: The original report is a bit confusing: I am of course using Xorg-server-1.4.903, not Xorg-7.4.903.
And "On the Xorg-1.5 system..." should have read "On the Xorg-server-1.3 system"; sorry for the confusion.

Created an attachment (id=17603)
workload2 on 1.3

Created an attachment (id=17604)
workload2 on 1.5

Created an attachment (id=17605)
The "benchmark" itself

I tested a different workload, where I draw 12800 lines and 12800 single-height rects to an 8-bit pixmap (this is quite close to what my real-world workload will look like).

This time not many cycles are spent in dixLookupPrivate, but a lot of time is spent in pixman itself.

I attached the profiles, as well as my "benchmark".

Thanks, Clemens

(In reply to comment #7)
> This time not many cycles are spent in dixLookupPrivate, but a lot of time is
> spent in pixman itself.

The 'Msecs ellapsed' value varies wildly here, but I don't see any lasting CPU usage that would allow for a useful profile. Is that different from what you're seeing?

Either way, this seems likely to be a different issue from what you reported here initially, so it should probably be tracked separately.

It would also be nice if you could try these with the xserver master branch, which has some EXA optimizations over the 1.5 branch.

> I attached the profiles, as well as my "benchmark".

Next time, please include a Makefile instead of an x86 binary. :) (and add a toplevel directory to the tarball)

Created an attachment (id=17606)
minimal benchmark 2.0

The new version of the benchmark is stripped down to just line drawing and rects, compositing the result; nothing more.

There is now an infinite benchmark loop. The occasional strange results are caused by flawed timing code (it does not cope with overflow).
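The benchmark's timing code is not shown here, so the following is only a sketch under an assumption: if elapsed time is computed from a 32-bit unsigned millisecond counter, a naive `end - start` goes wrong whenever the counter wraps between the two samples, while modular subtraction stays correct as long as the real interval is shorter than 2^32 ms (about 49.7 days). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumed width of the millisecond counter


def elapsed_naive(start_ms, end_ms):
    # Wrong after a wrap: end is numerically smaller than start.
    return end_ms - start_ms


def elapsed_wrap_safe(start_ms, end_ms):
    # Subtract modulo 2**32, which is what unsigned 32-bit arithmetic
    # does in C; correct for any interval shorter than 2**32 ms.
    return (end_ms - start_ms) & MASK32


start, end = 0xFFFFFFF0, 0x40  # counter wrapped between the two samples
print(elapsed_naive(start, end))      # -4294967216
print(elapsed_wrap_safe(start, end))  # 80
```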

These are the results I get:
Xorg-1.3: 80ms
Xorg-1.5: 230ms
(GeForce6600: 40ms with nvidia's closed driver, not mine)

> Next time, please include a Makefile instead of an x86 binary. :) (and add a
> toplevel directory to the tarball)
Sorry about that; tarballs without a toplevel directory annoy me every time I encounter one, too.
I created a bash compile script instead, since I don't know make.

> Either way, this seems likely to be a different issue from what you reported
> here initially, so it should probably be tracked separately.
I'm not sure; maybe you could take another look with the new benchmark?

Thanks a lot, Clemens

Created an attachment (id=17632)
Possible solution for second benchmark

Does this patch help with the second benchmark? Here it greatly reduces the valid region tracking overhead and puts dixLookupPrivate back at the top of the profile. So this probably should have been a separate report, but that may be moot now anyway. :)

Created an attachment (id=17638)
oprofile results of the line benchmark with patch

Created an attachment (id=17639)
a real-world workload, with patch

Thanks a lot for the patch; performance is now on par with Xorg-1.3, even with dixLookupPrivate eating 25% of total cycles.
Sorry that I did not open a separate report; I was not sure how closely the two issues are connected.

Thanks again for fixing it, Clemens

I'm working on the dixLookupPrivate issue, hope to have a solution sometime soon.

I have an O(1) implementation done but the callers have to be changed slightly to accommodate it.

Anything new on this bug? According to oprofile, dixLookupPrivate seems to take over 50% of Xorg's time while rendering glxgears.

On Fri, Aug 29, 2008 at 01:16:13AM -0700, <email address hidden> wrote:
> Anything new about this bug - dixLookupPrivate seems to take over 50% of Xorg
> while rendering glxgears according to oprofile

We're in the middle of fixing this.

Created an attachment (id=18573)
array-index based devPrivates implementation

Please apply the attached patch to the current git master (including the changes from yesterday) and run your performance test again.

You'll need to be using in-tree drivers because out-of-tree drivers may not have changed the devPrivate keys to point to integer storage yet.
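The requirement that keys point to integer storage can be modelled roughly as follows. This is a hypothetical Python sketch, not the patch itself: `KeyStorage` stands in for a driver's integer that a `DevPrivateKey` points at, and the registry writes the key's array index into that storage the first time the key is used, assuming 0 means "not yet assigned". A key that does not point at writable integer storage cannot take part in such a scheme, which is why drivers had to adapt.

```python
class KeyStorage:
    """Stand-in for the integer a DevPrivateKey points at (hypothetical)."""

    def __init__(self):
        self.value = 0  # assumed convention: 0 means "no index assigned yet"


class PrivateRegistry:
    """Hands out array indices to keys on first use (hypothetical)."""

    def __init__(self):
        self.next_index = 1  # index 0 is kept free for the "unassigned" marker

    def index_for(self, key):
        if key.value == 0:
            # First use: store the new index in the key's own storage, so
            # every later lookup reads it directly instead of searching.
            key.value = self.next_index
            self.next_index += 1
        return key.value


registry = PrivateRegistry()
screen_key, pixmap_key = KeyStorage(), KeyStorage()
print(registry.index_for(screen_key))  # 1
print(registry.index_for(pixmap_key))  # 2
print(registry.index_for(screen_key))  # 1 again, read from the key itself
```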

I tested this patch, and it seems to bring the CPU usage of dixLookupPrivate down to a more acceptable level. Is anything in particular keeping it from being applied to master?

I went ahead and committed the patch, and sent a notice to the Xorg mailing list.

It could affect out-of-tree drivers that need to adjust to it, so I was waiting for some confirmation that it did in fact address the performance issues.

This also appears to have been backported to the 1.5 branch:
http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078

Should this be marked as fixed?

I noticed a very bad performance regression with xorg-server 1.5.1 compared to 1.4. For the detailed report, see:
http://lists.freedesktop.org/archives/xorg/2008-October/039277.html

Michel Dänzer suggests updating the xserver to a recent server-1.5-branch snapshot, which has a fix for this problem:
http://lists.freedesktop.org/archives/xorg/2008-October/039278.html

This fix was discussed at http://bugs.freedesktop.org/show_bug.cgi?id=16647

We should update the xserver to a recent server-1.5-branch snapshot, or at least backport the fix.

Changed in xorg-server:
status: Unknown → Confirmed
Oibaf (oibaf) on 2008-10-09
description: updated

Hi fabio-pedretti,

Please attach the output of `lspci -vvnn`, and attach your /var/log/Xorg.0.log file from after reproducing this issue. If you've made any customizations to your /etc/X11/xorg.conf please attach that as well.

Changed in xorg-server:
status: New → Incomplete
Oibaf (oibaf) wrote :

I am now using xserver-xorg-core 2:1.5.1-1ubuntu3 and having this problem.

Xorg log and output of `lspci -vvnn` are attached. My /etc/X11/xorg.conf is the default (except the EXA option when testing EXA).

Note that, according to the suggestions in the thread at:
http://lists.freedesktop.org/archives/xorg/2008-October/039277.html
it appears that this patch could fix the performance problem:
http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078

The fix was applied to the 1.5 branch but was later reverted. It appears that the fix also requires another patch:
http://lists.freedesktop.org/archives/xorg/2008-October/039285.html
commit http://cgit.freedesktop.org/xorg/xserver/commit/?id=ebea78cdba0ff14a397239ee1936bd254c181e1b
(and maybe also rebuilt drivers).

Oibaf (oibaf) wrote :
Changed in xorg-server:
status: Incomplete → New
Oibaf (oibaf) wrote :
description: updated

> --- Comment #22 from Fabio <email address hidden> 2008-10-09 03:03:40 PST ---
> This also appears to have been bakported to 1.5 branch:
> http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.5-branch&id=8ef37c194fa08d3911095299413a42a01162b078
>
It's been reverted because it breaks the ABI.

Oibaf (oibaf) on 2008-10-20
description: updated
Changed in xorg-server:
status: Confirmed → Fix Released
Oibaf (oibaf) wrote :
description: updated
Oibaf (oibaf) wrote :

There are some nice speedups with 1.5.3 compared to the current default Ubuntu 1.5.2. I am using the xorg-server 2:1.5.3~git-1ubuntu0tormod4 packages from:
https://edge.launchpad.net/~xorg-edgers/+archive

Performance test with "gtkperf -a -c 500":

default 1.5.2 with EXA without compiz:

GtkEntry - time: 0,21
GtkComboBox - time: 7,12
GtkComboBoxEntry - time: 3,91
GtkSpinButton - time: 1,75
GtkProgressBar - time: 1,56
GtkToggleButton - time: 1,06
GtkCheckButton - time: 0,77
GtkRadioButton - time: 1,21
GtkTextView - Add text - time: 13,98
GtkTextView - Scroll - time: 2,01
GtkDrawingArea - Lines - time: 5,73
GtkDrawingArea - Circles - time: 4,15
GtkDrawingArea - Text - time: 9,97
GtkDrawingArea - Pixbufs - time: 0,54
 ---
Total time: 53,98

1.5.3~git-1ubuntu0tormod4 with EXA without compiz:

GtkEntry - time: 0,20
GtkComboBox - time: 6,34
GtkComboBoxEntry - time: 3,95
GtkSpinButton - time: 1,09
GtkProgressBar - time: 1,03
GtkToggleButton - time: 1,00
GtkCheckButton - time: 0,87
GtkRadioButton - time: 1,28
GtkTextView - Add text - time: 13,76
GtkTextView - Scroll - time: 1,90
GtkDrawingArea - Lines - time: 5,62
GtkDrawingArea - Circles - time: 4,14
GtkDrawingArea - Text - time: 6,92
GtkDrawingArea - Pixbufs - time: 0,36
 ---
Total time: 48,47

default 1.5.2 with EXA with compiz:

GtkEntry - time: 0,23
GtkComboBox - time: 7,12
GtkComboBoxEntry - time: 4,96
GtkSpinButton - time: 2,43
GtkProgressBar - time: 2,32
GtkToggleButton - time: 1,59
GtkCheckButton - time: 1,48
GtkRadioButton - time: 1,85
GtkTextView - Add text - time: 14,28
GtkTextView - Scroll - time: 2,86
GtkDrawingArea - Lines - time: 8,02
GtkDrawingArea - Circles - time: 6,60
GtkDrawingArea - Text - time: 14,47
GtkDrawingArea - Pixbufs - time: 1,64
 ---
Total time: 69,87

1.5.3~git-1ubuntu0tormod4 with EXA with compiz:

GtkEntry - time: 0,13
GtkComboBox - time: 4,43
GtkComboBoxEntry - time: 3,71
GtkSpinButton - time: 1,13
GtkProgressBar - time: 1,16
GtkToggleButton - time: 0,88
GtkCheckButton - time: 0,84
GtkRadioButton - time: 1,16
GtkTextView - Add text - time: 13,82
GtkTextView - Scroll - time: 1,75
GtkDrawingArea - Lines - time: 7,68
GtkDrawingArea - Circles - time: 6,96
GtkDrawingArea - Text - time: 8,41
GtkDrawingArea - Pixbufs - time: 0,69
 ---
Total time: 52,72
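To compare runs like these without doing the arithmetic by hand, a small Python sketch can parse gtkperf's `Name - time: N,NN` lines (which use a comma as the decimal separator in this locale) and report how much time each test saved. The regular expression only assumes the line format shown above; the function names are hypothetical.

```python
import re

# "GtkTextView - Add text - time: 13,98" -> ("GtkTextView - Add text", "13,98")
LINE_RE = re.compile(r"^(?P<name>.+?) - time:\s*(?P<secs>\d+,\d+)$")


def parse_gtkperf(text):
    """Return {test name: seconds}; "Total time" lines are not matched."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            results[m.group("name")] = float(m.group("secs").replace(",", "."))
    return results


def percent_time_saved(old, new):
    """Reduction in run time relative to the old run, in percent."""
    return (old - new) / old * 100.0


old = parse_gtkperf("GtkDrawingArea - Text - time: 9,97\n"
                    "GtkDrawingArea - Pixbufs - time: 0,54")
new = parse_gtkperf("GtkDrawingArea - Text - time: 6,92\n"
                    "GtkDrawingArea - Pixbufs - time: 0,36")
for name, secs in old.items():
    print(f"{name}: {percent_time_saved(secs, new[name]):.1f}% less time")
```

Applied to the totals above, the same function gives about 10.2% less time without compiz (53.98 s to 48.47 s) and about 24.5% less with compiz (69.87 s to 52.72 s).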

Bryce Harrington (bryce) on 2008-11-14
Changed in xorg-server:
status: New → Confirmed
jox (myemail-1) wrote :

Is there any hope of getting a fix for this bug in 8.10?
It slows down so many things that using Ubuntu is sometimes a real pain.

Oibaf (oibaf) on 2008-11-25
description: updated
description: updated
description: updated

Michel Dänzer's patch was never applied. Does it still help the performance any?

W.r.t. the dixLookupPrivate issue, which seems to be more the focus of this bug report: I'm still seeing about 11% of world time used up in dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running (it uses cairo, not XDrawLine).

Bryce Harrington (bryce) wrote :

We now carry the newer xserver in Jaunty:

xorg-server | 2:1.5.99.901-0ubuntu1 | http://se.archive.ubuntu.com jaunty/main Sources

Changed in xorg-server:
importance: Undecided → High
status: Confirmed → Triaged
status: Triaged → Fix Released
Oibaf (oibaf) wrote :

I already tested jaunty, but I left this bug open since I noticed it's still slower than 1.4, although faster than 1.5.2. Also note this comment on the upstream bug:
https://bugs.freedesktop.org/show_bug.cgi?id=16647#c24

(In reply to comment #24)
> Michel Dänzer's patch was never applied.

I pushed it to the master branch.

> W.r.t, the dixLookupPrivate issues - which seems to be more the focus of this
> bug report, I'm still seeing about 11% of world time used up in
> dixLookupPrivate with xorg 1.5.99.3 for the line rendering test I'm running.
> (Uses cairo, not XDrawLine)

I'm working on reducing the dixLookupPrivate calls in EXA by passing around the private pointers internally where possible. I'll hopefully have it ready for review soon, but I'm not sure how much of the overhead you're seeing it'll eliminate.
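The idea of passing private pointers around internally can be sketched as follows. This is a hypothetical Python model, not EXA code: `lookup_private()` stands in for a `dixLookupPrivate()` call and counts how often it runs. In the first version every helper repeats the lookup; in the second the entry point looks the private up once and hands it to the helpers, so the number of lookups no longer grows with the number of helper calls.

```python
class Screen:
    """Hypothetical object whose private data is fetched via a counted lookup."""

    def __init__(self, private):
        self._private = private
        self.lookups = 0

    def lookup_private(self):
        self.lookups += 1  # stands in for one dixLookupPrivate() call
        return self._private


def scale_with_lookup(screen, x):
    priv = screen.lookup_private()  # every helper call repeats the lookup
    return priv["scale"] * x


def draw_repeated_lookups(screen, xs):
    return [scale_with_lookup(screen, x) for x in xs]


def scale_with_priv(priv, x):
    return priv["scale"] * x  # uses the private it was handed


def draw_passed_private(screen, xs):
    priv = screen.lookup_private()  # one lookup at the entry point
    return [scale_with_priv(priv, x) for x in xs]


a, b = Screen({"scale": 2}), Screen({"scale": 2})
same = draw_repeated_lookups(a, range(100)) == draw_passed_private(b, range(100))
print(same, a.lookups, b.lookups)  # True 100 1
```

The cost of this approach is that internal helper signatures change to take the private explicitly, which is why it only applies "where possible".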

Changed in xorg-server:
importance: Unknown → Medium
Changed in xorg-server:
importance: Medium → Unknown
Changed in xorg-server:
importance: Unknown → Medium