Comment 158 for bug 252094

Carl Worth (cworth) wrote:

I've done some testing with OpenSUSE 11.1 Beta5 (the KDE4 LiveCD) and identified some performance issues. I haven't compared against XAA yet; instead I compared performance against my standard development build, which currently consists of:

Linux 2.6.28-rc4 (from git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel for-review branch)
X server 1.5.99.1 (recent master)
xf86-video-intel 2.4.97 (recent master)

For testing I took x11perf with a selected set of tests (eliminating all of the uninteresting core rendering tests such as stippled fills, wide lines, ellipses, and arcs). I ran these tests against my "master" builds and then the same tests on the same hardware after booting the OpenSUSE live CD.

Finally, I manually scanned the results looking for cases where the performance differed by roughly 2x or more. Below is a sorted list of the differences, showing for each test the "master" performance on the first line and the OpenSUSE performance on the second. Each entry also quantifies the relative performance, where "slowdown" means that the OpenSUSE performance is slower than the "master" performance (note that in two cases there is actually a speedup instead).
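For anyone wanting to reproduce the comparison, the relative-performance figure is just the ratio of the two rates, with the larger rate on top. Here is a minimal sketch of that arithmetic in shell. The helper names `rel_perf` and `interesting` are made up for illustration and are not part of x11perf, and the 2x cutoff is simply the threshold described above.

```shell
#!/bin/sh
# rel_perf MASTER_RATE OTHER_RATE   (hypothetical helper, not part of x11perf)
# Prints "N.Nx slowdown" when OTHER is slower than MASTER, otherwise "N.Nx speedup".
rel_perf() {
    awk -v m="$1" -v o="$2" 'BEGIN {
        m += 0; o += 0                      # force numeric comparison
        if (m >= o) printf "%.1fx slowdown\n", m / o
        else        printf "%.1fx speedup\n",  o / m
    }'
}

# interesting MASTER_RATE OTHER_RATE   (hypothetical helper)
# Succeeds (exit status 0) when the two rates differ by 2x or more.
interesting() {
    awk -v m="$1" -v o="$2" 'BEGIN {
        m += 0; o += 0
        r = (m >= o) ? m / o : o / m
        exit ((r >= 2) ? 0 : 1)
    }'
}

rel_perf 126000.0 6280.0    # -putimage10:  prints "20.1x slowdown"
rel_perf 1350.0 6240.0      # -getimage100: prints "4.6x speedup"
```

The rates passed in the last two lines are the -putimage10 and -getimage100 figures from the results in this comment.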

The next step would be to profile some of the slowest tests, or perhaps to switch out one or more components to see what's contributing to the performance difference. Any contribution to those efforts would be most appreciated, as would any verification of these test results or similar testing with XAA.

The copywinwin test is likely the most fundamental, and it is perhaps at the root of several of the other slowdowns.

Here's an x11perf command line that can be used to quickly obtain results for just the tests that seem interesting:

x11perf -repeat 2 \
  -aatrap1 -aatrap10 -aatrap2x1 -aatrap2x10 \
  -aa10text -aa24text -rgb10text -rgb24text \
  -scroll10 -scroll100 \
  -copywinwin10 -copywinwin100 \
  -copypixwin10 -copypixwin100 \
  -putimage10 -putimage100 -putimage500 \
  -shmput10 -shmput100 -shmput500 \
  -getimage100 -getimage500 \
  -compwinwin10 -compwinwin100 \
  -comppixwin10 -comppixwin100
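
To compare two runs, the per-test rate has to be pulled out of each saved x11perf log. The following is only a rough sketch: it assumes each result line ends in a "(RATE/sec): description" pair, and the helper name `extract_rates` and the sample line (its rep count and timing in particular) are made up for illustration. Check it against your own x11perf output before relying on it.

```shell
#!/bin/sh
# extract_rates   (hypothetical helper, not part of x11perf)
# Reads x11perf output on stdin and prints "description: RATE/sec" for every
# line that contains a "(RATE/sec): description" pair; other lines are dropped.
extract_rates() {
    sed -n 's|^.*(\([0-9.][0-9.]*\)/sec): \(.*\)$|\2: \1/sec|p'
}

# Illustrative input only: one non-result line and one result line in the assumed format.
printf '%s\n' \
    'Sync time adjustment is 0.1066 msecs.' \
    '  600000 reps @   0.0073 msec (137000.0/sec): Copy 10x10 from window to window' \
    | extract_rates
# prints "Copy 10x10 from window to window: 137000.0/sec"
```

Running this over the log from each configuration gives two lists of rates that can then be compared test by test.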

And here are the results I obtained:

-aatrap2x1: 316000.0/sec
                 70900.0/sec 4.5x slowdown

-putimage10: 126000.0/sec
                  6280.0/sec 20.1x slowdown

-copywinwin10: 137000.0/sec
                  7570.0/sec 18.1x slowdown

-compwinwin10: 125000.0/sec
                  7520.0/sec 16.6x slowdown

-comppixwin10: 124000.0/sec
                  8360.0/sec 14.8x slowdown

-copypixwin10: 125000.0/sec
                 10000.0/sec 12.5x slowdown

-scroll10: 139000.0/sec
                 11500.0/sec 12.1x slowdown

-shmput10: 112000.0/sec
                 10700.0/sec 10.5x slowdown

-getimage100: 1350.0/sec
                  6240.0/sec 4.6x speedup (!)

-putimage100: 9420.0/sec
                  2260.0/sec 4.2x slowdown

-aatrap1: 325000.0/sec
                 79700.0/sec 4.1x slowdown

-aa24text: 57700.0/sec
                 14600.0/sec 4.0x slowdown

-shmput100: 14900.0/sec
                  4270.0/sec 3.5x slowdown

-aatrap10: 89800.0/sec
                 25400.0/sec 3.5x slowdown

-copywinwin100: 19100.0/sec
                  5750.0/sec 3.3x slowdown

-compwinwin100: 19100.0/sec
                  5730.0/sec 3.3x slowdown

-aa10text: 85100.0/sec
                 26200.0/sec 3.2x slowdown

-rgb10text: 76200.0/sec
                 24900.0/sec 3.0x slowdown

-comppixwin100: 18200.0/sec
                  6280.0/sec 2.9x slowdown

-scroll100: 19000.0/sec
                  6850.0/sec 2.8x slowdown

-aatrap2x10: 97000.0/sec
                 35200.0/sec 2.8x slowdown

-rgb24text: 45200.0/sec
                 16800.0/sec 2.7x slowdown

-copypixwin100: 18400.0/sec
                  6750.0/sec 2.7x slowdown

-shmput500: 1070.0/sec
                   420.0/sec 2.5x slowdown

-putimage500: 394.0/sec
                   161.0/sec 2.4x slowdown

-getimage500: 55.8/sec
                   106.0/sec 1.9x speedup (!)