Image corruption in Firefox when using "sna"

Bug #1144558 reported by Sebastien Bacher on 2013-03-04
This bug affects 13 people
Affects                            Status        Importance  Assigned to  Milestone
xf86-video-intel                   Fix Released  Medium
xserver-xorg-video-intel (Ubuntu)                High        Unassigned
Raring                                           High        Unassigned

Bug Description

Since "sna" came into use in raring, users have been reporting pixmap corruption in Firefox. I see the issue on an i5 CPU. The same problem has been reported by i965 users as well.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: xserver-xorg-video-intel 2:2.21.3-0ubuntu1
ProcVersionSignature: Ubuntu 3.8.0-8.17-generic 3.8.0
Uname: Linux 3.8.0-8-generic i686
.tmp.unity.support.test.0:

ApportVersion: 2.9-0ubuntu2
Architecture: i386
CompizPlugins: [core,composite,opengl,compiztoolbox,decor,snap,gnomecompat,mousepoll,place,session,resize,move,wall,grid,imgpng,vpswitch,unitymtgrabhandles,regex,animation,expo,fade,workarounds,scale,ezoom,unityshell]
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
Date: Mon Mar 4 16:00:20 2013
DistUpgraded: Fresh install
DistroCodename: raring
DistroVariant: ubuntu
EcryptfsInUse: Yes
ExtraDebuggingInterest: Yes
GraphicsCard:
 Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:040a]
InstallationDate: Installed on 2010-10-09 (877 days ago)
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
MachineType: Dell Inc. Latitude E6410
MarkForUpload: True
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
PlymouthDebug: Error: [Errno 13] Permission denied: '/var/log/plymouth-debug.log'
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-8-generic root=UUID=555ebc11-d747-44d3-af56-5e7d17851ce3 ro quiet splash vt.handoff=7
SourcePackage: xserver-xorg-video-intel
UpgradeStatus: No upgrade log present (probably fresh install)
XorgConf:
 Section "Device"
             Identifier "intel"
             Driver "intel"
             Option "Accelmethod" "uxa"
     EndSection
dmi.bios.date: 11/30/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A11
dmi.board.name: 0HNGW4
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA11:bd11/30/2011:svnDellInc.:pnLatitudeE6410:pvr0001:rvnDellInc.:rn0HNGW4:rvr:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E6410
dmi.product.version: 0001
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.9~daily13.03.01-0ubuntu1
version.libdrm2: libdrm2 2.4.42-0ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.0.2-0ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.0.2-0ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.13.2-0ubuntu2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.1.0-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.21.3-0ubuntu1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.6-0ubuntu3

Created attachment 75706
lspci -vvv

Hello.

Since upgrading the intel driver from 2.20.18, page previews in Firefox are rendered improperly (see attached screenshot). Tested versions of the intel driver are 2.20.{18,19} and 2.21.{0,2,3}; Firefox versions are 17.0-19.0. I don't think it is a Firefox issue, as it is completely gone when downgrading back to 2.20.18.

My system is Gentoo amd64, currently with latest Firefox and intel driver. My current kernel version is 3.8 and it is vanilla. I am using SNA acceleration.
If there is any additional info that would be helpful I am ready to provide it.

Created attachment 75707
glxinfo -l -t

Created attachment 75708
Screenshot with example of corrupted rendering

I still haven't been able to reproduce this one yet. Do you have a foolproof (and remember just how big a fool I am!) recipe?

This issue happens occasionally, but I don't have a 100% reproducible way to show it. One of the most successful ways to reproduce it is:

1. make all `speed dial` buttons (previews on about:newtab) in Firefox filled with something reasonably heavy, not plain-text pages (on my machine it is a couple of youtube pages, web interface to SAGE, couple of redmines, etc.)
2. close all tabs except one and this last one tab should be about:newtab page
3. middle-click all the previews as fast as you can one by one, so the pages begin to load in background
4. now hit Ctrl+W until you have closed everything, including the about:newtab page where you started. Don't wait for the pages opened in step 3 to finish loading.
5. now open about:newtab again; there is a good chance some of the previews will be corrupted. Sometimes there is no corruption, but a preview is displayed in the wrong position, for example two different sites share the same preview image.

Another way to reproduce:

1. make at least one `speed dial` button (preview on about:newtab) in Firefox filled with any kind of preview, just any site you want
2. close all tabs except one and this last one tab should be about:newtab page
3. go to http://www.dreamworksanimation.com/ and add it to bookmarks, then close tab (sorry, bookmarking is the only way I know to make a specific site to show up in previews)
4. open about:newtab again and remove any preview image from it by pressing [X]
5. open bookmarks and drag the dreamworksanimation bookmark you made in step 3 into the slot freed in step 4
6. now visit http://www.dreamworksanimation.com/ so Firefox will generate a preview
7. close the tab and open about:newtab again. The preview for dreamworksanimation should be corrupted

Sorry if the descriptions are a bit messy. Also I don't have any other issues with firefox sites rendering, just issues with rendering previews. I wish there was an easier way to reproduce it.

I did git bisecting between 2.20.18 and 2.20.19 and the result is this commit:

dc643ef753bcfb69685f1eb10828d0c8f830c30e is the first bad commit
commit dc643ef753bcfb69685f1eb10828d0c8f830c30e
Author: Chris Wilson <email address hidden>
Date: Thu Jan 17 12:27:55 2013 +0000

    sna: Apply read-only synchronization hints for move-to-cpu

    Signed-off-by: Chris Wilson <email address hidden>

:040000 040000 0f53950ba9a9756a39722f12c322c2d629c1a2a4 d5ff0a7307cc718ee94c78ee2fb1c9bf6158ed91 M src

As this bug is not 100% reproducible, it could have slipped past me during some bisect runs; however, it is something to start with. What do you think? Could this commit lead to the rendering problems I have?
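
Because the corruption is intermittent, a single clean run says little about a revision during bisection. One way to guard against a false "good" is to repeat the reproduction several times per revision and call the revision bad on the first failure. The following is only a minimal sketch: the helper name `repeat_until_bad`, the file `helper.sh`, the script `./try-to-reproduce.sh` and the tag names used as bisect endpoints are assumptions for illustration. In this thread the check was done by eye, so any scripted check is hypothetical.

```shell
# Run a reproduction command up to N times.
# Returns 1 ("bad" for `git bisect run`) as soon as one attempt fails,
# and 0 ("good") only if every attempt passes.
repeat_until_bad() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}

# Hypothetical usage inside an xf86-video-intel checkout
# (tag names and script names are assumptions):
#   git bisect start
#   git bisect bad 2.20.19
#   git bisect good 2.20.18
#   git bisect run sh -c '. ./helper.sh && repeat_until_bad 10 ./try-to-reproduce.sh'
```

The more attempts per revision, the lower the chance of marking a bad commit as good, at the cost of a slower bisect.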

There was a related bug, fixed with

commit 19bd005056a2083de64753681b96716996e4237d
Author: Chris Wilson <email address hidden>
Date: Fri Feb 22 12:05:04 2013 +0000

    sna: Avoid migrating and making the GPU bo busy prior to mmapping it

    References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1131134
    Signed-off-by: Chris Wilson <email address hidden>

that I thought was already in 2.21.3 and so you had tested it. It is actually in master, so can you try compiling from git and checking if that fixes the issue?

I'll admit to not being able to fully explain how that prevented the corruption, as the damage should have been migrated and then the kernel should have stalled upon the read... But it did have an effect and prevented a similar issue that bisected to the same commit.

(In reply to comment #6)
> It is
> actually in master, so can you try compiling from git and checking if that
> fixes the issue?

I've just tested master and the issue is still there.

Created attachment 75871
xf86-video-intel-2.21.3-revert-dc643ef753bcfb69685f1eb10828d0c8f830c30e.patch

With this patch applied on top of xf86-video-intel-2.21.3 the problem is gone (at least I tried hard to reproduce it, but failed). This patch is simply reverting dc643ef753bcfb69685f1eb10828d0c8f830c30e commit mentioned above.

Can you try converting each of those kgem_bo_sync__cpu_full() back to kgem_bo_sync__cpu() individually and see if we can narrow it down to one particular path?

Created attachment 75892
Force CPU synchronisation after writes

Another test to try.

Sebastien Bacher (seb128) wrote :

(the xorg.conf there has uxa since I was asked to try if that fixes the corruption issue, and it does)
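
To check which acceleration method a given configuration selects, the Device section can be scanned for the option. This is a rough sketch, not a full xorg.conf parser; the function name `accel_method` is made up for illustration, and it only handles the double-quoted form used in the report above.

```shell
# Print the AccelMethod value(s) found in an xorg.conf-style file (or stdin),
# lowercased. The option name is matched case-insensitively, since the
# report spells it "Accelmethod".
accel_method() {
    awk -F'"' '
        $1 ~ /^[ \t]*Option[ \t]*$/ && tolower($2) == "accelmethod" {
            print tolower($4)
        }
    ' "$@"
}

# Hypothetical usage:
#   accel_method /etc/X11/xorg.conf
```

No output means the option is not set and the driver's built-in default applies.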

Chris Coulson (chrisccoulson) wrote :

Here's an example from bugzilla.mozilla.org

Chris Wilson (ickle) wrote :

Please separate out gen4 reports as that GPU has known issues that need to be addressed.

Sebastien Bacher (seb128) wrote :

sorry but what chipsets are gen4? that bug is about the i5 issue and we should open a new one about i965?

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
status: New → Triaged
Chris Wilson (ickle) wrote :

There's a transient misrender in gen4 (a flicker that gets redrawn differently every time, normally correctly): 1098489. I mention it so that we don't automatically confuse the two, though this upstream bug will affect all gen.

Chris Wilson (ickle) wrote :

Also there are some patches in ppa:xorg-edgers that will help, but I'm still hunting the root cause.

Sebastien Bacher (seb128) wrote :

@Chris: ok, thanks for the details, let me know if I can help testing a fix!

(In reply to comment #11)
> Created attachment 75892
> Force CPU synchronisation after writes
>
> Another test to try.

With this patch applied on top of 2.21.3 the problem seems to be fixed.

Created attachment 75920
kgem_bo_sync__cpu_full-revert-bad.patch

(In reply to comment #10)
> Can you try converting each of those kgem_bo_sync__cpu_full() back to
> kgem_bo_sync__cpu() individually and see if we can narrow it down to one
> particular path?

With this patch on top of 2.21.3 I hit the bug almost immediately. In this case I left the first kgem_bo_sync__cpu_full() as is and converted only the second one.

Created attachment 75921
kgem_bo_sync__cpu_full-revert-good.patch

(In reply to comment #10)
> Can you try converting each of those kgem_bo_sync__cpu_full() back to
> kgem_bo_sync__cpu() individually and see if we can narrow it down to one
> particular path?

With this patch on top of 2.21.3 I was no longer able to reproduce the bug. In this case I converted the first kgem_bo_sync__cpu_full() and left the second one as is.

I've looked through all callers to see if I can find one that missed the MOVE_WRITE, to no avail. I've double-checked the kernel to see if there is a loophole, again to no avail. So I'm a little bit lost as to where the missed synchronisation is coming from, and I haven't yet thought of a good test to force/catch an error.

In the meantime, I've applied one minor tweak to xf86-video-intel.git,

commit 60ec35b8d25ecfabf1744ea7bc81109d7f2a90e2
Author: Chris Wilson <email address hidden>
Date: Tue Mar 5 11:14:37 2013 +0000

    sna: Be explicit when checking for an idle bo after CPU synchronisation

Do you mind giving that a quick test?

Also one other test is to try with the drm-intel-next kernel.

(In reply to comment #15)
> I've looked through all callers to see if I can find one that missed the
> MOVE_WRITE to no avail. I've double checked the kernel to see if there is a
> loop hole, again to no avail. So I'm a little bit lost to see where the
> missed synchronisation is coming from, and I haven't yet thought of a good
> test to force/catch an error.
>
> In the meantime, I've applied one minor tweak to xf86-video-intel.git,
>
> commit 60ec35b8d25ecfabf1744ea7bc81109d7f2a90e2
> Author: Chris Wilson <email address hidden>
> Date: Tue Mar 5 11:14:37 2013 +0000
>
> sna: Be explicit when checking for an idle bo after CPU synchronisation
>
> Do you mind giving that a quick test?

OK, I'll test it later today

(In reply to comment #16)
> Also one other test is to try with the drm-intel-next kernel.

Could you please give me a quick link to their git repo?
Would 3.9-rc1 be enough?

Created attachment 76040
Disable read-read optimisations

And one last request, can you please test this patch as a temporary solution?

(In reply to comment #20)
> Created attachment 76040
> Disable read-read optimisations
>
> And one last request, can you please test that this patch as a temporary
> solution?

This patch also fixes the issue. It was tested on 3.7.10 kernel as well as all previous patches. Now gonna try with drm-intel-next.

(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 76040
> > Disable read-read optimisations
> >
> > And one last request, can you please test that this patch as a temporary
> > solution?
>
> This patch also fixes the issue. It was tested on 3.7.10 kernel as well as
> all previous patches. Now gonna try with drm-intel-next.

Thanks. In the meantime, I'm going to push the temporary workaround - obviously I still hope to find the real bug.

Chris Wilson (ickle) wrote :

I've pushed a workaround for what I think is this bug to xf86-video-intel, can people try xorg-edgers in the next day or so and see if it fixes this one as well?

Changed in xserver-xorg-video-intel (Ubuntu Raring):
status: Triaged → In Progress

(In reply to comment #16)
> Also one other test is to try with the drm-intel-next kernel.

Ok, just tried out today's drm-intel-next kernel and was unable to reproduce this bug anymore. This sounds like good news.

(In reply to comment #23)
> (In reply to comment #16)
> > Also one other test is to try with the drm-intel-next kernel.
>
> Ok, just tried out today's drm-intel-next kernel and was unable to reproduce
> this bug anymore. This sounds like good news.

Oh, wait, I forgot to rebuild xf86-video-intel without patch. Sorry. Will try vanilla now

/o\ Can you confirm that result with vanilla xf86-video-intel?

(In reply to comment #25)
> /o\ Can you confirm that result with vanilla xf86-video-intel?

Sorry to disappoint you, but the issue is reproducible with vanilla xf86-video-intel and drm-intel-next.

Sebastien Bacher (seb128) wrote :

Can I rebuild the new intel package on raring? edgers has the new xserver, right? I would prefer to stay on what is shipping in raring if that's possible; I'm happy to rebuild the intel package and test it though

Chris Wilson (ickle) wrote :

Of course you can, I just picked xorg-edgers for convenience.

Sebastien Bacher (seb128) wrote :

I still get the issue with that version:
ii xserver-xorg-video-intel 2:2.21.3+git20130306.779fc0b2-0ubuntu0sarvatt i386 X.Org X server -- Intel i8xx, i9xx display driver

(took the source from the xorg-edgers ppa and rebuilt it locally on raring)

Sebastien Bacher (seb128) wrote :

(the version seems older than the comment about the fix so I will wait for/try with the next update)
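
The snapshot date and commit can be read straight off a package version of this shape, which makes it easy to compare against the date of a fix. A small sketch, assuming the `+gitYYYYMMDD.<hash>` pattern seen in the version string above; the function name `snapshot_info` is made up for illustration.

```shell
# Print "<date> <short-hash>" for a version string containing
# "+gitYYYYMMDD.<hex hash>"; print nothing if the pattern is absent.
snapshot_info() {
    printf '%s\n' "$1" |
        sed -n 's/.*+git\([0-9]\{8\}\)\.\([0-9a-f]\{1,\}\).*/\1 \2/p'
}

# Usage with the version quoted above:
#   snapshot_info '2:2.21.3+git20130306.779fc0b2-0ubuntu0sarvatt'
```

For the version quoted above this yields the date 20130306 and the short hash 779fc0b2.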

Chris Wilson (ickle) wrote :

Right, it will be in the following update to xorg-edgers. You can just build it from xf86-video-intel.git... :)

bugbot (bugbot) on 2013-03-07
tags: added: corruption

(In reply to comment #22)
> Thanks. In the meantime, I'm going to push the temporary workaround -
> obviously I still hope to find the real bug.

Is there a way I can help? Attach some debug info or test something?

(In reply to comment #27)
> (In reply to comment #22)
> > Thanks. In the meantime, I'm going to push the temporary workaround -
> > obviously I still hope to find the real bug.
>
> Is there a way I can help? Attach some debug info or test something?

If you change the define in src/sna/sna_accel.c:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index ae6d3c1..5edad51 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -57,7 +57,7 @@
 #define FORCE_INPLACE 0
 #define FORCE_FALLBACK 0
 #define FORCE_FLUSH 0
-#define FORCE_FULL_SYNC 1 /* https://bugs.freedesktop.org/show_bug.cgi?id=61628 */
+#define FORCE_FULL_SYNC 0

 #define DEFAULT_TILING I915_TILING_X

that restores the buggy behaviour. Could you keep running with that patch and with --enable-debug, check whether any assertions are triggered, and see how things progress?
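
Before building, it can be worth confirming which value of the define a source tree actually carries, since it is easy to test the wrong variant by accident. A minimal sketch; the function name `force_full_sync_value` is made up, and the build commands in the comment assume the usual autotools flow rather than anything stated in this thread.

```shell
# Print the value of FORCE_FULL_SYNC as defined in the given source file.
force_full_sync_value() {
    awk '$1 == "#define" && $2 == "FORCE_FULL_SYNC" { print $3; exit }' "$1"
}

# Hypothetical usage from the top of an xf86-video-intel checkout:
#   force_full_sync_value src/sna/sna_accel.c   # 0 = optimised path, 1 = workaround
#   ./autogen.sh --enable-debug && make         # assumed build steps
```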

(In reply to comment #28)
> If you can keep running with that patch
> and with --enable-debug to check if any assertions are triggered and see how
> things progress.

OK, I did what you said, powered on and started to watch Xorg.0.log.

The first thing I did was to open Firefox and trigger this issue several times - no output.
Then I've tried to simulate some typical workflow i.e. opened programs I use on a daily basis and do some things inside them like checking mail, browsing a couple of webpages - still no output.
Then I've decided to close them and return to Firefox and again triggered this issue several times and opened a couple of heavy tabs with flash and suddenly caught this:

(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x5969b4]
(EE) 1: /usr/bin/X (mieqEnqueue+0x263) [0x5776c3]
(EE) 2: /usr/bin/X (0x400000+0x4fcd4) [0x44fcd4]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f236e1d0000+0x6208) [0x7f236e1d6208]
(EE) 4: /usr/bin/X (0x400000+0x7a477) [0x47a477]
(EE) 5: /usr/bin/X (0x400000+0xa5527) [0x4a5527]
(EE) 6: /lib64/libpthread.so.0 (0x3a9c400000+0x10bf0) [0x3a9c410bf0]
(EE) 7: /lib64/libc.so.6 (ioctl+0x7) [0x3a9bce3437]
(EE) 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x3fd3c040d8]
(EE) 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f236fd9c000+0x1c1a0) [0x7f236fdb81a0]
(EE) 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f236fd9c000+0x1d9f7) [0x7f236fdb99f7]
(EE) 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f236fd9c000+0x4fe3a) [0x7f236fdebe3a]
(EE) 12: /usr/bin/X (BlockHandler+0x44) [0x43f224]
(EE) 13: /usr/bin/X (WaitForSomething+0x11d) [0x593e7d]
(EE) 14: /usr/bin/X (0x400000+0x3ade2) [0x43ade2]
(EE) 15: /usr/bin/X (0x400000+0x29b5a) [0x429b5a]
(EE) 16: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3a9bc2460d]
(EE) 17: /usr/bin/X (0x400000+0x29eb1) [0x429eb1]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
[ 8739.251] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 8739.251] [mi] EQ processing has resumed after 64 dropped events.
[ 8739.251] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.

After that I've tried to reproduce this trace again opening same tabs and triggering issue again and again, but without any luck. Is this stack trace useful in any way?
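
When reading a backtrace like the one above, a quick per-module frame count shows which components are on the stack at all. A rough sketch that assumes the `(EE) N: /path (...)` line shape shown in the log; the function name `backtrace_modules` is made up for illustration.

```shell
# Count backtrace frames per module in an Xorg log (file or stdin).
# Output: "<frames> <module path>", most frequent first.
backtrace_modules() {
    awk '
        {
            # Find "(EE)" followed by a frame index such as "3:";
            # the module path is the field right after the index.
            for (i = 1; i < NF; i++) {
                if ($i == "(EE)" && $(i + 1) ~ /^[0-9]+:$/) {
                    count[$(i + 2)]++
                    break
                }
            }
        }
        END {
            for (path in count) print count[path], path
        }
    ' "$@" | sort -k1,1nr -k2,2
}

# Hypothetical usage:
#   backtrace_modules /var/log/Xorg.0.log
```

A frame count says nothing about blame on its own; it only helps decide which modules are worth a closer look.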

Hmm, I expect dmesg to contain a GPU hang and /sys/kernel/debug/dri/0/i915_error_state to be populated, mind attaching it?

(In reply to comment #30)
> Hmm, I expect dmesg to contain a GPU hang and
> /sys/kernel/debug/0/i915_error_state to be populated, mind attaching it?

Too bad I turned off my machine later after I've caught that stack trace, so I can't give you the dump of i915_error_state, but I was checking both dmesg and xsession-errors and there was nothing unusual and no signs of error output from i915.

I'll try to catch it again and if I do I'll attach dmesg and dump of i915_error_state here.
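
To avoid losing this information again, the kernel log can be filtered for hang reports before shutting down. This is only a sketch: the exact wording of i915 hang messages differs between kernel versions, so the patterns below are assumptions, and the function name `gpu_hang_lines` is made up for illustration.

```shell
# Print kernel log lines that look like a GPU hang report.
# Exit status is non-zero when nothing matches (standard grep behaviour).
gpu_hang_lines() {
    grep -i -E 'gpu (hung|hang)|hangcheck' "$@"
}

# Hypothetical usage:
#   dmesg | gpu_hang_lines || echo "no hang reported in dmesg"
```

If anything matches, that is the moment to save a copy of the error state file mentioned above, since it does not survive a reboot.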

Pete Graner (pgraner) wrote :

I hit this bug on the date time setting screen. See attached screenshot (plus & minus buttons)

Chris Wilson (ickle) wrote :

No, that's bug 1131134.

Timo Aaltonen (tjaalton) wrote :

Pete, are you running current raring? It should have the commit from 1131134 already.

Chris Wilson (ickle) wrote :

re 1131134, I mistakenly hit 'fix released' too soon; it didn't make the 2.21.3 cut as I had believed.

Anyway, I would like confirmation that xorg-edgers fixes these observed issues before making 2.21.4 which I want to do asap...

*** Bug 61610 has been marked as a duplicate of this bug. ***

Sebastien Bacher (seb128) wrote :

@Chris: sorry for the delay, I didn't run into the issue so often before so I wanted to give it some testing time, it seems good to me so far, thanks for the fix!

Chris Coulson (chrisccoulson) wrote :

I no longer get the issue with the version from xorg-edgers either

Chris Wilson (ickle) on 2013-03-11
Changed in xserver-xorg-video-intel (Ubuntu Raring):
status: In Progress → Fix Committed
Chris Wilson (ickle) on 2013-03-11
Changed in xserver-xorg-video-intel (Ubuntu Raring):
status: Fix Committed → Fix Released

(In reply to comment #28)
> If you change the define in src/sna/sna_accel.c:
>
> diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
> index ae6d3c1..5edad51 100644
> --- a/src/sna/sna_accel.c
> +++ b/src/sna/sna_accel.c
> @@ -57,7 +57,7 @@
> #define FORCE_INPLACE 0
> #define FORCE_FALLBACK 0
> #define FORCE_FLUSH 0
> -#define FORCE_FULL_SYNC 1 /*
> https://bugs.freedesktop.org/show_bug.cgi?id=61628 */
> +#define FORCE_FULL_SYNC 0
>
> #define DEFAULT_TILING I915_TILING_X
>
> that restores the buggy behaviour. If you can keep running with that patch
> and with --enable-debug to check if any assertions are triggered and see how
> things progress.

I've been running this way ever since you've asked me, but that stack trace was the only one I was able to trigger, though improper rendering happened a lot.
I am positive that when I caught that trace there were no errors in dmesg.

Now, 2.21.4 is out and I will continue trying to catch something, though
since it happens only in firefox maybe there is issue somewhere else?
What versions of firefox, cairo and gtk do you have?

Also I've noticed this message in .xsession-errors whenever I move previews in Firefox:

(firefox:3574): GdkPixbuf-CRITICAL **: gdk_pixbuf_new: assertion `width > 0' failed

This happens both with FORCE_FULL_SYNC 0 and 1.

I've been primarily using iceweasel (based on ff10) with the system cairo as that is many times faster for gfx. But I've also been using the bloated ff from ubuntu and fedora on different systems (and they use the ancient cairo embedded into firefox). There are a lot of differences in cairo between those versions, so it would not surprise me if it was a bug specific to an older cairo. But I've hoped to have seen it by now as well. :|

I've just tested binary Firefox's versions from their site. I've tried latest versions of 16,17,18 and 19 branches and I was able to trigger the issue in all of them.

Will play with cairo versions now, my current cairo is 1.10.2 with some distro patches on top.

Just note well that all firefox post version-10 use their builtin version of cairo. In order to use system cairo, firefox needs a patch to remove its reliance upon non-upstreamed API.

Tested firefox-19.0.2 with all available versions of cairo from repos: 1.10.2, 1.12.8, 1.12.10, 1.12.12. Issue is reproducible with all versions.

(In reply to comment #36)
> Just note well that all firefox post version-10 use their builtin version of
> cairo. In order to use system cairo, firefox needs a patch to remove its
> reliance upon non-upstreamed API.

Thanks for info, though I am using Gentoo and use Firefox built from sources on my machine and it is distro-patched to link against system-wide cairo so it's fine.

Hmmm, that's news to me. Do you have a link to the patches they apply against firefox?

Or a simple test is something like: http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/ which should be CPU bound in Xorg and not firefox.

Also I've noticed that the "disable read-read optimisations" patch does practically the same thing as converting kgem_bo_sync__cpu_full back to kgem_bo_sync__cpu (I may be wrong here, though it looks this way to me). I won't question this, as you are the developer and know best, but as the tests showed, only one particular branch of kgem_bo_sync__cpu_full triggers this issue; see kgem_bo_sync__cpu_full-revert-bad.patch. Maybe you could add some asserts in that branch? I will apply them and give you some more info.

(In reply to comment #38)
> Hmmm, that's news to me. Do you have a link to the patches they apply
> against firefox?

http://mirror.yandex.ru/gentoo-distfiles/distfiles/firefox-19.0-patches-0.3.tar.xz

> Or a simple test is something like:
> http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/ which
> should be CPU bound in Xorg and not firefox.

Well, I've visited this link and see some spherical thingy made of particles. What should I check?

(In reply to comment #40)
> Well, I've visited this link and see some spherical thingy made of
> particles. What should I check?

Just look at top; For this particular benchmark, it should be ratelimited by the Xorg process not firefox - or better look at sudo perf top, if firefox is hitting pixman functions, it is a bad firefox.

Seems like gentoo has the right patch though, it should be fine. Now if only the other distros also used that patch :(

(In reply to comment #42)
> Seems like gentoo has the right patch though, it should be fine. Now if only
> the other distros also used that patch :(

So, should I check top or not? I am a bit confused about what exactly
"ratelimited by the Xorg process not firefox" means. I am building perf right now though.

(In reply to comment #41)
> (In reply to comment #40)
> > Well, I've visited this link and see some spherical thingy made of
> > particles. What should I check?
>
> Just look at top; For this particular benchmark, it should be ratelimited by
> the Xorg process not firefox - or better look at sudo perf top, if firefox
> is hitting pixman functions, it is a bad firefox.

When running this demo in firefox `# perf top` says "42% libpixman-1.so.0.29.2" and this line sits on top of the list. Does that mean bad firefox? :(

Only if that pixman time is inside firefox and not Xorg... Have gentoo also disabled server-side gradients in cairo?

(In reply to comment #45)
> Only if that pixman time is inside firefox and not Xorg...

I am not familiar with this tool. How do I check this?

> Have gentoo also
> disabled server-side gradients in cairo?

Yes, part of changelog:

10 Sep 2010; Samuli Suominen <email address hidden>
+cairo-1.10.0-r2.ebuild, +files/cairo-1.10.0-buggy_gradients.patch:
Do not use server-side gradients. It hurts performance, and causes bad
rendering on at least nvidia. Bug 336696.

And this patch is still applied on top of the cairo version I am running now, though the maintainers added an option to disable it in the latest version in the tree. It is enabled by default, so I also tested this version with gradients disabled. Should I check without it?

Yeah, that gradient patch dramatically hurts performance on Nvidia and Intel systems, whilst having little impact on EXA systems. Kill that patch with fire.

(In reply to comment #48)
> Yeah, that gradient patch dramatically hurts performance on Nvidia and Intel
> systems, whilst having little impact on EXA systems. Kill that patch with
> fire.

Tested without this patch, but the issue is still present.

Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
status: Unknown → Confirmed

What do you think about comment #39? And how can I check if pixman time shown in `perf top` belongs to Xorg or Firefox? (see comment #46)

If you have the ncurses gui, the second column shows you the "comm" i.e. the process name. Similarly in the perf report.
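
Without the ncurses interface, the same question can be answered from a text report sorted by process name and shared object. A rough sketch: the sort keys `comm` and `dso` are standard perf sort keys, but the assumed column layout (overhead, command, shared object) and the function name `pixman_by_comm` are illustrative, and command names containing spaces are not handled.

```shell
# From `perf report --stdio --sort comm,dso` style output, print
# "<overhead> <command>" for every row whose shared object is pixman.
pixman_by_comm() {
    awk '$1 ~ /%$/ && $3 ~ /libpixman/ { print $1, $2 }' "$@"
}

# Hypothetical usage (needs root or suitable perf permissions):
#   perf record -a -- sleep 10
#   perf report --stdio --sort comm,dso | pixman_by_comm
```

Rows attributed to firefox mean the rasterisation happens client-side; rows attributed to the X server process mean it happens server-side.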

I'm trying to install gentoo to see if that helps (the prospect of a modern ff using system cairo is very appealing).

(In reply to comment #51)
> If you have the ncurses gui, the second column shows you the "comm" i.e. the
> process name. Similarly in the perf report.

Oh, finally, I was able to get it. Yes, that pixman rendering belongs to the Firefox process, not Xorg. Somehow there is no "comm" column in my perf-top, but the ncurses gui allows zooming into threads, and that did the trick.

> I'm trying to install gentoo to see if that helps (the prospect of a modern
> ff using system cairo is very appealing).

That's nice to hear :) We have a handbook which covers most aspects of the installation, but if you get stuck somewhere feel free to send me an e-mail, I'll be glad to help you.

Reading http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/firefox-19.0.2.ebuild?view=markup it seems that the use of system-cairo has been dropped. Which is a shame.

On the positive news though the latest unstable cairo has dropped the buggy gradients patch (unless legacy-drivers is set).

(In reply to comment #53)
> Reading
> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/
> firefox-19.0.2.ebuild?view=markup it seems that the use of system-cairo has
> been dropped. Which is a shame.

Well, you've seen the patches applied on top of firefox and support for system cairo is there. Out of curiosity I've run some initial steps of firefox build and here's a bit filtered result:

grep cairo /var/tmp/portage/www-client/firefox-19.0.2/temp/build.log
 * 6009_fix_system_cairo_support.patch ...
    --enable-system-cairo system_libs
    --enable-default-toolkit=cairo-gtk2 mozilla.org default
  --enable-system-cairo
  --enable-default-toolkit=cairo-gtk2
checking for cairo >= 1.10... yes
checking CAIRO_CFLAGS... -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libdrm -I/usr/include/libpng15
checking CAIRO_LIBS... -lcairo
checking for cairo-tee >= 1.10... yes
checking CAIRO_TEE_CFLAGS... -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libdrm -I/usr/include/libpng15
checking CAIRO_TEE_LIBS... -lcairo
checking for cairo-xlib-xrender >= 1.10... yes
checking CAIRO_XRENDER_CFLAGS... -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libdrm -I/usr/include/libpng15
checking CAIRO_XRENDER_LIBS... -lcairo -lXrender -lX11

and this is output from already built firefox I am running now:

ldd /usr/lib/firefox/libxul.so | grep cairo
        libcairo.so.2 => /usr/lib64/libcairo.so.2 (0x00007f205d497000)
        libpangocairo-1.0.so.0 => /usr/lib64/libpangocairo-1.0.so.0 (0x00007f2059cc2000)

So, system-wide cairo is enabled at build time and it is really there, as shown by ldd.

(In reply to comment #53)
> Reading
> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/
> firefox-19.0.2.ebuild?view=markup it seems that the use of system-cairo has
> been dropped. Which is a shame.

You don't see anything like "we're enabling system cairo here ..." directly in the ebuild because that is done inside mozcoreconf-2.eclass, which is inherited by mozconfig-3.eclass, which in turn is inherited by the firefox ebuild. Inheriting an eclass can be thought of as a pretty close equivalent of using the #include directive in C.
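
The #include analogy can be illustrated with a toy model in plain shell. This is not Portage's real implementation: the `inherit` function, the file names `core.eclass` and `config.eclass`, and the variable `MOZ_FLAGS` are all hypothetical stand-ins, meant only to show how a setting made two levels down ends up visible in the file that started the chain.

```shell
# Toy model of chained eclass inheritance; NOT how Portage implements it.
demo_inherit() (
    # Subshell body keeps the demo's variables and functions out of the caller.
    ECLASSDIR=$(mktemp -d) || exit 1
    trap 'rm -rf "$ECLASSDIR"' EXIT

    # Hypothetical stand-in for the innermost eclass: sets the configure flag.
    cat > "$ECLASSDIR/core.eclass" <<'EOF'
MOZ_FLAGS="--enable-system-cairo"
EOF
    # Hypothetical stand-in for the middle eclass: only inherits the one above.
    cat > "$ECLASSDIR/config.eclass" <<'EOF'
inherit core
EOF

    # Sourcing a file is the shell counterpart of #include.
    inherit() {
        for eclass in "$@"; do
            . "$ECLASSDIR/$eclass.eclass"
        done
    }

    # The "ebuild" names only config, yet ends up with the flag from core.
    inherit config
    printf '%s\n' "$MOZ_FLAGS"
)
```

Calling `demo_inherit` prints the flag even though the top-level code never mentions `core` directly, which is why the flag does not appear in the ebuild itself.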

(In reply to comment #53)
> Reading
> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/
> firefox-19.0.2.ebuild?view=markup it seems that the use of system-cairo has
> been dropped. Which is a shame.

And the last one, you can find sources of eclasses in your $PORTDIR/eclass dir which is most probably /usr/portage/eclass.

P.S. sorry for a burst of comments.

Ok, I have ff-19 built at last using gentoo ~amd64 on a lowly ilk. It seems to be doing the right thing regarding using system-cairo and server-side gradients. Next step is to piece together enough components to see if I can reproduce the bug.

(In reply to comment #57)
> Ok, I have ff-19 built at last using gentoo ~amd64 on a lowly ilk. It seems
> to be doing the right thing regarding using system-cairo and server-side
> gradients. Next step is to piece together enough components to see if I can
> reproduce the bug.

Ok, tell me what info I should provide and I'll post it.

As a first step, my firefox and xf86-video-intel USE-flags are:
x11-drivers/xf86-video-intel-2.21.4 was built with the following:
USE="dri sna udev xvmc -glamor -uxa"

www-client/firefox-19.0.2 was built with the following:
USE="alsa dbus gstreamer jit libnotify minimal (multilib) pgo system-jpeg wifi -bindist -custom-cflags -custom-optimization -debug (-selinux) -startup-notification -system-sqlite" ABI_X86="64" LINGUAS="ru -af -ak -ar -as -ast -be -bg -bn_BD -bn_IN -br -bs -ca -cs -csb -cy -da -de -el -en_GB -en_ZA -eo -es_AR -es_CL -es_ES -es_MX -et -eu -fa -fi -fr -fy_NL -ga_IE -gd -gl -gu_IN -he -hi_IN -hr -hu -hy_AM -id -is -it -ja -kk -km -kn -ko -ku -lg -lt -lv -mai -mk -ml -mr -nb_NO -nl -nn_NO -nso -or -pa_IN -pl -pt_BR -pt_PT -rm -ro -si -sk -sl -son -sq -sr -sv_SE -ta -ta_LK -te -th -tr -uk -vi -zh_CN -zh_TW -zu"
CFLAGS="-march=core2 -mtune=generic -pipe -mno-avx"
CXXFLAGS="-march=core2 -mtune=generic -pipe -mno-avx"

I was able to reproduce that stack trace from the Xorg log, and the intel driver is not at fault there at all.

I found out that the cause is a fast-spinning mouse wheel. I have a mouse whose wheel can spin freely, in a 'free roam' mode without the usual 'clicks'. If I scroll too fast, that stack trace appears. As before, dmesg is clean of any i915 errors and no error state was caught. So that stack trace is not related to this bug at all.

I'm still using the optimized flushes on all of my machines and have yet to encounter corruption. :|

Well, I am still experiencing this issue even with latest intel driver :(

Are you running Gentoo now? What is your setup? Could you please give me the output of `emerge --info firefox` and `emerge --info xf86-video-intel`?

I haven't tried Firefox 20 yet though. Could the issue be in Firefox itself?

Same issue with firefox 20 and xf86-video-intel 2.21.5

Hello.

At last, there is some positive progress! I still see corrupted rendering of certain elements on some pages from time to time, but at least I haven't seen any completely corrupted previews for a while, as I did before. Portions of previews can still be corrupted, but only those parts that are also rendered corrupted while browsing. So there are no longer any previews consisting of complete garbage.
(Both previews and pages are rendered via the same drawWindow function in firefox, as far as I can tell from the sources)

Updates that introduced(?) these changes:

libdrm 2.4.43 -> 2.4.44
xorg-server 1.13.1 -> 1.13.4
GTK+ 2.24.16 -> 2.24.17
agg 2.5 -> 2.5-r2 (nothing big, the maintainer changed a couple of build options; added to the list because I use gnash in Firefox, which uses agg, so it may be connected somehow)

There were other updates, but these are the only changes that could possibly be related to the effects I see. I have been running xf86-video-intel-2.21.6 with FORCE_SYNC disabled the whole time.

That's unexpected - those updates should have had no impact upon this issue. :|

(In reply to comment #64)
> That's unexpected - those updates should have had no impact upon this issue.
> :|

Nevertheless, the overall look and feel in firefox has improved somehow. I have now updated mesa to 9.1.2 and the kernel to 3.9.0, and these positive effects persist.

The situation is much better now than it was when I opened this bug: I no longer get random huge screen corruption in firefox, either in thumbnails or during normal browsing. I can still trigger the issue and get a corrupted page preview, but it doesn't interfere with browsing. All other applications are unaffected.

Since things are quite good now, maybe it is a good idea to re-enable those optimizations? What do you think? It looks like I am the only one who has this issue :(

Ok, having made a new release, it is time to see if anyone else is seeing this bug:

commit 8e42637050275945200797538a34c13c90b295cc
Author: Chris Wilson <email address hidden>
Date: Tue May 21 11:13:03 2013 +0100

    sna: Re-enable read-read optimisations

(In reply to comment #66)
> Ok, having made a new release, it is time to see if anyone else is seeing
> this bug:
>
> commit 8e42637050275945200797538a34c13c90b295cc
> Author: Chris Wilson <email address hidden>
> Date: Tue May 21 11:13:03 2013 +0100
>
> sna: Re-enable read-read optimisations

Thank you. I'll update this bug with any new info if I notice any changes, good or bad.

Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete

(In reply to comment #68)
> It's back:
> https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1189850

Thanks for the link. I've tried today's xf86-video-intel git, which includes the commit marked as the solution in the bug you linked. I was unable to reproduce the issue, but I cannot say for sure that it is fixed, as with recent changes this bug appears much more rarely on my machine than before. It may reappear later, but I hope it won't. I'll post any new info here.

Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed

commit 22fd5ca947b58901927d100d2b1aa0f1672b3435
Author: Chris Wilson <email address hidden>
Date: Fri Jun 28 16:54:08 2013 +0100

    drm/i915: Only clear write-domains after a successful wait-seqno

    In the introduction of the non-blocking wait, I cut'n'pasted the wait
    completion code from normal locked path. Unfortunately, this neglected
    that the normal path returned early if the wait returned early. The
    result is that read-only waits may return whilst the GPU is still
    writing to the bo.

    Fixes regression from
    commit 3236f57a0162391f84b93f39fc1882c49a8998c7 [v3.7]
    Author: Chris Wilson <email address hidden>
    Date: Fri Aug 24 09:35:09 2012 +0100

        drm/i915: Use a non-blocking wait for set-to-domain ioctl

    Signed-off-by: Chris Wilson <email address hidden>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
    Cc: <email address hidden>
    Signed-off-by: Daniel Vetter <email address hidden>
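The failure mode described in this commit message can be modelled in a few lines. This is a toy model only, not the kernel code: the class, field and function names are all invented for illustration, and the "wait ended early" case is reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass
class Bo:
    """Toy buffer object; both fields are illustrative, not kernel fields."""
    gpu_still_writing: bool = True   # ground truth: GPU has not finished
    write_domain_set: bool = True    # bookkeeping consulted before reads


def wait_seqno(bo, interrupted):
    """Return 0 when the GPU finished, non-zero when the wait ended early."""
    if interrupted:
        return -1
    bo.gpu_still_writing = False
    return 0


def wait_buggy(bo, interrupted):
    ret = wait_seqno(bo, interrupted)
    bo.write_domain_set = False      # cleared even when the wait ended early
    return ret


def wait_fixed(bo, interrupted):
    ret = wait_seqno(bo, interrupted)
    if ret:
        return ret                   # early return: leave bookkeeping alone
    bo.write_domain_set = False
    return 0


def stale_read_possible(bo):
    """True when bookkeeping says 'idle' while the GPU is still writing."""
    return bo.gpu_still_writing and not bo.write_domain_set


buggy, fixed = Bo(), Bo()
wait_buggy(buggy, interrupted=True)
wait_fixed(fixed, interrupted=True)
print(stale_read_possible(buggy), stale_read_possible(fixed))
```

In the buggy variant an early return from the wait leaves the object looking idle, so a later read-only wait has nothing to wait for and can read while the GPU is still writing, which matches the corruption described in this report.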

This bug just reappeared with xf86-video-intel-2.21.10. The next thing I am going to try is the commit you posted above.

(In reply to comment #70)
> commit 22fd5ca947b58901927d100d2b1aa0f1672b3435
> Author: Chris Wilson <email address hidden>
> Date: Fri Jun 28 16:54:08 2013 +0100
>
> drm/i915: Only clear write-domains after a successful wait-seqno
>
> In the introduction of the non-blocking wait, I cut'n'pasted the wait
> completion code from normal locked path. Unfortunately, this neglected
> that the normal path returned early if the wait returned early. The
> result is that read-only waits may return whilst the GPU is still
> writing to the bo.
>
> Fixes regression from
> commit 3236f57a0162391f84b93f39fc1882c49a8998c7 [v3.7]
> Author: Chris Wilson <email address hidden>
> Date: Fri Aug 24 09:35:09 2012 +0100
>
> drm/i915: Use a non-blocking wait for set-to-domain ioctl
>
> Signed-off-by: Chris Wilson <email address hidden>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
> Cc: <email address hidden>
> Signed-off-by: Daniel Vetter <email address hidden>

Yes, this commit fixes the issue for me (on 3.10 kernel with this patch only).

Thanks a lot for your help!

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Phill (phill.l) wrote :

Noticed it was "Fix Released" for Raring, but the problem appears to remain. Attached are a screenshot showing a corrupt image on the right, and a photo showing a different type of corruption that fixes itself if you take a screenshot. Below is my version information (13.04 64-bit after apt-get update/upgrade/reboot):

phill@phill-desktop:~$ dpkg-query --list xserver-xorg-video-intel
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-============-============-=================================
ii xserver-xorg-v 2:2.21.6-0ub amd64 X.Org X server -- Intel i8xx, i9x

If this is caused by a different problem, can you please point me to any relevant existing bug report?

I get this on two different computers - in "About this computer" one says "Intel G45/G43", the other has "Intel Ironlake Mobile".

Chris Wilson (ickle) wrote :

The workaround that I put in for raring was presumed to be sufficient. Since then we have identified and fixed the root cause, but that fix has yet to land even in saucy.

Timo Aaltonen (tjaalton) wrote :

opening again for saucy, although 3.11 will be there within the next day or so

Changed in xserver-xorg-video-intel (Ubuntu):
status: Fix Released → Fix Committed
Maarten Lankhorst (mlankhorst) wrote :

3.11 is in saucy now, so marking bug as fixed.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Fix Committed → Won't Fix
status: Won't Fix → Fix Released