Comment 151 for bug 541511

Daniel-ffwll (daniel-ffwll) wrote:

> --- Comment #124 from Indan Zupancic <email address hidden> 2010-04-19 11:26:15 PDT ---
> > Depends. It's definitely just a failed chipset flush (I've checked the
> > offset). But given enough time and testers, this is somewhat expected
> > because this patch just implements a probabilistic chipset flush. Tallying
> > all the chipset flushes of all testers easily gives on the order of 100mm
> > successful ones. Now if yours is the only one that failed, that's not a
> > problem. Please keep an eye on this and report any reoccurences - some
> > more tuning might be called for (perhaps even a module parameter).
>
> Well, it's curious I never got it with the v7 patch, while I ran that one for
> days.
>
> A module parameter to dis/enable this canary stuff would be good, it just seems
> to slow things down for me without improving anything.
>
> I wonder if it's in any way significant that the difference between the
> expected 827 and cpu_read 315 is precisely 512... Did anyone try to increase
> I830_MCH_WRITE_BUFFER_SIZE to something bigger?

The fact that it's 512 shows that the problem is a failed cache flush and
nothing else (this is actually what I've checked). The chipset flush
checker changes the place it writes the check value on every chipset flush,
and it reuses the same place every 512th chipset flush. So when a
chipset flush fails, the old value is still there, which should be exactly
512 less than what's expected.
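
To make the arithmetic concrete, here is a rough user-space sketch of that
rotation. It is not the code from the patch: the slot count of 512 is taken
only from the "every 512th flush" remark above, and CANARY_SLOTS,
struct canary_sim and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one check slot per flush, reused every 512th flush. */
#define CANARY_SLOTS 512u

struct canary_sim {
    uint32_t slot[CANARY_SLOTS]; /* value the CPU would read back per slot */
    uint32_t counter;            /* number of chipset flushes issued so far */
};

/*
 * Simulate one chipset flush. When flush_ok is false the new check value
 * never becomes visible, so the slot keeps what was written CANARY_SLOTS
 * flushes earlier. Returns expected - cpu_read (0 means the flush worked).
 */
static uint32_t canary_flush(struct canary_sim *s, bool flush_ok)
{
    uint32_t expected = ++s->counter;
    uint32_t *p = &s->slot[expected % CANARY_SLOTS];

    if (flush_ok)
        *p = expected;
    return expected - *p;
}

/*
 * Run fail_at - 1 successful flushes followed by one failed flush and
 * return the gap between expected and cpu_read for the failed one.
 */
static uint32_t canary_gap_after_failed_flush(uint32_t fail_at)
{
    struct canary_sim s;
    uint32_t i;

    assert(fail_at >= 1);
    memset(&s, 0, sizeof(s));
    for (i = 1; i < fail_at; i++)
        canary_flush(&s, true);
    return canary_flush(&s, false);
}
```

With this model a failed flush at check value 827 finds the 315 left over
from the previous use of the slot, and one at 4043 finds 3531, matching the
two warnings quoted in this report. A gap other than 512 (once more than 512
flushes have happened) would point at something other than a stale slot.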

> Looking at the commit, especially the description, it seems like there's no way
> to do proper chipset flushes. Maybe hunt down and confront an Intel developer?
> Or avoid the need to do flushes, but that's probably unrealistic. On the other
> hand, if you can't really flush, you can't really depend on it either.

Well, there is _no_ way to do a reliable flush. And the hw docs explicitly
say so. But we need to move stuff in/out of the graphics mem (i.e. the
gtt). The other option would be to copy stuff in/out, which is even worse:
- Wastes memory (actually simply uses twice as much for everything).
- Would be even slower than what my hack currently does.

And to add insult to injury, some of the chipsets from the 2nd gen (i8xx)
suffer from other cache coherency problems in addition to this.

> > Also please report if the glyph corruptions show up again.
>
> I will.
>
> Okay, while writing this I got a second warning:
>
> i8xx chipset flush failed, expected: 4043, cpu_read: 3531

Ok, that's bad. Can you change the following define in
include/drm/intel-gtt.h and see whether you still get failed chipset
flushes?

-#define I830_CC_CANARY_FLOCK_GTT_PAGES 8
+#define I830_CC_CANARY_FLOCK_GTT_PAGES 16

The whole thing makes somewhat more sense this way around, anyway.

Oh, and add some details about your box, please (brand&model + cpu,
mostly, the rest is all in the dmesg, anyway).