Comment 39 for bug 541511

legolas558 (legolas558) wrote:

(In reply to comment #37)
> > --- Comment #36 from legolas558 <email address hidden> 2010-03-26 04:51:08 PST ---
> > what if we "give up" on coherency of the first n flushes? something like the
> > screen test of arcade machines, except that we just don't check if test is OK,
> > pretending that coherency becomes consistent after that; I have often found
> > that the last graphics contents of the LVDS pipe are persistent between a
> > reboot (when using a liveCD, for example) and you can actually see the last
> > screenshot a while before screen gets properly cleared and initialized.
>
> That's exactly what my patch currently does - it simply gives up after too
> many retries. Now if this corrupts a pixmap/texture, it just yields visual
> corruptions on the screen. But this can also corrupt the gpu command
> buffer (and some other vital things). And if the gpu reads crap from
> these, it usually just hangs itself.
>
Yes, I understand; there's always a state machine behind the scenes.

> You've increased max_retries to 2000, which equals to about 1ms of delay.
> And it hasn't helped at all, i.e. the chipset takes probably even longer
> to reach a coherent state again. And 1 ms is an eternity for computer hw,
> so this will crash your box - sooner or later.
>
I have run some more tests, and it seems that 1000 or 2000 retries is high enough to *never* cause a failure under mild usage. However, if I arrange two glxgears windows so that they have a clipping rectangle, the failure happens immediately. Might this help? Perhaps OpenGL is altering the GPU state in some way that we cannot predict?
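
To make sure I understand the retry logic we are tuning, here is a minimal sketch of a "give up after max_retries" loop. This is not the actual patch: the probe callback, the function names and the toy probe are all hypothetical and only illustrate the idea. The last helper just redoes the arithmetic from the quote (2000 retries in about 1 ms is roughly 500 ns per retry).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true once the chipset looks coherent. */
typedef bool (*coherency_probe_fn)(void *ctx);

/*
 * Poll the probe at most max_retries times.
 * Returns the number of attempts used on success, or -1 if we gave up.
 */
static int flush_with_retries(coherency_probe_fn probe, void *ctx,
                              int max_retries)
{
    assert(probe != NULL && max_retries >= 0);
    for (int attempt = 1; attempt <= max_retries; attempt++) {
        if (probe(ctx))
            return attempt;
    }
    return -1; /* gave up: the caller carries on with possibly stale data */
}

/* Toy probe for illustration: reports coherent after *ctx failed polls. */
static bool probe_after_n(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}

/* Rough per-retry cost in ns, from the "2000 retries ~ 1 ms" estimate. */
static long ns_per_retry(long total_ns, long retries)
{
    assert(retries > 0);
    return total_ns / retries;
}
```

With this shape, raising max_retries only moves the point where we give up; if the hardware needs longer than the whole budget, the -1 path is still taken and the stale data is used anyway.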

> > Sorry but I am shooting in the dark here
>
> We all are ;) But there are some more constants to tune, this time in
> include/drm/intel-gtt.h
>
> #define I830_CC_GTT_WHACK_PAGES 16
>
> Try to increase this (doubling it each step is sensible, the algo only
> uses as much as required, this is just an upper bound). But don't go above
> 128, that'd be crazy (and I would have to figure out a new trick).
>
> btw, dmesg is usually enough - I'll ask if I need anything else.
>
OK, thank you Daniel. I will revert the retries to 1000 and run the next three tests, doubling the whack-pages constant each time.
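
For the record, the values I intend to try can be enumerated as below. This is only a sketch of the test plan, assuming the starting value of 16 and the ceiling of 128 from the quoted text; the macro and function names are mine, not from intel-gtt.h.

```c
#include <assert.h>
#include <stddef.h>

#define WHACK_PAGES_START 16   /* current I830_CC_GTT_WHACK_PAGES value */
#define WHACK_PAGES_LIMIT 128  /* ceiling suggested in the quoted reply */

/*
 * Fill out[] with the values to test: double the starting value each
 * step and stop at the limit. Returns how many values were written.
 */
static size_t whack_pages_candidates(int *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (int v = WHACK_PAGES_START * 2;
         v <= WHACK_PAGES_LIMIT && n < cap; v *= 2)
        out[n++] = v;
    return n;
}
```

Each value means editing the #define, rebuilding the module and rebooting, so there are exactly three runs before hitting the ceiling.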

Can we say that the pre-KMS driver worked well enough only because the bug was harder to trigger under those conditions? I did experience lockups when watching videos or when shutting down even with the pre-KMS/Xorg 1.6 combo, but they were rare.