MASTER: [i855] GPU lockup (apport-crash)

Bug #541511 reported by Geir Ove Myhr on 2010-03-18
This bug affects 322 people
Affects                            Status        Importance  Assigned to  Milestone
Release Notes for Ubuntu                         Undecided   Unassigned
xf86-video-intel                   Fix Released  Medium
linux (Ubuntu)                                   Undecided   Unassigned
  Lucid                                          Undecided   Unassigned
xserver-xorg-video-intel (Ubuntu)                Wishlist    Unassigned
  Lucid                                          High        Unassigned

Bug Description

Binary package hint: xserver-xorg-video-intel

This is a MASTER bug report, i.e. not a real bug report, but a tool to help manage other bug reports.

Most bug reports on i855 are probably due to the CPU/GPU incoherency problem that is now consolidated upstream at http://bugs.freedesktop.org/show_bug.cgi?id=27187 (which was split off from a bug report for i845). For now, we mark all automatically reported GPU lockups on i855 as duplicates of this unless there is a reason not to.

A kernel with the proposed fix is available at https://launchpad.net/~brian-rogers/+archive/graphics-fixes

To use this fixed kernel, run the following commands:

sudo apt-add-repository ppa:brian-rogers/graphics-fixes
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install linux-image-2.6.35-ppa21+v9patch-generic

There is a similar master bug report for i845 at bug 541492.

This is just a bug report so that I can keep track of my own trials at fixing this issue. I'll upload a cleaned-up version of my patch soon.

Created an attachment (id=34223)
cleaned-up version of my gtt cache coherency patch

Forget about the patch for the moment, I've just noticed that it doesn't work on my i855GM, either.

Geir Ove Myhr (gomyhr) wrote :

Binary package hint: xserver-xorg-video-intel

This is a MASTER bug report, i.e. not a real bug report, but a tool to help manage other bug reports.

Most bug reports on i855 are probably due to the CPU/GPU incoherency problem that is now consolidated upstream at http://bugs.freedesktop.org/show_bug.cgi?id=26345 (which was originally reported for i845, but applies to i855 as well). For now, we mark all automatically reported GPU lockups on i855 as duplicates of this unless there is a reason not to. There are some tests you may do to help upstream with this issue, and I will come back with instructions here. For those of you who know how to patch and compile a kernel you may look at comment #61 in the upstream bug report.

There is a similar master bug report for i845 at bug 541492.

Geir Ove Myhr (gomyhr) on 2010-03-18
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed

> --- Comment #4 from legolas558 <email address hidden> 2010-03-19 05:01:23 PST ---
> I think that your patch lets other bugs come out and become apparent, like the
> hangcheck timer bug.

Thanks for the list (especially the downstream bugs). btw the hangcheck
timer is the same bug, this is the kernel code noticing that the gpu just
died. Of course, if the dying gpu takes along the complete system, you're
not going to see this.

Created an attachment (id=34247)
new gtt cache coherency patch

New version of my patch. Changes:
- improved cache coherency checker. Instead of always using the same address it now constantly changes. Should increase the chance of catching an inconsistency. But still, this won't catch everything and there's still the chance that the gpu reads crap and dies before we detect that the chipset flush doesn't work.

- increased the size of the magic gtt write. I don't like this, but it seems to help. Given that testing of the last patch showed that I've only implemented a fancy delay I've tried different techniques. Unfortunately none worked.

Testing feedback highly welcome. If you report results, please add some details about your machine: Processor (model and frequency), ram (type, speed & size). Perhaps there's a pattern there.

Oh, and legolas created a nice small script that calculates the cache failure rate of the running kernel:

https://bugs.freedesktop.org/attachment.cgi?id=34240

Geir Ove Myhr (gomyhr) on 2010-03-20
Changed in xserver-xorg-video-intel:
status: Confirmed → Unknown
description: updated

Created an attachment (id=34262)
i855/Acer TM66x: Test results with patch in attachment #34194 from bug #26345

Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed

> --- Comment #7 from Bruno <email address hidden> 2010-03-20 10:57:18 PST ---
> Created an attachment (id=34262)
> --> (http://bugs.freedesktop.org/attachment.cgi?id=34262)
> i855/Acer TM66x: Test results with patch in attachment #34194 from bug #26345

Thanks for testing. Unfortunately the patch you tested is outdated and
known not to work. At least the failure rates you're getting on your
Pentium M 1.5 GHz are about the same as on my Pentium M 1.2 GHz. Can you
please retest with my latest patch (attached to this bug)?

Created an attachment (id=34270)
BUG in i915_gem.c - obj_priv->pages_refcount is zero in i915_gem_object_put_pages()

> Thanks for testing. Unfortunately the patch you tested is outdated and
> known not to work. At least the failure rates you're getting on your
> Pentium M 1.5 GHz are about the same as on my Pentium M 1.2 GHz. Can you
> please retest with my latest patch (attached to this bug)?

It seems to be very much a matter of mood: on Friday evening I saw not a single bad message, while today there were a lot.

Since a crash I'm running now with 2.6.34-rc2 + your patch and one of mine for sd.
Both with previous patch and current one I'm hitting a BUG() in i915_gem_object_put_pages(). obj_priv->pages_refcount is zero in there. No idea if it's related or not but it kills i915 driver!

Attached is kernel log of system including BUG(), but no bad chipset flush.

> --- Comment #9 from Bruno <email address hidden> 2010-03-20 16:20:41 PST ---
> Created an attachment (id=34270)
> --> (http://bugs.freedesktop.org/attachment.cgi?id=34270)
> BUG in i915_gem.c - obj_priv->pages_refcount is zero in
> i915_gem_object_put_pages()
>
> > Thanks for testing. Unfortunately the patch you tested is outdated and
> > known not to work. At least the failure rates you're getting on your
> > Pentium M 1.5 GHz are about the same as on my Pentium M 1.2 GHz. Can you
> > please retest with my latest patch (attached to this bug)?
>
> It seems to be very much a matter of mood, on friday evening I saw no single
> bad message while today there were a lot.

Just to clarify: You're talking about the BUG in i915_gem_object_put_pages
here, not the "chipset flush failed" warning? And this is still on the old
version of the patch, right?

> Since a crash I'm running now with 2.6.34-rc2 + your patch and one of mine for
> sd.
> Both with previous patch and current one I'm hitting a BUG() in
> i915_gem_object_put_pages(). obj_priv->pages_refcount is zero in there. No idea
> if it's related or not but it kills i915 driver!

Known issue, I get these here from time to time, too. Looks like my
unmap-inactive hack, intended to stress gtt cache flushing is uncovering
another problem somewhere else. I'll look into this more seriously now.

> > It seems to be very much a matter of mood, on friday evening I saw no single
> > bad message while today there were a lot.
>
> Just to clarify: You're talking about the BUG in i915_gem_object_put_pages
> here, not the "chipset flush failed" warning? And this is still on the old
> version of the patch, right?

The matter of mood was regarding the chipset flush failed warnings (as for GPU wedges before testing with your patch). Some days everything runs smoothly, some days I have to reboot every hour (no noticeable usage difference on my side).

> > Since a crash I'm running now with 2.6.34-rc2 + your patch and one of mine for
> > sd.
> > Both with previous patch and current one I'm hitting a BUG() in
> > i915_gem_object_put_pages(). obj_priv->pages_refcount is zero in there. No
> > idea if it's related or not but it kills i915 driver!
>
> Known issue, I get these here from time to time, too. Looks like my
> unmap-inactive hack, intended to stress gtt cache flushing is uncovering
> another problem somewhere else. I'll look into this more seriously now.

I got it with both versions of your patch, let's see what today will bring, flush warnings or BUGs (or both).

Bryce Harrington (bryce) on 2010-03-21
tags: added: lucid
Bryce Harrington (bryce) on 2010-03-22
tags: added: freeze
tags: added: crash

Created an attachment (id=34353)
new patch with totally reworked chipset flush

I've taken a few of my previous, non-working ideas and combined them into this new patch. Hopefully this adapts better to different cpu/chipset combinations (i.e. better chance that it not only works on my machine). The new chipset flush is a magic dance involving a flock of canaries ;)

The magic values (look at include/drm/intel-gtt.h, but not at the comments, they're stale) probably need some tuning. But this new chipset flush is adaptive (fyi it reports the maximum number of retries in the regular chipset flush no. reporting), so I hope it works out of the box.

Unfortunately I haven't yet fixed the BUG_ON in put_pages. For some odd reason I can't reproduce it here anymore and reviewing the code hasn't revealed anything yet.

As usual, testing reports highly welcome.

Created an attachment (id=34362)
BUG again, with flush retries before

Daniel, here is the result of one run with your latest patch (attachment #34353) at the time it BUGed but also having a few flush retries listed before.
I had the previous version running yesterday without any visible issue up to 2M flushes (but I might not have run the right actions)

It happened while surfing with firefox, some site doing fancy things with javascript and transparency for fade-in/out of images and the like.

(In reply to comment #12)
> Unfortunately I haven't yet fixed the BUG_ON in put_pages. For some odd reason
> I can't reproduce it here anymore and reviewing the code hasn't revealed
> anything yet.

Seems I can reproduce it damn easily...

In Firefox on page http://habiter.luxweb.com/NR58686_Mersch_Vente_Maison.html
it's sufficient to click on the image thumbnails to hit the BUG_ON (running Firefox 3.6-r2 + Flash plugin 10.0.45.2 (Gentoo)).
Hovering over the thumbnail images also seems to help trigger flush retries.

> --- Comment #13 from Bruno <email address hidden> 2010-03-23 05:22:10 PST ---
> Created an attachment (id=34362)
> --> (http://bugs.freedesktop.org/attachment.cgi?id=34362)
> BUG again, with flush retries before
>
> Daniel, here is the result of one run with your latest patch (attachment
> #34353) at the time it BUGed but also having a few flush retries listed before.
> I had the previous version running yesterday without any visible issue up to 2M
> flushes (but I might not have run the right actions)

Thanks for testing. Don't worry about the retries, that's an integral part
of the new chipset flush. As long as it's a small number, everything's
fine (I've gotten higher numbers than you while testing).

The important thing is whether you still get backtraces about failed
chipset flushes (there's none in the dmesg you attached). iirc the
previous patch still had problems there for you.

If the BUG is causing you too much grief while testing, undo the two
changes to drivers/gpu/drm/i915/i915_gem.c the patch makes. But that will
greatly reduce the number of chipset flushes, massively hampering
testing. Otherwise just check your dmesg for any "chipset flush failed"
messages and hit the reset knob ;)

Geir Ove Myhr (gomyhr) on 2010-03-23
description: updated

Created an attachment (id=34375)
dmesg with gtt flush v4 patch

dmesg summary: Complaints about chipset flush timeout, subsequent log entries show that the maximum number of retries has been reached. Eventual freeze.

(In reply to comment #16)
> dmesg summary: Complaints about chipset flush timeout

While scanning over relevant code, I also noticed this: It should probably be canary_gtt_read, canary_cpu_read in drivers/char/agp/intel-gtt.c:978 instead of canary_cpu_read, canary_cpu_read.

> --- Comment #17 from <email address hidden> 2010-03-23 12:18:05 PST ---
> (In reply to comment #16)
> > dmesg summary: Complaints about chipset flush timeout
>
> While scanning over relevant code, I also noticed this: It should probably be
> canary_gtt_read, canary_cpu_read in drivers/char/agp/intel-gtt.c:978 instead of
> canary_cpu_read, canary_cpu_read.

Yep, you're right. Fixed in my local version.

I've looked at your dmesg and the chipset flush clearly doesn't work for
your hw. Can you please try to hang the gpu again (with my latest patch)
and then capture i915_error_state from <debugfs>/dri/0. Just to make sure
that your gpu is crashing due to a cache coherency bug and not due to
something else (rather unlikely). Meanwhile I'll try to come up with a new
idea to fix your problem.

Created an attachment (id=34376)
error state after freeze

Here you go.

Chris Wilson suggested elsewhere that this particular discrepancy between patch results of the same hardware might be related to chipset/hardware revision and corresponding quirks, so this may or may not explain oddities. (lspci claims rev 02 for my GPU)

Created an attachment (id=34377)
new patch, improved retry flushing

New patch to (hopefully) tackle the problems uncovered by 2points. Please retest.

Bruno, I haven't yet found the problem with the BUG in put_pages. I'm hitting it about once every few days (with the same backtrace as you), but I haven't got a clue yet what's causing it.

Daniel, is this patch worth testing on i845 as well?

> --- Comment #21 from Brian Rogers <email address hidden> 2010-03-24 08:21:43 PST ---
> Daniel, is this patch worth testing on i845 as well?

Since Chris Wilson last tested this patch on his i845 (didn't work), the
patch has changed quite a bit. So it might be worth retesting, if you
have the time to spare. I certainly appreciate any testing feedback I can
get, because every machine (even seemingly similar 855 boxes) seems to act
differently.

Created an attachment (id=34416)
dmesg with gtt flush v5 patch

I've made these observations so far: The number of reported failed chipset flushes has gone down drastically. In four test runs, I've spotted a chipset flush failure warning in only one so far. However, the GPU will still hang after a while, often without any flush-related warnings in dmesg.

I'd also like to add to the speculation that the failure rate might be related to CPU or possibly I/O load. X runs for a lot longer if I just leave glxgears running by itself, but as soon as I start doing some work (Kontact, Opera, Flash plugin etc.) a GPU freeze seems more likely.

Created an attachment (id=34417)
error state after freeze, v5

Created an attachment (id=34418)
Clip solid fills

So I am working on the assumption that the residual hangs I am seeing after enabling/disabling the GMCH to force the GTT flush are real batch buffer bugs...

For instance http://bugs.freedesktop.org/attachment.cgi?id=34417 looks like we attempt to write well beyond the end of the buffer. In which case the attached should work around the issue.

Created an attachment (id=34419)
dmesg with gtt flush v5 patch

(In reply to comment #25)
> Created an attachment (id=34418)
> Clip solid fills
>
> So I am working on the assumption that the residual hangs I am seeing after
> enabling/disabling the GMCH to force the GTT flush are real batch buffer bugs...

Thanks, doing some tests with git HEAD and this patch. As far as chipset flushes go, I'm at slightly over 4.5M flushes right now. Since this is about four times as long as the machine usually lasts without freezing, I'd conclude that the fix is pretty effective.

As for failed chipset flushes, dmesg records three occurrences now after about one hour of glxgears and various other tasks.

(In reply to comment #25)
> For instance http://bugs.freedesktop.org/attachment.cgi?id=34417 looks like we
> attempt to write well beyond the end of the buffer.

Could you explain how to see this from the file or intel_error_decode output? I'm trying to make some documentation for downstream on how to interpret the dumps.

Is it related to this? I'm not sure what those numbers mean, except that the first is obviously the start address of the batch buffer.
Buffers [13]:
...
  02821000 16384 00000048 00000000 000e354b dirty purgeable

I forgot to add myself to CC list, will now try latest patch!

Sorry, I have reported my findings on bug 26345:

- debugfs DRI dumps (see attachment 34436)
- dmesg with GTT failures (attachment 34437)

http://bugs.freedesktop.org/show_bug.cgi?id=26345#c76

Geir Ove Myhr (gomyhr) on 2010-03-25
description: updated

Created an attachment (id=34469)
gtt chipset flush v6

Changes since v5:
- tuned magic values, hopefully fixing problems seen by 2points and legolas
- some debug checks trying to catch the put_pages BUG_ON problem while it's happening.

As usual, testing feedback highly welcome.

Created an attachment (id=34473)
dmesg of 2 failures with v6 patch

Created an attachment (id=34474)
debugfs dumps after 2 failures with v6 patch

I have just tested v6 patch with 3 glxgears and I got 2 flush failures; it is not possible to state if anything sensibly changed when compared with previous patch

> --- Comment #32 from legolas558 <email address hidden> 2010-03-26 03:19:48 PST ---
> Created an attachment (id=34474)
> --> (http://bugs.freedesktop.org/attachment.cgi?id=34474)
> debugfs dumps after 2 failures with v6 patch
>
> I have just tested v6 patch with 3 glxgears and I got 2 flush failures; it is
> not possible to state if anything sensibly changed when compared with previous
> patch

Indeed, not much changed at all. The real problem is reported in the first
backtrace (grep for "chipset flush timed out" in your dmesg). It
essentially means that the code has given up. All further failed flushes
are most likely just a result of this.

The problem seems to be writes to the gtt just don't show up at the cpu
side on your box. There's a tunable in my patch in the file
drivers/char/agp/intel-gtt.c

#define I830_GTT_MAX_RETRIES 100

Can you try out whether increasing this to a ridiculous number (like 1000)
helps? This might cause your machine to stall sometimes. As soon as you
get the chipset flush timed out message (and the max retries hits the
value you've defined), give up.

Thanks a lot for testing this stuff.

btw, has video stability increased further with v6?

(In reply to comment #33)
> Indeed, not much changed at all. The real problem is reported in the first
> backtrace (grep for "chipset flush timed out" in your dmesg). It
> essentially means that the code has given up. All further failed flushes
> are most likely just a result of this.
>
I have an uptime of 50 minutes (it's the same session of reported dump/dmesg files) and my ratio is still 2 / 229376 (using gttqual script in attachment 34435), so it is indeed as you said.
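
For readers who want to reproduce a ratio like this from their own dmesg, here is a minimal sketch. It is not the gttqual script from the attachment (its contents are not shown in this thread). It assumes the two message formats quoted in this thread, "chipset flush no. N, max retries M" and "i8xx chipset flush failed, expected: ...", counts every failure warning line, and takes the highest reported flush number as the total. The function name `flush_failure_ratio` is made up for illustration.

```python
import re

# Message formats as quoted in this thread; any other line is ignored.
FLUSH_COUNTER = re.compile(r"chipset flush no\. (\d+), max retries \d+")
FLUSH_FAILED = re.compile(r"i8xx chipset flush failed, expected: \d+")


def flush_failure_ratio(dmesg_text):
    """Return (failure_warnings, flushes_reported) for one dmesg capture."""
    failures = 0
    flushes = 0
    for line in dmesg_text.splitlines():
        if FLUSH_FAILED.search(line):
            failures += 1
            continue
        counter = FLUSH_COUNTER.search(line)
        if counter:
            # Keep the highest flush number reported so far.
            flushes = max(flushes, int(counter.group(1)))
    return failures, flushes
```

Since the counter lines quoted in the thread appear at multiples of 16384, the total is only as precise as that reporting interval.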

> The problem seems to be writes to the gtt just don't show up at the cpu
> side on your box. There's a tunable in my patch in the file
> drivers/char/agp/intel-gtt.c
>
> #define I830_GTT_MAX_RETRIES 100
>
> Can you try out whether increasing this to a ridiculous number (like 1000)
> helps? This might cause your machine to stall sometimes. As soon as you
> get the chipset flush timed out message (and the max retries hits the
> value you've defined), give up.
>
> Thanks a lot for testing this stuff.
>
Yes I am gonna make this test on next reboot and see what happens.

> btw, has video stability increased further with v6?
>
I would say definitely yes. I have tried with all videos which triggered the bug and no crash yet.

Other important notes:
* I am using only the v6 patch on drm-intel from git, and nothing else, neither the patched xf86-video-intel
* my kernel command line parameters are: lapic=yes hpet=force clocksource=hpet i8042.nomux=1

This hardware (Fujitsu Amilo clone) has a known problem with ACPI/i8042 controller; the i8042.nomux=1 is used to prevent touchpad glitches (http://bugzilla.kernel.org/show_bug.cgi?id=8740), while the battery, thermal and ac modules are always unloaded because otherwise this kernel bug would be triggered: http://bugzilla.kernel.org/show_bug.cgi?id=9147

Apart from these facts, I don't know anything else which could be relevant

Created an attachment (id=34476)
dmesg of early flush failure with 2000 retries

Bug triggered instantly, no sensible (additional) slowdown. This looks like a dead-end...

Shall I attach also the debugfs dri data? Shall I make any special test?

what if we "give up" on coherency of the first n flushes? something like the screen test of arcade machines, except that we just don't check if test is OK, pretending that coherency becomes consistent after that; I have often found that the last graphics contents of the LVDS pipe are persistent between a reboot (when using a liveCD, for example) and you can actually see the last screenshot a while before screen gets properly cleared and initialized.

Sorry but I am shooting in the dark here

> --- Comment #36 from legolas558 <email address hidden> 2010-03-26 04:51:08 PST ---
> what if we "give up" on coherency of the first n flushes? something like the
> screen test of arcade machines, except that we just don't check if test is OK,
> pretending that coherency becomes consistent after that; I have often found
> that the last graphics contents of the LVDS pipe are persistent between a
> reboot (when using a liveCD, for example) and you can actually see the last
> screenshot a while before screen gets properly cleared and initialized.

That's exactly what my patch currently does - it simply gives up after too
many retries. Now if this corrupts a pixmap/texture, it just yields visual
corruptions on the screen. But this can also corrupt the gpu command
buffer (and some other vital things). And if the gpu reads crap from
these, it usually just hangs itself.

You've increased max_retries to 2000, which equals about 1 ms of delay.
And it hasn't helped at all, i.e. the chipset takes probably even longer
to reach a coherent state again. And 1 ms is an eternity for computer hw,
so this will crash your box - sooner or later.
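
To put the retry limit into perspective, the figure above (2000 retries, roughly 1 ms) can be scaled to other limits. This is only a back-of-the-envelope sketch: it assumes every retry costs the same time, which the thread does not confirm, and the names below are made up for illustration.

```python
# Figure taken from the thread: 2000 retries correspond to roughly 1 ms.
REFERENCE_RETRIES = 2000
REFERENCE_DELAY_US = 1000.0  # 1 ms expressed in microseconds


def worst_case_delay_us(max_retries):
    """Estimate the worst-case stall in microseconds for a retry limit.

    Assumes a constant cost per retry, which is a simplification.
    """
    return max_retries * REFERENCE_DELAY_US / REFERENCE_RETRIES


for retries in (100, 1000, 2000):
    print(retries, "retries ->", worst_case_delay_us(retries), "us")
```

Under that assumption the default limit of 100 retries would stall for about 50 microseconds at worst, and 1000 retries for about half a millisecond.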

> Sorry but I am shooting in the dark here

We all are ;) But there are some more constants to tune, this time in
include/drm/intel-gtt.h

#define I830_CC_GTT_WHACK_PAGES 16

Try to increase this (doubling it each step is sensible, the algo only
uses as much as required, this is just an upper bound). But don't go above
128, that'd be crazy (and I would have to figure out a new trick).
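
The suggested tuning schedule, written out as a sketch. Only the starting value of 16, the doubling and the ceiling of 128 come from the comment above; the helper name is hypothetical.

```python
def whack_page_steps(start=16, limit=128):
    """List the values to try for I830_CC_GTT_WHACK_PAGES, doubling each step."""
    steps = []
    value = start
    while value <= limit:
        steps.append(value)
        value *= 2
    return steps


print(whack_page_steps())
```

Each value still needs a kernel rebuild and a fresh test run, so the list is short on purpose.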

btw, dmesg is usually enough - I'll ask if I need anything else.

(In reply to comment #37)
> > --- Comment #36 from legolas558 <email address hidden> 2010-03-26 04:51:08 PST ---
> > what if we "give up" on coherency of the first n flushes? something like the
> > screen test of arcade machines, except that we just don't check if test is OK,
> > pretending that coherency becomes consistent after that; I have often found
> > that the last graphics contents of the LVDS pipe are persistent between a
> > reboot (when using a liveCD, for example) and you can actually see the last
> > screenshot a while before screen gets properly cleared and initialized.
>
> That's exactly what my patch currently does - it simply gives up after too
> many retries. Now if this corrupts a pixmap/texture, it just yields visual
> corruptions on the screen. But this can also corrupt the gpu command
> buffer (and some other vital things). And if the gpu reads crap from
> these, it usually just hangs itself.
>
Yes I can understand this; there's always a state machine behind the scenes.

> You've increased max_retries to 2000, which equals about 1 ms of delay.
> And it hasn't helped at all, i.e. the chipset takes probably even longer
> to reach a coherent state again. And 1 ms is an eternity for computer hw,
> so this will crash your box - sooner or later.
>
I have done some other tests and it seems that 1000 or 2000 is high enough to *never* cause a failure with mild usage, while if I make 2 glxgear windows have a clipping rectangle, the failure immediately happens. Might this help? Perhaps openGL is altering the GPU in some way that we cannot forecast?

> > Sorry but I am shooting in the dark here
>
> We all are ;) But there are some more constants to tune, this time in
> include/drm/intel-gtt.h
>
> #define I830_CC_GTT_WHACK_PAGES 16
>
> Try to increase this (doubling it each step is sensible, the algo only
> uses as much as required, this is just an upper bound). But don't go above
> 128, that'd be crazy (and I would have to figure out a new trick).
>
> btw, dmesg is usually enough - I'll ask if I need anything else.
>
Ok, thank you Daniel, I will revert the retries to 1000 and make the next possible 3 tests with the whack pages constant.

Can we state that the pre-KMS driver was working well enough because the bug was harder to trigger in those conditions? I can say I experienced lockups when watching videos or when shutting down even with the pre-KMS/Xorg1.6 combo, but it was rare.

> --- Comment #38 from legolas558 <email address hidden> 2010-03-26 07:08:31 PST ---
> Can we state that the pre-KMS driver was working good enough because bug was
> harder to trigger in those conditions? I can say I experienced lockups when
> watching videos or when shutting down even with the pre-KMS/Xorg1.6 combo, but
> it was rare.

Yep, the cache coherency bug was most likely always there. kms code tends
to be faster in certain circumstances and therefore tends to hit cache
coherency problems with a higher probability. That's also why Eric's patch
from half a year ago managed to break a few i855 chipsets by slightly
changing the timings. But that's just coincidental because that piece of
code helps cache coherency on i865 chipsets.

btw, I've tried your two-glxgears-with-clipping test. No ill effects, here
...

Created an attachment (id=34479)
dmesg of early failure with 1000 retries, 128 whack pages

(In reply to comment #39)
> > --- Comment #38 from legolas558 <email address hidden> 2010-03-26 07:08:31 PST ---
> btw, I've tried your two-glxgears-with-clipping test. No ill effects, here
> ...
>

I have tested with 32, 64, 128 whack pages and I have found the following:

- with 32/64 whack pages, moving around one glxgears window was enough to trigger the flush failure
- with 128 whack pages, moving around didn't seem to work but a resize of the glxgears window triggered the bug
- bug becomes progressively harder to trigger when increasing whack pages (qualitative sensation)

In all cases (even 16 whack pages and/or 1000/2000 retries), no more than 2 failures are found in dmesg (because as you said it gives up after that).

I am worried about this fact that our hardware, apparently the same, is not showing same behaviour...my .config is here:

http://www.iragan.com/linux/i855GM/legolas558.config

Geir Ove Myhr (gomyhr) on 2010-03-26
description: updated

> --- Comment #40 from legolas558 <email address hidden> 2010-03-26 07:49:31 PST ---
> In all cases (even 16 whack pages and/or 1000/2000 retries), no more than 2
> failures are found in dmesg (because as you said it gives up after that).

I've overlooked this, but now that I've checked, this is _very_ curious.
With v6 you only ever see 2 chipset flush failures, no matter how hard you
abuse your machine?

With the three dmesgs you've posted, these two failures are always in the
same chipset flush, just opposite directions (gtt->cpu and cpu->gtt
transfers). They'll also coincide with the chipset flush timed out
message. Can you please check that this is indeed the case (with the other
dmesgs you've got lying around) with the other test runs, too? Just
compare the "expected: xxx" value on each of the three backtraces.

This is strange because my code only gives up on the _current_ chipset
flush and doesn't bother to report any further timeouts. It still executes
all chipset flushes and still reports about failed ones. So if your hw
only ever reports one failure where everything fails (timeout+paranoia
check failures in both directions) and never fails again, this would be
_very_ strange indeed.

> I am worried about this fact that our hardware, apparently the same, is not
> showing same behaviour...my .config is here:

I've compared our configs and tried changing a few relevant ones to your
setting. Still can't reproduce your failures.

(In reply to comment #41)
> > --- Comment #40 from legolas558 <email address hidden> 2010-03-26 07:49:31 PST ---
> > In all cases (even 16 whack pages and/or 1000/2000 retries), no more than 2
> > failures are found in dmesg (because as you said it gives up after that).
>
> I've overlooked this, but now that I've checked, this is _very_ curious.
> With v6 you only ever see 2 chipset flush failures, no matter how hard you
> abuse your machine?
>
Yes. Never seen more than 2 since when I started using v6 patch, but I might be wrong because I never did more than 300k flushes in a session with a v6-patched kernel.

> With the three dmesgs you've posted, these two failures are always in the
> same chipset flush, just opposite directions (gtt->cpu and cpu->gtt
> transfers). They'll also coincide with the chipset flush timed out
> message. Can you please check that this is indeed the case (with the other
> dmesgs you've got lying around) with the other test runs, too? Just
> compare the "expected: xxx" value on each of the three backtraces.
>
Yes, you can also see it with v5 patch dmesg in attachment 34233

From my dmesg logs:
~~ session1 - v6 patch
[ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
[ 79.983771] i8xx chipset flush failed, expected: 5807, gtt_read: 5806
~~ session2 - v6 patch
[ 101.807650] i8xx chipset flush failed, expected: 14194, cpu_read: 14193
[ 101.807844] i8xx chipset flush failed, expected: 14194, gtt_read: 14193
~~ session3 - v5 patch
[ 2832.905107] i8xx chipset flush failed, expected: 113457, cpu_read: 113456
[ 2832.905315] i8xx chipset flush failed, expected: 113457, gtt_read: 113456
[ 2910.626579] i8xx chipset flush failed, expected: 215361, cpu_read: 215360
[ 2910.626872] i8xx chipset flush failed, expected: 215361, gtt_read: 215360
[ 2977.424469] i8xx chipset flush failed, expected: 308976, cpu_read: 308975
[ 2977.424746] i8xx chipset flush failed, expected: 308976, gtt_read: 308975
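
The pairing visible in these excerpts (two warnings per failed flush, both with the same "expected" value) can also be checked mechanically on a longer log. The following is a rough sketch that assumes the exact message format shown above; the function names are made up for illustration and it has not been run against the attached dmesg files.

```python
import re
from collections import defaultdict

# Format as it appears in the excerpts above.
FAILED = re.compile(
    r"i8xx chipset flush failed, expected: (\d+), (cpu_read|gtt_read): (\d+)"
)


def group_failures(dmesg_text):
    """Map each expected value to the readings reported for it."""
    groups = defaultdict(dict)
    for match in FAILED.finditer(dmesg_text):
        expected = int(match.group(1))
        groups[expected][match.group(2)] = int(match.group(3))
    return dict(groups)


def is_paired(readings):
    """True if both a cpu_read and a gtt_read warning were logged."""
    return set(readings) == {"cpu_read", "gtt_read"}
```

An expected value with only one of the two readings would be the interesting case, since it would contradict the pattern described in the comment being answered here.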

I am gonna make more intensive tests later.

> This is strange because my code only gives up on the _current_ chipset
> flush and doesn't bother to report any further timeouts. It still executes
> all chipset flushes and still reports about failed ones. So if your hw
> only ever reports one failure where everything fails (timeout+paranoia
> check failures in both directions) and never fails again, this would be
> _very_ strange indeed.
>
Occam would say: perhaps it didn't fail at all and we are just not being informed correctly.

My raw guess is that some buddy between us and the GPU is touching something that shouldn't, and I am inclined to always blame the i8042 controller since I am already experiencing keyboard ports corruption when the battery ACPI is being used. But it is hard to link i8042 and the GPU (and the modules which cause the i8042 glitch for keyboard are never loaded), so I am still out of bullets.

> > I am worried about this fact that our hardware, apparently the same, is not
> > showing same behaviour...my .config is here:
>
> I've compared our configs and tried changing a few relevant ones to your
> setting. Still can't reproduce your failures.
>
As already stated, I am not using "clip sol...

Created an attachment (id=34497)
dmesg with gtt flush v6 patch

Intermediate results for v6: No reported failures after 3.5M flushes, the number of maximum retries seems to have gone back slightly since v5 (now at 7, down from 9)

Meanwhile, I upgraded Xorg from 1.6.5 to 1.7.4 and noticed that the GPU hangs almost instantly once X is started up. Could attach relevant error state files, but it's probably not directly related to this bug, since no messages concerning failed flushes turn up in dmesg. Gone back to 1.6.5, and the problem disappeared (with the clip solid fills patch, that is).

> As already stated, I am not using "clip solid fills" patch, if that might be
> relevant, but I doubt.
Maybe you should. It was already merged in git, too.

Created an attachment (id=34498)
dmesg of multiple flush failure with v6 patch

I was wrong. Playing with 4-5 glxgears windows finally triggered the bug, so the situation is unchanged from the v5 patch (except that failures happen less frequently).

(In reply to comment #43)
> Created an attachment (id=34497) [details]
> dmesg with gtt flush v6 patch
>
> Intermediate results for v6: No reported failures after 3.5M flushes, the
> number of maximum retries seems to have gone back slightly since v5 (now at 7,
> down from 9)
>
> Meanwhile, I upgraded Xorg from 1.6.5 to 1.7.4 and noticed that the GPU hangs
> almost instantly once X is started up. Could attach relevant error state files,
> but it's probably not directly related to this bug, since no messages
> concerning failed flushes turn up in dmesg. Gone back to 1.6.5, and the problem
> disappeared (with the clip solid fills patch, that is).
>
Wait, are you using Xorg 1.6.5 with KMS and the most recent intel driver? Is that possible?

I have always had the X hangup issue with 1.7.x series of Xorg. Right now I am using Xorg 1.7.5.902 and it crashes only when playing videos or using intensive graphics applications (wine).

> > As already stated, I am not using "clip solid fills" patch, if that might be relevant, but I doubt.
> Maybe you should. It was already merged in git, too.
>
The problem is that my freedesktop git stack doesn't work, so I don't know how to get a patched xf86-video-intel.

On my side, nothing interesting to report. I've not done aggressive tests, mostly just usual desktop use, but over day-long sessions I've either hit the BUG_ON (not yet with the v6 patch) or just seen the number of retried flushes increase occasionally (for today, v6):
...
[ 384.945981] chipset flush no. 16384, max retries 0
[ 554.507090] chipset flush no. 32768, max retries 0
[ 686.938325] chipset flush no. 49152, max retries 0
[ 946.820082] chipset flush no. 65536, max retries 0
[ 1531.186977] chipset flush no. 81920, max retries 0
[ 2157.786320] chipset flush no. 98304, max retries 0
...
[21448.487379] chipset flush no. 1097728, max retries 0
[21910.290869] chipset flush no. 1114112, max retries 0
[22345.259438] chipset flush no. 1130496, max retries 0
[22858.071707] chipset flush no. 1146880, max retries 1
[23118.534467] chipset flush no. 1163264, max retries 1
[23299.123857] chipset flush no. 1179648, max retries 1
...

with 4 being the highest number seen.
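
For anyone comparing logs, a small script can pull the retry counter out of dmesg output like the excerpt above. This is only a sketch: it assumes the exact message format printed by the debugging patch as quoted here (other patch versions may print something different), and the name `max_retries_seen` is made up for illustration.

```python
import re

# Matches lines such as:
#   [22858.071707] chipset flush no. 1146880, max retries 1
# The format is copied from the excerpt above; it is an assumption that
# other versions of the debugging patch print the same thing.
FLUSH_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+chipset flush no\. (?P<count>\d+), "
    r"max retries (?P<retries>\d+)"
)

def max_retries_seen(dmesg_text):
    """Return (flush_count, max_retries) from the last matching line, or None."""
    last = None
    for line in dmesg_text.splitlines():
        m = FLUSH_RE.search(line)
        if m:
            last = (int(m.group("count")), int(m.group("retries")))
    return last

sample = """\
[22345.259438] chipset flush no. 1130496, max retries 0
[22858.071707] chipset flush no. 1146880, max retries 1
[23299.123857] chipset flush no. 1179648, max retries 1
"""
print(max_retries_seen(sample))
```

Feeding it the output of `dmesg` saved to a file should give the most recent counter without scrolling through the whole log.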

If I don't open any glxgears and use the laptop for browsing and editing files, max retries only reaches 6.

When I open glxgears and resize it until a flush failure occurs, the max retries count is updated to 1000 in the next "chipset flush no." line.

> --- Comment #42 from legolas558 <email address hidden> 2010-03-26 14:49:10 PST ---
> From my dmesg logs:
> ~~ session1 - v6 patch
> [ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
> [ 79.983771] i8xx chipset flush failed, expected: 5807, gtt_read: 5806
> ~~ session2 - v6 patch
> [ 101.807650] i8xx chipset flush failed, expected: 14194, cpu_read: 14193
> [ 101.807844] i8xx chipset flush failed, expected: 14194, gtt_read: 14193
> ~~ session3 - v5 patch
> [ 2832.905107] i8xx chipset flush failed, expected: 113457, cpu_read: 113456
> [ 2832.905315] i8xx chipset flush failed, expected: 113457, gtt_read: 113456
> [ 2910.626579] i8xx chipset flush failed, expected: 215361, cpu_read: 215360
> [ 2910.626872] i8xx chipset flush failed, expected: 215361, gtt_read: 215360
> [ 2977.424469] i8xx chipset flush failed, expected: 308976, cpu_read: 308975
> [ 2977.424746] i8xx chipset flush failed, expected: 308976, gtt_read: 308975

Yet again I was totally blind. All your failed flushes report an actual
value that's only one off from the expected one. But since v2 I'm moving
around the check value on the check page, so each position is only used
every 1024th cache flush. Which means that if the flush doesn't work and
the old value is still there, it should be "expected_value - 1024".
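
To make the arithmetic concrete, here is a rough sketch (not part of any patch) that parses failure lines in the format quoted above and labels each one by the distance between the expected and the read value. The constant 1024 comes from the explanation above; the regex, the labels and the function names are assumptions for illustration only.

```python
import re

SLOTS = 1024  # each check-page position is reused every 1024th flush (see above)

FAIL_RE = re.compile(
    r"i8xx chipset flush failed, expected: (\d+), (cpu_read|gtt_read): (\d+)"
)

def classify(expected, actual):
    """Label a flush check by the distance between expected and read value."""
    delta = expected - actual
    if delta == 0:
        return "ok"
    if delta == SLOTS:
        return "stale"        # old value from the previous use of this slot
    return "unexplained"      # e.g. the off-by-one values in the logs above

def failures(dmesg_text):
    """Yield (side, expected, actual, label) for each failure line."""
    for m in FAIL_RE.finditer(dmesg_text):
        expected, side, actual = int(m.group(1)), m.group(2), int(m.group(3))
        yield side, expected, actual, classify(expected, actual)

sample = """\
[ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
[ 79.983771] i8xx chipset flush failed, expected: 5807, gtt_read: 5806
"""
for row in failures(sample):
    print(row)
```

A flush that simply did not happen would show up as "stale"; the lines quoted above all land in "unexplained", which is what makes them suspicious.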

Furthermore your system seems to be the only one where chipset flushes
fail in pairs (always both directions in the same chipset flush). I
haven't seen this on any other dmesg neither by me nor by any other
tester.

In other words I highly suspect that something is (very rarely) corrupting
the last two bits of a 4 byte block. This would also explain why the
correct value never shows up, even after extensive gtt whacking.

Please test your box with memtest86+. If that doesn't turn anything up
I'll write a testpatch (memtest86+ doesn't check the gtt, wherein the
problem might be, too).

(In reply to comment #48)
> > --- Comment #42 from legolas558 <email address hidden> 2010-03-26 14:49:10 PST ---
> > From my dmesg logs:
> > ~~ session1 - v6 patch
> > [ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
> > [ 79.983771] i8xx chipset flush failed, expected: 5807, gtt_read: 5806
> > ~~ session2 - v6 patch
> > [ 101.807650] i8xx chipset flush failed, expected: 14194, cpu_read: 14193
> > [ 101.807844] i8xx chipset flush failed, expected: 14194, gtt_read: 14193
> > ~~ session3 - v5 patch
> > [ 2832.905107] i8xx chipset flush failed, expected: 113457, cpu_read: 113456
> > [ 2832.905315] i8xx chipset flush failed, expected: 113457, gtt_read: 113456
> > [ 2910.626579] i8xx chipset flush failed, expected: 215361, cpu_read: 215360
> > [ 2910.626872] i8xx chipset flush failed, expected: 215361, gtt_read: 215360
> > [ 2977.424469] i8xx chipset flush failed, expected: 308976, cpu_read: 308975
> > [ 2977.424746] i8xx chipset flush failed, expected: 308976, gtt_read: 308975
>
In the session3, v5 might be v2 actually.

> Yet again I was totally blind. All your failed flushes report an actual
> value that's only one off from the expected one. But since v2 I'm moving
> around the check value on the check page, so each position is only used
> every 1024th cache flush. Which means that if the flush doesn't work and
> the old value is still there, it should be "expected_value - 1024".
>
Well I hope this will be useful to improve the patch.

> Furthermore your system seems to be the only one where chipset flushes
> fail in pairs (always both directions in the same chipset flush). I
> haven't seen this on any other dmesg neither by me nor by any other
> tester.
>
Yes, I admit I feel lonely recently...it would be nice to find another guy with my exact hardware.

My only custom option for intel driver in xorg.conf is:
Option "XvMC" "true"

but I doubt this could be relevant.

> In other words I highly suspect that something is (very rarely) corrupting
> the last two bits of a 4 byte block. This would also explain why the
> correct value never shows up, even after extensive gtt whacking.
>
I have tried booting with acpi=off, but it seems that KMS depends on ACPI. I can only think that some ACPI or "gone wild" IRQ is causing the corruption, or that there is a broken GTT memory cell as you hypothesized.

> Please test your box with memtest86+. If that doesn't turn anything up
> I'll write a testpatch (memtest86+ doesn't check the gtt, wherein the
> problem might be, too).
>
I completed 2 passes (ECC off) with memtest86+ v4.00 and no errors were found in my 503M (I suppose the missing memory is shadowed). So the corruption might lie in the GTT area, but I don't know how to test that, and if I have understood correctly the i855GM does not lend itself to this kind of consistency check. I am waiting for your testpatch, because unfortunately I am far from being able to write such a GTT testpatch myself.

Created an attachment (id=34505)
memory check patch for legolas

legolas, please apply this patch on top of v6. When a flush fails, this will read back the check values written to system mem (cpu) and gtt via the same path as they have been written to and print them out. If the readback value equals the expected value (look at the chipset fail message right before), everything is fine. If they equal the values as read on the other side (i.e. gtt readback = cpu read), something is corrupting memory when (ab)using the gtt.

To test just stress your system until you get a cache flush failure.
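
The decision rule described above can be written down as a few lines of Python. This is only a sketch of how to read the output, not code from the patch: the function name, the argument names and the numbers in the example are all made up for illustration.

```python
def interpret_readback(expected, readback, other_side_read):
    """Interpret one readback value according to the rule described above.

    expected        -- value the flush check expected to see
    readback        -- value read back via the same path it was written through
    other_side_read -- value the opposite side reported in the failure message
    """
    if readback == expected:
        return "fine"
    if readback == other_side_read:
        return "memory corruption suspected"
    return "inconclusive"

# Hypothetical numbers, not taken from any log:
print(interpret_readback(expected=100, readback=100, other_side_read=99))
print(interpret_readback(expected=100, readback=99, other_side_read=99))
```

The "inconclusive" branch is an addition for completeness; the description above only covers the first two cases.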

Created an attachment (id=34506)
dmesg of failure with v6 patch + gtt/cpu readback info

mumble mumble...

(In reply to comment #51)
> Created an attachment (id=34506) [details]
> dmesg of failure with v6 patch + gtt/cpu readback info
>
An excerpt from the above file:

[ 85.216591] i8xx chipset flush failed, expected: 5031, gtt_read: 5029
[ 85.216771] gtt readback: 5029, cpu reaback: 5029
[ 85.231559] chipset flush timed out, gtt_read: 5031, cpu_read: 5031, expected: 5032, gtt_pos :2421, cpu_pos: 0
[ 85.231854] gtt readback: 5031, cpu reaback: 5031
[ 113.151457] gtt readback: 20845, cpu reaback: 20845
[ 116.806192] gtt readback: 23135, cpu reaback: 23135

I'd say it's the second case, and narrowing down the cause seems scary. Some notes regarding patch v6:

1) there are some lags (even mouse cursor hiccups), this might be normal
2) the ability to switch to VT when system fatally crashes (e.g. video failure or other program failure, with hangcheck error or Xorg I/O errors) is gone
3) if I don't open glxgears and just use thunderbird, firefox and XFCE normally, no flush failures are ever reported in dmesg. Otherwise if I open glxgears and play with it a bit, I get the 2 failures (getting more is more difficult but indeed possible as shown before)

Regarding (2): drm and i915 modules are loaded automatically by Xorg and not loaded during kernel boot-up. I might try compiling drm+i915 as built-in, and perhaps this will fix the weird GTT corruption I am experiencing.

So it happens that after quite a while of normal usage I look at dmesg and find no failures; in this state I still have to expect a sudden (unrecoverable) crash from time to time, which requires a hard shutdown. That has almost become my usual way of closing the session (50% crashes, 50% normal, I'd say).

(In reply to comment #52)
> Regarding (2): drm and i915 modules are loaded automatically by Xorg and not
> loaded during kernel boot-up. I might try compiling drm+i915 as built-in, and
> perhaps this will fix the weird GTT corruption I am experiencing.
>
Bah, it didn't work. I have just compiled drm and i915 as built-in and it creates almost the same output:

[ 71.198038] chipset flush timed out, gtt_read: 0, cpu_read: 3932, expected: 3933, gtt_pos :1, cpu_pos: 1
[ 71.198389] i8xx chipset flush failed, expected: 3933, gtt_read: 3932
[ 71.198533] gtt readback: 3932, cpu reaback: 3932
[ 71.521156] gtt readback: 4138, cpu reaback: 4138
[ 74.132310] gtt readback: 5803, cpu reaback: 5803
[ 75.552605] gtt readback: 6813, cpu reaback: 6813
[ 89.246701] gtt readback: 15633, cpu reaback: 15633
[ 95.692300] gtt readback: 20097, cpu reaback: 20097

Hi guys,
Really appreciate the work you guys have been doing to try to fix this issue. And I truly mean that. This is a horrible issue.

If you haven't read it, the intel data sheet for this graphics card is here :
http://www.intel.com/Assets/PDF/datasheet/252615.pdf

And it does make for a reasonably interesting read when you consider it from the aspect of trying to hunt down what's causing this problem, although needle - haystack, much?

The SMM space restrictions look like a place of interest to me and also the very liberal way in which they have allowed bios manufacturers choose certain things related to the address registers.
Because it's Centrino technology, the 855 is like a three-in-one deal: BIOS, 855GM chip and processor, all linked together to render the graphics to screen. It looks like BIOS manufacturers have done whatever they thought was the best way to make that combo work, so any number of 855GM cards can work in any number of weird and wonderful ways, each with a possibly unique BIOS implementation. I've certainly seen evidence of that on my 855GM.
I have used Linux on the machine with its original BIOS and then with the latest BIOS from the Asus website, and the two BIOS versions behave differently.
The first one only required the nolapic parameter to boot, while the latest one requires mem=1001M (but the memory is supposed to be 1024M, and memtest says 1016M).
Without the mem parameter the kernel boots, but very slowly; with it, it boots fine.
My wild stab in the dark here is that there is an undetected memory hole and that's causing the problem. The actual memory modules are fine.

As far as what you guys have been testing, I have experienced the same thing in regards to the symptoms. It will work, I can use a browser, watch flash video and it all seems fine but after an hour, it will lock up and need a manual power down to restore a working system.

If the problem really is the memory, and more specifically a memory buffer that fills up and is not flushed in time, does the card not use any sort of compression for parts of the buffer, which would then need a different type of flush to empty? (Multiple overlay?)
Could it be that, because BIOS manufacturers have had such liberal choices in their BIOS implementations for this card, the memory addresses to flush are being detected incorrectly? Or could it be that the flush is trying to flush a part of memory it is not allowed to (maybe because the addresses were detected incorrectly), and that in turn triggers some kind of stop in the hardware that prevents any further flushes?

I am of course clutching at straws, my knowledge is limited and it would be a tall order for me to learn C, fork the code, go through the data sheet and write a proper driver for the Linux kernel for this card. Although I would dearly love to do that.

I wish you guys luck on fixing this problem. You have made some impressive improvements so far compared to the way it was, and in its current form, the driver using your patches is so very nearly close to being sui...

> --- Comment #54 from Tony White <email address hidden> 2010-03-29 14:12:51 PST ---
> Hi guys,
> Really appreciate the work you guys have been doing to try to fix this issue.
> And I truly mean that. This is a horrible issue.
>
> If you haven't read it, the intel data sheet for this graphics card is here :
> http://www.intel.com/Assets/PDF/datasheet/252615.pdf

I know about this specsheet and I've read through it already a few weeks
ago. It only contains detailed feature descriptions of the gmch plus a few
non-graphics-core related register definitions. There's nothing in there
that could give us a hint as to how gtt<->cpu caches work and how to fix
it.

If you want to help, please test my latest patch (v6) and report how it
fares on your machine.

To everyone else who has already tested this: Thanks a lot. Small summary
on the state of i855 cache coherency (please correct me if I'm wrong):

- Latest version works on three boxes (from Bruno, 2points and mine).
- It also seems to work on legolas' box, but that machine has some other
  issue. Pardon for being the messenger, but looks like your machine is
  toast :(
- The most likely unrelated BUG in put_pages hasn't surfaced again.

I won't submit this as is anytime soon for two reasons:
- The code is still rather ugly atm. I need to clean it up.
- This patch is a way too horrible hack to submit it for inclusion with a
  straight face.

To fix the latter, I want some more test reports. Given how many bug
reports downstream gathered, it shouldn't be a problem to find more
testers (distro maintainers: hint, hint). Please only test on i855
chipsets, i845 still seems to have some problems. If it works (check the
dmesg for chipset flush related backtraces), please add your tested-by
line with a small blurb about your machine to this bug report, like this:

Tested-by: Daniel Vetter <email address hidden> (IBM Thinkpad X40)

If it doesn't work, hit me with your dmesg and i915_error_state output ;)

Bruno, legolas, 2points, please add your tested-by, too.

(In reply to comment #55)
> - This patch is a way too horrible hack to submit it for inclusion with a
> straight face.
>
> To fix the latter, I want some more test reports. Given how many bug
> reports downstream gathered, it shouldn't be a problem to find more
> testers (distro maintainers: hint, hint).

I have been meaning to build test kernels for Ubuntu users for a while, but the Ubuntu wiki documentation for building kernel packages [1] that I have used before no longer seems to work that well. Btw, which kernel version is best to patch? A recent drm-intel-next or 2.6.34-rc2? On -rc2 I get problems with intel-agp.c:

gomyhr@storhaugen:~/src/linux-2.6.34-rc2$ patch -p1 --dry-run <../fix-i8xx-gtt-cache-coherency-v6.patch
patching file drivers/char/agp/Makefile
patching file drivers/char/agp/agp.h
patching file drivers/char/agp/efficeon-agp.c
patching file drivers/char/agp/intel-agp-gart.c
patching file drivers/char/agp/intel-agp.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file drivers/char/agp/intel-agp.c.rej
patching file drivers/char/agp/intel-agp.h
patching file drivers/char/agp/intel-gtt.c
patching file drivers/gpu/drm/i915/i915_dma.c
patching file drivers/gpu/drm/i915/i915_gem.c
patching file include/drm/intel-gtt.h

and I get the same with drm-intel-next. Should I just delete this file, since that is what the patch seems to do? (I tried this, and the build seemed to work, but then some of the Ubuntu-packaging failed).

[1]: https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
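
When checking a patch against several trees, it can help to pick the rejected files out of the `patch --dry-run` output automatically. The following is a minimal sketch that assumes the message wording shown in the output above ("N out of M hunk ignored") plus the "FAILED" variant that GNU patch also prints; the function name `rejected_files` is invented for this example.

```python
import re

# Matches the summary line patch prints for a file with rejected hunks, e.g.
#   1 out of 1 hunk ignored -- saving rejects to file path/to/file.rej
REJECT_RE = re.compile(
    r"(\d+) out of (\d+) hunks? (?:ignored|FAILED) -- saving rejects to file (\S+)"
)

def rejected_files(patch_output):
    """Return [(reject_file, rejected_hunks, total_hunks)] from patch output."""
    return [
        (m.group(3), int(m.group(1)), int(m.group(2)))
        for m in REJECT_RE.finditer(patch_output)
    ]

sample = """\
patching file drivers/char/agp/intel-agp.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file drivers/char/agp/intel-agp.c.rej
patching file drivers/char/agp/intel-agp.h
"""
print(rejected_files(sample))
```

An empty list means the dry run reported no rejected hunks in the text it was given.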

> --- Comment #56 from Geir Ove Myhr <email address hidden> 2010-03-30 03:59:57 PST ---
> I have been meaning to build test kernels for Ubuntu users for a while, but the
> Ubuntu wiki documentation for building kernel packages [1] that I have used
> before seems to not work that well anymore. Btw, which kernel version is it
> best to patch? A recent drm-intel-next or 2.6.34-rc2? On -rc2 I get problems
> with intel-agp.c:

Sorry for the confusion. The patch is actually based on rc1, but that one's
missing some recent intel bugfixes. I'll rebase that thing on something
more current later today.

(In reply to comment #55)
> To everyone else who has already tested this: Thanks a lot. Small summary
> on the state of i855 cache coherency (please correct me if I'm wrong):
>
> - Latest version works on three boxes (from Bruno, 2points and mine).
> - It also seems to work on legolas' box, but that machine has some other
> issue. Pardon for being the messenger, but looks like your machine is
> toast :(

I'd be more than happy to mark it as "toasted" and move on, but then I wouldn't be able to use it with Windows XP (never had a crash there), nor with Xorg 1.6 (which works, but only because, as you said, the driver hits the cache less frequently).

Has Intel ever released the WindowsXP driver sources? Yes, I know..just dreaming..

Is there any other way that could explain the GTT<->CPU bug?

I'll add my Tested-By with next patch, since this one doesn't seem to be in-sync with latest drm-intel

> --- Comment #58 from legolas558 <email address hidden> 2010-03-30 05:47:52 PST ---
> I'd be more than happy to mark it as "toasted" and go on, but then I wouldn't
> be able to use it with WindowsXP (never had a crash there) and neither with
> Xorg 1.6 (that works but because as you said the driver hits less frequently
> the cache).

The patch I intend to submit hopefully works better, too. By killing all
the gtt stress-testing hacks I've added, your box is probably on par with
the other solutions.

> Has Intel ever released the WindowsXP driver sources? Yes, I know..just
> dreaming..

Likely won't help. The i8xx chipsets were designed without a kernel memory
manager in mind (Windows only gained that with Vista). So the XP driver
probably just implements a static gtt (that doesn't need any chipset
flushes) and copies textures back and forth. That works, but performance
will suck, especially with kernel-managed graphics memory allocation,
where spills happen rather often.

In other words, we're coaxing these chipsets into a framework they're not
designed for (but which is the only sane thing to do considering modern
graphics apis), trying to paper over any hw deficiencies with horrible
hacks like mine.

> Is there any other way that could explain the GTT<->CPU bug?

As long as there's no other report of the same problem, hw flakiness is
the only likely option. After all it only happens when hitting the gtt
really hard, something XP (and the old ums driver) are not likely to do.

(In reply to comment #59)
> > --- Comment #58 from legolas558 <email address hidden> 2010-03-30 05:47:52 PST ---
> > I'd be more than happy to mark it as "toasted" and go on, but then I wouldn't
> > be able to use it with WindowsXP (never had a crash there) and neither with
> > Xorg 1.6 (that works but because as you said the driver hits less frequently
> > the cache).
>
> The patch I intend to submit hopefully works better, too. By killing all
> the gtt stress-testing hacks I've added, your box is probably on par with
> the other solutions.
>
Yes, because I am led to think that the sudden crash that happens later (even after 1 hour of uptime) is not related to these rare cache failures (easily triggered with glxgears, but not during normal usage). The fact is that the total Xorg crash happens even if no cache failure has occurred yet...so perhaps that's another bug, but this patch is still necessary even as-is.

Are you planning to submit the patch here before sending it upstream?

Anyway my signature is:

Tested-by: Daniele Castellitto <email address hidden> (Maxdata Pro 7000X)

> > Has Intel ever released the WindowsXP driver sources? Yes, I know..just
> > dreaming..
>
> Likely won't help. The i8xx chipsets were designed without a kernel memory
> manager in mind (Windows only gained that with Vista). So the XP driver
> probably just implements a static gtt (that doesn't need any chipset
> flushes) and copies textures back and forth. That works, but performance
> will suck, especially with kernel-managed graphics memory allocation,
> where spills happen rather often.
>
> In other words, we're coaxing these chipsets into a framework they're not
> designed for (but which is the only sane thing to do considering modern
> graphics apis), trying to paper over any hw deficiencies with horrible
> hacks like mine.
>
Thank you, Daniel, for telling us so much - much appreciated. I now see the bigger picture.

> > Is there any other way that could explain the GTT<->CPU bug?
>
> As long as there's no other report of the same problem, hw flakiness is
> the only likely option. After all it only happens when hitting the gtt
> really hard, something XP (and the old ums driver) are not likely to do.
>
I see. I have also tried adding delays before reading back the values, but that does not help. It must be indeed some hardware glitch. I'd like to try KMS without ACPI, to see if it still happens, but that doesn't seem possible either.

So for now I'll keep this patch; and perhaps we can focus on the sudden crash bug later.

Thanks for all your work on this. Even if inclusion in mainline is still pending, patches here finally fix problems I've had for two years now. Looks like I can finally upgrade from 2.6.27 without fear of random failures interrupting my work every now and then.

Tested-by: Moritz Brunner <email address hidden> (Asus M2400N)

Daniel, I tested your patch (v6) on an 852GME. Being very similar to the 855, the 852 also suffers from this bug. The GPU hangs frequently with all recent kernels. With your patch applied, I have not seen any hangs, nor any messages about failed flushes. The graphics performance is noticeably reduced though.

Tested-by: Thorsten Vollmer <email address hidden> (DFI-ACP G5M150-N w/ 852GME)

BTW: I was surprised to read that you are using an X40, because I would never have discovered this bug on my X40, my second machine. I have been using it for weeks with unpatched kernels and never saw any hangs. At first I hoped that my X40 was not affected and we could compare register settings. But with your patch applied, the kernel reports some flush retries. The frequency of failed flushes must be very low though.

I appreciate your work on this issue. Thanks.

> --- Comment #60 from legolas558 <email address hidden> 2010-03-31 02:01:45 PST ---
> Are you planning to submit the patch here before sending it upstream?

As soon as I post the patch for inclusion, I'll add a patch with all the
debugging stuff removed to this bug report.

> --- Comment #62 from Thorsten Vollmer <email address hidden> 2010-03-31 11:24:00 PST ---
> BTW: I was surprised to read that you are using an X40, because I would never
> have discovered this bug on my X40, my second machine. I have been using it for
> weeks with unpatched kernels and never saw any hangs. At first I hoped that my
> X40 was not affected and we could compare register settings. But with your
> patch applied, the kernel reports some flush retries. The frequency of failed
> flushes must be very low though.

Yep, my X40 is very stable with stock kernels. So I could never understand
all these bug reports about "intel drivers totally suck on i855GM"
because, hey, it works here! But a discussion with Chris Wilson about a
very strange bug report got me thinking. A few debug hacks later (to
stress the gtt) I've reduced the lifetime expectancy of my X40 to half a
minute :( With this, I could then start hacking on solutions.

btw, these hacks are included in the patches posted here to really make
sure it works now. I'll drop them for the final rev.

Created an attachment (id=34595)
gtt chipset flush v7

Rebased against 2.6.34-rc3 (_not_ drm-intel-next, that one doesn't have all the latest fixes). No other changes from v6.

Created an attachment (id=34597)
NEC P520: dmesg output

The v7 patch appears to be working on my NEC Versa P520: no crashes or screen corruption yet! I've been running glxgears for 40 minutes to stress test it. The dmesg output (see previous post) reports a number of inconsistencies though. No idea if this is a problem; I haven't been following this discussion very closely.

PS: Performance is also way better than with previous patch (https://bugs.freedesktop.org/attachment.cgi?id=33593), which I have been running for a couple of weeks now without any problems (except performance).

(In reply to comment #55)
> Tested-by: Daniel Vetter <email address hidden> (IBM Thinkpad X40)
>
> If it doesn't work, hit me with your dmesg and i915_error_state output ;)
>
> Bruno, legolas, 2points, please add your tested-by, too.
>

Here it is, also had v7 running this evening with 3x glxgears and dmesg lists same kind of results as in comment #46 (reached about 5M-flushes with 1 retry until now)

Tested-by: Bruno Prémont <email address hidden> (Acer TM66x)

Created an attachment (id=34602)
dmesg output (HP Pavilion dv1000)

Here's my dmesg after running 2 glxgears at the same time... Doesn't look too good.

I'll see if the various corruptions I've seen so far reappear less often or not.

Thanks

I am now using patch v7; it performs much better, and max retries has never gone above 5.

The 2 flush failures still happen (although it's harder to trigger them) when resizing the glxgears window. Also, if you keep at it you will get a total system crash.

Always better than the vanilla kernel; please send upstream.

Tested-by: Daniele Castellitto <email address hidden> (Maxdata Pro 7000X)

(In reply to comment #68)
> Created an attachment (id=34602) [details]
> dmesg output (HP Pavilion dv1000)
>
@Daniel Vetter: maybe I and Rémi have the same issue? Would it be possible to store somewhere the instructions executed just before the GTT failure or much more importantly before the total system crash? Perhaps there are opcodes which do not fully reset the internal state machine, and adding some null operation in the flow will fix it (this would also explain why Xorg 1.6 crashes once in a year...)

(In reply to comment #70)
> If you want to help, please test my latest patch (v6) and report how it
> fares on your machine.

I've got a Thinkpad X40 with a rev 2 855GM chipset. With stock kernels my system is mostly stable, but X crashes now and then (depending on what I do, a couple times a week). Since KMS it's just the screen that freezes, except for the mouse pointer, and I can switch to console.

With your v7 patch applied on current git I get a full system hang fairly soon and easily. Because the system is stuck I can't get any error messages, but I didn't get any in dmesg before the hang (max retries was always 0, printed twice).

I do have the debugfs output, dmesg and Xorg.log after a hang with a plain current git kernel (2.6.34-rc3, HEAD at 0fdf86). Start of i915_error_state says:

Time: 1270672918 s 546381 us
PCI ID: 0x3582
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x41100000
  INSTDONE: 0x037fefc1
  ACTHD: 0x07c2a814
seqno: 0x000298f1

xorg-server 1.7.5.902, intel driver 2.10.0,

If there's anything I can do to help, please ask.
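
For comparing error states between hangs, the register lines of an i915_error_state excerpt like the one above can be turned into a dictionary. This is a rough sketch that only handles the simple `NAME: 0xVALUE` lines shown here; the function name `parse_error_state` and the regex are assumptions, and the real debugfs file contains much more than this header.

```python
import re

# Matches header lines of the form "  NAME: 0x0123abcd" (names may contain
# spaces, as in "PCI ID"). Lines without a hex value, such as "Time: ...",
# are skipped.
FIELD_RE = re.compile(r"^\s*([A-Za-z_ ]+?):\s*(0x[0-9a-fA-F]+)\s*$")

def parse_error_state(text):
    """Return {register_name: int_value} for every 'NAME: 0x...' line."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields

sample = """\
Time: 1270672918 s 546381 us
PCI ID: 0x3582
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x41100000
  INSTDONE: 0x037fefc1
  ACTHD: 0x07c2a814
seqno: 0x000298f1
"""
state = parse_error_state(sample)
for name, value in state.items():
    print(f"{name:10s} 0x{value:08x}")
```

Two parsed states can then be compared field by field to see which registers differ between hangs.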

As I'm not interested in 3D, I'm starting to wonder why I shouldn't switch to the VGA driver. Surely that one is stable and not that much slower for 2D?

Chris Halse Rogers (raof) wrote :

It looks likely that we'll be blacklisting KMS for this chip in Lucid to work around other bugs. It would be helpful if users experiencing this bug could try disabling KMS (see https://wiki.ubuntu.com/X/KernelModeSetting if you are unsure how to do this) and see if it fixes the problem for you.

If there seems to be a consensus favoring blacklisting KMS, then we'll reassign this bug to the kernel and have Andy blacklist it for us.

If anyone finds that their i845 system works *worse* with KMS turned off for some reason, please shout; I wouldn't expect turning KMS off to cause regressions so this would be a surprise, but with i8xx weirder things have been known to happen!

Changed in xserver-xorg-video-intel (Ubuntu Lucid):
status: Triaged → Incomplete

> --- Comment #71 from Indan Zupancic <email address hidden> 2010-04-07 16:37:09 PDT ---
> If there's anything I can do to help, please ask.

Thanks for testing. Please update to xf86-video-intel 2.11 (just released
a few days ago). Also update to libdrm 2.4.20. These contain a few fixes
for gpu hangs on i8xx hw. If your gpu still hangs, please attach the
output of i915_error_state, that's usually sufficient to get a clue about
what's going on.

> --- Comment #70 from legolas558 <email address hidden> 2010-04-07 02:24:10 PDT ---
> @Daniel Vetter: maybe I and Rémi have the same issue?

Yep, it looks like Rémi, René and you all suffer from the same. In other
words, this can't be explained by broken hw anymore. I have a few ideas
about what's going on (that would also explain the put_pages BUG seen by
Bruno). But don't hold your breath waiting for a fix, for this bug really
is a specialist in evasive maneuvers ;(

(In reply to comment #73)
> > --- Comment #70 from legolas558 <email address hidden> 2010-04-07 02:24:10 PDT ---
> > @Daniel Vetter: maybe I and Rémi have the same issue?
>
> Yep, it looks like Rémi, René and you all suffer from the same. In other
> words, this can't be explained by broken hw anymore. I have a few ideas
> about what's going on (that would also explain the put_pages BUG seen by
> Bruno). But don't hold your breath waiting for a fix, for this bug really
> is a specialist in evasive maneuvers ;(

Oh sure it's broken, but its brokenness is consistent within the same model at least. How mad/stupid would it be to analyze the flow of GPU instructions to detect differences between a normal non-crashing flow (pre-KMS) and a crashing flow (KMS)? Also it would be nice to be able to "pack" these GPU flows in runnable batchsets so that one can eventually find the glitching sequence.
I have done this kind of sorcery with JTAG so I thought it might be a "tool" for us too.

Created an attachment (id=34823)
Kernel logs with 3 BUGs in i915_gem.c:1456 / i915_gem_object_put_pages()

Created an attachment (id=34824)
fix locking around chipset flushing

legolas, Rémi, René, this patch should fix the problems you've encountered with timed-out chipset flushes. It was a bug in my code. Please test extensively.

(In reply to comment #76)
> Created an attachment (id=34824) [details]
> fix locking around chipset flushing
>
> legolas, Rémi, René, this patch should fix the problems you've encountered
> with timed-out chipset flushes. It was a bug in my code. Please test
> extensively.

Everything nominal with the new bugfix, as expected :-)

failures / flushes:
0 / 475136
max retries:
38.512510 0
160.138951 6
232.302424 8
402.559573 11

No flush failures with 6 glxgears windows (running them nearly hung my 1.6 GHz hardware).

The max retries of 8 and 11 happened when opening Firefox, not when running glxgears; now it really seems stable, and my gut feeling is that the bug is totally fixed.

Thanks for finding it :)

Now I'll check whether playing videos still triggers a hang, or whether it hangs after hours of uptime.

Julian Lam (julian-lam) wrote :

KMS works fine in my experience, and I haven't experienced this apport crash since perhaps... 4 weeks ago?

If that changes anything.

(In reply to comment #72)
> Thanks for testing. Please update to xf86-video-intel 2.11 (just released
> a few days ago). Also update to libdrm 2.4.20. These contain a few fixes
> for gpu hangs on i8xx hw. If your gpu still hangs, please attach the
> output of i915_error_state, that's usually sufficient to get a clue about
> what's going on.

Okay, running 2.11 and 2.4.20 now. I'll report if I get any hangs, but that can take a while.

Thank you for all your work.

(As a side note, I still get slightly corrupted text in Firefox sometimes. If that has anything to do with this then the issue isn't totally solved yet.)

Geir Ove Myhr (gomyhr) wrote :

Julian, we turned off the automatic reporting of these bugs a while ago (4 weeks?). The apport-crash bug report would come up when the computer was rebooted after a freeze, but possibly in other situations as well. Did you see those freezes before or did you only see the apport-crash bug reporting dialog?

4h uptime, more than 2M flushes (2031616 exactly) without any failure, no crash after watching several videos, max retries is still 11. No graphics corruption anywhere. Possibly never experienced a more stable Xorg (not even with UMS).

All i855GM bugs are fixed for me; I am using xf86-video-intel v2.10.0 and an old libdrm compiled from git

The code which prints chipset flush stats can be taken out; please push patch upstream.

Many thanks for all the hard work!

> --- Comment #79 from legolas558 <email address hidden> 2010-04-08 18:37:55 PDT ---
> 4h uptime, more than 2M flushes (2031616 exactly) without any failure, no crash
> after watching several videos, max retries is still 11. No graphics corruption
> anywhere. Possibly never experienced a more stable Xorg (neither with UMS).

Great! Rémi & René, can you please retest v7 with this fix applied and add
your tested-by line here?

Created an attachment (id=34836)
i915_gem_object_put_pages crash happening with libdrm 2.4.18

Now I am also affected by the i915_gem_object_put_pages bug; it obviously has a different cause.

Note: this total system crash happens with libdrm-2.4.18 and with libdrm-git (pulled today), however I could only retrieve the syslog message for the older libdrm (with the new one I only got nul bytes printed to syslog)

The bug triggers very quickly with libdrm-2.4.18 while it becomes much harder to trigger with the most recent libdrm, but it is indeed there.

Created an attachment (id=34849)
Everything.log for i845 freezes

Ok, for this hardware: 00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)

Running this kernel: 2.6.34-rc2-59809-g22f2d3a-dirty #1 SMP PREEMPT Thu Apr 8 21:41:20 PDT 2010 i686 Intel(R) Pentium(R) 4 CPU 1.80GHz GenuineIntel GNU/Linux

from drm-intel-next with Daniel's patch from yesterday, running xorg-server 1.7.6, libdrm-newest 2.4.19, and xf86-video-intel-git 20100408,

it appears that the freezing up is gone -- almost. I have been running for several hours with a combination of glxgears, playing .mp4 and .wmv movies from my hard drive, surfing with chromium and firefox, flash movie playback, and switching virtual terminals between dwm and xfce and have no freezes yet

EXCEPT -- running Tuxpaint under xfce, I got a freeze (see attached log for details -- just a single error message). Also, trying to play a dvd (using mplayer or gnome-mplayer) I got lots of graphics errors and then the same freeze (see log for details.) Not sure if these freezes are from this bug or an unrelated bug with the 845 chips.

Otherwise, nice work! Thank you :)

Scott

> --- Comment #82 from Scott Hansen <email address hidden> 2010-04-09 10:04:27 PDT ---
> Otherwise, nice work! Thank you :)

Sorry to disappoint you, but that's just placebo. It looks like you've
only applied the small kernel patch from yesterday. But that's just an
incremental fix, i.e. you need the v7 patch _plus_ this small fix.

Can you please retest? If it still reports hangs, please add
i915_error_state in addition to the dmesg to this bug report.

Created an attachment (id=34857)
dmesg, i915_error_state and intel_gpu_dump

Agh, sorry! Well, with both the v7 and lock patches, my machine locked up instantly on starting X. I've attached (hopefully) the logs you requested. Let me know if you need more.

Thanks,
Scott

(In reply to comment #80)
> > --- Comment #79 from legolas558 <email address hidden> 2010-04-08 18:37:55 PDT ---
> > 4h uptime, more than 2M flushes (2031616 exactly) without any failure, no crash
> > after watching several videos, max retries is still 11. No graphics corruption
> > anywhere. Possibly never experienced a more stable Xorg (neither with UMS).
>
> Great! Rémi & René, can you please retest v7 with this fix applied and add
> your tested-by line here?

OK the following setup is still working after 45 min of stress-test (3x glxgears, 1x x11perf, 1x youtube):

- kernel 2.6.34-rc3 + v7 patch + mutex patch
- libdrm 2.4.20
- xorg-server 1.7.6
- xf86-video-intel 2.11.0
- mesa 7.8.1

No errors in dmesg whatsoever. The last entry reads "chipset flush no. 3637248, max retries 3".

There's just 1 thing wrong right now: the GNOME panel refuses to draw text after the stress test, but that's probably a panel issue.
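
For anyone who wants to pull these numbers out of a long dmesg instead of scrolling to the last entry, here is a minimal Python sketch. It assumes every debug line has exactly the shape quoted above ("chipset flush no. N, max retries M"), which is inferred from that single sample; the function name is made up for illustration and is not part of the patch.

```python
import re

# Assumed line shape, inferred from the one sample quoted in this comment.
FLUSH_RE = re.compile(r"chipset flush no\. (\d+), max retries (\d+)")


def summarize_flushes(dmesg_text):
    """Return (highest flush number, highest max-retries value), or None."""
    flushes = 0
    retries = 0
    found = False
    for match in FLUSH_RE.finditer(dmesg_text):
        found = True
        flushes = max(flushes, int(match.group(1)))
        retries = max(retries, int(match.group(2)))
    return (flushes, retries) if found else None


# The sample line quoted above.
print(summarize_flushes("chipset flush no. 3637248, max retries 3"))
# prints (3637248, 3)
```

Feeding it the saved output of dmesg gives one pair of numbers to paste into a report.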

Like I told Daniel yesterday on IRC, this works brilliantly (v7+locking patch). No more corruption and no more messages/crashes in dmesg.

Thanks again Daniel!

Tested-by: Rémi Cardona <email address hidden> (HP Pavilion dv1000)

Cheers

(In reply to comment #86)
> Like I told Daniel yesterday on IRC, this works brilliantly (v7+locking patch).
> No more corruption and no more messages/crashes in dmesg.
>
> Thanks again Daniel!
>
> Tested-by: Rémi Cardona <email address hidden> (HP Pavilion dv1000)
>
> Cheers

@Rémi: Could you please create an ebuild with the working patchset and put it in the x11-overlay or portage? I would love to test it here on my Acer Travelmate 663.

> --- Comment #81 from legolas558 <email address hidden> 2010-04-09 00:53:49 PDT ---
> Created an attachment (id=34836)
> --> (https://bugs.freedesktop.org/attachment.cgi?id=34836)
> i915_gem_object_put_pages crash happening with libdrm 2.4.18
>
> Now I am also affected by the i915_gem_object_put_pages bug; it obviously has a
> different cause.
>
> Note: this total system crash happens with libdrm-2.4.18 and with libdrm-git
> (pulled today), however I could only retrieve the syslog message for the older
> libdrm (with the new one I only got nul bytes printed to syslog)
>
> The bug triggers very quickly with libdrm-2.4.18 while it becomes much harder
> to trigger with the most recent libdrm, but it is indeed there.

I've tried to again reproduce this problem on my box by downgrading to
libdrm 2.4.18 (and a few other hacks). But that bug simply refuses to show
up again, here. Can you and Bruno please gather a few backtraces (as many
as you have lying around in your logs) and upload them to this bug?

Perhaps I can see a pattern and get a clue what's going on - at least that
way I've managed to fix the other problem with the stuck chipset flush.

Created an attachment (id=34917)
All BUGs I've seen in i915_gem.c since February

After a number of days testing this patch, i haven't seen any crashes or (render) errors. Thanks for the hard work Daniel!

Tested-by: René Gabriëls <email address hidden> (NEC Versa P520)

(In reply to comment #88)
> > --- Comment #81 from legolas558 <email address hidden> 2010-04-09 00:53:49 PDT ---
> > Created an attachment (id=34836) [details]
> > --> (https://bugs.freedesktop.org/attachment.cgi?id=34836)
> > i915_gem_object_put_pages crash happening with libdrm 2.4.18
> >
> > Now I am also affected by the i915_gem_object_put_pages bug; it obviously has a
> > different cause.
> >
> > Note: this total system crash happens with libdrm-2.4.18 and with libdrm-git
> > (pulled today), however I could only retrieve the syslog message for the older
> > libdrm (with the new one I only got nul bytes printed to syslog)
> >
> > The bug triggers very quickly with libdrm-2.4.18 while it becomes much harder
> > to trigger with the most recent libdrm, but it is indeed there.
>
> I've tried to again reproduce this problem on my box by downgrading to
> libdrm 2.4.18 (and a few other hacks). But that bug simply refuses to show
> up again, here. Can you and Bruno please gather a few backtraces (as many
> as you have lying around in your logs) and upload them to this bug?
>
> Perhaps I can see a pattern and get a clue what's going on - at least that
> way I've managed to fix the other problem with the stuck chipset flush.

Yes, I confirm that the chipset flushes are all OK: I have not seen a GTT failure since.

I can't be sure that the crash with libdrm-2.4.20 is also due to i915_gem_object_put_pages, because no log lines are stored on /var/log/messages (only nul bytes).

I can trigger this total system crash only with a windows application running inside wine, with all other normal linux usage (even 3D) there is no crash.

I'll try catching the i915 debugfs data right after the crash, but I am not sure that the init process is still alive after that.

Unfortunately I don't have other dump files lying around; however, I am sure that this is a new bug on this hardware, i.e. it is not the crash happening when watching videos.

Created an attachment (id=34922)
dri debugfs snapshots taken every second + generator script used

Created an attachment (id=34923)
excerpt from dmesg containing crash dump for i915_gem_object_put_pages+0x10b/0x110

A few updates. I have used a script running in background to gather DRI debugfs dumps.

1) the system is not totally hung up, because background scripts still run. Only keyboard/mouse die
2) it is the same crash happening with libdrm-2.4.18 and libdrm-git, so it's not libdrm-dependent
3) the nul bytes were due to write buffers not being flushed before hard shutdown, and with the 'sync' call in the daemon script (available in tbz archive) it correctly puts the crash dump in /var/log/messages
4) it does not depend only on wine but also on some other application, because I had to run wine and firefox to trigger it; anyway it looks deterministic and not totally random
5) the crash must have happened within the last 10 snapshots (i.e. seconds), sorry but I can't be more precise, I hope you can guess where it crashed from the DRI debugfs dumps

If I can be of some help I'd be glad to make other tests/reports; looks like I am able to reproduce this bug at will, so I can actually make manual tests.
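
For others who want to collect the same kind of data, the approach described above (snapshot the DRI debugfs files once per second and sync after each snapshot) could look roughly like the Python sketch below. This is not the script from the attached archive; the function names, the destination directory and the use of /sys/kernel/debug/dri/0 as the source are assumptions for illustration.

```python
import os
import shutil
import time


def snapshot_once(src_dir, dest_dir):
    """Copy every readable regular file from src_dir into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        try:
            with open(src, "rb") as fin, \
                    open(os.path.join(dest_dir, name), "wb") as fout:
                shutil.copyfileobj(fin, fout)
            copied.append(name)
        except OSError:
            # Some debugfs entries may refuse to be read; skip them.
            continue
    # Flush write buffers so a hard shutdown does not leave nul bytes behind.
    os.sync()
    return copied


def run(src_dir="/sys/kernel/debug/dri/0",
        dest_root="/var/tmp/dri-snapshots",  # hypothetical location
        interval=1.0):
    """Take one numbered snapshot per interval until the process is killed."""
    count = 0
    while True:
        snapshot_once(src_dir, os.path.join(dest_root, "%06d" % count))
        count += 1
        time.sleep(interval)
```

Calling run() as root in the background before reproducing the crash leaves numbered snapshot directories behind; the highest-numbered ones are the most recent.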

> --- Comment #91 from legolas558 <email address hidden> 2010-04-12 10:13:19 PDT ---
> Unfortunately I don't have other dump files lying around, however I am sure
> that this is a new bug on this hardware e.g. it is not the crash happening when
> watching videos.

Concerning your overlay problem: Can you open a new bug report for that
and put me on the cc: (I'm the overlay guy)? I have another report from a
i965G hanging when using the overlay, perhaps there's some pattern. Please
add the usual amount of information so that I (or anyone else) don't
have to hunt around in various bug reports. Thanks.

(In reply to comment #94)
> > --- Comment #91 from legolas558 <email address hidden> 2010-04-12 10:13:19 PDT ---
> > Unfortunately I don't have other dump files lying around, however I am sure
> > that this is a new bug on this hardware e.g. it is not the crash happening when
> > watching videos.
>
> Concerning your overlay problem: Can you open a new bug report for that
> and put me on the cc: (I'm the overlay guy)? I have another report from a
> i965G hanging when using the overlay, perhaps there's some pattern. Please
> add the usual amount of information so that I (or anyone else) doesn't
> have to hunt around in various bug reports. Thanks.

Sorry, I ought to have used the past tense: the crash *that was* happening when watching videos. I am no longer experiencing the overlay bug when watching videos with the v7+locking patch.

If you want, I can go back to an older patch that still exhibits the crash with videos and file a report as I did for the i915_gem_object_put_pages issue.

I am confident that it is fixed now because I no longer see the psychedelic fuchsia/rainbow overlay fill that was interleaved in frames from time to time and preceded the final crash by some minutes.

The only bug remaining for me is i915_gem_object_put_pages.

(In reply to comment #72)
> > --- Comment #71 from Indan Zupancic <email address hidden> 2010-04-07 16:37:09 PDT ---
> > If there's anything I can do to help, please ask.
>
> Thanks for testing. Please update to xf86-video-intel 2.11 (just released
> a few days ago). Also update to libdrm 2.4.20. These contain a few fixes
> for gpu hangs on i8xx hw. If your gpu still hangs, please attach the
> output of i915_error_state, that's usually sufficient to get a clue about
> what's going on.

Okay, I have been running this combination for five days now without any hangs, it's looking pretty stable, I think my problems are fixed now.

Unpatched kernel 2.6.34-rc3
xf86-video-intel 2.11.0
libdrm 2.4.20
xorg-server 1.7.5.902
intel-dri 7.7.1

Do you want me to test your v7 patch + locking fixes to make sure it causes no regressions? It appeared to make things worse for Scott, and v7 on its own didn't work for me before either. Or are the good bits already upstream?

I think Scott hit the same bug as I did, so it might be fixed for him too now with the new libdrm. (No idea what actually caused my problems, nor what fixed it. Was it the EINTR versus EAGAIN bugfix? Or all the intel driver fixes?)

Thanks,

Indan

> --- Comment #96 from Indan Zupancic <email address hidden> 2010-04-13 14:14:05 PDT ---
> Do you want me to test your v7 patch + locking fixes to make sure it causes no
> regressions? It appeared to make things worse for Scott, and v7 on its own
> didn't work for me before either. Or are the good bits already upstream?

Nope, nothing upstream yet (but the first patch series should hit
drm-intel-next in a few days). It looks like X40s are not really affected
by these gtt inconsistencies in day-to-day use. But the problem exists
there, too. So yes, please test v7+locking fix and beat it up for a few
days. If it works and doesn't report any failed chipset flushes, please
add your tested-by line, too.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-intel - 2:2.9.1-3ubuntu3

---------------
xserver-xorg-video-intel (2:2.9.1-3ubuntu3) lucid; urgency=low

  [Christopher James Halse Rogers]
  * debian/patches/107_disable_dri_on_845_855.patch:
    + Disable DRI on i845 and i855 chips. Works around the stability problems
      these chips have in Lucid (LP: #541492, LP: #541511).
 -- Bryce Harrington <email address hidden> Tue, 13 Apr 2010 15:25:23 -0700

Changed in xserver-xorg-video-intel (Ubuntu Lucid):
status: Incomplete → Fix Released

(In reply to comment #76)
> Created an attachment (id=34824) [details]
> fix locking around chipset flushing
>
> legolas, Rémi, René, this patch should fix the problems you've encountered
> with timed-out chipset flushes. It was a bug in my code. Please test
> extensively.

Hi,
The v7 + locking patches work fine here on a JVC MP-XP731 (with an Intel 855GM rev 02), more than 3mb flushes and no hangs. Before (normal debian squeeze stack) X would crash shortly after login.

However, it feels slightly more sluggish than with the old intel 2.3 driver and Xorg 7.3, but that's probably because the patch does some extra debug checking?

My working setup right now is:

Linux 2.6.34-rc2 from drm-intel-next + v7 + locking patch
libdrm 2.4.18
intel driver 2.9.1
Xserver 1.7.6
Mesa 7.7.1-devel

So the only exchanged component is the kernel. I've made a package which is available for others to test here: http://www2.informatik.hu-berlin.de/~beier/tmp/linux-image-2.6.34-rc2_2.6.34-rc2-10.00.Custom_i386.deb

Thumbs up for the hard work!

Christian

Created an attachment (id=34996)
dmesg output after suspend to disk

(From update of attachment 34996)
Oops, just after thinking everything's fine. X got stuck shortly after a wakeup from suspend to disk. dmesg output attached. Dunno if this is related at all...

Cheers,
   Christian

Thanks a lot for testing (this goes to everyone, not just Christian)!

> --- Comment #98 from Christian Beier <email address hidden> 2010-04-14 03:27:27 PDT ---
> However, it feels slightly more sluggish than with the old intel 2.3 driver and
> Xorg 7.3, but that's probably because the patch does some extra debug checking?

Yep, that's to be expected. My patch currently completely trashes the gtt
(to really exercise the chipset flush - no way to get to a few mm flushes
within just a few hours of testing without this). But that also kills
performance. Final version should be about on par with older drivers.

btw, is the following tested-by line correct?

Tested-by: Christian Beier <email address hidden> (JVC MP-XP731)

> btw, is the following tested-by line correct?
>
> Tested-by: Christian Beier <email address hidden> (JVC MP-XP731)

Oops, forgot that. Yeah, correct!

> --- Comment #100 from Christian Beier <email address hidden> 2010-04-14 03:39:37 PDT ---
> (From update of attachment 34996)
> Oops, just after thinking everything's fine. X got stuck shortly after a wakeup
> from suspend to disk. dmesg output attached. Dunno if this is related at all...

Great, everyone is stuck on the dev->struct_mutex lock. Sigh. One more
hint that the locking is fishy. Can you please enable lockdep (Kernel
hacking -> Lock debugging: prove locking correctness) in your kernel
config and retest? Lockdep should shed some light on what the heck is
going on here.
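
Before retesting, it may be worth confirming that the rebuilt kernel really has lockdep switched on. The menu entry named above corresponds to the CONFIG_PROVE_LOCKING option. A minimal Python sketch of the check follows; the function names are made up for illustration, and the /boot/config-<release> location is an assumption that holds on many distributions but not all.

```python
import platform
import re


def lockdep_enabled(config_text):
    """True if the kernel config text contains CONFIG_PROVE_LOCKING=y."""
    return re.search(r"^CONFIG_PROVE_LOCKING=y$", config_text,
                     re.MULTILINE) is not None


def running_kernel_config_path():
    """Guess where the config of the running kernel is installed."""
    # Assumption: the distribution installs /boot/config-<kernel release>.
    return "/boot/config-" + platform.release()
```

Reading the file at running_kernel_config_path() and passing its contents to lockdep_enabled() should return True on a kernel built as requested.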

Geir Ove Myhr (gomyhr) wrote :

Bug reporters, it would be nice to have some confirmation that the workaround works as intended. The expected behaviour is that with 2:2.9.1-3ubuntu3, the freezes will stop and the performance (at least 3D and video) will be horrible.

While there is a fix in the works upstream for 855GM it looks like it is intended for the drm-intel-next series of kernels, which I think means that it will end up in the 2.6.35 kernel. It should probably also be possible to apply it to 2.6.34, but I'm not sure about 2.6.33 which is effectively what we have in Lucid when it comes to these drivers.

Geir Ove Myhr (gomyhr) wrote :

If someone wants to help upstream verify the potential fix on your 855GM hardware, Christian Beier has built a kernel-package with the patches (for Debian, but hopefully it will also work in Lucid). See https://bugs.freedesktop.org/show_bug.cgi?id=27187#c98 . The feedback they want is dmesg output from running this for a few hours while exercising the system (glxgears, 3D-apps, video, screensavers, x11perf, and whatever you may think of). If the system hangs, the file /sys/kernel/debug/dri/0/i915_error_state is interesting (will have to be retrieved by ssh while the system is hung). It is a bit slower than usual due to all the debugging code that is intended to catch any errors.

Of course, this will not work with xserver-xorg-video-intel 2:2.9.1-3ubuntu3, since it disables DRI. The one in my standard PPA should be equivalent to the current Lucid one, except that DRI is not disabled. https://launchpad.net/~gomyhr/+archive/standard
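
Since the error state has to be fetched over ssh while the machine is hung, it may help to prepare the command from a second computer beforehand. The following is only a rough Python sketch: the host name is whatever you use to reach the affected machine, the function names are hypothetical, and it assumes an ssh server is already running there and that sudo works without a password prompt (debugfs is normally readable by root only).

```python
import subprocess

ERROR_STATE = "/sys/kernel/debug/dri/0/i915_error_state"


def error_state_command(host, path=ERROR_STATE):
    """Build the ssh argv that prints the error state of the remote machine."""
    # Assumption: passwordless sudo (or a root login) on the remote side.
    return ["ssh", host, "sudo", "cat", path]


def fetch_error_state(host, out_file="i915_error_state.txt", timeout=30):
    """Run the command and save its output locally; True on success."""
    try:
        result = subprocess.run(error_state_command(host),
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    if result.returncode != 0:
        return False
    with open(out_file, "wb") as f:
        f.write(result.stdout)
    return True
```

The saved file can then be attached to the upstream report together with the dmesg output.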

Created an attachment (id=35042)
dmesg v7 + locking patch, lockdep enabled

This one's different from my last dmesg, no more hung tasks, rather looks like the i915_gem_object_put_pages() bug the others experienced.

HTH anyway,
Christian

I have enabled lockdep debugging and I also have an early BUG dump "BUG: key dd9e5288 not in .data!" like Christian, so that can be ignored.

Fixing this last bug has become very important because with updates of last week the Xorg v1.6 (and related packages and old intel driver) is badly crashing, so it can no more be used.

I am now using Xorg 1.7.6 with the VESA driver, and that is rock-solid

Hector Avila (compugeek3264) wrote :

The workaround did not work on my Gateway M305CRV with Intel 82852. I still get GPU lockups and the Xorg crashes that result from it.

Here's my Xorg log for anyone who wants it:

Hector Avila (compugeek3264) wrote :

does not*

Bryce Harrington (bryce) wrote :

Reopening due to user feedback. I suspect disabling DRI might have helped, but is not a sufficient solution.

Changed in xserver-xorg-video-intel (Ubuntu Lucid):
status: Fix Released → Triaged

Vlad Pes (pesotsky-web) wrote :

1. this workaround did not work on my Toshiba Portege M100 with
        00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
+
2. when Ubuntu's xorg crashes and automatically restarts in low-graphics mode, it is impossible to start xorg in the parallel Debian 5.0.4 installation. Debian reports xorg errors in text mode (--> https://bugs.launchpad.net/bugs/532452).

Bryce Harrington (bryce) wrote :

I wish we had more testing evidence to base this decision on, but I've posted a kernel bug report requesting KMS disablement on three of the older 8xx cards: lp #563277

We've already sent up a fair plentitude of bug reports to upstream, so I'm hopeful that they'll come up with fixes to this and to KMS, so we can re-enable in meerkat, or maybe even in 10.04.1, but we'll have to see how things go.

Hector Avila (compugeek3264) wrote :

I apologize for posting the wrong Xorg.0.log file on one of my last comments.

00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 01)
00:02.1 Display controller [0380]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 01)

This is an unfortunate situation. There is a non-trivial number of users with 845 and 855 chips who are impacted by regressions in stability in the current X stack when running 3D and KMS.

We have opted for a "stability first" approach for these users. We will disable 3D and KMS for these chips in Lucid final release. This will have the unfortunate effect of disabling compiz. This will introduce a functional regression. So we will be sacrificing functionality for these users in favor of stability. This is a painful choice to make, but we feel that stability must trump functionality when we are forced to make such choices.

We will be pursuing functional fixes. However, we will do this outside the main release, for example in a PPA. If we are able to provide a fix that delivers stability and functionality, we will consider this a potential SRU in 10.04.1.

Good news about the put_pages BUG_ON: There's another bug report
indicating that this is not a problem in my patch but also exists in the
mainline kernel:

https://bugzilla.kernel.org/show_bug.cgi?id=15664

My patch (especially the hack to stress test the gtt) just makes it more
likely.

Bad news: I still have no clue what's going on.

To all those who are hitting this problem: What mesa release are you using
and are you using a compositing window manager that uses OpenGL? I have
an idea ...

(In reply to comment #106)
> To all those who are hitting this problem: What mesa release are you using
> and are you using a compositing window manager that uses OpenGL? I have
> an idea ...

Using XFCE with mesa/libgl 7.7.1, no compositing at all. If you want I can enable Option "Composite" "Disable" in xorg.conf

> --- Comment #107 from legolas558 <email address hidden> 2010-04-15 03:41:55 PDT ---
> (In reply to comment #106)
> > To all those who are hitting this problem: What mesa release are you using
> > and are you using a compositing window manager that uses OpenGL? I have
> > an idea ...
>
> Using XFCE with mesa/libgl 7.7.1, no compositing at all. If you want I can
> enable Option "Composite" "Disable" in xorg.conf

Arrgh, whatever, my theory just went bust.

(In reply to comment #106)
> To all those who are hitting this problem: What mesa release are you using
> and are you using a compositing window manager that uses OpenGL? I have
> an idea ...

Mesa 7.7.1 and running compiz 0.8.4...

Created an attachment (id=35065)
only call put_pages when gtt_space != NULL

Ok, this might be the first real stab at that dreaded put_pages BUG. Everyone who's hitting this problem, please apply this patch on top of whatever kernel most easily reproduces the problem.

(In reply to comment #110)
> Created an attachment (id=35065) [details]
> only call put_pages when gtt_space != NULL
>
> Ok, this might be the first real stab at that dreaded put_pages BUG. Everyone
> who's hitting this problem, please apply this patch on top of whatever kernel
> most easily reproduces the problem.

It seems much more stable now.

gtt failures:
0 / 114688
max retries:
66.696711 0
203.340132 5
284.831640 6
570.264962 7

I also tested it through Murphy's law by trying to do something important with the wine application: no hangups up to now.

Let's see what happens in the next couple of days. For now I'd say FIXED.

Created an attachment (id=35073)
dmesg output snippet with i915_gem_tiling.c warning

With all three patches (the v7, locking and gtt_space!=NULL) atop a drm-intel-next kernel it seems to run stable. Did not run into any crashes (so far...). However, while playing around with RandR rotation, I got the attached warning. X continues running.

Again, I don't know if this is in any way related, but maybe it helps...

> --- Comment #112 from Christian Beier <email address hidden> 2010-04-15 15:19:40 PDT ---
> With all three patches (the v7, locking and gtt_space!=NULL) atop a
> drm-intel-next kernel it seems to run stable. Did not run into any crashes (by
> now...). However, while playing around with RandR rotation, i got the attached
> warning. X continues running.

Looks like the ddx is not properly disabling bo reuse on the framebuffer.
Please retest with the latest version of xf86-video-intel and libdrm. If
the problem persists, please open a new bug report, this is definitely
something else.

Created an attachment (id=35087)
new patch against current drm-intel-next

I've beaten the patch into shape and killed all the debug hacks. Patch is against current drm-intel-next. But portions of it are already submitted upstream, so it might no longer apply in a few days. I'll try to rebase asap when that happens.

Performance should be about the same as old ums code or unpatched kernel - but stable ;)

Everyone who's still testing these patches and has not yet reported their tested-by line, please do so now. I intend to submit this sometime next week.

Please test this patch thoroughly.

(In reply to comment #114)
> Created an attachment (id=35087) [details]
> new patch against current drm-intel-next
>
> Performance should be about the same as old ums code or unpatched kernel - but
> stable ;)
>
General performance has indeed increased; however, I can clearly notice "hiccups" during major load on the GPU, like when starting mozilla apps or wine apps. This was not noticeable with the previous v7+locking patchset.

As you said, it is like the old UMS code, and glxgears FPS is actually comparable (if not better), so the only minor issue is the hiccups that I am experiencing. These hiccups also hang the mouse for some seconds, so I suppose it is some locking on the GPU pipe, but I can't read the changes in your v8 patch, so this is just a guess.

Anyway, the patch is perfectly mature for upstream in my opinion, and is indeed the best one we have had so far. Please grab my Tested-By line from previous comments.

Hi guys,

It is a shame that things are not working. Unfortunately reverting
things has also broken things.

Just a data point - reverting KMS has now impacted me. (with KMS
things would work occasionally).

I am no longer able to boot to an operational graphical environment.

i855M (PCI ID: 8086:3582, subsystem: 1028:018d) rev 02

kernel: linux-image-2.6.32-21-generic (2.6.32-21.32)
gdm: 2.30.0-0ubuntu5
X:xserver-xorg-server-intel (2:2.9.1-3ubuntu4), intel-gpu-tools
(1.0.2+git20100324-0ubuntu1)

Anand

On Thu, Apr 15, 2010 at 7:25 AM, Rick Spencer
<email address hidden> wrote:
> This is an unfortunate situation. There is a non-trivial number of users
> with 845 and 855 chips who are impacted by regressions in stability in
> the current x stack when running 3d and KMS.
>
> We have opted for a "stability first" approach for these users. We will
> disable 3D and KMS for these chips in Lucid final release. This will
> have the unfortunate effect of disabling compiz. This will introduce a
> functional regression. So we will be sacrificing functionality for these
> users in favor of stability. This is a painful choice to make, but we
> feel that stability must trump functionality when we are forced to make
> such choices.
>
> We will be pursuing functional fixes. However, we will do this outside
> the main release, for example in a PPA. If we are able to provide a fix
> that delivers stability and functionality, we will consider this a
> potential SRU in 10.04.1.

(In reply to comment #114)
> Performance should be about the same as old ums code or unpatched kernel - but
> stable ;)

Yeah, with the v8 patch everything is more or less on par with the old ums code performance-wise. It seems to be stable as well: I have been running it with compiz for a few days, with no crashes or hiccups.

Thumbs up!
Christian

fossfreedom (fossfreedom) wrote :

Likewise,
I am using an i855GM: a low-resolution boot splash is displayed and then the screen goes black. No GDM.

I've had to resort to the 32-20 kernel as well as the 2:2.9.1-3ubuntu1 driver to get back to a functioning system.

On Sun, Apr 18, 2010 at 7:18 PM, DavidM <email address hidden> wrote:
> Likewise,
>  am using i855GM - a low resolution boot splash is displayed and then black.  No GDM.
>
> I've had to resort to the 32-20 kernel as well as the 2:2.9.1-3ubuntu1
> driver to get back to a functioning system.

Thanks for that info. Looking at the changelog either 2:2.9.1-3ubuntu1
or 2:2.9.1-3ubuntu2 ought to work.

Unfortunately, for me, it appears all older packages have disappeared
from the mirrors.

Anand

Geir Ove Myhr (gomyhr) wrote :

> Unfortunately, for me, it appears all older packages have disappeared
> from the mirrors.

xserver-xorg-video-intel 2:2.9.1-3ubuntu2~gomyhr1~clipsolids in my standard PPA (https://launchpad.net/~gomyhr/+archive/standard) should be functionally equivalent to 2:2.9.1-3ubuntu2, since I put it up to test that patch. Don't count on it staying there, since I will replace it whenever I need something else tested.

Anand Kumria (wildfire) wrote :

Hi Geir,

On Mon, Apr 19, 2010 at 12:26 AM, Geir Ove Myhr <email address hidden> wrote:
>> Unfortunately, for me, it appears all older packages have disappeared
>> from the mirrors.
>
> xserver-xorg-video-intel 2:2.9.1-3ubuntu2~gomyhr1~clipsolids in my
> standard PPA (https://launchpad.net/~gomyhr/+archive/standard) should be
> functionally equivalent to 2:2.9.1-3ubuntu2, since I put it up to test
> that patch. Don't count on it staying there, since I will replace it
> whenever I need something else tested.

Thanks for that!

Kernel 2.6.32-21-generic:
 - Confirmed working if 'startx' is run manually.
 - Also works with 'text' on the kernel command line and gdm started
   manually ('start gdm') after logging in as root.
Kernel 2.6.32-19-generic:
 - Causes plymouth to crash. Starting gdm manually is NOT successful,
   and X does not work either.
Kernel 2.6.32-18-generic:
 - Same as with 2.6.32-19-generic.

Cheers,
Anand

(In reply to comment #114)
> Created an attachment (id=35087) [details]
> new patch against current drm-intel-next

Hi Daniel,

Is it possible to merge the changes into a clean patch which applies to kernel versions 2.6.32 and 2.6.33 (.32 preferred)?

I have spent two hours fiddling around to get your patch to apply cleanly to the latest Lucid kernel. It compiles, but the patch changes so many things (splitting of files, etc.) that I haven't the slightest idea what you have really changed and what the patch really does.

Can you please post the code snippets which were introduced by you to solve the crashes?

Greetings Stefan

Vlad Pes (pesotsky-web) wrote :

I've searched for the information and this is what I've found (maybe it'll help):

http://bugs.archlinux.org/task/16974:
"Anyway the i855GM issues have just been radically fixed by Daniel Vetter, you can find the patch and the complete kernel sources archive here:http://www.iragan.com/linux/i855GM/"

If this doesn't help, (I don't know much about it but I've thought maybe this could be the solution): make alternative xorg + intel-driver for 855gm (which works) as a manual installation available for those who have such hardware.

Cheers,
Vlad

Geir Ove Myhr (gomyhr) wrote :

Vlad, thank you for taking an interest in helping. We have followed the upstream bug report (freedesktop-bugs #27187 at the top of this page) and we know about the fix.

> "Anyway the i855GM issues have just been radically fixed by Daniel Vetter, you can find the patch and the complete kernel sources archive here:http://www.iragan.com/linux/i855GM/"

Two key words here are radically and kernel. Daniel has rewritten some parts of the kernel in order to achieve this, and the fix is based on what will become kernel 2.6.35, which is much newer than what is in Lucid. So there are two problems: 1. The fix changes a lot of things and could lead to regressions for others, which is not something we want right before the release of Lucid. 2. The fix doesn't necessarily apply to the current Lucid kernel. After the release, we may be able to backport, test, and update if it doesn't lead to problems for anyone else.

> If this doesn't help, (I don't know much about it but I've thought maybe this could be the solution): make alternative xorg + intel-driver for 855gm (which works) as a manual installation available for those who have such hardware.

It is the kernel that needs an alternative version; changing xorg will not fix this. It's not too hard to compile, and I would already have done this if I had an i386 installation of Lucid available.

(In reply to comment #114)
> Created an attachment (id=35087) [details]
> new patch against current drm-intel-next
> ...
> Please test this patch thoroughly.
I've tested it on my Travelmate 660 since Friday evening, and it worked wonderfully. Here are my specs:
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

I started xcompmgr on top of openbox. Every night four glxgears instances were running. In daily use, Firefox (with a lot of Flash videos), a high-resolution trailer of Iron Man 2 in mplayer, Eclipse and a transparent console were running, without a freeze.
I would say it works! Thanks!

Tested-By: Arthur Spitzer <email address hidden> (Acer Travelmate 660)

(In reply to comment #117)
> Is it possible to merge the changes into a clean patch which applies to kernel
> version 2.6.32 and 2.6.33 (.32 preferred).
> Since two hours fiddling around to get your patch cleanly applied to the latest
> lucid kernel.

Stefan, for the Lucid kernel we would want a patch against the 2.6.33 kernel, since Ubuntu (and some other distros) use 2.6.32 kernels with drm from 2.6.33.

(In reply to comment #115)
> (In reply to comment #114)
> > Created an attachment (id=35087) [details] [details]
> > new patch against current drm-intel-next
> >
> > Performance should be about the same as old ums code or unpatched kernel - but
> > stable ;)
> >
> General performance has indeed increased, however I can clearly notice
> "hickups" during major load on the GPU. Like when starting mozilla apps or wine
> apps; this was not noticeable with previous v7+locking patchset.
>
> As you said it is like old UMS code, and actually glxgears FPS is comparable
> (if not better); so the only minor issue is the hickups that I am experiencing.
> These hickups also hang the mouse for some seconds, so I suppose it is some
> locking on the GPU pipe, but I can't read the changes in your v8 patch so just
> suppositions.
>
Daniel, the patch fixed every bug; I almost forgot that I was running a testing stack. Regarding the hiccups: they are most probably perfectly normal and were concealed in previous versions of the patch because the overall performance was slower, so they could not be "felt".

Please add my Tested-by line, it's ready for me.

Hi,

On Mon, Apr 19, 2010 at 7:39 AM, Geir Ove Myhr <email address hidden> wrote:
> Vlad, thank you for taking an interest in helping. We have followed the
> upstream bug report (freedesktop-bugs #27187 at the top of this page)
> and we know about the fix.
>
>>"Anyway the i855GM issues have just been radically fixed by Daniel Vetter, you can find the patch and the complete kernel
> sources archive here:http://www.iragan.com/linux/i855GM/"

I took a look over the patches; whilst there is a radical re-working
of things - what is interesting is seeing what was shipped
(supposedly, this is from the comment and not verified) in Fedora 13.
They have a 2 line patch.

From http://www.iragan.com/linux/i855GM/old_patches/drm-intel-big-hammer.patch

RedHat patch used in Fedora Core 13.

This patch prevents instantaneous crashes when starting Xorg.

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 37427e4..08af9db 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2553,6 +2553,11 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,

  mutex_lock(&dev->struct_mutex);

+ /* We don't get the flushing right for these chipsets, use the
+ * big hamer for now to avoid random crashiness. */
+ if (IS_I85X(dev) || IS_I865G(dev))
+ wbinvd();
+
  i915_verify_inactive(dev, __FILE__, __LINE__);

  if (dev_priv->mm.wedged) {

> It is the kernel that needs an alternative version, changing xorg will
> not fix this. It's not too hard to compile, and I would already have
> done this if I had a i386 installation of Lucid available.

It may be worthwhile asking the kernel guys about incorporating this
into the kernel. I am compiling 2.6.32-21-generic to test this out to
see if it works for me. I'll report back in a few hours.

Cheers,
Anand

> --- Comment #117 from Stefan Glasenhardt <email address hidden> 2010-04-18 15:32:41 PDT ---
> Can you please post the code snippets which where introduced by you to solve
> the crashes?

As a preview, my local topic branch is available at

http://cgit.freedesktop.org/~danvet/drm/log/?h=stuff/i8xx_cache_coherency_for_oga

The relevant patches start after "Enable distcc". On my further merge
plans: As already said, I hope to send the last patch pile (containing the
real fix) for review in a few days, pending merging of the previous
submissions. If it survives review intact I'll backport just the fix for
.34 and earlier kernels.

So taking testing/release delays on each stage (-next, .34, -stable) into
account, expect a few weeks before this hits a stable kernel near you,
best-case scenario.

Chris Halse Rogers (raof) wrote :

I'm aware of that “big hammer” patch from the upstream bug. The problem is that it's not a fix - reports on the upstream bug indicate that it reduces the incidence of crashes, and the analysis by upstream is that it simply introduces a delay which makes it less likely to trigger the problem.

It was decided that as this was not going to be sufficient we wouldn't pursue this patch for Lucid. We'll be looking at pulling Daniel Vetter's fix into a PPA kernel at some point, and given sufficient testing it *might* be an SRU for 10.04.

Created an attachment (id=35159)
dmesg of a failed chipset flush warning

Since the libdrm and intel driver updates my system seems to be rock solid.
That said, I tried your patches to see if it made any change. v7 had horrible
performance, as expected, but v8 is considerably slower than unpatched too. E.g.
"time dmesg" in rxvt takes a lot longer (2x or more) and it all feels slightly
sluggish, not snappy as it is when unpatched. On the upside, it seems the text corruption is fixed by v8, though I'm not totally sure.

On the downside, just when I thought everything was fine, I got the following warning (for the first time):

WARNING: at /home/indan/src/linux-2.6/drivers/char/agp/intel-gtt.c:1007 intel_i830_chipset_flush+0x2e3/0x32d()
Hardware name: 2371GHG
i8xx chipset flush failed, expected: 827, cpu_read: 315

So it seems we're not there yet.

> --- Comment #122 from Indan Zupancic <email address hidden> 2010-04-19 05:36:35 PDT ---
> Since the libdrm and intel driver updates my system seems to be rock solid.
> That said, I tried your patches to see if it made any change. v7 had horrible
> performance, as expected, but v8 is considerably slower than unpatched too.
> E.g.
> "time dmesg" in rxvt takes a lot longer (2x or more) and it all feels slightly
> sluggish, not snappy as it is when unpatched. On the upside, it seems the text
> corruption is fixed by v8, though I'm not totally sure.

This is just with the kernel changed, right? Because 2.11 has taken a
rather severe hit against 2.10 for i8xx chipsets on some workloads (I'm
working on fixing it).

> On the downside, just when I thought everything was fine, I got the following
> warning (for the first time):
>
> WARNING: at /home/indan/src/linux-2.6/drivers/char/agp/intel-gtt.c:1007 intel_i830_chipset_flush+0x2e3/0x32d()
> Hardware name: 2371GHG
> i8xx chipset flush failed, expected: 827, cpu_read: 315
>
> So it seems we're not there yet.

Depends. It's definitely just a failed chipset flush (I've checked the
offset). But given enough time and testers, this is somewhat expected
because this patch just implements a probabilistic chipset flush. Tallying
all the chipset flushes of all testers easily gives on the order of 100mm
successful ones. Now if yours is the only one that failed, that's not a
problem. Please keep an eye on this and report any reoccurrences - some
more tuning might be called for (perhaps even a module parameter).

Also please report if the glyph corruptions show up again.

(In reply to comment #123)
> This is just with the kernel changed, right? Because 2.11 has taken a
> rather severe hit against 2.10 for i8xx chipsets on some workloads (I'm
> working on fixing it).

Yes, all userspace is unchanged since I switched to 2.11 and newer libdrm.

I didn't notice any regressions with 2.11 compared to 2.10 though, but I'm
only using 2D with xcompmgr -a running.

> Depends. It's definitely just a failed chipset flush (I've checked the
> offset). But given enough time and testers, this is somewhat expected
> because this patch just implements a probabilistic chipset flush. Tallying
> all the chipset flushes of all testers easily gives on the order of 100mm
> successful ones. Now if yours is the only one that failed, that's not a
> problem. Please keep an eye on this and report any reoccurences - some
> more tuning might be called for (perhaps even a module parameter).

Well, it's curious I never got it with the v7 patch, while I ran that one for days.

A module parameter to dis/enable this canary stuff would be good, it just seems to slow things down for me without improving anything.

I wonder if it's in any way significant that the difference between the expected 827 and cpu_read 315 is precisely 512... Did anyone try to increase I830_MCH_WRITE_BUFFER_SIZE to something bigger?

Looking at the commit, especially the description, it seems like there's no way to do proper chipset flushes. Maybe hunt down and confront an Intel developer? Or avoid the need to do flushes, but that's probably unrealistic. On the other hand, if you can't really flush, you can't really depend on it either.

> Also please report if the glyph corruptions show up again.

I will.

Okay, while writing this I got a second warning:

i8xx chipset flush failed, expected: 4043, cpu_read: 3531

The difference is again exactly 512.
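
The two warnings quoted so far can be compared mechanically. The following is a minimal Python sketch, not part of any patch in this report; the helper name `flush_deltas` is made up for illustration, and the only thing it assumes is the message format shown in the warnings above.

```python
import re

# Matches the warning text exactly as it appears in the dmesg excerpts above.
FLUSH_FAILED = re.compile(
    r"i8xx chipset flush failed, expected: (\d+), cpu_read: (\d+)"
)


def flush_deltas(log_text):
    """Return (expected, cpu_read, expected - cpu_read) for each warning."""
    results = []
    for match in FLUSH_FAILED.finditer(log_text):
        expected, cpu_read = int(match.group(1)), int(match.group(2))
        results.append((expected, cpu_read, expected - cpu_read))
    return results


if __name__ == "__main__":
    sample = (
        "i8xx chipset flush failed, expected: 827, cpu_read: 315\n"
        "i8xx chipset flush failed, expected: 4043, cpu_read: 3531\n"
    )
    for expected, cpu_read, delta in flush_deltas(sample):
        print(f"expected={expected} cpu_read={cpu_read} delta={delta}")
```

Feeding it the output of dmesg would make it easy to spot a failure whose difference is not 512, which would point at something other than a stale value.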

Maybe the chipset flushing is fine, but there's a different bug making it seem to fail?

Created an attachment (id=35167)
failed flush with

Yesterday I had a failed flush as well, the only one since I applied the patch in attachment #35087.
Currently running:
  x11-base/xorg-server-1.7.6
  x11-libs/libdrm-2.4.20
  media-libs/mesa-7.8.1
  xf86-video-intel at commit 80f52482c7cde000a76b91fe3d8b6c16baf2141f
                             XvMC: fix memory overflow
                             8 April 2010, by Daniel Vetter

(In reply to comment #125)
> Created an attachment (id=35167) [details]
[30125.064301] i8xx chipset flush failed, expected: 648447, cpu_read: 647935
The difference is again 512. Suggests that it is usually/always bit 9 (counting from 0) that comes out wrong(?)

> --- Comment #124 from Indan Zupancic <email address hidden> 2010-04-19 11:26:15 PDT ---
> > Depends. It's definitely just a failed chipset flush (I've checked the
> > offset). But given enough time and testers, this is somewhat expected
> > because this patch just implements a probabilistic chipset flush. Tallying
> > all the chipset flushes of all testers easily gives on the order of 100mm
> > successful ones. Now if yours is the only one that failed, that's not a
> > problem. Please keep an eye on this and report any reoccurences - some
> > more tuning might be called for (perhaps even a module parameter).
>
> Well, it's curious I never got it with the v7 patch, while I ran that one for
> days.
>
> A module parameter to dis/enable this canary stuff would be good, it just seems
> to slow things down for me without improving anything.
>
> I wonder if it's in any way significant that the difference between the
> expected 827 and cpu_read 315 is precisely 512... Did anyone try to increase
> I830_MCH_WRITE_BUFFER_SIZE to something bigger?

The fact that it's 512 shows that the problem is a failed cacheflush and
nothing else (this is actually what I've checked). The chipset flush
checker changes the place it writes the check value every chipset flush.
And it reuses the same place every 512th chipset flush. So when the
chipset flush fails, the old value is there, which should be exactly 512
less than what's expected.
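
The explanation above can be illustrated with a small user-space model. This is a sketch in Python, not the kernel code: the class `CanaryRing`, its method names, and the idea of storing the flush counter itself as the check value are assumptions made for illustration. Only the 512-flush reuse period is taken from the comment above.

```python
SLOTS = 512  # per the comment above: a location is reused every 512th flush


class CanaryRing:
    """Toy model of a flush checker that rotates through SLOTS locations."""

    def __init__(self):
        self.memory = [None] * SLOTS  # values that actually reached memory
        self.counter = 0

    def flush(self, succeeds=True):
        """Write the next check value; return (expected, value read back)."""
        self.counter += 1
        slot = self.counter % SLOTS
        if succeeds:
            self.memory[slot] = self.counter
        # On a failed flush the write never lands, so the old value remains.
        return self.counter, self.memory[slot]


if __name__ == "__main__":
    ring = CanaryRing()
    for _ in range(826):
        ring.flush()
    expected, seen = ring.flush(succeeds=False)
    print(expected, seen, expected - seen)

    # Third reported failure: the values differ by 512 arithmetically,
    # yet they differ in more than one bit position.
    print(648447 - 647935, bin(648447 ^ 647935).count("1"))
```

In this model a failed 827th flush reads back 315, matching the first warning. The last line also shows why a stale value fits the reports better than a single flipped bit: 648447 and 647935 differ by 512 but in two bit positions.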

> Looking at the commit, especially the description, it seems like there's no way
> to do proper chipset flushes. Maybe hunt down and confront an Intel developer?
> Or avoid the need to do flushes, but that's probably unrealistic. On the other
> hand, if you can't really flush, you can't really depend on it either.

Well, there is _no_ way to do a reliable flush. And the hw docs explicitly
say so. But we need to move stuff in/out of the graphics mem (i.e. the
gtt). The other option would be to copy stuff in/out, which is even worse:
- Wastes memory (actually simply uses twice as much for everything).
- Would be even slower than what my hack currently does.

And to add insult to injury, some of the chipsets from the 2nd gen (i8xx)
suffer from other cache coherency problems in addition to this.

> > Also please report if the glyph corruptions show up again.
>
> I will.
>
> Okay, while writing this I got a second warning:
>
> i8xx chipset flush failed, expected: 4043, cpu_read: 3531

Ok, that's bad. Can you change the following define in
include/drm/intel-gtt.h and see whether you still get failed chipset
flushes?

-#define I830_CC_CANARY_FLOCK_GTT_PAGES 8
+#define I830_CC_CANARY_FLOCK_GTT_PAGES 16

The whole thing makes somewhat more sense this way around, anyway.

Oh, and add some details about your box, please (brand&model + cpu,
mostly, the rest is all in the dmesg, anyway).

Brian Rogers (brian-rogers) wrote :

For those who need it, I have a kernel with Daniel Vetter's patch in my experimental PPA: https://launchpad.net/~brian-rogers/+archive/experimental

OK Guys, I've tried the :
fix-i8xx-gtt-cache-coherency-v7.patch
locking_for_chipset_flush.patch
&
gtt_space_null_means_no_pages_ref.patch
patches against 2.6.34-rc3 for about a week now.

I have :
Intel Corporation 82852/855GM Integrated Graphics Device (rev 02) (prog-if 00 [VGA controller])

libdrm 2.4.18
xorg-x11-drv-intel-2.9.1
xorg-x11-server-Xorg-1.7.6

The patch seems quite stable, although I just experienced a (Non fatal) Crash.
So nice job guys, it's much better.

The crash I just got looks like :

Apr 19 20:28:48 m3n kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 19 20:28:48 m3n kernel: render error detected, EIR: 0x00000000
Apr 19 20:28:51 m3n kdm[1368]: X server for display :0 terminated unexpectedly

It is the first time it has crashed using the patches. The screen just went black and input was totally lost, but I was able to safely shut down the machine by pressing the power button once. I am also seeing lots of stuff like:

Apr 19 16:35:59 m3n kernel: chipset flush no. 2752512, max retries 7
Apr 19 16:42:43 m3n kernel: chipset flush no. 2768896, max retries 7
Apr 19 16:45:03 m3n kernel: chipset flush no. 2785280, max retries 7
Apr 19 16:45:58 m3n kernel: chipset flush no. 2801664, max retries 7
Apr 19 16:47:54 m3n kernel: chipset flush no. 2818048, max retries 7
Apr 19 16:49:08 m3n kernel: chipset flush no. 2834432, max retries 7
Apr 19 16:50:26 m3n kernel: chipset flush no. 2850816, max retries 7
Apr 19 16:52:57 m3n kernel: chipset flush no. 2867200, max retries 7
Apr 19 16:56:29 m3n kernel: chipset flush no. 2883584, max retries 7
Apr 19 16:57:46 m3n kernel: chipset flush no. 2899968, max retries 7
Apr 19 16:58:27 m3n kernel: chipset flush no. 2916352, max retries 7
Apr 19 17:02:14 m3n kernel: chipset flush no. 2932736, max retries 7
Apr 19 17:05:27 m3n kernel: chipset flush no. 2949120, max retries 7
Apr 19 17:10:07 m3n kernel: chipset flush no. 2965504, max retries 7
Apr 19 17:12:00 m3n kernel: chipset flush no. 2981888, max retries 7
Apr 19 17:15:20 m3n kernel: chipset flush no. 2998272, max retries 7
Apr 19 17:17:07 m3n kernel: chipset flush no. 3014656, max retries 7
Apr 19 17:17:54 m3n kernel: chipset flush no. 3031040, max retries 7
Apr 19 17:18:57 m3n kernel: chipset flush no. 3047424, max retries 7
Apr 19 17:26:08 m3n kernel: chipset flush no. 3063808, max retries 7
Apr 19 17:26:37 m3n kernel: chipset flush no. 3080192, max retries 7
Apr 19 17:28:20 m3n kernel: chipset flush no. 3096576, max retries 7
Apr 19 17:28:55 m3n kernel: chipset flush no. 3112960, max retries 7
Apr 19 17:29:58 m3n kernel: chipset flush no. 3129344, max retries 7
Apr 19 17:30:20 m3n kernel: chipset flush no. 3145728, max retries 7
Apr 19 17:30:44 m3n kernel: chipset flush no. 3162112, max retries 7
Apr 19 17:31:26 m3n kernel: chipset flush no. 3178496, max retries 7
Apr 19 17:31:54 m3n kernel: chipset flush no. 3194880, max retries 7
Apr 19 17:35:13 m3n kernel: chipset flush no. 3211264, max retries 7
Apr 19 17:35:34 m3n kernel: chipset flush no. 3227648, max retries 7
Apr 19 17:37:11 m3n kernel: chipset flush no. 3244032, max retries 7
Apr 19 17:38:20 m3n kernel: chipset flush no. 3260416, max retries 7
Apr 19 17:...

> --- Comment #128 from Tony White <email address hidden> 2010-04-19 13:04:24 PDT ---
> OK Guys, I've tried the :
> fix-i8xx-gtt-cache-coherency-v7.patch
> locking_for_chipset_flush.patch
> &
> gtt_space_null_means_no_pages_ref.patch
> patches against 2.6.34-rc3 for about a week now.
>
> I have :
> Intel Corporation 82852/855GM Integrated Graphics Device (rev 02) (prog-if 00
> [VGA controller])
>
> libdrm 2.4.18
> xorg-x11-drv-intel-2.9.1
> xorg-x11-server-Xorg-1.7.6

Thanks a lot for testing. Unfortunately the versions you're using are
known-broken. Please retest with the latest&greatest (currently libdrm
2.4.20 and xf86-video-intel 2.11). Also, when the gpu hangs (as indicated
by the hangcheck timer elapsed error in the dmesg) always grab an
i915_error_state (from the dri directory of the debugfs filesystem). That
file contains a dump of the gpu state when it died with all the
necessary info to debug such a hang (the dmesg only tells that the gpu
died, but misses all the other needed info).
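
Saving the error state as soon as a hang is noticed could look like the sketch below. It is written in Python purely for illustration. The default path assumes that debugfs is mounted at /sys/kernel/debug and that the card is DRM minor 0, both of which may differ on a given system, and the function name `save_error_state` is hypothetical. Reading debugfs normally requires root.

```python
import time
from pathlib import Path

# Assumed location: debugfs mounted at /sys/kernel/debug, DRM minor 0.
DEFAULT_ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(source=DEFAULT_ERROR_STATE, dest_dir="."):
    """Copy the GPU error state to a timestamped file and return its path."""
    source = Path(source)
    if not source.exists():
        raise FileNotFoundError(f"{source} not found; is debugfs mounted?")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    # Read until EOF instead of trusting the reported file size, since
    # debugfs files commonly report a size of zero.
    target.write_bytes(source.read_bytes())
    return target
```

Calling `save_error_state()` as root right after the hangcheck message appears would leave a file that can be attached to the upstream report alongside the dmesg.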

(In reply to comment #129)
> > --- Comment #128 from Tony White 2010-04-19
> > libdrm 2.4.18
> > xorg-x11-drv-intel-2.9.1
> Thanks alot for testing. Unfortunately the versions you're using are
> known-broken. Please retest with the latest&greatest (currently libdrm
> 2.4.20 and xf86-video-intel 2.11).

Daniel, would you be able to give a list of commits that fix this kind of bug? Ubuntu is currently frozen and has those versions, but it would be nice to have a list of candidate patches for updates. We already have
[0c47195ca805881e3fbd5b9224be5c930feeeb8c] i830: Clip solid fills to surface

> --- Comment #130 from Geir Ove Myhr <email address hidden> 2010-04-19 13:53:18 PDT ---
> Daniel, would you be able to give a list of commits that fix this kind of bugs?
> Ubuntu is currently frozen and has those versions, but it would be nice to have
> a list of candidate patches for updates. We already have
> [0c47195ca805881e3fbd5b9224be5c930feeeb8c] i830: Clip solid fills to surface

For a definite answer, please ask Chris, but a quick scan shows the
following commit since libdrm 2.4.18 as an important fix
 - a4041e096ce0faea3dd39b4d78014d45a8cacec0 (intel: Repeat execbuffer if
   interrupted by signal)

(In reply to comment #127)
> The fact that it's 512 shows that the problem is a failed cacheflush and
> nothing else (this is actually what I've checked). The chipset flush
> checker changes the place it writes the check value every chipset flush.
> And it reuses the same place every 512th chipset flush. So when the
> chipset flush failes, the old value is there, which should be exactly 512
> less than what's expected.

Yeah, I figured it would be that, reading through your old comments.

By the way, I think I got those failed flushes without xcompmgr running.
(I killed it to see if there was any difference.) That might explain why I
didn't see failed flushes before, xcompmgr is more or less always running.

My case might be related to suspend, because both failures happened within
a minute or so from resume.

I wish I knew a way to trigger it easily, now it takes days to test anything.

> Well, there is _no_ way to do a reliable flush. And the hw docs explicitly
> says so. But we need to move stuff in/out of the graphics mem (i.e. the
> gtt). The other option would be to copy stuff in/out, which is even worse:
> - Wastes memory (actually simply uses twice as much for everything).
> - Would be even slower than what my hack currently does.
>
> And to add insult to injury, some of the chipsets from the 2nd gen (i8xx)
> suffer from other cache coherency problems in addition to this.

What I don't understand is why your patch slows things down so much for me,
it seems to do only a few thousand flushes anyway.

I guess copying around is what the old drivers did?

Some random ideas:

- Increase I830_MCH_WRITE_BUFFER_SIZE?

- Instead of writing zeroes, actually change the content of the flush page.
  Flushing caches doesn't seem to do much if the new content is the same as
  the old one?

- The text you quoted in one of your commit messages said that the memory
  content isn't coherent, but it didn't say anything about the mapping itself.
  Can't you update the gtt mapping to effectively flush it? I mean, if you
  move pages out of the gtt and back in, shouldn't that flush the old content?
  Maybe move it to a different index, e.g. insert new mapping to the start
  instead of the end, in case the hw caches it by address+index. Similar to
  Chris Wilson's gtt disabling thing, but instead of disabling, altering it
  in a smart, flush causing way.

If the problem is that the flush is needed to avoid the hardware from writing
stale data to old gtt mapped physical memory:

- If an entry is added, there should be no need for a flush, because the all
  memory is still valid. If an entry is removed, the gpu can continue to write
  to those pages. What about copying the content to a new physical page and
  keeping the original page for a while until the gpu is done with it?

(I don't know what I'm talking about, just trying to inspire you to come up
with some genius plan to solve all problems. :-)

> Ok, that's bad. Can you change the following define in
> include/drm/intel-gtt.h and see whether you still get failed chipset
> flushes?
>
> -#define I830_CC_CANARY_FLOCK_GTT_PAGES 8
> +#define I830_CC_CANARY_FLOCK_GTT_PAGES 16
>
> The whole stuff make ...

(In reply to comment #132)
> (In reply to comment #127)
> > Oh, and add some details about your box, please (brand&model + cpu,
> > mostly, the rest is all in the dmesg, anyway).
>
> See my first post: Thinkpad X40, 855GM (rev 02), Pentium M (family 6, model 13,
> stepping 6: It has clflush).
>
> But the hangs are gone, so I'm happy. I prefer slight glyph corruption that
> goes away when I cause a refresh (e.g. increase text size) with snappy
> performance to the sluggishness caused by the current patch.

I also own an 855GM (rev 02), but I had no glyph corruption with patch v6; without the locking patch I experienced crashes, so the most recent patch is really necessary for me, although I'd also like to see it more performant. But first comes reliability, and right now it's not crashing anymore.

> --- Comment #132 from Indan Zupancic <email address hidden> 2010-04-19 15:26:36 PDT ---
> What I don't understand is why your patch slows things down so much for me,
> it seems to do only a few thousand flushes anyway.

Well, worst-case a flush can take 1 ms.
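
To put that 1 ms figure in perspective, here is a back-of-the-envelope calculation in Python. It assumes that the number in the "chipset flush no." log lines quoted earlier in this thread is a running flush count, which is a reading of those lines and not something the patch author states. Those lines also come from a different machine running the v7 patch set, so the result is only an order-of-magnitude illustration. The function name is made up.

```python
WORST_CASE_FLUSH_SECONDS = 0.001  # "worst-case a flush can take 1 ms"


def worst_case_stall_fraction(flushes, elapsed_seconds):
    """Upper bound on the share of wall-clock time spent in flushes."""
    return flushes * WORST_CASE_FLUSH_SECONDS / elapsed_seconds


if __name__ == "__main__":
    # Two consecutive log lines from the thread:
    #   16:35:59  chipset flush no. 2752512
    #   16:42:43  chipset flush no. 2768896
    flushes = 2768896 - 2752512
    elapsed = (42 * 60 + 43) - (35 * 60 + 59)
    fraction = worst_case_stall_fraction(flushes, elapsed)
    print(f"{flushes} flushes in {elapsed} s")
    print(f"worst-case stall: {fraction:.1%}")
```

Under these assumptions the upper bound is about 4% of wall-clock time on average. Flushes presumably arrive in bursts during rendering, so the slowdown felt during a burst can be much larger than the average suggests.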

> I guess copying around is what the old drivers did?

Nope. But for various reasons it changed mappings _much_ less. So much
less likely to crash.

> Some random ideas:
>
> - Increase I830_MCH_WRITE_BUFFER_SIZE?

Tried. Gave up at 64 KB.

> - Instead of writing zeroes, actually change the content of the flush page.
> Flushing caches doesn't seem to do much if the new content is the same as
> the old one?

The patch does that at the moment for all writes. Furthermore, I've never
seen hw that clever (it's a totally worthless optimization, usually).

> - The text you quoted in one of your commit messages said that the memory
> content isn't coherent, but it didn't say anything about the mapping itself.
> Can't you update the gtt mapping to effectively flush it? I mean, if you
> move pages out of the gtt and back in, shouldn't that flush the old content?
> Maybe move it to a different index, e.g. insert new mapping to the start
> instead of the end, in case the hw caches it by address+index. Similar to
> Chris Wilson's gtt disabling thing, but instead of disabling, altering it
> in a smart, flush causing way.

Well, that's exactly where the shit usually hits the fan. Furthermore, at
least on i845 there are chipset errata that says (no joke) if you change a
mapping shortly before the gpu reads stuff from it, it may read adjacent
pages. Chris is trying to battle that one. Oh, and no, rewriting the gtt
entries doesn't flush data (only tlb, but not everywhere, see above).

> If the problem is that the flush is needed to avoid the hardware from writing
> stale data to old gtt mapped physical memory:
>
> - If an entry is added, there should be no need for a flush, because all the
> memory is still valid. If an entry is removed, the gpu can continue to write
> to those pages. What about copying the content to a new physical page and
> keeping the original page for a while until the gpu is done with it?

Something similar is already done. Look for scratch_page in intel-gtt.c

> > Ok, that's bad. Can you change the following define in
> > include/drm/intel-gtt.h and see whether you still get failed chipset
> > flushes?
> >
> > -#define I830_CC_CANARY_FLOCK_GTT_PAGES 8
> > +#define I830_CC_CANARY_FLOCK_GTT_PAGES 16
> >
> > The whole stuff makes somewhat more sense this way around, anyway.
>
> I will try this later, first I'm going to try without your latest commit
> ("fix i85x gtt chipset flush") to see how it behaves without that stuff,
> both performance and amount of failed flushes.

If your X40 is anything like mine, you're in for a bad surprise :(

> > Oh, and add some details about your box, please (brand&model + cpu,
> > mostly, the rest is all in the dmesg, anyway).
>
> See my first post: Thinkpad X40, 855GM (rev 02), Pentium M (family 6, model 13,
> stepping 6: It has clflush).

Thanks, I'm regularly losing my overview with all the different testers on
this bug ;)

> --- Comment #133 from legolas558 <email address hidden> 2010-04-19 15:48:18 PDT ---
> I also own an 855GM (rev 02), but I had no glyph corruption with patch v6;
> without the locking patch I experienced crashes, so the most recent patch is
> really necessary for me, although I'd also like to see it more performant. But
> first comes reliability, and right now it's not crashing anymore.

Yep, I want to get this right first before performance tuning starts. But
I have already a few ideas how to improve the current situation.
- The current chipset flush always flushes both directions, but we usually
  only need one direction flushed. This is especially important because
  the slower flush is in gtt->cpu direction, which isn't performance
  critical at all.
- atm the driver executes enormous amounts of unnecessary flushes.
  Batching them up should fix this.

Christiansen (happylinux) wrote :

Just for the record, I have to back up Anand Kumria on comment #84.

I too am no longer able to start X (locks up/freezes completely) since kernel 2.6.32-21.32 on a ThinkPad with this adapter:

00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
Subsystem: IBM Device [1014:0557]

Lucid was reinstalled from yesterday's latest LiveCD (2010.04.16) and all available updates were applied as of 2010.04.19. After downgrading to kernel 2.6.32-21.31 from the recovery console, I'm able to start X again.

(In reply to comment #110)
> Created an attachment (id=35065) [details]
> only call put_pages when gtt_space != NULL
>
> Ok, this might be the first real stab at that dreaded put_pages BUG. Everyone
> who's hitting this problem, please apply this patch on top of whatever kernel
> most easily reproduces the problem.

I've been hitting this bug at least once the last couple of days (my system crashed a couple of times, and only once I was able to extract logs). With this patch applied, I haven't seen this bug resurface so far.

However, my system crashes when starting doomsday or warzone2100 (both OpenGL games). I hadn't noticed this before, and it is probably unrelated to this coherency bug. Where should I file this bug report? Mesa? Dmesg says:

[drm:i915_gem_do_execbuffer] *ERROR* Invalid object handle 48 at index 0

X log says:

[ 5132.404] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Bad file descriptor.

OpenGL hangs could be bug 26557.

Try reverting commit b4a6169412819cc3a027c6a118f0537911145a30.

That's a Mesa commit, BTW.

Same problem here, on a Toshiba A200 portege laptop with the 855GM
card in it.

Gavin

On 19 Apr 2010, at 17:40, Christiansen <email address hidden> wrote:

> Just for the record and I have to back up Anand Kumria on comment #84.
>
> I too am no longer able to start X (locks up/freezes completely) since
> kernel 2.6.32-21.32 on a ThinkPad with this adapter:
>
> 00:02.0 VGA compatible controller [0300]: Intel Corporation
> 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
> Subsystem: IBM Device [1014:0557]
>
> Lucid was reinstalled from the (yesterdays) latest LiveCD 2010.04.16 and
> all available updates applied as of 2010.04.19. Downgrading to kernel
> 2.6.32-21.31 from recovery console and I'm able to start X again.

(I don't own an 855 and my 845 machine is not available right now, so this is just wild speculation.)

Could some of the chipset buffers be indexed by the SDRAM bank number (and maybe even the row (side) number)? I'm imagining a scenario where the CPU and the GTT sides have separate SDRAM write buffers that are not kept coherent (their access to the actual RAM can be arbitrated), and each write buffer has one or two cache lines for each bank; this might be a relatively easy way to allow access to different banks in parallel. There seem to be 4 banks on the 845, and the bank number can be between bits 11-12 and 14-15, depending on the DRAM modules installed; perhaps the situation is similar on the 855 as well. If this is the case, 16 physically contiguous pages should cover all banks, while non-contiguous ones might not if we are particularly unlucky in intel_i830_setup_flush(), which is called when resuming.

To test this theory, maybe we can print the physical addresses (those within the System RAM range in /proc/iomem) of the allocated i8xx_pages. Then, when we see retried or even failed flushes, perhaps some patterns can be observed.
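
To make the arithmetic in the speculation above concrete, here is a small shell sketch. It assumes 4 KiB pages and a hypothetical model in which the bank number is simply two adjacent physical address bits (LOW_BIT and LOW_BIT+1). The function name banks_covered is made up for this illustration, and nothing here reflects how the 845/855 actually decodes addresses.

```bash
#!/bin/bash
PAGE_SIZE=4096   # assumption: 4 KiB pages

# banks_covered BASE NUM_PAGES LOW_BIT
# Hypothetical model: bank number = address bits LOW_BIT and LOW_BIT+1.
# Prints the distinct bank numbers touched by NUM_PAGES physically
# contiguous pages starting at the page-aligned address BASE.
banks_covered() {
    local base=$1 num_pages=$2 low_bit=$3
    # Sample at least once per bank change: the bank can change every
    # 2^LOW_BIT bytes, which is finer than a page when LOW_BIT < 12.
    local step=$(( low_bit < 12 ? (1 << low_bit) : PAGE_SIZE ))
    local end=$(( base + num_pages * PAGE_SIZE ))
    local addr
    for (( addr = base; addr < end; addr += step )); do
        echo $(( (addr >> low_bit) & 3 ))
    done | sort -un | paste -sd' ' -
}
```

Under this model, `banks_covered 0 16 14` lists all four banks, while `banks_covered 0x6000 8 14` lists only three: with the bank bits at 14-15, eight contiguous pages can touch at most three of the 16 KiB bank regions, whereas any 16 contiguous pages span a full 64 KiB period.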

> --- Comment #139 from <email address hidden> 2010-04-19 20:32:56 PDT ---
> Could some of the chipset buffers be indexed by the SDRAM bank number (and
> maybe even the row (side) number)? I'm imagining a scenario where the CPU and
> the GTT sides have separate SDRAM write buffers that are not kept coherent
> (their access to the actual RAM can be arbitrated), and each write buffer has
> one or two cache lines for each bank; this might be a relatively easy way to
> make simultaneous access to different banks in parallel. There seems to be 4
> banks on the 845, and the bank number can be between bits 11-12 and 14-15,
> depending on the DRAM modules installed; perhaps the situation is similar on
> the 855 as well. If this is the case, 16 physically contiguous pages should
> cover all banks, while non-contiguous ones might not be so if we are
> particularly unlucky in intel_i830_setup_flush(), which is called when
> resuming.

Neat idea. I'll look into allocating the pages as one big chunk (ie higher
order alloc). But that doesn't explain why the problem seems to happen
only after a resume - the pages don't get reallocated on resume (look for
"goto setup" in intel_i830_setup_flush).

Valentijn Sessink (valentijn) wrote :

IBM X40 laptop, 82852/855GM rev. 02, also: no working graphical environment since 2.6.32-21; I manually installed 2.6.32.20 to get a working environment. Unfortunately, there's no crash information whatsoever, the system just freezes hard, without logs or otherwise usable information. Setting i915.modeset=0 does not help.

royden (ryts) wrote :

Setting i915.modeset=1 via grub at boot resulted in a successful launch of X & gdm for me (given the same symptoms as V. Sessink, i.e. no log indicators of the problem).

18 chipset failures in 1h of uptime with a resume from hibernation (seems totally unrelated for me).

[ 140.678789] i8xx chipset flush failed, expected: 4642, cpu_read: 4130
[ 382.334636] i8xx chipset flush failed, expected: 32422, cpu_read: 31910
[ 916.360151] i8xx chipset flush failed, expected: 85629, cpu_read: 85117
[ 1461.747517] i8xx chipset flush failed, expected: 142082, cpu_read: 141570
[ 2256.590632] i8xx chipset flush failed, expected: 196727, cpu_read: 196215
[ 4106.345442] i8xx chipset flush failed, expected: 267271, cpu_read: 266759
[ 5147.195196] i8xx chipset flush failed, expected: 309181, cpu_read: 308669
[ 6185.589716] i8xx chipset flush failed, expected: 354133, cpu_read: 353621
[ 8005.430094] i8xx chipset flush failed, expected: 437064, cpu_read: 436552
[ 8114.898367] i8xx chipset flush failed, expected: 444113, cpu_read: 443601

no "max retries" line.

Xorg 1.7.6
libdrm 2.4.19
mesa 7.7.1
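
For anyone comparing such logs, a small shell sketch like the following can pull the two counters out of each "flush failed" line and print their difference. The function name flush_deltas is made up for this illustration; the message format is taken from the lines quoted above. For those ten lines, the difference between expected and cpu_read works out to 512 every time; what that means for the hardware is not something this sketch can answer.

```bash
#!/bin/bash
# flush_deltas: read kernel log lines on stdin and, for every
# "i8xx chipset flush failed" line, print: expected cpu_read difference
flush_deltas() {
    # Keep only the matching lines, reduced to the two numbers.
    sed -n 's/.*i8xx chipset flush failed, expected: \([0-9]*\), cpu_read: \([0-9]*\).*/\1 \2/p' |
    while read -r expected cpu_read; do
        echo "$expected $cpu_read $((expected - cpu_read))"
    done
}
```

Typical use would be `dmesg | flush_deltas`; piping the result through `awk '{print $3}' | sort -u` shows whether the difference is always the same on a given machine.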

We don't need further confirmations from people seeing the issue.

Also, we don't need further reports of how it was worked around; we know
there's several ways of working around it. Unfortunately what works for
one person doesn't for another.

This has been an extremely frustrating bug from a developer perspective,
probably almost as frustrating as it is from a user perspective. It
seems whenever we make a change to fix something for one set of users,
it just causes breakage for some other set. There does not seem to be
any particular combination of knob settings that makes things functional
for *all* users.

So what we've opted to do is turn off KMS for this hardware but pretty
much leave all other settings to defaults. So people for whom this
configuration works will have 3D and all the usual -intel functionality,
just not the boot prettiness. For the set of users that find this is
not a good configuration, we've documented the issue in the release
notes with a link to the various workarounds people have found, here:

  https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes

If anyone discovers additional workarounds, or has ideas on improving
this documentation, please feel welcome to edit this page. It may help
your fellow 8xx users.

It is our hope that there will come to be upstream fixes that are viable
to backport. If we get enough fixes that we feel confident, we *might*
re-enable KMS for 8xx chips on lucid at some point.

There are likely to be a lot of patches flying around. If you wish to
provide them in a PPA, that's cool. Just be mindful that our goal
ultimately is to get a fix into Lucid, and time you can put towards that
goal could help a lot.

The nature of this bug is such that it's really sensitive to
conditions. So you may find a configuration or patch that makes the
issue totally go away on your system and someone else's, but breaks
things on 3 other people's systems with exactly the same hardware. So
getting a patch that fixes it for *everyone* is going to be really
tough.

c'mon you know it is a security bug that relates to Xorg (intel i915GM) ,
GDM, compiz, emerald and Ubuntu update manager (every crash is tied to
update manager or maybe the way it uses network and choice of algorithms to
verify your integrity sums of packages). every symptom had been reproduced
and can be reproduced again. i am running on some legacy equipment. let's
see if it works on UNIX without GDM

On Wed, Apr 21, 2010 at 2:16 AM, Bryce Harrington <email address hidden> wrote:

> We don't need further confirmations from people seeing the issue.
>
> Also, we don't need further reports of how it was worked around; we know
> there's several ways of working around it. Unfortunately what works for
> one person doesn't for another.
>
> This has been an extremely frustrating bug from a developer perspective,
> probably almost as frustrating as it is from a user perspective. It
> seems whenever we make a change to fix something for one set of users,
> it just causes breakage for some other set. There does not seem to be
> any particular combination of knob settings that makes things functional
> for *all* users.
>
> So what we've opted to do is turn off KMS for this hardware but pretty
> much leave all other settings to defaults. So people for whom this
> configuration works will have 3D and all the usual -intel functionality,
> just not the boot prettiness. For the set of users that find this is
> not a good configuration, we've documented the issue in the release
> notes with a link to the various workarounds people have found, here:
>
> https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes
>
> If anyone discovers additional workarounds, or has ideas on improving
> this documentation, please feel welcome to edit this page. It may help
> your fellow 8xx users.
>
> It is our hope that there will come to be upstream fixes that are viable
> to backport. If we get enough fixes that we feel confident, we *might*
> re-enable KMS for 8xx chips on lucid at some point.
>
> There are likely to be a lot of patches flying around. If you wish to
> provide them in a PPA, that's cool. Just be mindful that our goal
> ultimately is to get a fix into Lucid, and time you can put towards that
> goal could help a lot.
>
> The nature of this bug is such that it's really sensitive to
> conditions. So you may find a configuration or patch that makes the
> issue totally go away on your system and someone else's, but breaks
> things on 3 other people's systems with exactly the same hardware. So
> getting a patch that fixes it for *everyone* is going to be really
> tough.

pdecat (pdecat) wrote :

On my legacy hardware (Asus S1300N from 2003) with Lucid fully updated, the graphical login screen won't show without KMS.

Applying option 3 solves it :
echo options i915 modeset=1 | sudo tee /etc/modprobe.d/i915-kms.conf

If I then click on my login entry, Xorg crashes immediately before I can enter my password.
The screen flickers several times, then I am informed that Ubuntu works in lower graphics mode.

More details about my hardware :

$ lspci -vs 00:02
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3141
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Memory at fd000000 (64-bit, non-prefetchable) [size=4M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at ff00 [size=8]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

00:02.1 Display controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3141
        Flags: fast devsel
        Memory at fda00000 (64-bit, non-prefetchable) [disabled] [size=1M]
        Capabilities: <access denied>

Regards,
Patrick.

pdecat (pdecat) wrote :

Oops, wrong lspci output, here's the correct one :

$ lspci -vs 00:02
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
        Subsystem: ASUSTeK Computer Inc. Device 1712
        Flags: bus master, fast devsel, latency 0, IRQ 5
        Memory at f0000000 (32-bit, prefetchable) [size=128M]
        Memory at ffa80000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at dc00 [size=8]
        Capabilities: [d0] Power Management version 1
        Kernel driver in use: i915
        Kernel modules: i915

00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
        Subsystem: ASUSTeK Computer Inc. Device 1712
        Flags: bus master, fast devsel, latency 0
        Memory at e8000000 (32-bit, prefetchable) [size=128M]
        Memory at ff980000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: [d0] Power Management version 1

Created an attachment (id=35211)
dmesg of deadlock with v8 patch

Hi,
with the v8 patch applied atop a drm-intel-next kernel (which is not too recent, around 2.5 weeks old), I got an apparent deadlock some time after resume. Lockdep is actually turned on (according to dmesg), so I really don't know why it says it's off.

Kernel 2.6.34-rc2 from drm-intel-next with v8 patch applied, lockdep enabled.
Xserver 1.7.6
libdrm 2.4.18
intel-drv 2.11

Dunno if it's related to the patch...

Cheers,
   Christian

(From update of attachment 35073)
resolved by updating to intel 2.11.

Created an attachment (id=35214)
DRI debugfs after overlay crash

Created an attachment (id=35215)
gtt flush failures (not fatal)

I have attached the flush failures found in dmesg (probably not related to the overlay crash) and the DRI debugfs after a crash happening when watching videos.

Looks like the overlay bug is not yet fixed. I had to watch the entire video collection of The Rockets, but finally I got an overlay filled with a nice blue, music still playing but Xorg inevitably dead. I could access a VT and take the dump, but any further attempt to restart Xorg was failing miserably.

The Xorg log was being filled indefinitely with these lines:

(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.

Nothing else was relevant there.

I think the flush failures are not tied to the overlay crash (I also have them when the system doesn't crash), but anyway I have added them here.

Daniel Baumann (dnjl) wrote :

I'm trying to backport Daniel Vetter's patches (http://cgit.freedesktop.org/~danvet/drm/log/?h=stuff/i8xx_cache_coherency_for_oga) to the current Lucid kernel (2.6.32-21.32). It does not seem too complex, as Lucid's kernel already carries the drm code from kernel 2.6.33.

The experimental result can be found here:
https://launchpad.net/~dnjl/+archive/experimental/+packages?field.name_filter=linux&field.status_filter=published&field.series_filter=lucid

Changes summary:

linux (2.6.32-21.32+i8xxkmsfix2~dnjl1) lucid; urgency=critical

  [ Daniel Baumann ]
  * Revert "SAUCE: i915 KMS -- support disabling KMS for known broken devices"
    - LP: #563277
  * Revert "SAUCE: i915 KMS -- blacklist i830"
    - LP: #542208, #563277
  * Revert "SAUCE: i915 KMS -- blacklist i845g"
    - LP: #541492, #563277
  * Revert "SAUCE: i915 KMS -- blacklist i855"
    - LP: #511001, #541511, #563277
  * x86, lib: Add wbinvd smp helpers (stolen from 2.6.34)
  * Applied patches done by Daniel Vetter to fix above problems:
    - agp/intel-gtt: fix i85x gtt chipset flush
    - agp/intel-gtt: extract mch buffer flush in i830 chipset flush
    - agp/intel-gtt: check cache-coherency on i830 class chipsets
    - drm/i915: add locking around chipset flush
    - agp/intel-gtt: steal the last gtt page
    - agp/intel-gtt: kill previous_size assignments
    - agp/intel-gtt: kill intel_i830_tlbflush
    - agp/intel: make intel-gtt.c into a real source file
    - agp/intel: split out gmch/gtt probe, part 2
    - agp/intel: split out gmch/gtt probe, part 1
    - drm/intel: kill mutli_gmch_chip
    - agp/intel: uncoditionally reconfigure driver on resume
    - agp/intel: split out the GTT support
    - agp/intel: introduce intel-agp.h header file

On my system (Dell Latitude X300 with 855GM) I'm running this kernel together with libdrm and the Xorg intel driver from xorg-edgers (https://launchpad.net/~xorg-edgers/+archive/ppa). With that, I'm able to bring my system up and log in without any kernel/module parameters.
However, desktop effects are not working and it's not stable at all. At least the crashes no longer hard-freeze the whole system; only X crashes sporadically. So it's possible to grab the debugging data in sysfs.

Anyone who is interested, please test this. It's not perfect, I know, but it runs, more or less.

Does anyone have any hints on how to take this further?

tags: added: patch
Geir Ove Myhr (gomyhr) wrote :

Daniel Vetter plans to backport the fix. Here is what he writes (comment #121 upstream):

As already said, I hope to send the last patch pile (containing the
real fix) for review in a few days, pending merging of the previous
submissions. If it survives review intact I'll backport just the fix for
.34 and earlier kernels.

So taking testing/release delays on each stage (-next, .34, -stable) into
account, expect a few weeks before this hits a stable kernel near you,
best-case scenario.

(In reply to comment #145)
> I think the flush failures are not tied to the overlay crash (I also have them
> when the system doesn't crash), but anyway I have added them here.

Some more information: while watching videos there are occasional glitching overlay frames (psychedelic colors, only 1 frame), sometimes interleaved with a blue fill like the one appearing when it gives up in a total Xorg crash. The blue frames seem to appear more frequently when Xorg is more prone to fatally giving up, but never seen them more than twice before a total Xorg crash.

Shall we split this bug into a new one?

(In reply to comment #146)
> (In reply to comment #145)
> > I think the flush failures are not tied to the overlay crash (I also have them
> > when the system doesn't crash), but anyway I have added them here.
>
> Some more information: while watching videos there are occasional glitching
> overlay frames (psychedelic colors, only 1 frame), sometimes interleaved with a
> blue fill like the one appearing when it gives up in a total Xorg crash. The
> blue frames seem to appear more frequently when Xorg is more prone to fatally
> giving up, but never seen them more than twice before a total Xorg crash.
>
> Shall we split this bug into a new one?

I think you're hitting the bug I had, which should be fixed by upgrading to
the 2.11 intel driver and libdrm 2.4.20.

Chris Halse Rogers (raof) wrote :

Since the DRI disablement patch didn't help as much as was hoped, unconditionally disables 3D, and will make it harder to grab a proper fix from upstream, it has been decided to revert it and instead document the various different workarounds for the Lucid release notes.

The attached debdiff drops the DRI disablement patch.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-intel - 2:2.9.1-3ubuntu5

---------------
xserver-xorg-video-intel (2:2.9.1-3ubuntu5) lucid; urgency=low

  * Drop debian/patches/107_disable_dri_on_845_855.patch:
    + This attempt to work around the crashes on i845 and i855 documented in
      LP: #541492 and LP: #541511 simply shuffled the brokeness around. It
      hasn't helped enough, and it unconditionally disables 3D which worked for
      some users before.
 -- Christopher James Halse Rogers <email address hidden> Mon, 26 Apr 2010 10:14:02 +1000

Changed in xserver-xorg-video-intel (Ubuntu Lucid):
status: Triaged → Fix Released
Geir Ove Myhr (gomyhr) on 2010-04-26
Changed in xserver-xorg-video-intel (Ubuntu Lucid):
status: Fix Released → Triaged
Steve Langasek (vorlon) wrote :

Chris, what needs to be added to the release notes for this?

Changed in ubuntu-release-notes:
status: New → Incomplete
Bryce Harrington (bryce) wrote :

I'd put a general purpose "8xx freezes" note in there a while back, and it looks perfectly well suited for this. It does not itemize the workarounds except for using vesa, but provides a link to a wiki page with workarounds. I think that's probably better since it gives us some flexibility in steering users more accurately as fixes become available.

So I'm closing the release-notes task as good-nuff.

Changed in ubuntu-release-notes:
status: Incomplete → Fix Released

Created an attachment (id=35329)
GPU hang with newest driver and libdrm. v8 without extra flushing patch.

(In reply to comment #147)
> I think you're hitting the bug I had, which should be fixed by upgrading to
> the 2.11 intel driver and libdrm 2.4.20.

Okay, after a week or two (?) of running v8 without the last extra flushing commit I finally got a hung GPU again. So it seems there is a corner case left for this particular bug somewhere.

Last time I counted I got around 2% failed flushes, but otherwise the system was rock solid. Text corruption was rare too, though I think it did happen the day the GPU hung.

Dump was taken with this script:

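#!/bin/bash
# Wait for the GPU to be marked wedged, then save dmesg, the X log and
# the i915 debugfs files into /home/indan/dump.tgz.
PATH="/bin:/usr/bin"
# Single-argument mount: relies on an /etc/fstab entry for /mnt/debug
# (presumably debugfs).
mount /mnt/debug
cd /tmp/

while true; do
 # Keep polling once a second while i915_wedged still contains a 0.
 if grep -q 0 /mnt/debug/dri/0/i915_wedged; then
  sleep 1;
 else
  mkdir dump
  dmesg > dump/dmesg
  cp /var/log/Xorg.0.log dump/
  cp -a /mnt/debug/dri/0/* dump/
  tar czf dump.tgz dump
  rm -rf dump
  mv dump.tgz /home/indan/
  # Flush to disk in case the machine has to be power-cycled.
  sync;
  exit;
 fi
done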

(In reply to comment #134)
> > --- Comment #132 from Indan Zupancic <email address hidden> 2010-04-19 15:26:36 PDT ---
> > What I don't understand is why your patch slows things down so much for me,
> > it seems to do only a few thousand flushes anyway.
>
> Well, worst-case a flush can take 1 ms.

That would explain it yes.

[cut]
> > If the problem is that the flush is needed to avoid the hardware from writing
> > stale data to old gtt mapped physical memory:
> >
> > - If an entry is added, there should be no need for a flush, because all the
> > memory is still valid. If an entry is removed, the gpu can continue to write
> > to those pages. What about copying the content to a new physical page and
> > keeping the original page for a while until the gpu is done with it?
>
> Something similar is already done. Look for scratch_page in intel-gtt.c

But if done properly the need for flushing would go away altogether. Considering it's quite stable here without those extra flushes, perhaps it's easier to fix the corner cases that still need flushing instead of getting flushing reliable?

> > > Ok, that's bad. Can you change the following define in
> > > include/drm/intel-gtt.h and see whether you still get failed chipset
> > > flushes?
> > >
> > > -#define I830_CC_CANARY_FLOCK_GTT_PAGES 8
> > > +#define I830_CC_CANARY_FLOCK_GTT_PAGES 16
> > >
> > > The whole stuff makes somewhat more sense this way around, anyway.
> >
> > I will try this later, first I'm going to try without your latest commit
> > ("fix i85x gtt chipset flush") to see how it behaves without that stuff,
> > both performance and amount of failed flushes.
>
> If your X40 is anything like mine, you're in for a bad surprise :(

Dmesg is full of backtraces, but other than that it's quite stable.
Performance is good too again.

Next week I should have a bit more time to read the code and do more testing.

> > > Oh, and add some details about your box, please (brand&model + cpu,
> > > mostly, the rest is all in the dmesg, anyway).
> >
> > See my first post: Thinkpad X40, 855GM (rev 02), Pentium M (family 6, model 13,
> > stepping 6: It has clflush).
>
> Thanks, I'm regularly losing my overview with all the different testers on
> this bug ;)

No problem, you're doing great. :-)

Tom (tom6) wrote :

1650 Ati card. LiveCd is fine. Installing gets me to login screen fine, no trouble

After login screen it goes black, never seems to reach desktop. Tried pressing F2 and typing "sudo reboot" but nothing happened. Keyboard lights were not working either. I could have stuffed up on the typing tho.

I thought it was because the 1650 is obsolete and unsupported by ati now. Their legacy driver broke my 9.04 system but so did the OpenSource ati driver. Somehow 9.04 seemed to work fine with defaults tho?? lol

10.04 only works in "Failsafe" low graphics mode but this is set to 1280by1024 rather than my preferred 1024by768. I'm still playing around trying to get the resolution down from low graphics mode!! lol, all fun & games. I have a dual-boot (with 9.04) so my machine is still usable :)

Good luck and regards from
Tom :)

On Thu, Apr 29, 2010 at 04:57:49PM -0000, Tom wrote:
> 1650 Ati card.

This bug report is *only* for the i855 intel graphics card.

GPU lockup bugs are basically indistinguishable from a user point of
view, but they're almost always hardware-specific.

Filippo neri (fneri23) wrote :

My situation:
System: Thinkpad X40 Intel(R) Pentium(R) M processor 1.40GHz
Memory: 1.5 GiB, Intel 855GM Chipset
Ubuntu Release 10.04(lucid) installed from the beta2 CD and updated to the latest packages
(including lucid-proposed and lucid-backports.)
I have installed the following kernels:
2.6.32-19-generic
2.6.32-21-generic
2.6.32-22-generic
2.6.34-020634rc5-generic
(I upgraded to 2.6.32-22-generic today. )
Results (VERY repeatable!)
2.6.32-19-generic → OK
2.6.32-21-generic → Black screen
2.6.32-22-generic → Black screen
2.6.34-020634rc5-generic → OK
By OK I mean everything seems to work fine, and I can use the “Normal” Visual Effects setting.

filippo@X40:~$ uname -a

Linux X40 2.6.34-020634rc5-generic #020634rc5 SMP Tue Apr 20 10:07:04 UTC 2010 i686 GNU/Linux

filippo@X40:~$ dmesg | grep agp

[ 2.827117] Linux agpgart interface v0.103

[ 2.946581] agpgart-intel 0000:00:00.0: Intel 855GM Chipset

[ 2.947454] agpgart-intel 0000:00:00.0: detected 8060K stolen memory

[ 2.969120] agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0xe0000000

As far as I am concerned, this is a kernel bug introduced between 2.6.32-19 and 2.6.32-21,
but unfortunately, still present in 2.6.32-22. I have the choice of using 2.6.32-19, but I
will use 2.6.34-020634rc5 for the next few weeks to see if any problem arises.

Alban (seza) wrote :

Same here: I'm keeping 2.6.34-rc5, and everything works fine with it.

For those who found that 2.6.32-19 (and -20) worked, but -21 and -22
don't work, it's likely that simply re-enabling KMS as recorded on
https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes will resolve your
problem.

Dana Olson (adolson) wrote :

Can't install Ubuntu 10.04 final due to X crashing due to this bug.

A workaround is found in this thread: http://ubuntuforums.org/showthread.php?t=1465883

I was able to install, but now am stuck in 640x480 resolution.

Dana Olson (adolson) wrote :

Sorry, I posted my comment #114 before seeing the rest of the comments in the thread. I thought I had seen them all, but apparently not.

I followed some advice (re-enable kms, disable dri and change driver to intel in xorg.conf) from Chris' link in his comment #113, and *so far* I am not having any lockups, and my screen res is back to 1280x1024.

Vilius (vilius) wrote :

Just upgraded my ThinkPad R50e from 9.10 to 10.04. Cannot get any further than black screen.

> 1. After power on your PC, press shift (keep press) until see boot loader menu. Choose a recovery mode option.
Nothing happens if I hold Shift. How do I get into the recovery mode?

pvanderploeg (pieter-nescio) wrote :

I have a Dell Latitude D400 with the i855 graphics card. The Lucid live CD does not boot: just the word Ubuntu with the 5 red dots, and after a short while a blank screen, the CD spins down and that's it. I reinstalled Karmic, and then upgraded to Lucid today (April 30). The system did not boot anymore and I had to reinstall Karmic. My other laptop (D410) with an i915 graphics card does run the live CD OK, but I will wait with the upgrade.

@pvanderploeg,
I have the same issue with my D400 with the same graphics card. I just tried the workaround listed in posts #33 and #34 in the following thread, and it worked for me (http://ubuntuforums.org/showthread.php?t=1465883&page=4).

Basically, what I did was as follows:

Step 1: When booting into the live CD, I pressed Tab, and added the i915.modeset=1 option to the boot command line.
Refer to thread/post: http://ubuntuforums.org/showpost.php?p=9203466&postcount=33

Step 2: I installed Lucid with no problems, but when restarted, I of course ran into the same boot issue. So, I needed to make the changes in step 1 permanent.

Step 3: I restarted into the live environment again (as in step 1), mounted my local root partition, and as root, did two things in the /etc/ directory.
  a) permanently added "i915.modeset=1" (w/o quotes) to the grub configuration file (/etc/default/grub).
 Refer to thread/post: http://ubuntuforums.org/showpost.php?p=9171045&postcount=11 (as redirected from http://ubuntuforums.org/showpost.php?p=9203466&postcount=34)

  b) created the configuration file /etc/modprobe.d/i915-kms.conf and added the line "options i915 modeset=1" (w/o quotes) according to the first workaround listed in this thread:
https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes

And now everything works. Finally. And life is good again. If you have any questions, feel free to PM me through Launchpad. Although I am not an expert, I can at least explain what worked for me, even if I couldn't tell you why. :)
 ~Elena
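
The two changes in step 3 can be sketched as a small shell function. This is only an illustration of the workaround described above, not an official fix. The function name `enable_kms` is made up, and editing the `GRUB_CMDLINE_LINUX_DEFAULT` line is an assumption (the post only names the file). The demo runs against a scratch directory, so it touches no real system files.

```shell
#!/bin/sh
set -eu

# enable_kms ROOT: hypothetical helper, applies both step-3 changes under ROOT
# (ROOT would be the mount point of the installed system, e.g. /mnt).
enable_kms() {
    root="$1"
    grub="$root/etc/default/grub"
    conf="$root/etc/modprobe.d/i915-kms.conf"

    # 3b) module option that modprobe applies when i915 is loaded
    mkdir -p "$root/etc/modprobe.d"
    printf 'options i915 modeset=1\n' > "$conf"

    # 3a) kernel command line; assumes a GRUB_CMDLINE_LINUX_DEFAULT="..." line exists
    if [ -f "$grub" ] && ! grep -q 'i915\.modeset=1' "$grub"; then
        sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 i915.modeset=1"/' "$grub"
    fi
}

# Demo on a throwaway directory instead of a real root partition.
demo=$(mktemp -d)
mkdir -p "$demo/etc/default"
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$demo/etc/default/grub"
enable_kms "$demo"
cat "$demo/etc/default/grub" "$demo/etc/modprobe.d/i915-kms.conf"
```

On a real installation (assuming GRUB 2), the change to /etc/default/grub should only take effect after `update-grub` has been run as root on the installed system.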

Gustavo (gstv.inc) wrote :

My HP 1130us laptop, 82852/855GM

Linux 2.6.32-21-generic: does NOT work for me, freezes on the purple screen

Linux 2.6.32-21-generic (recovery mode): does NOT work for me, freezes on the purple screen

Linux 2.6.32-19-generic (recovery mode): works
Linux 2.6.32-19-generic: works
I don't know why, but I still use -19 since -21 does not work.

I do not know much, but I want to know why something that worked before now has problems. A system has to give priority to basic operation first, such as video, sound, and booting without crashing. Intel GPUs deserve great attention even though they are simpler and less powerful, because PCs with these GPUs are cheaper than those with Nvidia, and many people bought them for the low price.
This suggests there are many low-end PCs with these GPUs out there, and everyone tends to test new systems on their simplest PC before using them on their high-end PCs.
When I heard that Linux ran smoothly on my old PC, it caught my interest, because many people see it as a way to rescue old hardware, run a test server at home, and many other things. I hope that problems like these become rare.

It's Escape, not Shift. And you only need to do this if your GRUB
automatically goes right into the OS.

Once you are at the GRUB menu, select the second line,
"recovery mode", rather than the normal boot, and select "failsafe
graphics mode" from the menu which eventually appears.

  http://www.howtogeek.com/howto/ubuntu/show-the-grub-menu-by-default-on-ubuntu/

-Jess E.

On Fri, Apr 30, 2010 at 4:50 AM, Vilius <email address hidden> wrote:
> Just upgraded my ThinkPad R50e from 9.10 to 10.04. Cannot get any
> further than black screen.
>
>> 1. After power on your PC, press shift (keep press) until see boot loader menu. Choose a recovery mode option.
> Nothing happens if I hold Shift. How do I get into the recovery mode?
>
> --
> MASTER: [i855] GPU lockup (apport-crash)
> https://bugs.launchpad.net/bugs/541511
> You received this bug notification because you are a direct subscriber
> of a duplicate bug.
>
> Status in Ubuntu Release Notes: Fix Released
> Status in X.org xf86-video-intel: Confirmed
> Status in “xserver-xorg-video-intel” package in Ubuntu: Triaged
> Status in “xserver-xorg-video-intel” source package in Lucid: Triaged
>
> Bug description:
> Binary package hint: xserver-xorg-video-intel
>
> This is a MASTER bug report, i.e. not a real bug report, but a tool to help manage other bug reports.
>
> Most bug reports on i855 are probably due to the CPU/GPU incoherency problem that is now consolidated upstream at http://bugs.freedesktop.org/show_bug.cgi?id=27187 (which was split off from a bug report for i845). For now, we mark all automatically reported GPU lockups on i855 as duplicates of this unless there is a reason not to. There are some tests you may do to help upstream with this issue, and I will come back with instructions here. For those of you who know how to patch and compile a kernel you may look at comment #30 (and #6 for what kind of feedback they want) in the upstream bug report. Actually, if someone could volunteer to build an ubuntu-packaged kernel with this patch for others to test, that would be nice.
>
> There is a similar master bug report for i845 at bug 541492.
>
>
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu-release-notes/+bug/541511/+subscribe
>

dan rhodes (daniel-r-rhodes) wrote :

What a disaster, as predicted. i915.modeset=1 works for me (though it didn't when I first had the problem last week).

Dana Olson (adolson) wrote :

As a follow-up, my system ran fine for a little bit with the fixes I mentioned in post #115, but then after a while the screen went black and I couldn't do anything but hold the power button in to power off.

The system wouldn't even reboot with Alt+SysRq - K, S, U, B...

This is not very fun - I am trying to set up Ubuntu for a Windows user, trying to convert her, but if the system randomly crashes like this, the conversion is bound to fail.

Filippo neri (fneri23) on 2010-05-01
Changed in ubuntu-release-notes:
status: Fix Released → In Progress
status: In Progress → Fix Committed
pvanderploeg (pieter-nescio) wrote :

I tried your procedure and it worked. Great. Thank you very much.

Ciccio (franapoli) wrote :

On my portege m100 (82852/855GM) the following worked:

1) upgraded kernel to 2.6.34-020634rc5-generic
2) removed "nomodeset" from kernel boot parameters

Both steps were necessary.
Thank you everyone for the hints.
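
A quick way to confirm that step 2 took effect is to look at the command line the running kernel was booted with. The following is a minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; the helper name `has_nomodeset` is made up for illustration.

```shell
#!/bin/sh
set -eu

# has_nomodeset CMDLINE: succeeds if "nomodeset" appears as a whole word.
has_nomodeset() {
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the running kernel's command line (empty if /proc is unavailable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
if has_nomodeset "$cmdline"; then
    echo "nomodeset is still set: KMS is disabled"
else
    echo "nomodeset not found on the kernel command line"
fi
```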

N8N (njnagel) wrote :

I can confirm that this bug still exists on the final release version of 10.04 (upgraded on the 29th from 9.10). A previously functioning Gateway 7330GZ laptop with 82852/855GM graphics suddenly would go blank and freeze after the GRUB screen. I could boot into safe mode. The options to reconfigure graphics and troubleshoot the problem in the safe mode menu do not do anything. Removing and reinstalling xserver-xorg and xserver-xorg-video-intel (using the default versions in Synaptic) did not change anything. I have temporarily copied the contents of xorg.conf.failsafe to a new xorg.conf file (none previously existed?) until this can be resolved. Update Manager shows no updates available as I type this.

N8N (njnagel) wrote :

Additional info: re-enabling KMS allowed the system to boot "normally", but it locked up about 2 minutes later. I am still running with xorg.conf.failsafe copied to xorg.conf until a real fix is released.

David Oser (mirmos192) wrote :

Sorry - commented in the duplicate bug... Latest proposed update (lucid-proposed) seems to fix the main problem of non-bootability due to black screen; but it does not fix the (associated?) problem of Cheese Webcam Booth crashing the system. At least Lucid Linux version xx.x.22 is now usable without workarounds for those of us with this particular type of display chipset
David

pvanderploeg (pieter-nescio) wrote :

Re: entry nr 123, where I wrote that I followed the procedure proposed by Elena. I can now indeed boot the system successfully. When I play an embedded video file in Firefox, however, my system crashes, showing a few lines of text, a blank (black) screen, a few lines, blank, etc. I have to press the on/off button to restart the system. I had ubuntu-restricted-extras installed, by the way.
When I try the same video file on a very old laptop (EVO N600c, not an Intel 855 graphics card) with a fresh install of Lucid and also ubuntu-restricted-extras, the system reboots spontaneously. On the EVO I did not have to perform the "i915.modeset=1" procedure, by the way (see the original entry by Elena mentioned in entry nr 123). It booted from the live CD and I could install Lucid without problems.
Enabling KMS does not solve all problems with the i855 card, or so it seems. The developers probably had a reason to disable it.

greeneagle (ms-anoranza) wrote :

I agree. The problem still exists (Fujitsu-Siemens Amilo with i855). I have reverted all workarounds and will wait for the final fix.

So far so good on my end with the workaround I detailed in #118. No problems with extended use, except when I attempted to hook up my external LCD monitor, the display wasn't functioning properly (lots of flashing and blinking). But, I haven't tried to troubleshoot that at all yet.

lispy (janietz) wrote :

Elena, do you have 3D acceleration? I see you turned modesetting off, this caused the loss of 3D on my end.

tags: removed: patch
pvanderploeg (pieter-nescio) wrote :

Re #130, Elena: did you try to play the video in this link? http://player.omroep.nl/?aflID=10913905
It's a documentary on Dutch TV about the voyage of the Beagle and Charles Darwin. It starts with a small ad, and before the documentary starts my D400 with Lucid crashes.

timosha (timosha) wrote :

@pvanderploeg

I can confirm this crash on my Thinkpad R51.

> --- Comment #136 from René Gabriëls <email address hidden> 2010-04-19 18:23:43 PDT ---
> However, my system crashes when starting doomsday or warzone2100 (both OpenGL
> games). I hadn't noticed this before, and is probably unrelated to this
> coherency bug. Where should I file this bug report? Mesa? Dmesg says:
>
> [drm:i915_gem_do_execbuffer] *ERROR* Invalid object handle 48 at index 0
>
> X log says:
>
> [ 5132.404] (EE) intel(0): Failed to submit batch buffer, expect rendering
> corruption or even a frozen display: Bad file descriptor.

Looks like an (unrelated) bug in xf86-video-intel - it's submitting a
batchbuffer with a corrupt object/reloc table.

@pvanderploeg,
An emphatic yes, I can replicate your described crash on my D400 when playing the embedded video in FF that you linked to. I was able to see the ad, but as soon as the video loaded up, my screen dropped to the black screen with the single blinking cursor, and I had to force a shutdown of the system. I've been able to watch other (flash) videos in FF fine before I tried this embedded video. Hopefully a full fix is forthcoming.

@lispy,
Yes, 3D acceleration is enabled.
glxinfo | grep direct --> "direct rendering = yes"
glxgears --> outputs avg. of ~1200 fps
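
The direct-rendering check above can be wrapped so that it gives a yes/no answer. This is a minimal sketch, assuming `glxinfo` (from the mesa-utils package) prints a line of the form `direct rendering: Yes`; the helper name `dri_enabled` is hypothetical.

```shell
#!/bin/sh
set -eu

# dri_enabled: reads glxinfo output on stdin, succeeds if direct rendering is on.
dri_enabled() {
    grep -qi '^direct rendering: *yes'
}

if command -v glxinfo >/dev/null 2>&1; then
    if glxinfo 2>/dev/null | dri_enabled; then
        echo "direct rendering: enabled"
    else
        echo "direct rendering: disabled, or no display available"
    fi
else
    echo "glxinfo not installed"
fi
```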

@Elena M. Lopez ("Nena")
OK. Thanks. There is more to it than just the I855 graphics card I'm afraid.
When I run that link on a (very old) Compaq Evo N600c with Lucid installed,
the system reboots spontaneously. And the EVO does not have an I855 graphics
card.
I also hope that a full fix is forthcoming.
Regards
Pieter van der Ploeg

Created an attachment (id=35450)
failure after starting xfce4-panel

I have just pulled drm-intel and recompiled it (without patch v8, which seems to be already there), and now I can no longer use Xorg. I instantly get these errors when starting XFCE:

intel_bufmgr_gem.c:1052: Error setting domain 69: Input/output error
intel_bufmgr_gem.c:1052: Error setting domain 65: Input/output error
intel_bufmgr_gem.c:1052: Error setting domain 89: Input/output error

and then the usual waterfall of I/O errors. I am on vesa now.

Versions of my packages:

xorg-server 1.7.6
libdrm 2.4.19
xf86-video-intel 2.10.0

Looks like Arch Linux hasn't yet upgraded these, nor I am able to run a freedesktop git development stack

(In reply to comment #151)
> Looks like Arch Linux hasn't yet upgraded these, nor I am able to run a
> freedesktop git development stack

All the new stuff is in the testing repository.

(In reply to comment #152)
> (In reply to comment #151)
> > Looks like Arch Linux hasn't yet upgraded these, nor I am able to run a
> > freedesktop git development stack
>
> All the new stuff is in the testing repository.

I got the new testing packages:

xf86-video-intel 2.11.0-1
xorg-server 1.8.902-1
libdrm 2.4.20-2
mesa 7.8.1-2

And exactly the same crash at startup. Some regression here?

timosha (timosha) wrote :

@vanderploeg - #132

No crashes anymore when I play the video when using the mainline kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33.3-lucid/

@timosha
thanks. good news. could you please tell me how to install this kernel??

timosha (timosha) wrote :

@vanderploeg

Just download the packages for your architecture (i386 or amd64), plus the sources and the all.deb. Put them in a directory, open a terminal, go to that directory, and type "sudo dpkg -i *.deb"
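
Because the mainline directory holds packages for both architectures, `*.deb` is only safe if the download directory contains the files for just one of them. Here is a hedged sketch of picking the right set first; the function name `debs_for_arch` is hypothetical, and it assumes the files follow the usual `name_version_arch.deb` naming.

```shell
#!/bin/sh
set -eu

# debs_for_arch DIR ARCH: print the .deb files in DIR built for ARCH or for "all".
debs_for_arch() {
    dir="$1"
    arch="$2"
    for f in "$dir"/*_"$arch".deb "$dir"/*_all.deb; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Show what would be installed from the current directory on this machine.
# Falls back to i386 (an assumption) when dpkg is not available.
arch=$(dpkg --print-architecture 2>/dev/null || echo i386)
debs_for_arch . "$arch"
# To install them (file names without spaces): sudo dpkg -i $(debs_for_arch . "$arch")
```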

pvanderploeg (pieter-nescio) wrote :

@timosha
thanks a lot. will try that asap and let you know.

> --- Comment #151 from legolas558 <email address hidden> 2010-05-05 23:46:28 PDT ---
> I have just pulled drm-intel, recompiled it (without patch v8, which seems to
> be already there) and then I can no more use Xorg, I instantly get these errors
> when starting XFCE:

Nope the patch is not yet there, at least not yet fully. So it's expected
that the kernel you've tested is rather crash-happy ;)

I've hoped that a few patches more would go in before I rebase, but atm
stuff is stalling. I'll post a rebased version of the patch asap.

David Oser (mirmos192) wrote :

Or... Hat tip to Alban #54 over at https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/568779 ... but slightly modified because it's now rc6. If you have some kind of Lucid desktop, you won't need 1-4, so start at 5.

Alban wrote:
"1. After powering on your PC, press and hold Shift until you see the boot loader menu. Choose a recovery mode option.
2. In the next step, select start failsafeX session.
3. Choose OK when warned about the poor resolution.
4. Log in; you are now on your desktop.
5. Open a terminal (Applications > Accessories > Terminal).
6. Type this command and press Enter (without quotes):
"sudo -i" and enter your password.
7. Type this command and press Enter (without quotes):
"cd /tmp"
8. Type this command and press Enter (without quotes):
"wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-rc6-lucid/linux-image-2.6.34-020634rc5-generic_2.6.34-020634rc6_i386.deb"
9. Type this command and press Enter (without quotes):
"dpkg -i linux-image-2.6.34-020634rc6-generic_2.6.34-020634rc6_i386.deb"
10. Type this command and press Enter (without quotes):
"apt-get update && apt-get upgrade"
11. Type this command and press Enter (without quotes):
"init 6"

Your computer restarts with the new kernel and updated Intel driver and, hopefully, works without freezing."

... And it does!

David Oser (mirmos192) wrote :

.... Woops! My error - where you see 'rc5' still in 8. above, replace with 'rc6'
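
The rc5/rc6 slip is easy to make because the release candidate number appears three times in the URL. One way to avoid it is to build the URL from a single variable. This is a minimal sketch that follows the URL pattern from step 8; the function name `mainline_image_url` is hypothetical, and the pattern is only known to hold for the 2.6.34-rc builds quoted in this thread.

```shell
#!/bin/sh
set -eu

# mainline_image_url RC ARCH: print the linux-image URL for 2.6.34-rcRC.
mainline_image_url() {
    rc="$1"
    arch="$2"
    ver="2.6.34-020634rc${rc}"
    printf 'http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-rc%s-lucid/linux-image-%s-generic_%s_%s.deb\n' \
        "$rc" "$ver" "$ver" "$arch"
}

url=$(mainline_image_url 6 i386)
echo "$url"
# Steps 8 and 9 then become (as root, in /tmp):
#   wget "$url" && dpkg -i "${url##*/}"
```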

pvanderploeg (pieter-nescio) wrote :

@Elena M. Lopez ("Nena")
@timosha
@David Oser

I applied the procedure outlined by David Oser in #140 and #141, starting with step 5, to install the mainline rc6 kernel. The system boots without problems and the video link mentioned in #132 plays without any problems.
It's great.
Thank you all very much.
Regards.

tags: added: patch
Steve Langasek (vorlon) on 2010-05-08
Changed in ubuntu-release-notes:
status: Fix Committed → Fix Released
Alban (seza) wrote :

I want to say a big thank you to everyone who sent me private messages, happy that their PC works again with my method.

I would like to emphasize that the way the various bugs related to the Intel graphics card were managed is absolutely deplorable and lamentable. It's amazing that no official fix or resolution method is available, only an obscure page on a wiki that gives advice for power users, such as enabling KMS (https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes).

Everywhere I look at Ubuntu news, I read that Ubuntu is pleased to have made a major release without a problem... It's a shame.
Leaving users without specific knowledge completely in the dark, at a time when Ubuntu is reaching the general public and affecting more and more ordinary users, is a testament to an incredible will not to show this problem to the public.
For fear of losing users?

I do not know. I just know that I, a normal user, am dropping Ubuntu because I am ashamed, ashamed to work on a platform that leaves its users with a black screen.
I'm going back to Debian, a really stable distribution, with people who work seriously when we report a bug. From now on I will tell as many people around me as I can, people I converted to Ubuntu, not to use it, and I will suggest that they use Debian instead.

Ubuntu makes me feel pity.
I just want to add this: English is not my language, and I am not very good at it. It is with great pain and difficulty that I wrote here, trying to share my ideas and help people within my limited means.

No more pain now... I have used Ubuntu since 2005, and you, Canonical, have managed to make me go away, not because of this technical issue, but for ethical reasons.


@Alban I am sorry to hear you are disappointed. All I can say is that I
know there are a lot of people working hard to fix bugs and solve problems
and that a 6-month release cycle is putting a heavy burden and a lot of
strain on the developers.

What I have done with new versions of any software since 1975 is to avoid
using the x.0 version for production and wait for an x.1 release, and in
the meantime try to help as much as I can to solve issues with the x.0
version.

Regards

Brian Rogers (brian-rogers) wrote :

Alban, there is warning of this bug in the release notes: http://www.ubuntu.com/getubuntu/releasenotes/1004#Intel%208xx%20X%20freezes/crashes

It's also a bug in the upstream code that every distro pulls from, so it's going to affect every linux distro in existence. You can't escape it by switching to another distro. But since it's so sensitive to timing, just about any change you make can affect whether the bug is frequent or rare.

But the bug still exists and your system could freeze at any moment, up until the final fix is released.

Alban (seza) wrote :

You are both right!

@pvanderploeg: Yes, I'm sure the developers are working hard; this is not their fault. My point is about how Ubuntu communicates about it. And this is not an x.0 version, it's a Long Term Support stable version. Not the same context.

@Brian Rogers: Yes, that's true, but have you counted the number of links to this page? I found just 1 link to the release notes, on the "how to upgrade" page. There are lots of links to "get ubuntu", "upgrade notes", etc., but only 1 to the release notes. This is a little ironic...

nomnex (nomnex) wrote :

Can somebody from the bug team explain to me the "Fix Released" status for the Ubuntu release notes? And why is the bug still unassigned (but marked as High), except to one team? Here is a copy of the last comment on FreeDesktop Bugzilla – Bug 27187 (reported in 2003; this is not yesterday)

2010-05-07 (from Daniel Vetter)
> Nope the patch is not yet there, at least not yet fully. So it's expected
> that the kernel you've tested is rather crash-happy ;)
> I've hoped that a few patches more would go in before I rebase, but atm
> stuff is stalling. I'll post a rebased version of the patch asap.

I don't like to sound negative, but unlucky owners of an Intel 855 GPU should not expect a patch in the repository any time soon (10.04 LTS or not).

Hi Daniel,

I just wanted to ask if there are any news on fixing the performance problems when using your patch?

I'm using your patch on the latest Lucid kernel (backported by comparing and copy-and-pasting every single line), and my notebook is now perfectly stable, even with the older Lucid Intel drivers (in both UMS and KMS mode).

P.S.:

I've disabled “intel_wait_for_canary_flocks” after the flush in the function "intel_i830_chipset_flush". This greatly improves performance, and I have had only a single crash in the last few days (which might be related to another bug, because I could switch to the console and restart the system without problems).

Stenten (stenten) wrote :

"ubuntu-release-notes" is a tag for the Release Notes for Lucid (http://www.ubuntu.com/getubuntu/releasenotes/1004). Marking "Fix Released" means that this bug is already mentioned in the Release Notes.

nomnex (nomnex) wrote :

Everest, thanks.

Created an attachment (id=35546)
v8 patch rebased against latest drm-intel (anholt repository)

(In reply to comment #154)
> > --- Comment #151 from legolas558 <email address hidden> 2010-05-05 23:46:28 PDT ---
> > I have just pulled drm-intel, recompiled it (without patch v8, which seems to
> > be already there) and then I can no more use Xorg, I instantly get these errors
> > when starting XFCE:
>
> Nope the patch is not yet there, at least not yet fully. So it's expected
> that the kernel you've tested is rather crash-happy ;)
>
> I've hoped that a few patches more would go in before I rebase, but atm
> stuff is stalling. I'll post a rebased version of the patch asap.

I have made an attempt to rebase your patch vs latest drm-intel-next; I hope the result is good (some people on Arch Linux forums were asking me about the patch, so now I am pointing them to this bug tracker)

I can't say if it's due to my badly rebased patch or to some recent change to software, but Firefox persona's background is badly garbled, VLC shows a still image in place of the video overlay and videos played with mplayer have nice psychedelic glitches

Created an attachment (id=35548)
v9 against latest drm-intel-next

Sorry for the delay, but I want to test new patches a little before posting (especially now that quite a few people are on this bugs cc list). Changes vs v8:

- rebased against latest drm-intel-next (patch shrunk quite decently, yeah!).
- increased the gtt flock size to 16 pages. Perhaps this helps.

Plans going forward:
- I haven't yet started on the performance work. I don't really like mucking around in a very delicate and hard to debug part of gem. So I still hope that this problem somehow magically fixes itself ;) More honestly: Correctness first, performance later (and if I'm very lucky, other ongoing work by other people will make this much easier).
- Merging plans: Due to the (new) failures reported with v8 I'm reluctant to submit the patch as-is. I'm definitely pushing everything up to the cache coherency checker for inclusion into -next (already submitted). But the actual fix probably needs to wait some more.
- I haven't yet had time to research/implement the RAM bank idea by rainy6144.

(In reply to comment #158)
> Created an attachment (id=35548) [details]
> v9 against latest drm-intel-next
>
> Sorry for the delay, but I want to test new patches a little before posting
> (especially now that quite a few people are on this bugs cc list). Changes vs
> v8:
>
> - rebased against latest drm-intel-next (patch shrunk quite decently, yeah!).
> - increased the gtt flock size to 16 pages. Perhaps this helps.
>
The Firefox persona's background glitch is still there, I strongly think that it can be a new bug in intel driver or libdrm.

> - Merging plans: Due to the (new) failures reported with v8 I'm reluctant to
> submit the patch as-is. I'm definitely pushing everything up to the cache
> coherency checker for inclusion into -next (already submitted). But the actual
> fix probably needs to wait some more.
I think the patch should get critical priority even as-is because the vanilla kernel (also drm-intel-next) crashes in a few seconds without it.

Didn't take long:

[ 2111.864905] WARNING: at /home/indan/src/linux-2.6/drivers/char/agp/intel-gtt.c:1007 intel_i830_chipset_flush+0x2e3/0x32d()
[ 2111.864912] Hardware name: 2371GHG
[ 2111.864917] i8xx chipset flush failed, expected: 118451, cpu_read: 117939
[ 2111.864922] Modules linked in: pl2303 usbserial usb_storage uhci_hcd ehci_hcd usbcore
[ 2111.864940] Pid: 788, comm: X Not tainted 2.6.34-rc6-v9 #52
[ 2111.864945] Call Trace:
[ 2111.864956] [<c101ea80>] ? warn_slowpath_common+0x5d/0x70
[ 2111.864964] [<c101eac6>] ? warn_slowpath_fmt+0x26/0x2a
[ 2111.864973] [<c1141b1b>] ? intel_i830_chipset_flush+0x2e3/0x32d
[ 2111.864984] [<c113da68>] ? agp_flush_chipset+0xc/0xd
[ 2111.864994] [<c115bdae>] ? i915_gem_flush+0x1a/0xbb
[ 2111.865003] [<c115fae5>] ? i915_gem_do_execbuffer+0x9bb/0xe3f
[ 2111.865023] [<c115d187>] ? i915_gem_object_set_to_gtt_domain+0x33/0x5c
[ 2111.865032] [<c116004d>] ? i915_gem_execbuffer2+0xe4/0x164
[ 2111.865041] [<c114826f>] ? drm_ioctl+0x1cf/0x27a
[ 2111.865049] [<c115ff69>] ? i915_gem_execbuffer2+0x0/0x164
[ 2111.865060] [<c10695d9>] ? do_sync_read+0x9d/0xd2
[ 2111.865069] [<c11480a0>] ? drm_ioctl+0x0/0x27a
[ 2111.865078] [<c1073701>] ? vfs_ioctl+0x1c/0x7d
[ 2111.865086] [<c1073c97>] ? do_vfs_ioctl+0x478/0x4bc
[ 2111.865096] [<c1030e62>] ? hrtimer_try_to_cancel+0x43/0x60
[ 2111.865105] [<c1021c12>] ? do_setitimer+0xa4/0x17f
[ 2111.865113] [<c1021d35>] ? sys_setitimer+0x48/0x73
[ 2111.865121] [<c1034f41>] ? ktime_get_ts+0xb3/0xbb
[ 2111.865129] [<c1073d08>] ? sys_ioctl+0x2d/0x44
[ 2111.865138] [<c10025d0>] ? sysenter_do_call+0x12/0x26
[ 2111.865144] ---[ end trace d90ca0d623dcc2a3 ]---
[ 2934.051532] ------------[ cut here ]------------
[ 2934.051547] WARNING: at /home/indan/src/linux-2.6/drivers/char/agp/intel-gtt.c:1007 intel_i830_chipset_flush+0x2e3/0x32d()
[ 2934.051551] Hardware name: 2371GHG
[ 2934.051554] i8xx chipset flush failed, expected: 156295, cpu_read: 155783
[ 2934.051557] Modules linked in: pl2303 usbserial usb_storage uhci_hcd ehci_hcd usbcore
[ 2934.051569] Pid: 788, comm: X Tainted: G W 2.6.34-rc6-v9 #52
[ 2934.051572] Call Trace:
[ 2934.051580] [<c101ea80>] ? warn_slowpath_common+0x5d/0x70
[ 2934.051584] [<c101eac6>] ? warn_slowpath_fmt+0x26/0x2a
[ 2934.051589] [<c1141b1b>] ? intel_i830_chipset_flush+0x2e3/0x32d
[ 2934.051596] [<c113da68>] ? agp_flush_chipset+0xc/0xd
[ 2934.051602] [<c115bdae>] ? i915_gem_flush+0x1a/0xbb
[ 2934.051607] [<c115fae5>] ? i915_gem_do_execbuffer+0x9bb/0xe3f
[ 2934.051614] [<c116620f>] ? intel_mark_busy+0x9b/0x177
[ 2934.051619] [<c115d187>] ? i915_gem_object_set_to_gtt_domain+0x33/0x5c
[ 2934.051624] [<c116004d>] ? i915_gem_execbuffer2+0xe4/0x164
[ 2934.051629] [<c114826f>] ? drm_ioctl+0x1cf/0x27a
[ 2934.051634] [<c115ff69>] ? i915_gem_execbuffer2+0x0/0x164
[ 2934.051641] [<c1007a62>] ? restore_i387_fxsave+0x4c/0x5c
[ 2934.051647] [<c1034fa4>] ? ktime_get+0x5b/0xcf
[ 2934.051652] [<c11480a0>] ? drm_ioctl+0x0/0x27a
[ 2934.051658] [<c1073701>] ? vfs_ioctl+0x1c/0x7d
[ 2934.051662] [<c1073c97>] ? do_vfs_ioctl+0x478/0x4bc
[ 2934.051669] [<c1031655>] ? hrtimer_start+0xd/0x11
[ 2934.051674] [<c1021c91>] ? do_setitimer+0x123/0x17f
...

Oh, forgot to mention: The above failed flush was without having done a suspend.
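
For anyone comparing their own logs against the two warnings above, here is a minimal shell sketch that pulls the two counters out of each "i8xx chipset flush failed" line and prints how far cpu_read lags behind the expected value. The name flush_lag is made up for illustration and is not part of any patch; the only assumption is the message format shown in the traces above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the v9 patch): read kernel log text on
# stdin and, for every "i8xx chipset flush failed" line, print the
# difference expected - cpu_read. Lines that do not match are ignored.
flush_lag() {
    sed -n 's/.*i8xx chipset flush failed, expected: \([0-9][0-9]*\), cpu_read: \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r expected cpu_read; do
        echo $((expected - cpu_read))
    done
}

# Typical use on a running system:
#   dmesg | flush_lag
```

Fed the two warnings quoted above, this prints 512 for both (118451 - 117939 and 156295 - 155783).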

>>I have the same problem with my Satellite A50 laptop, which has intel 82852/855GM GPU.
$ lsb_release -rd
Description: Ubuntu 10.04 LTS
Release: 10.04 #RC edition, clearly installed
$ dpkg -l xserver-xorg-video-intel
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
+++-==============-==============-============================================
ii xserver-xorg-v 2:2.9.1-3ubunt X.Org X server -- Intel i8xx, i9xx display d

$ uname -a
Linux 2.6.32-21-generic #32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010 i686 GNU/Linux>>

-----------------------------------------

I have precisely the same machine as the poster above and experienced precisely the same problem after upgrading to 10.04 using Update Manager.

I've installed all updates since full release and problem still not solved:

Machine boots to initial Ubuntu splash screen for about .35 sec then goes black. Only way to escape is to hold down power button. Can boot using "low-res graphic mode" but system eventually locks up. Also of note, on this dual boot Windows XP system, after a short while of loading XP, it also locks up and goes to BSOD. Never done that previously.

trikke (patrik-uytterhoeven) wrote :

Yup.

The same problem happened to me on my Lenovo netbook with an Intel 945GM chipset.

I think this sucks for Linux users trying Ubuntu for the first time, especially since this is an LTS release. It should have been postponed until this was fixed; it's a major showstopper for a first-time user.

Even mickeysoft 7 has no big showstoppers like this; at least they understood after Vista how to do it...

On Fri, May 14, 2010 at 10:30:03AM -0000, trikke wrote:
> same problem happend to me on my lenovo netbook with intel 945GM chipset

No, it did not. This is a bug about a problem with the i855 chipset.
Please file a separate bug report for the issue you encountered.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>

(In reply to comment #158)
> Due to the (new) failures reported with v8 I'm reluctant to submit the
> patch as-is.

Patch v8 performed well on my 852GME. In three weeks of regular usage and limited stress testing the kernel reported exactly one flush failure. I did not notice any slowdown compared to unpatched kernels, and there is no measurable difference as reported in comment #122.
Patch v9 is just as good of course.

(In reply to comment #140)
> I'll look into allocating the pages as one big chunk (ie higher order
> alloc)

There is a decent chance that the pages are already allocated in one chunk. At least on my machine the pages happen to be allocated consecutively. (verified with page_to_phys(intel_private.i8xx_pages[i]))
You could make your next patch print the physical addresses. If other machines behave like mine but still show more failures, then you do not need to bother with explicit higher order allocations.

David Oser (mirmos192) wrote :

@Brian Rogers (#145) - I may be misunderstanding the word 'distro', but Peppermint Linux (otherwise ugh-for-me) is not affected (kernel 2.6.32-22, if I remember correctly), and rcs 5, 6 and 7 of the mainline Lucid version are also not affected - at least, they work fine with my i855 display.
@Steve Langasek (#152) - the same basic symptom is evident on a number of Intel display chipsets. Does it make sense to file them under separate bug numbers? I know programming is an iterative process, but sometimes it makes sense to take an overview.

Now for a bit of a rant, if you'll forgive me. Just a glance through the normal forums indicates that there are many many ordinary people affected by this bug... to the extent that they cannot upgrade from the excellent 9.10 version without very complicated (for normal people) workarounds, nor install Lucid cleanly for the first time. I really think the release of Lucid before these bugs were fixed (and they certainly were not evident in the excellent early Lucid alphas - so we are talking of regression, here) was a serious mistake. I know open source development is a labour of love, and many people have devoted enormous numbers of hours to Ubuntu Lucid, without any financial reward for their efforts. I certainly do not blame them at all, as some are still hard at work trying to solve this particular problem. But somebody made a decision to respect some 'magic date' for full release, and, in my opinion, that someone made a serious error. This kind of thing tends to bring the concept of open-source software into disrepute. Is it intended for the masses, or just a geeky few?

Best
David

Brian Rogers (brian-rogers) wrote :

If Ubuntu releases were held up by this kind of bug, Karmic wouldn't be out yet. I helped reinstall Jaunty for someone with an i845 that was locking up on Karmic, for example. And in retrospect, that system had graphical issues even before Jaunty, such as freezing on the splash screen until a VT switch on some releases, and a rare lockup that was unexplained at the time.

i8xx issues have been lurking behind the scenes for a long time now, and the factors that caused the bug to surface are complex. You might find a way to stop the freezing for yourself, but that doesn't mean it will work for other people, and it might even break someone's working setup. So there's really no good fix other than wait for upstream's final version of the fix to be committed. Which hopefully will happen soon. And then it will be backported to Lucid's kernel.

Alban (seza) wrote :

If the Intel driver for i8xx is so unstable, why not put it on the list of blacklisted drivers, let advanced users take their own risks by activating it, and let basic users enjoy their desktop with a simple VESA driver in 2D? Why not implement a mechanism like the one for proprietary drivers?

The main problem now is not how to fix all these bugs. We know developers are working on them. It is to learn from this case and make a decision for the next release. Decision makers should say something like: "Yes, we made a bad decision and a big mistake with this release regarding the Intel i8xx driver. We'll now reflect on what to do in the future..."

But I doubt we will ever hear something like that.

Stenten (stenten) wrote :

"The main problem now, is not how to fix all theses bugs. We know developers work about them. It's learn about this case and take a decision for the next release. Decisions makers should say something like that: "Yes we took a bad decision and made a big mistake with this release about intel ixx driver. We'll now reflect about what to do in the future...""

This is because there's really no reason for UMS to cause the breakage that it currently is. The devs are very perplexed by this, and there's really no way for them to have predicted this outcome. It's like choosing to boot in recovery mode and then that causing a kernel panic.

So I don't really know what you expect them to do better next time outside of looking into the future before making decisions. Hindsight is always 20/20, right? ;)

Brian Rogers (brian-rogers) wrote :

They did blacklist KMS support for i8xx in an attempt to fix this, but that didn't work for everyone. I believe reverting to vesa was considered, but that has its own regressions including bad performance, lack of multi-monitor support, and often even a lack of support for a laptop's native resolution. And those regressions would be inflicted on all i8xx users, not just those that would encounter this bug.

So there were no good options. I'm sure if they knew 100% of i8xx users were affected, they'd switch to vesa. But there's no way to know that. It could be 5% and we wouldn't be able to tell the difference, since only affected users come here and report it.

LordTroy20 (troymarshall20) wrote :

I haven't read everyone's comments, so I'm not sure if this is fully relevant, but some information posted here led me to a fix for my Intel issue.

I followed the info at this link, as shown in a comment: http://ubuntu-tutorials.com/2010/05/06/ubuntu-10-04-lucid-blank-screen-at-startup-workaround/

Then I edited my xorg.conf to change the driver to intel (I was using vesa because I couldn't get a display), so I assume that's what the GPU lockup people are referring to.

After that I rebooted. It shows the screen and has Compiz effects enabled, with no other issues so far.
If this helps someone, awesome. If not, oh well; I figured I'd better post what helped me in case it helps someone else.

candtalan (aeclist) wrote :

Affects a relative of mine who is a school administrator. It had the very unfortunate effect of turning him away from Ubuntu.

his laptop graphics:
lspci:
VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 01)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 01)

laptop is
toshiba
satellite pro A10

good luck with the work fix
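
Since this report only tracks the 82852/855GM (other chipsets belong in separate reports), the lspci check shown above can be scripted. This is a minimal sketch under one assumption: the device string reads "82852/855GM", as in the lspci output quoted in this thread. The function name has_i855 is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci` output on stdin and succeed (exit 0)
# only if it lists an 82852/855GM graphics device.
has_i855() {
    grep -q '82852/855GM'
}

# Typical use:
#   if lspci | has_i855; then
#       echo "82852/855GM found: this bug may apply"
#   else
#       echo "no 82852/855GM found: please file a separate report"
#   fi
```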

Gustavo (gstv.inc) wrote :

OK, I hate my Intel 82852/855GM GPU, but unfortunately I cannot change it. Can anyone create a script that runs at boot and offers the workarounds as options (workaround 1, workaround 2, workaround 3, ...) to click on and try, so that people can test which one works until this is solved?

My notebook is:
HP DV1130us
Intel 82852/855GM GPU

Or tell me how to compile the driver for the Intel Corporation 82852/855GM Integrated Graphics Device into the kernel, or whatever... Help!!!

To all who need it:

unofficial Live-CD of Ubuntu 10.04
with updated Intel-drivers and 855gm-patched kernel-modules

and more:

http://glasen-hardt.de/

On Monday, 17.05.2010, at 02:18 +0000, Gustavo wrote:
> ok, I hate my intel 82852/855GM GPU.But unfortunately I can not change
> it. Anyone can create a script to run on BOOT, who ask for one as the
> workaround like, workaround-1-2 workaround, workaround-3,and click on
> the option to try.And that people can test which one works, until it is
> solved?
>
> My Not Is
> HP DV1130us
> intel 82852/855GM GPU
>
> Or tel me how compile the Intel Corporation 82852/855GM Integrated
> Graphics Device on kernel o whatever
> ....Help!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>

(In reply to comment #158)
> Created an attachment (id=35548) [details]
> v9 against latest drm-intel-next
>
> Sorry for the delay, but I want to test new patches a little before posting
> (especially now that quite a few people are on this bugs cc list). Changes vs
> v8:
>
> - rebased against latest drm-intel-next (patch shrunk quite decently, yeah!).
> - increased the gtt flock size to 16 pages. Perhaps this helps.

So far (5 days of testing) v9 works flawlessly here: no crashes or artefacts.

timosha (timosha) wrote :

On Mon, 2010-05-17 at 06:46 +0000, Thoer wrote:

> unofficial Live-CD of Ubuntu 10.04
> with updated Intel-drivers and 855gm-patched kernel-modules
>
> http://glasen-hardt.de/

tested?

Yes ! About 28 times and it works.

On Mon, 2010-05-17 at 07:29 +0000, timosha wrote:

> Yes ! About 28 times and it works.

Thank you timosha

Hello,
I do have a 855 chipset as well but unfortunately I am not an advanced user - could somebody explain in some short words how to install the Patch? Is there a git software necessary?
Thank you so much!! D.

Gustavo (gstv.inc) wrote :

OK, if someone gets a CD to work, CAN HE PLEASE MAKE THE FILES available on a server, for something like

wget ........
or sudo apt-get .........

What about this, which I found on the site http://glasen-hardt.de/:

sudo update-initramfs -u -k all

sudo apt-get install 855gm-fix-exp-dkms

------------------------
sudo add-apt-repository ppa:glasen/855gm-fix
sudo apt-get update
sudo apt-get install dkms linux-headers-generic 855gm-fix-dkms
sudo update-initramfs -u (optional, see above)

I have NOT TRIED any of this!!

I'm still waiting for help. Maybe someone can check this CD and try to make the fix official, so that everyone can just run the update and have it all fixed.

Sorry

Gustavo (gstv.inc) wrote :

I don't wanna reinstall my system .......

(In reply to comment #164)
> Hello,
> I do have a 855 chipset as well but unfortunately I am not an advanced user -
> could somebody explain in some short words how to install the Patch? Is there a
> git software necessary?
> Thank you so much!! D.

The patch is against drm-next, so git is probably easiest.

# Get Linux git tree (takes a while):
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

cd linux-2.6

# Add the drm-intel-next branch from drm-intel and fetch it
# (-f runs the fetch right away; without it there is nothing to check out yet):
git remote add -f -t drm-intel-next drm-intel-next git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git

# Change working dir to this new stuff:
git checkout drm-intel-next

# Apply patch:
patch --dry-run -p1 < ../fix-i855-cache-coherency-v9.patch

# If that succeeds redo without the --dry-run bit.

Good luck!
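
The dry-run-then-apply step above can be wrapped so the tree is only touched when the dry run is clean. This is a minimal sketch, not something from the patch author: apply_checked is a made-up name, and it assumes GNU patch and that it is run from the top of the kernel tree.

```shell
#!/bin/sh
# Hypothetical wrapper around the two-step procedure above: apply the patch
# only if `patch --dry-run` reports no problems. Run from the top of the tree.
apply_checked() {
    patch_file=$1
    if patch --dry-run -p1 < "$patch_file" >/dev/null; then
        # Dry run was clean, so apply for real.
        patch -p1 < "$patch_file"
    else
        echo "dry run failed; tree left untouched" >&2
        return 1
    fi
}

# Typical use:
#   cd linux-2.6 && apply_checked ../fix-i855-cache-coherency-v9.patch
```

If the dry run fails, the patch most likely does not match the checked-out tree, and nothing has been modified.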

 *******************NO WARRANTY**************

READ: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

http://bugs.freedesktop.org/show_bug.cgi?id=27187
https://launchpad.net/~glasen/+archive/libdrm
https://launchpad.net/~glasen/+archive/855gm-fix
http://glasen-hardt.de/?p=568 (English)
http://glasen-hardt.de/?cat=8 (German)

++++++++++++++++++++++++++++++++++++++++++++
summarized from http://glasen-hardt.de/

# install the 855gm-patched kernel-modules

1. include the 855gm-patched kernel-modules
sudo add-apt-repository ppa:glasen/libdrm
sudo add-apt-repository ppa:glasen/855gm-fix
2. update, upgrade repo
sudo apt-get update
sudo apt-get upgrade
3. install the 855gm-patched kernel-modules
sudo apt-get install dkms linux-headers-generic 855gm-fix-dkms

-------- OPTIONAL----------------------------
4. Add Plymouth to the initial ramdisk, repair colour
echo "FRAMEBUFFER=yes" | sudo tee /etc/initramfs-tools/conf.d/splash
sudo update-initramfs -u -k all
---------------------------------------------

# purge the 855gm-patched kernel-modules

1. wget https://launchpad.net/~xorg-edgers/+archive/ppa/+files/ppa-purge_0.2.6_all.deb
2. sudo dpkg -i ppa-purge_0.2.6_all.deb
3. sudo ppa-purge ppa:glasen/intel-driver
4. sudo ppa-purge ppa:glasen/libdrm
5. sudo ppa-purge ppa:glasen/855gm-fix

On Tuesday, 18.05.2010, at 02:38 +0000, Gustavo wrote:
> I don't wanna reinstall my system .......
>

from
http://www.linux.com/community/forums?func=view&catid=25&id=5462
(1 Month, 1 Week ago)
-----------------------------------------------------------------------

Basically, the Intel Linux developers have decided to screw with the
i915 gpu Linux kernel driver which used to work for the Intel 855GM or
Intel 85x graphics chips. They have decided to drop ums (user mode
setting) for the driver without providing a working kms (Kernel mode
setting) alternative.
So in short, if you have an Intel 85x graphics card (Extremely common on
slightly older Pentium M Centrino Notebooks) You have practically zero
chance of using any current distribution release which uses a current
version of the Linux kernel and be able to use any kind of working
xserver (Say for instance to use a desktop, kde sc, gnome, xfce, etc.)

The last working Fedora kernel version is :
2.6.31.1-56.fc12.i686

I believe that any of the 2.6.30.x kernels should work on any
distribution.
The recent Ubuntu LTS support release does not suffer from the problem
because the Ubuntu developers identified the issue and marked it as a
regression. Unfortunately, upstream are either not interested in fixing
or reverting the regression or are having no real luck fixing it.
(From some bug tracker hunting, it looks like a combination of the two
with the person involved in pushing the regression completely ignoring
the problem.)

This issue has been known for and was reported over six months ago now.
The developer who pushes the updates to the i915 driver was told that
the commits he was about to push were a regression but still he pushed
them and it was merged anyway.

The problem manifests itself as a complete lock up when the x server has
started or shortly after the xserver has started.
No magic sysrq key combination or ctrl + alt + backspace key combination
achieves any kind of escape from the lock up and the only solution is to
power down the machine manually by holding down the power button.

So if you do own a machine which contains an Intel 85x graphics chip,
you may as well either buy a new machine without Intel graphics to
replace it or run an old distribution on it (And hope that it is ever
fixed. Which seems pretty unlikely at the moment.)
I myself have one of these machines and have now learned the lesson the
hard way, to never again buy a machine which contains Intel graphics
hardware to try to use in conjunction with the Linux kernel. I advise
others to do the same.

it's not entirely right. my Dell D505 works like a champ ( with compiz,
emerald). the workaround is to add i915 to boot configuration.

this probably loads proper kernel module. beyond that i did not have time
between now and then to hook it up to the debugger so there is generic
version running on my machine.

Linux host 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010
i686 GNU/Linux

there are a few minor bugs. one is there is no DVD play back on totem (this
relates to video and codecs) it simply blackscreens when I start totem...

On Tue, May 18, 2010 at 12:20 PM, Thoer <email address hidden> wrote:

> from
> http://www.linux.com/community/forums?func=view&amp;catid=25&amp;id=5462
> (1 Month, 1 Week ago)
> -----------------------------------------------------------------------
>
> Basically, the Intel Linux developers have decided to screw with the
> i915 gpu Linux kernel driver which used to work for the Intel 855GM or
> Intel 85x graphics chips. They have decided to drop ums (user mode
> setting) for the driver without providing a working kms (Kernel mode
> setting) alternative.
> So in short, if you have an Intel 85x graphics card (Extremely common on
> slightly older Pentium M Centrino Notebooks) You have practically zero
> chance of using any current distribution release which uses a current
> version of the Linux kernel and be able to use any kind of working
> xserver (Say for instance to use a desktop, kde sc, gnome, xfce, etc.)
>
> The last working Fedora kernel version is :
> 2.6.31.1-56.fc12.i686
>
> I believe that any of the 2.6.30.x kernels should work on any
> distribution.
> The recent Ubuntu LTS support release does not suffer from the problem
> because the Ubuntu developers identified the issue and marked it as a
> regression. Unfortunately, upstream are either not interested in fixing
> or reverting the regression or are having no real luck fixing it.
> (From some bug tracker hunting, it looks like a combination of the two
> with the person involved in pushing the regression completely ignoring
> the problem.)
>
> This issue has been known for and was reported over six months ago now.
> The developer who pushes the updates to the i915 driver was told that
> the commits he was about to push were a regression but still he pushed
> them and it was merged anyway.
>
> The problem manifests itself as a complete lock up when the x server has
> started or shortly after the xserver has started.
> No magic sysrq key combination or ctrl + alt + backspace key combination
> achieves any kind of escape from the lock up and the only solution is to
> power down the machine manually by holding down the power button.
>
> So if you do own a machine which contains an Intel 85x graphics chip,
> you may as well either buy a new machine without Intel graphics to
> replace it or run an old distribution on it (And hope that it is ever
> fixed. Which seems pretty unlikely at the moment.)
> I myself have one of these machines and have now learned the lesson the
> hard way, to never again buy a machine which contains Intel graphics
> hardware to try to use in conjunction with the Linux kernel. I advise
> others to do the same.
>
> --
> MASTER...

caca (pjonniau) wrote :

Just upgraded from Karmic to Lucid on my Dell Inspiron 700m with an Intel graphics card: after boot it crashes nicely.

The solution (installing a mainline kernel) comes from post #54 on this duplicate bug page:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/568779?comments=all

###############
1. After powering on your PC, hold down Shift until you see the boot loader menu. Choose a recovery mode option.
2. In the next step, select the failsafeX session.
3. Choose OK when warned about the poor resolution.
4. Log in; you are now on your desktop.
5. Open a terminal (Applications > Accessories > Terminal).
6. Type this command and press Enter (without quotes):
"sudo -i" and enter your password.
7. Type this command and press Enter (without quotes):
"cd /tmp"
8. Type this command and press Enter (without quotes):
"wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-rc5-lucid/linux-image-2.6.34-020634rc5-generic_2.6.34-020634rc5_i386.deb"
9. Type this command and press Enter (without quotes):
"dpkg -i linux-image-2.6.34-020634rc5-generic_2.6.34-020634rc5_i386.deb"
10. Type this command and press Enter (without quotes):
"apt-get update && apt-get upgrade"
11. Type this command and press Enter (without quotes):
"init 6"
##############

Worked fine for me. Still, it's unfortunate to see that a good old display problem I encountered on Edgy can resurface like this.

Stenten (stenten) wrote :

For anyone who cares, the above workaround is already documented at
https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes
and is updated with the current mainline kernel as well as containing the proper header files and 64-bit image.

Gustavo (gstv.inc) wrote :

> So if you do own a machine which contains an Intel 85x graphics chip,
> you may as well either buy a new machine without Intel graphics to
> replace it or run an old distribution on it (And hope that it is ever
> fixed. Which seems pretty unlikely at the moment.)
> I myself have one of these machines and have now learned the lesson the
> hard way, to never again buy a machine which contains Intel graphics
> hardware to try to use in conjunction with the Linux kernel. I advise
> others to do the same

*****************************************
Is this right? Will all PCs with Intel GPUs (like the 855...) no longer be supported by the developers? For some time now I have used only Ubuntu and rarely XP. I'm very used to Ubuntu and don't want to stop using it, and buying another PC is not in my plans...

*****************************************************
OK, let me ask something.

I have 3 different kernels in GRUB at boot.

The 2 newest ones don't boot; they stop at a black screen.
The oldest one boots and the PC works, but I can't play videos on the PC because they crash. I can play video on YouTube
in Firefox, but when I opened a full-screen video from another site (I don't remember which), the PC crashed again.

On my newest kernel I can't boot normally, but I can go to recovery > failsafe and log in, though without graphics; X does not start.
Can I try Thoer's suggestion (#168) from there?
Is this going to break my oldest, working kernel?

If I try Thoer's suggestion (#168), is it going to delete my oldest kernel?
Every time the kernel is upgraded, the previous kernel seems to be kept and the older ones purged, or something like that. I have 3, so maybe when I type upgrade in the terminal it will automatically purge the oldest?

Which way should I take?
Try it on the newest (non-working) kernel from the command line,
or try it on the kernel that works? (My fear is that it stops working, since with it I can at least boot and use the PC, even with limitations.)
Tell me about the risks. Thanks for everything.

(In reply to comment #164)
> Hello,
> I do have a 855 chipset as well but unfortunately I am not an advanced user -
> could somebody explain in some short words how to install the Patch? Is there a
> git software necessary?
> Thank you so much!! D.

If you are looking for updated packages, this depends on your distribution.

For Ubuntu Lucid you will find
- an updated kernel module (the module only) here:
  https://launchpad.net/~glasen/+archive/855gm-fix/
  (you have to remove this package once the bug is fixed in the Ubuntu kernel)
- or a whole kernel update here:
  https://launchpad.net/~dnjl/+archive/kernel/
  (which will be superseded/updated once it's fixed in the Ubuntu kernel)

Also, don't forget to install all other provided updates!
On my systems no updates for drm or xorg-intel are needed anymore.

For other distributions I don't know...

(In reply to comment #166)
> (In reply to comment #164)
> > Hello,
> > I do have a 855 chipset as well but unfortunately I am not an advanced user -
> > could somebody explain in some short words how to install the Patch? Is there a
> > git software necessary?
> > Thank you so much!! D.
>
> if you are looking for updated packages this depends on your distribution.
>
> For Ubuntu Lucid you will find
> - updated kernel module (the module only) here:
> https://launchpad.net/~glasen/+archive/855gm-fix/
> (you have to remove this package when the bug is fixed in ubuntu kernel)
> - or a whole kernel update here:
> https://launchpad.net/~dnjl/+archive/kernel/
> (which will be superseded/updated in the case its fixed in ubuntu kernel)
>
> Also, dont forget to install all other provided updates!
> On my systems no updates for drm or xorg-intel are needed anymore.
>
> For other distrobutions I don't now...

For Debian Squeeze on i686 a kernel package is here:
http://www2.informatik.hu-berlin.de/~beier/tmp/linux-image-2.6.34gtt-fix-v9_2.6.34gtt-fix-v9-10.00.Custom_i386.deb

Brian Rogers (brian-rogers) wrote :

Is there 64-bit capable hardware with i8xx graphics? I just assumed it was too old for that. I'm going to post a 2.6.34 (final release) kernel with the latest patch for this bug to my PPA. If there are i8xx machines with 64-bit, I'll make sure to include a 64-bit build.

> --- Comment #160 from Indan Zupancic <email address hidden> 2010-05-11 15:33:24 PDT ---
> There's also a small copy&paste bug in your patch:
>
> for (i = 0; i < I830_CC_CANARY_FLOCK_PAGES; i++) {
>         intel_private.i8xx_cpu_canary_pages[i]
>                 = kmap(intel_private.i8xx_pages[i+2]);
>         if (!intel_private.i8xx_cpu_flush_page) {
>                 WARN_ON(1);
>                 intel_i830_fini_flush();
>                 return;
>         }
> }
> That should be if (!intel_private.i8xx_cpu_canary_pages[i]).

Thanks for spotting this. Fixed in my local version.

> I don't understand this bit:
>
> /* Don't map the first page, we only write via its physical address
>  * into it. */
> for (i = 0; i < I830_CC_DANCE_PAGES; i++) {
>         writel(agp_bridge->driver->mask_memory(agp_bridge,
>                         page_to_phys(intel_private.i8xx_pages[i+1]), 0),
>                intel_private.registers+I810_PTE_BASE+((num_entries+i)*4));
> }
>
> The first page is i8xx_cpu_flush_page, but if it isn't mapped, the gmch doesn't
> know about it, and intel_flush_mch_write_buffer() has no effect, has it? Or is
> any write at any address sufficient to fill the write buffer?

This just implements the gtt mapping. The direct mapping using the physical
address is done a few lines before. And because
intel_flush_mch_write_buffer only needs a direct mapping, I've decided to
save one gtt page.

> We seem to have mysterious behaviour here, all the canary pages ended up
> coherent, but that one write somehow didn't?!
>
> I guess the gmch has a local cache that hides writes. If you know that cache's
> design (associativity etc.) then you can probably flush it out by doing a read
> or write to the right address. The canary stuff seems to work most of the time,
> so the cache can't be too big.

Well, that's exactly the problem. No one knows how it works exactly ...

> Or maybe you can flush it out by putting the chip in D1-3 and back to D0
> quickly, or something crazy like that.

That one probably takes even longer than what I'm doing here ...

Daniel Baumann (dnjl) wrote :

As I mentioned in post 102, I've built a default Lucid kernel 2.6.32-22.33 with Daniel Vetter's patches (v8) applied.
It can now be found here: https://launchpad.net/~dnjl/+archive/kernel
On my machines this has worked well for weeks without the need to install any more experimental stuff like drm-2.4.20 or xorg-intel-2.11.
Install this kernel PPA and update all your packages, and this should run...

David Oser (mirmos192) wrote :

@Thoer (#162 etc/usw).
Many, many thanks for that effort.

Slight bug initially evident on my i8xx machine, in that application (eg Firefox) may open without showing the app bar at the top containing its minimize, maximize and close symbols, and closing under file-close still leaves the app showing in the bottom Ubuntu toolbar, and that app manifestation cannot be gotten rid of without a complete restart.

However, this problem has now disappeared with today's ppa.launchpad.net/glasen source update. Brilliant!! No rigorous testing done by me, however.

I do have this query... what happens long term, as this (altered kernel?) gets replaced via normal (official) updates. Will I need, eventually, to modify my grub to ensure this unofficial version is the one that loads?

Still - mustn't complain. Ubuntu is now a joy to use once more...
David

Greetings All:

On Tue, May 18, 2010 at 2:11 PM, Everest <email address hidden> wrote:
> For anyone who cares, the above workaround is already documented at
> https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes
> and is updated with the current mainline kernel as well as containing the proper header files and 64-bit image.

I do have EXACTLY the Intel graphics card listed in section F
of this "Lucidi8xxFreezes" Wiki page.

***
$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
***
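
For anyone comparing their own hardware against the wiki page, here is a minimal Python sketch of the same check. It assumes that the `[8086:3582]` vendor:device pair in the output above is what identifies the 82852/855GM; the function names and the sample line are only illustrative.

```python
import re

# Vendor:device ID taken from the lspci output quoted above.
I855GM_ID = "8086:3582"

def find_vga_ids(lspci_nn_output):
    """Return the vendor:device IDs of all VGA controllers in `lspci -nn` output."""
    ids = []
    for line in lspci_nn_output.splitlines():
        if "VGA compatible controller" not in line:
            continue
        # The class code ([0300]) has no colon, so only the device ID matches.
        match = re.search(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]", line, re.IGNORECASE)
        if match:
            ids.append(match.group(1).lower())
    return ids

def has_i855gm(lspci_nn_output):
    """True if any VGA controller in the output carries the 855GM ID."""
    return I855GM_ID in find_vga_ids(lspci_nn_output)

sample = ("00:02.0 VGA compatible controller [0300]: Intel Corporation "
          "82852/855GM Integrated Graphics Device [8086:3582] (rev 02)")
print(has_i855gm(sample))
```

On a real machine the input would be the saved output of `lspci -nn`.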

I also downloaded the 855gm live CD "ubuntu-10.04-855gm-desktop-i386.iso".

Both work. Up to a point. I have 3D desktop but as soon as I launch
Movie Player, either with a video or audio file, the screen freezes.
Everything continues to run (I can hear the audio) but my system is
effectively nuked.

Both break in this way.

The crash also happens if I turn off Compiz and work on a 2D desktop.

It is very nice to have a full 3D desktop again, but without
multimedia, I will be keeping my "sleep problematic" 9.10 partition
where it is and use the 10.04 for testing.

Thanks for the pointers. The problem seems to go a lot deeper than
expected. My original bug was the black screen at boot problem.

Ahimsa

"As long as there are slaughterhouses, there will be battlefields." -Leo Tolstoy

-Jess E.

"I want a processor so powerful I can read the
manual by the light of the heat sink."- R.I.P. MRX

nomnex (nomnex) wrote :

On Wed, 2010-05-19 at 13:37 +0000, Daniel Baumann (dnjl) wrote:
> As I mentioned in post 102 i've build a default lucid kernel
> 2.6.32-22.33 with applied patches of Daniel Vetter (v8).
> This will be found now here:
> https://launchpad.net/~dnjl/+archive/kernel

After installing the patched kernel, what happens when there is an
official kernel update?

Or when (and if) this bug is fixed? Do we have to uninstall it?

Daniel Baumann (dnjl) wrote :

Any Lucid kernel update will supersede it, including updates that do not contain this fix. So be careful!

(In reply to comment #163)
> So far (5 days of testing) v9 works flawlessly here: no crashes or artefacts.

Meh, font-rendering errors in Emacs with the v9 patch. This was not the case with the v7 patch AFAIK. Some fonts aren't drawn at all, some aren't cleared after deleting, and some fonts are rendered too fat (twice maybe, at slightly different positions?)

Created an attachment (id=35778)
dmesg + i915_error_state

Hey guys I would really like to thank you All and especially Daniel for your hard work for solving this problem. I was struggling with it since I upgraded from Slackware 12.2 to 13.0. I've been making long searches in internet and finding many people with the same problem and not a single solution. I'm really happy that finally there is a real chance to get this solved.

Yesterday I built a kernel with patch v9 and the crashes stopped. Finally I could upgrade the intel driver from 2.3.2-legacy to 2.11. Unfortunately there were several problems:

1. The video performance slowed down and now I can't watch HD videos any more (I worked so hard last month to get HD 720p working on my old laptop :( )
If I boot without KMS and with the legacy driver the video is faster, but there is no 3D accel:

(EE) AIGLX error: i915 does not export required DRI extension
(EE) AIGLX: reverting to software rendering
(EE) AIGLX error: dlopen of /usr/lib/xorg/modules/dri/swrast_dri.so failed (/usr/lib/xorg/modules/dri/swrast_dri.so: cannot open shared object file: No such file or directory)
(EE) GLX: could not load software renderer
(II) GLX: no usable GL providers found for screen 0

2. When I tried to open a game with wine I had a crash again:

(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.

I would like to help if I can. I don't have knowledge of low-level programming, but I can at least help with testing.

I have attached dmesg and i915_error_state (sorry that it's in the previous post)

And one more thing:
I have
kernel-2.6.34-rc6 from drm-intel-next + v9 patch
libdrm-2.4.20
xf86-video-intel-2.11
mesa-7.8.1
xorg-server-1.6.3

I would just like to ask how you are upgrading xorg-server, because there are many packages related to it and I don't feel like spending days building all of X11.

Thanks!

On Fri, May 21, 2010 at 02:02:41AM -0700, <email address hidden> wrote:
> 1.The video performance slowed down and now I can't watch HD videos any more (I
> worked so hard last month to get HD 720p working on my old laptop :( )
> If I boot without KMS and the legacy driver the video is faster, but no 3D
> accel:

The mesa you have doesn't support non-kms anymore. So if you haven't
downgraded that, too, dead-slow opengl is expected ;)

> (EE) AIGLX error: i915 does not export required DRI extension
> (EE) AIGLX: reverting to software rendering
> (EE) AIGLX error: dlopen of /usr/lib/xorg/modules/dri/swrast_dri.so failed
> (/usr/lib/xorg/modules/dri/swrast_dri.so: cannot open shared object file: No
> such file or directory)
> (EE) GLX: could not load software renderer
> (II) GLX: no usable GL providers found for screen 0
>
> 2. When I tried to open a game with wine I had again crash:
>
> (WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
> (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
> (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
> (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
> (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or
> even a frozen display: Input/output error.
>
> I would like to help you if I can. I don't have knowledge of low level
> programming but at least with testing.
>
> I have attached dmesg and i915_error_state (sorry that it's in the previous
> post)

I've taken a quick look. The gpu jumped to a location where there's no
batchbuffer. Likely some memory corruption, but can't say for sure.

> And one more thing:
> I have
> kernel-2.6.34-rc6 from drm-intel-next + v9 patch
> libdrm-2.4.20
> xf86-video-intel-2.11
> mesa-7.8.1
> xorg-server-1.6.3
>
> just I would like to ask how are you upgrading xorg-server cause there are many
> packages related to it and I don't feel like spending days to build all X11.

You don't have to. But it would be great if you could upgrade to latest
libdrm and xf86-video-intel from git master, there have been quite a few
bug-fixes since the last release. Get them from
http://cgit.freedesktop.org/mesa/drm
http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Compile&install libdrm first (otherwise you get a non-working system).

Thanks for testing, Daniel

Gustavo (gstv.inc) wrote :

No news yet
on a fix for the Intel 855?
Oh my gosh...
Maybe Ubuntu could provide two versions for us until the fix is released, like a separate one for the 855...

dan rhodes (daniel-r-rhodes) wrote :

There is a version with working 855 graphics...it's called Windows XP. I have used Ubuntu for the last 5 years and have had my laptop break in some manner on nearly every upgrade (usually Broadcom networking), but this issue is the final straw. Microsoft may be the great "evil", but they would never release a product that would break so many machines and remain so silent for so long. I have subscribed to every thread, read every comment, tried all the kernel updates, driver regressions, formatted and reinstalled clean and my D400 is still nearly useless for most common tasks, requiring me to boot to XP on a daily basis for anything more complicated than gmail. I am planning my HTPC and was researching MythTV for many weeks, but this issue has convinced me that Windows Media Center is the way to go for professional, reliable operation and trustworthy updates. Linux is for uber geeks only, and they are a vanishing breed in this decaying culture.

nomnex (nomnex) wrote :

el_smurfo

You should unsubscribe from this bug and cancel your Launchpad account. If this three-click operation is beyond you, ask for (free) assistance among the vanishing breed of the uber geeks community.

Have fun with Mikeysoft. It surely matches your level of ridicule (Windows Media Player Center is the way to go. :-) that's a funny one!).

I would like to add that 855 graphics do not work properly on Microsoft operating systems. Windows XP, including Media Center Edition, is not a supported operating system (mainstream support ended 4/14/2009) and is therefore not germane to any conversation involving a currently-supported operating system. Windows Vista is still available for certain locales, and mainstream support ends 4/10/2012. Windows Vista supports most, but not all, i852/855 systems, and may be an option for some. The currently-supported and sold version of Windows is called Windows 7, and 852/855 graphics are completely unsupported by both Microsoft and Intel. Community-based workarounds are available, with varying degrees of success. Microsoft and Intel together released a product that broke every i852/855 machine still in existence.

Please be aware that on Windows XP or Windows XP MCE, you will be entirely on your own for product support because Microsoft is no longer interested in supporting you (unless you have a qualifying Microsoft Support Agreement in place). If you are interested in support on Ubuntu 10.04, your support will end in 2013 for desktop systems or 2015 for server systems.

This comment was off topic, and in reply to another off-topic comment. Please ensure that your comments directly contribute to adding further information about the i855 GPU lockup problem described in the bug report, including successes or failures caused by system updates.

For my part, I'm still experiencing the problem on mainline 2.6.34-lucid, so it isn't fixed yet.

Now back to your regularly-scheduled program...

I don't seem to have the problems as el_smurfo mentioned with my Dell D400.
I've been spoiled as this is the first real issue I've had to deal with
since I got the machine a few years back.

I applied the patch to my system and it is stable and usable for me. I
don't have desktop effects and cannot use the new Unity environment (screen
flickers when I hover over the launcher icons on the left side) but I am not
dead-in-the-water either.

I outlined what I did in the following thread, post #78
 => http://ubuntuforums.org/showthread.php?t=1472054&page=8

Frustrating? Hell Yes! End of the world? Hell No.

I've set up an Ubuntu PPA with linux 2.6.34 + drm-intel-next + fix-i855-cache-coherency-v9.patch at https://launchpad.net/~brian-rogers/+archive/graphics-fixes

I just posted this PPA to some downstream bug reports. Where should I direct the feedback? Should I tell people to report here directly, or should I try to collect and aggregate feedback and report totals here?

And what's the status of this patch? Waiting for more feedback, in need of revision, or what?

JJ (juanma-smith) wrote :

I've just had two GPU hangs today. That's very frustrating.

My lspci:
-------------------------------------------------------------------------------
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
 Subsystem: Wistron Corp. Device 205a
 Flags: bus master, fast devsel, latency 0, IRQ 11
 Memory at e8000000 (32-bit, prefetchable) [size=128M]
 Memory at e0000000 (32-bit, non-prefetchable) [size=512K]
 I/O ports at 1800 [size=8]
 Capabilities: <access denied>
 Kernel driver in use: i915
 Kernel modules: i915

00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
 Subsystem: Wistron Corp. Device 205a
 Flags: fast devsel
 Memory at f0000000 (32-bit, prefetchable) [size=128M]
 Memory at e0080000 (32-bit, non-prefetchable) [size=512K]
 Capabilities: <access denied>
-------------------------------------------------------------------------------------------------------------------------

description: updated
Brian Rogers (brian-rogers) wrote :

I just updated the bug description to point at my PPA containing a kernel with the proposed fix. I would advise people to try this kernel, because it has had plenty of positive feedback so far. Then people can report how it goes, and we can aggregate the data and report it upstream.

> --- Comment #173 from Brian Rogers <email address hidden> 2010-05-27 01:33:27 PDT ---
> I just posted this PPA to some downstream bug reports. Where should I direct
> the feedback? Should I tell people to report here directly, or should I try to
> collect and aggregate feedback and report totals here?

IMHO gathering the feedback and aggregating interesting/special stuff here
is the best option. This bug report is already rather crowded as-is.

> And what's the status of this patch? Waiting for more feedback, in need
> of revision, or what?
The preparatory stuff missed the .35 merge window, so currently nothing's
gonna happen. I intend to submit it for .36, but I'm slightly uneasy
with the fact that some systems still show reports of failed gtt flushes.

(In reply to comment #174)
> > --- Comment #173 from Brian Rogers <email address hidden> 2010-05-27 01:33:27 PDT ---

> IMHO gathering the feedback and aggregating interesting/special stuff here
> is the best option. This bug report is already rather crowded as-is.

Apologies, but I cannot see where else to post the feedback.

I downloaded and burned the Ubuntu liveCD referenced here: http://glasen-hardt.de/?p=568 which I believe has this patch. I then booted my Fujitsu-Siemens Amilo 7400M (w/1.25 GB RAM and Intel i855GM graphics) with that "unofficial" Ubuntu community liveCD that has the patch, and it booted and worked with the Intel driver (I played for 2 hours with xrandr (driving an external display), wireless, and special desktop effects). openSUSE-11.1 w/2.6.27 kernel was the last Linux that worked on this laptop. Neither openSUSE-11.3 Milestone7 nor the released Fedora-13 works with the Intel driver on this laptop. I am in favour of this patch being sent upstream.

Stenten (stenten) wrote :

I think there's something wrong with v9 of the patch. Stefan Glasenhardt and Brian Rogers's DKMS modules [1] and kernel [2] (respectively) freeze my D505 hard (can't even REISUB) on lid-close. But Daniel Baumann's kernel [3] doesn't freeze on lid-close.

The only difference I can see is that Daniel's kernel is dated 4/30 and v9 was posted 5/10. Stefan and Brian's fixes both have v9, but I have no idea what version Daniel has. Guessing v8?

Current Lucid kernel (2.6.32-22) and the 2.6.34 mainlines since rc6 all work perfectly on lid-close. It's just the first two above that cause problems.

[1]: https://launchpad.net/~glasen/+archive/855gm-fix
[2]: https://launchpad.net/~brian-rogers/+archive/graphics-fixes
[3]: https://launchpad.net/~dnjl/+archive/kernel

Michael Rickmann (mrickma) wrote :

On an FSC Amilo M7400 with Intel Corporation 82852/855GM Integrated Graphics Device (rev 02) the fix-proposed kernel runs really well, whereas the standard Lucid kernel was a disaster. Now xv-overlay works. I have to close the lid twice for the notebook to suspend; on opening it recovers gracefully. glxgears' frame rate has risen by 75% compared to Stefan Glasenhardt's fix. The only glitch I could discover on my system is that a GL screensaver would not show up and prevented Xorg from recovering, though the mouse cursor adapted perfectly to the underlying invisible screen. I guess that this bug is independent of v9.
I also think that the lid-closure problem Stenten reported in #187 is independent of v9. Stefan Glasenhardt's fix contains a patch in addition to v9 to fix the xv-overlay trouble he had (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554432 comment #15). On my hardware Daniel Baumann's kernel freezes when I watch video.
Thanks for the new kernel.

Created an attachment (id=35946)
2 flush failures with latest version of patch

These are 2 failures that happened during normal usage. I have the latest version of the patch, with the copy/paste bug fixed manually.

These flush failures happen without anything noticeable on the user side. I am also using xv_overlay_mode_fix.diff (the 1st flush failure in the attachment was without it, but that does not seem related) to limit the crashes when watching videos, and it seems to work (crashes reduced from 1/hour to 0.5/day).

Gustavo (gstv.inc) wrote :

2.6.32-19 kernel
OK, this is the kernel that works. Not very well, but I can boot and use the basics for now.
Can I just install the old 2.6.32-18 kernel to make my PC work like old times, just for now,
only while waiting for the fix?

Let me ask something...
Is this bug in the kernel
or in the driver?

How can I learn to help with the fix?
Where?
On the kernel website? http://www.kernel.org/?
Can I compile a new kernel to make my PC work?
If yes, where can I look to find out how to do this?
Someone said that the fix is a regression and they do not want a regression, but why not let just us 855 users have the regression?
I apologize for asking, but I want to help because this has been too long a wait.

This is the old one that works; is this the thing they removed? http://www.kernel.org/diff/diffview.cgi?file=/pub/linux/kernel/v2.6/patch-2.6.32.14.bz2
in
drivers/gpu/drm/i915/intel/
Can someone just put it back to make it work?

This is my lspci

00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:00.1 System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:00.3 System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 83)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 03)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 03)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:06.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03)
02:09.0 CardBus bridge: Texas Instruments PCIxx21/x515 Cardbus Controller
02:09.2 FireWire (IEEE 1394): Texas Instruments OHCI Compliant IEEE 1394 Host Controller
02:09.3 Mass storage controller: Texas Instruments PCIxx21 Integrated FlashMedia Controller
02:09.4 SD Host controller: Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Secure Digital Controller

Maybe this helps...

God help those who are losing their nights of sleep trying to solve these problems, I wish I could really help

On Sat, 2010-05-29 at 20:23 +0000, Gustavo wrote:

> how can i learn to help the fix?
> where?

This is the upstream bug report about the bug
https://bugs.freedesktop.org/show_bug.cgi?id=27187

*** Bug 24789 has been marked as a duplicate of this bug. ***

claus madsen (post-stemning) wrote :

I installed the kernel from Brian Rogers' PPA and all seems to work fine on my Acer Travelmate 4000 with 82852/855GM Integrated Graphics. Video and suspend are OK.

Nick Sharp (njsharp) wrote :

It is good to see some folk getting 10.04+855GM working using various new kernels, but I expect I am not alone in saying that I am unlikely to load 10.04 on my 855GM Toshiba Satellite A10 (still 9.04 after a poor experience with 9.10 also) until:

1 Upstream release a newer kernel that works really well with 855GM (IF it's possible) and is ...
2 ... cut into Ubuntu 10.04 as a respin, so that I can simply download the CD, install and enjoy

I might just about cope with a special install (boot time options re vesa, 915 nosplash etc), provided that an immediately following update pulls the new kernel and leaves the machine perfect from then on!

I expect there are many who want to convert from the dark side who would not be prepared to do even that.

In the meantime I am actually considering refreshing the Toshiba with a repeat install of 9.04 (I like to do a clean reload every year or so, even with Ubuntu but especially of W... you know what).

If 1 and possibly 2 does not happen, I am likely to move the ~8 year old Toshiba back to its original WXP and find someone else who wants it. I never had any screen trouble with it under WXP.

I believe 855GM has some fundamental flaws, and it certainly has some limitations that have begun to annoy me as high resolution second screens become available, namely its max 2048x2048 virtual display space. For nice intuitive use, you REALLY do want a second screen that is on the right hand side of the desk also to be VIRTUALLY on the right hand side of the laptop screen, and that cannot happen with 855GM if the second screen is more than 1024 wide (hard to get one as small as that now!!).
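
The arithmetic behind that limit can be sketched in Python. This assumes, as the comment implies, a 1024-pixel-wide laptop panel, and that screens placed side by side need a virtual desktop as wide as the sum of their widths; the function name and the example monitor sizes are illustrative.

```python
# 855GM virtual display limit in each dimension, as stated in the comment above.
MAX_VIRTUAL = 2048

def fits_side_by_side(screens, limit=MAX_VIRTUAL):
    """screens: list of (width, height) tuples placed left to right."""
    total_width = sum(width for width, _ in screens)
    max_height = max(height for _, height in screens)
    return total_width <= limit and max_height <= limit

laptop = (1024, 768)  # assumed laptop panel size

# 1024 + 1024 = 2048, exactly at the limit
print(fits_side_by_side([laptop, (1024, 768)]))
# 1024 + 1280 = 2304, over the limit
print(fits_side_by_side([laptop, (1280, 1024)]))
```

Stacking the second screen above or below the laptop panel trades the width constraint for a height constraint, which is why the "on the right" layout is the one that breaks first with wide monitors.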

So for me, perhaps it is time to retire the Toshiba to light duties elsewhere, and move on. Sad for one like me who likes to squeeze the last reasonable drop of use out of everything!! And for me it has been a very good machine.

All the best to the gurus working on this!

Hey hey, I've been trying the current glasen Kubuntu CD and thought it was working fine, until I tried to surf to:
http://www.ardmediathek.de/
The massive use of Flash etc. seems to be too much for the patch: first strange colour stripes on screen and then a crash of the X server. Sorry, I don't know which log file to post! D.

Kubuntu 10.04
xserver-xorg-video-intel: 2:2.11.0+git20100531~glasen~ppa1
855gm-fix-dkms: 0.6.2~glasen~ppa1
Kernel 2.6.32-22.33

Michael Rickmann (mrickma) wrote :

A follow-up to what I posted about the fix-proposed 2.6.34-52.1 kernel in comment #188. I have had a single crash since then, while using Firefox on a page ( www.bahn.de , German railways) which seems to fragment bitmaps not only in the Firefox window but eventually also in the panels. The fragmentation is fairly reproducible; the crash is not.

claus madsen (post-stemning) wrote :

An addition to #191: The Travelmate 4000 won't power off. The Ubuntu logo and the red dots remain forever, animated.

(In reply to comment #179)
> Hey hey, i've been trying the present glasen-Kubuntu-CD and thought it's
> working fine, until i tried to surf to:
> http://www.ardmediathek.de/
> The massive use of flash etc seems to be too much for the patch: First strange
> colour stripes on screen and then crash of Xserver. Sorry - I don't know to
> post which log file! D.
>
> Kubuntu 10.04
> xserver-xorg-video-intel: 2:2.11.0+git20100531~glasen~ppa1
> 855gm-fix-dkms: 0.6.2~glasen~ppa1
> Kernel 2.6.32-22.33

This website also freezes my Xserver. Cursor is still moving, but that's about it. Xorg.log states:

[ 10152.357] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 10152.357]
Backtrace:
[ 10152.357] 0: /usr/bin/X (xorg_backtrace+0x3c) [0x80ebabc]
[ 10152.357] 1: /usr/bin/X (mieqEnqueue+0x1f5) [0x80eb3d5]
[ 10152.357] 2: /usr/bin/X (xf86PostMotionEventP+0xc8) [0x80c5f68]
[ 10152.357] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0xb73de000+0x34ff) [0xb73e14ff]
[ 10152.357] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0xb73de000+0x37e6) [0xb73e17e6]
[ 10152.357] 5: /usr/bin/X (0x8048000+0x6c8cf) [0x80b48cf]
[ 10152.357] 6: /usr/bin/X (0x8048000+0x127b2a) [0x816fb2a]
[ 10152.357] 7: (vdso) (__kernel_sigreturn+0x0) [0xb7849400]
[ 10152.357] 8: /usr/lib/libpixman-1.so.0 (0xb760b000+0x5de3a) [0xb7668e3a]
[ 10152.357] 9: /usr/lib/libpixman-1.so.0 (0xb760b000+0x17193) [0xb7622193]
[ 10152.357] 10: /usr/lib/libpixman-1.so.0 (pixman_blt+0x78) [0xb7648108]
[ 10152.357] 11: /usr/lib/xorg/modules/libfb.so (fbCopyNtoN+0x1ad) [0xb737777d]
[ 10152.357] 12: /usr/lib/xorg/modules/drivers/intel_drv.so (0xb7384000+0x32691) [0xb73b6691]
[ 10152.357] 13: /usr/bin/X (miCopyRegion+0x1ba) [0x81a140a]
[ 10152.357] 14: /usr/bin/X (miDoCopy+0x475) [0x81a19b5]
[ 10152.357] 15: /usr/lib/xorg/modules/drivers/intel_drv.so (0xb7384000+0x31ebf) [0xb73b5ebf]
[ 10152.357] 16: /usr/bin/X (0x8048000+0xdf21f) [0x812721f]
[ 10152.358] 17: /usr/bin/X (0x8048000+0x27dbc) [0x806fdbc]
[ 10152.358] 18: /usr/bin/X (0x8048000+0x294c7) [0x80714c7]
[ 10152.358] 19: /usr/bin/X (0x8048000+0x1da8b) [0x8065a8b]
[ 10152.358] 20: /lib/libc.so.6 (__libc_start_main+0xe2) [0xb7495bb2]
[ 10152.358] 21: /usr/bin/X (0x8048000+0x1d641) [0x8065641]

The problem with all these kinds of bugs is that as long as Daniel's patches aren't upstream, it's hard to work out where the problem is. My guess is that this particular bug has nothing to do with Daniel's patches, but is a bug in xf86-video-intel. I can try to test without Daniel's patches, but chances are I won't even be able to start Firefox before X.org crashes.

(In reply to comment #180)
> (In reply to comment #179)
> > Hey hey, i've been trying the present glasen-Kubuntu-CD and thought it's
> > working fine, until i tried to surf to:
> > http://www.ardmediathek.de/
> > The massive use of flash etc seems to be too much for the patch: First strange
> > colour stripes on screen and then crash of Xserver. Sorry - I don't know to
> > post which log file! D.
[SNIP]
>
> The problem with al these kind of bugs is that as long as Daniel's patches
> aren't upstream, it's hard to work out where the problem is. My guess is that
> this particular bug has nothing to do with Daniel's patches, but is a bug in
> xf86-video-intel. I can try to test without Daniel's patches, but chances are
> I won't even be able to start Firefox before X.org crashes.

No crash for my 855GM rev02; I am using mainstream git linux with v9 patch.

> --- Comment #180 from René Gabriëls <email address hidden> 2010-06-03 15:50:05 PDT ---
> The problem with al these kind of bugs is that as long as Daniel's patches
> aren't upstream, it's hard to work out where the problem is. My guess is that
> this particular bug has nothing to do with Daniel's patches, but is a bug in
> xf86-video-intel. I can try to test without Daniel's patches, but chances are
> I won't even be able to start Firefox before X.org crashes.

As long as there's nothing in dmesg about the gpu hanging it's rather
likely that this is a different bug. Not really surprising given that this
cache coherency problem seems to prevent tons of users from testing the
latest & greatest.

(In reply to comment #182)
> As long as there's nothing in dmesg about the gpu hanging it's rather
> likely that this is a different bug. Not really suprising given that this
> cache coherency problem seems to prevent tons of users from testing the
> latest & greatest.

I have tested this site with a variety of kernels:

2.6.33 + Firefox: works
2.6.33 + Opera: works
2.6.34 + Firefox: hang
2.6.34 + Opera: works
2.6.34-rc6 + v9 + Firefox: hang (sometimes works)
2.6.34-rc6 + v9 + Opera: works

In other words, there seems to be no correlation between this hang and this bug report or Daniel's patch for it. It is probably a bug introduced between kernel 2.6.33 and 2.6.34-rc6.

ssuuddoo (ssuuddoo) wrote :

maybe U all know it guys, but today I discovered that when using the "acpi=off" option, the system boots into the graphical terminal.
maybe it helps somehow.

(the added option)
GRUB_CMDLINE_LINUX_DEFAULT="splash acpi=off"

I know, the screen is somehow not what it should be, but at least it boots correctly.
ssuuddoo :D
thumbs up!

ssuuddoo (ssuuddoo) wrote :

maybe SOLUTION!

now I can normally log-in and work as previous. :D

Gustavo (gstv.inc) wrote :

Where is the file to edit?

(the added option)
GRUB_CMDLINE_LINUX_DEFAULT="splash acpi=off"

Is this in GRUB?

It is very quiet here. Maybe the developers gave up on solving this bug...
We have not seen them talk about new attempts for a few days now. Or have they realized that the solution involves something that cannot be fought, such as the regression they mentioned?
Again I ask: why not make a clone of the kernel without that regression, ONLY for those of us who use the 855?
Or is this simple only for those who know how to do it?

I know this is not the real solution, but it would let us get right back to work until this bug can be solved in the future!
And if possible, just put it in the repository so that anyone could install it and everything would be solved (for now). That would buy time to think of a better solution. I ask this because I really NEED UBUNTU to work...

ssuuddoo (ssuuddoo) wrote :

the file is the grub configuration file:

/etc/default/grub

edit it with, for example: "sudo gedit /etc/default/grub"
and add the option "acpi=off" to the already existing GRUB_CMDLINE_LINUX_DEFAULT=""
save the file!!!

after that, update grub for the changes to take effect
run: "sudo update-grub"

and see 4 URself

:D
thumbs up!
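The manual edit described above could also be scripted. What follows is only a rough sketch, not an official tool: the function name `add_kernel_param` is made up for illustration, and it assumes the file contains a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and that the parameter contains no "/" or "&" characters. It keeps a ".bak" copy and does nothing if the parameter is already present.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM  (hypothetical helper, not part of GRUB)
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT line in FILE.
add_kernel_param() {
    file=$1
    param=$2

    # Nothing to do if the parameter is already on the line.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi

    # Keep a backup so the change is easy to revert.
    cp "$file" "$file.bak" || return 1

    # First expression appends the parameter inside the quotes; the second
    # removes the leading space left behind when the value was empty.
    sed -e "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" \
        -e "s/^GRUB_CMDLINE_LINUX_DEFAULT=\" /GRUB_CMDLINE_LINUX_DEFAULT=\"/" \
        "$file.bak" > "$file"
}

# Possible usage, as root:
#   add_kernel_param /etc/default/grub acpi=off && update-grub
```

To undo the change, copy the ".bak" file back and run update-grub again. Note that acpi=off switches off ACPI as a whole, so it is a workaround to test with rather than a fix.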

ssuuddoo (ssuuddoo) wrote :

for the first reboot, the graphics was ugly (resolution, icons, wallpaper, gtk), but after I changed it,
it remembered everything and the system works well. (hopefully I didn't disable such an important thing).

:D
greetings from Slovakia

Does this remain after a new kernel is installed? Or does this have to occur
after each upgrade?

On Wed, Jun 9, 2010 at 4:31 PM, ssuuddoo <email address hidden> wrote:

> for the first reboot, the graphics was ugly (resolution, icons, wallpaper,
> gtk), but after I changed it,
> it remembered everything and the system works well. (hopefully I didnt
> disable such an important thing).
>
> :D
> greetings from Slovakia

Nick_Hill (nick-nickhill) wrote :

I updated my Toshiba M100 to Lucid 10.04 today. Crashes catatonic.
Tried acpi=off and tried making an xorg file with vesa set as the driver. No success.

Followed acpi=off instructions as mentioned earlier. No success.

 Then followed
sudo apt-add-repository ppa:brian-rogers/graphics-fixes
sudo apt-get update
sudo apt-get dist-upgrade

But linux-image-2.6.34-52-generic not available

So tried booting. No success.

Then discovered and installed
linux-image-2.6.34-v9patch-generic

Again no success.

Reversed the acpi=off i had set previously, and removed the xorg.conf file.

 I can now boot to a desktop.

Then tried rebooting with the vanilla kernel. Driver halts. Went back to the V9 kernel. Boots to desktop.

Seems to make no difference whether the driver in xorg.conf is set to vesa or intel. xorg detects my machine as double-headed. Presumably one is the external monitor socket.

With the v9 patch kernel, acpi=off will apparently break the driver.
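For anyone who wants to repeat the vesa/intel driver comparison mentioned above, the xorg.conf in question only needs a Device section. The following is a minimal sketch; the function name `write_xorg_device` and the identifier "Card0" are placeholders chosen for illustration, and the output path is whatever you pass in.

```shell
#!/bin/sh
# write_xorg_device DRIVER OUTFILE  (hypothetical helper)
# Writes a minimal xorg.conf that only selects the video driver.
write_xorg_device() {
    driver=$1
    out=$2
    cat > "$out" <<EOF
Section "Device"
    Identifier "Card0"
    Driver     "$driver"
EndSection
EOF
}

# Possible usage, as root (move an existing file out of the way first):
#   write_xorg_device vesa /etc/X11/xorg.conf
#   write_xorg_device intel /etc/X11/xorg.conf
```

Deleting the generated file returns X to its automatic driver detection, which is what ended up working in the report above.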

description: updated
Julien Olivier (julo) wrote :

With Brian Rogers's PPA (2.6.34rc7-51-generic), Xorg works perfectly and totally stopped crashing on "Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)". However, I have frequent freezes when trying to suspend to RAM. Any idea what could be wrong?

Note that with the default Lucid kernel, I have frequent xorg crashes, especially when watching videos, but I didn't seem to have problems when suspending to RAM...

Brian Rogers (brian-rogers) wrote :

Julien, did you try linux-image-2.6.34-v9patch-generic from ppa:brian-rogers/graphics-fixes ? It's based on the final 2.6.34, so should have some bug fixes. But if it doesn't work as well as 2.6.34rc7-51-generic, then there's some sort of regression and we'll try to hunt it down.

Julien Olivier (julo) wrote :

Brian, I'm trying it right now and so far so good. If I manage to reproduce a crash or a freeze, I'll come back there and report it.

Julien Olivier (julo) wrote :

Brian: I spoke too soon. I found a reliable way to reproduce the problem with 2.6.34-v9patch-generic:

 - if I start my laptop on AC or battery, then suspend it, it works.
 - if I plug my laptop on AC, then suspend it, it works.
 - if I put it back to battery, then suspend it, two things can happen:
     1) it tries to suspend, but fails, but I can go on working.
     2) it tries to suspend, but fails, and everything is frozen...
 - if I put my laptop on battery, then plug it back to AC, and then try to suspend it, it will fail as if it were on battery.

In conclusion, it seems that whenever the laptop gets unplugged from AC, something happens that makes it impossible to suspend, even if I re-plug it, and most of the time this even results in a freeze.
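To make a report like the one above easier to correlate with logs, the AC state could be recorded before every suspend attempt. This is a sketch only: the names `ac_state` and `log_ac_state` are invented here, and it assumes the kernel exposes the adapter under /sys/class/power_supply with a `type` file reading "Mains" and an `online` file, which may not hold on every machine or kernel.

```shell
#!/bin/sh
# ac_state [BASE]  (hypothetical helper)
# Prints "ac" if any mains supply is online, "battery" if mains supplies
# exist but none is online, and "unknown" if no mains supply is found.
ac_state() {
    base=${1:-/sys/class/power_supply}
    state=unknown
    for dir in "$base"/*; do
        [ -r "$dir/type" ] || continue
        [ "$(cat "$dir/type")" = "Mains" ] || continue
        if [ "$(cat "$dir/online" 2>/dev/null)" = "1" ]; then
            echo ac
            return 0
        fi
        state=battery
    done
    echo "$state"
}

# log_ac_state LOGFILE [BASE]  (hypothetical helper)
# Appends a timestamped line with the current AC state to LOGFILE.
log_ac_state() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(ac_state "$2")" >> "$1"
}

# Possible usage, run just before each suspend attempt:
#   log_ac_state "$HOME/suspend-ac.log"
```

Comparing that log with the times of failed suspends would show whether the failures really follow an AC-to-battery transition.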

Jim Brumbaugh (bleumyst) wrote :

Thanks Brian:

I'm one of those with an 855 Intel chipset.

I tried out your system patch on a completely clean 10.04 install.

Though the dots animation screen prior to the login screen is not
visible, log in proceeds perfectly.

Only problem is my Compiz graphics have been reduced to the Ubuntu
8.10 to 9.04 range where video playing in media player or VLC does not
track window dragging until the window is released. Additionally,
Windows-E does not resize the video playing in the window. This means
I'm back to playing one video at a time as the video playback is
locked to a hard screen region, not to a logical screen region.

Additionally the sleep problem fixed with 10.04 has reverted back to
9.04. If I close and reopen the lid while the system is running, the
system locks.

Regressing my system by 6 to 12 months is not the way to go. Sorry.

I should mention that aside from the sleep problem, I have full 3D
compiz graphics with no glitches in the default install of 9.10 I'm
still running as my main system on the same Dell x300 laptop.

Thanks for the effort. Hope this helps.

Now about the rumors.. is this maybe a problem in Intel's driver
supporting my older hardware, and are they doing any work? I think
I've noticed at least one Intel driver come down the pike so far..

Ahimsa

"As long as there are slaughterhouses, there will be battlefields." -Leo Tolstoy

-Jess E.

"I want a processor so powerful I can read the
manual by the light of the heat sink."- R.I.P. MRX

On Wed, Jun 16, 2010 at 5:35 AM, Brian Rogers <email address hidden> wrote:
> ** Description changed:
>
>  Binary package hint: xserver-xorg-video-intel
>
>  This is a MASTER bug report, i.e. not a real bug report, but a tool to
>  help manage other bug reports.
>
>  Most bug reports on i855 are probably due to the CPU/GPU incoherency
>  problem that is now consolidated upstream at
>  http://bugs.freedesktop.org/show_bug.cgi?id=27187 (which was split off
>  from a bug report for i845). For now, we mark all automatically reported
>  GPU lockups on i855 as duplicates of this unless there is a reason not
>  to.
>
>  A kernel with the proposed fix is available at https://launchpad.net
>  /~brian-rogers/+archive/graphics-fixes
>
>  To use this fixed kernel, run the following commands:
>
>  sudo apt-add-repository ppa:brian-rogers/graphics-fixes
>  sudo apt-get update
>  sudo apt-get dist-upgrade
> - sudo apt-get install linux-image-2.6.34-52-generic
> + sudo apt-get install linux-image-2.6.34-v9patch-generic
>
>  There is a similar master bug report for i845 at bug 541492.

David Oser (mirmos192) wrote :

@Brian Rogers
Many thanks - I'd just like to report that your patch (linux-image-2.6.34-v9patch-generic) appears to be working perfectly for me, with my i855 HP Compaq Nx5000. I'm not trying anything fancy regarding video - but can report that the system now starts normally, no freeze during boot, and - so far - no odd messages. Further, using Cheese with my (equally) ancient webcam does not crash the system either, which happened with previous attempts at fixes. It is also remaining stable with advanced visual effects selected via desktop background. Only thing I haven't tried is suspend - but that is because I need to press down on the casing of my old machine near the power input to get it to power up at all with any OS - and I always had problems with suspend, anyway (go figure...). So I have removed the 10.04 rc7 Mainline kernel at last!! I'll report back if I get any problems.
Best
David

David Oser (mirmos192) wrote :

@Brian Rogers
Edit to above - no Ubuntu splash on subsequent restarts (any visual effects setting), though there was on the first - but I wasn't getting the splash with the Mainline kernel, either, so I don't miss it anyway.
Best
David

*** Bug 28796 has been marked as a duplicate of this bug. ***

(In reply to comment #184)
> *** Bug 28796 has been marked as a duplicate of this bug. ***

Chris Wilson directed me here while diagnosing cache coherency problems with my laptop setup. I have looked at the V9 patch and I am not convinced that it is appropriate for use with my stock 2.6.34 kernel. I am looking for advice as to which kernel (and/or other components) I should use that will allow me to get fully up to speed with this issue. My objective is to contribute to a solution if possible, or at least provide a test resource for others.

thanks

My System Spec:
IBM ThinkPad R51, model 2889SG1
CPU: Intel(R) Pentium(R) M processor 1.70GHz stepping 06
agpgart-intel 0000:00:00.0: Intel 855GM Chipset
agpgart-intel 0000:00:00.0: detected 8060K stolen memory
agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0xe0000000
Kernel 2.6.34
Running recent Development LFS system with the following X11 components:
XServer 1.8.1
Mesa 7.8.2
libdrm-2.4.21
xf86-video-intel-2.12

The v9 patch will not apply to plain 2.6.34. It was based on the drm-intel-next branch which has since been merged. So the easiest thing to do is apply the patch to 2.6.35-rc3.

Hi,

I've backported the v9-patch to several kernel versions. You can download them from my homepage :

http://glasen-hardt.de/?page_id=707

(In reply to comment #187)
> Hi,
>
> I've backported the v9-patch to several kernel versions. You can download them
> from my homepage :
>
> http://glasen-hardt.de/?page_id=707

Hi,

Thanks but I have already set my system up with 2.6.35-rc3 + V9.
I have also updated my X server to 1.8.2.

My test scenario is to run two 3D apps (glxgears & atlantis from xscreensaver) and an XV mplayer AVI loop with fvwm2 as the window manager. After about four hours my kernel reported the flush failure that has been seen by others earlier. CPU usage was low (< 10%) at all times (CPU throttled back to 600 MHz by acpi_cpufreq), but the displays were not smooth and stalled on occasion.
Is my system showing the expected behaviour for the current state of development?

thanks.

WARNING: at drivers/char/agp/intel-gtt.c:1007 intel_i830_chipset_flush+0x2f5/0x400()
Hardware name: 2889SG1
i8xx chipset flush failed, expected: 113920, cpu_read: 113408
Modules linked in: nfs nfsd lockd sunrpc exportfs microcode usbhid uhci_hcd pcmcia thinkpad_acpi ehci_hcd hwmon rfkill rtc_cmos led_class snd_intel8x0 8250_pnp yenta_socket floppy usbcore 8250_pci rtc_core nvram battery ac ide_cd_mod rtc_lib 8250 pcmcia_rsrc e100 snd_ac97_codec psmouse nls_base pcmcia_core cdrom ac97_bus serial_core i2c_i801 thermal rng_core snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore acpi_cpufreq processor mperf unix
Pid: 2552, comm: X Not tainted 2.6.35-rc3 #1
Call Trace:
 [<c1027748>] ? warn_slowpath_common+0x78/0xb0
 [<c119a2b5>] ? intel_i830_chipset_flush+0x2f5/0x400
 [<c119a2b5>] ? intel_i830_chipset_flush+0x2f5/0x400
 [<c1027813>] ? warn_slowpath_fmt+0x33/0x40
 [<c119a2b5>] ? intel_i830_chipset_flush+0x2f5/0x400
 [<c119502c>] ? agp_flush_chipset+0xc/0x10
 [<c11bdaac>] ? i915_gem_object_flush_cpu_write_domain+0x2c/0x40
 [<c11bfbba>] ? i915_gem_object_set_to_gtt_domain+0x3a/0x80
 [<c11d8d1f>] ? intel_overlay_do_put_image+0x7f/0x7c0
 [<c11d9c4c>] ? intel_overlay_put_image+0x58c/0x770
 [<c11cacb6>] ? intel_mark_busy+0x1d6/0x1e0
 [<c11a2c67>] ? drm_ioctl+0x157/0x330
 [<c11d96c0>] ? intel_overlay_put_image+0x0/0x770
 [<c1073288>] ? handle_mm_fault+0x208/0x7a0
 [<c109486f>] ? do_vfs_ioctl+0x8f/0x610
 [<c101deb7>] ? do_page_fault+0x197/0x3b0
 [<c11a2b10>] ? drm_ioctl+0x0/0x330
 [<c10424ca>] ? ktime_get_ts+0x10a/0x140
 [<c112b763>] ? copy_to_user+0x33/0x70
 [<c1094e2d>] ? sys_ioctl+0x3d/0x70
 [<c1002b10>] ? sysenter_do_call+0x12/0x26
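When comparing several reports like the trace above, it can help to pull just the two numbers out of the "chipset flush failed" line. The filter below is a sketch under the assumption that the message keeps exactly the format shown in this report; the name `flush_failures` is made up, and what the two counters actually mean is defined by the patch, not by this script.

```shell
#!/bin/sh
# flush_failures  (hypothetical helper)
# Reads kernel log text on stdin and prints one summary line per
# "chipset flush failed" message, including the numeric difference.
flush_failures() {
    sed -n 's/.*chipset flush failed, expected: \([0-9][0-9]*\), cpu_read: \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r expected cpu_read; do
        echo "expected=$expected cpu_read=$cpu_read delta=$((expected - cpu_read))"
    done
}

# Possible usage:
#   dmesg | flush_failures
# For the line quoted in this report it prints:
#   expected=113920 cpu_read=113408 delta=512
```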

Hi everyone!
My Thinkpad X40 (i915) was fixed by the driver in glasen's ppa (previously it locked up on boot with 2.6.32-22, although it worked fine with 2.6.31-20).

It all seems good except for playing a movie in totem, where I get some sort of driver crash. The system is still responsive to eg. ACPI events but the screen is totally blank. Attached is the backtrace from syslog

Thanks to everyone looking into this issue, nothing worse than unsupportive vendors :(

I've been blindly writing 'i915' as the graphics chip on the x40 when in fact it's an 855GM, my bad

juliobahar (yahalla-julio) wrote :

I've just installed lucid kernel 2.6.34-020634-generic, as a desperate solution for the intel 855GM freezes.

Videos are working now fine, but my graphics seem to run very slow, especially when compiz is set as the main windows decorator. Changing it to Metacity, makes the computer more responsive, yet I'm losing the eye-candy effects of compiz.

Videos aren't running well on totem with this new kernel.

I'm running Lucid i386, on an old Toshiba Satellite-A55, with an Intel Corporation 82852/855GM Integrated Graphics Device.

Created an attachment (id=36797)
dmesg, debugfs and X server output

Updated the kernel to 2.6.35-rc4 + V9. Running a small benchmark program that I have, which exercises X, the file system and the CPU, results in this GPU hang every time (on the first run, so far).

Is this the same issue or have I drifted off the thread ?

thanks

Julien Olivier (julo) wrote :

I have now installed linux-image-2.6.34-drm-generic from ppa:brian-rogers/graphics-fixes and everything is working perfectly: no crash when viewing videos, no crash when suspending. This is with the standard i810 xorg driver that comes with lucid.

Julien Olivier (julo) wrote :

Sorry, I'm going to make a fool of myself, but the suspend crash bit me again... So, both linux-image-2.6.34-drm-generic and linux-image-2.6.34-v9patch-generic crash when going from AC power to battery, and then trying to suspend.

Is there a way to debug the crash? How can I help?

I appreciate the effort of all the people trying to fix this problem, but please let me ask a question: Would it be possible to make the "legacy" driver work with current X.org again? Wouldn't that be easier than trying to fix the bug? It is hard to believe the bug will ever get fixed. It only affects old hardware, so I don't think Intel (or any other employer) is willing to spend money on this...

I remember there was a "legacy" driver in early 2009 that *just* *worked*. (ArchLinux had a package called xf86-video-intel-legacy.) It was meant to be a temporary workaround before this bug is *fixed*. Unfortunately and despite the fact that the bug has not been fixed, all the support for the legacy driver has been dropped months ago.

There has been (de facto) no support for old Intel graphic chipsets on Linux since April 2009. :-( Using 1-year-old packages is not an option for most people. If the legacy driver could work with the latest X.org again, it would be just great...

(In reply to comment #190)
> please let me ask a question: Would it be possible to make the "legacy" driver
> work with current X.org again? Wouldn't that be easier than trying to fix the

I totally agree with this!
My Intel video driver _did_ work with my hardware /00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)/ with XAA. I could use compiz, xvideo, etc., anything I wanted. Now I get a crappy but "graphical" boot screen /with KMS, but I could do the same thing with intelfb many years ago/, and if I want to use X, I must use the fbdev driver in xorg.conf /with the newest kernel (2.6.35-rcX-gitY) I can use the intel driver, but no luck with compiz, for example/.

So a legacy driver would be nice :)

@Andrej Podzimek & Szabo, Akos :
Did you actually try the patch or are you both just being objective?

(In reply to comment #192)
> @Andrej Podzimek & Szabo, Akos :
> Did you actually try the patch or are you both just being objective?

In my case, the v9 patch improved the situation. Instead of freezing the whole kernel (beyond magic SysRq) right after X.org startup (which is what the mainline kernel + current X.org does), I could "only" see the well-known X-server freeze immediately after login. (I saw it the second my mouse cursor touched an icon on the desktop. Highlighting the icon obviously triggered the freeze.) But there was nothing that could be called a usable desktop.

Furthermore, testing the patches is *extremely* difficult for me, since all the Intel laptops I care about use the Reiser4 file system. Reiser4 patches are available for mainline kernels only. So in fact I had to give up testing most versions of the patch. (The mainline kernel was so different that patching it manually was simply impossible for someone unfamiliar with the code.) That said, it is well possible that I applied the patch incorrectly and caused some other problems due to a typo...

To sum up, one of the following would help:
1) A patch against the mainline kernel (such as 2.6.34.1) to which Reiser4 could be applied as well.
2) A live distro that would use the patch, just to test it. (Does it exist?)

(In reply to comment #192)
> @Andrej Podzimek & Szabo, Akos :
> Did you actually try the patch or are you both just being objective?

Yep. As I wrote: now I can use X with the latest patches; I think the newer Fedora kernels contain them. /I use the rawhide kernel now./
But I can't use X the way I could with Fedora 10: no compiz, no (even minimal) 3D, and any complex video output makes the system unusable. 1.5 years ago I could play Fallout 2 with Wine; now it just freezes the display, sometimes with a nice blue screen, sometimes not. And every time X freezes, I can still log in through ssh.

v9 patch works perfectly with latest kernel. However I have not tested compiz and any other complex 3D, so it is possible that it crashes in such cases

(In reply to comment #195)
> v9 patch works perfectly with latest kernel.

Well, it might work fine on *your* hardware, but that does not imply it "works perfectly" in general. Believe it or not, there may still be (and in fact really *are*) many people observing hangs and crashes.

BTW, what do you mean by "latest kernel"? I can only patch against the mainline, due to Reiser4. If there was a patch against the mainline 2.6.34.x kernel, I could easily test every single patch version with every single kernel release.

> However I have not tested compiz and any other complex 3D...

Neither have I, but it crashes anyway. Furthermore, compositing has become a standard feature. Obviously, switching all acceleration off is *not* a usable workaround. The legacy driver was *perfectly* stable and weeks of uptime with (accelerated) KDE 4, DVB-T and simple 3D games were not a problem.

(In reply to comment #196)
> (In reply to comment #195)
> > v9 patch works perfectly with latest kernel.
>
> Well, it might work fine on *your* hardware, but that does not imply it "works
> perfectly" in general. Believe it or not, the may still be (and in fact really
> *are*) many people observing hangs and crashes.
>
I thought it was implied that I had not changed my hardware since last test, it's still i855GM rev02.

I know that there are hangs and crashes, it's just that since the v9 patch was created I am no longer experiencing them; overlays are also working fine.

> BTW, what do you mean by "latest kernel"? I can only patch against the
> mainline, due to Reiser4. If there was a patch agains the mainline 2.6.34.x
> kernel, I could easily test every single patch version with every single kernel
> release.
>

2.6.35-rc4

There are ports of the patch to other versions as well, you'd better check them from comment 187

> > However I have not tested compiz and any other complex 3D...
>
> Neither have I, but it crashes anyway. Furthermore, compositting has become a
> standard feature. Obviously, switching all acceleration off is *not* a usable
> workaround. The legacy driver was *perfectly* stable and weeks of uptime with
> (accelerated) KDE 4, DVB-T and simple 3D games were not a problem.

The legacy driver is no longer an option; this was explained in this bug and in bug 26345.

I am sure you can track down the cause of the crash and the relative bug; it could be some part of Xorg stack or also a new bug.

Brian Rogers (brian-rogers) wrote :

I just uploaded some new kernels:
linux-image-2.6.35rc4-131+ge467e10+nopatch-generic
linux-image-2.6.35rc4-131+ge467e10+v9patch-generic

Julien, do these kernels help the suspend issue?

Julien Olivier (julo) wrote :

Brian, I have tested both kernels, and I get exactly the same behaviour as in comment #205: it fails to suspend if the laptop went from AC to battery or vice versa.

(In reply to comment #189)
> Created an attachment (id=36797)
> dmesg debufs and X server output
>
> Updated kernel to 2.6.35.rc4 + V9 and running a small benchmark program that I
> have that exercises X, the file system and the CPU results in this GPU hang
> every time (on first run - sofar).
>
> Is this the same issue or have I drifted off the thread ?
>
> thanks

can you provide sources of such program? If I can verify the same crash with my i855GM rev02 then it would be a testcase program

(In reply to comment #198)
> (In reply to comment #189)
> > Created an attachment (id=36797)
> > dmesg debufs and X server output
> >
> > Updated kernel to 2.6.35.rc4 + V9 and running a small benchmark program that I
> > have that exercises X, the file system and the CPU results in this GPU hang
> > every time (on first run - sofar).
> >
> > Is this the same issue or have I drifted off the thread ?
> >
> > thanks
>
> can you provide sources of such program? If I can verify the same crash with my
> i855GM rev02 then it would be a testcase program

I have discovered that if I just run the X server (no window manager etc.) and my benchmark program, the GPU does not hang. So I think I will explore why the window manager (fvwm2) and the other desktop apps that I have contribute to the regular GPU hang I originally reported. If I can discover a solid repeatable scenario I will be happy to release my test program.

thanks

*** Bug 25086 has been marked as a duplicate of this bug. ***

Created an attachment (id=37157)
Test Program

(From update of attachment 37157)
See README for context and usage.

Michael Rickmann (mrickma) wrote :

Hi Julien,
could your suspend issues be caused by https://bugs.launchpad.net/ubuntu/+source/upower/+bug/531190 . There is a work around which you could try.
Regards
Michael

(In reply to comment #202)
> (From update of attachment 37157)
> See README for context and usage.

Updated kernel to 2.6.35-rc6 + V9: GPU hang reported, as before, with the test program.

Julien Olivier (julo) wrote :

Hi mrickma,

I think this bug doesn't affect me. And the work-around didn't change anything.

Also note that I only have suspend problems with Brian Rogers kernels. I am now using the solution from bug #541492, and everything is working fine (suspend included).

1. Sometimes the header bar of windows completely disappears, and when the windows are closed, they still remain on the task bar. It is impossible to switch to another desktop in such a case.

2. Other times the system hangs. Nothing but hard reset.

Seems like we are back to square 1 with most recent 2.6.35 git update.

I don't know what has been touched, but the v9 patch is no longer effective.

I can only get a black screen (display is ON but no output, only a plain black surface at a possibly low resolution).

(In reply to comment #205)
> Seems like we are back to square 1 with most recent 2.6.35 git update.

I'm running 2.6.35-rc6 (July 22) + v9 patch, which works for me. There have been a number of Intel DRM updates since though. I'll see if I encounter the same problem with latest git and then bisect.

PS: do any of you have problems with full screen video or OpenGL apps? My system crashes immediately after starting such apps (flash, doom, warzone, etc.).

(In reply to comment #206)
> (In reply to comment #205)
> > Seems like we are back to square 1 with most recent 2.6.35 git update.
>
> I'm running 2.6.35-rc6 (juli 22) + v9 patch, which works for me. There have
> been a number of Intel DRM updates since though. I'll see if I encounter the
> same problem with latest git and then bisect.
>
> PS: do any of you have problems with full screen video or OpenGL apps? My
> system crashes immediately after starten such apps (flash, doom, warzone,
> etc.).

I've been running the v9 patch on the same system since 11-5-2010 (2.6.34-rc6, with userspace updated recently), and it has been very stable. No glitches, no crashes, no errors in dmesg. So it seems that something like the v9 patch should be pushed upstream, because it seems to work. It doesn't fix all bugs, but it makes things a lot more stable.

I can run video fine in full screen in VLC. My laptop isn't fast enough to play flash full screen in a smooth way, nor to do real 3D stuff.

<off_topic>OpenSolaris is affected by this problem as well. Seems like last hope is gone. :-D</off_topic>

(In reply to comment #206)
> I'm running 2.6.35-rc6 (juli 22) + v9 patch, which works for me. There have
> been a number of Intel DRM updates since though. I'll see if I encounter the
> same problem with latest git and then bisect.

Latest git + v9 patch works for me. However, another bug (invisible cursor) was introduced, that I needed to track down and fix.

Legolas: you can do a git-bisect of the kernel source to find out which patch introduced the bug that affects your machine.

Indan: for me the v9 patch primarily gets rid of font rendering errors. Without it, my system is quite stable now modulo opengl/flash. I don't care too much for them either, but it also means compositing (and thus compiz/GNOME 3) is out of the question.

Let's hope Intel will not decide to simply stop supporting the 855 chipset, which Phoronix is hinting at.

(In reply to comment #206)
> (In reply to comment #205)
> > Seems like we are back to square 1 with most recent 2.6.35 git update.
>
> I'm running 2.6.35-rc6 (juli 22) + v9 patch, which works for me. There have
> been a number of Intel DRM updates since though. I'll see if I encounter the
> same problem with latest git and then bisect.
>
> PS: do any of you have problems with full screen video or OpenGL apps? My
> system crashes immediately after starten such apps (flash, doom, warzone,
> etc.).

Bisect just completed, the bad commit is:

592d32cc4156ee512e55c5bc052fdece215f52b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6

It (strangely) modifies the i915 driver although it's not a USB thing. I am inspecting the diff right now, but it seems like a blunder.

I'll post my findings upstream as soon as I have finished with it

(In reply to comment #210)
> (In reply to comment #206)
> > (In reply to comment #205)
> > > Seems like we are back to square 1 with most recent 2.6.35 git update.
> >
> > I'm running 2.6.35-rc6 (juli 22) + v9 patch, which works for me. There have
> > been a number of Intel DRM updates since though. I'll see if I encounter the
> > same problem with latest git and then bisect.
> >
> > PS: do any of you have problems with full screen video or OpenGL apps? My
> > system crashes immediately after starten such apps (flash, doom, warzone,
> > etc.).
>
> Bisect just completed, the bad commit is:
>
> 592d32cc4156ee512e55c5bc052fdece215f52b2 Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
>
> It modifies (strangely) the i915 driver although it's not an USB-thing. I am
> inspecting the diff right now but it seems like a blunder.
>
> I'll post my findings upstream as soon as I have finished with it

Most of the patch has already been reverted, except the change reverted by the following small patch (which I am now testing):

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 5dde80f..8608462 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -806,7 +806,6 @@ intel_dp_dpms(struct drm_encoder *encoder, int mode)
                        intel_dp_link_train(intel_encoder, dp_priv->DP, dp_priv->link_configuration);
                        if (IS_eDP(intel_encoder)) {
                                ironlake_edp_panel_on(dev);
-                               ironlake_edp_backlight_on(dev);
                        }
                }
        }

This is possibly the only (related) difference.

Sorry for the noise, I can't reproduce the problem anymore. It was verifiable before, but must have been a compilation glitch of some sort. FTR, this was my bisection log:

git bisect start
# good: [1afaab90e8c0317170a53967064a934a77a59c16] Input: w90p910_keypad - change platfrom driver name to 'nuc900-kpi'
git bisect good 1afaab90e8c0317170a53967064a934a77a59c16
# bad: [a63ecd835f075b21d7d5cef9580447f5fbb36263] Merge master.kernel.org:/home/rmk/linux-2.6-arm
git bisect bad a63ecd835f075b21d7d5cef9580447f5fbb36263
# good: [2aa72f612144a0a7d4b0b22ae7c122692ac6a013] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git bisect good 2aa72f612144a0a7d4b0b22ae7c122692ac6a013
# good: [4609a179c97ae60fef173547a9bbb214359808ce] ARM: Fix csum_partial_copy_from_user()
git bisect good 4609a179c97ae60fef173547a9bbb214359808ce
# good: [4609a179c97ae60fef173547a9bbb214359808ce] ARM: Fix csum_partial_copy_from_user()
git bisect good 4609a179c97ae60fef173547a9bbb214359808ce
# bad: [592d32cc4156ee512e55c5bc052fdece215f52b2] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
git bisect bad 592d32cc4156ee512e55c5bc052fdece215f52b2
# good: [0e1cf38889110a7188999388614aef17a84d9d25] Merge branch 'bugzilla-16396' into release
git bisect good 0e1cf38889110a7188999388614aef17a84d9d25
# good: [809cd1cb80d7dffe75dc94bc94ef2aab3dadc86a] USB: Fix USB3.0 Port Speed Downgrade after port reset
git bisect good 809cd1cb80d7dffe75dc94bc94ef2aab3dadc86a
# good: [b690e96cf9e6a6cde6f0393de47bdd6317ddb5de] drm/i915: add pipe A force quirks to i915 driver
git bisect good b690e96cf9e6a6cde6f0393de47bdd6317ddb5de
# good: [4afb93b4211b3f65ebd8ea0d9018426dd9e8693e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
git bisect good 4afb93b4211b3f65ebd8ea0d9018426dd9e8693e
# good: [c30c791c946a14a03e87819eced562ed28711961] USB: xhci: Set Mult field in endpoint context correctly.
git bisect good c30c791c946a14a03e87819eced562ed28711961
# good: [63ab71deae67b031045bb28bf8cff45180089f8f] USB: add quirk for Broadcom BT dongle
git bisect good 63ab71deae67b031045bb28bf8cff45180089f8f
# good: [2b795ea00c2bbb077a1199a4d729c8ac03a6bded] USB: musb: tusb6010: fix compile error with n8x0_defconfig
git bisect good 2b795ea00c2bbb077a1199a4d729c8ac03a6bded
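
A log like the one above can be saved to a file and sanity-checked before anyone tries to repeat the bisection. The following is only a sketch: the function name and the file name bisect.log are made up for illustration. It tallies the verdicts and prints the last commit that was marked bad.

```shell
# Summarise a saved `git bisect log`: count the verdicts and show the
# last commit that was marked bad. Comment lines ("# good: ...") are ignored.
bisect_summary() {
    awk '
        /^git bisect good / { good++ }
        /^git bisect bad /  { bad++; last_bad = $4 }
        END { printf "good=%d bad=%d last_bad=%s\n", good, bad, last_bad }
    ' "$1"
}

# To repeat the bisection itself, inside a kernel git checkout:
#   git bisect replay bisect.log
```

Usage would be something like `bisect_summary bisect.log`. If the last bad commit is a merge and everything tested after it is good, as here, the result deserves a second build before being trusted.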

Chris Halse Rogers (raof) wrote :

Ok! Ladies and Gentlemen! Chris Wilson of upstream fame has done some work to re-integrate a legacy driver for the Intel cards that have been hard done by in the GEM transition.

The xserver-xorg-video-intel packages in https://edge.launchpad.net/~raof/+archive/aubergine have a GEM-less legacy driver re-integrated which is activated when KMS is disabled - which it is by default on your card for Lucid.

It would be useful if you could remove any work-arounds you've used to get your system more stable and test the drivers from this PPA. They should hopefully end up slightly more stable than the drivers in Ubuntu 9.04 (Jaunty), which were the last set of drivers to not use the GEM memory manager, and so significantly more stable than the drivers in Ubuntu 9.10 and Ubuntu 10.04 (Karmic and Lucid).

Could you please test the drivers from this PPA and report your experiences with them?

Changed in xserver-xorg-video-intel (Ubuntu Lucid):
assignee: nobody → Chris Halse Rogers (raof)

Tried to evaluate the patch... found my system more unstable after installing the (Ubuntu) recommended patches:
 - font colours changing fairly quickly to something unreadable, making browsing, text editing, log viewing and window titles unusable
 - logging out/switching users goes to a black screen that flickers, as if it's cycling through video modes or trying to start the driver and repeatedly failing
 - didn't get to fully test GL apps (once I saw things going downhill I backed out)

Was (and now back to) running .34 kernel+X-Updates (Intel 2.11) and it was running OK (when doing nothing GL related)... stable 2D and video largely OK (not entirely smooth but pretty good)

Note: the 2.12/libdrm installed was enough to cause me the grief above... I didn't notice it until I had the 855GM patch installed, but it still persisted after I uninstalled the 855GM patch.

(In reply to comment #203)
> (In reply to comment #202)
> > (From update of attachment 37157)
> > See README for context and usage.
>
> Updated kernel to 2.6.35.rc6 + V9 gpu hang reported, as before, with test
> program.

Further updated to latest 2.6.35 kernel, gpu hang reported as before (both with and without V9 patch) using test program.

René Gabriëls: Did you happen to come across something regarding the mouse cursor mysteriously being on strike? Upgraded drm-intel-next kernel/v9 patch and userspace drivers (xf86-video-intel/libdrm) to git head and ran into the same problem (I assume the problem is somewhere in kernel code, as going back to 2.6.34 brings back the cursor).

Also +1 to GPU hanging, but couldn't find any flush-related messages in dmesg, so I assume I'm simply running into some other problem once again. Much to my delight I also noticed that the intel driver now seems to fall back to software rendering when the GPU is hung, so at least I can still work without everything going down the drain. Very nice work.

(In reply to comment #215)
> René Gabriëls: Did you happen to come across something regarding the mouse
> cursor mysteriously being on strike? Upgraded drm-intel-next kernel/v9 patch
> and userspace drivers (xf86-video-intel/libdrm) to git head and ran into the
> same problem (I assume the problem is somewhere in kernel code, as going back
> to 2.6.34 brings back the cursor).

Yes. I tracked down the problem: if you comment out the following lines in drivers/gpu/drm/i915/intel_display.c, the problem will probably be gone (it worked for me).

/* 855 & before need to leave pipe A & dpll A up */
{ 0x3582, PCI_ANY_ID, PCI_ANY_ID, quirk_pipea_force },
{ 0x2562, PCI_ANY_ID, PCI_ANY_ID, quirk_pipea_force },
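
The two entries above are keyed on PCI device IDs (0x3582 is the 82852/855GM, which lspci reports as [8086:3582]). As a rough sketch, the shell function below checks whether `lspci -nn` output contains one of those two IDs, i.e. whether this quirk would apply to a given machine at all. The function name is made up for illustration.

```shell
# Succeeds if the `lspci -nn` output on stdin lists an Intel VGA controller
# whose device ID is one of the two IDs in the quirk entries (3582 or 2562).
pipea_quirk_applies() {
    grep -Eiq 'VGA compatible controller.*\[8086:(3582|2562)\]'
}
```

For example: `lspci -nn | pipea_quirk_applies && echo "pipe A quirk applies here"`.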

> Also +1 to GPU hanging, but couldn't find any flush-related messages in dmesg,
> so I assume I'm simply running into some other problem once again. Much to my
> delight I also noticed that the intel driver now seems to fall back to software
> rendering when the GPU is hung, so at least I can still work without everything
> going down the drain. Very nice work.

My system locks up hard when I fullscreen certain video/gl apps. Also I have render errors with progressively rendered images in Firefox. It's hard to stay optimistic about the state of the Linux desktop considering that graphics has never really worked as it should since the day I bought it (almost 7 years ago!).

(In reply to comment #215)
> René Gabriëls: Did you happen to come across something regarding the mouse
> cursor mysteriously being on strike? Upgraded drm-intel-next kernel/v9 patch
> and userspace drivers (xf86-video-intel/libdrm) to git head and ran into the
> same problem (I assume the problem is somewhere in kernel code, as going back
> to 2.6.34 brings back the cursor).
I'm experiencing this behavior, too. The problem does not occur with 2.6.35-rc4. I'll try your fix.

Fix from comment #216 worked here, too. Now "Initializing HW Cursor" is back in Xorg.0.log.
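
For anyone else testing the fix, a quick way to check for that log line. This is a minimal sketch, assuming the X log lives at /var/log/Xorg.0.log (the usual location, but it may differ on your system); the function name is made up.

```shell
# Succeeds if the given X server log mentions hardware cursor initialisation.
# Defaults to /var/log/Xorg.0.log when no file is given.
hw_cursor_initialised() {
    grep -q 'Initializing HW Cursor' "${1:-/var/log/Xorg.0.log}"
}
```

For example: `hw_cursor_initialised && echo "HW cursor was initialised"`.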

Julien Olivier (julo) wrote :

Chris,

I've just tested the PPA and I get two bugs:

 1) When trying to play a video with gstreamer and XV, I get the following error:
The program 'gstreamer-properties' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadAlloc (insufficient resources for operation)'.
  (Details: serial 60 error_code 11 request_code 132 minor_code 19)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
I got the error with both totem and gstreamer-properties.
Using non-XV output, it works fine though.

 2) I tried to activate compiz, but all I got was an Xorg crash.

Christiansen (happylinux) wrote :

@Chris

I've installed the module from your PPA on a couple of freshly installed boxes, and unfortunately it seems to make things even worse than the drivers released with Lucid so far.

I'm able to install on these boxes and then run Lucid from the hard disk, but ONLY by supplying the extra kernel parameter "i915.modeset=1" in GRUB in both cases. After upgrading the installation from your PPA, I'm still not able to boot without this parameter: without it, the boxes freeze completely just before X starts. With the parameter supplied and the modules from your PPA installed, I got the boxes up and running, but with some odd behaviour in the desktop environment (KDE), such as menu entries suddenly changing colour and nearly disappearing into the menu background. Videos now "play" without an immediate X crash, but no picture is ever shown (black screen in the gecko-mediaplayer plugin). And after I've stopped the player and closed the browser, X sometimes crashes completely.

But I have a question: why can I make an unmodified Lucid run perfectly on those boxes by using the kernel image 2.6.32-17, 2.6.33-5, 2.6.34 or now 2.6.35 from the Ubuntu Mainline archive (http://kernel.ubuntu.com/~kernel-ppa/mainline/)? No kernel parameter is needed, compiz and video playback run fine, and even functions like brightness control and the OSD (found in KDE) for sound and brightness, among other things never seen working with the released Lucid kernel, just work. And why was KMS disabled for i855 at the last minute before release, only to make Lucid uninstallable for most ordinary i855 users and cause a lot of fuss when they try to upgrade (is this good for an LTS release)?

I can't help wondering why workarounds for a bug, which to me seems to be introduced or exposed by the Ubuntu-specific kernels (according to the above), are being hunted so hard in modules external to the kernel. The real problem must reside in the backporting of things from the 2.6.33 kernel to the 2.6.32 kernel, as both mainline kernels 2.6.32 and 2.6.33 seem to work okay, and several people have replied in different i855-related bug reports that they use those kernels as a workaround, some from the backported kernel PPA (https://launchpad.net/~kernel-ppa/+archive/ppa) though.

It would be great if Maverick doesn't suffer this fate; so far (Alpha 1, 2 & 2+) it has been running okay on one of those (i855) boxes with its default 2.6.35 kernel...

This is probably of little help in this context, but I have installed Fedora 13, kernel 2.6.33.6-147.2.4.fc13.i686, on my old Let's note Panasonic CF-W2 (i855 int(h)el(l) GPU).

This distribution installed without a glitch (live CD). It is perfectly stable. None of the installation and stability problems I encountered with Ubuntu 10.04 have affected me (on this machine with this type of chipset).

nomnex (nomnex) wrote :

FYI

Installed Fedora 13, Kernel 2.6.33.6-147.2.4.fc13.i686 on my Let's note Panasonic CF-W2 with Int(h)el(l) i855 GPU.
It installed without a glitch (live CD) and runs like a Swiss watch.
Not to mention, among other things: there is no Mono crap, the default umask is 002 (vs. 022 on Debian/Ubuntu), the distro is very polished, and there is a lot less eye candy. An overall nice and unexpected surprise.

The latest Ubuntu releases have been a disappointment and a very bad experience on my hardware (3 older notebooks). Fedora 13 on the same hardware runs fine. This might be different with recent hardware, but if you are annoyed with grub2, kernel boot settings, PPA kernels to install, etc. and simply want to install a distribution, you might want to give the Fedora 13 live CD a try.

I have just tested patch v9 with 2.6.34.3. The patch breaks all the support for Intel graphics. How is this "solution" supposed to work? (Sounds like a bad joke.)

This is what I can see in dmesg:

[drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!

Presumably, intel_agp is present and loaded. I tried the patch with intel_agp compiled into the kernel and as a loadable module. It failed the same way in both cases. Without the patch, there is no such problem. (But X.org hangs.)

With the patch applied, the i915 driver never works. When compiled as a module, it cannot be loaded manually and says "No such device".
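
One way to double-check the intel_agp side of this is sketched below. It assumes the kernel configuration is available as a file (for example /boot/config-$(uname -r) on distributions that install it, or .config in the build tree); the function name and the wording of its output are made up for illustration. CONFIG_AGP_INTEL is the Kconfig option behind intel_agp.

```shell
# Report how intel_agp is provided, given a kernel config file and a list of
# loaded modules in /proc/modules format (module name first on each line).
intel_agp_status() {
    config="$1"                     # e.g. /boot/config-$(uname -r)
    modules="${2:-/proc/modules}"
    if grep -q '^CONFIG_AGP_INTEL=y' "$config"; then
        echo "built-in"
    elif grep -q '^intel_agp ' "$modules"; then
        echo "module loaded"
    elif grep -q '^CONFIG_AGP_INTEL=m' "$config"; then
        echo "module not loaded"
    else
        echo "not configured"
    fi
}
```

For example: `intel_agp_status "/boot/config-$(uname -r)"`. If this prints "built-in" or "module loaded" and i915 still complains, the driver is present and the fault lies elsewhere, which fits a patch that did not apply cleanly.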

This is an Asus M2400N laptop with an Intel 82852/855GM (rev 02) GPU.

I wanted to switch to another operating system, but it seems that the Intel driver bug is omnipresent:

Linux: unusable (X.org freezes after a while)
FreeBSD: unusable (kernel panic)
OpenSolaris: unusable (X.org freezes during initialization)

Avoiding Intel graphic chipsets is probably the only solution. That's the most important lesson I have learned from all this. A perfectly working driver was discontinued more than a year ago (April 2009) and no replacement is available so far.

BTW, has anyone tested a recent version of OpenBSD or NetBSD? Is there at least one reasonable (UNIX-like) system that would support old Intel chipsets?

Andrej Podzimek: I have exactly the same hardware. Previously tested 2.6.34-rc3 and v6 of this patchset, now running 2.6.35+ (from drm-intel-next) and v9, and have had no such problems. I'd suggest checking up on kernel configuration and if the patch was actually applied correctly before going on a rampage here.

km (km-mathcs) wrote :

The V9 kernel patch does in fact let me run Lucid on my Intel 855GME based laptop, which I couldn't before. However, even though xvinfo shows the X-Video Extension with Video Overlay, the performance is down relative to Karmic. Both can do 720P video, but Karmic can just keep up with 1080 and Lucid cannot.

On the 1080 video on Karmic I see Xorg at 12% of the cpu with 20% idle headroom.

On the 1080 video on Lucid I see Xorg at 38% of the cpu with 0% idle and it can't keep up.

Any thoughts on why there is a difference? I can post Xorg logs if that would be helpful.

Created an attachment (id=37807)
v9 against latest git linus tree (manually fixed)

I installed the GTT Incoherency Patch as described at:
https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes#GTT%20Incoherency%20Patch
I have a HP ze4904us laptop with 00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02) graphics card.
I'm running Ubuntu 10.04 Lucid Lynx with kernel 2.6.32-24.
Everything works so far, I used to get a freeze-up during boot but now it works properly.
For a time I used the change in /etc/default/grub, adding "i915.modeset=1" after "quiet splash". During that time I could operate normally, except that playing a video in Totem movie player would crash the computer.
Everything seems to work now.

Thank you all for all your effort.
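
For reference, the /etc/default/grub change mentioned above can be sketched as a small shell function. It only prints the modified file and leaves the original alone. The function name is made up, and it assumes the file has a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, as a stock Ubuntu install does.

```shell
# Print a grub defaults file with one parameter appended to
# GRUB_CMDLINE_LINUX_DEFAULT. If the parameter is already on that line,
# the file is printed unchanged.
with_kernel_param() {
    file="$1"; param="$2"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        cat "$file"
    else
        sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
    fi
}
```

For example, `with_kernel_param /etc/default/grub i915.modeset=1` turns "quiet splash" into "quiet splash i915.modeset=1". After reviewing the output and writing it back to /etc/default/grub as root, `sudo update-grub` regenerates the boot menu.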

(In reply to comment #214)
> (In reply to comment #203)
> > (In reply to comment #202)
> > > (From update of attachment 37157)
> > > See README for context and usage.
> >
> > Updated kernel to 2.6.35.rc6 + V9 gpu hang reported, as before, with test
> > program.
>
> Further updated to latest 2.6.35 kernel, gpu hang reported as before (both with
> and without V9 patch) using test program.

Updated kernel to 2.6.35.2 + V9 patch; GPU hang reported on first run of my stress test program:

Linux Mars 2.6.35.2 #5 Mon Aug 16 06:47:43 BST 2010 i686 i686 i386 GNU/Linux

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 828897 at 828892)
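
Others who want to report comparable results can count these messages in their own kernel log. A minimal sketch (the function name is made up; the pattern is the hangcheck message shown above):

```shell
# Count GPU hang reports in kernel log text read from stdin.
# Prints 0 when there are none.
count_gpu_hangs() {
    grep -c 'Hangcheck timer elapsed... GPU hung' || true
}
```

For example: `dmesg | count_gpu_hangs`.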

(In reply to comment #224)

Updated xserver to 1.9.0 and kernel to 2.6.35.3 and ran my stress test overnight with NO GPU hangs reported.
Some issues with Xv and mplayer but that is probably another story.

(In reply to comment #225)

Sorry, the kernel was 2.6.35.3 + V9 patch.

I tested https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes#GTT%20Incoherency%20Patch on my laptop with:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)

While it allowed the system to boot, and KMS / 3D acceleration were working fine, it had problems with XV. E.g. playing a video with totem/vlc or testing XV with gstreamer-properties resulted in a crash:
$ gstreamer-properties
gstreamer-properties-Message: Skipping unavailable plugin 'artsdsink'
gstreamer-properties-Message: Skipping unavailable plugin 'esdsink'
gstreamer-properties-Message: Skipping unavailable plugin 'glimagesink'
gstreamer-properties-Message: Skipping unavailable plugin 'sdlvideosink'
gstreamer-properties-Message: Skipping unavailable plugin 'v4lmjpegsrc'
gstreamer-properties-Message: Skipping unavailable plugin 'qcamsrc'
gstreamer-properties-Message: Skipping unavailable plugin 'esdmon'
The program 'gstreamer-properties' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadAlloc (insufficient resources for operation)'.
  (Details: serial 60 error_code 11 request_code 132 minor_code 19)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)

So the best workaround for me so far is to use the https://launchpad.net/~brian-rogers/+archive/experimental kernel which gives me KMS, 3D and XV with no problems.

Tested 2.6.35.3 and the V9 patch.

This time the Intel adapter *works* and the KDE desktop is usable when compositing is switched off.

With compositing switched on, freezes *do* occur as usual, but this time they do not block virtual console switching. This means that the frozen machine could be rebooted gracefully if there wasn't another (possibly related) bug (see below).

Unfortunately, something gets broken inside the kernel. An attempt to sync the file systems (and suspend/hibernate/reboot) gets stuck forever. There is a kernel process called 'flush-8:0' that consumes 100% of CPU time. Existing sessions remain usable, but no new sessions can be established. The Magic SysRq is the only solution here.

Created an attachment (id=38135)
Failed chipset flush backtrace

A backtrace from dmesg. Occurs with the V9 patch and 2.6.35.3.

Bryan S (bryanschuman) wrote :

I have to say I was highly disappointed in Ubuntu's handling of this error as well. I'm using 9.10 on an older Centrino M 1300 as well for a basic file server, and Lucid was a disaster. I understand the issues involved quite clearly, and understand it's not an easy fix. It would be nice to be able to abandon this issue entirely. However, most of these chipsets are stuck inside laptops, and the reality is that people (myself included) turn to Linux releases like Ubuntu to have modern OS support on these older machines that are still out there running quite well. I for one hate to throw out old hardware when it functions perfectly well for what I need.

Here's the real rub: Lucid is the ONLY Linux release from this time frame that will not run OOB on my 852 setup. That's just nuts to me. It may be an upstream issue, but other distros have dealt with it effectively enough to have a running GUI from the get go. I realize that even WinXP had issues related to this chipset, but even getting to a base GUI is better than nothing for Average Joe. Blank Screen Syndrome is doom incarnate. Period. We can deal with suspend issues and software crashes. But Average Joe needs to get to a GUI first... or even a prompt for that matter. Yes, the other distros I've tried have their issues once they were loaded. But the point is they DID load, into GUI, OOB. Even openSUSE 11.2 (and now 11.3), which is arguably one of the most bloated Linux distros out there, loads onto my machine fine, with KDE 4.x running and useable.

Long rant cut short: Ubuntu devs NEED to look at other working distros to see what workarounds are being plastered into place on major releases like this.

(In reply to comment #229)
> Created an attachment (id=38135)
> Failed chipset flush backtrace
>
> A backtrace from dmesg. Occurs with the V9 patch and 2.6.35.3.

Some extra notes on the issues mentioned above:

The "flush hang" problem (the 'flush-8:0' kernel process taking up all the CPU time) occurs no matter if compositing is off, no matter if intensive disk operations take place and no matter if the desktop is actually used. (It can easily happen right after boot when only KDM is displayed.)

The "flush hang" problem does not occur immediately in the moment when the backtrace appears. Minutes to tens of minutes elapse between the warning and the flush-8:0 process going out of control.

I tried the Reiser4 patch alone (without the V9 patch (and without X.org, of course)) and had no issues. Everything worked even after hours of uptime. But this could have just happened by chance...

I see there's some VFS related stuff in the backtrace. If I understand it well, a scheduling clock tick is involved. Could this be a bug related to interrupt disabling and other synchronization? If the VFS data integrity is compromised, it could possibly explain some of these issues.

I use a fully preemptible kernel and a 300Hz scheduling clock. The machine is an Asus M2400N (uniprocessor Pentium M).

What should I try next? Any suggestions?

(In reply to comment #187)
> http://glasen-hardt.de/?page_id=707

I used to have crashes in less than 2 minutes under Ubuntu karmic and lucid when I simply ran firefox or even a gnome-term.
I applied fix-i8xx-gtt-cache-coherency-v9-2.6.35.1.patch to Ubuntu maverick kernel 2.6.35-18.24 from http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-maverick.git (the patch needed adaptation, but 'patch' did the magic alone).

I have no freeze anymore. Thanks for the patch!
Please put it in the main trunk for maverick, from other test reports it might not be perfect, but it is certainly better than the current status.
I run xserver-xorg-video-intel 2:2.9.1-3ubuntu5 from lucid, and KMS is activated through the boot parameter i915.modeset=1.

markling (markling) wrote :

I was unable to execute these commands.

The last generated an error:

couldn't find package linux-image-2.6.34-v9patch-generic

I tried the patch after being told that it might make it possible for me to upgrade to Ubuntu v10. I'm still unable to upgrade to Ubuntu v10:

http://ubuntuforums.org/showthread.php?p=9774440#post9774440

The first command in your list is erroneous (apt-add). I'm sure this is plain to see for most people, but it does not fill a feeble linux user like myself with confidence.

description: updated
Brian Rogers (brian-rogers) wrote :

Sorry, the instructions at the top of this bug were outdated and referenced an older version of the patched kernel. I fixed that now. Here's the command to install the newest kernel in my PPA:

sudo apt-get install linux-image-2.6.35-v9patch1-generic

If you're still on 9.04 these instructions won't work because it will be looking in the 9.04 version of the graphics-fixes repository, which is empty. You can override that by going to System -> Administration -> Software Sources and finding my PPA, then editing it and changing the distribution to 'lucid'. Then after updating the repositories again, the kernel will be available, and you can run the command to install it.

Then if you upgrade to 10.04, make sure when it asks to remove obsolete/unsupported packages that it is not trying to remove that kernel. If the kernel is listed, have it skip removing the packages. You will now have 10.04 + a patched kernel, which hopefully will work right.
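
For those who prefer a terminal to the Software Sources dialog, the distribution change described above amounts to rewriting the release name on the PPA's APT source line. A rough sketch, with a made-up function name; which file under /etc/apt/sources.list.d/ holds the entry depends on how the PPA was added, so check before editing anything:

```shell
# Rewrite the release name on APT source lines read from stdin.
# Only active "deb" / "deb-src" lines whose release equals $1 are changed.
retarget_release() {
    awk -v from="$1" -v to="$2" \
        '($1 == "deb" || $1 == "deb-src") && $3 == from { $3 = to } { print }'
}
```

For example, piping a line such as `deb http://ppa.launchpad.net/brian-rogers/graphics-fixes/ubuntu jaunty main` through `retarget_release jaunty lucid` prints the same line with `lucid` in place of `jaunty`. Run `sudo apt-get update` after saving the change.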

Daniel, could you be so kind and give us an update on this bug? From what I can see you created a decent patch which solves or enhances the situation for most of 855gm users but it seems that this patch did not make it upstream, is it correct? I also read on other approaches elsewhere: 1) dumping 855gm as unsupported, 2) reverting to user-mode setting... Thank You.

> --- Comment #232 from Michal Nowak <email address hidden> 2010-08-30 02:38:12 PDT ---
> Daniel, could you be so kind and give us an update on this bug? From what I can
> see you created a decent patch which solves or enhances the situation for most
> of 855gm users but it seems that this patch did not make it upstream, is it
> correct? I also read on other approaches elsewhere: 1) dumping 855gm as
> unsupported, 2) reverting to user-mode setting... Thank You.

Ok, the long overdue status report: I haven't upstreamed the patch for a
few reasons:
- It's an extremely ugly approach, involving way too much duct-tape. That
  would be tolerable if it actually worked reliably, but that's not the case.
- It has (under certain circumstances) rather severe performance
  implications (mostly because the eviction code is not clever enough).
Hence why I'm not satisfied and of the opinion that upstreaming might
cause more harm than good. Different story for distros, though.

I have a few ideas as how to amend this, but that requires a complete
rewrite of the gtt code. I've finally found time to start hacking on this,
see

http://cgit.freedesktop.org/~danvet/drm/log/?h=intel_gtt_rework

Don't try this on an i8xx, no cache coherency stuff in there (yet). I'll
give updates as soon as there is stuff to try out.

On other approaches for the short/medium term, as I see them (take this
with a grain of salt, I'm just doing this for fun and leisure and I'm not
an Intel employee):

- Keep the old ums stuff around. Not supported by intel (and I don't think
  this will ever happen). You're basically on your own.
- Keep this patch as a band-aid. Thankfully all the nice people here have
  been awesome with forward-porting and helping one another out, so this
  basically maintains itself ;)
- Chris Wilson's shadowfb branch. See

http://cgit.freedesktop.org/~ickle/xf86-video-intel/log/?h=shadow

  This won't give opengl, though. But that looks like a good approach
  until the i8xx cache coherency nightmare is fixed for real. And it has
  the chance of being merged to master.
- Burn your i855 on a pyre ;)

-Daniel

Brian Rogers (brian-rogers) wrote :

Developer Chris Wilson has published a branch of xserver-xorg-video-intel which accesses the graphics card differently and avoids the kinds of operations that cause problems with the old chipsets.

I created a PPA for it: https://launchpad.net/~brian-rogers/+archive/intel-shadow

This branch is based off a new enough version of the driver that it requires Xorg 1.8, while Lucid has version 1.7.6. Therefore this PPA depends on the xorg-edgers PPA. To use it, you need to add both:

sudo add-apt-repository ppa:xorg-edgers
sudo add-apt-repository ppa:brian-rogers/intel-shadow

You will also need to edit your xorg.conf file to enable shadow buffer mode, since it is not enabled by default:

sudo gedit /etc/X11/xorg.conf

If you do not already have a xorg.conf file, you will be editing a blank document and you can put the following in it:

Section "Device"
    Identifier "GPU"
    Option "Shadow" "True"
EndSection

If you already have xorg.conf file with contents, find the "Device" section and insert the 'Option "Shadow" "True"' line like above. Or if there is no existing "Device" section, copy the one above to the end of the file.
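
The xorg.conf cases described above can be sketched as one shell function. It prints the resulting file and does not modify anything; the function name is made up, and the matching is deliberately simple (it expects Section and "Device" spelled exactly as in the example).

```shell
# Print an xorg.conf with shadow buffer mode enabled, without modifying the
# file. A file that already sets the option is printed unchanged.
enable_shadow() {
    conf="$1"
    if grep -q 'Option[[:space:]]*"Shadow"' "$conf" 2>/dev/null; then
        cat "$conf"
    elif grep -q '^[[:space:]]*Section[[:space:]]*"Device"' "$conf" 2>/dev/null; then
        # Existing "Device" section: add the option right after its opening line.
        awk '{ print }
             /^[ \t]*Section[ \t]*"Device"/ { print "    Option \"Shadow\" \"True\"" }' "$conf"
    else
        # No "Device" section (or no file yet): append a minimal one.
        if [ -f "$conf" ]; then cat "$conf"; fi
        printf 'Section "Device"\n    Identifier "GPU"\n    Option "Shadow" "True"\nEndSection\n'
    fi
}
```

For example, `enable_shadow /etc/X11/xorg.conf > /tmp/xorg.conf.new` lets you review the result before copying it into place as root. Keep a backup of the original file.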

I welcome reports of how this works and how it compares to the other solutions.

Created an attachment (id=38324)
v9.1: v9 patch updated for 2.6.36-rc3

Good news Daniel, keep up the good work!

In the mean time, I've ported the v9 patch to Linus's most recent git-tree (2.6.36-rc3 atm). Some Sandy Bridge patches interfered with the old patch, so I merged them. I hope the result is correct (it works for me).

fossfreedom (fossfreedom) wrote :

Just a few observations with the new ppa.

The framerate has dropped dramatically. I can no longer view full-screen flash video. Compiz no longer works - I've had to resort to using metacity compositing. On boot, plymouth is low resolution, with the old Karmic double black-graphics flash when moving from plymouth to the GDM.

Having said all that, graphics seem rock-solid without any graphical glitches and panel refresh issues with the 2.6.35-rc6 "ppa:brian-rogers/graphics-fixes" kernel.

I tried adding i915.modeset=1 to the grub boot options. However, whilst this fixed the low-resolution plymouth, the GDM was never displayed, just a black screen. I also tried with i915.modeset=1 but without the "shadow" = "true" in xorg.conf. This resulted in the infamous GLib-WARNING **: getpwuid_r(): failed due to unknown user id (0) issue.

 I also tried without i915.modeset=1 and without "shadow"="true". To be honest, the only difference that I observed was that the framerate with full screen graphics was slightly faster, but still unwatchable.

Tested using: VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

On a separate issue: before testing the new PPA, I couldn't use the 2.6.35-v9patch1-generic kernel from "ppa:brian-rogers/graphics-fixes" because the mouse cursor is not displayed. In this respect, the new kernel has regressed for me compared with the rc6 kernel that had the incoherency fix.

Thanks Daniel, for both the work on this issue and the status report, both really appreciated.

Fedora 13 with 2.6.34 kernel works 100% in terms of stability for me. (But the performance is... ~50 fps in glxgears -- I can live with that for sure.)

Brian Rogers (brian-rogers) wrote :

Yeah, the shadow buffer mode does disable OpenGL. I forgot to mention it. It's a sort of safe mode that's a more functional alternative to the vesa driver.

I'm refreshing the kernels in my graphics-fixes PPA, since 2.6.35-v9patch1 is getting kind of stale. There have been relevant fixes in the 2.6.35.x line of stable releases, and this new build will include them. One fix is cursor-related, so hopefully that will make your cursor visible again. Another fix involves a black-screen bug.

The freshened kernel that also includes the v9 patch will be called linux-image-2.6.35-v9patch+19-generic. It should be up later today.

description: updated
fossfreedom (fossfreedom) wrote :

Hi Brian - I've ppa-purge'd your new ppa. I've also tried your new kernel. 'Fraid it didn't fix my cursor issue. I'll stick to your rc6 kernel for the moment. Whilst it may be slightly glitchy, I consider your rc6 kernel infinitely better than the "shadow" fix. If the shadow fix is the only solution for maverick, I'm going to stick with lucid for the foreseeable future. Thanks.

Brian Rogers (brian-rogers) wrote :

I found a bug report for the invisible cursor issue. It's bug 614176 here and upstream here: https://bugs.freedesktop.org/show_bug.cgi?id=29413

When there's a fix, I'll put up a new kernel including it. I'll be providing maverick kernels soon as well, so once the cursor bug is fixed, maverick should become usable for you.

(In reply to comment #233)
> I have a few ideas as how to amend this, but that requires a complete
> rewrite of the gtt code. I've finally found time to start hacking on this,
> see
>
> http://cgit.freedesktop.org/~danvet/drm/log/?h=intel_gtt_rework

What I like about that is that it removes a lot of lines of code.
I hope you can get it small and simple enough that it becomes very
stable.

Tell us when you want some testing and we'll provide.

> Don't try this on an i8xx, no cache coherency stuff in there (yet). I'll
> give updates as soon as there is stuff to try out.

Any new ideas how to get cache coherency, or how to avoid the need for it?

> On other approaches for the short/medium term, as I see them (take this
> with a grain of salt, I'm just doing this for fun and leisure and I'm not
> an Intel employee):
>
> - Keep the old ums stuff around. Not supported by intel (and I don't think
> this will ever happen). You're basically on your own.

Chris Wilson reintegrated the legacy UMS and put it in his "legacy" branch,
to make it easier for people who are on their own to work on it together.

> - Keep this patch as a band-aid. Thankfully all the nice people here have
> been awesome with forward-porting and helping one another out, so this
> basically maintains itself ;)
> - Chris Wilson's shadowfb branch. See
>
> http://cgit.freedesktop.org/~ickle/xf86-video-intel/log/?h=shadow
>
> This won't give opengl, though. But that looks like a good approach
> until the i8xx cache coherency nightmare is fixed for real. And it has
> the chance of being merged to master.

I don't care about opengl, this thing can't do anything opengl for real anyway.
All I want is stable and fast 2D rendering. :-( Full-screen video playback is
nice to have, but not a must. I used to have DRI disabled and only the UMS 2D
intel driver enabled.

I think I'll give Chris' shadowfb branch a try, it looks very promising.
Any idea where I can give feedback about it?

> - Burn your i855 on a pyre ;)

When unplugging my mother's iPod nano it burnt the USB part of the chip,
so almost got there. ;-) I've a ThinkPad X40 and it's a wonderful machine,
but it also gives no choice as far as the i855 goes.

> -Daniel

Thank you for all your work Daniel.

I have an i855GM rev.02 and with linus git tree + v9 patch I get a lovely black screen.

What spell shall I use to get it working again?

Thanks

description: updated
Brian Rogers (brian-rogers) wrote :

David, I just uploaded a new kernel (currently building for both Lucid and Maverick) that reverts the commit that caused the invisible cursor regression. That way you or anyone else experiencing this bug isn't stuck on an RC kernel until there's a proper fix.

The new kernel is linux-image-2.6.35-ppa20+v9+cursorfix-generic.

fossfreedom (fossfreedom) wrote :

Thank you Brian - indeed, this has fixed the cursor issue. It has also fixed a few other issues such as permanent screen dimming and panel update issues. From the short time I've been playing, the kernel looks rock solid. Many thanks again.

fossfreedom (fossfreedom) wrote :

Further update Brian - I've been running the kernel continuously today. Whilst the cursor is now visible, there are occasional freezes which make navigation annoying. For the moment, I'll stick to the rc6 kernel. Hopefully there will be good news on this front from upstream soon (here's hoping).

Brian Rogers (brian-rogers) wrote :

You mean that it freezes temporarily, then resumes, like a stuttering behavior? In that case, do new messages appear in dmesg after a freeze?

Brian, I've been using your latest ppa20+v9+cursorfix-generic kernel, and it has made my system usable again, unlike the previous attempts. The cursor freeze has disappeared, and I can once again watch videos in Dragon Player and VLC, neither of which could be used at all before. Oddly, flash video in a browser has always worked well, and the new kernel has not degraded performance there. The mouse cursor does still occasionally stutter, but there are no error messages that I can discover. Similarly, video sometimes stutters, but this seems to happen only with higher-resolution sources (like HD video), and I notice it only when playing fullscreen. Lower-res video plays very well, including fullscreen.

The system still sometimes freezes solid, but this behavior is greatly reduced from previous kernels, and is now at an almost tolerable level. Unfortunately, a system freeze means just that, so I have no way to determine if new error messages appeared immediately before the freeze. Freezing doesn't seem to be triggered by any particular operation, such as certain mouse movements, and it can happen just as easily when typing in a Konsole or resizing a window as when watching a fullscreen video.

fossfreedom (fossfreedom) wrote :

Brian, I can concur with the above. I had tail -f /var/log/messages in one window and glxgears in another. I ran glxgears only because I needed something with continuous movement on screen. When the cursor appeared to freeze momentarily, glxgears also stopped. No additional messages were logged. I haven't seen any solid freezes like the ones Scott describes, though.
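
An alternative to watching tail -f while waiting for a stutter is to snapshot the kernel log before and after and print only the lines that are new. The following is a minimal sketch of that idea, not something posted in this thread; the function name new_kernel_messages and the /tmp file names are made up for illustration.

```shell
# Print the lines of the second file that do not appear verbatim in the first.
# Arguments: $1 = snapshot taken before the freeze, $2 = snapshot taken after.
new_kernel_messages() {
    # -F: fixed strings, -x: match whole lines, -v: invert the match,
    # -f FILE: read the patterns (the "before" lines) from FILE.
    # grep exits non-zero when nothing is printed; "|| true" hides that.
    grep -F -x -v -f "$1" "$2" || true
}

# Example use (hypothetical file names):
#   dmesg > /tmp/dmesg-before.txt
#   ... wait for the cursor or glxgears to stall ...
#   dmesg > /tmp/dmesg-after.txt
#   new_kernel_messages /tmp/dmesg-before.txt /tmp/dmesg-after.txt
```

An empty result matches what was observed above: the stutter left no new kernel messages.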

A ThinkPad R51 was freezing at boot after upgrading to Lucid. I used Stefan Glasenhardt's Ubuntu PPA as described here:

https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes#GTT%20Incoherency%20Patch

and it solved the problem.

(In reply to comment #238)
> A ThinkPad R51 was freezing at boot after upgrading to Lucid. I used Stefan
> Glasenhardt's Ubuntu PPA as described here:
>
> https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes#GTT%20Incoherency%20Patch
>
> and it solved the problem.

Stefan uses the same patch which is here, which is not effective for me. I am using git linus tree; I think the patch should be sent upstream as it only improves things for all use-cases I have seen so far.

@Chris Wilson: is there a cumulative patch for your UMS "legacy" work?

Hello, I tried the latest patch with 2.6.36-rc3, and it is basically working, but every ~9-10s the mouse pointer hangs, and dmesg is flooded with messages like these:

[ 214.024020] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane, expect flickering: entries required = 43, available = 42.
[ 214.024029] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane, expect flickering: entries required = 43, available = 42.

my GPU:
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02) (prog-if 00 [VGA controller])
 Subsystem: Acer Incorporated [ALI] Device 0064
 Flags: bus master, fast devsel, latency 0, IRQ 6
 Memory at e8000000 (32-bit, prefetchable) [size=128M]
 Memory at e0000000 (32-bit, non-prefetchable) [size=512K]
 I/O ports at 1800 [size=8]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: i915
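
To put a number on "flooded", the warning lines can be counted in the kernel log. This is a small sketch that assumes the message text is exactly as quoted above; the function name count_fifo_warnings is made up for illustration.

```shell
# Count "Insufficient FIFO for plane" warnings in kernel log text on stdin.
count_fifo_warnings() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'Insufficient FIFO for plane' || true
}

# Example use:
#   dmesg | count_fifo_warnings
```

Running it twice a minute or so apart gives a rough idea of how fast the warnings accumulate.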

> --- Comment #240 from Kristijan Vrban <email address hidden> 2010-09-12 12:57:37 PDT ---
> Hello, i tried the latest patch with 2.6.36-rc3, and it basically is working,
> but every ~9-10s the mouse pointer hangs, and dmesg is flooded with a lot of
> this messages:
>
> [ 214.024020] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane,
> expect flickering: entries required = 43, available = 42.
> [ 214.024029] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane,
> expect flickering: entries required = 43, available = 42.

Both known problems: The hang every 10s is hotplug code wasting too much
time (and hence stalling mouse updates). The warning is harmless, the code
wasn't changed at all, it just started reporting possible causes for
flicker. Both problems have patches in drm-intel/drm-intel-fixes that
should land in -stable sooner or later.

-Daniel

Brian Rogers (brian-rogers) wrote :

A real fix was posted for the invisible cursor issue, so I incorporated it into linux-image-2.6.35-ppa21+v9patch-generic (building now).

As for the periodic freezing issue, that is covered by this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=29536

The patches there don't apply cleanly to 2.6.35, so I'll have to look at them before I include them in a graphics-fixes kernel.

description: updated
fossfreedom (fossfreedom) wrote :

Thank you Brian - I've tested your latest kernel and it has indeed fixed the cursor issue. On my machine, the periodic freezing issue from your last kernel is not observed. I've decided to remove the rc6 kernel and run full time on this kernel.

Question - what are the chances of the fixes in this new kernel finding their way into maverick? It's a real shame for i855 users of maverick to be stuck with vesa graphics.

Brian Rogers (brian-rogers) wrote :

The invisible cursor fix will be sent to stable and make it into Maverick that way. I don't know if it will make it in before release, though.

As for the stability fix, Daniel Vetter has said the following:

"I haven't upstreamed the patch for a few reasons:
- It's an extremely ugly approach, involving way too much duct-tape. Now if it would actually reliably work, but that's not the case.
- It has (under certain circumstances) rather severe performance implications (mostly because the eviction code is not clever enough).
Hence why I'm not satisfied and of the opinion that upstreaming might cause more harm than good. Different story for distros, though.

I have a few ideas as how to amend this, but that requires a complete rewrite of the gtt code. I've finally found time to start hacking on this [...]"

Complete comment here: https://bugs.freedesktop.org/show_bug.cgi?id=27187#c233

So it's not planned to go into 2.6.35.x, but it could be picked up individually by distros. A better patch will eventually be made for future kernels and sent upstream.

On Tue, 2010-09-14 at 00:44 +0000, Brian Rogers wrote:
> The invisible cursor fix will be sent to stable and make it into
> Maverick that way. I don't know if it will make it in before release,
> though.

Brian, do you have any idea why Fedora 13 (current kernel 2.6.34.6-54)
is not affected by this bug?

I have been wondering for some time now why i855 notebooks were
incompatible with 9.10, 10.04, 10.10, unless doing some workaround, but
installed fine on Fedora 13 (I haven't tried any previous release).

Isn't the kernel common to all the distributions?

Brian Rogers (brian-rogers) wrote :

In Fedora's kernel package:
http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=tree;h=refs/heads/f13/master;hb=f13/master

I see drm-intel-big-hammer.patch. That's a patch that improved stability somewhat, but didn't quite solve the problem. My testcase could still kill the system. It also causes slowdowns, which can be extreme in some cases.

nomnex (nomnex) wrote :

On Tue, 2010-09-14 at 03:56 +0000, Brian Rogers wrote:
> In Fedora's kernel package:
> http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=tree;h=refs/heads/f13/master;hb=f13/master
>
> I see drm-intel-big-hammer.patch. That's a patch that improved stability
> somewhat, but didn't quite solve the problem. My testcase could still
> kill the system. It also causes slowdowns, which can be extreme in some
> cases.
>

Thank you for the explanation and the link.
nomnex

Changed in xserver-xorg-video-intel:
importance: Unknown → Medium

Daniel, are you interested in people testing your gtt rework branch on non-i8xx hardware right now, or is it in an early enough state that the feedback would just be noise?

I could test that branch on my i965 laptop if you'd find that helpful.

Workaround A from https://wiki.ubuntu.com/X/Bugs/Lucidi8xxFreezes alone worked for a Dell Latitude D505. Output of lspci | grep Intel:
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
Workaround A is:
To turn KMS back on, run these commands in a Terminal window and reboot:
echo options i915 modeset=1 | sudo tee /etc/modprobe.d/i915-kms.conf
sudo update-initramfs -u
Previously, Lucid with kernel 2.6.31-22 generic and all previous releases (Karmic and earlier) worked fine, but Lucid kernel 2.6.32-24 generic showed some graphics initially on startup and then went to a blank screen.
The problem was first seen after upgrading from Karmic to Lucid Xubuntu. (22 Sep 2010)
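
After rebooting, it may be worth confirming that the modeset option actually took effect. The sketch below reads the i915 module parameter from sysfs. It assumes the parameter is exposed at /sys/module/i915/parameters/modeset, which may be readable only by root on some kernels; the function name kms_status is made up for illustration.

```shell
# Report the i915 "modeset" setting. The path can be overridden with $1.
kms_status() {
    # Assumed sysfs location of the parameter; not confirmed in this thread.
    param_file="${1:-/sys/module/i915/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        echo "i915 not loaded or parameter not readable (try with sudo)"
        return 1
    fi
    case "$(cat "$param_file")" in
        1) echo "KMS enabled" ;;
        0) echo "KMS disabled" ;;
        # Any other value (for example -1) is treated as "not set explicitly".
        *) echo "KMS left at driver default" ;;
    esac
}

# Example use:
#   kms_status
```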

Hello,
I have tested your fix and your driver for the i855GM chipset on Lucid.
I can turn on compiz, but if I watch a video in VLC, compiz crashes. If I don't activate hardware acceleration (sorry, I'm French, so I don't know if that's the right term), it works. But if I activate hardware acceleration, VLC crashes after 30 minutes.

So I tested Ubuntu 10.10, because I had read that it works better there. But hardware acceleration isn't activated, so video playback lags. I activated it manually, but it isn't stable, so I tried your driver and your fix, but I couldn't install the fix because of the 2.6.35 kernel.

Can you adapt your fix to the 2.6.35 kernel for Maverick?
Or can you think of another solution?

Chris Halse Rogers (raof) wrote :

I'm unassigning myself from this bug; I've got the needed feedback for Maverick, and we've gone with the safe option of fbdev.

I'll leave this bug open; there's still a reasonable chance we can get a proper fix, and apparently some i8xx documentation has just been released.

Changed in xserver-xorg-video-intel (Ubuntu Lucid):
assignee: Chris Halse Rogers (raof) → nobody
ilanrab (ilanrab) wrote :

This issue has been floating around, in Ubuntu, for too long. Why wasn't it escalated?

When my upgrade from 8.04LTS to 10.04.1LTS failed so miserably (Wireless problems and locking GPU problem) I moved myself to Puppy Linux. The Puppy distro does not lock up on me when I use graphics and video.

Version 5.1 of Puppy Linux is based on the Ubuntu Lucid binaries (called Lupu). I am having difficulty understanding why Ubuntu Lucid is failing so miserably. What do PuppyLinux's drivers have that Lucid does not? I get a freeze, on average, every 15 minutes, every single day that I try to use Ubuntu Lucid, nowadays.

Maybe someone should contact Barry Kauler (of Puppy Linux) and ask some questions so that this issue GOES AWAY FAST.

Is this a Kernel version issue? What's going on?

 - Ilan -

gmud (gmud) wrote :

>When my upgrade from 8.04LTS to 10.04.1LTS failed so miserably (Wireless problems and locking GPU problem)
>I moved myself to Puppy Linux. The Puppy distro does not lock up on me when I use graphics and video.

Oh, that's interesting. Maybe it has something to do with the upgrade process (e.g. old xorg.conf, upstart or something). Would you mind testing a fresh install of Lucid (in case you did a fresh install of Puppy Linux)?

ilanrab (ilanrab) wrote :

Thank you for the suggestion, gmud. That's a good idea. Unfortunately I do not have the resources to install a fresh copy of Ubuntu on top of the existing Ubuntu partition that I am using. The Puppy Linux distro is running fully off an 8GB pendrive.

Earlier today I installed the Glasen 0.7.7 driver fix for the i855 on my Dell Latitude D400. Both the stable version of the driver and the experimental (exp) version behaved the same way:
1. I could now run Videos (like, with VLC) with no problem. Until this fix, my GPU would lock up immediately, the second I invoked any video. The fix allows video.
2. Now, the GPU locks up when I run certain graphics only. This is an improvement.
I can make it fail consistently when I run Google Chrome with the facebook game: "Mafia Wars". The page that causes the lockup is "NewYork" stage> "properties" page. After less than a minute of pressing various buttons on this page, the GPU locks up. My xorg.conf is the standard one.

Good work, Stefan. Thank you for the fix. Good progress.

denisb (denis-bonnenfant) wrote :

The latest version almost works on my Dell i855 notebook. But there are still occasional freezes when using SolidWorks/wine1.3.5 (big openGL app). If it helps, I can post logs.
On the other hand, under WinXP, SolidWorks is using software OpenGL on this hardware...

Good work! This is the first time I have seen such a big app working under wine on an Intel platform! 3D is really faster than expected, so solving the remaining stability issues would be a real step forward.

Geir Ove Myhr (gomyhr) wrote :

There is yet another new kernel patch in the upstream bug report that has the potential to fix this bug once and for all:
https://bugs.freedesktop.org/show_bug.cgi?id=27187#c281

I don't have the resources (full hard drive and other limitations) to build a Ubuntu kernel package with this patch. If anybody has the possibility to put a test kernel in a PPA (or some other place), that would be nice.

@Geir Ove Myhr
How difficult is it to build an Ubuntu kernel package?
Because I still have the machine that had this problem, but I am not really using it anymore. So it is free for testing. But I fear I already upgraded it to Maverick. (Is there also a way to build the older kernel in Maverick?)

Geir Ove Myhr (gomyhr) wrote :

I have previously used the instructions at https://wiki.ubuntu.com/KernelTeam/GitKernelBuild . Instead of getting the vanilla upstream kernel in step 2, I would use the drm-intel-next head from git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel.git. Then you would need to apply the patch before going on. If you have a faster, newer computer, I would recommend building on that one, because compiling a kernel on computers that have an 855GM chipset tends to be slow. You will need to use the 32-bit version of Ubuntu on that machine, though.

It is probably better to use Maverick than Lucid for testing this. Natty even better. With Natty, you should be able to steal most of the kernel configuration from the drm-intel-next mainline build in step 4 (see https://wiki.ubuntu.com/Kernel/MainlineBuilds).

I haven't done this in quite a while, so I don't remember all there is to it.

Brian Rogers (brian-rogers) wrote :

I've made a kernel with the latest fix available in this PPA:
https://launchpad.net/~brian-rogers/+archive/graphics-fixes-testing

If it gets good feedback, I'll copy it to my regular graphics-fixes PPA.

This kernel is based on Natty's 2.6.37 kernel, has the changes in drm-intel-next applied, and the patch at https://bugs.freedesktop.org/show_bug.cgi?id=27187#c291 is added. I've produced builds for Lucid, Maverick, and Natty. It has a cache coherency checker, so you'll see periodic messages in dmesg reporting on the number of flushes and whether any problems are detected.

From Daniel Vetter's patch description:
> Poke HIC bit + wbinv + cache coherency checker
>
> Chris Wilson's latest patch with my cache coherency checker added. Spills the
> number of chipset flushes regularly into the dmesg and bails loudly if one
> fails.
>
> Tested-by lines (like for the previous patch attempts by me) highly welcome.

Feedback about the patch can be sent directly to the upstream bug report at
https://bugs.freedesktop.org/show_bug.cgi?id=27187

If you have issues relating to installing/booting this kernel, report them here.

fossfreedom (fossfreedom) wrote :