Bug #311895 “[i945] spontaneous black screen (major pipe-A under...” : Bugs : linux package : Ubuntu

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2008-12-28:

#1

Created an attachment (id=21511)
intel_reg_dump when working

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2008-12-28:

#2

Created an attachment (id=21512)
intel_reg_dump after going black

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2008-12-28:

#3

Created an attachment (id=21513)
Xorg.0.log

X.org log which shows the plethora of pipe-A underruns.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2008-12-28:

#4

When this happens, I get the following kernel messages:

Dec 28 10:25:07 tick kernel: [ 5559.025081] mtrr: no MTRR for d0000000,10000000 found
Dec 28 10:25:08 tick kernel: [ 5560.478087] apm: BIOS not found.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2008-12-28:

#5

$ diff -U 0 intel_regs.works.txt intel_regs.black.txt
--- intel_regs.works.txt 2008-12-28 10:33:22.000000000 +0100
+++ intel_regs.black.txt 2008-12-28 10:24:50.000000000 +0100
@@ -57 +57 @@
-(II): PIPEASTAT: 0x00000203 (status: VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
+(II): PIPEASTAT: 0x80000000 (status: FIFO_UNDERRUN)
@@ -132 +132 @@
-(II): FBC_CONTROL: 0x43e847e2
+(II): FBC_CONTROL: 0xc3e847e2
@@ -134 +134 @@
-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000
@@ -138 +138 @@
-(II): MI_MODE: 0x00000200
+(II): MI_MODE: 0x00000000
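
For anyone who wants to repeat this comparison on their own dumps, here is a minimal Python sketch that parses two register dumps and lists which bit positions flipped in each register. It assumes only the `(II): NAME: 0xVALUE` line layout visible in the diff above; the helper names `parse_dump` and `changed_bits` are made up for illustration and are not part of intel_reg_dumper. What the flipped FBC_CONTROL, FBC_STATUS and MI_MODE bits mean is not established in this report; only PIPEASTAT is decoded by the tool itself.

```python
import re

# Matches lines such as "(II): FBC_CONTROL: 0x43e847e2"; any decoded
# text that follows the hex value is ignored.
REG_LINE = re.compile(r"\(II\):\s+(\w+):\s+(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Return {register name: integer value} for every matching line."""
    return {m.group(1): int(m.group(2), 16) for m in REG_LINE.finditer(text)}


def changed_bits(before, after):
    """Map each register whose value differs to the flipped bit numbers."""
    result = {}
    for name in sorted(before.keys() & after.keys()):
        delta = before[name] ^ after[name]
        if delta:
            result[name] = [bit for bit in range(32) if (delta >> bit) & 1]
    return result


# Values copied from the diff in this comment.
WORKS = """\
(II): PIPEASTAT: 0x00000203 (status: VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
(II): FBC_CONTROL: 0x43e847e2
(II): FBC_STATUS: 0x20000000
(II): MI_MODE: 0x00000200
"""
BLACK = """\
(II): PIPEASTAT: 0x80000000 (status: FIFO_UNDERRUN)
(II): FBC_CONTROL: 0xc3e847e2
(II): FBC_STATUS: 0x60000000
(II): MI_MODE: 0x00000000
"""

for name, bits in changed_bits(parse_dump(WORKS), parse_dump(BLACK)).items():
    print(name, bits)
```

With real dumps, the two strings would be replaced by the contents of intel_regs.works.txt and intel_regs.black.txt.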

Martin Pitt (pitti) wrote on 2008-12-28:

#6

Binary package hint: xserver-xorg-video-intel

I am using a Dell Latitude D430 with an Intel GM945.

When I use my external 19" TFT (through DVI, 1280x1024), I occasionally get a black screen. This is not triggered by anything obvious; it just happens spontaneously. It is impossible to recover from this by restarting X; only a reboot cures it.

Further investigation shows that this is caused by a long series of

(EE) intel(0): underrun on pipe A!

errors (some 10,000 lines in the log). I get short underruns pretty often, which result in the screen flickering for a split second, but when the long series happens, the screen stays black forever.
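
To tell the short flicker bursts apart from the long series in a log, something like the following Python sketch could be used. It groups consecutive underrun messages into runs and reports their lengths. The message text is taken from the log excerpt above; the function name `underrun_runs` and the non-underrun placeholder lines in the sample are invented for illustration.

```python
UNDERRUN = "(EE) intel(0): underrun on pipe A!"


def underrun_runs(lines):
    """Return the lengths of consecutive runs of underrun messages."""
    runs = []
    current = 0
    for line in lines:
        if UNDERRUN in line:
            current += 1
        elif current:
            # A different message ends the current run.
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Hypothetical sample; the "placeholder" lines stand in for any other log line.
sample = [
    "placeholder: unrelated log line",
    UNDERRUN,
    "placeholder: unrelated log line",
    UNDERRUN,
    UNDERRUN,
    UNDERRUN,
]

runs = underrun_runs(sample)
print("runs:", runs, "longest:", max(runs))
```

On a real system the input would be the lines of the X.org log (for example `open("/var/log/Xorg.0.log")`, assuming the log lives at that usual path); a run of thousands of lines would correspond to the permanent black screen described here.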

A comparison of the intel_reg_dump output (-: works, +: black screen) confirms this as well:

-(II): PIPEASTAT: 0x00000203 (status: VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
+(II): PIPEASTAT: 0x80000000 (status: FIFO_UNDERRUN)

I have not observed this behaviour when I use the laptop undocked, with the internal screen (1280x800).

This is current Jaunty with -intel 2:2.5.1-1ubuntu7. It also happened in Intrepid, but back then I didn't know about intel_reg_dumper.

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: xserver-xorg-video-intel 2:2.5.1-1ubuntu7
ProcEnviron:
PATH: custom, user
LANG=de_DE.UTF-8
SHELL=/bin/bash
ProcVersion: Linux version 2.6.28-4-generic (buildd@palmer) (gcc version 4.3.3 20081217 (prerelease) (Ubuntu 4.3.2-2ubuntu9) ) #5-Ubuntu SMP Fri Dec 26 22:48:51 UTC 2008

SourcePackage: xserver-xorg-video-intel
Uname: Linux 2.6.28-4-generic i686
xkbcomp:

Martin Pitt (pitti) wrote on 2008-12-28:

#7

Dependencies.txt (4.0 KiB, text/plain; charset="utf-8")
LsMod.txt (1.8 KiB, text/plain; charset="utf-8")
LsPci.txt (10.2 KiB, text/plain; charset="utf-8")
XorgLog.txt (50.2 KiB, text/plain; charset="utf-8")
XorgLogOld.txt (420.0 KiB, text/plain; charset="utf-8")
Xrandr.txt (5.9 KiB, text/plain; charset="utf-8")
setxkbmap.txt (285 bytes, text/plain; charset="utf-8")
xdpyinfo.txt (2.8 KiB, text/plain; charset="utf-8")

Martin Pitt (pitti) wrote on 2008-12-28:

#8

intel_regs.works.txt (7.7 KiB, text/plain)

This is the register dump when it works.

The pipe-A underruns can be seen in XorgLogOld.txt.

dmesg output when this happens:
Dec 28 10:25:07 tick kernel: [ 5559.025081] mtrr: no MTRR for d0000000,10000000 found
Dec 28 10:25:08 tick kernel: [ 5560.478087] apm: BIOS not found.

Martin Pitt (pitti) wrote on 2008-12-28:

#9

intel_regs.black.txt (7.7 KiB, text/plain)

This is the register dump after it goes black.

Bug Watch Updater (bug-watch-updater) on 2008-12-28

Changed in xserver-xorg-video-intel:
status:	Unknown → Confirmed

reini (rrumberger) wrote on 2008-12-31:

#10

I have what appears to be the same problem with Hardy on my Dell Inspiron 1525 with an Intel GM965 using the internal LCD screen (1280x800).
My Hardy installation is up-to-date and my xserver-xorg-video-intel version is 2:2.2.1-1ubuntu13.8.
I have apport installed and running, but there seem to be no files in /var/crash/.

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-01-06:

#11

I think this might be a DUP, can you try the patch in 18651?

*** This bug has been marked as a duplicate of bug 18651 ***

Bryce Harrington (bryce) on 2009-01-06

Changed in xserver-xorg-video-intel:
status:	New → Confirmed

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-01-07:

#12

I'm building/installing -intel with this patch applied. I'll report back in a day or two, since the underruns only start to happen after a couple of hours (presumably when I'm doing particular things with my computer, but I'm not able to pinpoint what triggers it).

Thanks!

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-01-07:

#13

I found out that starting kvm and doing some other window juggling triggers the quick underrun (i.e. the flickering, not the total blackout) pretty reliably.

With the proposed patch applied, I still get underruns, though. I'll let it run for a couple of days to see whether I get any black screen still.

Martin Pitt (pitti) wrote on 2009-01-07:

#14

I'm building an -intel package with the patch in https://bugs.freedesktop.org/show_bug.cgi?id=18651 applied, and will let it run for two days. (The flicker underruns only seem to happen after some hours, and the complete blackout only happens every other day).

Martin Pitt (pitti) wrote on 2009-01-07:

#15

Hm, if only I could actually build it..

../doltcompile gcc -DHAVE_CONFIG_H -I. -I../../src -I.. -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -I/usr/include/X11/dri -I../../uxa -DI830_XV -DI830_USE_XAA -DI830_USE_EXA -Wall -g -O2 -MT i830_dri.lo -MD -MP -MF .deps/i830_dri.Tpo -c -o i830_dri.lo ../../src/i830_dri.c
../../src/i830_dri.c: In function 'I830DRISwapContext':
../../src/i830_dri.c:1162: error: 'drm_i915_flip_t' undeclared (first use in this function)
../../src/i830_dri.c:1162: error: (Each undeclared identifier is reported only once
../../src/i830_dri.c:1162: error: for each function it appears in.)
../../src/i830_dri.c:1162: error: expected ';' before 'flip'
../../src/i830_dri.c:1168: error: 'flip' undeclared (first use in this function)

I cannot find "drm_i915_flip_t" anywhere. I also downgraded linux-libc-dev to -3.4 (which the current jaunty .deb was built with). NB that this is not due to the patch I was attaching; that was already existing code. I grepped /usr/include/ and the source tree, nothing.

Martin Pitt (pitti) wrote on 2009-01-07:

#16

I worked around this by explicitly adding

typedef struct drm_i915_flip {
    int pipes;
} drm_i915_flip_t;

(Copied from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=ba55ff15df974197bebd871e28bb96d817ae41c7)

Martin Pitt (pitti) wrote on 2009-01-07:

#17

I found out that starting kvm and doing some other window juggling triggers the quick underrun (i.e. the flickering, not the total blackout) pretty reliably. With the upstream patch applied, I still get underruns, though. I'll let it run for a couple of days to see whether I get any black screen still.

Søren Holm (sgh) wrote on 2009-01-07:

#18

log of underrun on pipe A (103.5 KiB, application/x-trash)

It happens for me 4 times a day.

Wolfgang (wt-lists) wrote on 2009-01-11:

#19

I have the same problem on an Apple Mac Mini (Intel i945): the screen flickers from time to time and even "freezes" occasionally. I can switch to a console with Ctrl+Alt+F1, but to bring X back to life the system must be rebooted.

Xlog says:
...
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
...

The problem has existed since the update to 8.10; 8.04 was stable...

Raghu (raghua1111+list) wrote on 2009-01-16:

#20

I installed a fresh Ubuntu 8.10 on an Eee Box 202 (Atom, 945GME) a couple of days back and am seeing the exact 'blank screen' problem with "underrun on pipe A!" errors. The Xorg log looks pretty much the same.

Bug Watch Updater (bug-watch-updater) on 2009-01-20

Changed in xserver-xorg-video-intel:
status:	Confirmed → Invalid

Bryce Harrington (bryce) on 2009-01-23

description: updated

Raghu (raghua1111+list) wrote on 2009-01-23:

#21

Martin,

Can we have access to the package you built? I will try it out on my Eee Box.

This is the biggest problem I see with my Ubuntu currently. I am used to not having to reboot for months.
Thanks.

Raghu (raghua1111+list) wrote on 2009-01-23:

#22

Martin,

I just saw your comment over at https://bugs.freedesktop.org/show_bug.cgi?id=18651 that said the patch does not fix the problem.

Is there a less optimal workaround for this? Would I be able to use a generic driver that could be more stable? I don't need 3D performance.

thanks.
Raghu.

rhtme52 (reinhard-enders) wrote on 2009-01-28:

#23

Happened to me reliably with my Acer Aspire One 110L, when watching movies with mplayer (usually after 5 - 15 minutes). In my case switching off framebuffer compression worked (See also https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/193419):

Section "Device"
        Identifier "Configured Video Device"
        Driver "intel"
        Option "FramebufferCompression" "off"
EndSection
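
To check whether a given xorg.conf actually carries this option, and in which section, a small Python sketch like the one below may help. It is a simplified reader written for this illustration, not the X server's own parser: it only recognises top-level `Section`/`EndSection` lines, `#` comments, and two-argument `Option` lines, and it compares option names case-insensitively. The name `find_option` is hypothetical.

```python
import re

SECTION = re.compile(r'Section\s+"([^"]+)"', re.IGNORECASE)
END_SECTION = re.compile(r"EndSection\b", re.IGNORECASE)
OPTION = re.compile(r'Option\s+"([^"]+)"\s+"([^"]*)"', re.IGNORECASE)


def find_option(conf_text, name):
    """Return a list of (section, value) pairs for each matching Option line."""
    found = []
    section = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if END_SECTION.match(line):
            section = None
        elif (m := SECTION.match(line)):
            section = m.group(1)
        elif (m := OPTION.match(line)) and m.group(1).lower() == name.lower():
            found.append((section, m.group(2)))
    return found


# The Device section quoted in this comment.
CONF = '''
Section "Device"
        Identifier "Configured Video Device"
        Driver "intel"
        Option "FramebufferCompression" "off"
EndSection
'''

print(find_option(CONF, "FramebufferCompression"))
```

An empty list means the option is not set anywhere the sketch can see.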

Bryce Harrington (bryce) on 2009-01-28

Changed in xserver-xorg-video-intel:
status:	Invalid → Unknown
importance:	Undecided → High
status:	Confirmed → Triaged

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-01-28:

#24

Looks separate from 18651 unfortunately.

Raghu (raghua1111+list) wrote on 2009-01-28:

#25

Thanks for the info. I am trying out the config now and will see how it works over multiple days. I was using the VESA driver as the workaround for this issue.

Bryce, thanks for increasing priority. This affects a lot of users.

Bug Watch Updater (bug-watch-updater) on 2009-01-28

Changed in xserver-xorg-video-intel:
status:	Unknown → Confirmed

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-01-29:

#26

I have used the suggestion in https://bugs.launchpad.net/bugs/311895
since yesterday (Option "FramebufferCompression" "off"), and that *seems* to do
the trick. I want to test it a little longer before fully confirming,
especially since the most recent X.org stopped logging the underruns in
Xorg.0.log, and I got too used to the occasional screen flicker, so I might
well have ignored them.

But my screen went black (or brown, or white) irrecoverably after a day or two
without that option. If that doesn't happen any more either, I'll report back here.

Raghu (raghua1111+list) wrote on 2009-01-29:

#27

Looks like "Assigned To" field should change from 'freedesktop-bugs #18651' to 'freedesktop-bugs #19304' since 19304 is not a duplicate of 18651 anymore.

Martin Pitt (pitti) on 2009-01-29

Changed in xserver-xorg-video-intel:
status:	Confirmed → Unknown

Bug Watch Updater (bug-watch-updater) on 2009-01-29

Changed in xserver-xorg-video-intel:
status:	Unknown → Confirmed

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-01-30:

#28

Two days have passed with Option "FramebufferCompression" "off", and I didn't notice a single flickering, nor encounter another black screen. Thus I'm fairly sure that this is at least a very good (if not perfect) workaround for the problem, and might also point to the root cause.

Just reiterating that I never ever observed those problems with the internal LVDS (1280x800), just with the external TFT (1280x1024).

</facts>

<wild and unqualified speculations>
Could it be that compressing the framebuffer just occasionally takes too long, once it gets bigger than a critical threshold (which lies somewhere in between 1280x800 and 1280x1024 pixels)? Any idea why it would sometimes not recover from this at all any more, perhaps if it takes too long, and it cannot 'catch up' any more?

Thanks!

In freedesktop.org Bugzilla #19304, Dave-justdave (dave-justdave) wrote on 2009-01-30:

#29

That mirrors my experience, too.

I'm on a Mac Mini with a GM945 video... using the TV-out at 1024x768 for several months I never had any issues, and when I changed to using DVI->HDMI output on it at 1280x720, I started getting the solid color screen really frequently. Disabling the FramebufferCompression about three weeks ago did make the machine usable again. I've run the thing for 5 or 6 hours per day on a daily basis (I have it hooked up to a TV using MythTV on it), and although I have still gotten that solid color screen since then, it's only happened once in all that time (as opposed to every 5 or 10 minutes before). I was getting that periodic flicker before, too, and that's infrequent enough that I don't notice it anymore if it's still happening at all.

Jean-Paul Calderone (exarkun) wrote on 2009-02-03:

#30

I can add another data point for the "FramebufferCompression" "Off" fix. I have a Mac Mini. 8.10 is the first release to come even close to being able to drive a display from it. I've been experiencing screen flickering and black screens as described in the initial report as well. mplayer triggers this behavior more quickly than anything else I've found, usually blacking the screen and requiring a reboot within 5 or 10 minutes of video playing. Other media players trigger the behavior too, but less frequently. Everything causes flickering to happen now and then - including non-media playing, like browsing around the filesystem with nautilus. I also get frequent pipe underrun reports in my X log file.

After adding `Option "FramebufferCompression" "off"` to my xorg.conf (as described by rhtme52 above), the system has been stable for several days. The flickering is gone and the display hasn't required a reboot to unblack it since.

Raghu (raghua1111+list) wrote on 2009-02-10:

#31

Another happy customer with the "FramebufferCompression" "off". No flicker and no need to reboot for last two weeks.

I am willing to try any proposed patches.

reini (rrumberger) wrote on 2009-02-13:

#32

I had to add the "FramebufferCompression" "off" option to the monitor section (see below). While the issue sometimes only occurred when watching two videos in parallel and making some overlay (KDE kicker tooltip) appear over them, it doesn't occur anymore even when watching 4 videos + overlay. IOW, issue closed for me. (Kubuntu 8.04.2 BTW)

Section "Monitor"
        Identifier "Configured Monitor"
        Option "FramebufferCompression" "off"
        Option "Ignore" "false"
EndSection

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-02-13:

#33

In #18491 there's a patch (https://bugs.freedesktop.org/attachment.cgi?id=22319) to mess with the FIFO watermark values that might help. But more than that, it includes a patch to dump the FIFO watermark regs to the intel_reg_dumper tool. Can someone apply it and capture a reg dump both before and after starting X on their machine with the patch applied?

The spontaneous black screen is almost surely caused by a series of pipe underruns. That generally happens if our memory arbitration settings are off (so a given pipe can't get its pixels due to some other pipe hogging the memory interface) or the FIFO watermark regs being incorrect (we fetch a new chunk of pixels too late and end up missing our window of time to feed them to the pipe).

The framebuffer compression hardware periodically compresses the framebuffer into a private section of memory (the compressed buffer), temporarily increasing memory activity; it could be that we're not accounting for that in the FIFO settings, so the screen goes black after the first compression pass (which is usually after about 15s iirc).
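
The mechanism described here can be illustrated with a deliberately crude Python toy model. It is not a model of the i945 hardware: every name and number is invented for illustration. The display drains one unit per tick, a fetch is requested once the FIFO level falls to `refill_threshold`, and the data arrives `latency` ticks later. A tick on which the FIFO is empty counts as an underrun. How this knob relates to the actual FWATER_BLC register encoding is not established in this report.

```python
def count_underruns(fifo_size, refill_threshold, latency, burst, ticks):
    """Count ticks on which the toy display FIFO has nothing to scan out."""
    level = fifo_size   # start with a full FIFO
    arrival = None      # tick at which the outstanding fetch lands, if any
    underruns = 0
    for tick in range(ticks):
        # Deliver a fetch that has finished travelling through memory.
        if arrival is not None and tick >= arrival:
            level = min(fifo_size, level + burst)
            arrival = None
        # Request more pixels once the level has dropped to the threshold.
        if arrival is None and level <= refill_threshold:
            arrival = tick + latency
        # The pipe consumes one unit per tick, or underruns if none is left.
        if level == 0:
            underruns += 1
        else:
            level -= 1
    return underruns


# Fetch requested early enough: the FIFO never runs dry.
print(count_underruns(fifo_size=16, refill_threshold=8, latency=4, burst=8, ticks=100))
# Fetch requested too late for the same memory latency.
print(count_underruns(fifo_size=16, refill_threshold=2, latency=4, burst=8, ticks=100))
# Same threshold as the first case, but memory is busier (higher latency).
print(count_underruns(fifo_size=16, refill_threshold=8, latency=12, burst=8, ticks=100))
```

The third call is the analogue of the hypothesis in this comment: settings that are fine under normal load can start underrunning once extra memory traffic (here modelled purely as added latency) is not accounted for.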

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#34

I enabled FB compression again and applied the patch in bug 18491. It had quite a dramatic regressive effect: the screen now flickers at each hard disk access, mouse movement, or key press, and only stands still if absolutely nothing happens.

I captured the registers right after boot, then after X and gdm started, and finally after GNOME was fully running.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#35

Created an attachment (id=22944)
regs with patch from #18491: right after boot

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#36

Created an attachment (id=22945)
regs with patch from #18491: after X and gdm start

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#37

Created an attachment (id=22946)
regs with patch from #18491: GNOME fully running

That's the watermark change you asked for:

--- boot-nox.regs 2009-02-14 15:49:55.000000000 +0100
+++ boot-gdm.regs 2009-02-14 15:50:15.000000000 +0100
@@ -31,2 +31,2 @@
-(II): FWATER_BLC: 0x03060106
-(II): FWATER_BLC2: 0x00000306
+(II): FWATER_BLC: 0x033f033f
+(II): FWATER_BLC2: 0x0000033f

It doesn't change any further after starting GNOME (which does xrandr stuff, etc.) Other registers do change during GNOME startup, though.

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-02-14:

#38

Heh, I think I had the watermark regs backwards... I'll have to spin a new patch, but you could try changing the watermark value in the patch in the meantime:

watermark = (3 << 8) | 0x3f

should instead be something like

watermark = (3 << 8) | 1
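
As a quick sanity check of how these two expressions relate to the register dumps in this thread, here is a small Python sketch. It assumes, based only on the dumped values and not on the patch source, that the same 16-bit watermark value ends up in both halves of FWATER_BLC; the function names are hypothetical.

```python
def pack_watermark(high, low):
    """Evaluate the expression from the comment: (high << 8) | low."""
    return (high << 8) | low


def fwater_blc(watermark):
    """ASSUMPTION: the 16-bit value is written to both halves of the register."""
    return (watermark << 16) | watermark


old = pack_watermark(3, 0x3F)  # value in the first version of the patch
new = pack_watermark(3, 1)     # suggested replacement

print(f"old: watermark={old:#06x} FWATER_BLC={fwater_blc(old):#010x}")
print(f"new: watermark={new:#06x} FWATER_BLC={fwater_blc(new):#010x}")
```

Under that assumption the old expression reproduces the 0x033f033f that Martin's dump showed with the first patch, and the new one yields 0x03010301.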

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#39

I did that change, much better. :-) It doesn't flicker so badly any more, and the watermark reg diff is now

$ diff -U 0 boot-nox.regs boot-gnome.regs |grep WATER
-(II): FWATER_BLC: 0x03060106
-(II): FWATER_BLC2: 0x00000306
+(II): FWATER_BLC: 0x03010301
+(II): FWATER_BLC2: 0x00000301

I have to run now, so I can't do the full test which triggers the original underrun; will report back tomorrow or Monday.

Thank you so far and have a nice weekend!

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#40

Created an attachment (id=22956)
regs with fixed patch from #18491: right after boot

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#41

Created an attachment (id=22957)
regs with fixed patch from #18491: GNOME fully running

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-14:

#42

OK, I now threw kvm, glxinfo, and totem at it, all running at the same time, and not a single flicker. No watermark difference in the registers.

Great work, thanks!

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-02-14:

#43

Great, thanks a lot for testing. Dave, does this change also help your situation?

In freedesktop.org Bugzilla #19304, Raghu (raghua1111+list) wrote on 2009-02-14:

#44

Thanks for the new patch.

Martin,

Do you have a pointer on how to build the new driver with the patch for 8.10 (Intrepid)?

Or if someone could post a link to the binary driver, that would be great! I can just replace the original intel driver with this.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-15:

#45

Raghu,

I ported the patch to the intrepid version (2.4.1), will attach in a bit.

To make testing easier for everybody, I also uploaded it to my personal package archive, so that you can grab the ready-built .deb from there, or just add the new apt source:

https://launchpad.net/~pitti/+archive/ppa

Be warned, though, I didn't test it. In the (unlikely) event that it totally screws up your system, please boot with the "text" kernel command line option in grub, log in at Ctrl+Alt+F1, and do

sudo apt-get install xserver-xorg-video-intel/intrepid-updates

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-15:

#46

Created an attachment (id=22959)
patch ported to 2.4.1

Martin Pitt (pitti) wrote on 2009-02-15:

#47

I ported the patch proposed upstream to the intrepid version (2.4.1).

To make testing easier for everybody, I also uploaded it to my personal package archive, so that you can grab the ready-built .deb from there, or just add the new apt source:

https://launchpad.net/~pitti/+archive/ppa

Be warned, though, I didn't test it. In the (unlikely) event that it totally screws up your system, please boot with the "text" kernel command line option in grub, log in at Ctrl+Alt+F1, and do

sudo apt-get install xserver-xorg-video-intel/intrepid-updates

Martin Pitt (pitti) wrote on 2009-02-15:

#48

Oh, just to be clear: I did test the patch quite extensively in Jaunty (where it works really well for me on my GMA945), just not in Intrepid.

Also, when you test this, please remove all "FramebufferCompression", "Ignore" and other options from xorg.conf which you added to workaround this bug.

Changed in xserver-xorg-video-intel:
assignee:	nobody → pitti

In freedesktop.org Bugzilla #19304, Raghu (raghua1111+list) wrote on 2009-02-15:

#49

Thanks Martin, for making it really easy to install!

I am currently running xserver-xorg-video-intel-2.4.1-1ubuntu10.4~test1 from your repository. So far so good. I am using the original xorg.conf, which does not have any options set for the device.

In freedesktop.org Bugzilla #19304, Dave-justdave (dave-justdave) wrote on 2009-02-15:

#50

The deb will make it easy for me to test, too, thanks! It'll probably be tomorrow before I can get to it, though.

Raghu (raghua1111+list) wrote on 2009-02-16:

#51

Thanks Martin, for making it really easy to install!

I am currently running xserver-xorg-video-intel-2.4.1-1ubuntu10.4~test1 from your repository. So far so good. I am using the original xorg.conf, which does not have any options set for the device.

Raghu (raghua1111+list) wrote on 2009-02-16:

#52

Update with 0.4~test1: I am seeing screen flickering but haven't seen a blank screen yet (one day).

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-16:

#53

Created an attachment (id=23007)
regs with fixed patch from #18491: after hibernate

Ugh, after a hibernate/resume cycle the flickering is back. I have never seen it any more when not using hibernate (didn't test suspend, it's currently broken).

The watermark registers did not change, though.

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-02-17:

#54

Hm, so the regs look ok after resume but you see flickering? That sounds bad; it means there may be another reg we've got to write to get things working again.

In freedesktop.org Bugzilla #19304, Dave-justdave (dave-justdave) wrote on 2009-02-17:

#55

(In reply to comment #25)
> or just add the new apt source:
>
> https://launchpad.net/~pitti/+archive/ppa

How do I add this source? If I go to that URL and follow the directions given on that page, I should add this to my sources.list:

deb http://ppa.launchpad.net/pitti/ppa/ubuntu intrepid main

But after doing that, I get a 404 error trying to retrieve Packages.gz

I just downloaded the package by hand for now, will let you know how it goes.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-17:

#56

@Dave: Weird, that should be the correct URL. I just tried it here, and it works.

In freedesktop.org Bugzilla #19304, Raghu (raghua1111+list) wrote on 2009-02-18:

#57

Dave, just in case: make sure you don't have an 's' after 'http'.

'deb http://ppa.launchpad.net/pitti/ppa/ubuntu intrepid main' worked for me as well.

The Synaptic Package Manager complains about either a lack of or a mismatch of signatures, but the repo works.
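
For double-checking such an entry before pasting it into sources.list, a rough Python sketch follows. It handles only the simple one-line form used in this thread (no bracketed options, at least one component); `parse_deb_line` is a name made up for this example and has nothing to do with apt's own parser.

```python
def parse_deb_line(line):
    """Split a simple one-line sources.list entry into its parts."""
    parts = line.split()
    if len(parts) < 4 or parts[0] not in ("deb", "deb-src"):
        raise ValueError(f"not a simple deb line: {line!r}")
    return {
        "type": parts[0],
        "uri": parts[1],
        "suite": parts[2],
        "components": parts[3:],
    }


entry = parse_deb_line("deb http://ppa.launchpad.net/pitti/ppa/ubuntu intrepid main")

# The scheme is the part of the URI before "://".
scheme = entry["uri"].split("://", 1)[0]
print(scheme, entry["suite"], entry["components"])
```

Comparing `scheme` against "http" catches the stray 's' mentioned above.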

In freedesktop.org Bugzilla #19304, Dave-justdave (dave-justdave) wrote on 2009-02-18:

#58

Huh. I tried again just now and it worked. Maybe I just caught it at a bad time during a repo refresh or something before.

Anyhow, I installed the deb manually yesterday, and I've been running the thing most of the day, with the workaround hacks removed from xorg.conf (so it just has the default "detect everything" settings again). No screen blankouts yet. I did just get a flicker, though, just before typing this (I quit out of MythTV so I could run Synaptic and try the repo source add again, the flicker happened right after MythTV quit). The flicker did have the corresponding:

(EE) intel(0): underrun on pipe A!

in Xorg.0.log

It's the one and only occurrence of that error in the log since Xorg was restarted this morning. I'm not sure how to check the registers that were mentioned.

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-02-18:

#59

If it was just one flicker when you exited MythTV that might be normal, if a mode set or pipe on/off sequence occurred.

Anyway sounds like we have at least this part of the problem narrowed down; I'll put together a patch for the 2.7 release.

In freedesktop.org Bugzilla #19304, Dave-justdave (dave-justdave) wrote on 2009-02-21:

#60

Been running this for a few days now, and no further issues so far. Looks like it fixes it for me.

In freedesktop.org Bugzilla #19304, Dave-justdave (dave-justdave) wrote on 2009-02-21:

#61

Oh, and I haven't tested Martin's situation from comment 29... I've never had reason to suspend or hibernate this thing.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-23:

#62

Jesse, do you think that http://bugs.freedesktop.org/attachment.cgi?id=22319 plus the "0x3f -> 1" fix is good for uploading? I'd like to get it some more testing exposure, but I'm not sure whether this was just a test patch and needs to be redone for public consumption?

Thank you!

Martin Pitt (pitti) wrote on 2009-02-25:

#63

I'll apply the upstream patch for the time being. It has been successfully tested by several people on both intrepid and jaunty.

Changed in xserver-xorg-video-intel:
status:	Triaged → Fix Committed

Martin Pitt (pitti) wrote on 2009-02-25:

#64

Raghu: see the upstream bug report; the patch is not perfect yet, and isolated short flickering can still happen, but with this patch it should occur only very seldom, and hopefully the "black screen" issue will be fixed completely. There might still be a similar problem after suspend/hibernate, though.

Launchpad Janitor (janitor) wrote on 2009-02-25:

#65

This bug was fixed in the package xserver-xorg-video-intel - 2:2.6.1-1ubuntu3

---------------
xserver-xorg-video-intel (2:2.6.1-1ubuntu3) jaunty; urgency=low

  * Add 02_i830-fifo-watermark-conservative.patch: Avoid pipe underruns on
    high graphics activity, which caused flicker and sometimes complete screen
    corruption. (LP: #311895, fd.o #19304)

-- Martin Pitt <email address hidden> Wed, 25 Feb 2009 08:26:35 +0100

Changed in xserver-xorg-video-intel:
status:	Fix Committed → Fix Released

Søren Holm (sgh) wrote on 2009-02-26:

#66

I don't think it is fixed. I got a black screen today with a default configuration. No framebuffercompression or anything. The logs did not say anything though. No errors or anything.

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-02-27:

#67

Hm, for a few days now I get the screen flicker immediately, even after a clean boot and no suspend, etc. Odd, I was running Jaunty with this patch for over a week without a single glitch; apparently something else changed in the system now (newer X.org, kernel updates, etc.)

$ sudo intel_reg_dumper |grep WATER
(II): FWATER_BLC: 0x03010301
(II): FWATER_BLC2: 0x00000301

Martin Pitt (pitti) wrote on 2009-02-27:

#68

Indeed, starting from yesterday or so I now immediately get screen flicker, worse than before. Odd, I was running Jaunty with this patch for over a week without a single glitch; apparently something else changed in the system now :(

Changed in xserver-xorg-video-intel:
status:	Fix Released → Confirmed

Martin Pitt (pitti) wrote on 2009-02-27:

#69

Original sponsoring was done, throwing the ball back to upstream (I replied there as well and will continue to follow up).

Changed in xserver-xorg-video-intel:
assignee:	pitti → nobody

Raghu (raghua1111+list) wrote on 2009-02-27:

#70

Martin,

I have seen flicker as well from the start of using your package. This is an Eee Box and there was no suspend or hibernation involved. The flicker used to happen pretty much like before the patch, at random times. I haven't seen a blank screen, though.

Søren Holm (sgh) wrote on 2009-03-04:

#71

dmesg says this after blackout

[ 737.891632] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[ 744.702207] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0

Kavok (cjc-brimd) wrote on 2009-03-08:

#72

xorglog.txt.zip (7.6 KiB, application/zip)

I believe I am having the same issue, but I am not sure. It is occurring on an Asus Eee Box, which is the desktop version of the Eee line.

I have a thread here:
http://ubuntuforums.org/showthread.php?t=1084060&highlight=intel+underrun+pipe+A!

Attached is the Xorg log.

Revision history for this message

Søren Holm (sgh) wrote on 2009-03-25:

#73

Over the last couple of days I have experienced more of these "*ERROR* trying to get vblank count for disabled pipe 0" messages, always ending up with a dead screen.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-03-30:

#74

There's a patch in 18651 that might also help (it's more proper at least, not like the hack I posted here). Can you give it a try?

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-03-31:

#75

I applied the latest patch (http://bugs.freedesktop.org/attachment.cgi?id=24375) to 2.6.3 (what we have in Ubuntu Jaunty).

I threw everything at it that I could find: running glxgears under a load of 6.6 (with an rsync and jigdo in the background), playing a fullscreen video while booting a live system under kvm, suspend/resume; everything works. I haven't seen a single flicker so far. This even fixes the flickering of glxgears when running under EXA (we didn't switch to UXA in Ubuntu yet, since it still causes too many crashes and problems).

I will run with this patch for a while, to see the long-term behaviour. Before, I got the flickering/hang after running for some hours, or some time after suspend (see bug 20520). Perhaps bug 20520 is even just another consequence of this one, although it happened even with FramebufferCompression off.

I'll report back in a couple of days with the long-term results.

Kudos, Jesse! You made my day!

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-03-31:

#76

Just got the hang after suspend again (bug 20520), so that is independent after all.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-03-31:

#77

(In reply to comment #42)
> Just got the hang after suspend again (bug 20520), so that is independent after
> all.

Hm that could be one of the other suspend/resume related bugs we have open at the moment. It could also be due to some missing bits I posted a patch for in 18702. Care to try that out?

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-03-31:

#78

A first quick shot at trying that patch left me with a ton of rejections (tried to apply against linux 2.6.28.8, with some ubuntu modifications). I'll try again later, but this might take a while.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-03-31:

#79

As for this bug, I have used this latest patch for several hours now, with no problem whatsoever. Thanks muchly!

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-03-31:

#80

Great, thanks a lot for testing, Martin. I'll push as soon as I get some review on intel-gfx.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-03-31:

#81

*** Bug 18651 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-01:

#82

Created an attachment (id=24431)
registers with latest version 5 patch

Argh, this is haunting me. With the latest patch applied, it was working perfectly yesterday, but just now the flickering is back. No suspend involved.

I attach the current registers; do you see anything wrong there?

Thanks!

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-01:

#83

Upstream proposed a new patch which is more robust and dynamic. I have tested it since yesterday with great success. This patch is currently undergoing review upstream.

I have uploaded the new driver to my PPA (https://launchpad.net/~pitti/+archive), testing appreciated.

Changed in xserver-xorg-video-intel (Ubuntu):
assignee:	nobody → pitti
status:	Confirmed → In Progress

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-01:

#84

Created an attachment (id=24438)
Xorg.log with patch 5 and flicker

I'm attaching current Xorg log as well, since it has a couple of messages like

(II) intel(0): Setting FIFO watermarks - A: 1, B: 37, C: 2, SR 127

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-01:

#85

I updated the package in my PPA to include the fixes from 0ubuntu4. Please test and give some feedback here. It works wonderfully for me, but I'd like to become a bit more confident about it.

If this goes well, I'd like to upload this by the end of the week.

Revision history for this message

Manuel Siggen (manuel-siggen) wrote on 2009-04-01:

#86

I tested the proposed driver (xserver-xorg-video-intel_2.6.3-0ubuntu4pitti1_i386.deb) on an i855 chipset with DRI=false, and didn't notice any degradation. I wasn't impacted by this bug in the first place, though. As a side note, I noticed some visual artifacts (patches of background popping up in front) after a suspend-to-ram, but I already had those artifacts with the previous version.

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-01:

#87

good test result from Adilson Oliveira as well.

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-01:

#88

Argh, and just now I got the flickering/blackout again. This is haunting me.. :-(

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-01:

#89

Continuing discussion/investigation upstream.

Changed in xserver-xorg-video-intel (Ubuntu):
assignee:	pitti → nobody
status:	In Progress → Triaged

Revision history for this message

David Mandala (davidm) wrote on 2009-04-01:

#90

Putting this patch into my T60 laptop has almost made it unusable, the screen flickers constantly, anytime I do anything that updates the screen in any way. I'm going to try and revert to the earlier drivers.

davidm@dm-lappy:~$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
15:00.0 CardBus bridge: Texas Instruments PCI1510 PC card Cardbus Controller
davidm@dm-lappy:~$

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-02:

#91

Just as an early warning, this patch (same .deb that I am running) completely broke matters for a colleague of mine, also on 945GM/GMS. I asked for registers and Xorg.log, will forward as soon as I get it.

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-02:

#92

David, thanks for testing. It is very important to spot such regressions, thus I'd like to forward this upstream. Can you please install this once again, including the corresponding -intel-dbg package, reproduce the "constant flicker" situation, and do

sudo intel_reg_dumper > /tmp/intel-regs.txt

then please attach /tmp/intel-regs.txt and /var/log/Xorg.0.log here? Thank you!

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-02:

#93

Created an attachment (id=24465)
debug logs for monitor/VT state changes

Ah, I know what changed. After a clean boot, with the latest (version 5) patch applied, everything works perfectly for me; the trouble starts when I switch off my monitor and switch it on again (as I usually do during lunch break).

So I looked at dmesg, Xorg log, and registers in three states.

1. After clean boot, and GNOME login. See boot.* files.

2. Switch to VT1 and back. dmesg says

[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1

Registers are wildly different, see diff -U0 boot.registers.txt vtswitch.registers.txt. After waiting for one minute, the registers change further to

-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000

Xorg log gets some 30 info lines, and some interesting warnings:

$ diff -u boot.Xorg.log vtswitch.Xorg.log | grep -v '(II)'
+(WW) intel(0): ESR is 0x00000001, instruction error
+(WW) intel(0): Existing errors found in hardware state.
+(WW) intel(0): plane B needs more FIFO entries

3. Switch off monitor, and turn it on again. Now I get the occasional flickering.

dmesg gets some USB disconnect/connect messages (monitor has USB hub with some stuff), nothing X related.

registers do not change at all.

No new Xorg log entries.

Now, having typed this, it seems to me that switching off the monitor doesn't change much, and that most likely the VT switch is to blame; I will do another test to affirm that I get flickering after VT switch already (I'll report back if that is not the case). Your patch seems to work by and large, but seems to not take VT switches into account correctly.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-02:

#94

Ah I can believe that VT switches might cause trouble... The diff actually doesn't look too interesting though, mainly LVDS is off. However this part definitely does look weird:
(II) intel(0): FIFO entries - A: 25, B: 0
(II) intel(0): FIFO size - A: 28, B: 59
(WW) intel(0): plane B needs more FIFO entries

That FIFO entries line indicates that pipe B is off. Maybe I don't handle that case correctly...

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-02:

#95

Created an attachment (id=24471)
add save/restore of watermark regs across VT switch

Not restoring these across VT switch might be bad... This one leaves the programming alone but takes care to save/restore the regs across VT switch.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-03:

#96

Created an attachment (id=24493)
debug logs for monitor/VT state changes for patch v6

similar debug logs for patch v6 (https://bugs.freedesktop.org/attachment.cgi?id=24471).

Unfortunately the flickering still happens. :-(

Yesterday night I conducted another experiment. I just switched the monitor off and on, without any VT switch. After that I already got the flickering. However, there was no change at all in registers, Xorg.log, or dmesg. (This was with the previous patch version 5, though).

Many thanks for your efforts,

Martin

P.S. If ssh to my machine helps you in any way, I can provide that. I'll just be away next week at the Linux Foundation Collaboration Summit in San Francisco, and can spend little to no time testing this.

Revision history for this message

In freedesktop.org Bugzilla #19304, Daniel J Blueman (danielblueman) wrote on 2009-04-03:

#97

Only the first time after I installed xserver-xorg-video-intel 2:2.6.3-0ubuntu4pitti1 (AMD64) from Martin Pitt's PPA and restarted GDM was I met with a blank screen, though it was clear GDM was waiting for input. Hardware is GM45 rev 7, dual-channel memory, with only pipe B connected to the internal LVDS, a 1440x900 6-bit TN (aargh) panel.

Perhaps this relates to when the pipe watermarks are reprogrammed and thus data is discarded; we see pipe B's LBLC_EVENT_STATUS flag set and EDID detection was not performed. There were no kernel/syslog messages, and the Xorg.log difference against normal operation is:

$ diff -u /var/log/Xorg.0.log.working /var/log/Xorg.0.log.blank
@@ -197,10 +197,7 @@
 (WW) intel(0): Register 0x61200 (PP_STATUS) changed from 0xc0000008 to 0xd000000a
 (WW) intel(0): PP_STATUS before: on, ready, sequencing idle
 (WW) intel(0): PP_STATUS after: on, ready, sequencing on
-(WW) intel(0): Register 0x71024 (PIPEBSTAT) changed from 0x80000206 to 0x80000246
-(WW) intel(0): PIPEBSTAT before: status: FIFO_UNDERRUN VSYNC_INT_STATUS SVBLANK_INT_STATUS VBLANK_INT_STATUS
-(WW) intel(0): PIPEBSTAT after: status: FIFO_UNDERRUN VSYNC_INT_STATUS LBLC_EVENT_STATUS SVBLANK_INT_STATUS VBLANK_INT_STATUS
-(WW) intel(0): Register 0x321b (FBC_FENCE_OFF) changed from 0x59018500 to 0x2a03a200
+(WW) intel(0): Register 0x321b (FBC_FENCE_OFF) changed from 0x59008500 to 0x2a03a200
 (==) Depth 24 pixmap format is 32 bpp
 (II) do I need RAC?  No, I don't.
 (II) resource ranges after preInit:
@@ -432,93 +429,17 @@
 (II) AT Translated Set 2 keyboard: Device reopened after 1 attempts.
 (II) Video Bus: Device reopened after 1 attempts.
 (II) Macintosh mouse button emulation: Device reopened after 1 attempts.
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-exaCopyDirty: Pending damage region empty!
-(II) PM Event received: Capability Changed
-I830PMEvent: Capability change
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) PM Event received: Capability Changed
-I830PMEvent: Capability change
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) PM Event received: Capability Changed
-I830PMEvent: Capability change
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) PM Event received: Capability Changed
-I830PMEvent: Capability change
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) PM Event received: Capability Changed
-I830PMEvent: Capability change
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) PM Event received: Capability Changed
-I830PMEvent: Capability change
-(II) intel(0): EDID vendor "LEN", prod id 16435
-(II) intel(0): Using hsync ranges from config file
-(II) intel(0): Using vrefresh ranges from config file
-(II) intel(0): Printing DDC gathered Modelines:
-(II) intel(0): Modeline "1440x900"x0.0  101.60  1440 1488 1520 1792  900 903 909 945 -hsync -vsync (56.7 kHz)
-(II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz)
-(II) intel(0): EDID vendor "LEN", prod id 16435

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-03:

#98

Daniel,

please note that 4pitti1 has the "v 5" patch. I just uploaded my current test package with the latest "v6" patch (http://bugs.freedesktop.org/attachment.cgi?id=24471) to my PPA, as 4pitti2.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-07:

#99

Daniel, looks like you hit the LVDS detect bug with the version Martin packaged.

Martin, the fact that you see flickering after just a monitor power cycle is strange. If the FIFO regs weren't changed the flicker you see shouldn't be caused by underruns... I'm putting together another patch which will report that so we can check.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-07:

#100

Created an attachment (id=24651)
Add underrun debugging

This one should log any underruns that occur so we can figure out if the flicker you're seeing is some other problem.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-07:

#101

Thanks, Jesse. I applied the patch to the current Ubuntu 9.04 package and uploaded it to my personal package archive again, so that people on 9.04 can test it.

I can't test it myself until next Tuesday, since this week I'm in San Francisco on the LF summit. I never got any flickering with the internal LVDS, and I don't have an external screen here.

Revision history for this message

In freedesktop.org Bugzilla #19304, Daniel J Blueman (danielblueman) wrote on 2009-04-07:

#102

Rebuilding the xserver-xorg-video-intel package with the updated patch, I was unable to trigger underruns with my GM45 rev 7 hardware: I rebooted a few times for initial state, separately restarted GDM in a loop ~50 times, and switched VTs, testing both EXA and UXA paths.

Since the runtime overhead is minimal, I'd say it's worth carrying this patch forward to help understand the failure mechanism later.

Daniel

Revision history for this message

In freedesktop.org Bugzilla #19304, Daniel J Blueman (danielblueman) wrote on 2009-04-07:

#103

The X-server was still solid after ~10 suspend-resume cycles (running in EXA) also, though I do see the Error Status Register getting bit 0 set - presumably expected. See attached Xorg.0.log.

Revision history for this message

In freedesktop.org Bugzilla #19304, Daniel J Blueman (danielblueman) wrote on 2009-04-07:

#104

Created an attachment (id=24653)
GM45 (rev7) patched intel-2.6.3 log on Thinkpad T400, showing ESR:0x1

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-07:

#105

Daniel, glad to hear things are stable for you. But my patch shouldn't affect your configuration (GM45 has automatic FIFO sizing & pipe arbitration). Looks like your LVDS detection bug is fixed though, which is good.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-15:

#106

Created an attachment (id=24816)
debug logs for patch v8

I applied the latest patch (v8) to my PPA against the current Jaunty package (2:2.6.3-0ubuntu9pitti1). Again I captured logs right after a clean X.org startup (startup.*), right after a monitor off/on cycle (not included, since no change), and a while after a VT switch.

I didn't see any underruns happen after switching off the monitor. Perhaps the effect during lunch break is that the monitor gets disabled by the screensaver (DPMS off), which acts more like a VT switch?

The underruns started some minutes after a real VT switch, and due to the new patch I get them logged now:

(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!

The attached logs have just one instance of those, but the underruns become more frequent now. After the first underrun happened, I got this change:

-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000

(vtswitch2.regs)
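
For anyone comparing register dumps in the same way, the following is a minimal sketch of a helper that reports which registers differ between two intel_reg_dumper outputs and which bits flipped. It assumes only the "(II): NAME: 0xVALUE" line format quoted in this thread; the function names and the two-line sample dumps are hypothetical and combine values quoted earlier purely for illustration.

```python
import re

# Matches dump lines such as "(II): FBC_STATUS: 0x20000000"
REG_LINE = re.compile(r"^\(II\):\s+(\w+):\s+(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Return {register_name: value} for every recognisable dump line."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line.strip())
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def changed_registers(before, after):
    """Map each changed register to (old value, new value, XOR of the two)."""
    old, new = parse_dump(before), parse_dump(after)
    return {
        name: (old[name], new[name], old[name] ^ new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


# Illustrative sample dumps (hypothetical combination of quoted values)
before = "(II): FWATER_BLC: 0x03010301\n(II): FBC_STATUS: 0x20000000"
after = "(II): FWATER_BLC: 0x03010301\n(II): FBC_STATUS: 0x60000000"

for name, (old, new, flipped) in changed_registers(before, after).items():
    print(f"{name}: {old:#010x} -> {new:#010x} (flipped bits: {flipped:#010x})")
# prints: FBC_STATUS: 0x20000000 -> 0x60000000 (flipped bits: 0x40000000)
```

The XOR column makes it easy to see that only a single bit (bit 30) differs in FBC_STATUS; what that bit means is not stated in this thread.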

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-16:

#107

The pipe underruns also start to happen massively after I used kvm (even after kvm was stopped long ago).
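
To put a number on "massively", the (EE) lines written by the underrun debugging patch can be counted per pipe. This is only a sketch: it assumes the message format "(EE) intel(0): underrun on pipe A!" quoted earlier in this thread, and the function name is hypothetical.

```python
import re
from collections import Counter

# Message format as quoted in this thread for the debugging patch
UNDERRUN = re.compile(r"\(EE\) intel\(\d+\): underrun on pipe ([A-Z])!")


def count_underruns(log_text):
    """Count logged underruns per pipe letter."""
    return Counter(match.group(1) for match in UNDERRUN.finditer(log_text))


sample = "\n".join([
    "(II) intel(0): Setting FIFO watermarks - A: 1, B: 37, C: 2, SR 127",
    "(EE) intel(0): underrun on pipe A!",
    "(EE) intel(0): underrun on pipe A!",
    "(EE) intel(0): underrun on pipe A!",
])

print(count_underruns(sample))  # Counter({'A': 3})
```

On a real system the text would come from /var/log/Xorg.0.log; running the count before and after a suspected trigger (VT switch, kvm session) shows whether the rate actually changes.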

Revision history for this message

In freedesktop.org Bugzilla #19304, Daniel J Blueman (danielblueman) wrote on 2009-04-16:

#108

Perhaps this is a symptom of high (IRQ-safe) spinlock hold-times, preventing the pipe being reset/refilled within the needed time window? (unless I'm misunderstanding the mechanism)

This may be key to reproducing the issue, and may be worse on kernels without preemption and lock-break points (ie server/throughput/compute optimised kernels).

Using latencytop or kernel ftrace to see what magnitude of lock hold time is needed to cause the pipe underruns may be useful to developers trying to reproduce this later...

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-16:

#109

No the pipe is filled automatically by hardware (the GPU just does fetches from RAM based on the FIFO watermark values), so either the watermarks are incorrect or the FIFO sizes are wrong or both.

Revision history for this message

Peter Altherr (peter-altherr-deactivatedaccount) wrote on 2009-04-18:

#110

hi guys, just want to confirm the bug on my msi wind u100. random short flickering 2 or 3 times a day and then a blank screen. i work with an external 19" tft connected by a 15pin dsub vga.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-23:

#111

Oh wow I definitely see this problem now on my 945 test machine with the patch applied...

Ah looks like my latency constant wasn't so pessimistic after all. This one works for me though; hope it fixes your problem too (though I'm not sure why a VT switch would trigger it).

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-23:

#112

Created an attachment (id=25075)
Increase latency constant

Made the latency 5us instead of 3us, which seems to be closer to the truth on my Acer platform at least.
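
As a rough illustration of why the latency constant matters, here is one plausible form of the calculation: the number of FIFO entries the pipe drains while waiting out one memory-latency window. Everything numeric below is an assumption rather than something stated in this report: a 135 MHz pixel clock, 4 bytes per pixel, 64-byte FIFO entries, and rounding down. These values were picked because they reproduce the "FIFO entries - A: 25" and "A: 42" figures logged elsewhere in this thread; the function name is hypothetical and this is not the actual patch code.

```python
def fifo_entries(pixel_clock_khz, bytes_per_pixel, latency_ns, entry_bytes=64):
    """Entries drained from the FIFO during one latency window (rounded down)."""
    # Bytes scanned out while a memory fetch is outstanding
    bytes_needed = (pixel_clock_khz // 1000) * bytes_per_pixel * latency_ns // 1000
    return bytes_needed // entry_bytes


# Assumed mode parameters (not given in the thread)
CLOCK_KHZ = 135_000
BYTES_PER_PIXEL = 4

print(fifo_entries(CLOCK_KHZ, BYTES_PER_PIXEL, 3000))  # 25 with a 3us latency
print(fifo_entries(CLOCK_KHZ, BYTES_PER_PIXEL, 5000))  # 42 with a 5us latency
```

Under these assumptions, raising the latency from 3us to 5us pushes the requirement for pipe A from 25 to 42 entries, which is more than a 28-entry FIFO allocation can hold.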

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-23:

#113

Created an attachment (id=25080)
debug logs for patch v9

I tried the v9 patch (also uploaded to PPA again). Unfortunately this is now worse.

At gdm, when both the internal LVDS and the external TFT are active @1024x768 (no xrandr in gdm yet), I get a constant flickering about twice per second. This cannot even be worked around any more by disabling fb compression.

After logging in, when the internal LVDS switches off, behaviour is identical to the v8 patch: occasional flickering starts after a vt switch (or some hours of usage).

I attached the logs again, after a clean boot (start.*), a vt switch (vtswitch.*), after the first overflow a few minutes later (overflow.*), and after several more overflows occurred (overflow-more.*).

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-24:

#114

Created an attachment (id=25116)
Fix watermark sanity check

Arg, maybe I'll get this right one day:

(II) intel(0): FIFO entries - A: 42, B: 0
(II) intel(0): FIFO size - A: 28, B: 59
(II) intel(0): Setting FIFO watermarks - A: -16, B: 1, C: 2, SR 5

That negative A value would certainly cause trouble.

Looks like my sanity check was looking at the wrong variable; I should have been checking the watermark value against <= 0, not the entries value (that should always be positive).
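
The pipe A numbers in that log are consistent with watermark = FIFO size - entries - 2 (28 - 42 - 2 = -16). The guard of 2 is inferred from the log, not stated anywhere in this thread, and the clamp value of 1 is likewise an assumption based on the "A: 1" seen in other logs. With those caveats, the corrected check might look roughly like this (all names are hypothetical):

```python
GUARD_ENTRIES = 2  # assumption: inferred from 28 - 42 - 2 == -16 in the log
MIN_WATERMARK = 1  # assumption: smallest watermark value seen in the logs


def raw_watermark(fifo_size, entries_required):
    """Watermark before any sanity check; may go negative."""
    return fifo_size - entries_required - GUARD_ENTRIES


def checked_watermark(fifo_size, entries_required):
    """Clamp the computed watermark, not the entries value."""
    wm = raw_watermark(fifo_size, entries_required)
    # entries_required is always positive, so testing it catches nothing;
    # the watermark is the value that can drop to zero or below.
    return wm if wm > 0 else MIN_WATERMARK


print(raw_watermark(28, 42), checked_watermark(28, 42))
```

The point of the sketch is only where the `<= 0` test is applied; the real patch may compute the raw value differently.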

Interestingly, the new calculation indicates that you're driving pipe A pretty hard relative to its FIFO RAM allocation, but with just a single pipe enabled it should be safe. If not, we could modify both DSPARB and the FIFO watermarks to increase the chances of a given config working, or enable pixel clock doubling perhaps.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-24:

#115

Sigh, looking again at your older logs I doubt that last patch will fix the issue:

(II) intel(0): FIFO size - A: 28, B: 59
(II) intel(0): Setting FIFO watermarks - A: 1, B: 1, C: 2, SR 22

So we're already setting the watermark as aggressively as possible, so the pipe should be continuously fetching data for display. In your config that's still not enough though, since we drain it faster than we fill it.

Another thing that might help is to reduce the pixel clock on the mode you're sending to your external monitor; you can use the cvt or gtf tools to create a mode with reduced blanking or a lower refresh.
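
As a rough illustration of why a lower refresh rate or reduced blanking helps: the pixel clock is the product of the total horizontal and vertical timings and the refresh rate, and the fetch rate the pipe has to sustain during active scanout scales with it. The sketch below uses the standard VESA 1280x1024@60 totals (1688 x 1066) and assumes 4 bytes per pixel; the function names are hypothetical and this is not how cvt or gtf compute their timings:

```python
def pixel_clock_hz(htotal, vtotal, refresh_hz):
    """Pixel clock implied by the total (active + blanking) timings."""
    return htotal * vtotal * refresh_hz


def peak_fetch_bytes_per_second(clock_hz, bytes_per_pixel=4):
    """Rate at which the pipe drains pixel data during active scanout."""
    return clock_hz * bytes_per_pixel


# Same totals, two refresh rates: the clock scales with the refresh.
for refresh in (60, 50):
    clock = pixel_clock_hz(1688, 1066, refresh)
    print(refresh, clock, peak_fetch_bytes_per_second(clock))
```

Reduced blanking works on the other factor: it shrinks htotal and vtotal while leaving the visible resolution and refresh rate unchanged.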

I think I'll need to cook up a patch that modifies DSPARB as well (like we do in the current driver).

Bug Watch Updater (bug-watch-updater) on 2009-04-24

Changed in xserver-xorg-video-intel:
status:	Confirmed → In Progress

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-04-26:

#116

Ah, so you are saying that something after a VT switch or after putting a high load on the graphics card introduces a fill/drain backlog which the card can't ever catch up with any more?

So the disabling of the fb compression helps because dropping that extra work causes the GPU to have enough time again to re-fill the pipes?

NB that I have used that very same laptop to drive a 1920x1200 external screen without problems, but then again I hadn't done it for very long (just about an hour for testing the new monitor for my wife's computer).

So if this is principally not fixable due to hw speed limitations, maybe it would be possible to automatically disable fb compression once the chip hits pipe underruns?

Thanks for your efforts!

Martin

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-04-27:

#117

Yeah avoiding compression when the FIFO watermark is low is probably a good idea. But we may also be able to increase the amount of FIFO RAM allocated to the large display.

Revision history for this message

Bryce Harrington (bryce) wrote on 2009-04-27:

#118

109_i830-fifo-watermark-conservative.patch (3.1 KiB, text/x-diff)

This was the patch we included in the jaunty release for this bug, but looking at the upstream bug it seems to be outdated (and allegedly may be causing some X freeze issues.)

Revision history for this message

Bryce Harrington (bryce) wrote on 2009-04-27:

#119

With further testing I've cleared my suspicions that this patch causes the freeze bug I'm seeing.

That said, given that the patch didn't fix this issue and is not present in the upstream tree, it seems to have higher than average risk; I'd like to see it eventually replaced with a fully sanitized patch at some point.

Revision history for this message

Martin Pitt (pitti) wrote on 2009-04-28: Re: [Bug 311895] Re: [i945] spontaneous black screen (major pipe-A underrun)

#120

Bryce Harrington [2009-04-27 23:25 -0000]:
> That said, given that the patch didn't fix this issue and is not present
> in the upstream tree, it seems to have higher than average risk; I'd
> like to see it eventually replaced with a fully sanitized patch at some
> point.

Me too, and in early karmic we should rip it out again. I have tested
a ton of patches from Jesse since then (see upstream bug),
unfortunately we still don't have a perfect one yet, or even one which
is significantly better than the one we have currently applied.

Bryce Harrington (bryce) on 2009-05-06

tags:

added: black-screen

Revision history for this message

In freedesktop.org Bugzilla #19304, Bryce Harrington (bryce) wrote on 2009-05-08:

#121

Btw, we're carrying an old patch from this bug in the Ubuntu release, one from Feb 2009, patches/109_i830-fifo-watermark-conservative.patch.

It sounds like that patch has grown obsolete, or at least doesn't solve this bug 100%, however I'm going to leave it in place when we move to 2.7.0. If we should be doing something differently, please ping me so we can get a better fix in.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-05-08:

#122

Bryce, I think you should drop the patch. It's insufficient, might cause regressions on other platforms, and doesn't help at all any more at least on my computer.

Revision history for this message

linuxwarrior (linuxwarrior) wrote on 2009-05-13:

#123

I happen to have an MSI Wind U120 and I don't observe this behaviour on the internal LCD. On my Eee Box I am bothered by the flickers and black screen. Maybe it is just a problem when using an external monitor?
I will try my MSI Wind with an external monitor and post back.

Revision history for this message

In freedesktop.org Bugzilla #19304, Bryce Harrington (bryce) wrote on 2009-05-16:

#124

Thanks Martin, I've removed the patch from Karmic.

Revision history for this message

Alistair Buxton (a-j-buxton) wrote on 2009-05-26:

#125

Hi, I seem to be getting this bug on my Acer Aspire One. When using an external VGA monitor with the internal LVDS display disabled it always happens within about 15 minutes. If I run dual head with both displays enabled it does not seem to happen, although I have only run for about 4 hours that way.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-06-03:

#126

Jesse,

as we discussed last week in Barcelona, I have now tried -intel git head, mesa git head, 2.6.30rc7 on my home system with the external monitor again, now with the extra 1 GB of RAM that I plugged in last week.

As you suspected, the underruns are now gone, apparently having a second RAM bar now provides enough bandwidth for the graphics card to avoid underruns.

I'm happy to test further patches, I can easily remove the extra GB of RAM again. The very same symptom happens on the Samsung NC10 of a friend of mine, I can test stuff on his machine as well (with some delay).

My impression is that with FB compression my machine is simply not fast enough, regardless of the watermark settings (given that all of the above patches failed consistently). Would it be possible for the driver to disable FB compression dynamically if it encounters pipe underruns, such as "twice in five minutes"?
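
The "twice in five minutes" heuristic could be sketched as a sliding-window counter. This is only an illustration of the idea in Python, not driver code; `UnderrunMonitor` and its parameters are made up for the example:

```python
from collections import deque


class UnderrunMonitor:
    """Signals when enough underruns fall inside a sliding time window."""

    def __init__(self, threshold=2, window_s=300.0):
        self.threshold = threshold
        self.window_s = window_s
        self._events = deque()

    def record(self, now_s):
        """Record an underrun at time now_s (seconds).

        Returns True when FB compression should be switched off.
        """
        self._events.append(now_s)
        # Forget underruns that are older than the window.
        while self._events and now_s - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) >= self.threshold


monitor = UnderrunMonitor()
print(monitor.record(0.0), monitor.record(400.0), monitor.record(500.0))
```

Two underruns 400 seconds apart do not trip the monitor, but a further one 100 seconds later does, because two events then fall inside the same five-minute window.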

I wonder why this problem didn't occur at all with earlier driver versions (2.4). Didn't that use FB compression yet?

Thanks!

Revision history for this message

Martin Pitt (pitti) wrote on 2009-06-03:

#127

For the record, this still happens with the latest -intel/UXA/KMS versions.

However, it does _not_ happen any more on my system since I plugged in a second GB of RAM. Now it has two RAM pipelines available which apparently makes memory operations fast enough to avoid the underruns. I discussed this with Jesse Barnes at UDS, and he seems to understand the root cause of this now.

I'll continue discussion upstream.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-06-04:

#128

On Wed, 3 Jun 2009 00:06:45 -0700 (PDT)
<email address hidden> wrote:
> as we discussed last week in Barcelona, I have now tried -intel git
> head, mesa git head, 2.6.30rc7 on my home system with the external
> monitor again, now with the extra 1 GB of RAM that I plugged in last
> week.
>
> As you suspected, the underruns are now gone, apparently having a
> second RAM bar now provides enough bandwidth for the graphics card to
> avoid underruns.
>
> I'm happy to test further patches, I can easily remove the extra GB
> of RAM again. The very same symptom happens on the Samsung NC10 of a
> friend of mine, I can test stuff on his machine as well (with some
> delay).
>
> My impression is that with FB compression my machine is simply not
> fast enough, regardless of the watermark settings (given that all of
> above patches failed consistently). Would it be possible for the
> driver to disable FB compression dynamically if it encounters pipe
> underruns, such as "twice in five minutes"?
>
> I wonder why this problem didn't occur at all with earlier driver
> versions (2.4). Didn't that use FB compression yet?

Great, thanks for the update. Yes, we should detect either memory
configuration or underruns and take appropriate action. Previous
drivers didn't modify the FIFO or DSPARB settings, so the defaults may
have been working on your platform, or something else changed to affect
the way we access memory (it's also possible that FBC was disabled on
older releases in your config for some reason).

Jesse

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-06-18:

#129

Created an attachment (id=26930)
most recent, KMS version of the patch

This patch applies to the kernel. It still doesn't contain checks against available bandwidth & latency to reject modes we can't support, but it should behave a bit better than the current 2D driver.

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-07-01:

#130

I applied the patch to 2.6.31rc1 and first tested it with 2 GB of RAM. No noticeable difference, everything continued to work smoothly.

Now I ripped out the second GB RAM bar again, and did some stress testing: kvm -m512 (booting another Ubuntu desktop live system), running glxgears, and do some compiz juggling and VT switches. In previous versions this was a reliable way of triggering underruns quickly (which otherwise just occur after a couple of hours). I had a load of 4.3, and glxgears/compiz froze for some fractional seconds due to the high load, but I didn't get any pipe underrun.

I now continue to use the system for a couple of hours to see the longer-term effects.

What I didn't do yet is exercising the same stress test on 2.6.31 without this patch. Do you need this?

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-07-01:

#131

Only if you're feeling thorough. :) Thanks for the updated report though. I fixed a few bugs in the calculations in the KMS patch, so maybe one of those fixed your issues. I'm really looking forward to closing this one; I'll ping Eric about including the patch.

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-07-01:

#132

Yay, fix pushed!

commit 7662c8bd6545c12ac7b2b39e4554c3ba34789c50
Author: Shaohua Li <email address hidden>
Date: Fri Jun 26 11:23:55 2009 +0800

drm/i915: add FIFO watermark support

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-07-01:

#133

Oops, I am terribly sorry. We currently put i915 into the initramfs, and it gets loaded from there. When I built the module with the patch, I forgot to update the initramfs, so all these successful tests were actually done with the original i915 from 2.6.31rc1.

Later this afternoon some other package updated the initramfs, and now the screen goes entirely and irrecoverably black when booting, both when docked (external DVI) and when undocked (internal LVDS).

So, perhaps you should revert this from your tree until this is investigated further? So far, I don't seem to have this underrun problem at all with 2.6.31rc1, thus I leave the bug as "resolved".

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-07-01:

#134

Uh-oh, ok thanks for the heads-up. I'll look at this. Can you modprobe your drm with debug=1 so we can see what the watermark values end up being on your machine? It would also help if you could confirm that this particular patch caused the problem: was that the only change, or was there another kernel update as well?

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-07-01:

#135

It wasn't the only patch, I also applied the tiny patch from bug 20520 (register restoring ordering fix for resuming). However, I tested that patch in isolation before, and it worked fine. Also, I don't think that code path is active on boot. There was no other kernel update.

I'll send detailed debugging information tomorrow (I hope I can ssh into the machine still, or it gets logged far enough), bed time for today. I just wanted to give you an early warning to perhaps defer propagation of the patch (or just revert it for now, since it just works without it).

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-07-02:

#136

Created an attachment (id=27329)
logs for early/late i915 loading with drm debugging

So, first I turned on DRM debugging and dmesg capturing:

$ cat /etc/rcS.d/S80dmesg
#!/bin/sh
dmesg > /var/log/dmesg-`date +%T`
$ cat /etc/modprobe.d/drmdebug.conf
options drm debug=1

In the attached logs I renamed the dmesg files from timestamps to situation descriptions, such as "dmesg-31rc1-vanilla-early-2GB-ok.txt"

Then I tested all possible combinations of 2.6.31rc1 with/without this patch, with 1GB or 2 GB RAM, and with "early" or "late" loading of i915/drm.

early: modules are contained and loaded by initramfs, i. e. pretty much as one of the first things after the kernel starts to boot

late: I booted without an initramfs, thus init starts readahead, sets the hostname and keyboard layout, and then starts udev which does an "udev trigger" and causes modules such as drm and i915 to be loaded, which in turn does KMS.

In earlier Karmic (2.6.30 release candidates), we didn't put i915/drm into the initramfs, and it worked fine (just looked a bit ugly since mode got switched halfway through boot). Now I noticed that this late loading does not work any more for some reason, not with 2.6.30 final, not with 31rc1, or with 31rc1+your patch. That is a bug in itself, and sounds pretty unrelated to this pipe underrun issue, so perhaps I should report it separately?

Results from this testing:
* late loading never works, I always get LVDS and DVI turned off
* early loading works with .30 final and .31rc1 vanilla
* with this patch applied, it never works, and worse, I don't even get a dmesg captured; this means that the boot doesn't even get to rcS/70. Sounds like it wedges display and causes a kernel panic? Anything I can do to debug this?
* 1 GB/2 GB does not make any difference in any test case
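
For reference, the 2.6.31rc1 part of that test matrix can be enumerated mechanically. The sketch below only encodes the outcomes reported in the list above (it does not predict anything), and `boots_with_display` is a made-up helper name:

```python
from itertools import product


def boots_with_display(patched, loading):
    """Reported outcome on 2.6.31rc1; RAM size made no difference."""
    # Only early loading of the unpatched i915 gave a working display.
    return loading == "early" and not patched


matrix = [
    (patched, ram_gb, loading, boots_with_display(patched, loading))
    for patched, ram_gb, loading in product(
        (False, True), (1, 2), ("early", "late")
    )
]

for patched, ram_gb, loading, ok in matrix:
    kernel = "31rc1+patch" if patched else "31rc1 vanilla"
    print(f"{kernel:14} {ram_gb} GB {loading:5} -> {'ok' if ok else 'black'}")
```

That gives eight combinations, of which only the two vanilla/early ones (1 GB and 2 GB) end up with a working display.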

Revision history for this message

In freedesktop.org Bugzilla #19304, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2009-07-02:

#137

(In reply to comment #87)
> Then I tested all possible combinations of 2.6.31rc1 with/without this patch,
> with 1GB or 2 GB RAM, and with "early" or "late" loading of i915/drm.
>
> early: modules are contained and loaded by initramfs, i. e. pretty much as one
> of the first things after the kernel starts to boot
>
> late: I booted without an initramfs, thus init starts readahead, sets the
> hostname and keyboard layout, and then starts udev which does an "udev
> trigger" and causes modules such as drm and i915 to be loaded, which in turn
> does KMS.

Sounds like a good set of combinations, thanks for testing.

> In earlier Karmic (2.6.30 release candidates), we didn't put i915/drm into the
> initramfs, and it worked fine (just looked a bit ugly since mode got switched
> halfway through boot). Now I noticed that this late loading does not work any
> more for some reason, not with 2.6.30 final, not with 31rc1, or with
> 31rc1+your patch. That is a bug in itself, and sounds pretty unrelated to this
> pipe underrun issue, so perhaps I should report it separately?

One thing jumped out between the early (working) and late (broken) logs: in the broken ones there's no line for the fbcon loading & initializing. Which would leave your display blank if/until X starts. Maybe that's missing from the load in the late case?

> Results from this testing:
> * late loading never works, I always get LVDS and DVI turned off
> * early loading works with .30 final and .31rc1 vanilla
> * with this patch applied, it never works, and worse, I don't even get a dmesg
> captured; this means that the boot doesn't even get to rcS/70. Sounds like it
> wedges display and causes a kernel panic? Anything I can do to debug this?
> * 1 GB/2 GB does not make any difference in any test case

Ugh, ok so it's probably not a pipe underrun then if it kills the whole machine (at least I hope not); could be a kernel panic. You could try netconsole (modprobe netconsole netconsole=<params> and then use nc on another machine, the kernel Documentation/ directory has some info on that); it might capture a panic if you load the module by hand with the netconsole running.
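
If nc isn't handy on the receiving machine, a few lines of Python can do the same job, since netconsole simply sends the kernel log as UDP datagrams. A minimal sketch, assuming netconsole's documented default target port 6666 (adjust it to whatever you pass in the netconsole= parameter); `open_listener` and `read_lines` are made-up helper names:

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_lines(sock, max_datagrams, timeout_s=5.0):
    """Collect up to max_datagrams messages, stopping early on a timeout."""
    sock.settimeout(timeout_s)
    lines = []
    for _ in range(max_datagrams):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages are normally ASCII; don't die on stray bytes.
        lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return lines
```

Something like `print("\n".join(read_lines(open_listener(), 1000, timeout_s=60)))` on the second machine, started before loading i915 by hand, should then capture a panic trace if one is emitted.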

Bug Watch Updater (bug-watch-updater) on 2009-07-02

Changed in xserver-xorg-video-intel:
status:	In Progress → Fix Released

Revision history for this message

In freedesktop.org Bugzilla #19304, Martin Pitt (pitti) wrote on 2009-07-06:

#138

> One thing jumped out between the early (working) and late (broken) logs: in the
> broken ones there's no line for the fbcon loading & initializing. Which would
> leave your display blank if/until X starts. Maybe that's missing from the load
> in the late case?

Indeed, I discussed that with our initramfs/boot guru. So that's not a concern here.

> Ugh, ok so it's probably not a pipe underrun then if it kills the whole machine
> (at least I hope not); could be a kernel panic. You could try netconsole

Thanks for the netconsole hint, that worked beautifully. Indeed it catches a nice trace in the watermark updating:

[ 489.298734] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 489.298908] IP: [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915]
[ 489.299056] PGD 0
[ 489.299152] Oops: 0000 [#1] SMP
[ 489.299289] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/drm/card0/dev
[ 489.299384] CPU 0
[ 489.299481] Modules linked in: i915(+) drm netconsole i2c_algo_bit configfs snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm arc4 joydev ecb snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer iwl3945 iwlcore iTCO_wdt iTCO_vendor_support snd_seq_device mac80211 led_class snd psmouse dell_wmi dell_laptop cfg80211 soundcore snd_page_alloc usb_storage usbhid serio_raw dcdbas video output tg3 fbcon tileblit font bitblit softcursor intel_agp [last unloaded: drm]
[ 489.300005] Pid: 2208, comm: work_for_cpu Not tainted 2.6.31-1-generic #14-Ubuntu Latitude D430
[ 489.300005] RIP: 0010:[<ffffffffa030f1af>] [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915]
[ 489.300005] RSP: 0018:ffff8800229e98b0 EFLAGS: 00010202
[ 489.300005] RAX: 0000000000000000 RBX: ffff880022966800 RCX: ffffffffa03244fb
[ 489.300005] RDX: ffffffffa0321a20 RSI: ffffffffa0324518 RDI: 0000000000000001
[ 489.300005] RBP: ffff8800229e9930 R08: 0000000000000000 R09: 000000000001a400
[ 489.300005] R10: 0000000000000500 R11: 0000000000000000 R12: ffff880022967000
[ 489.300005] R13: 000000000001a400 R14: ffff8800229674a0 R15: 0000000000000001
[ 489.300005] FS: 0000000000000000(0000) GS:ffff8800019b4000(0000) knlGS:0000000000000000
[ 489.300005] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 489.300005] CR2: 0000000000000038 CR3: 0000000001001000 CR4: 00000000000006b0
[ 489.300005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 489.300005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 489.300005] Process work_for_cpu (pid: 2208, threadinfo ffff8800229e8000, task ffff88003d5416b0)
[ 489.300005] Stack:
[ 489.300005] ffff8800229e9910 ffffffffa0317a5a ffff000100000038 ffff8800229e98f0
[ 489.300005] <0> ffff000100010038 ffff8800229e98e0 0000000000000001 0000000000000002
[ 489.300005] <0> ffff8800229e0009 0000000000000000 ffff8800229e9920 ffff880022f3b000
[ 489.300005] Call Trace:
[ 489.300005] [<ffffffffa0317a5a>] ? intel_sdvo_read_byte+0x6a/0xc0 [i915]
[ 489.300005] [<ffffffffa031161c>] intel_crtc_dpms+0xb0c/0xef0 [i915]
[ 489.300005] [<ffffffffa0317cff>] ? intel_sdvo_set_active...
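
For anyone post-processing traces like this one: the interesting part is the symbol+offset/size [module] tuple on the IP line. A fault address of 0x38 is the typical signature of reading a structure member at offset 0x38 through a NULL pointer, although which pointer is involved cannot be determined from the trace alone. A small sketch for pulling the tuple out of a log line (the regex and the `parse_symbol` name are my own, not part of any kernel tooling):

```python
import re

# Matches e.g. "intel_update_watermarks+0xcf/0xd40 [i915]"
SYMBOL_RE = re.compile(
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_symbol(line):
    """Return (symbol, offset, size, module) or None if nothing matches."""
    m = SYMBOL_RE.search(line)
    if m is None:
        return None
    return m["symbol"], int(m["offset"], 16), int(m["size"], 16), m["module"]


ip_line = (
    "[ 489.298908] IP: [<ffffffffa030f1af>] "
    "intel_update_watermarks+0xcf/0xd40 [i915]"
)
print(parse_symbol(ip_line))
```

Here the fault is 0xcf bytes into intel_update_watermarks, a function of 0xd40 bytes in the i915 module, so it happens very early in that function.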


Affects		Status	Importance	Assigned to	Milestone
	xf86-video-intel	Fix Released	High	freedesktop-bugs #19304
	linux (Ubuntu)	Fix Released	High	Unassigned

Changed in xserver-xorg-video-intel:
status:	Confirmed → Fix Released

Changed in linux (Ubuntu):
status:	Triaged → Fix Released

Changed in xserver-xorg-video-intel:
importance:	Unknown → High

Changed in xserver-xorg-video-intel:
importance:	High → Unknown
