[g45] jaunty X "EQ overflow" infinite loop hang

Bug #305979 reported by Martin Olsson
This bug affects 2 people

Affects            Status        Importance  Assigned to  Milestone
Linux              Fix Released  Unknown
xf86-video-intel   Fix Released  Critical
linux (Ubuntu)     Fix Released  High        Unassigned
  Jaunty           Fix Released  High        Unassigned

Bug Description

I just upstreamed the bug report below (opening this bug just to track it in Ubuntu).

************************************

Just now I started Firefox and my X server froze. I grabbed a backtrace with gdb,
and X was stuck waiting on an ioctl(). I also took a copy of the Xorg log, which
contains an "Xorg is probably stuck in an infinite loop" backtrace:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x26) [0x4ee1a6]
1: /usr/X11R6/bin/X(mieqEnqueue+0x291) [0x4cebd1]
2: /usr/X11R6/bin/X(xf86PostMotionEventP+0xc4) [0x498554]
3: /usr/X11R6/bin/X(xf86PostMotionEvent+0xb1) [0x498731]
4: /usr/lib/xorg/modules/input//evdev_drv.so [0x7f26a0a559b2]
5: /usr/X11R6/bin/X [0x481625]
6: /usr/X11R6/bin/X [0x472147]
7: /lib/libpthread.so.0 [0x7f26b9bbf080]
8: /lib/libc.so.6(ioctl+0x7) [0x7f26b8221d87]
9: /usr/lib/libdrm.so.2 [0x7f26b6dfe8d3]
10: /usr/lib/libdrm.so.2(drmWaitVBlank+0x20) [0x7f26b6dfed70]
11: /usr/lib/dri/i965_dri.so [0x7f26a5ccb85e]
12: /usr/lib/dri/i965_dri.so(driWaitForVBlank+0x110) [0x7f26a5ccbaf0]
13: /usr/lib/dri/i965_dri.so(intelSwapBuffers+0xe5) [0x7f26a5cd53d5]
14: /usr/lib/dri/i965_dri.so [0x7f26a5cccdef]
15: /usr/lib/xorg/modules/extensions//libglx.so [0x7f26b7659b5f]
16: /usr/lib/xorg/modules/extensions//libglx.so [0x7f26b764d936]
17: /usr/lib/xorg/modules/extensions//libglx.so [0x7f26b7650bd2]
18: /usr/X11R6/bin/X(Dispatch+0x364) [0x44d754]
19: /usr/X11R6/bin/X(main+0x45d) [0x43376d]
20: /lib/libc.so.6(__libc_start_main+0xe6) [0x7f26b8162586]
21: /usr/X11R6/bin/X [0x432b49]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

------- Comment #1 From martin 2008-12-07 06:43:30 PST [reply] -------

Created an attachment (id=20867) [details]
xorg_log with "x has gone into inf loop" backtrace (related to EQ overflow)

------- Comment #2 From martin 2008-12-07 06:50:21 PST [reply] -------

Previously, when I used the 2.6.27 kernel with intel 2.4.1, I never saw this
particular freeze at all.

When I upgraded to Jaunty I got the 2.6.28 kernel and intel 2.5.1, and I started
running into this problem.

This bug is not reproducible with specific steps, though; it just happens at
random times out of the blue. If you need additional info, I can write it down
on a post-it next to my machine and try to collect that data the next time the
bug is triggered.

My hardware is an x64 box with an Intel G45 (some lines from "lspci -nn" below):
00:00.0 Host bridge [0600]: Intel Corporation 4 Series Chipset DRAM Controller [8086:2e20] (rev 03)
00:02.0 VGA compatible controller [0300]: Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e22] (rev 03)
00:02.1 Display controller [0380]: Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e23] (rev 03)

I have "current jaunty versions" which right now means:
ii libdrm-intel1 2.4.1-0ubuntu7
ii xserver-xorg-video-intel 2:2.5.1-1ubuntu5
ii linux-image-2.6.28-2-generic 2.6.28-2.3
ii xserver-xorg 1:7.4~5ubuntu5
ii libgl1-mesa-dri 7.2-1ubuntu2

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I just hit this exact same freeze again; that's two times today so far. This seems like a pretty hard-hitting bug.

Revision history for this message
Martin Olsson (mnemo) wrote :
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Revision history for this message
Martin Olsson (mnemo) wrote :

I just hit this exact same spontaneous Xorg freeze again (I was browsing a pdf in evince this time). That's two freezes in one day.

Revision history for this message
In , Gordon Jin (gordon-jin) wrote :

This looks similar to http://bugzilla.kernel.org/show_bug.cgi?id=12166. Are you building vesafb into the kernel too? If so, try removing it.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

It appears that I do _NOT_ have VESA compiled into the kernel (I'm running the stock ubuntu kernel). In fact I don't have anything containing "fb" compiled into the kernel. For details, look below:

mnemo@kingfish:~$ cat /boot/config-2.6.28-2-generic | grep -i vesa
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_UVESA=m
CONFIG_FB_VESA=m
mnemo@kingfish:~$ uname -a
Linux kingfish 2.6.28-2-generic #3-Ubuntu SMP Thu Dec 4 21:49:26 UTC 2008 x86_64 GNU/Linux
mnemo@kingfish:~$ cat /boot/config-2.6.28-2-generic | grep -i vesa
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_UVESA=m
CONFIG_FB_VESA=m
mnemo@kingfish:~$ cat /boot/config-2.6.28-2-generic | grep -i fb | grep -v m
mnemo@kingfish:~$
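For reference, a stricter way to answer Gordon's question is to match on the option value itself, listing only framebuffer options that are built in (`=y`) and ignoring modules (`=m`). This is a minimal sketch, assuming the Ubuntu `/boot/config-<release>` layout; `builtin_fb_options` is a hypothetical helper, and it only covers options whose names start with `CONFIG_FB`.

```shell
#!/bin/bash
# Hypothetical helper: print framebuffer options that are built into the
# kernel (=y), skipping modules (=m) and "# ... is not set" comment lines.
# Defaults to the config of the running kernel.
builtin_fb_options() {
    local config="${1:-/boot/config-$(uname -r)}"
    # Anchor on the start of the line and on the "=y" value.
    grep -E '^CONFIG_FB[A-Z0-9_]*=y$' "$config"
}
```

For example, `builtin_fb_options | grep VESA` would show whether vesafb itself (`CONFIG_FB_VESA=y`) is built in.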

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Created an attachment (id=20959)
xorg_log taken at similar freeze (identical stack in gdb) but this time no "EQ overflow" in xorg_log

Today I found my Jaunty Xorg hung again, with the exact same stack in gdb as the one I reported above. However, this time the Xorg log did not mention "EQ overflow". This could be another bug, or maybe it gives some clue about this particular bug. I'm attaching today's Xorg log, which doesn't contain any "EQ overflow" reference.

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

The following kernel commit may fix things.

Also, please include dmesg with bug reports.

commit 52440211dcdc52c0b757f8b34d122e11b12cdd50
Author: Keith Packard <email address hidden>
Date: Tue Nov 18 09:30:25 2008 -0800

    drm: move drm vblank initialization/cleanup to driver load/unload

    drm vblank initialization keeps track of the changes in driver-supplied
    frame counts across vt switch and mode setting, but only if you let it by
    not tearing down the drm vblank structure.

    Signed-off-by: Keith Packard <email address hidden>
    Signed-off-by: Dave Airlie <email address hidden>

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I checked that commit as shown here:
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff_plain;h=52440211dcdc52c0b757f8b34d122e11b12cdd50;hp=6133047aa64d2fd5b3b79dff74f696ded45615b2

All of those changes are already in my current Ubuntu Jaunty kernel, so this bug was not fixed by that commit. I had two more instances of this bug today with that particular fix included.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Created an attachment (id=21100)
dmesg (saved from ssh while freeze was still in effect, but the gksu segv probably has nothing to do with it, right?)

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Some more info on my kernel config:

mnemo@kingfish:~/src/libexif_apt/libexif-0.6.16$ grep MTRR /boot/config-2.6.28-2-generic
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
mnemo@kingfish:~/src/libexif_apt/libexif-0.6.16$ grep PREEMPT /boot/config-2.6.28-2-generic
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y

This bug seems very similar to this kernel/DRI bug:
http://bugzilla.kernel.org/show_bug.cgi?id=12166

(I will try to collect the output of /proc/dri/0/i915_gem_interrupt the next time this bug reproduces.)

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel:
importance: Undecided → High
status: New → Triaged
Changed in linux:
status: Unknown → Confirmed
Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Yeah, it would be interesting to know if you're still getting interrupts at the point where things fail. That will tell us where to look in the interrupt handler...

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

I'm having this extremely annoying issue too.

Often, the backtrace is similar to what's been reported here. Today, I had this backtrace:

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813161b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80cb635]
2: [0xb809a400]
3: /usr/lib/xorg/modules//libexa.so [0xb7b65f23]
4: /usr/lib/xorg/modules//libexa.so [0xb7b675e2]
5: /usr/X11R6/bin/X [0x8178334]
6: /usr/X11R6/bin/X(miPaintWindow+0x231) [0x8110fb1]
7: /usr/X11R6/bin/X(miWindowExposures+0x142) [0x8111322]
8: /usr/lib/xorg/modules/extensions//libdri.so(DRIWindowExposures+0x97) [0xb7b79e17]
9: /usr/X11R6/bin/X [0x80c11af]
10: /usr/X11R6/bin/X(miHandleValidateExposures+0x74) [0x81290f4]
11: /usr/X11R6/bin/X(UnmapWindow+0x1f8) [0x8076e78]
12: /usr/X11R6/bin/X(DeleteWindow+0x36) [0x807a926]
13: /usr/X11R6/bin/X(FreeClientResources+0xe6) [0x80741e6]
14: /usr/X11R6/bin/X(CloseDownClient+0x6f) [0x808690f]
15: /usr/X11R6/bin/X(Dispatch+0x3e8) [0x808c988]
16: /usr/X11R6/bin/X(main+0x47d) [0x8071d6d]
17: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7c9a685]
18: /usr/X11R6/bin/X [0x8071151]

If this isn't something completely different, then perhaps it can prove useful.

@Jesse: I don't know if this is what you want, but "cat /proc/dri/0/i915_gem_interrupt" gave me this:

Interrupt enable: 00000053
Interrupt identity: 00000000
Interrupt mask: fffedfae
Pipe A stat: 00000000
Pipe B stat: 00400206
Interrupts received: 1145771
Current sequence: 1550525
Waiter sequence: 0
IRQ sequence: 1550525

This data was collected after one of these crashes (although not the one with the above backtrace).

FYI, crashes here don't happen during VT switches, and I have no fb drivers in use. As far as I can tell, they always happen when I use compiz or when there's an OpenGL window open (a screensaver, Google Earth, or even glxgears). I'm using a GEM kernel (2.6.28-rc9), and the chip is a G45.

Regards.

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

Khashayar, you have a completely different bug (crash, not a hang, and a different backtrace). Report your own bug if you want anything to happen.
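The distinction Eric draws can be roughly checked from the backtrace text: the hang traces in this report pass through `drmWaitVBlank`, with X apparently blocked in the vblank ioctl while input events keep being queued, whereas Khashayar's crash trace goes through `xf86SigHandler` into `libexa.so`. Below is a rough triage heuristic along those lines, assuming the backtrace has been saved to a text file; `classify_backtrace` is a hypothetical name and its output labels are illustrative, not a diagnosis.

```shell
#!/bin/bash
# Hypothetical triage heuristic for an Xorg backtrace saved to a file.
# Prints "vblank-hang", "crash" or "unknown".
classify_backtrace() {
    local file="$1"
    if grep -qF 'drmWaitVBlank' "$file"; then
        # X was sitting in the libdrm vblank wait (this report's signature).
        echo "vblank-hang"
    elif grep -qF 'xf86SigHandler' "$file"; then
        # X caught a fatal signal: a crash, which is a different bug.
        echo "crash"
    else
        echo "unknown"
    fi
}
```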

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

(In reply to comment #13)
> Khashayar, you have a completely different bug (crash, not a hang, and a
> different backtrace). Report your own bug if you want anything to happen.
>

Thanks for confirming that. I thought so, but wasn't completely sure. I'll file a new report about that if I see it again.

Just to make it clear, I'm _also_ having the problem posted by the OP. That is, I've also had backtraces looking similar to the one in the first post here. Like this one, for instance:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813161b]
1: /usr/X11R6/bin/X(mieqEnqueue+0x289) [0x8110bf9]
2: /usr/X11R6/bin/X(xf86PostMotionEventP+0xc2) [0x80ce702]
3: /usr/X11R6/bin/X(xf86PostMotionEvent+0x68) [0x80ce868]
4: /usr/lib/xorg/modules/input//synaptics_drv.so [0xa3b95426]
5: /usr/lib/xorg/modules/input//synaptics_drv.so [0xa3b97ae9]
6: /usr/X11R6/bin/X [0x80cb7c7]
7: /usr/X11R6/bin/X [0x80b133c]
8: [0xb7fd6400]
9: /usr/lib/libdrm.so.2(drmWaitVBlank+0x28) [0xb7a96718]
10: /usr/lib/dri/i965_dri.so [0xa7527ffd]
11: /usr/lib/dri/i965_dri.so(driWaitForVBlank+0xfb) [0xa752828b]
12: /usr/lib/dri/i965_dri.so(intelSwapBuffers+0xc7) [0xa7532597]
13: /usr/lib/dri/i965_dri.so [0xa75295a7]
14: /usr/lib/xorg/modules/extensions//libglx.so [0xb7afeb74]
15: /usr/lib/xorg/modules/extensions//libglx.so [0xb7af12ce]
16: /usr/lib/xorg/modules/extensions//libglx.so [0xb7af4c0a]
17: /usr/X11R6/bin/X(Dispatch+0x34f) [0x808c8ef]
18: /usr/X11R6/bin/X(main+0x47d) [0x8071d6d]
19: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7bd6685]
20: /usr/X11R6/bin/X [0x8071151]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

...and so on

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

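I created this nifty little script and let it run at boot time.

#!/bin/bash
# Poll every 30 seconds until the last line of the X log is the
# "out-of-order valuator event" message that accompanies the EQ overflow.
until [ "$(tail -n 1 /var/log/Xorg.0.log)" == "[mi] mieqEnequeue: out-of-order valuator event; dropping." ]; do
        sleep 30s
done
# Save every file under /proc/dri/0/ plus the X log, then reboot.
mkdir -p /root/X-bugs
for f in /proc/dri/0/*; do cat "$f" > "/root/X-bugs/$(basename "$f").output"; done
cp /var/log/Xorg.0.log /root/X-bugs/
reboot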

Then, I just waited for this to re-occur. I'll now attach an archive containing all those files. I hope it helps. Let me know if there's anything else I can do.

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

Created an attachment (id=21477)
archive containing Xorg log + the output of everything under /proc/dri/0/

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

Please retest with new mesa:
commit 6c01500228014a6cfa133b5dbba8c6d024833e84
Author: Eric Anholt <email address hidden>
Date: Tue Dec 23 16:08:40 2008 -0800

    dri: Fix driWaitForMSC32 when divisor >= 2 and msc < 0.

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

Eric,

I compiled mesa 7.2 with that patch applied but after a short while I was hit by this bug again. Are there other post-7.2 commits that I might need? That is, should I give mesa-git a whirl instead of a patched 7.2?

Do you want me to attach the log and output of /proc/dri/0/* this time?

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Can you check out the patch in 18879?

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

(In reply to comment #19)
> Can you check out the patch in 18879?
>

I'll build the drm modules in kernel 2.6.28 with that patch applied and see where that gets me. If you'd rather want me to try the modules from git, let me know. (My stack is basically the latest stable release of everything).

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

(In reply to comment #20)
> (In reply to comment #19)
> > Can you check out the patch in 18879?
> >
>
> I'll build the drm modules in kernel 2.6.28 with that patch applied and see
> where that gets me. If you'd rather want me to try the modules from git, let me
> know. (My stack is basically the latest stable release of everything).
>

That didn't work. In fact, X hung as soon as GDM started (but it wasn't this particular bug that triggered it). Let me know if you want me to try to catch some logs, but that should be for another bug report, I guess.

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > Can you check out the patch in 18879?
> > >
> >
> > I'll build the drm modules in kernel 2.6.28 with that patch applied and see
> > where that gets me. If you'd rather want me to try the modules from git, let me
> > know. (My stack is basically the latest stable release of everything).
> >
>
> That didn't work. In fact, X hung as soon as GDM started (but it wasn't this
> particular bug that triggered it). Let me know if you want me to try to catch
> some logs, but that should be for another bug report, I guess.
>

After reading the comments in #18879, I see the patch has been reported not to work with the 2.6.28 drm modules. I'll try drm-intel-next. Will report back here.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

There's another patch in 18041 that might help too, in case this is a different problem. Also I just updated the one 18879, so you might try that against the 2.6.28 branch again.

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

(In reply to comment #23)
> There's another patch in 18041 that might help too, in case this is a different
> problem. Also I just updated the one 18879, so you might try that against the
> 2.6.28 branch again.
>

I tried the updated patch. It didn't crash X, but didn't solve the problem either.
I've had some problems with git. Is 'master' the one to use nowadays, or a branch? Is it possible to use drm-git while the rest of the stack is on the latest released versions?

I'll try the libdrm patch later.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Yeah two new commits in drm-intel-next and drm-intel-2.6.28 might help:

commit e1a6fcee467556a7e955fe1f7ccc134dd2f974e7
Author: Jesse Barnes <email address hidden>
Date: Tue Jan 6 10:21:24 2009 -0800

    drm/i915: set vblank enabled flag correctly across IRQ install/uninstall

commit 9f4f07ceb1716d8796089fcef91621c5f07c872a
Author: Jesse Barnes <email address hidden>
Date: Thu Jan 8 10:42:15 2009 -0800

    drm/i915: don't enable vblanks on disabled pipes

along with libdrm:
commit f4f76a6894b40abd77f0ffbf52972127608b9bca
Author: Jesse Barnes <email address hidden>
Date: Wed Jan 7 10:18:08 2009 -0800

    libdrm: add timeout handling to drmWaitVBlank

Please confirm and close this out if things look good for you now.

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

Would that be these patches along with both of the patches you mention in comment #23?

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Created an attachment (id=21864)
xorg_log, dmesg, gdb, uname and 2 snapshots of /proc/dri/0/i915_gem_interrupt

Today my X hung again with the same gdb stack (but no EQ overflow spam in the Xorg log). I had fresh Jaunty packages installed, which means:

libdrm-intel1 2.4.1-0ubuntu9
xserver-video-intel 2.5.1-1ubuntu7
libgl1-mesa-dri 7.2+git20081209.a0d5c3cf-0ubuntu4
Linux kingfish 2.6.28-4-generic #9-Ubuntu SMP Tue Jan 6 19:33:48 UTC 2009 x86_64 GNU/Linux

This time I sampled /proc/dri/0/i915_gem_interrupt with a couple of seconds in between and I saw that the "Interrupts received" was still being incremented even though X was hung.
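The check Martin describes, sampling `/proc/dri/0/i915_gem_interrupt` a few seconds apart and comparing the "Interrupts received" line, can be scripted. This is a minimal sketch, assuming the file format shown in Khashayar's earlier dump; the helper names and the 5-second default interval are hypothetical choices, and reading the file typically requires root.

```shell
#!/bin/bash
# Print the "Interrupts received" counter from an i915_gem_interrupt dump.
# Assumes lines of the form "Interrupts received: 1145771".
irq_count() {
    awk -F': *' '/^Interrupts received:/ { print $2; exit }' "$1"
}

# Hypothetical helper: succeed if the counter advances between two samples
# taken INTERVAL seconds apart, and report both values either way.
irqs_advancing() {
    local file="${1:-/proc/dri/0/i915_gem_interrupt}"
    local interval="${2:-5}"
    local before after
    before=$(irq_count "$file")
    sleep "$interval"
    after=$(irq_count "$file")
    if [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]; then
        echo "interrupts still arriving ($before -> $after)"
        return 0
    fi
    echo "no new interrupts ($before -> $after)"
    return 1
}
```

Run from an ssh session while X is hung, `irqs_advancing` would distinguish "interrupts still arriving" (as Martin observed) from an interrupt source that has gone silent.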

This bug happens much more rarely now on ubuntu jaunty and I don't have specific repro steps so I can't really test patches effectively.

Is there anything in general I can do inside Ubuntu to stress-test vblanking? Maybe I can run some specific part of x11perf, or use a screensaver or something else that increases the probability of hitting this bug? Anyone got any ideas?

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

@martin: Did you apply the patches Jesse referenced + rebuilt affected packages?

@jesse: I've applied the patches and have had no hang so far, after a couple of hours of normal usage. If there's no problem during the next 24 hours, I'd feel safe saying the patches have solved the problem. Expect a comment about that no later than this time tomorrow.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

No, I don't have Jesse's patches yet. Currently this bug reproduces about once a week for me, so I would like to find a better way to reproduce it before I try patches. I really want this bug gone by the Jaunty release in April, but to get drm patches backported I need a solid repro.

Revision history for this message
In , Khashayar Naderehvandi (khashayar) wrote :

Alright, as far as I'm concerned, this bug can be closed. These patches solve the issue for me.

Thank you very much!

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Thanks for confirming!

Revision history for this message
Khashayar Naderehvandi (khashayar) wrote :

I can confirm this issue, also on a g45. With my setup, the overflows happen after only a couple of hours of normal usage, if compiz is enabled. Various OpenGL applications, such as screensavers can trigger it as well. Backtrace is always similar to what's posted here. Examples from my setup can be found in the upstream freedesktop bug report.

The three patches that Jesse Barnes references in fdo #18922 are, in my opinion, absolutely vital to get in Jaunty, preferably as soon as possible. The G45 is relatively new, which means that more and more people probably are going to be purchasing machines with that chipset during Jaunty's development period. The patches are already in upstream.

Of the three patches, the kernel patches apply cleanly, at least to vanilla 2.6.28. The libdrm patch needs a small modification in order to apply cleanly to libdrm 2.3.1, currently in Jaunty. That small modification is simply replacing the line

libdrm_la_LDFLAGS = -version-number 2:4:0 -no-undefined

with

libdrm_la_LDFLAGS = -version-number 2:3:0 -no-undefined

For your convenience, I'll attach Jesse's patch with that modification here.
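The one-line change Khashayar describes can be made to the patch file with `sed`, leaving every other line untouched. A minimal sketch; `backport_ldflags` and the file names in the example are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy IN to OUT, changing the libdrm_la_LDFLAGS
# version from 2:4:0 (libdrm 2.4.x) to 2:3:0 (libdrm 2.3.1) so the patch
# context matches the older tree. All other lines are copied unchanged.
backport_ldflags() {
    local in="$1" out="$2"
    sed '/libdrm_la_LDFLAGS/s/-version-number 2:4:0/-version-number 2:3:0/' "$in" > "$out"
}
```

For example, `backport_ldflags vblank-timeout.patch vblank-timeout-2.3.1.patch` writes the adjusted copy alongside the original.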

Revision history for this message
Matteo Collina (matteo-collina) wrote :

I can confirm this issue too, and that the patches do their job. Without these patches my system (Dell E6400) was too unstable to use; I think they should be included in Jaunty asap.

Revision history for this message
In , Chris Miller (chrisamiller) wrote :

I've been seeing a similar problem under Ubuntu Intrepid x86_64 (kernel 2.6.27-9-generic)

I don't get a complete X crash, but firefox hangs, and then my mouse stops responding to button events. Keyboard shortcuts work, mouse movement occurs, but mouse clicks don't register.

Full Xorg log attached, but the relevant line seems to be the same as above problems:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
. . .
(repeated ad nauseam)

Can someone point me to the patches referenced above, (and perhaps some compilation instructions) so that I can test the fix? Thanks.

Revision history for this message
In , Chris Miller (chrisamiller) wrote :

Created an attachment (id=21962)
Xorg log from infinite loop causing mouse non-responsiveness

Revision history for this message
Matteo Collina (matteo-collina) wrote :

As it's not completely clear in the upstream bug reports, the kernel branch where the fixes are located is:
git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git drm-intel-2.6.28

You can see them on: http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=shortlog;h=drm-intel-2.6.28

commit e1a6fcee467556a7e955fe1f7ccc134dd2f974e7
Author: Jesse Barnes <email address hidden>
Date: Tue Jan 6 10:21:24 2009 -0800

    drm/i915: set vblank enabled flag correctly across IRQ install/uninstall

commit 9f4f07ceb1716d8796089fcef91621c5f07c872a
Author: Jesse Barnes <email address hidden>
Date: Thu Jan 8 10:42:15 2009 -0800

    drm/i915: don't enable vblanks on disabled pipes
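One possible way to turn the two commits above into patch files for a kernel tree, which is what Chris asked for, is `git format-patch`. This is a sketch only, assuming a local clone of the drm-intel repository named above with the drm-intel-2.6.28 branch available; `export_fix_patches` and the output directory are hypothetical.

```shell
#!/bin/bash
# Commit IDs of the two drm/i915 vblank fixes, in the order they were made.
FIX_COMMITS="e1a6fcee467556a7e955fe1f7ccc134dd2f974e7 9f4f07ceb1716d8796089fcef91621c5f07c872a"

# Hypothetical helper: export each fix as a numbered patch file
# (0001-..., 0002-...) into OUTDIR, reading from an existing clone at REPO.
export_fix_patches() {
    local repo="$1" outdir="$2" n=1 c
    mkdir -p "$outdir" || return 1
    # Make OUTDIR absolute, because "git -C" changes the working directory.
    outdir=$(cd "$outdir" && pwd) || return 1
    for c in $FIX_COMMITS; do
        git -C "$repo" format-patch --start-number "$n" -o "$outdir" -1 "$c" || return 1
        n=$((n + 1))
    done
}

# Example usage (repository location as given above):
#   git clone git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git
#   git -C drm-intel checkout drm-intel-2.6.28
#   export_fix_patches drm-intel /tmp/vblank-fixes
#   cd linux-2.6.28 && for p in /tmp/vblank-fixes/*.patch; do patch -p1 < "$p"; done
```

Numbering the files keeps them in commit order, so the shell glob applies them in the same sequence.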

Revision history for this message
Martin Olsson (mnemo) wrote :

Here are the upstream patches.

Revision history for this message
Martin Olsson (mnemo) wrote :

Here is a set of slightly modified patches that will apply cleanly to jaunty as of jan 18th (in this version the drm and i915 kernel patches have been merged into a single .patch file). Please note that these patches will likely not apply cleanly later on when jaunty gets new versions of libdrm/kernel etc. Also it seems that Timo Aaltonen (tjaalton) is working on getting these patches into Ubuntu already (see this e-mail: https://lists.ubuntu.com/archives/kernel-team/2009-January/004151.html ).

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

looks like it should be fixed in the kernel.

Revision history for this message
Khashayar Naderehvandi (khashayar) wrote :

Isn't Jaunty moving ahead to libdrm 2.4.4 (making the libdrm patches here obsolete)?

Revision history for this message
Martin Olsson (mnemo) wrote :

Yup, that's correct. The patches are for testing only (I'm running with them now to verify that they actually do remove the drmWaitVBlank bug).

Timo said earlier that libdrm 2.4.4 is ready, but it (and intel 2.6.0) is blocked by mesa not building against the drm headers provided by the kernel. Soon, hopefully.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Andy Whitcroft (apw)
Changed in linux:
assignee: nobody → timg-tpi
status: Triaged → Fix Committed
Revision history for this message
nicobrainless (nicoseb) wrote :

Yay... finally somewhere with a solution to those crazy hung-ups!

Still I don't find the file to patch on my lappy, sorry for the noob question but how do I apply them??

thanks

Revision history for this message
Andy Whitcroft (apw) wrote : Re: [Bug 305979] Re: [g45] jaunty X "EQ overflow" infinite loop hang

On Thu, Jan 22, 2009 at 10:22:03PM -0000, nicobrainless wrote:
> Still I don't find the file to patch on my lappy, sorry for the noob
> question but how do I apply them??

The kernels are close to release into Jaunty. Hopefully today.

Revision history for this message
Andy Whitcroft (apw) wrote :

According to Timo, it appears that the mesa and drm header interaction should now be resolved.

Revision history for this message
Matteo Collina (matteo-collina) wrote :

Just now I received an update of libdrm and X stopped working. Instead of the login screen, the display was filled with black-and-white squares.
Am I missing something that should also be updated, or is it just another bug?

Revision history for this message
Martin Olsson (mnemo) wrote :

Matteo, I had the exact same experience yesterday and filed this bug:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/320525

I also mentioned it in #ubuntu-x and they said it would be fixed as soon as the intel xorg ddx driver 2.6.1 hits apt. Indeed, this morning I got new updates that fixed my black-and-white screen corruption issues, and my machine is now bootable again.

Now let's try to confirm that the drmWaitVBlank bug has disappeared with these new versions.

Revision history for this message
Matteo Collina (matteo-collina) wrote :

Yes, everything is now working.

Revision history for this message
Khashayar Naderehvandi (khashayar) wrote :

This particular bug is now fixed for me with the latest set of updates.
However, I'm now hitting the very annoying LP #320813

Revision history for this message
Tim Gardner (timg-tpi) wrote :

2.6.28-5.12

  * drm/i915: set vblank enabled flag correctly across IRQ
  * drm/i915: don't enable vblanks on disabled pipes
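To check whether an installed Jaunty kernel already carries these two fixes, its package version can be compared against 2.6.28-5.12. Note that `uname -r` only shows the ABI part (e.g. `2.6.28-5-generic`), so the full version has to come from the package. This is a rough sketch using GNU `sort -V`, which approximates but does not fully implement Debian version ordering (`dpkg --compare-versions` is the authoritative check); `kernel_has_fix` is a hypothetical helper.

```shell
#!/bin/bash
# Jaunty kernel version whose changelog lists both drm/i915 vblank fixes.
FIXED_VERSION="2.6.28-5.12"

# Hypothetical helper: succeed if VERSION >= FIXED_VERSION.
# Approximation via GNU sort -V, not Debian's full version ordering.
kernel_has_fix() {
    local version="$1" lowest
    lowest=$(printf '%s\n' "$FIXED_VERSION" "$version" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VERSION" ]
}

# Example (Ubuntu): look up the running kernel's package version.
#   v=$(dpkg-query -W -f='${Version}' "linux-image-$(uname -r)")
#   kernel_has_fix "$v" && echo "vblank fixes included"
```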

Changed in linux (Ubuntu Jaunty):
assignee: Tim Gardner (timg-tpi) → nobody
status: Fix Committed → Fix Released
Changed in linux:
status: Confirmed → Fix Released
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical