[i915] X freeze due to end of aperture kernel issue

Bug #365994 reported by Milan Bouchet-Valat
This bug affects 2 people

Affects | Status | Importance | Assigned to | Milestone
xf86-video-intel | Fix Released | Critical | - | -
linux (Ubuntu) | Invalid | Undecided | Unassigned | -
xserver-xorg-video-intel (Ubuntu) | Fix Released | High | Unassigned | -

Bug Description

Binary package hint: xserver-xorg-video-intel

This freeze appeared when I upgraded to Jaunty, around the Alpha6 release. There's no trace of any related error in the logs. The mouse cursor still moves and the system keeps working fine, but there is no way to switch to a console (I guess that's the definition of a freeze, anyway). It happens at least every two days, and can happen from a fresh boot (i.e. no suspend/hibernate).

I caught a (poor) gdb trace of X over SSH when it froze, and I used the intel tool to get a register dump. Also attached is the Xorg.0.log from when the freeze occurred (again retrieved over SSH).

I know that's little information, but please just tell me how I can get more...

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: xserver-xorg-video-intel 2:2.6.3-0ubuntu9
ProcEnviron:
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
ProcVersion: Linux version 2.6.28-11-generic (buildd@palmer) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009
SourcePackage: xserver-xorg-video-intel
Uname: Linux 2.6.28-11-generic i686

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile 915GM/PM/GMS/910GML Express Processor to DRAM Controller [8086:2590] (rev 03)
 Subsystem: Toshiba America Info Systems Device [1179:ff00]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 03)
 Subsystem: Toshiba America Info Systems Device [1179:ff00]

Revision history for this message
Martin Olsson (mnemo) wrote :

Hi Milan,

Thanks for taking the time to report this bug. To be able to create a good upstream bug report we're going to need some more information. Whenever the xserver gets stuck on an ioctl() like in your case, it's actually the GPU which is hung, and that typically means we need to see what kind of instructions were sent to the GPU through its batchbuffers.

This type of analysis requires a new debugging interface which was included in the 2.6.30 kernel, which has not been released yet. So can you please install 2.6.30-rc3 and then take a debug snapshot of these buffers? This would be very helpful for us. Luckily Ubuntu already provides pre-packaged .DEBs for vanilla mainline kernels.

Please refer to this guide:
https://wiki.ubuntu.com/X/Troubleshooting/Freeze
And in particular, follow the steps from the section "Get a Batchbuffer Dump (-intel only)", i.e:
https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-freeze-test

In the next version of Ubuntu the normal kernel will have this type of debugging interface so we will try to get it included in normal apport bugs etc (which will be great) but until then a little bit of manual work is required.

Geir Ove Myhr (gomyhr)
description: updated
tags: added: 915gm freeze intel jaunty xorg
Revision history for this message
In , Martin Olsson (mnemo) wrote :

One Ubuntu user has reported an X server freeze with the xserver stuck in ioctl(). He installed the 2.6.30-rc2 kernel and was able to reproduce the freeze and capture a full batch buffer dump.

Driver running in: EXA

Exact versions are basically ubuntu jaunty except for the kernel:
intel ddx 2.6.3-0ubuntu9
kernel non-tainted 2.6.30rc2
mesa 7.4-0ubuntu3
xserver 1:7.4~5ubuntu18
libdrm 2.4.5-0ubuntu4

Finally his exact chipset is:
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 03)

downstream bug report with all the data you need:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/365994

Direct link to the batch buffer dump:
http://launchpadlibrarian.net/25995147/dri_debug.tgz

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Random freeze every day or two

I've been running the 2.6.30rc2 kernel for two days, and I've not seen the freeze yet... If it's fixed with that release, I won't be able to debug it for Jaunty! :-p Let's wait.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Incomplete
Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

No worries! I've finally got it again. See attached archive - this new kernel feature is very nice, indeed.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → New
Revision history for this message
Martin Olsson (mnemo) wrote :

Great work Milan! I've upstreamed all the data you have collected. Feel free to subscribe to the upstream bug as well:
https://bugs.freedesktop.org/show_bug.cgi?id=21414

Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Geir Ove Myhr (gomyhr)
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
Revision history for this message
mutew (avheretic) wrote :

Hi Milan,

I have been noticing similar behaviour ever since I updated my Ubuntu install to Jaunty Jackalope. My bug report is at https://bugs.launchpad.net/ubuntu/+bug/368642

Could you please verify if it is the same and let me know if I can be of any help in tracing it further.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

I can't be sure, but that must be the same bug. The symptoms are the same, at least. You can try debugging with the procedure listed in comment 6, or wait and see what upstream says.

Revision history for this message
Martin Olsson (mnemo) wrote :

Yes, if you have the exact same chipset and mutew's xserver is also stuck on ioctl() forever, then the bugs might be the same. However, note that mutew has a pretty nasty kernel backtrace in his dmesg, so I hope we can capture complete records, including batch buffer dumps, for both of these issues. It's a good idea to subscribe to each other's bugs though, so that if a potential fix becomes available you can both test it.

Revision history for this message
Aaron Roydhouse (aaron-roydhouse) wrote :

These symptoms sound like the problem I've been having since upgrading from Intrepid 8.10 to Jaunty 9.04 (amd64) on a Lenovo X301 (2776) laptop. In the 24 hours since upgrading it has twice locked up the screen display, mouse pointer, and keyboard. /var/log/kern.log shows a kernel 'Oops' related to 'i915_gem_execbuffer'.

uname -a
Linux shim 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux

lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07)

[...]
May 3 17:26:03 shim kernel: [ 3517.858785] BUG: unable to handle kernel NULL pointer dereference at 000000000000002c
May 3 17:26:03 shim kernel: [ 3517.858799] IP: [<ffffffffa03ea3a8>] i915_gem_execbuffer+0x1e8/0x740 [i915]
May 3 17:26:03 shim kernel: [ 3517.858821] PGD 13a128067 PUD 1379ae067 PMD 0
May 3 17:26:03 shim kernel: [ 3517.858830] Oops: 0000 [#1] SMP
[...]

Full dump on https://bugs.launchpad.net/ubuntu/+bug/368642
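
Whether a kernel Oops is present is what later separates this report from the plain X freeze, so it can help to check a log excerpt mechanically. The following is a minimal sketch, assuming the log lines look like the excerpt above; the function name `summarize_oops` and both regular expressions are made up for this illustration and are not part of any existing tool.

```python
import re

# "Oops: 0000 [#1] SMP" style marker, as seen in the kern.log excerpt above.
OOPS_RE = re.compile(r"\bOops: [0-9a-f]+ \[#\d+\]")

# "IP: [<addr>] symbol+0xoff/0xlen [module]" line naming the faulting function.
IP_RE = re.compile(
    r"IP: \[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_oops(log_text):
    """Return (found_oops, faulting_symbol, module) for a kernel log excerpt.

    faulting_symbol and module are None when no "IP:" line is present.
    """
    found = bool(OOPS_RE.search(log_text))
    m = IP_RE.search(log_text)
    if m:
        return found, m.group(1), m.group(2)
    return found, None, None


if __name__ == "__main__":
    excerpt = (
        "May 3 17:26:03 shim kernel: [ 3517.858799] IP: [<ffffffffa03ea3a8>] "
        "i915_gem_execbuffer+0x1e8/0x740 [i915]\n"
        "May 3 17:26:03 shim kernel: [ 3517.858830] Oops: 0000 [#1] SMP\n"
    )
    print(summarize_oops(excerpt))
```

A log with no Oops marker and no "IP:" line yields `(False, None, None)`, which matches the situation described for the original freeze.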

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Aaron: I don't think that's the same bug, since I don't experience kernel Oopses. Please report yours separately, and devs will determine what your bug may be. You can follow the debugging procedure they gave me; it will save you some time.

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Adjusting severity: crashes & hangs should be marked critical.

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

This dump looks like:

commit 1142353b487c155a31011923fbd08ec67e60f505
Author: Keith Packard <email address hidden>
Date: Fri May 1 11:44:13 2009 -0700

    intel_batch_start_atomic: fix size passed to intel_batch_require_space (*4)

(batchbuffer starts in the middle of what should have been an atomic batch emission)
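
As a rough illustration of the failure mode described here, the sketch below assumes that the "(*4)" in the commit title refers to converting a count of 32-bit dwords into bytes before reserving batch space. That reading is an assumption, and the `Batch` class and `emit_atomic` function are hypothetical toy stand-ins, not the driver's actual code.

```python
DWORD_BYTES = 4  # one command dword is 32 bits


class Batch:
    """Toy batchbuffer: tracks used bytes and counts flushes."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self.used = 0
        self.flushes = 0

    def require_space(self, nbytes):
        # Start a new batch if the request does not fit in the current one.
        if self.used + nbytes > self.size_bytes:
            self.flushes += 1
            self.used = 0

    def emit(self, ndwords):
        # Emit dwords one at a time; a flush can happen between any two.
        for _ in range(ndwords):
            self.require_space(DWORD_BYTES)
            self.used += DWORD_BYTES


def emit_atomic(batch, ndwords, buggy):
    """Reserve room for ndwords, then emit them.

    Returns True if the whole sequence landed in a single batch.
    With buggy=True the reservation is made in dwords instead of bytes,
    so it is four times too small.
    """
    reserve = ndwords if buggy else ndwords * DWORD_BYTES
    batch.require_space(reserve)
    flushes_before = batch.flushes
    batch.emit(ndwords)
    return batch.flushes == flushes_before


if __name__ == "__main__":
    for buggy in (True, False):
        b = Batch(64)
        b.used = 40  # batch is already mostly full
        print(buggy, emit_atomic(b, 10, buggy=buggy))
    # prints "True False" then "False True"
```

With the undersized reservation the sequence is split across two batches, so the second batch begins in the middle of what was meant to be one uninterrupted emission.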

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Thanks for suggesting the patch. Looks like cworth already cherry picked it (as commit 115fc9a7d79da07301b96d9fc5c513d33734d273) for 2.7.1 as well.

Thanks a lot.

Revision history for this message
Martin Olsson (mnemo) wrote : Re: [i915] Random freeze every day or two

@Milan,

Good news! Upstream has fixed your bug in their master branch and the fix was also cherry picked for the upcoming 2.7.1 release. Once 2.7.1 is rolled out we're likely to package it into the X-Updates repository and karmic will also have the fix of course. You can find the X-Updates PPA here: https://launchpad.net/~ubuntu-x-swat/+archive/x-updates/

Thanks a lot for taking the time to create the batch buffer dump and helping make Ubuntu better.

summary: - [i915] Random freeze every day or two
+ [i915] Random freeze every day or two (needs intel 2.7.1)
Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Revision history for this message
Geir Ove Myhr (gomyhr) wrote : Re: [i915] Random freeze every day or two (needs intel 2.7.1)

2.7.1 is now out and available from x-updates https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-updates/

Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Sorry, but I'm now using 2.7.1 and I've experienced the freeze twice in two days. I'll try to get a new dump, but the symptoms are the same so I assume that's the same freeze...

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Random freeze every day or two (needs intel 2.7.1)

The new version does not fix the freeze, sadly. Pretty annoying... :-(

Revision history for this message
Martin Olsson (mnemo) wrote :

Ah that's too bad :(

I think it's another freeze bug hitting you on 2.7.1 though. If you're able to capture a batch buffer dump again, we could open an upstream bug report for the new issue as well. Sorry for the inconvenience.

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

I've caught a new batch buffer dump, which looks to my inexperienced eye very different from the old one. Maybe another bug... See http://launchpadlibrarian.net/26815168/dri_debug-new.tgz.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Random freeze every day or two (needs intel 2.7.1)

The more bugs I can catch, the better, now that I know how to do it and that I'm using 2.6.30... ;-)

Here's a new dump, but diff says it's quite different from the previous one. So it may well be a different issue. I'll let upstream determine that.

Revision history for this message
Bryce Harrington (bryce) wrote :

Intel upstream suggests this kernel patch might help with some hang bugs.
https://bugs.freedesktop.org/attachment.cgi?id=25806

Revision history for this message
Bryce Harrington (bryce) wrote :

Please remember to update the title to add/remove (needs ...) when something is proven to need/not-need a given upstream version, so we know to look at the bug when updating packages.

Also, I *really* dislike the use of the word "random" in bug report titles.

summary: - [i915] Random freeze every day or two (needs intel 2.7.1)
+ [i915] Random freeze every day or two
Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Random freeze every day or two

Sorry, the summary about 2.7.1 was not updated by me, and it was a little early to change it before confirmation... And "random" at least makes explicit that we haven't found a precise case that triggers it.

About the kernel patch: do you know whether it's supposed to be included in one of the next release candidates? I'd like to avoid the need to rebuild my kernel if that's not really useful. Thanks!

Revision history for this message
Bryce Harrington (bryce) wrote : Re: [Ubuntu-x-swat] [Bug 365994] Re: [i915] Random freeze every day or two

On Sat, May 16, 2009 at 09:27:28PM -0000, Milan wrote:
> "random" at least makes explicit that we haven't found a precise
> case that triggers it.

Very well, just be aware I'm setting up a procmail rule to exclude
looking at bugs with 'random' in the title going forward, so I can
better focus on bugs that are better characterized.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Random freeze every day or two

OK, OK, I remove it! ;-)

Though I'm not sure we can characterize it better for now since we need to read the new batchbuffer dump to be sure.

summary: - [i915] Random freeze every day or two
+ [i915] Freeze every day or two
Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

Milan, don't reopen someone else's fixed bug to submit your bug. Submit your own bug.

Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Actually, I'm the original reporter on Launchpad. Sorry for the confusion, I forgot that my report was upstreamed by someone else. So the new batchbuffer dump has been made on the same machine, and is likely to be the same freeze.

Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Using kernel 2.6.30rc6 fixes it. Bryce Harrington pointed to the following patch, which is indeed included in rc6:
https://bugs.freedesktop.org/attachment.cgi?id=25806

So the present bug is most likely a duplicate of bug 21488 (but it occurred with and without UXA and KMS).

*** This bug has been marked as a duplicate of bug 21488 ***

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Freeze every day or two

OK, that seems to be fixed with kernel 2.6.30rc6. So that must be the above cited patch. Thanks!
 (There are new bugs with KMS in that version, though... :-p )

Geir Ove Myhr (gomyhr)
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Reopening for the second time, sorry. Actually, I experienced it again with kernel 2.6.30rc6 and driver 2.7.99.1+git20090519. The output of dmesg is exactly the same as that of the first dump linked here:
[ 1320.512119] Call Trace:
[ 1320.512137] [<c02d296e>] ? rb_erase+0xbe/0x130
[ 1320.512150] [<c0512ed4>] __mutex_lock_slowpath+0xa4/0x100
[ 1320.512159] [<c0512c10>] mutex_lock+0x20/0x40
[ 1320.512197] [<e0a4d688>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 1320.512209] [<c014c8bd>] run_workqueue+0x6d/0x130
[ 1320.512238] [<e0a4d660>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 1320.512248] [<c014cf48>] worker_thread+0x88/0xe0
[ 1320.512259] [<c01505a0>] ? autoremove_wake_function+0x0/0x40
[ 1320.512268] [<c014cec0>] ? worker_thread+0x0/0xe0
[ 1320.512277] [<c01501fc>] kthread+0x4c/0x80
[ 1320.512284] [<c01501b0>] ? kthread+0x0/0x80
[ 1320.512294] [<c01039c7>] kernel_thread_helper+0x7/0x10

Only difference: mouse cursor was frozen too this time (not sure those data are really meaningful).

See http://launchpadlibrarian.net/27031541/dri_debug.tgz for the full dump.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote : Re: [i915] Freeze every day or two

So bad! I've experienced another freeze, and while the symptoms are a little different (no moving mouse cursor this time), the kernel trace is the very same as in the first dump:
[ 1320.512119] Call Trace:
[ 1320.512137] [<c02d296e>] ? rb_erase+0xbe/0x130
[ 1320.512150] [<c0512ed4>] __mutex_lock_slowpath+0xa4/0x100
[ 1320.512159] [<c0512c10>] mutex_lock+0x20/0x40
[ 1320.512197] [<e0a4d688>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 1320.512209] [<c014c8bd>] run_workqueue+0x6d/0x130
[ 1320.512238] [<e0a4d660>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 1320.512248] [<c014cf48>] worker_thread+0x88/0xe0
[ 1320.512259] [<c01505a0>] ? autoremove_wake_function+0x0/0x40
[ 1320.512268] [<c014cec0>] ? worker_thread+0x0/0xe0
[ 1320.512277] [<c01501fc>] kthread+0x4c/0x80
[ 1320.512284] [<c01501b0>] ? kthread+0x0/0x80
[ 1320.512294] [<c01039c7>] kernel_thread_helper+0x7/0x10

That's using kernel 2.6.30rc6, and driver 2.7.99.1+git20090519.09beee37-0ubuntu0sarvatt~jaunty.
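
Deciding that two traces are "the very same" is easier when timestamps and raw addresses are stripped and only the function names are compared. The following is a minimal sketch under that assumption; `trace_symbols` and `same_trace` are illustrative names, and the regular expression only targets the frame format shown in the trace above.

```python
import re

# Matches frames such as:
#   [ 1320.512150] [<c0512ed4>] __mutex_lock_slowpath+0xa4/0x100
#   [ 1320.512137] [<c02d296e>] ? rb_erase+0xbe/0x130
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(text, include_unreliable=False):
    """Return the function names from a kernel call trace, in order.

    Frames marked with '?' are entries the kernel is not sure belong to
    the call chain; they are skipped unless include_unreliable is True.
    """
    symbols = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue
        if m.group(1) and not include_unreliable:
            continue
        symbols.append(m.group(2))
    return symbols


def same_trace(a, b):
    """Compare two traces by symbol sequence only."""
    return trace_symbols(a) == trace_symbols(b)


if __name__ == "__main__":
    sample = """[ 1320.512119] Call Trace:
[ 1320.512137] [<c02d296e>] ? rb_erase+0xbe/0x130
[ 1320.512150] [<c0512ed4>] __mutex_lock_slowpath+0xa4/0x100
[ 1320.512159] [<c0512c10>] mutex_lock+0x20/0x40
[ 1320.512197] [<e0a4d688>] i915_gem_retire_work_handler+0x28/0x70 [i915]"""
    print(trace_symbols(sample))
```

Comparing by symbol names only is a deliberate simplification: module load addresses and timestamps differ between boots even when the code path is identical.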

Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Eric: may I hope somebody will look at the new traces soon? I don't want to offend the dev team, and I'm willing to help you as much as I can, but rebooting and losing work every day is *really* annoying. I'm sure you understand that... Thanks! ;-)

Revision history for this message
roger64 (rogqip-suse) wrote : Re: [i915] Freeze every day or two

I wrote a bug report some days ago and have been told that my bug is the same as the one reported here.

I just wanted to tell you that I finally got rid of this freezing bug by reverting to the 2.4 driver, though I read that I am no longer able to "initialize GEM" (whatever that means) and I now use the so-called "classic" mode.

I have been trouble free for the last five days.

roger@roger-laptop:~$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 03)

Thanks for your hard work and help.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Good to know I'm not alone! Reverting to old drivers is a good option since the bug is new in Jaunty. But for future improvements, we really need to get this fixed.

I advise you to subscribe to the upstream bug, they may need more information, and showing we are several people affected by this will encourage them to solve it... :-p

Revision history for this message
mutew (avheretic) wrote :

roger64, could you please provide a link to the 2.4 drivers that you mentioned in your above post. Till the current bug is fixed it would be helpful to revert to the earlier version so that I can get my work done without having to reboot every couple of hours.

Revision history for this message
roger64 (rogqip-suse) wrote :

sure

I applied exactly this recommendation. It's a quick fix.
https://wiki.ubuntu.com/ReinhardTartler/X/RevertingIntelDriverTo2.4

Hope this works for you too. If not, they also point to an easy way back.

Revision history for this message
Bryce Harrington (bryce) wrote :

Btw, from the dri dumps it looks like this freeze is a different bug than the original one you reported. Of course, symptoms with freezes are identical, but anyway.

jbarnes thinks this latest freeze should be fixed by the aperture kernel patch. The offset shown in the dri dump is awfully close to the aperture boundary (256k), which is what that fix addresses. So we'll need that kernel patch in order to fix this one.

summary: - [i915] Freeze every day or two
+ [i915] X freeze due to end of aperture issue
summary: - [i915] X freeze due to end of aperture issue
+ [i915] X freeze due to end of aperture kernel issue
Revision history for this message
Bryce Harrington (bryce) wrote : Re: [i915] X freeze due to end of aperture issue

This is the kernel patch needed to fix this issue:

commit 13f4c435ebf2a7c150ffa714f3b23b8e4e8cb42f
 Author: Eric Anholt <email address hidden>
 Date: Tue May 12 15:27:36 2009 -0700
     drm/i915: Don't allow binding objects into the last page of the aperture.
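
One way to read the commit title is as a bounds rule: an object may be bound anywhere in the aperture except where it would overlap the final page. The sketch below illustrates that rule only; it is not the kernel code. The 4096-byte page size, the function names, and the 256 MiB aperture used in the example are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def touches_last_page(offset, size, aperture_size, page_size=PAGE_SIZE):
    """True if an object at [offset, offset + size) overlaps the final page."""
    if size <= 0 or offset < 0 or offset + size > aperture_size:
        raise ValueError("object does not fit inside the aperture")
    return offset + size > aperture_size - page_size


def usable_aperture(aperture_size, page_size=PAGE_SIZE):
    """Size of the range left for binding once the last page is excluded."""
    return aperture_size - page_size


if __name__ == "__main__":
    aperture = 256 * 1024 * 1024  # hypothetical aperture size
    # An object ending exactly at the aperture boundary sits in the last page.
    print(touches_last_page(aperture - PAGE_SIZE, PAGE_SIZE, aperture))
    # One page lower and it is clear of the excluded range.
    print(touches_last_page(aperture - 2 * PAGE_SIZE, PAGE_SIZE, aperture))
```

A check of this shape is also a quick way to test whether an offset taken from a dump falls in the range the fix is meant to exclude.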

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Sorry if the new bug is different from the old one, but I'm not able to read these dumps.

About the patches: the commit you refer to above (13f4c435ebf2a7c150ffa714f3b23b8e4e8cb42f) is the same as
https://bugs.freedesktop.org/attachment.cgi?id=25806 which you cited before, but with a different signature (I guess you know that, but I was confused). It's included in 2.6.30rc6 and does not appear to fix the freeze. That may come from the fact that the freeze I get now is not the same.

Do you want me to open another report?

Revision history for this message
mutew (avheretic) wrote :

roger64, thanks for the link. I "downgraded" the drivers and haven't experienced the lockup again with an uptime of over 48 hours.

Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Closing since it looks like the bug is fixed with latest 2.7.99 driver and kernel 2.6.30rc8.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Revision history for this message
Bryce Harrington (bryce) wrote :

Milan, yes as a general rule you should file a new bug if you're not 100% sure you have the original report's issue.

Revision history for this message
Lê Kiến Trúc (le-kien-truc) wrote :

I have this bug on my PC. This is an upgrade from 8.10. This bug drives me crazy because I can't find any way out except "restart". So I had to downgrade to Xorg 2.4 to fix this problem.

Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks for following up with upstream; according to your final comment on the upstream bug it appears this was resolved as of 6/8/09 with the 2.7.99 driver and the 2.6.30rc8 kernel, which we're well beyond in karmic, so will be closing the LP bug as fixed at this time.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Vikram Dhillon (dhillon-v10) wrote :

Unfortunately it seems this bug is still an issue. Can you confirm this issue exists with the most recent Lucid Lynx 10.04 release (http://cdimage.ubuntu.com/releases/lucid/alpha-2/)? If the issue remains in Lucid, please test the latest 2.6.32 upstream kernel build (https://wiki.ubuntu.com/KernelMainlineBuilds). Let us know your results. Thanks.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Yes, the freeze still exists in Karmic, but it comes from another bug. I'm marking this one as Fix Released for Linux, since a fix was released via Intel X.org/Linux developers. I've continued tracking the problem upstream, and you can have a look at http://bugs.freedesktop.org/show_bug.cgi?id=26974 if you are interested. But that's a new report about a different bug.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical