[gm45] x server crashes with: [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer

Reported by Andy Whitcroft on 2010-11-29
This bug affects 15 people
Affects            Status     Importance  Assigned to  Milestone
Linux              Invalid    Medium
xf86-video-intel   Won't Fix  Medium
compiz (Ubuntu)               High        Unassigned
linux (Ubuntu)                Undecided   Unassigned

Bug Description

I am getting a full X server crash and restart in combination with the kernel error below:

    [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer

and this error in the Xorg log:

    Fatal server error:
      Failed to submit batchbuffer: Invalid argument

This occurs about once a day, and always when clicking on an entry in the bottom task bar; compiz is enabled. Note that this is a Natty kernel in combination with Lucid userspace.

Update: I just triggered the same apparent symptoms doing a shift-+ in a gnome-terminal.

Andy Whitcroft (apw) wrote :

Adding xserver-xorg-video-intel, as the kernel error implies that this is a miscommunication from userspace.

Andy Whitcroft (apw) on 2010-11-29
description: updated
Robert Hooker (sarvatt) wrote :

just noting these upstream responses for future reference:

https://bugzilla.kernel.org/show_bug.cgi?id=22652
http://<email address hidden>/msg01828.html

Bryce Harrington (bryce) on 2010-11-29
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
Bryce Harrington (bryce) on 2010-11-29
tags: added: natty
Robert Hooker (sarvatt) on 2010-11-29
description: updated
Bryce Harrington (bryce) wrote :

Architecture is x86_64, I wonder if that's significant?

Anyway, another link of another report with this error message:
http://www.serverphorums.com/read.php?12,224683

Sarvatt noticed your xserver version 1.7.6 indicates a Lucid userspace rather than maverick?

tags: added: lucid
Bryce Harrington (bryce) wrote :

Fwiw, "Failed to submit batchbuffer" was a common issue back on Lucid.

totof1169 (bourgeotc) wrote :

I got the same message:

chris-laptop kernel: [ 2589.262448] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer

on maverick with kernel 2.6.37 (installed to fix a sound bug).
I didn't have this message with 2.6.36.

madurax86 (madura-x86) wrote :

I have the same bug. I installed 2.6.37 to get around an ACPI problem that was hanging the system randomly. It hasn't hung yet, but this new message shows up in dmesg.

gene (eugenios) wrote :

Hello, I got it too.
uname -a: Linux 2.6.37-rc3-mine #1 SMP Sat Nov 27 19:08:07 CST 2010 x86_64 GNU/Linux (with Mike Galbraith's patch applied), Lucid
in kern.log I get
Dec 2 12:58:43 my kernel: [401571.066888] 11:3:1: cannot get freq at ep 0x84
Dec 2 13:05:51 my kernel: [401998.422719] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer

A GNOME session is lost under some CPU/disk/memory load, in this case when I tried to add media to Amarok. The Xorg log also contains this:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x4a33e8]
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a2c64]
2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47d044]
3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f8fcd4a8000+0x53cf) [0x7f8fcd4ad3cf]
4: /usr/bin/X (0x400000+0x6fe47) [0x46fe47]
5: /usr/bin/X (0x400000+0x11d363) [0x51d363]
6: /lib/libpthread.so.0 (0x7f8fd1935000+0xf8f0) [0x7f8fd19448f0]
7: /lib/libc.so.6 (ioctl+0x7) [0x7f8fd06ec197]
8: /lib/libdrm.so.2 (drmIoctl+0x28) [0x7f8fcec9d5b8]
9: /lib/libdrm.so.2 (drmCommandNone+0x16) [0x7f8fcec9d8b6]
10: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f8fce7ef000+0x27838) [0x7f8fce816838]
11: /usr/bin/X (0x400000+0x168a44) [0x568a44]
12: /usr/bin/X (0x400000+0xa58a0) [0x4a58a0]
13: /usr/bin/X (BlockHandler+0x50) [0x435df0]
14: /usr/bin/X (WaitForSomething+0x141) [0x45faa1]
15: /usr/bin/X (0x400000+0x30952) [0x430952]
16: /usr/bin/X (0x400000+0x261aa) [0x4261aa]
17: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f8fd062cc4d]
18: /usr/bin/X (0x400000+0x25d59) [0x425d59]

gene (eugenios) wrote :

Can everyone please examine their Xorg logs? Check both /var/log/Xorg.0.log.old and /var/log/Xorg.0.log, or whichever is relevant.
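
One way to check the logs for the messages quoted so far in this report is sketched below. scan_logs is a hypothetical helper name, not an existing tool, and /var/log/kern.log in the usage line is an assumption about where your kernel messages end up; adjust the paths to your system.

```shell
# Sketch: print every line in the given log files that contains one of
# the messages quoted in this bug report, prefixed with file name and
# line number. scan_logs is a hypothetical helper, not a standard tool.
scan_logs() {
    local f
    for f in "$@"; do
        # Skip logs that are missing or not readable by this user.
        [ -r "$f" ] || continue
        # -F: match the patterns as fixed strings, not regular expressions.
        grep -H -n -F \
            -e 'Attempting to mmap a purgeable buffer' \
            -e 'Failed to submit batchbuffer' \
            -e 'bo map failed' \
            -e 'EQ overflowing' \
            "$f"
    done
    # A log without matches is not an error for our purposes.
    return 0
}

# Usage (paths are examples):
#   scan_logs /var/log/Xorg.0.log /var/log/Xorg.0.log.old /var/log/kern.log
```

No output means none of the listed messages were found in the readable files.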

gene (eugenios) wrote :

In my case things appear to be much more stable, and the crash affects only the current GNOME session. Since Xorg.0.log doesn't get rewritten, X apparently survives it. I cannot trigger the crash the way the reporter does, either.

Bryce Harrington (bryce) on 2010-12-02
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Triaged
Bryce Harrington (bryce) wrote :

Unfortunately, for these kinds of bugs the backtrace in Xorg.0.log tends not to reveal much - just that the server was unable to communicate with the GPU's "event queue" for some reason, and the server noticed graphics data was not getting consumed. That's what the 'EQ overflowing' message means. The backtraces don't show anything useful, they just point to the mieqEnqueue routine where it noticed the GPU had hung.

Is anyone able to reproduce this on natty userspace + natty kernel?

gene (eugenios) wrote :

Bryce, do you know if gnome-session keeps any logs?
As for what gdm is reporting, I see this:
Fatal server error:
Server is already active for display 0
        If this server is no longer running, remove /tmp/.X0-lock
        and start again.

Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.

(WW) xf86CloseConsole: KDSETMODE failed: Bad file descriptor
(WW) xf86CloseConsole: VT_GETMODE failed: Bad file descriptor
(WW) xf86OpenConsole: VT_GETSTATE failed: Bad file descriptor
 ddxSigGiveUp: Closing log

which apparently is not related to the issue in question, or is it?
The other more recent gdm log has these:
(WW) intel(0): i830_uxa_prepare_access: bo map failed
../../intel/intel_bufmgr_gem.c:941: Error preparing buffer map 598 (pixmap): Invalid argument

gene (eugenios) wrote :

Sorry, I am on "DISPLAY=:1.0", so the crash did happen to the X session. The current log has the same message as gdm's:
"(WW) intel(0): i830_uxa_prepare_access: bo map failed"

On Thu, Dec 02, 2010 at 11:10:06PM -0000, gene wrote:
> Bryce, do you know if gnome-session keeps any logs?

Aside from ~/.xsession-errors and /var/log/gdm/ I don't know of anything
in gnome worth looking at for this bug.

> As far as what gdm is telling I see this:
> Fatal server error:
> Server is already active for display 0
> If this server is no longer running, remove /tmp/.X0-lock
> and start again.
>
> which apparently is not related to the issue in question or dis it?

No, just means you tried starting X but it looked like there was another
X session already running. Probably best to just reboot at that point.

> The other more recent gdm log has these:
> (WW) intel(0): i830_uxa_prepare_access: bo map failed
> ../../intel/intel_bufmgr_gem.c:941: Error preparing buffer map 598 (pixmap): Invalid argument

Interesting, yes that does sort of sound relevant.

I notice that the section of code that produces this warning has changed
significantly between maverick 2.9.0 and 2.9.1:

    // maverick 2.9.0
    /* Kernel manages fences at GTT map/fault time */
    if (i830->kernel_exec_fencing) {
        if (bo->size < i830->max_gtt_map_size) {
            if (drm_intel_gem_bo_map_gtt(bo)) {
                xf86DrvMsg(scrn->scrnIndex, X_WARNING,
                           "%s: bo map failed\n",
                           __FUNCTION__);
                return FALSE;
            }
        } else {
            if (dri_bo_map(bo, access == UXA_ACCESS_RW) != 0) {
                xf86DrvMsg(scrn->scrnIndex, X_WARNING,
                           "%s: bo map failed\n",
                           __FUNCTION__);
                return FALSE;
            }
        }
        pixmap->devPrivate.ptr = bo->virtual;
    } else { /* or not... */
        if (drm_intel_bo_pin(bo, 4096) != 0)
            return FALSE;
        drm_intel_gem_bo_start_gtt_access(bo, access == UXA_ACCESS_RW);
        pixmap->devPrivate.ptr = i830->FbBase + bo->offset;
    }

    // natty:
    if (!list_is_empty(&priv->batch) &&
        (access == UXA_ACCESS_RW || priv->batch_write))
        intel_batch_submit(scrn, FALSE);

    if (priv->tiling || bo->size <= intel->max_gtt_map_size)
        ret = drm_intel_gem_bo_map_gtt(bo);
    else
        ret = dri_bo_map(bo, access == UXA_ACCESS_RW);
    if (ret) {
        xf86DrvMsg(scrn->scrnIndex, X_WARNING,
                   "%s: bo map failed: %s\n",
                   __FUNCTION__,
                   strerror(-ret));
        return FALSE;
    }

Bryce Harrington (bryce) wrote :

Commit 49d2ccab2a8 looks most interesting - if you want to try patching the -intel driver, I'd probably suggest starting with that.

I have attempted to build the newer -intel against maverick but it didn't build cleanly. Would you guys consider that to be an adequate fix?

tags: added: maverick
removed: natty
Bryce Harrington (bryce) wrote :

[I'm dropping the natty tag because it's looking more and more like this isn't needing work to be done in natty; if anyone can reproduce the issue with both a natty kernel and natty userspace, feel free to re-add the tag.]

gene (eugenios) wrote :

Got yet another crash caused by opening amarok's window. Interestingly, all progenies were killed except for two wget processes. They are now reported with "?" in the tty field by ps.
Bryce, what would you want us to try? Building the intel driver with the 49d2ccab2a8 patch?

On Fri, Dec 03, 2010 at 06:16:13PM -0000, gene wrote:
> Got yet another crash caused by opening amarok's window. Interestingly, all progenies were killed except for two wget processes. They are now reported with "?" in the tty field by ps.
> Bryce, what would you want us to try? Building the intel driver with the 49d2ccab2a8 patch?

Yeah, that'd probably be the easiest next step. It's kind of a stab in
the dark though.

Another patch worth trying is this one:
http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=a44a63d2ff6c01c3dc61de6f736dd441ddd25e52

According to Sarvatt, that fixed a bunch of GPU lockups like bug #626967.

xf86-video-intel-49d2ccab2a82083110fe796636f3f91ba8c31237 does not configure for me:
checking for sys/mman.h... yes
checking for mprotect... yes
./configure: line 12164: syntax error near unexpected token `XINERAMA,'
./configure: line 12164: `XORG_DRIVER_CHECK_EXT(XINERAMA, xineramaproto)'
Any suggestions?

gene (eugenios) wrote :

The commit a44a63d2ff6c01c3dc61de6f736dd441ddd25e52 does not even build the configure script.
autogen.sh gives me:
/usr/share/aclocal/xorg-macros.m4:39: XORG_MACROS_VERSION is expanded from...
configure.ac:40: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal: /usr/bin/autom4te failed with exit status: 1
autoreconf: aclocal failed with exit status: 1
Not sure which version to apply the patches to. None of the patches apply to the latest version, 2.13.901.

On Fri, Dec 03, 2010 at 07:09:14PM -0000, gene wrote:
> xf86-video-intel-49d2ccab2a82083110fe796636f3f91ba8c31237does not configure for me:
> checking for sys/mman.h... yes
> checking for mprotect... yes
> ./configure: line 12164: syntax error near unexpected token `XINERAMA,'
> ./configure: line 12164: `XORG_DRIVER_CHECK_EXT(XINERAMA, xineramaproto)'
> Any suggestions?

apt-get build-dep xserver-xorg-video-intel

Bryce Harrington (bryce) wrote :

On Fri, Dec 03, 2010 at 07:36:21PM -0000, gene wrote:
> The commit a44a63d2ff6c01c3dc61de6f736dd441ddd25e52 does not even build the configure script.
> autogen.sh gives me:
> /usr/share/aclocal/xorg-macros.m4:39: XORG_MACROS_VERSION is expanded from...
> configure.ac:40: the top level
> autom4te: /usr/bin/m4 failed with exit status: 1
> aclocal: /usr/bin/autom4te failed with exit status: 1
> autoreconf: aclocal failed with exit status: 1
> Not sure how to which version to apply patches there. All patches do not work for the last version 2.13.901.

Probably best to retrieve the source via:

  apt-get source xserver-xorg-video-intel
  apt-get build-dep xserver-xorg-video-intel

You may need to apply the patch manually, as it looks like they've
renamed files and moved stuff around.

Then you should be able to build the package via the command:

  debuild

That will produce .debs you can install the usual way.
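
The steps above can be sketched as one shell function. This is only an illustration under stated assumptions: build_intel_driver and run are hypothetical helper names, sudo is assumed for the build-dep step, and the patch step may well fail and need manual work, as noted above. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# run is a hypothetical wrapper: with DRY_RUN=1 it only prints the command.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_intel_driver is a hypothetical helper wrapping the steps above.
# $1: optional patch file to try before building. Use an absolute path,
#     because the function changes into the unpacked source directory.
build_intel_driver() {
    local pkg="xserver-xorg-video-intel"
    local patch_file="${1:-}"

    # Install the build dependencies (assumes sudo is available).
    run sudo apt-get build-dep "$pkg" || return 1
    # Fetch and unpack the packaged source into <package>-<version>/.
    run apt-get source "$pkg" || return 1
    if [ "${DRY_RUN:-0}" != 1 ]; then
        cd "$pkg"-*/ || return 1
    fi
    # Try the patch; it may not apply if files were renamed or moved.
    if [ -n "$patch_file" ]; then
        run patch -p1 -i "$patch_file" || return 1
    fi
    # Build the .deb packages.
    run debuild
}

# Usage, in a subshell so the caller's working directory is unchanged:
#   ( build_intel_driver /absolute/path/to/fix.patch )
```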

If you need more help on how to patch and build Ubuntu packages, it may
be faster to ask on #ubuntu in IRC, or on askubuntu.com, plus it might
save folks on this bug report some emails. ;-)

We've stuck the natty version of the driver into the x-updates PPA, as another option to test:

  https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-updates

Note you probably need both the -intel driver and xutils-dev.

Brad Figg (brad-figg) on 2010-12-04
tags: added: acpi
tags: added: acpi-method-return
gene (eugenios) wrote :

I am on 10.04 and have no plans to jeopardize stability by upgrading it. My source for xserver is one year behind some of the patches Bryce mentioned. The latter do not build. Applying the patches or editing by hand is not possible as far as I can see. Apparently, the patched files are present in my linux source directory. I was able to build 49d2ccab2a82083110fe796636f3f91ba8c31237 (version 2.9.99.901) though.

gene (eugenios) wrote :

I am building a 2.6.36.1 kernel and will test it instead of 2.6.37. Installing xf86-intel 2.9.99 does not seem to help. After reboot I had a short lockup and see the same messages in dmesg. I have not had any X session crashes so far.

A newer version of xf86-video-intel did not fix the problem. I tried to put the system under stress by watching 10 MPEG movies, running Fedora 14 under qemu, and so on, all simultaneously, and it held up. Then my X session crashed unexpectedly when scrolling a page in Firefox. I have booted into a newly built 2.6.36.1 and will see how it behaves.

bugbot (bugbot) on 2010-12-05
tags: added: crash

It turns out, "Linux 2.6.36.1-mine #1 SMP Sat Dec 4 12:56:31 CST 2010 x86_64 GNU/Linux" does not have this issue.
uptime: 15:40:40 up 5 days, 18:11, 9 users, load average: 0.63, 0.49, 0.40
There is the same entry in the kern.log seen in ver. 35 (not seen in 37-rc3).
Dec 4 22:03:04 my kernel: [ 2051.693528] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
Dec 4 22:15:09 my kernel: [ 2777.127673] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id

It could be associated with this bug: https://bugs.launchpad.net/bugs/593463. So did fixing that bug (if it was fixed) introduce the current one?

Hello,
A brief update.
I've upgraded maverick to natty's kernel and xorg, but it crashes again.
In Xephyr it doesn't crash, and there is no heavy use of CPU or memory.
Steps to reproduce, only on maverick:
Download the tar from https://launchpad.net/a4 , install the required packages listed in README, then launch with ./a4, click open, tests, images, A4_nested_transforms.svg, click on the arrow until testo 3 on a red background appears, then keep clicking zoom+ until xorg crashes.
It is surely color related: if I repeat this with testo 2, at the limit I get an OOM with a4 killed, and if I use drawing.svg as the image (which is practically all white) everything works, with no crash or heavy resource use.
Hope it helps,
Fabio

tags: added: natty

maverick + vanilla 2.6.37 here.

The error message "[drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer" is always paired with a "(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Invalid argument" in Xorg.0.log.
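
A rough way to check this pairing on another machine is to count both messages. The sketch below assumes you pass a kernel log and an Xorg log that cover the same X server run (Xorg.0.log is restarted with each server start, so counts from different periods will not line up). count_pair is a hypothetical helper name.

```shell
# Sketch: count the kernel error and the Xorg warning quoted above.
# $1: kernel log file, $2: Xorg log file. count_pair is hypothetical.
count_pair() {
    local kern_log="$1"
    local xorg_log="$2"
    local k x
    # grep -c prints the number of matching lines (0 if none).
    k=$(grep -c -F 'Attempting to mmap a purgeable buffer' "$kern_log")
    x=$(grep -c -F 'gtt bo map failed: Invalid argument' "$xorg_log")
    echo "kernel errors: $k"
    echo "Xorg warnings: $x"
}

# Usage (paths are examples):
#   count_pair /var/log/kern.log /var/log/Xorg.0.log
```

Equal counts are consistent with the pairing but do not prove it, since the two logs carry no common timestamps to match individual events.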

# lspci

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)

Bryce Harrington (bryce) wrote :

Upstream indicates that it is a userspace application that triggers the crash.

So that suggests this approach to debug:
* Identify an application that reliably causes the crash (such as gnome-panel or gnome-terminal)
* Terminate that process
* Restart it using strace, capturing output to a log file
* Reproduce the X crash
* Attach the strace to this bug report.
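
The first three steps could look roughly like the sketch below. restart_under_strace and run are hypothetical helper names, the default log path is an assumption, and gnome-terminal is just one of the applications named in this report. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# run is a hypothetical wrapper: with DRY_RUN=1 it only prints the command.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restart_under_strace is a hypothetical helper.
# $1: process name, $2: optional path for the strace log.
restart_under_strace() {
    local app="$1"
    local log="${2:-$HOME/${app}-strace.log}"

    # Terminate the running instance; ignore the error if none is running.
    run pkill -x "$app" || true
    # Restart it under strace:
    #   -f  follow child processes
    #   -tt prefix each line with a timestamp
    #   -o  write the trace to the log file
    run strace -f -tt -o "$log" "$app"
}

# Usage:
#   restart_under_strace gnome-terminal
# Then reproduce the crash and attach the log file to this bug report.
```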

(Fwiw, I haven't been able to reproduce this bug myself... it would be helpful if someone could better isolate the conditions and steps to reproduce the problem. Also maybe look for commonalities between individuals experiencing the same issue, in case it is hardware-specific in some sense.)

Bryce Harrington (bryce) wrote :

[Removing 'natty' tag again - from what I can tell this isn't a natty bug but rather a bug in earlier ubuntu releases when natty packages are backported to it.]

tags: removed: natty
tags: added: kj-triage
Bryce Harrington (bryce) on 2011-01-13
summary: - x server crashes with: [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting
- to mmap a purgeable buffer
+ [gm45] x server crashes with: [drm:i915_gem_mmap_gtt_ioctl] *ERROR*
+ Attempting to mmap a purgeable buffer
Changed in linux:
status: Unknown → Invalid
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Changed in linux:
importance: Unknown → Medium
Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
Sami Nieminen (sami-nieminen) wrote :

I experienced the same error ([drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer) when I had desktop effects turned on and I was scrolling a web page using firefox. X was locked up, but I was able to ssh from another computer. I am running Kubuntu 10.10 but using a Natty pae-kernel.

Justin (justin-wzy) wrote :

Experienced the same under lucid with kernel 2.6.38.7.

After disabling Compiz effects, the error is gone.

Changed in xserver-xorg-video-intel:
status: Confirmed → Won't Fix
bugbot (bugbot) wrote :

Hey Andy,

Thanks for your interest in Ubuntu.

Thanks for testing maverick during its development period. Unfortunately it looks like this bug report didn't get attention during the maverick development period. But I see there's not been more comments on the bug since the release, which makes me wonder if this is still an issue for you?

If you've not seen this issue since maverick's release yourself, it may have been solved by kernel or X or other updates that occurred late in the release; if so, would you mind please closing the bug for us? Go to the URL mentioned in this bug report, click the yellow icon(s) in the status column and set to 'Fix Released'.

If you no longer have the hardware needed to reproduce the problem, or otherwise feel the bug no longer needs tracked in Launchpad, you can set the status to 'Invalid'.

If you are the original reporter and still have this issue, just reply to this email saying so. (Or set the bug status to Confirmed.) If you are able to re-test this against 11.04 Natty Narwhal (our current development focus) and find the issue still affects Natty, please also run 'apport-collect <bug-number>' while running natty, which will add fresh logs and debug data, and flag it for the Ubuntu-X development team to look at.

bugbot (bugbot) on 2011-04-27
Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → New
status: New → Incomplete
Bryce Harrington (bryce) wrote :

As per the upstream bug report, this isn't an X bug. It might not be compiz's fault but that's my next best guess.

affects: xserver-xorg-video-intel (Ubuntu) → compiz (Ubuntu)

I can just confirm that I am also experiencing this bug under Lucid (10.04).
kernel: [13263.635128] npviewer.bin[6387]: segfault at 0 ip 00000000f617cf41 sp 00000000ffc48bc0 error 4 in libflashplayer.so[f5da5000+bc1000]
kernel: [13634.651251] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 682712

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: hardy

I have the same problem (X crash with loss of the session) on a lucid installation with vanilla kernel 3.0.4 (upgraded due to wireless problems).
Should I provide logs here?
