[i945gm] lucid: moving pointer over auto-hidden KDE panel with a non-KDE app icon in the systray triggers an X crash.

Bug #587708 reported by Hussein Abdallah
This bug affects 1 person
Affects: xserver-xorg-video-intel (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: xserver-xorg-video-intel

When the KDE panel auto-hide feature is enabled and aMSN Messenger is started, if I move the pointer to the bottom of the screen several times (where the hidden panel is supposed to be), X will crash. Usually, X will not crash the first time I move the pointer to the bottom of the screen (instead, it will show the panel as it's supposed to do) and it does not take an exact number of pointer moves : it seems to take from 2 to 10 pointer moves to show the panel before X will crash. I didn't count the exact number before each crash but I realized it changes from one time to another.

I realized that I have the same bug if I use another non-KDE application that adds an icon to the system tray (I had the same bug after starting Skype).

If I enable KDE panel auto-hide but I don't start an application that adds an icon to the systray, X won't crash (no matter how many times I move the pointer to the bottom of the screen to show the panel). If I disable auto-hide (set the panel to "Always visible") and I start aMSN or Skype, X won't crash.

I have never experienced this bug when I use GNOME (I have both GNOME and KDE installed on my Ubuntu 10.04)

After a crash, I see the dialog where I have the option to run Ubuntu in low graphics mode. I'm still able to use ttys (Ctrl+Alt+F[1-7]). If I don't choose "run Ubuntu in low graphics mode" and I try to log in as root in text mode and then run startx, X won't start. If I choose to run Ubuntu in low-graphics mode, I can open a KDE or a GNOME session again and I still have a 1280x800 resolution (the same as in the 'normal' mode).

[CurrentDmesg.txt]
[ 66.406803] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
[ 80.883281] CPUFREQ: Per core ondemand sysfs interface is deprecated - up_threshold
[ 338.870123] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
[ 664.518792] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
...
[ 8240.845026] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 8240.845038] render error detected, EIR: 0x00000000
[ 8240.845186] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 730176 at 730170)
[/CurrentDmesg.txt]

[XorgLog.txt]
(II) intel(0): Initializing HW Cursor
(II) intel(0): No memory allocations

Fatal server error:
Failed to submit batchbuffer: Input/output error
[/XorgLog.txt]

This bug started after a recent apt-get upgrade (it was OK after installing Lucid on April 29th).

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: xserver-xorg-video-intel 2:2.9.1-3ubuntu5
ProcVersionSignature: Ubuntu 2.6.32-22.33-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic i686
Architecture: i386
Date: Sun May 30 20:15:36 2010
DkmsStatus: Error: [Errno 2] No such file or directory
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
MachineType: Sony Corporation VGN-FE770QG
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-generic root=UUID=8764f9ae-9603-464d-a7ff-7ee6a9c948b6 ro quiet splash
ProcEnviron:
 LANG=fr_CA.UTF-8
 SHELL=/bin/zsh
SourcePackage: xserver-xorg-video-intel
dmi.bios.date: 09/21/2006
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: R0173J3
dmi.board.asset.tag: N/A
dmi.board.name: VAIO
dmi.board.vendor: Sony Corporation
dmi.board.version: N/A
dmi.chassis.asset.tag: 7R2L00000000348e62c0c50c3723
dmi.chassis.type: 10
dmi.chassis.vendor: Sony Corporation
dmi.chassis.version: C3LMPJ1P
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrR0173J3:bd09/21/2006:svnSonyCorporation:pnVGN-FE770QG:pvrC3LMPJ1P:rvnSonyCorporation:rnVAIO:rvrN/A:cvnSonyCorporation:ct10:cvrC3LMPJ1P:
dmi.product.name: VGN-FE770QG
dmi.product.version: C3LMPJ1P
dmi.sys.vendor: Sony Corporation
system:
 distro: Ubuntu
 codename: lucid
 architecture: i686
 kernel: 2.6.32-22-generic

[lspci]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)
     Subsystem: Sony Corporation Device [104d:81ef]

Hussein Abdallah (abdallah98) wrote :
summary: - i945gm X crashes (lucid)
+ [i945gm] X crashes randomly (lucid)
description: updated
Stenten (stenten)
tags: added: 945gm crash
description: updated
Stenten (stenten) wrote : Re: [i945gm] X crashes randomly (lucid)

Can you please describe your issue in more detail? What exactly do you mean by "crash"? What happens and what screens do you see?

Do you still experience the issue if you disable Desktop Effects (compiz)?

Stenten (stenten) wrote :

Please run
cat /var/log/apt/history.log > apt_history.log.txt
and attach the file (should be in your home directory). Also note the date/time you remember upgrading to the failed packages (approximate time is ok).

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
Hussein Abdallah (abdallah98) wrote :

By "crash" I mean that X (and all graphical applications) close and I am taken to the "run Ubuntu in low-graphics mode (and other options)" window. When it happens, I'm still able to use ttys with Ctrl+Alt+F[1-7]. If I choose to run Ubuntu in low-graphics mode, it actually starts X in the same resolution as in the 'regular' mode (1280 x 800) but I don't have 3D acceleration any more. It doesn't crash in low-graphics mode (at least it hasn't crashed since I started low-graphics mode 4.5 hours ago). The OS was installed from an Ubuntu 9.10 CD to which I added KDE (apt-get install kubuntu-desktop) and then upgraded to Lucid on April 29 (apt-get dist-upgrade).

The latest apt-get upgrade before I had this bug was on 2010-05-26 (apt_history.log.txt attached).

I upgraded the kernel from 2.6.32.21.22 to 2.6.32.22.23 yesterday (May 30) but it was *after* the bug appeared and did not change anything. I also tried booting a 2.6.34 kernel (which I compiled myself from kernel.org sources) but I still have the same bug, so it does not seem to be directly related to the kernel. When I sent the bug report (and also now) I was running the distribution kernel (Linux 2.6.32-22-generic #33-Ubuntu), not the custom one.

Hussein Abdallah (abdallah98) wrote :

I forgot to write in the previous post that I don't use Compiz.

Stenten (stenten) wrote :

Unfortunately that list isn't incredibly helpful because you updated at least 50 packages at the same time. My bet is it's either one of the mesa packages or the gtk update. But neither is very easy to test through downgrading because of dependency issues.

Instead, could you please enable Apport by running
gksudo gedit /etc/default/apport
and changing "enabled=0" to "enabled=1", and then save and exit? Then reboot and Apport should ask you to file a crash report after X crashes (it might take a few times); make sure you post the link to this bug in your apport bug. And then disable Apport after you get it to report the crash.
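
The same edit can be made without a graphical editor, which helps when X is unstable. This is only a sketch: the function name set_apport_enabled is made up for illustration, and it assumes the file contains a line reading exactly enabled=0 or enabled=1, as described above.

```shell
# Hypothetical helper: set the "enabled=" flag in an Apport defaults file.
# Usage: set_apport_enabled <0|1> [file]   (file defaults to /etc/default/apport)
set_apport_enabled() {
    value=$1
    file=${2:-/etc/default/apport}
    tmp=$(mktemp) || return 1
    # Rewrite only a line that is exactly enabled=0 or enabled=1.
    # Writing back with cat keeps the original file's owner and permissions.
    sed "s/^enabled=[01]\$/enabled=$value/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run as root, "set_apport_enabled 1" before reproducing the crash and "set_apport_enabled 0" afterwards mirror the enable/disable steps above.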

Also, are there any consistencies in when X crashes? Are you using graphics-intensive applications? 3D graphics? Do you suspend or hibernate anytime before it crashes?

Geir Ove Myhr (gomyhr) wrote :

I would usually say that mesa was the most likely suspect in the list of packages upgraded on 2010-05-26, but looking at the changelog (https://launchpad.net/ubuntu/+source/mesa/7.7.1-1ubuntu3) it is clear that that upgrade does not change anything on intel hardware.

I am a bit surprised about the behaviour described. Usually, these messages in dmesg output

[ 8240.845026] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 8240.845038] render error detected, EIR: 0x00000000
[ 8240.845186] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 730176 at 730170)

are associated with the display freezing, and only a restart would recover from that.
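
To check whether a given kernel log contains this signature, a filter along these lines may help. It is a sketch only: the function name is hypothetical, and the patterns are taken directly from the three messages quoted above.

```shell
# Hypothetical helper: keep only the i915 GPU-hang lines from kernel log
# text supplied on stdin (patterns copied from the messages quoted above).
find_gpu_hang_lines() {
    grep -E 'i915_hangcheck_elapsed|i915_do_wait_request|render error detected'
}
```

For example, "dmesg | find_gpu_hang_lines" prints nothing on a system that has not logged a hang.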

I'm not sure if Apport will trigger on this kind of error (partly because I'm not sure exactly what happens). Booting the 2.6.34 kernel and checking whether an error state is captured in /sys/kernel/debug/dri/0/i915_error_state (and attaching it here if it is) may shed some light on this (see https://wiki.ubuntu.com/X/Troubleshooting/Freeze#How%20to%20Get%20a%20Batchbuffer%20Dump%20%28-intel%20only%29). You may enable automatic bug reporting for GPU errors by uncommenting the last line in /lib/udev/rules.d/40-xserver-xorg-video-intel.rules, but note that this change will be overwritten on package upgrade of xserver-xorg-video-intel.
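
A rough sketch of saving the error state right after a crash follows. The function name and the default output file name are hypothetical. It uses the i915_error_state path that appears in the later comments, and it assumes the kernel prints the text "no error state collected" when nothing has been captured; that string is an assumption and is worth confirming on the kernel in use.

```shell
# Hypothetical helper: copy the GPU error state to a regular file.
# Usage: save_error_state [source] [destination]
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest=${2:-i915_error_state.txt}
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted? are you root?)" >&2
        return 1
    fi
    # Assumption: this placeholder text means no hang has been recorded.
    if grep -q 'no error state collected' "$src"; then
        echo "no GPU error state recorded yet" >&2
        return 2
    fi
    cat "$src" > "$dest"
}
```

Run as root from a tty after X has crashed, "save_error_state" leaves a copy in the current directory that can be attached to the bug.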

Geir Ove Myhr (gomyhr) wrote :

Two more things:

- There is a kernel patch that should fix some GPU errors on i945 and i915 at http://lists.freedesktop.org/archives/intel-gfx/2010-May/006982.html . Since you are compiling the kernel yourself, you may want to try that.
- Adding the kernel option drm.debug=0x02 should make the drm driver more verbose about what is going wrong in dmesg output.
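
One way to make the drm.debug=0x02 option persistent is sketched below. It assumes the machine boots through GRUB 2 with its settings in /etc/default/grub, which this thread does not confirm. The function name is hypothetical, and the option must not contain characters that are special to sed, such as "/" or "&".

```shell
# Hypothetical helper: append a kernel option to GRUB_CMDLINE_LINUX_DEFAULT.
# Usage: add_kernel_option <option> [file]   (file defaults to /etc/default/grub)
add_kernel_option() {
    opt=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the option is already on the default command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$opt"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the option just before the closing double quote.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $opt\"/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Under those assumptions, running "add_kernel_option drm.debug=0x02" as root, then update-grub, then rebooting would apply the option. For a single test boot, editing the kernel line from the boot menu avoids touching any file.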

Hussein Abdallah (abdallah98) wrote :

I booted with 2.6.34 kernel and I'm sending as an attachment a copy of the /sys/kernel/debug/dri/0/i915_error_state file (after an X crash).

Hussein Abdallah (abdallah98) wrote :

I'm also sending the most recent dmesg output

Hussein Abdallah (abdallah98) wrote :

I now have the impression that this bug is related to KDE because I started a GNOME session and X hasn't crashed yet (nothing else changed). I'm going to let the GNOME session open for the night to make X won't crash.

Hussein Abdallah (abdallah98) wrote :

I made a mistake in the previous post : 'to make SURE X won't crash'.

I also changed enabled=0 to enabled=1 in /etc/default/apport and uncommented the last line in /lib/udev/rules.d/40-xserver-xorg-video-intel.rules (and then rebooted the computer) but apport wasn't triggered by the X crash.

Geir Ove Myhr (gomyhr) wrote :

I attach the decoded i915_error_state (decoded with intel_error_decode) which makes it more human readable. I can't see anything wrong with it at first glance, but the GPU hangs in a 3D operation. As you can see, even the dmesg output claims that a reboot is required.

The difference between Gnome and KDE is probably because they perform different system calls, and the one that KDE performs is the one that triggers the bug. It is unlikely that it is a bug in KDE itself.

Geir Ove Myhr (gomyhr) wrote :

If you want to troubleshoot why the udev rule doesn't trigger, you could listen for uevents with `udevadm monitor --property`. It would be interesting if you could upload the output of that here (when the error triggers) since I haven't actually seen those events before. I think you will only see one of the three events described in /lib/udev/rules.d/40-xserver-xorg-video-intel.rules on i945, since the reset should only happen on i965 and newer.

There is an UdevLog.txt automatically uploaded with this bug report, which seems to be taken from /var/log/udev, but that seems to only record the events generated during boot (all the events are merely seconds apart and no new events are logged there when I unplug/plug a monitor). I attach the output of the above command when I unplug the external monitor and plug it back in 30 seconds later. The output for the GPU error event(s) should be similar.

Hussein Abdallah (abdallah98) wrote :

I will try udevadm monitor --property. I can confirm that I don't have this bug when I'm running GNOME (I left a GNOME session open for one day and X didn't crash).

Geir Ove Myhr (gomyhr) wrote :

Hussein, I've looked a bit closer at the batchbuffer dump and it doesn't look like any of the typical problems that we have in 945GM. It also probably won't be fixed by the kernel patch I mentioned, since the bugs it's supposed to fix have PGTBL_ER: 0x00000010 and you have a 0x0 (i.e. not a page table error).
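
The PGTBL_ER check described here can be sketched as a small filter over an error state dump. The function names are hypothetical, and the sketch assumes the dump contains a line of the form "PGTBL_ER: 0x...", matching the notation used in this comment.

```shell
# Hypothetical helper: print the first PGTBL_ER value found on stdin,
# for example 0x00000010.
pgtbl_er_value() {
    sed -n 's/^.*PGTBL_ER: *\(0x[0-9a-fA-F]*\).*$/\1/p' | head -n 1
}

# Succeeds (exit status 0) only if a PGTBL_ER line is present and non-zero,
# i.e. the dump records a page table error.
is_page_table_error() {
    v=$(pgtbl_er_value)
    [ -n "$v" ] && [ $(($v)) -ne 0 ]
}
```

For example, "is_page_table_error < i915_error_state.txt && echo page table error" prints nothing for a dump like the one discussed here, where the value is 0x0.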

Before we send this upstream, I would like you to check whether the bug is still there when you run with the xorg-edgers PPA [1]. This will bring in the newest version of xserver-xorg-video-intel, mesa, and libdrm so that we know this bug has not been fixed there.

If the bug is still present with xorg-edgers (it probably is) we should send this upstream with a set of consistent logs. That is, Xorg.0.log, dmesg output, /var/log/dmesg (which is dmesg at boot, since this will be missing from dmesg output due to the verbosity of drm.debug=0x02), and /sys/kernel/debug/dri/0/i915_error_state from the _same_ run with kernel 2.6.34 with drm.debug=0x02 option. This should be with the xorg-edgers packages installed as well. Please upload the files here for now.

In addition, since you seem to be able to reproduce this quite easily, it would be useful to have a couple of i915_error_state dumps from different crashes, so we can see if the batchbuffer where something goes wrong is the same every time, and if not, what they have in common. Actually, you may run intel_error_decode to get the decoded version of i915_error_state and upload the decoded information directly.

[1]: https://launchpad.net/~xorg-edgers/+archive/ppa

Hussein Abdallah (abdallah98) wrote :

I finally found how to trigger this bug and it's quite strange. I remembered that this bug appeared not the day I did apt-get upgrade but two days later. The last thing I did before this bug appeared was to enable the auto-hide feature of the KDE panel :

right-click on the panel
Panel Options -> Panel Settings
More Settings...

and select

Auto-hide

However, the panel auto-hide feature alone is not sufficient to trigger this bug: I realized that I also have to start aMSN Messenger. With aMSN Messenger started, if I move the pointer to the bottom of the screen several times (where the hidden panel is supposed to be), X will crash. Usually, X will not crash the first time I move the pointer to the bottom of the screen (instead, it will show the panel as it's supposed to do) and it does not take an exact number of pointer moves. But with the panel auto-hide feature enabled and with aMSN started, X will crash after 2 to 10 pointer moves to 'unhide' the panel. I didn't count the exact number before each crash but I realized it changes from one time to another.

If I enable KDE panel auto-hide but I don't start aMSN, X won't crash (no matter how many times I move the pointer to the bottom of the screen to show the panel). If I disable auto-hide (set the panel to "Always visible") and I start aMSN, X won't crash : maybe the bug was there since Lucid was released, but I didn't notice it because I wasn't using the auto-hide feature.

I don't really see the logic here. The only relationship between aMSN and the KDE panel that I see is that aMSN adds an icon to the KDE system tray when it starts. I don't think I have other non-KDE applications that add icons to the systray, so I'm not sure it is related to the systray icon. I use aMSN from the Lucid repository (0.98.3-0ubuntu1).

I have not tried the xorg-edgers PPA yet. I will run intel_error_decode on several i915_error_state dumps as soon as possible.

I tried udevadm monitor --property and I'm sending the output as an attachment. The last thing udevadm showed when X crashed was this block :

UDEV [1275752385.489124] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:02.0/drm/card0
SUBSYSTEM=drm
ERROR=1
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=1786
ACL_MANAGE=1
MAJOR=226
MINOR=0
DEVLINKS=/dev/char/226:0
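
To pull blocks like this one out of a longer saved udevadm monitor --property log, something like the following could be used. It is a sketch: the function name is hypothetical, and it assumes events in the saved output are separated by blank lines and that the GPU error event carries SUBSYSTEM=drm and ERROR=1, as in the block above.

```shell
# Hypothetical helper: print only the drm events that carry ERROR=1 from
# saved "udevadm monitor --property" output read on stdin.
drm_error_events() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # one record per blank-line-separated event
    {
        drm = 0; err = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "SUBSYSTEM=drm") drm = 1
            if ($i == "ERROR=1") err = 1
        }
        if (drm && err) { print; print "" }
    }'
}
```

For example, "drm_error_events < udevadm_output.txt", where the file name stands for wherever the monitor output was saved.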

summary: - [i945gm] X crashes randomly (lucid)
+ [i945gm] lucid: moving pointer over auto-hidden KDE panel and aMSN
+ started crashes X
summary: - [i945gm] lucid: moving pointer over auto-hidden KDE panel and aMSN
- started crashes X
+ [i945gm] lucid: moving pointer over auto-hidden KDE panel when aMSN is
+ started crashes X
description: updated
Bryce Harrington (bryce)
tags: added: kubuntu
Hussein Abdallah (abdallah98) wrote : Re: [i945gm] lucid: moving pointer over auto-hidden KDE panel when aMSN is started crashes X

I can confirm that I have this bug when I start another non-KDE application that adds an icon to the system tray. I did not start aMSN but I started Skype (which added its icon to the systray) and I had the same bug : when I moved the cursor to the bottom of the screen several times to show the panel, X crashed.

summary: - [i945gm] lucid: moving pointer over auto-hidden KDE panel when aMSN is
- started crashes X
+ [i945gm] lucid: moving pointer over auto-hidden KDE panel with a non-KDE
+ app icon in the systray triggers an X crash.
description: updated
Hussein Abdallah (abdallah98) wrote :

I still have this bug even with the latest Lucid updates. I tried xorg-edgers but then I couldn't use my keyboard and mouse, so I was not able to check whether I can reproduce this bug with xorg-edgers.

I did 3 tests using Lucid official repository X.org packages and Linux 2.6.34 kernel (which I compiled myself from kernel.org sources). For each test I collected these files :

* decoded_error_state (/sys/kernel/debug/dri/0/i915_error_state decoded with intel_error_decode)
* /var/log/dmesg
* /var/log/Xorg.0.log
* dmesg output (dmesg > dmesg_output)

The attached tarball Xcrash-debuginfo.tar.gz has the following files :

First test scenario : X crashed after I moved the mouse pointer over the auto-hidden KDE panel with the aMSN systray icon :
debuginfo/decoded_error_state
debuginfo/Xorg.0.log
debuginfo/dmesg
debuginfo/dmesg_output

Second test scenario: the same scenario as the first (after a reboot)
debuginfo2/decoded_error_state
debuginfo2/dmesg_output
debuginfo2/Xorg.0.log
debuginfo2/dmesg

Third test : X crashed after I moved the mouse pointer over the auto-hidden KDE panel with the Skype systray icon :
debuginfo3/dmesg_output
debuginfo3/Xorg.0.log
debuginfo3/dmesg
debuginfo3/decoded_error_state

In all three cases, drm.debug=0x02 was passed to the 2.6.34 kernel

I can see lines with 3DSTATE_ in all decoded GPU dumps.
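
To see which 3DSTATE_ commands the dumps actually share, the decoded files can be compared mechanically. The function below is a hypothetical sketch. It assumes command names appear in the decoded output as tokens beginning with 3DSTATE_, and it relies on grep -o, which GNU grep provides.

```shell
# Hypothetical helper: list the 3DSTATE_* command names that occur in
# every one of the decoded dump files given as arguments.
common_3dstate_commands() {
    n=$#
    for f in "$@"; do
        # Unique command names per file, so each file counts at most once.
        grep -o '3DSTATE_[A-Z0-9_]*' "$f" | sort -u
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}
```

For the tarball described above this would be run as "common_3dstate_commands debuginfo/decoded_error_state debuginfo2/decoded_error_state debuginfo3/decoded_error_state".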

Hussein Abdallah (abdallah98) wrote :

Still have the same problem with the latest stable kernel from kernel.org (2.6.35.4).

bugbot (bugbot)
description: updated
Chris Wilson (ickle) wrote :

Old UXA bug issuing an out-of-bounds copy.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Fix Released