Ubuntu
linux package

[i915] Crash after suspending (NULL pointer dereference in intel_crt_detect())

Bug #553176 reported by Milan Bouchet-Valat on 2010-04-01

This bug affects 18 people

Affects		Status	Importance	Assigned to
	X.Org X server	Fix Released	High	freedesktop-bugs #26974
	linux (Ubuntu)	Invalid	High	Manoj Iyer
Nominated for Lucid by Milan Bouchet-Valat

Bug Description

This happens about one hour after returning from suspend, and makes X crash. See bug 553174 for the X crash report, where I have put the information and the link to the upstream report.

ProblemType: KernelOops
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-19-generic
Regression: No
Reproducible: Yes (just suspend and resume)
TestedUpstream: Yes (drm-intel-next branch, see upstream report)
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Annotation: Your system might become unstable now and might need to be restarted.
Architecture: i386
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: milan 1445 F.... pulseaudio
/dev/snd/seq: timidity 1186 F.... timidity
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
Card hw:0 'ICH6'/'Intel ICH6 with ALC250 at irq 17'
   Mixer name : 'Realtek ALC250 rev 2'
   Components : 'AC97a:414c4752'
   Controls : 33
   Simple ctrls : 21
CurrentDmesg:
[ 59.755859] lib80211_crypt: registered algorithm 'TKIP'
[ 66.656439] apm: BIOS not found.
[ 67.324827] ppdev: user-space parallel port driver
[ 69.416020] eth1: no IPv6 routers present
[ 99.487128] padlock: VIA PadLock not detected.
Date: Thu Apr 1 11:53:06 2010
Failure: oops
Frequency: This has only happened once.
HibernationDevice: RESUME=UUID=dd37abed-09dd-499f-bbbe-c2c3866cb9cf
Lsusb:
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: TOSHIBA Satellite A80
PccardctlIdent:
Socket 0:
   no product info available
PccardctlStatus:
Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-19-generic root=UUID=2dcc00f7-0fad-4823-9469-9e4e8dd841d0 ro crashkernel=384M-2G:64M,2G-:128M quiet splash
RelatedPackageVersions: linux-firmware 1.33
RfKill:

SourcePackage: linux
Title: BUG: unable to handle kernel NULL pointer dereference at 00000108
WpaSupplicantLog:

dmi.bios.date: 02/23/2005
dmi.bios.vendor: TOSHIBA
dmi.bios.version: V1.40
dmi.board.name: EAT10/EAT20
dmi.board.vendor: TOSHIBA
dmi.board.version: Null
dmi.chassis.asset.tag: *
dmi.chassis.type: 10
dmi.chassis.vendor: TOSHIBA
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnTOSHIBA:bvrV1.40:bd02/23/2005:svnTOSHIBA:pnSatelliteA80:pvrPSA80E-02V024FR:rvnTOSHIBA:rnEAT10/EAT20:rvrNull:cvnTOSHIBA:ct10:cvrN/A:
dmi.product.name: Satellite A80
dmi.product.version: PSA80E-02V024FR
dmi.sys.vendor: TOSHIBA

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-09:

Created an attachment (id=33884)
:0.log

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-09:

Created an attachment (id=33887)
Xorg.0.log.old

To be clear, I must add that those log files come from the failsafe X session that Ubuntu started after the crash. So they may not contain information about the crash itself, just about the fact that X is not able to restart correctly after the crash has occurred (I couldn't even switch to the consoles).

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-19:

Still happening with xserver-xorg-video-intel 2.10.902+git20100317.31d5f84b. Anything I can do to help debugging?

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-23:

Created an attachment (id=34354)
gdb trace of the SIGPIPE

Here's a gdb trace I could get of the crash, with kernel vmlinuz-2.6.33-997-generic (drm-intel-next), X.Org server 1.6.4, Intel driver 2.9.0. (Note this happens with more recent versions too, it's just that I'm using these as they don't suffer from other small issues.)

A funny thing is that while attached to gdb, X doesn't actually crash: I only get a SIGPIPE signal, and everything just works if I type 'continue'. When I detached gdb from Xorg, the signal killed the server (I guess that's intended).

Note that at the top of the trace, the call
__libc_writev (fd=-1218895884, vector=0xbfd61438, count=1)
is always exactly the same across different SIGPIPEs: the values of the parameters were the same the first and the second time I received that signal (without restarting X).

Hope this helps, please, please ask if you need more details!

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-03-23:

This is a gpu hang, so the most interesting information would be i915_error_state and register dumps [as suspend and resume is complicit we need to ensure we are restoring the gpu state correctly].

In terms of driver packages, the most important one to make sure is up-to-date is perhaps libdrm, preferably from drm.git but 2.4.19 at a minimum.

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-23:

Ah, thanks for the feedback! When should I get the GPU dump? Right after the crash occurred, while gdb is blocking Xorg?

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-03-23:

If you grab an intel-gpu-tools/tools/intel_reg_dump before suspending and after resume, and if xorg-edgers is recent enough, then the GPU dump will be in /sys/kernel/debug/dri/0/i915_error_state following a hang.

Revision history for this message

In freedesktop.org Bugzilla #26974, Julien Cristau (jcristau) wrote on 2010-03-23:

> --- Comment #4 from Milan Bouchet-Valat <email address hidden> 2010-03-23 04:03:29 PST ---
> A funny thing is that while attached to gdb, X doesn't actually crash: I only
> get a SIGPIPE signal, and everything just works if I type 'continue'. When I
> detached Xorg, the signal killed the server (I guess that's intended).
>
X ignores SIGPIPE; you need to do the same in gdb. 'handle SIGPIPE noprint nostop' at the gdb prompt should do the trick.

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-24:

Created an attachment (id=34403)
second gdb trace of the crash, SIGPIPE handling disabled

So here's a new gdb trace with SIGPIPE handling disabled, as asked above.

The screen turned uniformly orange-pink, and typing 'continue' didn't change it. Hitting Ctrl+Alt+F[1-8] triggered another interruption in gdb; continuing didn't trigger anything new. Hitting Ctrl+Alt+F[1-8] again had no effect, but going back to Ctrl+Alt+F7 triggered another interruption in gdb, without changing the screen state.

Software versions:
xserver-xorg-core 1.6.5+git20091107+server-1.6-branch.2dbcb06a
xserver-xorg-video-intel 2.10.902+git20100317.31d5f84b
libdrm-intel1 2.4.19+git20100318.56712821
kernel drm-intel-next 2.6.33-997

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-24:

#10

Created an attachment (id=34404)
output of intel_reg_dumper before suspending

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-24:

#11

Created an attachment (id=34405)
output of intel_reg_dumper after returning from suspend

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-24:

#12

Created an attachment (id=34406)
output of intel_reg_dumper after lock (during gdb interruption)

Here are the GPU dumps. I hope this is what you need; I didn't completely understand your comment about grabbing 'an intel-gpu-tools/tools/intel_reg_dump'.

/sys/kernel/debug/dri/0/i915_error_state always said there was no error to report, at all three stages where I checked it.

Does that help debugging?

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-03-30:

#13

I've just found out that a report in Ubuntu's Launchpad has 165 people marked as affected, with about 30 duplicate reports. I think that deserves a higher priority: suspend has been broken on i915 chips for more than a year!

See https://bugs.launchpad.net/ubuntu/lucid/+source/xserver-xorg-video-intel/+bug/447159, where more similar stacktraces are available.

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-04-01:

#14

Thanks to the new report mechanism in Ubuntu 10.04, I've been able to get traces for the kernel oops that occurs before the X crash, and for that X crash at the same time. Do you think I should open a bug on bugzilla.kernel.org instead?

See
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/553176
https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/553174

Particularly interesting is:
http://launchpadlibrarian.net/42769271/OopsText.txt

In which we can see the trace leading to the oops:
[<f857cac9>] ? intel_crt_detect+0x69/0xe0 [i915]
[<f80ceeee>] ? drm_helper_probe_single_connector_modes+0x26e/0x300 [drm_kms_helper]
[<f8368d5e>] ? drm_mode_object_find+0x4e/0x70 [drm]
[<f8369b7f>] ? drm_mode_getconnector+0x2df/0x380 [drm]
[<c0589b59>] ? mutex_lock+0x19/0x40
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<f835e7cd>] ? drm_ioctl+0x25d/0x3e0 [drm]
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<f83698a0>] ? drm_mode_getconnector+0x0/0x380 [drm]
[<f835e570>] ? drm_ioctl+0x0/0x3e0 [drm]
[<c0215f71>] ? vfs_ioctl+0x21/0x90
[<c0216259>] ? do_vfs_ioctl+0x79/0x310
[<c058d210>] ? do_page_fault+0x160/0x3a0
[<c0216557>] ? sys_ioctl+0x67/0x80
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<c01033ec>] ? syscall_call+0x7/0xb
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-04-01:

#15

Ah, an OOPS! That makes a little more sense.

The userspace stacktraces are irrelevant, as any GPU hang or OOPS may trigger such a trace -- that one identical symptom may imply any number of bugs, i.e. all the duplicates are not necessarily duplicate bugs.

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-01: [i915] Crash after suspending (unable to handle kernel NULL pointer dereference at 00000108)

#16

This happens about one hour after returning from suspend, and makes X crash. I'm posting a link to the X report where I put the information and the link to the upstream report.

ProblemType: KernelOops
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-19-generic
Regression: Yes
Reproducible: No
TestedUpstream: No
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Annotation: Your system might become unstable now and might need to be restarted.
Architecture: i386
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: milan 1445 F.... pulseaudio
/dev/snd/seq: timidity 1186 F.... timidity
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
Card hw:0 'ICH6'/'Intel ICH6 with ALC250 at irq 17'
   Mixer name : 'Realtek ALC250 rev 2'
   Components : 'AC97a:414c4752'
   Controls : 33
   Simple ctrls : 21
CurrentDmesg:
[ 59.755859] lib80211_crypt: registered algorithm 'TKIP'
[ 66.656439] apm: BIOS not found.
[ 67.324827] ppdev: user-space parallel port driver
[ 69.416020] eth1: no IPv6 routers present
[ 99.487128] padlock: VIA PadLock not detected.
Date: Thu Apr 1 11:53:06 2010
Failure: oops
Frequency: This has only happened once.
HibernationDevice: RESUME=UUID=dd37abed-09dd-499f-bbbe-c2c3866cb9cf
Lsusb:
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: TOSHIBA Satellite A80
PccardctlIdent:
Socket 0:
   no product info available
PccardctlStatus:
Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-19-generic root=UUID=2dcc00f7-0fad-4823-9469-9e4e8dd841d0 ro crashkernel=384M-2G:64M,2G-:128M quiet splash
RelatedPackageVersions: linux-firmware 1.33
RfKill:

SourcePackage: linux
Title: BUG: unable to handle kernel NULL pointer dereference at 00000108
WpaSupplicantLog:

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-01:

#17

AlsaDevices.txt (589 bytes, text/plain; charset="utf-8")
AplayDevices.txt (275 bytes, text/plain; charset="utf-8")
ArecordDevices.txt (526 bytes, text/plain; charset="utf-8")
BootDmesg.txt (40.3 KiB, text/plain; charset="utf-8")
Card0.Amixer.values.txt (4.7 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec97.0.ac97.0.0.txt (1.6 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec97.0.ac97.0.0.regs.txt (767 bytes, text/plain; charset="utf-8")
Dependencies.txt (1.2 KiB, text/plain; charset="utf-8")
IwConfig.txt (596 bytes, text/plain; charset="utf-8")
Lspci.txt (15.5 KiB, text/plain; charset="utf-8")
OopsText.txt (2.6 KiB, text/plain; charset="utf-8")
PciMultimedia.txt (905 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (541 bytes, text/plain; charset="utf-8")
ProcInterrupts.txt (1.2 KiB, text/plain; charset="utf-8")
ProcModules.txt (2.5 KiB, text/plain; charset="utf-8")
UdevDb.txt (92.4 KiB, text/plain; charset="utf-8")
UdevLog.txt (201.8 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (256.3 KiB, text/plain; charset="utf-8")

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-01:

#18

See bug 553174 for X.org side of the crash.

Milan Bouchet-Valat (nalimilan) on 2010-04-01

description:	updated
description:	updated
summary:
- [i915] Crash after suspending (unable to handle kernel NULL pointer dereference at 00000108)
+ [i915] Crash after suspending (NULL pointer dereference in intel_crt_detect())

Jeremy Foshee (jeremyfoshee) on 2010-04-02

tags:

added: lucid

Revision history for this message

S. Christian Collins (s-chriscollins) wrote on 2010-04-02:

#19

I am also affected by this bug.

** My System **
PC: HP Pavilion dv1550se laptop
CPU: Intel(R) Pentium(R) M processor 1.60GHz
RAM: 1GB DDR400
Video: Mobile 915GM/GMS/910GML Express Graphics Controller
Sound: 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller
OS: Ubuntu 10.04 w/ all updates as of 4/2/10 (including 2.6.32-19 kernel)

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-03:

#20

Glad to know I'm not alone then! ;-) Marking as Triaged.

Changed in linux (Ubuntu):
status:	New → Triaged
importance:	Undecided → High
affects:	linux → xorg-server

Gabe Gorelick (gabegorelick) on 2010-04-04

tags:

added: metabug

Revision history for this message

Gabe Gorelick (gabegorelick) wrote on 2010-04-04:

#21

A lot of people seem to be affected by this bug (see all the duplicates) and it happens in a lot more places than just suspend and resume. The call traces are very similar in a lot of the duplicates, except they have intel_tv_detect on top, which I assume is a related function to intel_crt_detect.

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-05:

#22

I don't think these are all duplicates. I've kept all the duplicates where the oops happens in intel_crt_detect(), and created bug 525801 for the ones in intel_tv_detect(). It seems Apport only considers the NULL pointer dereference as the common part, but I fail to see how this can ensure the bugs are identical. It's better to handle them separately than to get tons of contradictory feedback from reporters.

Revision history for this message

Gabe Gorelick (gabegorelick) wrote on 2010-04-05:

#23

OK, I will break the bugs apart into this one for intel_crt_detect() and bug 525801 for the intel_tv_detect() ones. However, from the upstream bug report: "The userspace stacktraces are irrelevant, as any GPU hang or OOPS may trigger such a trace -- that one identical symptom may imply any number of bugs, i.e. all the duplicates are not necessarily duplicate bugs." For now we can keep them together for efficiency's sake, but it may well be that they are different.

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-05:

#24

I think upstream were talking about the X trace, not the Oops, which is the only way to get the real cause of the X crash.

Revision history for this message

Gabe Gorelick (gabegorelick) wrote on 2010-04-06:

#25

Oh yes, I see that now. But if the oops traces do point to the same bug, then couldn't the same bug be causing the intel_tv_detect() NULL pointer dereference? They have very similar stacktraces:

[<f857cac9>] ? intel_crt_detect+0x69/0xe0 [i915]
[<f80ceeee>] ? drm_helper_probe_single_connector_modes+0x26e/0x300 [drm_kms_helper]
[<f8368d5e>] ? drm_mode_object_find+0x4e/0x70 [drm]
[<f8369b7f>] ? drm_mode_getconnector+0x2df/0x380 [drm]
[<c0589b59>] ? mutex_lock+0x19/0x40
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<f835e7cd>] ? drm_ioctl+0x25d/0x3e0 [drm]
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<f83698a0>] ? drm_mode_getconnector+0x0/0x380 [drm]
[<f835e570>] ? drm_ioctl+0x0/0x3e0 [drm]
[<c0215f71>] ? vfs_ioctl+0x21/0x90
[<c0216259>] ? do_vfs_ioctl+0x79/0x310
[<c058d210>] ? do_page_fault+0x160/0x3a0
[<c0216557>] ? sys_ioctl+0x67/0x80
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<c01033ec>] ? syscall_call+0x7/0xb
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140
[<c04c64a7>] ? ethtool_get_drvinfo+0x137/0x140

[<f868540f>] ? intel_tv_detect+0x8f/0x1c0 [i915]
[<f8322c46>] ? drm_helper_probe_single_connector_modes+0x296/0x300 [drm_kms_helper]
[<f84c6e8e>] ? drm_mode_object_find+0x4e/0x70 [drm]
[<f84c83bf>] ? drm_mode_getconnector+0x2df/0x380 [drm]
[<f84bd815>] ? drm_ioctl+0x185/0x370 [drm]
[<c04c64a7>] ? hidinput_hid_event+0x1d7/0x3a0
[<f84c80e0>] ? drm_mode_getconnector+0x0/0x380 [drm]
[<c04c64a7>] ? hidinput_hid_event+0x1d7/0x3a0
[<c02f1c84>] ? security_file_permission+0x14/0x20
[<c0213e5b>] ? vfs_ioctl+0x7b/0x90
[<c04c64a7>] ? hidinput_hid_event+0x1d7/0x3a0
[<c0214159>] ? do_vfs_ioctl+0x79/0x310
[<c0205750>] ? do_sync_write+0x0/0x100
[<c0214457>] ? sys_ioctl+0x67/0x80
[<c04c64a7>] ? hidinput_hid_event+0x1d7/0x3a0
[<c010344c>] ? syscall_call+0x7/0xb
[<c04c64a7>] ? hidinput_hid_event+0x1d7/0x3a0
[<c04c64a7>] ? hidinput_hid_event+0x1d7/0x3a0

The bottoms of the traces don't really matter, but both have the same three functions leading up to the crash (apart from the last one), which seems to indicate that at some common point, e.g. drm_mode_getconnector, NULL is passed as an argument to the next function in the call stack when it shouldn't be, causing the NULL pointer dereference further down the line.

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-04-06:

#26

Yeah, you're right, that could well be the same bug. But since we don't really know what leads to one or another trace... Could it be different hardware? In this case we could consider the traces as the same.

Revision history for this message

Gabe Gorelick (gabegorelick) wrote on 2010-04-06:

#27

> Could it be different hardware?
All the bug reports are on i915.

My guess would be that intel_crt_detect gets called when a CRT monitor is detected, while intel_tv_detect is for LCDs. Can anyone who knows the driver comment on this?

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-04-23:

#28

Can I do anything to ease debugging of this bug? I'd really like to help get this fixed: it is quite annoying, and it seems to affect many users, judging by the number of duplicates in Ubuntu alone.

Revision history for this message

S. Christian Collins (s-chriscollins) wrote on 2010-04-28:

#29

I'm still experiencing the problem as of 10.04 RC1 (with updates as of 4/28/10). My problem is exactly as described by the OP. No CRT monitor is involved.

** My System **
PC: HP Pavilion dv1550se laptop
CPU: Intel(R) Pentium(R) M processor 1.60GHz
RAM: 1GB DDR400
Video: Mobile 915GM/GMS/910GML Express Graphics Controller
Sound: 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller
OS: Ubuntu 10.04 i386 w/ all updates as of 4/28/10 (including 2.6.32-21 kernel)

Revision history for this message

roger64 (rogqip-suse) wrote on 2010-05-07:

#30

7th of May 2010

Toshiba A80 - Intel Pentium M 1.73 GHz
RAM: 1.5 GB
Video: Mobile 915GM/GMS/910GML Express Graphics Controller
Sound: 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller

This bug (X crashing within 10 to 60 minutes at most) existed during the whole lifespan of Karmic.
It's still present in Lucid, even with kernel 2.6.33.3 and driver 2.11.

Of course, it is also present with the current Lucid kernel 2.6.32-22.23 and driver 2.9 (xserver-xorg-video-intel).

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-05-23:

#31

Still happening with 2.6.34:
[ 2329.012081] PM: resume of devices complete after 1945.462 msecs
[ 2329.012241] PM: resume devices took 1.948 seconds
[ 2329.012273] PM: Finishing wakeup.
[ 2329.012276] Restarting tasks ... done.
[ 2329.050531] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
[wait one hour or so]
[ 3529.748173] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3529.748409] render error detected, EIR: 0x00000000
[ 3529.748455] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 527114 at 527113)

Manoj Iyer (manjo) on 2010-05-24

tags:

added: kernel-graphics kernel-reviewed

Revision history for this message

Manoj Iyer (manjo) wrote on 2010-06-03:

#32

Upstream has made some changes in this area; here is the mainline kernel build for Ubuntu:

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

Can you please try the mainline kernel and see if this problem still exists?

Revision history for this message

Milan Bouchet-Valat (nalimilan) wrote on 2010-06-03:

#33

Confirmed with 2.6.34 here, no hope, it's been like that for more than a year...

Revision history for this message

Manoj Iyer (manjo) wrote on 2010-06-04:

#34

Since this is also a bug in upstream kernel, I opened a report upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=16123

Changed in linux (Ubuntu):
status:	Triaged → Incomplete
assignee:	nobody → Manoj Iyer (manjo)

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-07-10:

#35

Massive memory corruption following hibernation should be fixed with:

commit 985b823b919273fe1327d56d2196b4f92e5d0fae
Author: Linus Torvalds <email address hidden>
Date: Fri Jul 2 10:04:42 2010 +1000

    drm/i915: fix hibernation since i915 self-reclaim fixes

    Since commit 4bdadb9785696439c6e2b3efe34aa76df1149c83 ("drm/i915:
    Selectively enable self-reclaim"), we've been passing GFP_MOVABLE to the
    i915 page allocator where we weren't before due to some over-eager
    removal of the page mapping gfp_flags games the code used to play.

    This caused hibernate on Intel hardware to result in a lot of memory
    corruptions on resume. See for example

        http://bugzilla.kernel.org/show_bug.cgi?id=13811

I suspect that it is the memory corruption that is the root cause here.

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-07-24:

#36

2.6.35-rc6 has a further fix for corruption on hibernation which nobody has been able to break (so far).

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-07-29:

#37

Sorry, but it's still here, in a different form (apparently without an oops):
$ uname -r
2.6.35-020635rc6-generic

/var/log/kern.log:
[ 1467.408347] PM: Finishing wakeup.
[ 1467.408350] Restarting tasks ... done.
[ 1467.434616] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
[ 1467.747233] sky2 0000:02:00.0: eth0: enabling interface [...]
[ 1512.204160] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1512.205452] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 11072 at 11071)

At this point, the X server is killed, and won't restart:
Fatal server error:
Failed to submit batchbuffer: Input/output error

Should I try with a more recent X version? Seems to me that the bug is still in the kernel itself, so it may not change anything.

As always, please just ask if you need more testing. This bug is really a bitch...

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-07-29:

#38

That doesn't appear to be the same bug. And I should have pointed that out in comment 17...

The original bug with the OOPS could only be the result of memory corruption. The invalid framebuffer id could have been a symptom of the same memory corruption but now appears to be a more subtle issue.

Milan, please open a fresh bug report that focuses on the framebuffer id error. This is simply to try and keep the report coherent and so easier to review. [Otherwise when developers read the first few comments to familiarise themselves with the bug, then skip to the end to catch the new updates, the report no longer makes any sense.]

Revision history for this message

In freedesktop.org Bugzilla #26974, Milan Bouchet-Valat (nalimilan) wrote on 2010-08-01:

#39

Filed as https://bugzilla.kernel.org/show_bug.cgi?id=16488

Sorry, I know mixing problems on a single report is messy, but I've already tracked this bug using about five different reports, which were closed, and I'm losing track myself... ;-)

Revision history for this message

In freedesktop.org Bugzilla #26974, Chris Wilson (ickle) wrote on 2010-08-01:

#40

> --- Comment #22 from Milan Bouchet-Valat <email address hidden> 2010-08-01 01:57:36 PDT ---
> Filed as https://bugzilla.kernel.org/show_bug.cgi?id=16488
>
> Sorry, I know mixing problems on a single report is messy, but I've already
> tracked this bug using about five different reports, which were closed, and I'm
> losing track myself... ;-)

Thanks. We are getting closer, it looks like there may be a few related
bugs across the components that are complicating the issue,
e.g. bug 29320.

Bug Watch Updater (bug-watch-updater) on 2010-09-14

Changed in xorg-server:
importance:	Unknown → High
status:	Unknown → Fix Released

Bug Watch Updater (bug-watch-updater) on 2011-01-25

Changed in xorg-server:
importance:	High → Unknown

Bug Watch Updater (bug-watch-updater) on 2011-02-03

Changed in xorg-server:
importance:	Unknown → High

Revision history for this message

dino99 (9d9) wrote on 2015-08-28:

#41

This version has expired

Changed in linux (Ubuntu):
status:	Incomplete → Invalid

Remote bug watches

freedesktop-bugs #26974
[RESOLVED FIXED] Edit
linux-kernel-bugs #13811
[RESOLVED DUPLICATE] Edit
linux-kernel-bugs #16123
[RESOLVED CODE_FIX] Edit
linux-kernel-bugs #16488
[CLOSED INVALID] Edit
