[arrandale] kernel OOPS setting external monitor to a higher resolution

Bug #906086 reported by Martin Pool
This bug affects 1 person
Affects                            Status   Importance  Assigned to  Milestone
xf86-video-intel                   Invalid  High
xserver-xorg-video-intel (Ubuntu)  Invalid  High        Unassigned

Bug Description

Following on from bug 745112 but on precise - similar but different symptoms so a separate bug.

With an external monitor connected through the dock DisplayPort of a ThinkPad X201, running current precise:

 * the internal screen looks ok
 * if I use the display control panel to try to turn off the internal screen and use only the external screen, it looks distorted and flickery, with two panels, as if it's trying to show two desktops on the same display; when the safety timeout expires it recovers ok
 * if I have the displays side by side and the external display at a low resolution, it works ok
 * if I try to turn the external display up above 1600x1200 the kernel oopses (unrecoverably) inside i915_gem_init_ioctl

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: xserver-xorg 1:7.6+7ubuntu7
ProcVersionSignature: Ubuntu 3.2.0-5.11-generic 3.2.0-rc5
Uname: Linux 3.2.0-5-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 1.90-0ubuntu1
Architecture: amd64
CompizPlugins: [core,bailer,detection,composite,opengl,compiztoolbox,decor,gnomecompat,mousepoll,imgpng,place,regex,session,unitymtgrabhandles,resize,vpswitch,animation,grid,move,snap,wall,expo,workarounds,ezoom,fade,scale,unityshell]
CompositorRunning: compiz
Date: Mon Dec 19 09:56:07 2011
DistUpgraded: Log time: 2011-12-15 10:06:44.057094
DistroCodename: precise
DistroVariant: ubuntu
EcryptfsInUse: Yes
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GraphicsCard:
 Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:215a]
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
MachineType: LENOVO 3249CTO
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-5-generic root=UUID=8aff985d-377a-420d-a38e-62ce8bd54504 ro crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7
SourcePackage: xorg
UpgradeStatus: Upgraded to precise on 2011-12-15 (3 days ago)
dmi.bios.date: 05/31/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 6QET66WW (1.36 )
dmi.board.name: 3249CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6QET66WW(1.36):bd05/31/2011:svnLENOVO:pn3249CTO:pvrThinkPadX201:rvnLENOVO:rn3249CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 3249CTO
dmi.product.version: ThinkPad X201
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.6+bzr20110929-0ubuntu8
version.ia32-libs: ia32-libs 20090808ubuntu26
version.libdrm2: libdrm2 2.4.27-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu4
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu4
version.xserver-xorg-core: xserver-xorg-core 2:1.10.4-1ubuntu6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu4
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1

Martin Pool (mbp) wrote :

This was failing consistently, but it just worked now. Perhaps it is fixed in the latest updates.

Bryce Harrington (bryce) wrote :

Hi Martin, how did the system do over the holidays? Still reproducing the bug?

affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Incomplete
Martin Pool (mbp) wrote :

Sadly it is still failing in Precise. It is fairly reliable at lower resolutions but rarely works at full resolution. I will have the laptop (but obviously not the 30in monitor!) at the rally if you want to have a look.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Triaged
Bryce Harrington (bryce) wrote :

Hi Martin, did RAOF resolve this for you at the rally? If not, I can work on it a bit for you.

Martin Pool (mbp) wrote : Re: [Bug 906086] Re: [arrandale] crashes setting external monitor to a higher resolution

On 18 January 2012 20:04, Bryce Harrington <email address hidden> wrote:
> Hi Martin, did RAOF resolve this for you at the rally?  If not, I can
> work on it a bit for you.

No, he helped with some other bugs but not this. It needs a high-res
external monitor to reproduce it and we didn't have one there. I
would be very happy to test things for you, or even try to debug it
myself if you give me some pointers. The path-dependency seems like
it's a good clue.

--
Martin

Bryce Harrington (bryce)
summary: - [arrandale] crashes setting external monitor to a higher resolution
+ [arrandale] kernel OOPS setting external monitor to a higher resolution
In , Bryce Harrington (bryce) wrote :

Forwarding this bug from Ubuntu reporter Martin Pool:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/906086

[Problem]
If I try to turn the external display up above 1600x1200 the kernel oopses (unrecoverably) inside i915_gem_init_ioctl. See attached photo showing the full trace.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
 IP: i915_gem_get_aperture_ioctl+0x67/0xb0 [i915]
 Oops: 0000 [#1] SMP

Process Xorg
Call trace:
 drm_ioctl+0x444/0x510
 ? i915_gem_init_ioctl
 ? do_page_fault
 do_vfs_ioctl
 ? vfs_write
 sys_ioctl
 system_call_fastpath
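
When comparing a hand-transcribed trace like the one above against a disassembly, it can help to pull the symbol, offset and function size out of each line mechanically. The following is only a rough sketch: the regular expression and the helper name parse_frame are illustrative assumptions, not part of any kernel tooling, and it only understands the "symbol+0xOFFSET/0xSIZE [module]" form seen in this report.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" with an optional " [module]" suffix,
# the form used by the IP and call-trace lines quoted in this report.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frame(line):
    """Return (symbol, offset, size, module) for a trace line, or None.

    Lines without an offset (for example "? do_page_fault") yield None.
    """
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
        match.group("module"),  # None when no [module] suffix is present
    )


if __name__ == "__main__":
    symbol, offset, size, module = parse_frame(
        "IP: i915_gem_get_aperture_ioctl+0x67/0xb0 [i915]"
    )
    # The faulting instruction sits 0x67 bytes into a 0xb0-byte function.
    print(f"{module}:{symbol} at {offset:#x} of {size:#x} bytes")
```

For the IP line this gives the function in which the fault happened (i915_gem_get_aperture_ioctl) rather than the "?" entries further down, which are only possible stale stack entries.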

[Original Description]
Following on from bug 745112 but on precise - similar but different symptoms so a separate bug.

With an external monitor connected through the dock DisplayPort of a ThinkPad X201, running current precise:

 * the internal screen looks ok
 * if I use the display control panel to try to turn off the internal screen and use only the external screen, it looks distorted and flickery, with two panels, as if it's trying to show two desktops on the same display; when the safety timeout expires it recovers ok
 * if I have the displays side by side and the external display at a low resolution, it works ok
 * if I try to turn the external display up above 1600x1200 the kernel oopses (unrecoverably) inside i915_gem_init_ioctl

Changed in xserver-xorg-video-intel:
importance: Unknown → High
status: Unknown → Confirmed
In , Bryce Harrington (bryce) wrote :

Created attachment 55842
BUG trace photo

In , Bryce Harrington (bryce) wrote :

Created attachment 55843
BootDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 55844
CurrentDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 55845
XorgLog.txt

Bryce Harrington (bryce) wrote :

Ok, well the kernel BUG is the most tangible problem here, let's start with that. It's conceivable that once that's fixed the other symptoms will go away.

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
Bryce Harrington (bryce) wrote :

Martin Pool - I've forwarded this bug upstream to https://bugs.freedesktop.org/show_bug.cgi?id=44999 - please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

In , Daniel-ffwll (daniel-ffwll) wrote :

OOPS-decoding for fun and profit:

A reasonable decode of the code from the OOPS

   0x400641 <array+1>: mov 0x16e0(%r12),%rdx
   0x400649 <array+9>: lea 0x16e0(%r12),%rcx
   0x400651 <array+17>: cmp %rdx,%rcx
   0x400654 <array+20>: lea -0xb0(%rdx),%rax
   0x40065b <array+27>: je 0x400682
   0x40065d <array+29>: nopl 0x0(%rax)
   0x400664 <array+36>: mov 0x88(%rax),%rdx
   0x40066b <array+43>: add 0x30(%rdx),%ebx <- we die here
   0x40066e <array+46>: mov 0xb0(%rax),%rdx
   0x400675 <array+53>: cmp %rdx,%rcx
   0x400678 <array+56>: lea -0xb0(%rdx),%rax
   0x40067f <array+63>: add %ah,0x1000a70(%rip) # 0x14010f5
   0x400685: sbb (%rbx),%eax
   0x400687: cmp (%rax),%ebp
   0x400689: add %al,(%rax)
   0x40068b: add %al,(%rax,%rax,1)
   0x40068e: add %al,(%rax)
   0x400690: rex.WR std
   0x400692: (bad)
   0x400693: incl 0x0(%rax,%rax,1)

Some comparison with asm from my own tree suggests that

%rdx == gtt_space
0x30(%rdx) == gtt_space->size

%rax == obj
0x88(%rax) == obj->gtt_space

0xb0(%rax) == obj->mm_list.next

We die at NULL+0x30.

Stuff before&after makes less sense, and I'm missing the function exit code which should follow. Probably the add %rip does something fancy out-of-line.

In other news we have an obj on the pinned list with gtt_space = NULL.
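
The decode above amounts to a loop over the pinned list that adds up gtt_space->size for every object. As a rough illustration only (a Python model, not the driver's C code; the class names, the label field and the helper names are all made up for this sketch), the failure mode looks like this: one object whose gtt_space is NULL turns the read of the size field into an access at address 0 + 0x30.

```python
from dataclasses import dataclass
from typing import List, Optional

# Offset of 'size' inside the gtt_space node, taken from the decode above.
SIZE_OFFSET = 0x30


@dataclass
class GttNode:
    size: int


@dataclass
class GemObject:
    label: str                     # hypothetical name, used only for reporting
    gtt_space: Optional[GttNode]   # None models the NULL pointer


def fault_address(base: int, offset: int = SIZE_OFFSET) -> int:
    """Address touched when reading the size field through 'base'."""
    return base + offset


def sum_pinned(pinned: List[GemObject]) -> int:
    """Add up the GTT sizes of all pinned objects.

    Unlike the oopsing code path, this model checks for a missing
    gtt_space and reports which object is inconsistent.
    """
    total = 0
    for obj in pinned:
        if obj.gtt_space is None:
            raise RuntimeError(
                f"{obj.label}: gtt_space is NULL, "
                f"read would hit {fault_address(0):#x}"
            )
        total += obj.gtt_space.size
    return total


if __name__ == "__main__":
    healthy = [GemObject("fb", GttNode(4096)), GemObject("cursor", GttNode(8192))]
    print(sum_pinned(healthy))
    try:
        sum_pinned(healthy + [GemObject("stale", None)])
    except RuntimeError as err:
        print(err)
```

The check in the model only shows where the inconsistency is detected; it says nothing about how an object with no GTT space ended up on the pinned list, which is the actual question in this bug.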

In , Daniel-ffwll (daniel-ffwll) wrote :

Created attachment 55880
use private slab for i915 gem objects

Please try to reproduce the issue with this patch. Also ensure that you either use the SLAB allocator or, if you're using SLUB, boot with slub_debug on the kernel cmdline.

In , Martin Pool (mbp) wrote :

OK, I'll try that.

Martin Pool (mbp) wrote :

/home/mbp/build/linux/ubuntu-precise/drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_create’:
/home/mbp/build/linux/ubuntu-precise/drivers/gpu/drm/i915/i915_gem.c:210:19: error: ‘dev_priv’ undeclared (first use in this function)
/home/mbp/build/linux/ubuntu-precise/drivers/gpu/drm/i915/i915_gem.c:210:19: note: each undeclared identifier is reported only once for each function it appears in

This doesn't build when applied to the precise kernel, I guess because it's based off something a bit newer... I'll look.

In , Martin Pool (mbp) wrote :

You are missing one declaration of dev_priv, around line 205. With that fixed, it does build on the Ubuntu kernel. I'm going to test it with slub_debug.

In , Daniel-ffwll (daniel-ffwll) wrote :

> --- Comment #7 from Martin Pool <email address hidden> 2012-01-22 22:01:45 PST ---
> You are missing one declaration of dev_priv, around line 205.  With that fixed,
> it does build on the Ubuntu kernel.  I'm going to test it with slub_debug.

Oops, I've fixed that locally but forgot to amend the patch before attaching it.

In , Martin Pool (mbp) wrote :

Hi Daniel,

With this patch applied, and 'slub_debug' on the kernel command line, I get the same problem I was seeing previously: no oops, but the external screen is blank or can't sync.

In , Daniel-ffwll (daniel-ffwll) wrote :

Just to check: the patch _does_ get rid of the oops?

For the blank screen issue: Please boot with drm.debug=0x4 added to your kernel cmdline, reproduce the problem and then attach the full dmesg.
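
Before attaching a dmesg it is worth confirming that the requested options really made it onto the running kernel's command line. The snippet below is a minimal sketch of such a check; the function names are invented for illustration, it splits on whitespace only (so quoted parameters containing spaces are not handled), and in real use the string would come from /proc/cmdline rather than being pasted in.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value or None}."""
    params = {}
    for token in cmdline.split():
        # Split at the first '=' only, so "root=UUID=..." keeps its value whole.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, wanted):
    """Return the entries of 'wanted' that are not on the command line.

    A wanted entry of the form "name=value" must match exactly; a bare
    "name" is satisfied by the parameter being present with any value.
    """
    have = parse_cmdline(cmdline)
    missing = []
    for entry in wanted:
        name, sep, value = entry.partition("=")
        if name not in have or (sep and have[name] != value):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Command line as reported in the original bug description.
    reported = (
        "BOOT_IMAGE=/boot/vmlinuz-3.2.0-5-generic "
        "root=UUID=8aff985d-377a-420d-a38e-62ce8bd54504 ro "
        "crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7"
    )
    print(missing_params(reported, ["slub_debug", "drm.debug=0x4"]))
    print(missing_params(reported + " slub_debug drm.debug=0x4",
                         ["slub_debug", "drm.debug=0x4"]))
```

With the command line from the original report both options are listed as missing, which is expected since that boot predates the debugging requests in this thread.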

In , Martin Pool (mbp) wrote : Re: [Bug 906086]

On 25 January 2012 17:42, Daniel-ffwll <email address hidden> wrote:
> Just to check: the patch _does_ get rid of the oops?

I haven't seen it since running that patch for a few days, with
several disconnect/reconnect cycles. It was never 100% reproducible.
>
> For the blank screen issue: Please boot with drm.debug=0x4 added to your
> kernel cmdline, reproduce the problem and then attach the full dmesg.

I have a separate upstream bug
https://bugs.freedesktop.org/show_bug.cgi?id=45211 for that. I'll
provide that info soon.

--
Martin

Bryce Harrington (bryce)
tags: added: resolution
tags: added: dual-head
Bryce Harrington (bryce) wrote :

Hi Martin, just a quick ping - upstream is blocked waiting on some additional debug information. Are you still able to reproduce this with current Precise?

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Incomplete
Martin Pool (mbp) wrote : Re: [Bug 906086] Re: [arrandale] kernel OOPS setting external monitor to a higher resolution

Hi Bryce,

I have other external-monitor problems (bug 745112) but I haven't seen
an oops for a while.

    status invalid

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Invalid
Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
In , Daniel-ffwll (daniel-ffwll) wrote :

Ping.

In , Daniel-ffwll (daniel-ffwll) wrote :

It looks like the private slab works around foreign memory corruption, and for the remaining issues the reporter has gone AWOL. Also, I don't quite see the evidence for why this is a regression.

Tentatively closing, please reopen if this is still an issue on the latest kernel version.

Changed in xserver-xorg-video-intel:
status: Incomplete → Invalid
In , Martin Pool (mbp) wrote : Re: [Bug 906086]

I haven't seen this problem with the standard precise kernels for several
months. ok to close.

In , Jari-tahvanainen (jari-tahvanainen) wrote :

Closing resolved+invalid. No activity in >4 years.
