[Lenovo ThinkPad X220] External screens shut off randomly

Bug #1239186 reported by Robert Navarro
This bug affects 5 people
Affects            Status         Importance   Assigned to       Milestone
xf86-video-intel   Fix Released   Medium
linux (Ubuntu)     Fix Released   High         Ricardo Salveti

Bug Description

I've been trying to figure out what's going on with my external DisplayPort screens with no luck... but I think I'm getting closer to the right direction.

I upgraded to the latest 13.10 build and turned on DRM debugging with:

# /etc/modprobe.d/drm.conf
# We are turning on debug information here

options drm debug=0xe
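
(Note for readers: 0xe is a bitmask. The sketch below decodes it, assuming the debug categories used by 3.x-era kernels: core = 0x1, driver = 0x2, KMS = 0x4, PRIME = 0x8. Those bit assignments are an assumption and are not stated anywhere in this report; other kernel versions may define more or different bits. The function and table names are only for illustration.)

```python
# Assumed drm.debug bit assignments (3.x-era kernels); not taken from this report.
DRM_DEBUG_BITS = {0x1: "core", 0x2: "driver", 0x4: "kms", 0x8: "prime"}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xE))
```

Under that assumption, 0xe turns on everything except the very chatty core messages.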

In doing so my logs lit up with lots of graphics activity. However, the two events that actually lined up with the problem were lines like these:

Oct 12 11:22:02 rnavarro-thinkpad kernel: [ 673.201706] [drm:ironlake_irq_handler], Pipe B FIFO underrun
Oct 12 11:22:02 rnavarro-thinkpad kernel: [ 673.201719] [drm:cpt_serr_int_handler], PCH transcoder B FIFO underrun

and

Oct 12 11:22:31 rnavarro-thinkpad kernel: [ 702.411812] [drm:ironlake_irq_handler], Pipe A FIFO underrun

I noticed that right before either one of my monitors would blank and shut off, one of these messages would be emitted in my kern.log.
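
(For anyone who wants to check their own logs for the same pattern: a rough sketch that pulls the underrun events out of kernel log lines. It only assumes the message format quoted above; the function name and the idea of feeding it lines from kern.log are illustrative, not something the driver provides.)

```python
import re

# Matches the two kinds of underrun messages quoted in this report, e.g.
# "[ 673.201706] [drm:ironlake_irq_handler], Pipe B FIFO underrun"
UNDERRUN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:\w+\],\s+"
    r"(?P<source>Pipe|PCH transcoder) (?P<pipe>[A-C]) FIFO underrun"
)


def find_underruns(lines):
    """Return (seconds_since_boot, source, pipe) for every underrun line."""
    events = []
    for line in lines:
        match = UNDERRUN_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("source"), match.group("pipe"))
            )
    return events


sample = [
    "Oct 12 11:22:02 rnavarro-thinkpad kernel: [ 673.201706] [drm:ironlake_irq_handler], Pipe B FIFO underrun",
    "Oct 12 11:22:02 rnavarro-thinkpad kernel: [ 673.201719] [drm:cpt_serr_int_handler], PCH transcoder B FIFO underrun",
    "Oct 12 11:22:31 rnavarro-thinkpad kernel: [ 702.411812] [drm:ironlake_irq_handler], Pipe A FIFO underrun",
]

for event in find_underruns(sample):
    print(event)
```

Comparing the timestamps it returns against the moment a screen blanks is a quick way to confirm the correlation described above.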

I think there was some stuff added upstream to detect this, as noted here:

http://lists.freedesktop.org/archives/intel-gfx/2013-July/029882.html

But I'm not sure what's causing it, much less how to fix it.

What other information can I give that would help debug?

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: xserver-xorg-video-intel 2:2.99.904-0ubuntu2
ProcVersionSignature: Ubuntu 3.11.0-12.19-generic 3.11.3
Uname: Linux 3.11.0-12-generic x86_64
ApportVersion: 2.12.5-0ubuntu2
Architecture: amd64
CompizPlugins: [core,composite,opengl,decor,regex,mousepoll,imgpng,vpswitch,grid,wall,gnomecompat,animation,compiztoolbox,move,place,snap,resize,unitymtgrabhandles,workarounds,fade,expo,session,scale,ezoom,unityshell]
CompositorRunning: None
Date: Sat Oct 12 11:45:47 2013
DistUpgraded: 2013-10-12 10:47:24,030 DEBUG enabling apt cron job
DistroCodename: saucy
DistroVariant: ubuntu
DkmsStatus:
 i915-3.9-3.8, 0.02, 3.8.0-31-generic, x86_64: installed
 vboxhost, 4.2.18, 3.11.0-12-generic, x86_64: installed
 vboxhost, 4.2.18, 3.8.0-31-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GraphicsCard:
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:21da]
InstallationDate: Installed on 2013-06-30 (103 days ago)
InstallationMedia: Ubuntu-GNOME 13.04 "Raring Ringtail" - Release amd64 (20130424)
MachineType: LENOVO 4286CTO
MarkForUpload: True
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.11.0-12-generic root=/dev/mapper/vg_rnavarro--leno-lv_root ro quiet splash i915.semaphores=0 vt.handoff=7
SourcePackage: xserver-xorg-video-intel
UpgradeStatus: Upgraded to saucy on 2013-10-12 (0 days ago)
XorgConf:
 Section "Device"
         Identifier "intel"
         Driver "intel"
         Option "AccelMethod" "sna"
 EndSection
dmi.bios.date: 02/14/2012
dmi.bios.vendor: LENOVO
dmi.bios.version: 8DET58WW (1.28 )
dmi.board.asset.tag: Not Available
dmi.board.name: 4286CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr8DET58WW(1.28):bd02/14/2012:svnLENOVO:pn4286CTO:pvrThinkPadX220:rvnLENOVO:rn4286CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4286CTO
dmi.product.version: ThinkPad X220
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.ia32-libs: ia32-libs 20090808ubuntu36
version.libdrm2: libdrm2 2.4.46-1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.2.1-1ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental 9.2.1-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 9.2.1-1ubuntu3
version.xserver-xorg-core: xserver-xorg-core 2:1.14.3-3ubuntu1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu3.1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.2.0-0ubuntu10
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.904-0ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.9-2ubuntu1
xserver.bootTime: Sat Oct 12 11:20:57 2013
xserver.configfile: /etc/X11/xorg.conf
xserver.errors:

xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:
 product id 16510
 vendor DEL
xserver.version: 2:1.14.3-3ubuntu1
---
ApportVersion: 2.12.5-0ubuntu2
Architecture: amd64
CompizPlugins: [core,composite,opengl,decor,regex,mousepoll,imgpng,vpswitch,grid,wall,gnomecompat,animation,compiztoolbox,move,place,snap,resize,unitymtgrabhandles,workarounds,fade,expo,session,scale,ezoom,unityshell]
CompositorRunning: None
DistUpgraded: 2013-10-12 10:47:24,030 DEBUG enabling apt cron job
DistroCodename: saucy
DistroRelease: Ubuntu 13.10
DistroVariant: ubuntu
DkmsStatus:
 i915-3.9-3.8, 0.02, 3.8.0-31-generic, x86_64: installed
 vboxhost, 4.2.18, 3.11.0-12-generic, x86_64: installed
 vboxhost, 4.2.18, 3.8.0-31-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GraphicsCard:
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:21da]
InstallationDate: Installed on 2013-06-30 (106 days ago)
InstallationMedia: Ubuntu-GNOME 13.04 "Raring Ringtail" - Release amd64 (20130424)
MachineType: LENOVO 4286CTO
MarkForUpload: True
Package: xserver-xorg-video-intel 2:2.99.904-0ubuntu2
PackageArchitecture: amd64
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.11.0-12-generic root=/dev/mapper/vg_rnavarro--leno-lv_root ro quiet splash i915.semaphores=0 vt.handoff=7
ProcVersionSignature: Ubuntu 3.11.0-12.19-generic 3.11.3
Tags: saucy ubuntu
Uname: Linux 3.11.0-12-generic x86_64
UpgradeStatus: Upgraded to saucy on 2013-10-12 (2 days ago)
UserGroups: adm audio cdrom dialout dip fax floppy fuse lpadmin mythtv netdev plugdev sambashare scanner sudo tape vboxusers video wireshark
dmi.bios.date: 07/18/2013
dmi.bios.vendor: LENOVO
dmi.bios.version: 8DET69WW (1.39 )
dmi.board.asset.tag: Not Available
dmi.board.name: 4286CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr8DET69WW(1.39):bd07/18/2013:svnLENOVO:pn4286CTO:pvrThinkPadX220:rvnLENOVO:rn4286CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4286CTO
dmi.product.version: ThinkPad X220
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.ia32-libs: ia32-libs 20090808ubuntu36
version.libdrm2: libdrm2 2.4.46-1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.2.1-1ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental 9.2.1-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 9.2.1-1ubuntu3
version.xserver-xorg-core: xserver-xorg-core 2:1.14.3-3ubuntu2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu3.1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.2.0-0ubuntu10
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.904-0ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.9-2ubuntu1
xserver.bootTime: Tue Oct 15 08:39:05 2013
xserver.configfile: default
xserver.errors:

xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:
 product id 728
 vendor LGD
xserver.version: 2:1.14.3-3ubuntu2

In , Robert Navarro (crshman) wrote :

Created attachment 87256
kernel error output

I'm not sure if this is the right place, but based on the error log it seemed right.

Anyways, I tried plugging my two Dell monitors into my Lenovo X220 via DisplayPort, and it generated these errors in the system log.

I'm not sure if it's related, but the primary reason I opened this bug was because the monitors will blank after a few minutes of activity. The timeout period seems random and I can't really correlate it with any one activity.

I'm running the latest version of the Intel graphics drivers, 2013Q3.

## lspci output
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

## uname -a output
Linux rnavarro-thinkpad 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

## lsb_release -a output
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 13.04
Release: 13.04
Codename: raring

## aptitude show libdrm-intel1 output
Package: libdrm-intel1
State: installed
Automatically installed: no
Multi-Arch: same
Version: 2.4.45-0ubuntu1
Priority: optional
Section: libs
Maintainer: Ubuntu Developers <email address hidden>
Architecture: amd64
Uncompressed Size: 189 k
Depends: libc6 (>= 2.17), libdrm2 (>= 2.4.38), libpciaccess0
PreDepends: multiarch-support
Breaks: libdrm-intel1 (!= 2.4.45-0ubuntu1)
Replaces: libdrm-intel1 (< 2.4.45-0ubuntu1)
Description: Userspace interface to intel-specific kernel DRM services -- runtime
 This library implements the userspace interface to the intel-specific kernel DRM services. DRM stands for "Direct Rendering
 Manager", which is the kernelspace portion of the "Direct Rendering Infrastructure" (DRI). The DRI is currently used on Linux to
 provide hardware-accelerated OpenGL drivers.

## aptitude show libdrm2 output
Package: libdrm2
State: installed
Automatically installed: no
Multi-Arch: same
Version: 2.4.45-0ubuntu1
Priority: optional
Section: libs
Maintainer: Ubuntu Developers <email address hidden>
Architecture: amd64
Uncompressed Size: 103 k
Depends: libc6 (>= 2.17)
PreDepends: multiarch-support
Breaks: libdrm2 (!= 2.4.45-0ubuntu1)
Replaces: libdrm2 (< 2.4.45-0ubuntu1)
Description: Userspace interface to kernel DRM services -- runtime
 This library implements the userspace interface to the kernel DRM services. DRM stands for "Direct Rendering Manager", which is
 the kernelspace portion of the "Direct Rendering Infrastructure" (DRI). The DRI is currently used on Linux to provide
 hardware-accelerated OpenGL drivers.

 This package provides the runtime environment for libdrm.

In , Robert Navarro (crshman) wrote :

Created attachment 87257
glxinfo output

In , Robert Navarro (crshman) wrote :

Forgot to mention, let me know if there is any other information that you need or if you'd like me to change any settings to up the debug level anywhere (along with location)

In , Jani-nikula (jani-nikula) wrote :

You've come to the right place - if you're prepared to build your own kernels. Please try the drm-intel-nightly branch of [1]. First, your issues are related to hotplugging and display port link training/maintenance, both of which have been updated and fixed considerably in the latest kernels. It could be something we've taken care of already. Second, I can't find a kernel version where your log would match the source code; I can only presume it's a distro kernel with some changes on top. So I don't know what exactly you're running and what version I should be looking at.

Please report back; if the problem persists, attach the dmesg from early boot up to the point where the problem occurs, with the drm.debug=0xe module parameter set.

[1] git://people.freedesktop.org/~danvet/drm-intel

In , Robert Navarro (crshman) wrote :

Hello Jani,

So I've gone ahead and updated my OS install to the latest Ubuntu, bringing me up to the 3.11.0-12 kernel.

It looks like the hotplug issues have gone away, which is great. However, once I turned on debugging like you mentioned, I figured out what was actually going on.

Right before either of my monitors goes blank I get a message emitted like this:

Oct 12 11:22:02 rnavarro-thinkpad kernel: [ 673.201706] [drm:ironlake_irq_handler], Pipe B FIFO underrun
Oct 12 11:22:02 rnavarro-thinkpad kernel: [ 673.201719] [drm:cpt_serr_int_handler], PCH transcoder B FIFO underrun

Shortly after my second monitor goes blank, emitting a similar message:
Oct 12 11:22:31 rnavarro-thinkpad kernel: [ 702.411812] [drm:ironlake_irq_handler], Pipe A FIFO underrun

Is there more detailed debug information that I can output to help identify what is causing this?

I'm driving both monitors with:
Oct 12 11:22:56 rnavarro-thinkpad kernel: [ 727.212145] [drm:drm_mode_debug_printmodeline], Modeline 32:"2560x1440" 60 241500 2560 2608 2640 2720 1440 1443 1448 1481 0x48 0x9
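
(For reference, the Modeline above carries enough to work out what each pipe has to sustain. A small sketch, assuming the usual field order of the kernel's mode debug line: id:"name", nominal refresh, pixel clock in kHz, four horizontal timings, four vertical timings, then type and flags. The function name and the returned dictionary keys are made up for illustration.)

```python
def parse_modeline(line):
    """Parse a drm modeline debug message, starting at the word 'Modeline'."""
    fields = line[line.index("Modeline"):].split()
    # fields[1] looks like 32:"2560x1440" (object id, then the mode name)
    name = fields[1].split(":", 1)[1].strip('"')
    (nominal_hz, clock_khz, hdisplay, _hsync_start, _hsync_end, htotal,
     vdisplay, _vsync_start, _vsync_end, vtotal) = (int(f) for f in fields[2:12])
    return {
        "name": name,
        "nominal_hz": nominal_hz,
        "clock_khz": clock_khz,
        "active": (hdisplay, vdisplay),
        "total": (htotal, vtotal),
        # Exact refresh rate: pixel clock divided by pixels per full frame.
        "refresh_hz": clock_khz * 1000 / (htotal * vtotal),
    }


mode = parse_modeline(
    'Modeline 32:"2560x1440" 60 241500 2560 2608 2640 2720 '
    "1440 1443 1448 1481 0x48 0x9"
)
print(mode["name"], round(mode["refresh_hz"], 2), "Hz,",
      mode["clock_khz"] / 1000, "MHz pixel clock per display")
```

With two such displays attached, both pipes are being asked to fetch at that pixel clock at the same time.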

Robert Navarro (crshman) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
In , Jani-nikula (jani-nikula) wrote :

Updating subject accordingly. We probably have fixes in this area too, so a test spin on drm-intel-nightly would be appreciated. Thanks.

Chris Wilson (ickle) wrote :

The hardware reports a hotplug disconnect for one of your DP monitors, after which userspace responds by switching it off. We don't have much control over the hotplug detection, so I don't think there is much we can do to rectify it other than by manually polling the outputs.

Robert Navarro (crshman) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Robert Navarro (crshman) wrote : BootLog.txt

apport information

Robert Navarro (crshman) wrote : CurrentDmesg.txt

apport information

Robert Navarro (crshman) wrote : Dependencies.txt

apport information

Robert Navarro (crshman) wrote : DpkgLog.txt

apport information

Robert Navarro (crshman) wrote : GconfCompiz.txt

apport information

Robert Navarro (crshman) wrote : Lspci.txt

apport information

Robert Navarro (crshman) wrote : Lsusb.txt

apport information

Robert Navarro (crshman) wrote : MonitorsUser.xml.txt

apport information

Robert Navarro (crshman) wrote : ProcCpuinfo.txt

apport information

Robert Navarro (crshman) wrote : ProcEnviron.txt

apport information

Robert Navarro (crshman) wrote : ProcInterrupts.txt

apport information

Robert Navarro (crshman) wrote : ProcModules.txt

apport information

Robert Navarro (crshman) wrote : UdevDb.txt

apport information

Robert Navarro (crshman) wrote : UdevLog.txt

apport information

Robert Navarro (crshman) wrote : XorgLog.txt

apport information

Robert Navarro (crshman) wrote : XorgLogOld.txt

apport information

Robert Navarro (crshman) wrote : Xrandr.txt

apport information

Robert Navarro (crshman) wrote : make.log.txt

apport information

Robert Navarro (crshman) wrote : xdpyinfo.txt

apport information

Robert Navarro (crshman) wrote : xserver.devices.txt

apport information

Robert Navarro (crshman) wrote : Re: External screens shut off randomly

Hey Chris,

(Not sure why the apport utility created multiple comments for all of those attachments instead of grouping them like before....but oh well)

Anyways, I tried it again...this time I booted with my two DP monitors plugged in. Started up fine, logged in...still good.

I switched the active workspace a few times left/right and then the first monitor blanked.

I did it a few more times and then the second monitor blanked.

I then physically undocked the laptop, detaching both screens and everything else, to send the captured system reports.

Maybe that will drop the amount of hotplug events you see to help filter things out for debugging?

Are there any other debug options you'd like me to turn on/up? What else can I do to help pinpoint the buffer underrun error?

Robert Navarro (crshman) wrote :

Also, if you look at the CurrentDmesg.txt file, around [16.673447] is where I pause for a second, then start switching the active workspace back and forth.

A few seconds later at [32.316462] is when the first monitor blanks off

I continue flipping back and forth (with one monitor on and the other off) and around [75.565372] the second monitor dies and I can't see anything.

At [89.513926] (14 seconds later) I disconnect the laptop from the docking station and open up firefox to start filing this bug.

In , Robert Navarro (crshman) wrote :

Hey Jani,

So I grabbed the latest intel drm kernel from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/

In particular, linux-image-3.12.0-994-generic_3.12.0-994.201310150447_amd64

I tried it again...this time I booted with my two DP monitors plugged in. Started up fine, logged in...still good.

I switched the active workspace a few times left/right and then the first monitor blanked.

I did it a few more times and then the second monitor blanked.

I then physically undocked the laptop, detaching both screens and everything else, to send the captured system reports.

Here is a copy of the apport log dump from right after the error occurred:

http://www.crshman.com/debug/apport.xserver-xorg-video-intel.4x6KCY.apport_unpack/

Also, if you look at the CurrentDmesg file, around [21.961459] is where I pause for a second, then start switching the active workspace back and forth.

A few seconds later at [72.034577] is when the first monitor blanks off

What other debug information can I grab to help out with this?

Robert Navarro (crshman) wrote :

Hello,

Is there anything I can test/add to this report to help figure out what's going on?

In , Robert Navarro (crshman) wrote :

Hello,

Is there anything else I can test/add to this report to help figure out what's going on?

In , Robert Navarro (crshman) wrote :

Hello,

Is there anything I can test/add to this report to help figure out what's going on?

In , Mika-kuoppala (mika-kuoppala) wrote :

(In reply to comment #8)
> Hello,
>
> Is there anything I can test/add to this report to help figure out what's
> going on?

It would be interesting to see if the underruns persist with lower resolution modes.

In , Robert Navarro (crshman) wrote :

(In reply to comment #9)
> Would be interesting to see if underruns persist with lower resolution modes.

Here are the different modes my monitors can do:

DP2 connected 1440x2560+1440+0 left (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440 60.0*+
   1920x1200 59.9
   1920x1080 60.0 60.0 50.0 59.9 24.0 24.0
   1920x1080i 60.1 50.0 60.0
   1600x1200 60.0
   1680x1050 60.0
   1280x1024 75.0 60.0
   1280x800 59.8
   1152x864 75.0
   1280x720 60.0 50.0 59.9
   1024x768 75.1 60.0
   800x600 75.0 60.3
   720x576 50.0
   720x480 60.0 59.9
   640x480 75.0 60.0 59.9
   720x400 70.1

Any particular one you think would work best?

penalvch (penalvch) wrote :

Robert Navarro, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p xserver-xorg-video-intel REPLACE-WITH-BUG-NUMBER

Please note, given that the information from the prior release is already available, doing this on a release prior to the development one would not be helpful.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
In , Daniel-ffwll (daniel-ffwll) wrote :

We've had piles and piles of watermark fixes, which should help in rectifying pipe underruns. Can you please retest with a recent drm-intel-nightly build?

In , Robert Navarro (crshman) wrote :

Hey Daniel,

Sounds good, I'll update to the latest nightly and see how things go.

In , Robert Navarro (crshman) wrote :

Created attachment 91827
01-10-2014 drm-intel-nightly

In , Robert Navarro (crshman) wrote :

I've gone ahead and updated to the latest nightly:

Linux rnavarro-thinkpad 3.13.0-994-generic #201401100405 SMP Fri Jan 10 09:05:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

The release done on 1/10/14

I'm still getting the underruns. Right after I logged in I got one on my second monitor, and then shortly after another on my first.

The latest log can be found here (01-10-2014 drm-intel-nightly):

https://bugs.freedesktop.org/attachment.cgi?id=91827

In , Daniel-ffwll (daniel-ffwll) wrote :

Ville, any ideas?

In , Ville-syrjala-e (ville-syrjala-e) wrote :

Assuming the watermarks get computed correctly, the only issue I can think of is that the WM latency values provided by the BIOS might be too optimistic. My SNB machine has the same SSKPD though (not that I've ever tried dual 25x14 displays on it).

Does the problem occur with just one of the displays plugged in?

Or with two displays and lower resolution on both. 1920x1080@60 should be OK.

Can you take some register dumps (for the original dual 2560x1440 case)? I'd like to check if things look OK. As root run this:

intel_reg_read 0xc6014 0xc6040 0xc6044 0xc6018 0xc6048 0xc604c 0x45100 0x45104 0x45108 0x4510c 0x45110 0x45120 0x145d10 0x42000 0x42004 0x42020 0x70180 0x71180

intel_reg_read is part of intel-gpu-tools.

In , Robert Navarro (crshman) wrote :

Replies inline:

> Does the problem occur with just one of the displays plugged in?
I've never had this happen with only a single display.

> Or with two displays and lower resolution on both. 1920x1080@60 should be OK.
With both monitors 1920x1080@60 it doesn't happen

> Can you take some register dumps (for the original dual 2560x1440 case)? I'd
> like to check if things look OK. As root run this:
>
> intel_reg_read 0xc6014 0xc6040 0xc6044 0xc6018 0xc6048 0xc604c 0x45100
> 0x45104
> 0x45108 0x4510c 0x45110 0x45120 0x145d10 0x42000 0x42004 0x42020 0x70180
> 0x71180
>
> intel_reg_read is part of intel-gpu-tools.
Do I run the intel_reg_read right after the problem has occurred, or when? (Never used that tool)

Thanks for looking into this guys, I'll do my best to get you any and all information you need to debug this!

In , Ville-syrjala-e (ville-syrjala-e) wrote :

(In reply to comment #17)
> Replies inline:
>
> > Does the problem occur with just one of the displays plugged in?
> I've never had this happen with only a single display.
>
> > Or with two displays and lower resolution on both. 1920x1080@60 should be OK.
> With both monitors 1920x1080@60 it doesn't happen
>
> > Can you take some register dumps (for the original dual 2560x1440 case)? I'd
> > like to check if things look OK. As root run this:
> >
> > intel_reg_read 0xc6014 0xc6040 0xc6044 0xc6018 0xc6048 0xc604c 0x45100
> > 0x45104
> > 0x45108 0x4510c 0x45110 0x45120 0x145d10 0x42000 0x42004 0x42020 0x70180
> > 0x71180
> >
> > intel_reg_read is part of intel-gpu-tools.
> Do I run the intel_reg_read right after the problem has occurred, or when?
> (Never used that tool)

You can run it as soon as the displays are lit up, and it's probably best to run it again after the problem has occurred (just to make sure the registers haven't changed magically in between).

In , Robert Navarro (crshman) wrote :

Created attachment 92058
kernel output

In , Robert Navarro (crshman) wrote :

Created attachment 92059
reg_read_2014-01-14T09:33:48-0800.txt

In , Robert Navarro (crshman) wrote :

Created attachment 92060
reg_read_2014-01-14T09:30:53-0800.txt

In , Robert Navarro (crshman) wrote :

Created attachment 92061
reg_read_2014-01-14T09:30:42-0800.txt

In , Robert Navarro (crshman) wrote :

Created attachment 92062
reg_read_2014-01-14T09:30:01-0800.txt

In , Robert Navarro (crshman) wrote :

Created attachment 92063
reg_read_2014-01-14T09:29:03-0800.txt

In , Robert Navarro (crshman) wrote :

Ok, I've gone ahead and bumped my kernel to this version:

Linux rnavarro-thinkpad 3.13.0-994-generic #201401140526 SMP Tue Jan 14 10:27:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I went ahead and added a dump of the intel_reg_read output to rc.local to get an early dump.

I then manually ran a dump right when I logged in, and again a few more times after it happened again.

Here is the kernel log:
https://bugs.freedesktop.org/attachment.cgi?id=92058

P.S. This was a particularly good one, my main monitor didn't even turn back on this time after the underrun.

In , Robert Navarro (crshman) wrote :

Also, to save you guys some time, I didn't notice any differences between any of the dumps from intel_reg_read.
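
(In case it helps anyone repeating this comparison: a sketch of diffing two saved dumps. It assumes each line of a dump looks like "0x45100 : 0xD0006", which is the form intel_reg_read output takes further down in this thread; other versions of the tool may print extra columns, in which case the pattern would need adjusting. The function names and the changed value in the example are hypothetical.)

```python
import re

# One register per line: "<hex address> : <hex value>"
REG_LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)")


def parse_reg_dump(text):
    """Turn the text of a saved dump into {address: value}."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE_RE.match(line)
        if match:
            regs[int(match.group(1), 16)] = int(match.group(2), 16)
    return regs


def diff_reg_dumps(before, after):
    """Return {address: (old, new)} for registers that differ or are missing."""
    changed = {}
    for addr in sorted(set(before) | set(after)):
        if before.get(addr) != after.get(addr):
            changed[addr] = (before.get(addr), after.get(addr))
    return changed


early = parse_reg_dump("0x45100 : 0xD0006\n0x45104 : 0xD0006\n")
# Made-up second dump, only to show what a difference would look like.
later = parse_reg_dump("0x45100 : 0xD0006\n0x45104 : 0xE0006\n")

for addr, (old, new) in diff_reg_dumps(early, later).items():
    print(f"{addr:#x}: {old:#x} -> {new:#x}")
```

An empty result from diff_reg_dumps means the two dumps agree, which matches what was observed here.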

In , Ville-syrjala-e (ville-syrjala-e) wrote :

Hmm. The register dumps look perfectly fine. So the BIOS provided memory latency values being too low to keep the system happy remains my only theory.

Any chance there might be a BIOS update available for the machine? That might be worth a shot, although I can't guarantee that any update would affect the latency values.

In any case, I'll need to cook up some patches to allow run-time modification of the latency values, so that we can try and see if increasing them would actually help...

In , Robert Navarro (crshman) wrote :

(In reply to comment #27)

> Any chance there might be a BIOS update available for the machine? That
> might be worth a shot, although I can't guarantee that any update would
> affect the latency values.
I took a look at the Lenovo website and I'm currently running the latest BIOS revision, 1.39.

In , Ville-syrjala-e (ville-syrjala-e) wrote :

Created attachment 92540
Patch to allow changing watermark latency values

This patch allows changing the latency values we use for computing the watermarks.

It adds three new debugfs files. "i915_pri_wm_latency" being the one we're interested in here.

Reading the file should give similar output as the kernel log had. So in this case it should look like this:

# cat i915_pri_wm_latency
Primary WM0 latency 7 (0.7 usec)
Primary WM1 latency 3 (1.5 usec)
Primary WM2 latency 4 (2.0 usec)
Primary WM3 latency 22 (11.0 usec)

What you could then do is write new latency values to the file. Let's say we try to double the latency values:
# echo '14 6 8 44' > i915_pri_wm_latency

Now reading the file again should show the new values. To actually make the system use them you'd need to force a modeset on all the displays.
"xset dpms force off; xset dpms force on" should be enough for that. After this is done you should see some change in the 0x45100 and 0x45104 registers.

And then it should just be a matter of trying to cause another underrun, and increasing the latency values until they no longer occur.
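
(A rough helper for the procedure above, not part of the patch: it parses the contents of the latency file, converts the raw values to microseconds, and builds the string to write back for a given multiplier. The unit conversion is inferred from the sample output (WM0 appears to count in 0.1 usec steps, the other levels in 0.5 usec steps), and the function names are made up. The pattern also accepts a shorter "WM0 7 (0.7 usec)" line form without the "Primary ... latency" wording.)

```python
import re

# Accepts "Primary WM0 latency 7 (0.7 usec)" as well as "WM0 7 (0.7 usec)".
LATENCY_RE = re.compile(r"WM(\d+)\s+(?:latency\s+)?(\d+)\s+\(")


def parse_wm_latency(text):
    """Return the raw latency values in level order (WM0, WM1, ...)."""
    levels = {}
    for line in text.splitlines():
        match = LATENCY_RE.search(line)
        if match:
            levels[int(match.group(1))] = int(match.group(2))
    return [levels[level] for level in sorted(levels)]


def raw_to_usec(raw):
    # Inferred units: WM0 in 0.1 usec steps, higher levels in 0.5 usec steps.
    return [raw[0] / 10] + [value / 2 for value in raw[1:]]


def scaled_write_string(raw, factor):
    """Build the space-separated string to write for a given multiplier."""
    return " ".join(str(round(value * factor)) for value in raw)


sample = """Primary WM0 latency 7 (0.7 usec)
Primary WM1 latency 3 (1.5 usec)
Primary WM2 latency 4 (2.0 usec)
Primary WM3 latency 22 (11.0 usec)"""

raw = parse_wm_latency(sample)
print(raw_to_usec(raw))
print(scaled_write_string(raw, 2))
```

The second print gives the same doubled values as the echo command above; writing them to the file and forcing a modeset is still a manual step.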

In , Robert Navarro (crshman) wrote :

Hello,

I actually just updated my kernel version a few minutes ago to check to see how things were going.

I'm currently running:

Linux rnavarro-thinkpad 3.13.0-994-generic #201401210405 SMP Tue Jan 21 09:05:52 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Did this patch make it into the 01/21/14 nightly, or should I wait for the 01/22/14 nightly?

Additionally, where is the 'i915_pri_wm_latency' located?

I searched in /sys/kernel/debug and it didn't show up (which would make sense if the patch hadn't hit the nightly yet)

In , Ville-syrjala-e (ville-syrjala-e) wrote :

(In reply to comment #30)
> Hello,
>
> I actually just updated my kernel version a few minutes ago to check to see
> how things were going.
>
> I'm currently running:
>
> Linux rnavarro-thinkpad 3.13.0-994-generic #201401210405 SMP Tue Jan 21
> 09:05:52 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Did this patch make it into the 01/21/14 nightly, or should I wait for the
> 01/22/14 nightly?

I didn't even post it to the mailing list yet. I can do that if it's easier for you to test it through that prebuilt kernel.

> Additionally, where is the 'i915_pri_wm_latency' located?

It will show up in /sys/kernel/debug/dri/0/

In , Robert Navarro (crshman) wrote :

(In reply to comment #31)
> I didn't even post it to the mailing list yet. I can do that if it's easier
> for you to test it through that prebuilt kernel.

Oops jumping the gun a bit, yes it would be much easier for me to test in a pre-built kernel.

Thanks for your efforts in helping resolve this!

In , Ville-syrjala-e (ville-syrjala-e) wrote :

(In reply to comment #32)
> (In reply to comment #31)
> > I didn't even post it to the mailing list yet. I can do that if it's easier
> > for you to test it through that prebuilt kernel.
>
> Oops jumping the gun a bit, yes it would be much easier for me to test in a
> pre-built kernel.
>
> Thanks for your efforts in helping resolve this!

Daniel picked up the patch for -nightly, so hopefully it'll appear in your prebuilt kernels soonish.

In , Robert Navarro (crshman) wrote :

Ok, so it looks like I have this now.

root@rnavarro-thinkpad:/sys/kernel/debug/dri/0# cat i915_pri_wm_latency
WM0 7 (0.7 usec)
WM1 3 (1.5 usec)
WM2 4 (2.0 usec)
WM3 22 (11.0 usec)

root@rnavarro-thinkpad:/sys/kernel/debug/dri/0# echo '14 6 8 44' > i915_pri_wm_latency

root@rnavarro-thinkpad:/sys/kernel/debug/dri/0# cat i915_pri_wm_latency
WM0 14 (1.4 usec)
WM1 6 (3.0 usec)
WM2 8 (4.0 usec)
WM3 44 (22.0 usec)

Ran the commands just as described, would it make sense to figure out what the minimums are?

Does it matter which WMx I'm changing?

Should I change them all at the same time as described, or one by one?

In , Ville-syrjala-e (ville-syrjala-e) wrote :

(In reply to comment #34)
> Ok, so it looks like I have this now.
>
> root@rnavarro-thinkpad:/sys/kernel/debug/dri/0# cat i915_pri_wm_latency
> WM0 7 (0.7 usec)
> WM1 3 (1.5 usec)
> WM2 4 (2.0 usec)
> WM3 22 (11.0 usec)
>
> root@rnavarro-thinkpad:/sys/kernel/debug/dri/0# echo '14 6 8 44' >
> i915_pri_wm_latency
>
> root@rnavarro-thinkpad:/sys/kernel/debug/dri/0# cat i915_pri_wm_latency
> WM0 14 (1.4 usec)
> WM1 6 (3.0 usec)
> WM2 8 (4.0 usec)
> WM3 44 (22.0 usec)
>
> Ran the commands just as described, would it make sense to figure out what
> the minimums are?

I guess we can try to narrow it down as much as possible. If the doubled values work, then we could bisect it further to find the smallest acceptable value. If the doubled values didn't work, might want to try 3x,4x,5x...

>
> Does it matter which WMx I'm changing?

With two displays only WM0 will be used. The others only kick in to provide more power savings in single display use cases.

>
> Should I change them all at the same time as described, or one by one?

Probably best to keep changing all in sync for now. I think we at least need to maintain the relationship WM0<=WM1<=WM2<=WM3 (for the usec values).

Also probably a good idea to check at each step that the change resulted in a corresponding change to the 0x45100 and 0x45104 register values. In your two display case, those two registers should always have an identical value to each other.
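
As a rough illustration of the advice above, here is a small Python sketch that scales all four raw values by the same factor and checks that WM0<=WM1<=WM2<=WM3 still holds for the usec values. The unit conversion (WM0 in 0.1 usec, the rest in 0.5 usec) is an assumption read off the debugfs output quoted in this thread, and all helper names are made up for the example.

```python
def to_usec(raw_values):
    # Assumed units: WM0 in 0.1 usec steps, WM1..WM3 in 0.5 usec steps.
    return [raw * (1 if level == 0 else 5) / 10
            for level, raw in enumerate(raw_values)]


def scaled(raw_values, factor):
    """Multiply every raw latency by the same integer factor (2x, 3x, ...)."""
    return [raw * factor for raw in raw_values]


def is_ordered(raw_values):
    """True if WM0 <= WM1 <= WM2 <= WM3 holds for the usec values."""
    usec = to_usec(raw_values)
    return all(a <= b for a, b in zip(usec, usec[1:]))


def echo_argument(raw_values):
    """Space-separated string in the form written to i915_pri_wm_latency."""
    return " ".join(str(v) for v in raw_values)
```

With the defaults `[7, 3, 4, 22]`, `scaled(..., 2)` gives the `14 6 8 44` string used earlier, and `is_ordered` rejects a combination such as `[20, 3, 4, 22]`, where WM0 (2.0 usec) would exceed WM1 (1.5 usec).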

Revision history for this message
In , Robert Navarro (crshman) wrote :

I'm not noticing a change in the register values when I change the latencies:

root@rnavarro-thinkpad:~# cat /sys/kernel/debug/dri/0/i915_pri_wm_latency
WM0 7 (0.7 usec)
WM1 3 (1.5 usec)
WM2 4 (2.0 usec)
WM3 22 (11.0 usec)

root@rnavarro-thinkpad:~# intel_reg_read 0x45100 0x45104
0x45100 : 0xD0006
0x45104 : 0xD0006

root@rnavarro-thinkpad:~# echo '14 6 8 44' > /sys/kernel/debug/dri/0/i915_pri_wm_latency

root@rnavarro-thinkpad:~# cat /sys/kernel/debug/dri/0/i915_pri_wm_latency
WM0 14 (1.4 usec)
WM1 6 (3.0 usec)
WM2 8 (4.0 usec)
WM3 44 (22.0 usec)

root@rnavarro-thinkpad:~# intel_reg_read 0x45100 0x45104
0x45100 : 0xD0006
0x45104 : 0xD0006

Is that unexpected?

Revision history for this message
In , Ville-syrjala-e (ville-syrjala-e) wrote :

(In reply to comment #36)
> I'm not noticing a change in the register values when I change the latencies:
>
> root@rnavarro-thinkpad:~# cat /sys/kernel/debug/dri/0/i915_pri_wm_latency
> WM0 7 (0.7 usec)
> WM1 3 (1.5 usec)
> WM2 4 (2.0 usec)
> WM3 22 (11.0 usec)
>
> root@rnavarro-thinkpad:~# intel_reg_read 0x45100 0x45104
> 0x45100 : 0xD0006
> 0x45104 : 0xD0006
>
> root@rnavarro-thinkpad:~# echo '14 6 8 44' >
> /sys/kernel/debug/dri/0/i915_pri_wm_latency
>
> root@rnavarro-thinkpad:~# cat /sys/kernel/debug/dri/0/i915_pri_wm_latency
> WM0 14 (1.4 usec)
> WM1 6 (3.0 usec)
> WM2 8 (4.0 usec)
> WM3 44 (22.0 usec)
>
> root@rnavarro-thinkpad:~# intel_reg_read 0x45100 0x45104
> 0x45100 : 0xD0006
> 0x45104 : 0xD0006
>
> Is that unexpected?

Did you do the "xset dpms force off; xset dpms force on" commands in between?
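
Putting the steps from this exchange in one place: write the new latencies, force the display off and on with xset, then read the two registers back. The Python sketch below only strings together the commands already quoted in this thread. It assumes root access, a running X session, and that `intel_reg_read` is on the PATH; the function names and the injectable `run` parameter are illustrative.

```python
import subprocess

LATENCY_FILE = "/sys/kernel/debug/dri/0/i915_pri_wm_latency"


def build_steps(raw_values):
    """Return the commands from this thread, in the order they were run."""
    values = " ".join(str(v) for v in raw_values)
    return [
        ["sh", "-c", "echo '%s' > %s" % (values, LATENCY_FILE)],
        ["xset", "dpms", "force", "off"],
        ["xset", "dpms", "force", "on"],
        ["intel_reg_read", "0x45100", "0x45104"],
    ]


def apply_latencies(raw_values, run=subprocess.check_call):
    """Run each step in turn; `run` can be swapped out for a dry run."""
    for command in build_steps(raw_values):
        run(command)
```

Passing `run=print` shows what would be executed without touching the hardware.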

Revision history for this message
In , Robert Navarro (crshman) wrote :

Ah yes, forgot to run that command. Once I do that the register values are changed.

Doubling all of these numbers seems to work great. To track down what the minimums are, I reset everything to the defaults and I'm slowly bumping WM0 (while keeping WM0<=WM1<=WM2<=WM3) to find the smallest value at which things stop breaking.

So far so good with this:

root@rnavarro-thinkpad:~# cat /sys/kernel/debug/dri/0/i915_pri_wm_latency; intel_reg_read 0x45100 0x45104
WM0 10 (1.0 usec)
WM1 3 (1.5 usec)
WM2 4 (2.0 usec)
WM3 22 (11.0 usec)
0x45100 : 0x120006
0x45104 : 0x120006

WM0 7 (0.7 usec) --> WM0 10 (1.0 usec)

However, I'm going to keep testing to make sure that the WM0 1.0usec is solid.

Thanks for the assistance thus far, we're getting close to pinning this down!
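
To make the register check less error prone, here is a short Python sketch that parses the `intel_reg_read` lines as they appear in this thread, confirms that 0x45100 and 0x45104 match, and splits each value at bit 16. Treating the upper part as the field that follows the primary latency is an assumption based only on which bits moved between 0xD0006 and 0x120006; the helper names are illustrative.

```python
def parse_reg_dump(text):
    """Parse 'ADDR : VALUE' lines as printed by intel_reg_read in this thread."""
    regs = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        addr, value = line.split(":", 1)
        regs[int(addr.strip(), 16)] = int(value.strip(), 16)
    return regs


def split_fields(value):
    """Split a register value into (bits 16 and up, low 16 bits)."""
    return value >> 16, value & 0xFFFF


def pipes_match(regs):
    """Check that 0x45100 and 0x45104 hold the same value."""
    return regs[0x45100] == regs[0x45104]
```

For the readings above, the upper part goes from 13 (0xD) to 18 (0x12) while the low part stays at 6.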

Revision history for this message
In , Robert Navarro (crshman) wrote :

Hey Guys,

So after a few days of testing I've seen zero flickers with this config:

root@rnavarro-thinkpad:~# cat /sys/kernel/debug/dri/0/i915_pri_wm_latency
WM0 12 (1.2 usec)
WM1 3 (1.5 usec)
WM2 4 (2.0 usec)
WM3 22 (11.0 usec)

So I went from:
WM0 7 (0.7 usec) --> WM0 12 (1.2 usec)
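
The search for the smallest working WM0 follows the bisection Ville suggested earlier: start from a value known to flicker and one known to be stable, then halve the interval. A minimal Python sketch is below. `is_stable` is a hypothetical callback; in practice it stands for hours or days of manual testing at each value, and the sketch assumes that any value above a working one also works.

```python
def smallest_stable(known_bad, known_good, is_stable):
    """Binary search for the smallest raw WM0 value that is_stable accepts.

    Assumes stability is monotonic: if a value works, every larger one does too.
    """
    low, high = known_bad, known_good
    while high - low > 1:
        mid = (low + high) // 2
        if is_stable(mid):
            high = mid   # mid works, so the answer is mid or smaller
        else:
            low = mid    # mid still flickers, so the answer is larger
    return high
```

Between the default of 7 and the doubled value of 14, this needs at most three test rounds.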

Revision history for this message
In , Ville-syrjala-e (ville-syrjala-e) wrote :

Created attachment 94766
drm/i915: Increase WM memory latency values on SNB with high pixel clock

This patch should make the driver automagically increase the latency values when encountering a high resolution display. Please test and report back whether it works as intended.

Revision history for this message
In , Robert Navarro (crshman) wrote :

Sounds good, I'll keep an eye out for it on my prebuilt kernels and report back when it's merged in.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #41)
> Sounds good, I'll keep an eye out for it on my prebuilt kernels and report
> back when it's merged in.

Nope, we won't merge this without positive testing feedback from you. Which means you need to apply this patch and build kernels yourself - we can't test every possible crazy hw combination out there ourselves and applying random patches to the main tree is a no-go (besides that usually it takes a bit of time for patches to land in pre-built kernels that way anyway).

If you can't test patches we need to close this as unresolved unfortunately.

Revision history for this message
In , Robert Navarro (crshman) wrote :

Ok, I'll have to figure out how to compile the kernel for my OS. It may take some time, but I'll figure it out.

Revision history for this message
Andreas Vinsander (andreas-vinsander) wrote :

I think I have the same issue as the original poster.
Regarding comment #29, is it enough to boot into a live CD and try to reproduce? (I'm a bit reluctant to upgrade my work laptop to an unofficial release)

Revision history for this message
penalvch (penalvch) wrote :

Andreas Vinsander, thank you for your comment. So that your hardware and problem may be tracked, could you please file a new report by executing the following in a terminal:
ubuntu-bug xorg

Please ensure you have xdiagnose installed, and that you click the Yes button for attaching additional debugging information.

For more on this, please see the official Ubuntu documentation:
Ubuntu X.Org Team, Ubuntu Bug Control, and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report will delay your problem being addressed as quickly as possible.

Thank you for your understanding.

Revision history for this message
Andreas Vinsander (andreas-vinsander) wrote :

New bug 1290400 filed as requested by Christopher.

(Is there some clever way of relating several bugs to each other in launchpad?)

Revision history for this message
In , Rubin Starset (rubin110) wrote :

TL;DR: The patch seems to correct my issue. I was directed to this bug after complaining about the problem on the Intel-gfx list, although my issue isn't exactly identical. I'm also providing compile instructions so that Robert N can verify the fix.

Some months ago I bought a cheap 27" S-IPS display off of ebay. The panel supports DVI, Displayport, and some others, and its native resolution is QHD 2560x1440. The display shipped from Korea, I plugged it into my Thinkpad X220 running Debian Sid and had a slew of issues. The seller went back and forth with me on trying to fix the issues, and provided replacement boards for the inside of the display, but ultimately the seller stalled and the one month period to request a refund flew by.

There are two issues I encounter...

Through a direct Displayport connection from my X220 to the monitor at full resolution, if there's a lot of motion on the screen (a full screen video or scrolling a web page back and forth) for about 60 seconds, the screen will blank out and return a few times until it acts as though the Displayport cable has been disconnected and there's no signal. Eventually the display will give up and go to sleep.

Through Displayport to Dual Link DVI via one of those adapters that requires power over USB, the display was more usable. During a lot of motion on the screen it wouldn't blank out, however after about 5 minutes of that all the pixels on the screen would vibrate together back and forth about 200px horizontally for half a second. This will repeat anywhere between every 30 seconds to 5 minutes. Again never blanking out.

If I drive the display at a smaller resolution like 1080p, I have no issues. The same goes with pushing over HDMI, but the max resolution here is 1080p anyhow. Additionally, I haven't noticed this issue on most actual name brand displays, namely the higher priced Dell displays.

Recently I was getting fed up with this issue and started looking for a replacement monitor. After realizing that blowing another $500 sort of sucks, I decided to do a little more testing. Using a spare Mac Mini I keep around for testing, I tested out Displayport to Displayport, playing a 1080p video on the display at full resolution. I encountered no issues (other than the Mac Mini simply dropping frames from the large video). Plugging in a spare drive I keep around with a bootable copy of Windows 7 into my Thinkpad X220, I again was able to drive the display playing full screen video with zero issues over Displayport.

Throughout all my issues, I was never once able to find any sort of error or debug output in any logs. This includes kern.log, syslog and Xorg. Due to this I'm not sure if my issue is the same as Robert N's, and therefore I would like it if he verified the fix too, unless the devs here can safely say the issue I described is the same.

So at this point I started to poke people on lists and bug some of my smarter kernel hacker friends, which has brought me to this bug.

After grabbing a copy of the drm-intel nightly source, applying the patch, compiling and giving it a spin, I was able to play a full screened video at full display resolution without issue over Displayport. I left the ...


Revision history for this message
In , Robert Navarro (crshman) wrote :

Hey rubin110!

Thanks for the incredibly detailed instructions! (Stashing those away for the future!)

I'm compiling the new kernel as I write this, once it's done I'll reboot and start testing.

> Additionally if I got a new fangled Lenovo dock with two Displayports, will I
> be able to drive two of the same displays at full resolution with this patch? I
> do understand I'll have to disable the laptop display to make the second
> external work.

The answer is YES! I actually drive my dual screens (3x Dell U2713HM) using this docking station:

http://www.amazon.com/gp/product/B0085MQLGC

Using both of the DP connectors.

penalvch (penalvch)
summary: - External screens shut off randomly
+ [Lenovo ThinkPad X220] External screens shut off randomly
Revision history for this message
In , Robert Navarro (crshman) wrote :

I got this compiled and running yesterday, and it worked for the rest of the evening without issues. I'll keep poking at it today to see how it goes, but things are definitely looking promising!

Revision history for this message
In , Robert Navarro (crshman) wrote :

So far so good with this patch, no flickers, no blanking and I didn't even have to touch any of the wm_latency params.

I've tested this on both 3.13 and 3.14rc6 (currently running on 3.14) and things look great.

Thanks for all the hard work guys!

Revision history for this message
In , Robert Navarro (crshman) wrote :

Hey Guys,

So about an hour ago I rebooted and forgot to select the custom kernel on boot up. Within 5 minutes screens were blanking and flickering like crazy....then I realized I was on the stock kernel.

I just wanted to stop and say thanks again for all the hard work, this has changed my computing experience greatly!

So far I've spent two days on the newly patched kernel with zero issues at all.

Revision history for this message
Ricardo Salveti (rsalveti) wrote :

Was able to reproduce this issue with latest trusty kernel (3.13.0-17-generic), so marking it as confirmed.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → In Progress
importance: Low → High
assignee: nobody → Ricardo Salveti (rsalveti)
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

Applied the patch described at the upstream bug https://bugs.freedesktop.org/show_bug.cgi?id=70254, which seems to fix the issue described by this bug.

You can find the git tree at http://kernel.ubuntu.com/git?p=rsalveti/ubuntu-trusty.git;a=shortlog;h=refs/heads/intel-lenovo-x220 and the kernel packages at http://people.canonical.com/~rsalveti/intel/

It would be nice if someone could install this kernel and give enough feedback to confirm that it indeed fixes the issue, so I can push the packages to the kernel team.

Revision history for this message
In , Ricardo Salveti (rsalveti) wrote :

(In reply to comment #40)
> Created attachment 94766 [details] [review]
> drm/i915: Increase WM memory latency values on SNB with high pixel clock
>
> This patch should make the driver automagically increase the latency values
> when encountering a high resolution display. Please test and report back
> whether it works as intended.

Backported this patch on top of the latest 3.13 based Ubuntu kernel tree, and it indeed fixed the issue described by this bug (you can find more details at https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/1239186).

Let me know if you need any further testing before sending the patch upstream.

Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
status: Unknown → Incomplete
Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

(In reply to comment #40)
> Created attachment 94766 [details] [review]
> drm/i915: Increase WM memory latency values on SNB with high pixel clock
>
> This patch should make the driver automagically increase the latency values
> when encountering a high resolution display. Please test and report back
> whether it works as intended.

Ville, has this been posted on the ml?

penalvch (penalvch)
tags: added: cherry-pick
tags: added: latest-bios-1.39
Revision history for this message
In , Rubin Starset (rubin110) wrote :

Is there anything else we testers need to do to get this bug into a fixed, verified state? Thanks.

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

I posted Ville's patch for review [1]. Some further work is needed.

[1] http://<email address hidden>

Changed in xserver-xorg-video-intel:
status: Incomplete → In Progress
Revision history for this message
In , Robert Navarro (crshman) wrote :

Were there any changes required for the posted patch?

Any indication on when it'll get pushed out so I can start running pre-compiled kernels again?

I know how to compile my own now, thanks rubin110, but it's far more convenient to not have to waste my time building a kernel.

Revision history for this message
Rob Ludwick (rcludw) wrote :

I'm seeing a similar issue. I posted it to bug# 1299663.

Revision history for this message
In , Robert Navarro (crshman) wrote :

Hey Guys,

I've asked around and the consensus is that the patch "still needs work" but I'm not sure what that work might be.

What other things are needed for this to get included?

Revision history for this message
In , Vitaly Minko (vitaly-minko) wrote :

I had the same issue. Ville's patch solved the problem. Thanks a lot guys. I wish you all the best!

Revision history for this message
In , Robert Navarro (crshman) wrote :

Just as a heads up to all following this bug. Ville posted a cleaner patch here for testing:

http://patchwork.freedesktop.org/patch/25568/

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

Fix pushed to drm-intel-fixes as

commit 94b93bc0093a37230ea7a0e91f04bfce677c430f
Author: Ville Syrjälä <email address hidden>
Date: Thu May 8 15:09:19 2014 +0300

    drm/i915: Increase WM memory latency values on SNB

Thanks for the report.

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

(In reply to comment #57)
> Fix pushed to drm-intel-fixes as

commit e95a2f7509f5219177d6821a0a8754f93892ca56

> Author: Ville Syrjälä <email address hidden>
> Date: Thu May 8 15:09:19 2014 +0300
>
> drm/i915: Increase WM memory latency values on SNB
>
> Thanks for the report.

Changed in xserver-xorg-video-intel:
status: In Progress → Fix Released
affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: In Progress → Fix Released