[snb] random hard lockup whilst using Xv

Bug #1176647 reported by Leon Winter on 2013-05-05
This bug affects 4 people
Affects: xf86-video-intel | Status: Won't Fix | Importance: Medium
Affects: xserver-xorg-video-intel (Ubuntu) | Importance: Low | Assigned to: Unassigned

Bug Description

After using the system for some time, the display freezes at some point. The rest of the system seems to keep working: mplayer continues streaming internet radio and audio output works properly. During the lock-up the mouse pointer can be moved, but clicks are ignored. Any attempt to switch to a TTY is ignored as well. Even worse, the power button does not trigger a shutdown, so I usually hard reset by holding the power button for several seconds.
I first witnessed this bug after upgrading to 13.04 (from 12.04). To make sure, I freshly installed Ubuntu, but unfortunately this did not help to avoid the bug.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: xorg 1:7.7+1ubuntu4
ProcVersionSignature: Ubuntu 3.8.0-19.30-generic 3.8.8
Uname: Linux 3.8.0-19-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 2.9.2-0ubuntu8
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
Date: Sun May 5 21:27:55 2013
DistUpgraded: Fresh install
DistroCodename: raring
DistroVariant: ubuntu
DkmsStatus: virtualbox, 4.2.10, 3.8.0-19-generic, x86_64: installed
ExtraDebuggingInterest: I just need to know a workaround
GpuHangFrequency: Several times a day
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:21da]
InstallationDate: Installed on 2013-05-05 (0 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release amd64 (20130424)
MachineType: LENOVO 4290W4H
MarkForUpload: True
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.8.0-19-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/02/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 8DET51WW (1.21 )
dmi.board.asset.tag: Not Available
dmi.board.name: 4290W4H
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr8DET51WW(1.21):bd08/02/2011:svnLENOVO:pn4290W4H:pvrThinkPadX220:rvnLENOVO:rn4290W4H:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4290W4H
dmi.product.version: ThinkPad X220
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.9~daily13.04.18.1~13.04-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.43-0ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.1.1-0ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.1.1-0ubuntu3
version.xserver-xorg-core: xserver-xorg-core 2:1.13.3-0ubuntu6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2b2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.1.0-0ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.21.6-0ubuntu4
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.7-0ubuntu1
xserver.bootTime: Sun May 5 21:09:18 2013
xserver.configfile: default
xserver.errors:

xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.13.3-0ubuntu6
xserver.video_driver: intel

Sometimes, when watching a video with MPlayer, my machine locks up hard so that I have to turn off power before I can use it again in any way. It doesn’t happen all the time, I have not yet found a way to reproduce it.

Unfortunately, I don’t have any logfiles from that. Do you have any suggestions on how to gather logs in such a case?

I’m using an Intel DH67GD mainboard with an Intel Core i7-2600K.

You can try netconsole, but usually snb locks up so hard that even netconsole doesn't capture any dying whimpers.
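
For anyone wanting to try netconsole, here is a minimal sketch of assembling its parameter. The ports, addresses, interface name and MAC used in the usage line are placeholders, not values from this report, and the helper name is made up for illustration; the field layout follows the kernel's netconsole documentation.

```shell
# Build the value for the netconsole= module/boot parameter.
# Field layout (kernel netconsole documentation):
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# $1 source port, $2 source IP, $3 network device,
# $4 target port, $5 target IP, $6 target MAC address
# netconsole_param is a hypothetical helper name.
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

On the affected machine this could be loaded with something like `sudo modprobe netconsole "$(netconsole_param 6666 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"`, with a UDP listener (netcat, for instance) running on the target host.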

Are you using mplayer -vo gl or -vo xv?

Have you tried with i915.i915_enable_rc6=0?

Have you tried with Option "SwapbuffersWait" "false"?

(In reply to comment #1)
> Are you using mplayer -vo gl or -vo xv?
I am using -vo xv.

> Have you tried with i915.i915_enable_rc6=0?
I will try that when I reboot the next time (i.e. after the next hard lockup :)).

> Have you tried with Option "SwapbuffersWait" "false"?
I have enabled that option now. I’ll update this report as soon as the lockup occurs the next time, or if it doesn’t occur for a month or so.

(In reply to comment #2)
> > Have you tried with Option "SwapbuffersWait" "false"?
> I have enabled that option now. I’ll update this report as soon as the
> lockup occurs the next time, or if it doesn’t occur for a month or so.
With i915.i915_enable_rc6=0 _and_ Option "SwapbuffersWait" "false" I have not had a single lockup for more than two weeks.

Unfortunately, I won’t have access to the box I have been testing this with for the next 2 months, so any further testing will have to wait.
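
For reference, the two settings tested above could be applied roughly as sketched here. The Identifier string and both function names are arbitrary choices for this sketch, and the location of the X configuration file varies by distribution.

```shell
# Print a Device section that disables SwapbuffersWait for the intel driver.
# "Intel Graphics" is just a label; any identifier works.
print_intel_conf() {
    cat <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "SwapbuffersWait" "false"
EndSection
EOF
}

# Succeed if rc6 is disabled on the given kernel command line.
# $1: command line string; defaults to the running kernel's /proc/cmdline.
rc6_disabled() {
    case " ${1-$(cat /proc/cmdline)} " in
        *" i915.i915_enable_rc6=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The output of `print_intel_conf` would go into xorg.conf (or a file in an xorg.conf.d directory). On Ubuntu the kernel parameter is typically added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot; `rc6_disabled && echo "rc6 off"` then confirms that the parameter took effect.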

(In reply to comment #3)
> (In reply to comment #2)
> > > Have you tried with Option "SwapbuffersWait" "false"?
> > I have enabled that option now. I’ll update this report as soon as the
> > lockup occurs the next time, or if it doesn’t occur for a month or so.
> With i915.i915_enable_rc6=0 _and_ Option "SwapbuffersWait" "false" I have
> not had a single lockup since more than two weeks.

Presumably you still encountered a hard lockup with just Option "SwapbuffersWait" or was a reboot forced in the meantime?

(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > > Have you tried with Option "SwapbuffersWait" "false"?
> > > I have enabled that option now. I’ll update this report as soon as the
> > > lockup occurs the next time, or if it doesn’t occur for a month or so.
> > With i915.i915_enable_rc6=0 _and_ Option "SwapbuffersWait" "false" I have
> > not had a single lockup since more than two weeks.
>
> Presumably you still encountered a hard lockup with just Option
> "SwapbuffersWait" or was a reboot forced in the meantime?
Sorry for not being more explicit about this: I have not tried just using SwapbuffersWait extensively, a reboot was indeed forced.

Leon Winter (lwi) wrote :

I continue to experience this bug several times a day. The bug can occur as early as 20 minutes after boot, but may also not occur during several hours of usage. I do not see anything suspicious in the log files though. It might be a coincidence but it may especially occur when using mplayer.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xorg (Ubuntu):
status: New → Confirmed
Leon Winter (lwi) wrote :

The bug can occur as early as 1 minute after boot up. I tried watching a video in mplayer. Several times the system locked up which I handled with a hard reset. Once I just started mplayer, skipped to the correct position in the video and could only watch 5 seconds more as the system froze again.
When the system locks up during video playback, not only does the picture freeze, but playback itself stops: I cannot hear any audio, which I would expect to hear if playback continued (even with broken or frozen video output). On the other hand, when only playing music in mplayer, a lockup does not stop mplayer's continuous audio playback.
I therefore assume that the video playing mplayer is stopped due to its "feedback" (or possibly due to no response) from the XVideo extension (and the underlying driver). Since the mplayer in pure audio mode has no connection to the X11 system I can continue without problems.

Timo Aaltonen (tjaalton) on 2013-05-22
affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → New
Chris Wilson (ickle) wrote :

The symptoms are consistent with a GPU hang, so please try to grab the /sys/kernel/debug/dri/0/i915_error_state after the event (and before rebooting).

Leon Winter (lwi) wrote :

Since the affected machine is not reachable via SSH, I just dumped the error state every second (while [ 1 ]; sleep 1; do cat /sys/kernel/debug/dri/0/i915_error_state >> error_state; done) and deleted all the "no error state" lines from the file before the actual error dump after the crash. As it turns out, the error state is all zeroes (I do not expect this to be helpful):

$ hexdump errorstate.dump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000150 0000 0000 0000 0000 0000 0000 0000 000a
000015f

I expect the last byte to be the newline inserted from the shell.

Chris Wilson (ickle) wrote :

Hmm, suggests that the fresh data never reached disk before the system hang. Maybe try 'while :; do sleep 1; cat ... >> error_state; sync; done'
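
Spelled out, the suggested loop might look like the sketch below. The function names are made up for illustration; the source path in the usage note is the one quoted earlier in this report, and reading it normally requires root.

```shell
# Append one snapshot of the error state to a log, then flush it to disk
# so the data has a chance to survive a hard lockup.
# $1: source file, $2: destination log file
snapshot_error_state() {
    cat "$1" >> "$2" && sync
}

# Take a snapshot once per second until interrupted (Ctrl-C).
poll_error_state() {
    while :; do
        sleep 1
        snapshot_error_state "$1" "$2"
    done
}
```

Run as root, the call would be `poll_error_state /sys/kernel/debug/dri/0/i915_error_state error_state`. The explicit sync after every append is the point of the change: without it, the last writes may still be sitting in the page cache when the machine dies.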

Leon Winter (lwi) wrote :

For some other reason I switched my window manager to dwm and can now almost reliably reproduce the failure by simply starting a video in mplayer and wait for approx. 5 seconds. Before with gnome/metacity I could sometimes watch entire movies of multiple hours. Sometimes it would crash in between but sometimes not. However with dwm it crashes all the time and almost immediately.

Since even with sync I could not see a file (error_state) being written, I finally got around to SSHing into my machine and observing the behaviour, which is worse than expected. At the time of the crash my SSH shell also becomes unresponsive, which strongly hints that not only my X/display subsystem is broken but rather the whole kernel. Because of this I am unable to produce the error_state, as the machine does not respond to any commands (either 'sleep X && ' or SSH) since it is ~halted already.

This issue is really annoying, and it can be assumed to be related to video playback of some kind, as this is the only way I can reproduce it now.

By the way I upgraded to kernel 3.8.0-23-generic in the meantime.

Chris Wilson (ickle) wrote :

How are you playing the video? If you are using mplayer, can you try the various backends to see if any are more susceptible?
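
One way to go through the backends systematically is sketched below. The backend names are whatever `mplayer -vo` accepts on the system in question; PLAYER is a hypothetical override added here only so the loop can be dry-run, and the function name and video file name are placeholders.

```shell
# Play the same file once per video output backend, printing the backend
# name before each run.
# $1: video file; remaining arguments: backend names for "mplayer -vo".
# PLAYER (hypothetical) replaces the mplayer binary, e.g. PLAYER=echo.
try_backends() {
    video=$1
    shift
    for vo in "$@"; do
        echo "testing -vo $vo"
        "${PLAYER:-mplayer}" -vo "$vo" "$video"
    done
}
```

For example, `try_backends video.mkv xv gl x11` plays the file with each backend in turn, while `PLAYER=echo try_backends video.mkv xv gl x11` only prints what would be run. Since a lockup takes the whole machine down, it helps to note the backend under test somewhere that survives the reset.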

Leon Winter (lwi) wrote :

Of the backends I tried (xv, gl_nosw, x11, sdl/x11, gl, gl2), xv seems to be causing the problem. I am currently using the x11 backend and have yet to observe a crash with it. However, during my testing, triggering the crash was not as reliable as in real-world usage before; at the time of the crash I had only (two) mplayers with the xv backend running.

Chris Wilson (ickle) wrote :

Can you try the gl backend in real world usage? I expect it to fail as well, as I think the failure mode here is the machine dying whilst waiting upon a scanline - and both the Xv and gl backends should trigger that code. (Except in the case of fullscreen gl, which will go through another path). Whereas the X11 backend will not (and also will not actually use the GPU).

In a very recent -intel DDX update, I've coupled Option "SwapbuffersWait" "false" to disable the vsync for the Xv backend as well. You can disable that through an Xv parameter -- provided it is exposed by the client. That will be very useful for testing whether the scanline wait is the root cause.

Chris Wilson (ickle) wrote :

Another thing to test would be whether i915.i915_enable_rc6=0 makes any difference.

summary: - Random Xorg freezes occuring hours after login
+ [snb] random hard lockup whilst using Xv
Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
status: Unknown → Incomplete
Mark Pointing (mp035) wrote :

I have also experienced this bug on 2 different machines running 13.04. The first is an eeepc with an Atom 1.66GHz N455 CPU, and the second is a macbook pro 8,1 with an Intel(R) Core(TM) i5-2415M CPU @ 2.30GHz.

The bug is most reproducible when playing videos, but on rare occasions (3) it has occurred while browsing the web.

It caused me to downgrade to 12.04 on the eeepc.

Same symptoms (video lockup, except cursor, unable to switch to VT) however in my experience, the soundtrack on video playback kept running. A hard reset was required to use the system.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
Chris Wilson (ickle) wrote :

@Mark please file a separate bug as your hardware is completely different and cannot suffer the same root cause as this bug.

Leon Winter (lwi) wrote :

Regarding the backends, I used gl/x11 for the last few days and did not encounter the lockup. At the same time, the machine locks up almost instantly (after 5 seconds) when using -vo xv. Sorry that this does not confirm your suspicion, but maybe my gl backend does not trigger the code path you mentioned.
Maybe unrelated, but I noticed my system was running very hot although the CPU was idle and powertop reported the CPU running in its lowest frequency mode (800 MHz); the system temperature almost reached 100 degrees Celsius. After a reboot the system now runs at about 40 degrees, which I consider normal. Unfortunately the system is now unable to generate output via the external display connector VGA1. I hope this is not hardware damage and will try to generate output using a live CD and another computer. Also probably unrelated: since 13.04 I experience suspend/resume bugs, where the system crashes on resume. All these issues together are very annoying and I am considering downgrading as well.

Leon Winter (lwi) wrote :

Regarding my last entry, fortunately my system now "recovered" from the behaviour of not detecting external monitors.
Concerning the hangup issue, I want to add that HTML5/Flash videos also do not seem to trigger the hangup; however, I am not aware of the subsystem used for this video playback (the browser is Chromium).

Mark Pointing (mp035) wrote :

@Chris,
What makes you say the hardware is different? Both machines use the same intel video driver.

Chris Wilson (ickle) wrote :

@Mark, you have a gen3 device which has no known issues with Xv, this bug report is about a gen6 device which still has instability when using rc6 (new hardware feature on gen6).

Mark Pointing (mp035) wrote :

Understood, the Atom N455 is different. What about the macbook i5?

Chris Wilson (ickle) wrote :

Could be gen5, gen6 or gen7 depending upon the model.

Can you please try with this patch: https://patchwork.kernel.org/patch/2707341/ as it claims to fix some instability with rc6 on SandyBridge?

I applied (apt-get source, patch -p1, dpkg-buildpackage -b, dpkg -i) the patch against the current Ubuntu stock kernel (3.8.0-25-generic), but to no avail: mplayer with xv still instantly results in the lockup state. Because of that I have configured mplayer to use the gl backend, which has been running flawlessly for the past few weeks.

*** Bug 67856 has been marked as a duplicate of this bug. ***

Timeout.

Michael, are you still seeing the issue with later kernels? There seems to have been some back and forth with the patch referenced by Chris in comment #6 - please try 3.13-rc1 or later.

Nothing has changed. SNB can still randomly hard lock with rc6 and vsync.

Leon Winter, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p xserver-xorg-video-intel REPLACE-WITH-BUG-NUMBER

Please note, given that the information from the prior release is already available, doing this on a release prior to the development one would not be helpful.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete

Could you please retest with latest drm-intel-nightly?

Hm, another option might be to switch to timeout mode on SNB. With a long enough timeout, we could apply the "emit a primitive" workaround everytime we come out of rc6 and hopefully make things more stable...

I just saw today that there is a recommendation to toggle PSMI_CTL around WAIT_FOR_EVENT in the SNB bspec. That is probably worth trying...

Implemented the bspec recommendation:

commit d247cb7d0cdb73736f31612157e47f166af68ba0
Author: Chris Wilson <email address hidden>
Date: Mon Dec 8 10:07:25 2014 +0000

    sna/gen6: Poke PSMI control around WAIT_FOR_EVENT to prevent idling

    The bspec recommends preventing the hardware from going to sleep around
    a WAIT_FOR_EVENT, and tells us to use disable sleep bit in PSMI control
    to accomplish this.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=62373
    Signed-off-by: Chris Wilson <email address hidden>

It's worth another go...

*** Bug 87163 has been marked as a duplicate of this bug. ***

(In reply to Chris Wilson from comment #13)
> Implemented the bspec recommendation:
>
> commit d247cb7d0cdb73736f31612157e47f166af68ba0
> Author: Chris Wilson <email address hidden>
> Date: Mon Dec 8 10:07:25 2014 +0000
>
> sna/gen6: Poke PSMI control around WAIT_FOR_EVENT to prevent idling
>
> The bspec recommends preventing the hardware from going to sleep around
> a WAIT_FOR_EVENT, and tells us to use disable sleep bit in PSMI control
> to accomplish this.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=62373
> Signed-off-by: Chris Wilson <email address hidden>
>
> It's worth another go...

OK so not my bug but with a baytrail J1900N this does not prevent a vaapi hard lock (probably when de-interlacing h/w or s/w so fps = refresh) for me.

I only recently started seeing locks - turns out that dri3 was OK for me, but of course it then got disabled by default and I started getting locks.

This is with kodi - it seems they recommend 910 as the last stable driver - and I can lock with 311.

This was tested with head on this commit, kernel nightly and mesa about a week old.

(In reply to Andy Furniss from comment #15)

> I only recently started seeing locks - turns out that dri3 was OK for me,
> but of course it then got disabled by default and I started getting locks.

It seems I was a bit hasty in calling dri3 OK - I can lock if I try long enough, just that it takes about 20x longer than dri2.

(In reply to Andy Furniss from comment #16)
> (In reply to Andy Furniss from comment #15)
>
> > I only recently started seeing locks - turns out that dri3 was OK for me,
> > but of course it then got disabled by default and I started getting locks.
>
> It seems I was a bit hasty in calling dri3 OK - I can lock if I try long
> enough, just that it takes about 20x longer than dri2.

Looks like the locks were a mesa issue which is now fixed.

I can't point to a commit, but the reason I suspect mesa is that I changed my test case to s/w decode no de-int fps < refresh and could lock/not lock depending on the level of gl output chosen in kodi.

dri2 is still < dri3 - neither lock but 2 glitches occasionally.

Anyway, I guess I am in the wrong bug, sorry for the noise.

(In reply to Andy Furniss from comment #17)
> (In reply to Andy Furniss from comment #16)
> > (In reply to Andy Furniss from comment #15)
> >
> > > I only recently started seeing locks - turns out that dri3 was OK for me,
> > > but of course it then got disabled by default and I started getting locks.
> >
> > It seems I was a bit hasty in calling dri3 OK - I can lock if I try long
> > enough, just that it takes about 20x longer than dri2.
>
> Looks like the locks were a mesa issue which is now fixed.

Just for completeness - it wasn't mesa, it's just hard to call a setup stable or not when runs of 12 hrs are possible even on an unstable one.

Current thinking is Kernel > 3.16.x is unstable on baytrail, kodi developers also saying this, so not just me.

*** Bug 87163 has been marked as a duplicate of this bug. ***

Waiting for feedback in order to change the status.

Changed in xserver-xorg-video-intel:
status: Incomplete → Won't Fix