System will periodically lockup with [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

Bug #1402331 reported by chris pollock
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Incomplete
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have noticed the lockups periodically since 09/13/2014. I can't remember which kernel version was in use then; possibly 3.13.0-35-generic or -36. They have continued through versions -39 and -40, and now -43. The most recent one was today:

Dec 13 14:40:57 localhost kernel: [154858.820009] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

Sometimes when it locks up I get a cursor on a black screen and have to press the power button to restart. Other times it locks up while I'm in an application such as Firefox or Evolution; in that case I can usually press CTRL-ALT-F1, log in, and run 'sudo reboot' to restart the system.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-43-generic 3.13.0-43.72
ProcVersionSignature: Ubuntu 3.13.0-43.72-generic 3.13.11.11
Uname: Linux 3.13.0-43-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.6
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: chris 2445 F.... pulseaudio
CurrentDesktop: GNOME
Date: Sat Dec 13 20:22:51 2014
HibernationDevice: RESUME=UUID=bb329dc0-0642-4b6a-876a-12c2f02fb7f6
InstallationDate: Installed on 2014-10-24 (50 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: Dell Inc. OptiPlex 780
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-43-generic root=UUID=4254a7e9-429b-4f53-a08c-ae7ff839b98f ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-43-generic N/A
 linux-backports-modules-3.13.0-43-generic N/A
 linux-firmware 1.127.10
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/13/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A03
dmi.board.name: 0C27VV
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.asset.tag: LE0006476
dmi.chassis.type: 6
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA03:bd02/13/2010:svnDellInc.:pnOptiPlex780:pvr:rvnDellInc.:rn0C27VV:rvrA01:cvnDellInc.:ct6:cvr:
dmi.product.name: OptiPlex 780
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
penalvch (penalvch)
tags: added: bios-outdated-a15
Changed in linux (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
chris pollock (cpollock) wrote :

Since I've never updated a BIOS before, I'm in the process of reading the links you sent. As soon as I've made the update I'll run the commands you requested and add the output to the report.

chris pollock (cpollock) wrote :

After doing more reading and asking questions in the #ubuntu IRC channel, it was suggested that I update to a newer kernel version instead. So yesterday I updated to:

Linux localhost 3.14.0-031400-generic #201403310035 SMP Mon Mar 31 04:36:23 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

If this solves the problem I will post a comment.

chris pollock (cpollock) wrote :

It seems that this has been fixed by updating to a newer kernel. It's been over five days without any issues.

chris pollock (cpollock) wrote :

Apparently this bug is not specific to Ubuntu but affects other distributions as well; see https://bugs.freedesktop.org/show_bug.cgi?id=75394. This report could possibly be marked as a duplicate of that bug. One further note: I am still experiencing the lockups with the error noted above. In case it makes any difference, I've added drm.debug=0x06 to my /etc/default/grub file and rebooted. I will attach the output of dmesg to this bug report for reference.
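
For reference, drm.debug is a bitmask of debug categories. Below is a minimal sketch for decoding a value, assuming the four category bits that kernels of this era define (core 0x01, driver 0x02, KMS 0x04, PRIME 0x08). The helper name decode_drm_debug is hypothetical, and any higher bits are only reported, not named.

```shell
# Hypothetical helper: list the debug categories selected by a drm.debug value.
# Assumption: only the four low bits are named here; higher bits are reported as "other".
decode_drm_debug() (
    value=$(( $1 ))
    names=""
    [ $(( value & 0x01 )) -ne 0 ] && names="$names core"
    [ $(( value & 0x02 )) -ne 0 ] && names="$names driver"
    [ $(( value & 0x04 )) -ne 0 ] && names="$names kms"
    [ $(( value & 0x08 )) -ne 0 ] && names="$names prime"
    # Collect whatever is set above the four named bits.
    rest=$(( value & ~0x0f ))
    [ "$rest" -ne 0 ] && names="$names $(printf 'other(0x%x)' "$rest")"
    # Strip the leading space before printing.
    printf '%s\n' "${names# }"
)
```

Under these assumptions, 0x06 selects the driver and KMS categories.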

Christian Egger (ce4) wrote :

I own a Dell Optiplex 760 from 2009 and I have the exact same problem on Debian Jessie (running kernel 3.16.0-4-amd64).

I have been running BIOS version A04 (from 2009) and just upgraded to the latest A16 BIOS version.

I can keep you informed if this fixes the ugly bug.

FYI, I did the following to upgrade the BIOS, took me around 5mins:

- copied a USB FreeDOS 1.1 image onto an empty USB drive. I used this one (don't forget to bunzip2):
  http://derek.chezmarcotte.ca/wp-content/uploads/2012/01/FreeDOS-1.1-USB-Boot.img.bz2
- copied the latest O760-A15.exe BIOS update file for my Dell 760 onto the thumbdrive
- booted into FreeDOS and ran the bios update
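
The image-copy step can be sketched as below. This is only a sketch: the helper name write_boot_image and the device name /dev/sdX are placeholders, and writing to the wrong device destroys its contents, so check the device name first.

```shell
# Hypothetical helper: decompress a .bz2 disk image and write it raw to a target.
# For a real USB stick the target is the whole device (e.g. /dev/sdX, a
# placeholder), the stick must be unmounted, and the script must run as root.
# Everything on the target is overwritten.
write_boot_image() (
    image_bz2="$1"
    target="$2"
    if [ ! -f "$image_bz2" ]; then
        echo "no such image: $image_bz2" >&2
        return 1
    fi
    # Stream the decompressed image to the target and flush it before returning.
    bunzip2 -c "$image_bz2" | dd of="$target" bs=4M conv=fsync
)

# Example (commented out on purpose; confirm the device name with lsblk first):
# write_boot_image FreeDOS-1.1-USB-Boot.img.bz2 /dev/sdX
```

After the image is written, the drive can be mounted again to copy the BIOS update file onto it.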

chris pollock (cpollock) wrote : Re: [Bug 1402331] Re: System will periodically lockup with [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

Johnny, I need to follow up with you on the above. I've now got
'command.com', 'kernel.sys', and '0780-A15.exe' on an empty thumb drive.
I assume the next step is to boot into it? FYI as I was looking for an
empty thumb drive in my collection this morning the thing locked up
again. Any help would be appreciated as I've never updated a BIOS before
in all my years.

Thanks
Chris

--
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
08:02:06 up 9 min, 1 user, load average: 0.33, 0.64, 0.46
Ubuntu 14.04.1 LTS, kernel 3.13.0-44-generic

Christian Egger (ce4) wrote :

Hi Chris,

I just came back to bring bad news: My machine just crashed again (running with the newly flashed BIOS A16).

I assume you got the copy process right (probably also using the 'dd' command); otherwise you wouldn't have command.com, kernel.sys and the BIOS update file "0780-A15.exe" on it. Next step would be to boot into this newly created FreeDOS environment, i.e.:

- shutdown the PC
- insert the thumb drive
- be sure that your system has "boot to USB" enabled
   otherwise hit F12 (or so) to "select device to boot from"
- set the current date (FreeDOS will prompt you)
- start the BIOS update by typing "0780-A15.exe"
- follow the steps on screen...
- reboot

cheers,

John

chris pollock (cpollock) wrote :

Thanks John, been so long since I ran a DOS program I forgot how to
execute a file :( had to Google it. Would have been ok if I'd made the
first character an 'O' instead of putting a '0' in. Finally got it right
though. Since I already had the A15 update I installed that one. That's
screwed up that you already had a lockup, check comment #4 where it was
suggested that I update the BIOS. Since both of us have already done
that you may want to go to the bug report and leave a comment that 'well
that didn't work, what next' but I guess be nicer. I'm also working on
this bug report - https://bugs.freedesktop.org/show_bug.cgi?id=75394 you
may want to get involved there also. I'm going to make a note of when I
upgraded the BIOS and follow up with comments on both bugs when (note I
didn't say if) it locks up/crashes again. I'd probably run 'sudo lshw'
and attach the output to your comment to show that you have in fact
updated the BIOS; that's what I'm going to do. Since I'm on A15 and
you're on A16 and just had a lockup, that should tell them something.

This is getting absurd, something needs to be done. If you notice the
first comment in the link above you'll see that person is running 'Arch
Linux' and a completely different type of hardware so this bug is
affecting a lot of systems. I've already upgraded to a 3.14* kernel from
an Ubuntu PPA, which did not help, so I went back to the 3.13* version.

Hopefully someone, someday will get this fixed.

Chris

--
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
08:40:03 up 6 min, 2 users, load average: 1.14, 1.32, 0.71
Ubuntu 14.04.1 LTS, kernel 3.13.0-44-generic

chris pollock (cpollock) wrote :

I have now upgraded my BIOS to version A15, the output of sudo lshw is attached.

penalvch (penalvch) wrote :

chris pollock, now that you have updated your BIOS, have you experienced a lockup?

tags: added: latest-bios-a15
removed: bios-outdated-a15
chris pollock (cpollock) wrote :

Not yet, however it's only been a bit over two days since the update and this is a very intermittent issue. I will most definitely post an update if/when the lockup happens. If not and I get to where I'm sure the system is stable I'll post. I've also added the following to my /etc/default/grub configuration file:

GRUB_CMDLINE_LINUX_DEFAULT="drm.debug=0x66"

and will post the output of 'dmesg' if another lockup happens.
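
A minimal sketch of making this kind of change with a script instead of by hand. The helper name add_kernel_param is hypothetical, it assumes GNU sed (for -i), and unlike the line above it appends to the existing value rather than replacing it.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, unless that parameter is already on the line.
# Assumes GNU sed and a double-quoted value, as in Ubuntu's /etc/default/grub.
add_kernel_param() (
    file="$1"
    param="$2"
    # Leave the file alone if the parameter is already present.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 ${param}\"|" "$file"
)

# Example (run as root, then regenerate the boot menu and reboot):
# add_kernel_param /etc/default/grub drm.debug=0x66
# update-grub
```

The change only takes effect after update-grub has been run and the system rebooted.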

Christian Egger (ce4) wrote :

I have experienced another lockup since updating the BIOS to the latest version.

However, I'm using Debian Jessie (3.16 kernel) and a slightly different Dell model (Optiplex 760 instead of an Optiplex 780).

chris pollock (cpollock) wrote :

Well, it lasted a bit over two days this time; I just experienced another lockup:
Jan 17 20:14:43 localhost kernel: [214866.808010] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
This is after updating to the A15 BIOS on my Dell 780.

chris pollock (cpollock) wrote :

The output of dmesg is also attached.

chris pollock (cpollock) wrote :

I neglected to add this, and I don't know whether it makes a difference, but the lockup occurred while I was scrolling through posts on Facebook in Firefox.

Christian Egger (ce4) wrote :

@Chris: You can try to untick "use hardware acceleration, if available" and "smooth scrolling" in Firefox' settings (navigate to Advanced tab=>General sub tab=>Browsing). At least for me, this reduced the frequency of lockups.

chris pollock (cpollock) wrote :

Done, I'm curious as to why on Launchpad this bug is low importance and
status still shows incomplete and assigned to no one while here at
freedesktop.org https://bugs.freedesktop.org/show_bug.cgi?id=75394 it's
marked as high importance and critical. I believe that since we've done
what Chris Penalver requested by updating our BIOS, which BTW did not
fix the issue, I'll be posting my info to the bug report at
freedesktop.org above.

--
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
07:14:04 up 10:55, 1 user, load average: 1.58, 1.35, 0.90
Ubuntu 14.04.1 LTS, kernel 3.13.0-44-generic

penalvch (penalvch)
tags: added: regression-release
tags: added: regression-update
removed: regression-release
penalvch (penalvch) wrote :

chris pollock, just to clarify, reporters who have an outdated BIOS can find the Importance Low as the BIOS should have already been updated before reporting it on Launchpad. Now that it is updated, a different Importance now applies. As well, freedesktop.org Severity/Priority is not restricted, so anyone can adjust it to whatever they want, whether or not it applies, so it is largely meaningless. Also, a bug report being assigned in Launchpad is up to a developer assigning themselves, versus being forced/auto assigned.

Despite this, could you please test the latest upstream kernel available from the very top line at the top of the page (the release names are irrelevant for testing, and please do not test the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue.

If the test did not allow you to test to the issue (ex. you couldn't boot into the OS) please make a comment in your report about this, and continue to test the next most recent kernel version until you can test to the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested exactly shown as:
kernel-fixed-upstream-3.19-rc4

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER
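
A minimal sketch of deriving these tags from the output of 'uname -r'. The helper names mainline_version and upstream_tags are hypothetical, and the sketch assumes the mainline build naming pattern (for example 3.19.0-031900rc5-generic); distribution kernels are rejected.

```shell
# Hypothetical helper: turn a mainline kernel release string into the short
# version used in the tags, e.g. 3.19.0-031900rc5-generic -> 3.19-rc5.
mainline_version() (
    release="$1"
    # Accept only the assumed mainline naming pattern (six-digit build field).
    printf '%s\n' "$release" |
        grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-[0-9]{6}(rc[0-9]+)?-generic$' || return 1
    base="${release%%-*}"        # e.g. 3.19.0
    rest="${release#*-}"         # e.g. 031900rc5-generic
    rc="$(printf '%s\n' "$rest" | sed -n 's/^[0-9]*\(rc[0-9]*\)-generic$/\1/p')"
    version="${base%.0}"         # drop a trailing ".0" patch level
    if [ -n "$rc" ]; then
        version="$version-$rc"
    fi
    printf '%s\n' "$version"
)

# Hypothetical helper: print both tags for an outcome of "fixed" or "exists".
upstream_tags() (
    outcome="$1"
    version="$(mainline_version "$2")" || return 1
    case "$outcome" in
        fixed)  prefix="kernel-fixed-upstream" ;;
        exists) prefix="kernel-bug-exists-upstream" ;;
        *)      return 1 ;;
    esac
    printf '%s %s-%s\n' "$prefix" "$prefix" "$version"
)

# Example: upstream_tags exists "$(uname -r)"
```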

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
importance: Low → Medium
description: updated
chris pollock (cpollock) wrote :

Will do Chris, so on the page you linked the latest non RC candidate is 3.18.3-vivid or do you want me to attempt to install 3.19-rc5-vivid?

penalvch (penalvch) wrote :

chris pollock, 3.19-rc5.

chris pollock (cpollock) wrote :

Thanks Chris. It booted fine and I saw no issues on install or during boot. Output of uname -a:

Linux localhost 3.19.0-031900rc5-generic #201501180935 SMP Sun Jan 18 09:36:49 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I've been running this since about 11:40 on 18 Jan. I'll post if I get any lockups.

chris pollock (cpollock) wrote :

I have no idea whether this means anything, but I will post it in case it does. A small script sends me an hourly snippet of my syslog, and when things are going on I have a habit of scanning through it for anything that looks abnormal to me. I happened to take a close look at one of today's snippets and noticed:

kernel: [87987.468212] [drm:intel_crtc_cursor_set_obj] cursor off
kernel: [87987.468219] [drm:g4x_check_srwm] SR watermark: display plane 92, cursor 2
kernel: [87987.468221] [drm:g4x_check_srwm] display watermark is too large(92/63), disabling
kernel: [87987.468223] [drm:intel_set_memory_cxsr] memory self-refresh is disabled
kernel: [87987.468225] [drm:g4x_update_wm] Setting FIFO watermarks - A: plane=40, cursor=2, B: plane=2, cursor=2, SR: plane=0, cursor=0
kernel: [87989.117248] [drm:i915_gem_open]

This has shown up multiple times since the 14th of Jan. I'm not sure exactly how many times since I updated to the 3.19 kernel, but it is in 14 hourly log snippets. As I said, I don't know whether this means anything; it probably doesn't. It has only been a bit over 28 hours, but I haven't experienced a lockup yet.
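
Counting how often a message shows up can be scripted rather than done by eye. Below is a minimal sketch, assuming log lines in the usual syslog format that start with month and day (for example "Feb 19 07:19:59 localhost kernel: ..."); the helper name count_by_day is hypothetical.

```shell
# Hypothetical helper: count lines containing a fixed string, grouped by the
# month and day at the start of each syslog line.
count_by_day() (
    pattern="$1"
    shift
    # -h drops file names, -F treats the pattern as a literal string.
    grep -hF -- "$pattern" "$@" |
        awk '{ count[$1 " " $2]++ } END { for (day in count) print day, count[day] }' |
        sort
)

# Example:
# count_by_day 'display watermark is too large' /var/log/syslog /var/log/syslog.1
```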

chris pollock (cpollock) wrote :

System locked up at exactly 5pm on 19 Jan. Could not find [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle anywhere in today's log. Output of dmesg is attached. Still running 3.19.0-031900rc5-generic kernel.

penalvch (penalvch)
tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-3.19-rc5 needs-bisect
chris pollock (cpollock) wrote :

Chris, first question of the morning, when I experienced the lockup last night the 'Hangcheck .......' error was not anywhere in my syslog, does this mean anything? Did my dmesg attachment show anything? Secondly, as I'm reading through the 'bisect' instructions you sent it mentions building a kernel from scratch, I take it that that is a requirement to do this? If so, this may take me a few days as 1) I've never built a kernel from scratch before and 2) medications I take prevent me from thinking very clearly until late in the day. However, if in fact building a kernel is required, I'll throw myself into it and see what happens.

penalvch (penalvch) wrote :

chris pollock:
>"when I experienced the lockup last night the 'Hangcheck .......' error was not anywhere in my syslog, does this mean anything?"

If one does not attach a log from /var/log , or captured through alternate methods, that contains either a kernel call trace, xorg backtrace, or at least the hangcheck originally reported, then it's largely useless unfortunately.

>"Did my dmesg attachment show anything?"

Unfortunately, it did not show any of the three things above. It is not terribly uncommon for nothing of value to be printed in any of the logs during a complete system lock. You may have better success capturing helpful information utilizing the capture methods noted in https://help.ubuntu.com/community/DebuggingSystemCrash .
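
A quick way to check a log before attaching it is sketched below. The helper name crash_evidence is hypothetical, and the search strings are assumptions about how the three items typically appear ("Call Trace:" in kernel logs, "Backtrace:" in Xorg logs, and the hangcheck message as reported in this bug).

```shell
# Hypothetical helper: report which of the given logs contain a hangcheck
# message, a kernel call trace, or an Xorg backtrace.
# Returns non-zero if none of the logs contain any of them.
crash_evidence() (
    result=1
    for log in "$@"; do
        [ -r "$log" ] || continue
        markers=""
        grep -q 'Hangcheck timer elapsed' "$log" && markers="$markers hangcheck"
        grep -q 'Call Trace:' "$log" && markers="$markers call-trace"
        grep -q 'Backtrace:' "$log" && markers="$markers xorg-backtrace"
        if [ -n "$markers" ]; then
            printf '%s:%s\n' "$log" "$markers"
            result=0
        fi
    done
    return "$result"
)

# Example:
# crash_evidence /var/log/kern.log /var/log/syslog /var/log/Xorg.0.log
```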

>"it mentions building a kernel from scratch, I take it that that is a requirement to do this?"

Yes.

chris pollock (cpollock) wrote :

Thank you Chris, I'll start working on this tomorrow.

chris pollock (cpollock) wrote :

Although I'm still working on digesting the 'kernel bisection' I have noticed several things. Firstly I've had four 'lockups' since 17 Jan, none of which contain the [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle issue. Secondly since 14 Jan when I went back from the 3.14* kernel build to 3.13.0-44-generic I've been seeing quite often in my syslog:

kernel: [ 1326.412487] [drm:intel_crtc_cursor_set_obj] cursor off
kernel: [ 1326.412493] [drm:g4x_check_srwm] SR watermark: display plane 92, cursor 2
kernel: [ 1326.412495] [drm:g4x_check_srwm] display watermark is too large(92/63), disabling
kernel: [ 1326.412497] [drm:intel_set_memory_cxsr] memory self-refresh is disabled
kernel: [ 1326.412499] [drm:g4x_update_wm] Setting FIFO watermarks - A: plane=40, cursor=2, B: plane=2, cursor=2, SR: plane=0, cursor=0

I might add that though the system 'appears' to be 'locked up' it is only the video that is locked, however the mouse cursor will continue to move, if that makes any sense. The system continues to fetch and process mail with fetchmail and procmail, SpamAssassin and ClamAv continue to run as well as scripts I have to report spam and so forth. Whether this change will make any difference in this bug report I'm not sure however I will continue to work on the kernel bisection as requested.

chris pollock (cpollock) wrote :

Since the 17th of Jan I've had seven of the 'lockups', none of which show [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle. In fact, AFAICT there is nothing in my syslog that shows a reason why. How should I proceed now?

chris pollock (cpollock) wrote :

Christopher, I worked on the bisection tonight however when I ran the command earlier this evening and built the kernel it built the latest *-45 instead of *-35. I just tried the command I saw in:

Build Environment

If you've not built a kernel on your system before, there are some packages needed before you can successfully build. You can get these installed with:

    sudo apt-get build-dep linux-image-$(uname -r)

I ran the command again and got the below, where am I screwing up? I'm really wanting to get this done as I have several other projects going also and the 'freezes' are getting more and more frequent with the *-45 kernel.

chris@localhost:~$ sudo apt-get build-dep linux-image-3.13.0-35-generic
Reading package lists... Done
Building dependency tree
Reading state information... Done
Picking 'linux' as source package instead of 'linux-image-3.13.0-35-generic'
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
15 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] n
Abort.

chris pollock (cpollock) wrote :

On Wed, 2015-02-04 at 11:34 +0000, Christopher M. Penalver wrote:
> chris pollock, it would be best to switch over to bisecting the mainline
> kernel following the article.
>
Working on it, are these the three files I need to work with? And the
same for 3.13.0-36-generic which is where I believe the issue started. I
noticed when I installed the 3.19 version I had a file called
(linux-headers-3.19.0-031900rc5_3.19.0-031900rc5.201501180935_all.deb) I
don't see a _all.deb file in the 3.13.0-35 branch.

linux-headers-3.13.0-35-generic_3.13.0-35.62_amd64.deb
linux-image-3.13.0-35-generic_3.13.0-35.62_amd64.deb
linux-image-extra-3.13.0-35-generic_3.13.0-35.62_amd64.deb

--
Chris
KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
20:15:11 up 9:32, 2 users, load average: 0.23, 0.36, 0.39
Ubuntu 14.04.1 LTS, kernel 3.13.0-45-generic

chris pollock (cpollock) wrote :

I just had another 'lockup' after about 11 days, with the same error message:

Feb 13 19:05:22 localhost kernel: [807775.808019] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

As has been the norm lately, I see a black screen with a movable mouse cursor. CTRL-ALT-F1 will not take me to a log-in screen, so I have to manually shut down the system with the power button. Doesn't anyone have an idea as to what to do to troubleshoot this problem? This is on my Ubuntu 14.04.1 LTS system running kernel 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux. I'm getting nowhere trying to do this kernel bisection. Am I the only Ubuntu user in the whole world who is experiencing this problem? I definitely want to help get this fixed; however, I'm just not getting anywhere.

chris pollock (cpollock) wrote :

I just experienced another 'lockup' at 19:06:06 however this time as sometimes in the past the 'Hangcheck....' error was not present in my syslog. All that was noted was this:

Feb 14 19:04:53 localhost kernel: [10683.755797] systemd-hostnamed[9328]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
Feb 14 19:04:53 localhost dbus[382]: [system] Successfully activated service 'org.freedesktop.hostname1'
Feb 14 19:09:48 localhost kernel: [10978.591143] [drm:intel_crtc_cursor_set], cursor off

chris pollock (cpollock) wrote :

Another lockup this afternoon: black screen, mouse cursor present and movable. When I moved it around the (black) screen, the hand pointer would appear as if I were hovering over something; of course I have no idea what it was. I could not CTRL>ALT>F* to a terminal log-in.

Feb 17 17:18:01 localhost kernel: [252433.820010] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

chris pollock (cpollock) wrote :

I forgot to add the kernel version I'm running - Linux localhost 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

chris pollock (cpollock) wrote :

On 18 Feb, 2015 I installed kernel 3.19.0-031900-generic #201502091451 SMP Mon Feb 9 14:52:52 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux to see if the above bug had been fixed. This morning at 07:18:55, 19 Feb 2015 the system froze while clicking on a link to open in Firefox. The above 'Hangcheck' error did not appear in my syslog, below is what I see during that time period. Note: I was able to CTRL>ALT>F1 into a terminal and logged in. Attempted to run 'startx' however it did not run (possibly I did it wrong) but I will attach the Xorg.1.log from this as well as dmesg from the boot. The below are the lines from my syslog when this lockup happened:

Feb 19 07:18:54 localhost kernel: [49604.616334] [drm:i915_gem_open]
Feb 19 07:19:59 localhost kernel: [49669.989779] [drm:intel_crtc_cursor_set_obj] cursor off
Feb 19 07:19:59 localhost kernel: [49669.989786] [drm:g4x_check_srwm] SR watermark: display plane 92, cursor 2
Feb 19 07:19:59 localhost kernel: [49669.989789] [drm:g4x_check_srwm] display watermark is too large(92/63), disabling
Feb 19 07:19:59 localhost kernel: [49669.989791] [drm:intel_set_memory_cxsr] memory self-refresh is disabled
Feb 19 07:19:59 localhost kernel: [49669.989794] [drm:g4x_update_wm] Setting FIFO watermarks - A: plane=40, cursor=2, B: plane=2, cursor=2, SR: plane=0, cursor=0
Feb 19 07:20:00 localhost kernel: [49670.747858] [drm:intel_crtc_set_config] [CRTC:9] [FB:47] #connectors=1 (x y) (0 0)
Feb 19 07:20:00 localhost kernel: [49670.747862] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:9], mode_changed=0, fb_changed=1
Feb 19 07:20:00 localhost kernel: [49670.747865] [drm:intel_modeset_stage_output_state] [CONNECTOR:13:VGA-1] to [CRTC:9]
Feb 19 07:20:00 localhost kernel: [49670.747867] [drm:intel_modeset_affected_pipes] set mode pipe masks: modeset: 1, prepare: 1, disable: 0
Feb 19 07:20:00 localhost kernel: [49670.747870] [drm:connected_sink_compute_bpp] [CONNECTOR:13:VGA-1] checking for sink bpp constrains
Feb 19 07:20:00 localhost kernel: [49670.747872] [drm:intel_modeset_pipe_config] plane bpp: 24, pipe bpp: 24, dithering: 0
Feb 19 07:20:00 localhost kernel: [49670.747874] [drm:intel_dump_pipe_config] [CRTC:9][modeset] config for pipe A
Feb 19 07:20:00 localhost kernel: [49670.747875] [drm:intel_dump_pipe_config] cpu_transcoder: A
Feb 19 07:20:00 localhost kernel: [49670.747876] [drm:intel_dump_pipe_config] pipe bpp: 24, dithering: 0
Feb 19 07:20:00 localhost kernel: [49670.747878] [drm:intel_dump_pipe_config] fdi/pch: 0, lanes: 0, gmch_m: 0, gmch_n: 0, link_m: 0, link_n: 0, tu: 0
Feb 19 07:20:00 localhost kernel: [49670.747879] [drm:intel_dump_pipe_config] dp: 0, gmch_m: 0, gmch_n: 0, link_m: 0, link_n: 0, tu: 0
Feb 19 07:20:00 localhost kernel: [49670.747881] [drm:intel_dump_pipe_config] dp: 0, gmch_m2: 0, gmch_n2: 0, link_m2: 0, link_n2: 0, tu2: 0
Feb 19 07:20:00 localhost kernel: [49670.747882] [drm:intel_dump_pipe_config] audio: 0, infoframes: 0
Feb 19 07:20:00 localhost kernel: [49670.747882] [drm:intel_dump_pipe_config] requested mode:
Feb 19 07:20:00 localhost kernel: [49670.747885] [drm:drm_mode_debug_printmodeline] Modeline 0:"1680x1050" 60 119000 1680 1728 1760 1840 1050 1053 1059 ...


chris pollock (cpollock) wrote :

Again another lockup this evening at 19:16:40, 20 Feb 2015. And again the 'Hangcheck' error did not make itself known, what my syslog shows for this time period before the CTRL>ALT>F1 is:

Feb 20 19:17:01 localhost CRON[29189]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 20 19:17:28 localhost kernel: [129187.759473] [drm:intel_crtc_cursor_set_obj] cursor off
Feb 20 19:17:28 localhost kernel: [129187.759479] [drm:g4x_check_srwm] SR watermark: display plane 92, cursor 2
Feb 20 19:17:28 localhost kernel: [129187.759481] [drm:g4x_check_srwm] display watermark is too large(92/63), disabling
Feb 20 19:17:28 localhost kernel: [129187.759483] [drm:intel_set_memory_cxsr] memory self-refresh is disabled
Feb 20 19:17:28 localhost kernel: [129187.759485] [drm:g4x_update_wm] Setting FIFO watermarks - A: plane=40, cursor=2, B: plane=2, cursor=2, SR: plane=0, cursor=0
Feb 20 19:17:29 localhost kernel: [129189.173796] [drm:intel_crtc_set_config] [CRTC:9] [FB:47] #connectors=1 (x y) (0 0)
Feb 20 19:17:29 localhost kernel: [129189.173800] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:9], mode_changed=0, fb_changed=1
Feb 20 19:17:29 localhost kernel: [129189.173803] [drm:intel_modeset_stage_output_state] [CONNECTOR:13:VGA-1] to [CRTC:9]
Feb 20 19:17:29 localhost kernel: [129189.173805] [drm:intel_modeset_affected_pipes] set mode pipe masks: modeset: 1, prepare: 1, disable: 0
Feb 20 19:17:29 localhost kernel: [129189.173808] [drm:connected_sink_compute_bpp] [CONNECTOR:13:VGA-1] checking for sink bpp constrains
Feb 20 19:17:29 localhost kernel: [129189.173810] [drm:intel_modeset_pipe_config] plane bpp: 24, pipe bpp: 24, dithering: 0
Feb 20 19:17:29 localhost kernel: [129189.173812] [drm:intel_dump_pipe_config] [CRTC:9][modeset] config for pipe A
Feb 20 19:17:29 localhost kernel: [129189.173813] [drm:intel_dump_pipe_config] cpu_transcoder: A
Feb 20 19:17:29 localhost kernel: [129189.173814] [drm:intel_dump_pipe_config] pipe bpp: 24, dithering: 0
Feb 20 19:17:29 localhost kernel: [129189.173816] [drm:intel_dump_pipe_config] fdi/pch: 0, lanes: 0, gmch_m: 0, gmch_n: 0, link_m: 0, link_n: 0, tu: 0
Feb 20 19:17:29 localhost kernel: [129189.173817] [drm:intel_dump_pipe_config] dp: 0, gmch_m: 0, gmch_n: 0, link_m: 0, link_n: 0, tu: 0
Feb 20 19:17:29 localhost kernel: [129189.173819] [drm:intel_dump_pipe_config] dp: 0, gmch_m2: 0, gmch_n2: 0, link_m2: 0, link_n2: 0, tu2: 0
Feb 20 19:17:29 localhost kernel: [129189.173820] [drm:intel_dump_pipe_config] audio: 0, infoframes: 0
Feb 20 19:17:29 localhost kernel: [129189.173820] [drm:intel_dump_pipe_config] requested mode:
Feb 20 19:17:29 localhost kernel: [129189.173823] [drm:drm_mode_debug_printmodeline] Modeline 0:"1680x1050" 60 119000 1680 1728 1760 1840 1050 1053 1059 1080 0x48 0xa
Feb 20 19:17:29 localhost kernel: [129189.173824] [drm:intel_dump_pipe_config] adjusted mode:
Feb 20 19:17:29 localhost kernel: [129189.173827] [drm:drm_mode_debug_printmodeline] Modeline 0:"1680x1050" 60 119000 1680 1728 1760 1840 1050 1053 1059 1080 0x48 0xa
Feb 20 19:17:29 localhost kernel: [129189.173829] [drm:intel_dump_crtc_timings] crtc timings: 119000 1680 1728 1760 1840 1050 1053 1059 1080, t...


chris pollock (cpollock) wrote :

This is now getting into the realm of the absurd. No matter what kernel version I run in the 3.13.* series the system will lockup after a day or so with the above error. Today it locked up at 9:09am CST - Mar 1 09:09:00 localhost kernel: [182499.820012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle. Again with the black screen and mouse cursor only. Cannot get to a log-in screen with CTRL-ALT-F1, CTRL-ALT-BKSPACE or with CTRL-ALT-ESC. And again, all background processes continue running as if nothing is happening. The current kernel version being run is - 3.13.0-46-generic #76-Ubuntu SMP Thu Feb 26 18:52:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
chris pollock (cpollock) wrote :

And it happened again less than 24 hours later, still with kernel 3.13.0-46-generic #76-Ubuntu SMP Thu Feb 26 18:52:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux. I was able to drop to a terminal with CTRL-ALT-F1 this time, as it locked up in Firefox.

Mar 2 07:35:38 localhost kernel: [71138.820009] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

Portion of kern.log when lockup happened:

Mar 2 07:35:38 localhost kernel: [71138.820009] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
Mar 2 07:36:23 localhost kernel: [71184.699897] [drm:intel_crtc_cursor_set], cursor off
Mar 2 07:36:24 localhost kernel: [71185.780637] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
Mar 2 07:36:24 localhost kernel: [71185.780641] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=1
Mar 2 07:36:24 localhost kernel: [71185.780644] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:VGA-1] to [CRTC:3]
Mar 2 07:36:24 localhost kernel: [71185.780656] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 6720
Mar 2 07:36:24 localhost kernel: [71185.788019] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
Mar 2 07:36:24 localhost kernel: [71185.788021] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:4], mode_changed=0, fb_changed=0
Mar 2 07:36:24 localhost kernel: [71185.788023] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:VGA-1] to [CRTC:3]
Mar 2 07:36:24 localhost kernel: [71185.788026] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
Mar 2 07:36:24 localhost kernel: [71185.788029] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=0
Mar 2 07:36:24 localhost kernel: [71185.788031] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:VGA-1] to [CRTC:3]
Mar 2 07:36:24 localhost kernel: [71185.788042] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
Mar 2 07:36:24 localhost kernel: [71185.788044] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=0
Mar 2 07:36:24 localhost kernel: [71185.788046] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:VGA-1] to [CRTC:3]

Revision history for this message
chris pollock (cpollock) wrote :

Attached is the kern.log showing four of the lockups and what happened before and after them

Revision history for this message
chris pollock (cpollock) wrote :

The lockup happened again today with the hangcheck error. It has also happened between my last post and now, but without the hangcheck error; all I would see is a black screen with a movable mouse cursor. Today I disabled X-Screensaver and it locked up while I was using Firefox, running kernel 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux.

Revision history for this message
chris pollock (cpollock) wrote :

As a test, on 9 March I booted to 3.13.0-031300-generic #201401192235 SMP Mon Jan 20 03:36:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux to see whether this kernel had escaped whatever seems to be causing the 'hangcheck' problem. Until yesterday I was having to reboot every day or so, either because of a kernel update or because I was testing issues with another bug. Yesterday X-Screensaver was running; I moved the mouse to get out of it and got a black screen with just the mouse cursor. I could not CTRL-ALT-F* to a terminal login and had to use the power button. This happened at 11:50, and I booted back to 3.13.0-031300-generic #201401192235 SMP Mon Jan 20 03:36:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux. Today at 3pm I went back to the computer and again all I saw was the black screen with a movable mouse cursor; again CTRL-ALT-F* did nothing and I had to reboot using the power button. This time, however, my syslog showed

Mar 17 13:57:59 localhost kernel: [93991.808012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

I'm still running the 3.13.0 kernel but, as a test, have not re-enabled X-Screensaver. Is there anything else I can check or do?

Revision history for this message
penalvch (penalvch) wrote :

chris pollock, testing the latest mainline kernel (4.0-rc4) would be helpful.

Revision history for this message
chris pollock (cpollock) wrote :

I've booted into 4.0.0-040000rc4-generic #201503152135 SMP Mon Mar 16 01:36:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
at 7:49 this evening. Also reactivated X-Screensaver to see how things go. Will advise if/when anything out of the ordinary happens.

Revision history for this message
chris pollock (cpollock) wrote :

Christopher, since booting into the kernel in comment #52 I have seen this probably over a thousand times in my hourly syslog snippet:

Mar 17 20:02:34 localhost kernel: [ 824.137147] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d66db3c0
Mar 17 20:02:34 localhost kernel: [ 824.145104] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d66db900
Mar 17 20:02:34 localhost kernel: [ 824.153100] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.161098] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad3c0
Mar 17 20:02:34 localhost kernel: [ 824.169102] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.177100] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad3c0
Mar 17 20:02:34 localhost kernel: [ 824.185100] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.193139] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.201099] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.209097] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.217098] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.225095] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.233092] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.241092] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.249099] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.257114] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.265091] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.273095] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.281094] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.289092] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.289565] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.297106] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad480
Mar 17 20:02:34 localhost kernel: [ 824.305090] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d90ad0c0
Mar 17 20:02:34 localhost kernel: [ 824.313092] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d...


Revision history for this message
penalvch (penalvch) wrote :

chris pollock, could you please provide a yes or no answer to the following question:
When using the 4.0-rc4 kernel, is the problem this bug is scoped to "System will periodically lockup" reproducible?

Revision history for this message
chris pollock (cpollock) wrote :

Yes, between 10pm on 18 Mar 2015 and 7am on 19 Mar 2015 the system locked up though I can't find any evidence of the 'hangcheck.....' error

Revision history for this message
chris pollock (cpollock) wrote :

Christopher, I made a grave error in my statement about not seeing the 'hangcheck' error this morning. I just did a body search in my syslog snippets and there it was

Mar 19 07:34:14 localhost kernel: [128723.820044] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... render ring idle

Now this is very odd, because at that time I was trying to get the system to drop to a terminal with CTRL-ALT-F1. When the terminal login did not appear after a while, I tried to log in blindly, and then the terminal suddenly appeared. I logged in and rebooted at 7:36am, so the above 'hangcheck' appeared either while I was trying to log in or while I was looking at the size of my syslog, which BTW is over 20MB. So regarding your question from last night: a definite yes, but as usual I have no idea what the reason is.
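
The 'body search' for the hangcheck message described above can also be done with a short script. This is only a sketch: the name `find_hangchecks` and the sample lines are illustrative, and the pattern is written to match both forms of the message quoted in this report (with and without the `[i915]` module suffix).

```python
import re

# Matches both "[drm:i915_hangcheck_elapsed]" and
# "[drm:i915_hangcheck_elapsed [i915]]", the two forms quoted in this report.
HANGCHECK = re.compile(
    r"\[drm:i915_hangcheck_elapsed(?: \[i915\])?\] \*ERROR\* Hangcheck timer elapsed"
)


def find_hangchecks(lines):
    """Return the log lines that contain the i915 hangcheck error."""
    return [line.rstrip("\n") for line in lines if HANGCHECK.search(line)]


# Sample lines copied from this report; the last one is an unrelated DRM message.
sample = [
    "Mar 17 13:57:59 localhost kernel: [93991.808012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle",
    "Mar 19 07:34:14 localhost kernel: [128723.820044] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... render ring idle",
    "Mar 2 07:36:23 localhost kernel: [71184.699897] [drm:intel_crtc_cursor_set], cursor off",
]

hits = find_hangchecks(sample)
for hit in hits:
    print(hit)
```

To scan a real log instead of the sample, an open file can be passed in, for example `find_hangchecks(open("/var/log/kern.log", errors="replace"))`.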

Revision history for this message
penalvch (penalvch) wrote :

chris pollock, just to clarify, did this issue occur, or not occur with kernel 3.13.0-35 (just yes it did, or no it did not)?

tags: added: kernel-bug-exists-upstream-4.0-rc4
removed: kernel-bug-exists-upstream-3.19-rc5
Revision history for this message
chris pollock (cpollock) wrote :

Yes

Revision history for this message
chris pollock (cpollock) wrote :

At approximately 7:20pm on 19 March 2015 I noticed that, again, all I saw on my monitor was a black screen with a movable mouse cursor. I did a CTRL-ALT-F1, left the room, and came back about 5 minutes later, when I was able to log in and run 'sudo reboot'. This time I made sure to do a message body check on the hourly syslog snippets for 'hangcheck', and there was none associated with this lockup. Now I do have another question associated with my comment #53. Since what I've shown in that comment is being written to my syslog, it is now over 32MB in size, and that is only for today. Is this another bug I need to report against the 4.0 kernel, since it didn't start happening until I started running it? I checked earlier today for an rc5 but didn't see one.

Thanks
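
To see which message is filling the syslog, the lines can be tallied by the `[drm:...]` tag at the start of each kernel message. A minimal sketch, assuming the line format shown in comment #53; `count_drm_tags` is a hypothetical helper name, not part of any tool mentioned here.

```python
import re
from collections import Counter

# Captures the function name in tags such as "[drm:drm_atomic_set_fb_for_plane]".
DRM_TAG = re.compile(r"\[drm:([A-Za-z0-9_]+)")


def count_drm_tags(lines):
    """Count kernel log lines per DRM function tag."""
    counts = Counter()
    for line in lines:
        match = DRM_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from this report.
sample = [
    "Mar 17 20:02:34 localhost kernel: [ 824.137147] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d66db3c0",
    "Mar 17 20:02:34 localhost kernel: [ 824.145104] [drm:drm_atomic_set_fb_for_plane] Set [FB:59] for plane state ffff8800d66db900",
    "Mar 19 07:34:14 localhost kernel: [128723.820044] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... render ring idle",
]

counts = count_drm_tags(sample)
for name, n in counts.most_common():
    print(n, name)
```

A tally like this only shows which tag dominates the log; it does not say why the messages are being emitted.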

Revision history for this message
penalvch (penalvch) wrote :

chris pollock, did this problem not occur in a release prior to Trusty?

tags: added: regression-potential
removed: needs-bisect regression-update
Revision history for this message
chris pollock (cpollock) wrote :

I can't provide that information since I didn't begin to run Trusty until around late July of last year when my other Linux box finally died and I decided to run Ubuntu. Prior to that I ran Mandriva.

Revision history for this message
chris pollock (cpollock) wrote :

Christopher, I asked this question: "After boot kernel: [ 1228.531419] usblp0: removed kernel: [ 1228.532703] usblp 1-3.5:1.0: usblp0: USB Bidirectional printer dev 6 if 0 alt 0 proto 2 vid 0x03F0 pid 0x2B17 is added to syslog every 6 seconds" here - https://answers.launchpad.net/ubuntu/+question/263924 and the reply I received was: "You are using an unofficial kernel. I suggest you report the issue to the PPA maintainer." Of course I'm using an unofficial kernel, so what do you suggest I do to get this 'bug' or whatever it is taken care of? Here's what it looks like in my syslog:

Ubuntu 14.04.2 LTS, kernel 4.0.0-040000rc4-generic; the printer is an HP1020. After each reboot, this is printed to my syslog every six seconds:

Mar 20 11:23:13 localhost kernel: [ 1757.100062] usblp0: removed
Mar 20 11:23:13 localhost kernel: [ 1757.101389] usblp 1-3.5:1.0: usblp0: USB Bidirectional printer dev 6 if 0 alt 0 proto 2 vid 0x03F0 pid 0x2B17
Mar 20 11:23:19 localhost kernel: [ 1763.104667] usblp0: removed
Mar 20 11:23:19 localhost kernel: [ 1763.105807] usblp 1-3.5:1.0: usblp0: USB Bidirectional printer dev 6 if 0 alt 0 proto 2 vid 0x03F0 pid 0x2B17
Mar 20 11:23:25 localhost kernel: [ 1769.110688] usblp0: removed
Mar 20 11:23:25 localhost kernel: [ 1769.111851] usblp 1-3.5:1.0: usblp0: USB Bidirectional printer dev 6 if 0 alt 0 proto 2 vid 0x03F0 pid 0x2B17

However as soon as I print something, in this case a test page from the CUPS web interface, it stops:

Mar 20 11:24:19 localhost kernel: [ 1823.166289] usblp0: removed
Mar 20 11:24:19 localhost kernel: [ 1823.167583] usblp 1-3.5:1.0: usblp0: USB Bidirectional printer dev 6 if 0 alt 0 proto 2 vid 0x03F0 pid 0x2B17
Mar 20 11:24:20 localhost hp[8064]: io/hpmud/model.c 108: unable to open /etc/hp/hplip.conf: No such file or directory
Mar 20 11:24:20 localhost hp[8064]: io/hpmud/model.c 532: no HP_LaserJet_1020 attributes found in /data/models/models.dat
Mar 20 11:24:20 localhost hp[8064]: io/hpmud/model.c 543: no HP_LaserJet_1020 attributes found in /data/models/unreleased/unreleased.dat
Mar 20 11:24:21 localhost foo2zjs-wrapper: foo2zjs-wrapper -z1 -P -L0 -r1200x600 -p1 -T3 -m1 -s7 -n1
Mar 20 11:24:22 localhost kernel: [ 1826.056212] usblp0: removed
Mar 20 11:24:22 localhost foo2zjs-wrapper: gs -sPAPERSIZE=letter -g10200x6600 -r1200x600 -sDEVICE=pbmraw -dCOLORSCREEN -dMaxBitmap=500000000
Mar 20 11:24:22 localhost foo2zjs-wrapper: foo2zjs -r1200x600 -g10200x6600 -p1 -m1 -n1 -d1 -s7 -z1 -u 192x96 -l 192x96 -L 0 -T3 -P

Notice it's every 6 seconds and goes away after I print something.
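
The six-second period can be checked from the kernel timestamps in brackets (seconds since boot). A small sketch, assuming the line format above; `removal_intervals` is an illustrative name.

```python
import re

# Kernel timestamp (seconds since boot) followed by the usblp removal message.
REMOVED = re.compile(r"\[\s*(\d+\.\d+)\] usblp0: removed")


def removal_intervals(lines):
    """Return the gaps, in seconds, between successive 'usblp0: removed' lines."""
    stamps = [float(m.group(1)) for m in map(REMOVED.search, lines) if m]
    return [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]


# Sample lines copied from the syslog excerpt above.
sample = [
    "Mar 20 11:23:13 localhost kernel: [ 1757.100062] usblp0: removed",
    "Mar 20 11:23:19 localhost kernel: [ 1763.104667] usblp0: removed",
    "Mar 20 11:23:25 localhost kernel: [ 1769.110688] usblp0: removed",
]

intervals = removal_intervals(sample)
print(intervals)
```

For the three sample lines both gaps come out at just over six seconds, which agrees with the wall-clock times 11:23:13, 11:23:19 and 11:23:25.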

Revision history for this message
penalvch (penalvch) wrote :

chris pollock, just to clarify Launchpad is not the correct venue for reporting issues with the upstream kernel.

However, Launchpad is the correct venue for your problem, as it is reproducible with the Ubuntu kernel, and as a part of the debugging process, the upstream kernel is tested.

Despite this, for regression testing purposes, could you please test for this via http://releases.ubuntu.com/saucy/ and advise on the results?

Revision history for this message
chris pollock (cpollock) wrote :

I take it you just want me to run via the live DVD and not do a full install correct?

Revision history for this message
chris pollock (cpollock) wrote :

I'd like to ask you another question Christopher, am I the only person in the whole world running Ubuntu that has gone through all the kernel versions we've been testing that has this problem?

Revision history for this message
chris pollock (cpollock) wrote :

So, I came into the computer room this morning, turned on the monitor, moved the mouse, and of course saw just the black screen and mouse cursor. This was at 6:49am. I did a CTRL-ALT-F1 and walked away to take care of a few things. At 7:02 I came back, logged in via the terminal and entered the 'sudo reboot' command. After the boot completed I did a 'body' search of my hourly syslog snippets for 'hangcheck', and none were found between last night, when I left the system, and this morning, when I found the black screen. What I did see in my 7am snippet (which really covers 6am to 6:59am) is in the attachment. I have no idea what it means, but as I said above, 6:49 was when I moved the cursor, as I usually do, to get out of the screensaver. Could X-Screensaver have anything to do with this issue? I notice that there's a spamd run in this file; please ignore it. I just captured the whole area that looked like the trace.

Revision history for this message
penalvch (penalvch) wrote :

chris pollock:
>"I take it you just want me to run via the live DVD and not do a full install correct?"

If the issue has been reproducible in a live environment, then testing that would be fine.

Revision history for this message
chris pollock (cpollock) wrote :

How long do you think I should run the live DVD? I can probably only run it during the night from say 10pm my time until about 8am my time as I have lots of other work going on my system during the day and I don't have another system to do the testing on. Also did you have a chance to see my comment #65?

Revision history for this message
chris pollock (cpollock) wrote :

Christopher, some things of note:

1. I went a bit over two days on 4.0.0-040000rc4-generic without a lockup. I noticed that there was an ....rc5 out today. I installed and booted into it, none of my extension settings from Gnome Tweak Tool would take. At the same time 3.13.0-48-generic #80-Ubuntu SMP came down the pipe as a Ubuntu update. I installed it and booted into it at 3:21 this afternoon. Will keep notes as I have been and advise you of any lockups with and without the 'hangcheck' error.

Revision history for this message
chris pollock (cpollock) wrote :

A new kernel came down from Ubuntu - 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux the other day so I've been running it. It ran for 3d 6h 07m before locking up at 21:26:56 on 26 March 2015. This time there was no sign of the 'Hangcheck.....' error anywhere in my syslog. I have discovered these kernels - Index of /~kernel-ppa/mainline/drm-intel-next here - http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/ should I try installing and booting into the newest one of these? Are you still looking at this bug or has it become a dead issue?

Revision history for this message
chris pollock (cpollock) wrote :

I'd like to post an update to this bug report. For about two weeks now I've been running kernel 4.0.0-997-generic #201503310205 SMP Tue Mar 31 02:07:04 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux which I got here http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/ I've also installed xf86-video-intel driver version 2.99.917. All of this was from the assistance of the Intel-Gfx mailing list. Other than having to reboot last week when a new standard Ubuntu kernel came down which was after 6 days of no lockups the system has been stable. Currently I'm again at 6 days, 5hrs with no lockups.

Revision history for this message
jippie (jph4dotcom) wrote :

Similar, probably the same problem here.

- I've been running Kubuntu 14.04 for about a year with no problems. The problem started about 3 months ago or so; I'm not entirely sure about the period.
- The screen usually locks up when using Firefox; today it happened when I switched windows from Firefox to Konsole (the GUI app). The screen locked halfway through switching the window borders from blue to grey. (Could it be the screen candy thingy that also does the wobbly windows, what's the name again?) The Konsole window border on my right monitor was grey, is now locked at faint blue and should have turned full blue. The Firefox window on the left monitor was full blue, was slightly going to grey and locked right there and then.
- I don't use a Dell computer, but an Intel motherboard based system (DG43GT with a Q9550).
- When the system is locked, which happens every two or three weeks, I can sometimes change to a text console, but not always. Currently I cannot access the console, but I can log in to the box through SSH.
- Screen blanking (power save) doesn't work when the screen is locked.

Linux diablo 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

When my box locks up, I usually run full patching before power cycling, so some messages in the log files may be caused by that. I attached log files, and I believe I installed some debug headers on my system in the past, which seem to list somewhat useful information.

[vr mei 15 23:12:46 2015] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
[vr mei 15 23:15:36 2015] INFO: task Xorg:2243 blocked for more than 120 seconds.
[vr mei 15 23:15:36 2015] Tainted: G OX 3.13.0-49-generic #83-Ubuntu
[vr mei 15 23:15:36 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[vr mei 15 23:15:36 2015] Xorg D ffff88023bd134c0 0 2243 2230 0x00400000
[vr mei 15 23:15:36 2015] ffff8800b29d7a18 0000000000000086 ffff88022a6d1800 ffff8800b29d7fd8
[vr mei 15 23:15:36 2015] 00000000000134c0 00000000000134c0 ffff88022a6d1800 ffff880035523000
[vr mei 15 23:15:36 2015] ffff880035472ad8 ffff880035eda800 ffff880035eda800 ffff8800032da780
[vr mei 15 23:15:36 2015] Call Trace:
[vr mei 15 23:15:36 2015] [<ffffffff81725f49>] schedule+0x29/0x70
[vr mei 15 23:15:36 2015] [<ffffffffa015ea65>] intel_crtc_wait_for_pending_flips+0x75/0x110 [i915]
[vr mei 15 23:15:36 2015] [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[vr mei 15 23:15:36 2015] [<ffffffffa017008f>] intel_crtc_set_config+0x7ef/0x9a0 [i915]
[vr mei 15 23:15:36 2015] [<ffffffffa0026eed>] drm_mode_set_config_internal+0x5d/0xe0 [drm]
[vr mei 15 23:15:36 2015] [<ffffffffa00a4561>] drm_fb_helper_set_par+0x71/0xf0 [drm_kms_helper]
[vr mei 15 23:15:36 2015] [<ffffffff813cd181>] fb_set_var+0x191/0x430
[vr mei 15 23:15:36 2015] [<ffffffff810a2f60>] ? update_curr+0x80/0x180
[vr mei 15 23:15:36 2015] [<ffffffff813da161>] fbcon_blank+0x1d1/0x2d0
[vr mei 15 23:15:36 2015] [<ffffffff81462d68>] do_unblank_screen+0xb8/0x1f0
[vr mei 15 23:15:36 2015] [<ffffffff81458aba>] complete_change_console+0x5a/0xe0
[vr mei 15 23:15:36 2015] [<ffffffff81459ae...


Revision history for this message
jippie (jph4dotcom) wrote :

Quick question:

I can probably figure out from back ups which kernel I was running about 6 months ago, would it be possible to roll back to an older kernel version?

20140601/tree/boot/vmlinuz-3.11.0-19-generic
20140601/tree/boot/vmlinuz-3.13.0-24-generic
20140706/tree/boot/vmlinuz-3.13.0-29-generic
20140706/tree/boot/vmlinuz-3.13.0-30-generic
20140803/tree/boot/vmlinuz-3.13.0-32-generic
20140907/tree/boot/vmlinuz-3.13.0-35-generic
20141005/tree/boot/vmlinuz-3.13.0-36-generic
20141102/tree/boot/vmlinuz-3.13.0-37-generic
20141207/tree/boot/vmlinuz-3.13.0-40-generic
20150104/tree/boot/vmlinuz-3.13.0-43-generic
20150201/tree/boot/vmlinuz-3.13.0-44-generic
20150301/tree/boot/vmlinuz-3.13.0-45-generic
20150405/tree/boot/vmlinuz-3.13.0-46-generic
20150405/tree/boot/vmlinuz-3.13.0-48-generic
20150412/tree/boot/vmlinuz-3.13.0-48-generic
20150412/tree/boot/vmlinuz-3.13.0-49-generic

Revision history for this message
chris pollock (cpollock) wrote :

I've had several video lockups since running the kernel and driver that I mentioned in comment 72. I've been trying to figure out a way to get a back trace. All I've been able to come up with so far is this from the /var/log/kern.log tied to the latest lockup:

May 13 17:59:19 localhost kernel: [216520.292010] ------------[ cut here ]------------
May 13 17:59:19 localhost kernel: [216520.292037] WARNING: CPU: 1 PID: 1157 at /home/kernel/COD/linux/drivers/gpu/drm/drm_irq.c:1142 drm_wait_one_vblank+0x12f/0x180 [drm]()
May 13 17:59:19 localhost kernel: [216520.292039] vblank wait timed out on crtc 0
May 13 17:59:19 localhost kernel: [216520.292041] Modules linked in: btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c ses enclosure coretemp snd_hda_codec_analog bnep joydev snd_hda_codec_generic i915 gpio_ich kvm hid_generic dell_wmi sparse_keymap rfcomm usbhid snd_hda_intel bluetooth snd_hda_controller uas snd_hda_codec usb_storage hid snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi video snd_seq drm_kms_helper snd_seq_device dcdbas drm snd_timer serio_raw snd i2c_algo_bit soundcore shpchp lpc_ich wmi 8250_fintek mac_hid parport_pc ppdev lp parport binfmt_misc psmouse pata_acpi e1000e ptp pps_core
May 13 17:59:19 localhost kernel: [216520.292093] CPU: 1 PID: 1157 Comm: Xorg Not tainted 4.0.0-997-generic #201503310205
May 13 17:59:19 localhost kernel: [216520.292095] Hardware name: Dell Inc. OptiPlex 780 /0C27VV, BIOS A15 08/06/2013
May 13 17:59:19 localhost kernel: [216520.292097] 0000000000000476 ffff8800d6c1fa08 ffffffff817e3106 0000000000000007
May 13 17:59:19 localhost kernel: [216520.292100] ffff8800d6c1fa58 ffff8800d6c1fa48 ffffffff810791b7 0000000000000286
May 13 17:59:19 localhost kernel: [216520.292103] 0000000000000000 ffff8800d6f3b800 0000000000000000 0000000000000000
May 13 17:59:19 localhost kernel: [216520.292107] Call Trace:
May 13 17:59:19 localhost kernel: [216520.292114] [<ffffffff817e3106>] dump_stack+0x45/0x57
May 13 17:59:19 localhost kernel: [216520.292119] [<ffffffff810791b7>] warn_slowpath_common+0x97/0xe0
May 13 17:59:19 localhost kernel: [216520.292122] [<ffffffff810792b6>] warn_slowpath_fmt+0x46/0x50
May 13 17:59:19 localhost kernel: [216520.292127] [<ffffffff810bb267>] ? finish_wait+0x67/0x80
May 13 17:59:19 localhost kernel: [216520.292140] [<ffffffffc0371cff>] drm_wait_one_vblank+0x12f/0x180 [drm]
May 13 17:59:19 localhost kernel: [216520.292144] [<ffffffff810bb130>] ? prepare_to_wait_event+0x100/0x100
May 13 17:59:19 localhost kernel: [216520.292157] [<ffffffffc0371d70>] drm_crtc_wait_one_vblank+0x20/0x30 [drm]
May 13 17:59:19 localhost kernel: [216520.292167] [<ffffffffc03cd738>] drm_plane_helper_commit+0x248/0x2a0 [drm_kms_helper]
May 13 17:59:19 localhost kernel: [216520.292175] [<ffffffffc03cd7ee>] drm_plane_helper_disable+0x5e/0xb0 [drm_kms_helper]
May 13 17:59:19 localhost kernel: [216520.292193] [<ffffffffc037b7a8>] __setplane_internal+0x1e8/0x2e0 [drm]
May 13 17:59:19 localhost kernel: [216520.292203] [<ffffffffc037b9c7>] drm_mode_cursor_universal+0x127/0x210 [drm]
May 13 17:59:19 localhost kernel: [216520.292206] [<ffffffff817ed215>] ? __w...

Revision history for this message
jippie (jph4dotcom) wrote :

For what it is worth: X is waiting for I/O, hence the D state:

$ ps aux | grep X
root 2243 0.9 0.1 552832 14336 tty7 Ds+ mei03 176:23 /usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch

Not sure how I can figure out which file X is waiting for; lsof spits out over 500 lines.
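
In `ps` output a STAT value beginning with `D` means uninterruptible sleep, which is usually but not always I/O. The hung-task trace quoted earlier in this report shows Xorg blocked in `intel_crtc_wait_for_pending_flips`, so the wait may be on the graphics driver rather than on a file. On Linux, `/proc/<pid>/wchan` typically names the kernel function a sleeping process is waiting in. As a sketch for spotting such processes, the following picks the `D`-state entries out of `ps aux` text; `uninterruptible` is a hypothetical helper, and it assumes the usual eleven-column `ps aux` layout with no spaces in the START column.

```python
def uninterruptible(ps_aux_output):
    """Return (pid, command) for each process whose STAT starts with 'D'."""
    found = []
    for line in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[1].isdigit() and fields[7].startswith("D"):
            found.append((int(fields[1]), fields[10]))
    return found


# The ps line quoted in the comment above.
sample = (
    "root 2243 0.9 0.1 552832 14336 tty7 Ds+ mei03 176:23 "
    "/usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 "
    "-nolisten tcp vt7 -novtswitch"
)

blocked = uninterruptible(sample)
for pid, command in blocked:
    print(pid, command)
```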

Revision history for this message
penalvch (penalvch) wrote :

jippie, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
Josh Rosenberg (7-launchpad-desh-info) wrote :

I have been experiencing the same problem periodically (the "[drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... render ring idle" problem) on my computer. It ranges from 1 day to 3 weeks between crashes. I run Linux Mint 17 (or 17.1 or 17.2), and it has happened with various kernels, the latest being 3.19.0-26-generic. When it happens, my X session is a total loss. My mouse reflects movement, but the cursor never changes, and no windows refresh or can be interacted with. I can usually still switch to other consoles, which is how I was able to peruse logs after the last lockup.

My syslog snippet is attached.

Revision history for this message
Robert Hrovat (robi-hipnos) wrote :

It happens multiple times a day on almost every machine at work. All machines have Intel built-in graphics, and the only common factor is that a web browser was always running some page with Flash.

Revision history for this message
penalvch (penalvch) wrote :

Josh Rosenberg / Robert Hrovat, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
Robert Hrovat (robi-hipnos) wrote :

Christopher, I think I may have found a workaround by uninstalling compiz and using gnome fallback (which is the default on all machines in the company). This is the second day on which none of the machines has crashed.

Revision history for this message
Josh Rosenberg (7-launchpad-desh-info) wrote :

Unfortunately I haven't been able to use ubuntu-bug to open a new report, though I still experience the problem regularly (including twice within the past hour).

Revision history for this message
chris pollock (cpollock) wrote :

Josh, here's another bug report I filed on this - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1497627 possibly you can add to that one?

Revision history for this message
penalvch (penalvch) wrote :

Josh Rosenberg, in order to be most helpful, you will want to file a new report (not add anything to an already existing report) via http://cdimage.ubuntu.com/daily-live/current/ .

For more on why this is helpful please see https://wiki.ubuntu.com/ReportingBugs .
