[all variants] - Radeon HD3650 - radeon.dpm=1 - ring 0 stalled for more than 10

Bug #1507150 reported by Daryl Lublink
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

When the screen saver is left running for a few minutes, the computer locks up.

1 out of 2 times, I have to do a hard reset ( hold the power button for 30 seconds or disconnect the power/battery ).

1 out of 4 times, I can reset the computer using the magic sysrq ( Raising elephants... )

1 out of 4 times, I can switch to a console ( CTRL+ALT+F1 ). The GUI cannot be recovered; it remains frozen. The last time I managed to switch to a console, I captured dmesg and the Xorg log.

Messages like this are seen in dmesg after it has locked up:

[ 2077.860355] radeon 0000:01:00.0: ring 0 stalled for more than 12488msec
[ 2077.860371] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000184c0 last fence id 0x000000000001864c on ring 0)
[ 2077.860997] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[ 2077.861067] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
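
For anyone triaging similar logs, here is a minimal sketch of how the stall and lockup lines above could be picked out of a saved dmesg capture. The function name, the dictionary keys, and the "fence_gap" figure (last fence id minus current fence id) are illustrative assumptions, not something the radeon driver reports itself.

```python
import re

# Patterns modelled on the dmesg excerpt in this report.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-f]+) "
    r"last fence id (0x[0-9a-f]+) on ring (\d+)\)"
)

SAMPLE_DMESG = """\
[ 2077.860355] radeon 0000:01:00.0: ring 0 stalled for more than 12488msec
[ 2077.860371] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000184c0 last fence id 0x000000000001864c on ring 0)
[ 2077.860997] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[ 2077.861067] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
"""


def summarize_radeon_lockups(dmesg_text):
    """Return one dict per ring-stall or GPU-lockup line found in dmesg_text."""
    events = []
    for line in dmesg_text.splitlines():
        stall = STALL_RE.search(line)
        if stall:
            events.append({
                "kind": "stall",
                "ring": int(stall.group(1)),
                "stalled_msec": int(stall.group(2)),
            })
            continue
        lockup = LOCKUP_RE.search(line)
        if lockup:
            current = int(lockup.group(1), 16)
            last = int(lockup.group(2), 16)
            events.append({
                "kind": "lockup",
                "ring": int(lockup.group(3)),
                # Hypothetical convenience figure: how far apart the two ids are.
                "fence_gap": last - current,
            })
    return events


if __name__ == "__main__":
    for event in summarize_radeon_lockups(SAMPLE_DMESG):
        print(event)
```

The two "*ERROR*" lines are deliberately ignored here; they follow from the lockup rather than identifying it.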

Notes :
This problem is easily reproducible on an HP 8530p with an [AMD/ATI] RV635/M86 [Mobility Radeon HD 3650].

This bug affects all versions of Ubuntu/Kubuntu/Ubuntu-mate/Xubuntu since Ubuntu 11.04.

On Ubuntu 10.04, the issue does not occur because I can use the non-open source fglrx-legacy driver.

When using the open source 'radeon' kernel module to run the Mobility HD 3650 device, the driver locks up frequently. The behaviour varies slightly from version to version; this report describes the behaviour in Ubuntu-MATE 15.04.

The issue only occurs when 'radeon.dpm=1' is passed as a kernel parameter. I need this option because otherwise the card runs really hot, drains my laptop battery, and burns my legs if the laptop is on my lap.
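
To confirm whether a given boot actually had the parameter set, the kernel command line can be inspected. The sketch below is only an illustration: the helper names are made up, and treating the last occurrence as the effective one is a simplifying assumption of this example.

```python
def radeon_dpm_setting(cmdline):
    """Return the value given for radeon.dpm on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            value = val  # assumption: a later occurrence replaces an earlier one
    return value


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Command line as recorded by apport for this report (ProcKernelCmdLine).
REPORTED_CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-3.19.0-30-generic "
    "root=/dev/mapper/ubuntu--mate--vg-root ro radeon.dpm=1"
)

if __name__ == "__main__":
    print("radeon.dpm =", radeon_dpm_setting(REPORTED_CMDLINE))
```

On the affected machine, passing `read_running_cmdline()` instead of the recorded string would check the live boot.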

It's not a hardware issue, it's a driver issue. Why? Because I tested using Ubuntu 10.04 + Legacy FGLRX, and the graphics card works fine. I also patched together Linux 3.4 + Xorg 6.9 ( xserver 1.12 ) + Arch Linux + Legacy FGLRX, with everything else modern, and the graphics card itself had no issues ( the installation was unstable because recent versions of MATE don't work well with old versions of Xorg ).

I strongly believe that it is a kernel issue because the error messages all appear in dmesg. I can't find any errors in Xorg.log.

A second, minor, issue that I have is that when I start the computer, the mouse can't click most places on the screen. When I click, nothing happens or the wrong thing is clicked. If I switch to a console and back to X, the issue resolves itself. This issue *might* be related to the main issue, but I do not have the technical skills to determine whether or not it is the same issue.

My laptop runs with full disk encryption ( dm-crypt ).

Notes about reproducing :
I can reliably reproduce this. It can take up to 35 minutes to reproduce it sometimes, but I can reproduce it without fail.

If you provide me with a test kernel to reproduce, please ensure that dm-crypt is supported so that I don't have to reinstall the entire O/S.

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-3.19.0-30-generic 3.19.0-30.34
ProcVersionSignature: Ubuntu 3.19.0-30.34-generic 3.19.8-ckt6
Uname: Linux 3.19.0-30-generic x86_64
ApportVersion: 2.17.2-0ubuntu1.5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: david 1342 F.... pulseaudio
 /dev/snd/controlC1: david 1342 F.... pulseaudio
Date: Sat Oct 17 09:31:56 2015
HibernationDevice: RESUME=UUID=86c5a3d9-cb5c-4bcb-9d9d-a61d6e9a356f
InstallationDate: Installed on 2015-10-13 (4 days ago)
InstallationMedia: Ubuntu-MATE 15.04 "Vivid Vervet" - Release amd64 (20150422.1)
MachineType: Hewlett-Packard HP EliteBook 8530p
PccardctlStatus:
 Socket 0:
   3.3V
  16-bit
  PC Card
   Subdevice 0 (function 0) bound to driver "pata_pcmcia"
ProcEnviron:
 LANGUAGE=en_CA:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.19.0-30-generic root=/dev/mapper/ubuntu--mate--vg-root ro radeon.dpm=1
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-30-generic N/A
 linux-backports-modules-3.19.0-30-generic N/A
 linux-firmware 1.143.3
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/10/2009
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 68PDV Ver. F.09
dmi.board.name: 30E7
dmi.board.vendor: Hewlett-Packard
dmi.board.version: KBC Version 90.21
dmi.chassis.type: 10
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr68PDVVer.F.09:bd03/10/2009:svnHewlett-Packard:pnHPEliteBook8530p:pvrF.09:rvnHewlett-Packard:rn30E7:rvrKBCVersion90.21:cvnHewlett-Packard:ct10:cvr:
dmi.product.name: HP EliteBook 8530p
dmi.product.version: F.09
dmi.sys.vendor: Hewlett-Packard

Daryl Lublink (dlublink) wrote :

All the files that were included automatically by apport were collected *WITHOUT* reproducing the bug as apport won't run without the GUI.

I have included the xorg.log and dmesg taken when the bug occurred.

Daryl Lublink (dlublink) wrote :

This file is xorg.log taken after the bug occurred and the GUI was completely locked up.

I can't find any relevant error messages, but I am including it just in case.

Daryl Lublink (dlublink) wrote :

I compiled a kernel manually, starting from the config of the Ubuntu kernel. I ran 'make localyesconfig' and used 'make menuconfig' to set radeon/drm to modules. I built the source provided by 'apt-get source linux', not the source from git.

The issue still occurs.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
penalvch (penalvch)
tags: added: bios-outdated-f.20
removed: dpm radeon
Changed in linux (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
Daryl Lublink (dlublink) wrote :

As requested by CMP, I have updated the BIOS on the computer and retested. It still shows the same behaviour.

# dmidecode -s bios-version; dmidecode -s bios-release-date
68PDV Ver. F.20
12/08/2011

As far as I can tell, absolutely nothing has changed. The video driver is still crashing. I included a new dmesg taken AFTER the BIOS update.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

David, could you please test the latest upstream kernel, listed on the very top line of the page at http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D (the release names are irrelevant for testing, and please do not test the daily folder)? Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds . This will allow additional upstream developers to examine the issue.

If the latest kernel did not allow you to test the issue (e.g. you couldn't boot into the OS), please make a comment in your report about this, and continue with the next most recent kernel version until you can test the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this issue is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Where X, Y, and Z are numbers corresponding to the kernel version.

If the mainline kernel does not fix the issue, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ

Please note, a failure to install the kernel does not fit the criteria of kernel-bug-exists-upstream.
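
As an illustration of the X.Y-rcZ tag format, the sketch below derives the two tags from a mainline-build release string such as "4.3.0-040300rc6-generic" (the one tested later in this report). The function name and the assumed layout of the release string are illustrative guesses; point releases are collapsed to X.Y in this simplified version.

```python
import re

# Assumed layout of a mainline-build release string, e.g. "4.3.0-040300rc6-generic".
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.\d+-\d+(?:rc(\d+))?-")


def upstream_tags(release, fixed):
    """Build the generic and the versioned tag for a tested upstream kernel."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    major, minor, rc = match.groups()
    version = f"{major}.{minor}" + (f"-rc{rc}" if rc else "")
    prefix = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    return [prefix, f"{prefix}-{version}"]


if __name__ == "__main__":
    for tag in upstream_tags("4.3.0-040300rc6-generic", fixed=False):
        print(tag)
```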

Once testing of the latest upstream kernel is complete, please mark this report's Status as Confirmed. Please let us know your results.

Thank you for your understanding.

tags: added: latest-bios-f.20
removed: bios-outdated-f.20
Changed in linux (Ubuntu):
importance: Low → Medium
status: Confirmed → Incomplete
Daryl Lublink (dlublink) wrote :

After some further testing, I discovered that I can reproduce the bug reliably in under 60 seconds.

Steps to reproduce :

1. Boot computer
2. Open Firefox or Google Chrome
3. Lock screen
4. Wait for screen saver
5. After 15 seconds, it locks up.

I noticed that if I boot without opening Firefox or Google Chrome, the screen saver will run for hours without freezing up.

Daryl Lublink (dlublink) wrote :

I tested using the latest upstream kernel, which at the time of testing was 4.3.0-040300rc6-generic :

uname -a :

Linux laptop 4.3.0-040300rc6-generic #201510182030 SMP Mon Oct 19 00:31:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The issue occurs exactly the same as it did on the standard repository kernel.

tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-4.3.0-040300rc6-generic
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream-4.3-rc6
removed: kernel-bug-exists-upstream-4.3.0-040300rc6-generic
Daryl Lublink (dlublink) wrote :

I was comparing the dmesg from the upstream kernel and the Ubuntu kernel, and I noticed that the upstream kernel has two extra error messages: "failed to create device file for power profile" and a sysfs message, "cannot create duplicate filename '/devices/pci0....'".

I do not have the knowledge of the kernel to decide whether these error messages are unrelated or related to the bug I am reporting.

The issue where the card locks up and I see messages complaining about 'rings' in dmesg is identical on both the Ubuntu kernel and the upstream kernel.

It should be noted that this issue also affects Ubuntu 12.04 and Ubuntu 14.04, in addition to the 15.04 release I have reported it against.

Daryl Lublink (dlublink) wrote :

This may be related to the ticket #1409393 ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1409393 ) .

Both tickets have the following in common :

1. radeon.dpm=1
2. Radeon Mobile 3650
3. Lockup with dmesg that mentions 'stalling' and 'rings'

penalvch (penalvch) wrote :

Daryl Lublink, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
penalvch (penalvch)
tags: removed: kernel-bug-exists-upstream kernel-bug-exists-upstream-4.3-rc6
Daryl Lublink (dlublink) wrote :

Do you want the bug to be filed using an upstream kernel or the standard kernel?

Daryl Lublink (dlublink) wrote :

As requested by Christopher M. Penalver (penalvch), I have refiled the bug using the 'ubuntu-bug linux' command.

Daryl Lublink (dlublink) wrote :

Link to new ticket for anyone who is looking for it :

https://bugs.launchpad.net/bugs/1508944

penalvch (penalvch) wrote :

David, to advise, this is not considered a duplicate of another report, or vice versa.

penalvch (penalvch)
tags: added: needs-upstream-testing
Daryl Lublink (dlublink) wrote :

I closed this ticket because it's the same as the ticket at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1508944

I mistakenly used two different accounts to post comments and it caused confusion with the people doing triage.

Both this ticket and 1508944 were posted by the same person for the same computer, same bug.

Changed in linux (Ubuntu):
status: Incomplete → Invalid