Lucid: Overheating due to no PM for ATI KMS

Bug #570589 reported by Klaus Doblmann on 2010-04-27
This bug affects 14 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

As you know, the code in Lucid's -32 kernel doesn't support power management on ATI cards with KMS (which is the default). This is not a problem for desktop users, but laptop users with beefy cards may run into severe overheating issues. I expect the forums to be flooded with these problems once Lucid is final. I have experienced this on two laptops at home: the GPU and CPU share a heatsink, so the CPU gets very hot too, even when idling. In one case this led to the CPU (both are C2Ds) being constantly stuck at 800 MHz a few minutes after booting (once it went over 75° or something like that); in the other case the CPU temperature went up to 80° at idle even with the fans at full speed. Switching to fglrx, OR to a newer kernel with PM, OR to UMS with PM fixes this, but I feel KMS on ATI as a default is a bad decision right now. I hope this gets fixed in 10.10 as the PM code hits MM's kernel.

Most likely we won't be able to do anything about this bug (unless there's a kernel upgrade in one of Lucid's point releases), so I guess the purpose of this bug report is more or less to track this issue during the development of MM.
---
Architecture: amd64
DistroRelease: Ubuntu 10.04
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100113)
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=de_AT:de:de_DE:de_CH:de_LU:de_LI:de_BE:en
 PATH=(custom, user)
 LANG=de_AT.utf8
 SHELL=/bin/bash
Tags: lucid
Uname: Linux 2.6.34-rc5-klausi x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UserGroups: adm admin audio cdrom dialout dip fax floppy fuse lpadmin netdev plugdev sambashare tape vboxusers video

Jeremy Foshee (jeremyfoshee) wrote :

Hi Klaus,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/releases/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 570589

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Klaus Doblmann (moviemaniac) wrote :

@Jeremy:

I'm using the latest version of Ubuntu 10.04 and can test on three different machines, all of which exhibit the problem (two severely, one not critically).
Upstream testing is currently going on with the 2.6.34 series, where some portions of the PM code have been added and already work. Further code is currently being prepared for merging into 2.6.35 once the merge window opens.
I tested this on all three machines with 2.6.34-rc5 and ATI KMS PM works when /etc/modprobe.d/radeon-kms.conf is altered to say "options radeon modeset=1 dynpm=1". On two machines dynamic clock setting works as it should, however on my main working machine (which doesn't get too hot on the 2600XT anyway) I need additional patches which allow me to manually set the clock speeds as dynpm doesn't work due to the settings in the atomBIOS table of my card.
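A minimal sketch, in Python, of checking whether a modprobe.d file really sets the options quoted above. The helper name `parse_modprobe_options` is hypothetical, and the parsing is deliberately simplified: it assumes one `options` directive per line and only skips whole-line `#` comments.

```python
def parse_modprobe_options(conf_text, module="radeon"):
    """Return the key=value options set for `module` as a dict of strings."""
    options = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Only lines of the form "options <module> key=value ..." are relevant.
        if len(fields) < 3 or fields[0] != "options" or fields[1] != module:
            continue
        for token in fields[2:]:
            key, sep, value = token.partition("=")
            if sep:
                options[key] = value
    return options


def dynpm_enabled(conf_text):
    """True if the text enables both KMS (modeset=1) and dynpm=1 for radeon."""
    opts = parse_modprobe_options(conf_text)
    return opts.get("modeset") == "1" and opts.get("dynpm") == "1"
```

To use it, read /etc/modprobe.d/radeon-kms.conf and pass its contents to `dynpm_enabled`. The check only looks at the file, not at what the loaded module is actually doing.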

So, in short: The issue has been fixed upstream (and is being heavily worked on right now) but is obviously missing from Lucid's kernel.

To test the latest upstream PM code, apply these patches on top of the drm-next tree: http://people.freedesktop.org/~agd5f/pm3/

tags: removed: needs-upstream-testing
tags: added: apport-collected
description: updated
Klaus Doblmann (moviemaniac) wrote :

Ho-hum. It seems apport doesn't like my custom kernel, and attaching the logs of Ubuntu's standard kernel won't help since power management doesn't work there as it's not implemented yet...

Jeremy Foshee (jeremyfoshee) wrote :

Klaus,
    Thanks for the detail. The issue with apport you have mentioned is one that I am hoping to discuss and work to resolve at UDS.

Thanks!

~JFo

Jeremy Foshee (jeremyfoshee) wrote :

One other thing. Would you mind setting an upstream bug watch for the upstream bug where this is being worked on? Either that or just put a link to it in here and I will add it.

Thanks!

~JFo

Klaus Doblmann (moviemaniac) wrote :

Jeremy,
There's no upstream bug, as this is not exactly a bug by definition; it's more like a new feature that's being worked on. I don't think it'd be a good idea to file a bug report upstream either, as this is already under development.
If you want, you may consider this "bug" fixed upstream, and we can mark it fixed in Ubuntu as soon as 2.6.34 hits the archives and radeon-kms.conf is altered so that power management is used. Which reminds me that we should add the package responsible for radeon-kms.conf to this bug too, but I have no idea which package that might be (the package search on packages.ubuntu.com comes up empty).

I will keep a watch on this during the development of 10.10 so we can address this properly once the right packages are in the archives.

Jeremy Foshee (jeremyfoshee) wrote :

Klaus,
     Sounds good to me. I've marked this bug triaged.

Thanks!

~JFo

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Richard Theil (richard-theil) wrote :

I have just posted thermal measurements as a comment to bug 488152, of which this is a duplicate. My measurements and long term experience suggest that passively cooled cards can and will die under the prevailing conditions. A "medium" rating seems to underestimate the problem a bit. I actually discovered the specific issue due to such an incident (though the fried card was an Nvidia with closed driver under karmic).

I definitely agree with Richard - I'd also rate this bug as highly important, especially bearing in mind that for ATI cards the xserver-xorg-video-ati driver is used by default on a fresh installation (correct me if I am wrong here). Also consider that this might lead to increasing reports of dying video cards and eventually to bad publicity for the entire distribution. I might be a bit too concerned here, but I wonder how long a card may actually survive when permanently running at temperatures around 80°C (when idle, that is).

Due to fglrx not supporting KMS and introducing a severe memory leak with compositing enabled, I switched back to the free driver just yesterday, but chose to revert that, as I am heavily concerned about the lifetime of my video card (HD4350, passive / heatsink) when using the free driver with the temperature being significantly (!) higher.

squiddy (squiddvault) wrote :

This bug report should be of high importance instead of medium. It shortened the h/w lifetime.

madbiologist (me-again) wrote :

In addition to the Radeon power management patches included upstream in kernels 2.6.34 and 2.6.35, users with desktop systems may be interested in some new thermal monitoring work which has just landed upstream. This work makes it possible to monitor the internal GPU temperature on newer Radeon cards. See http://www.phoronix.com/scan.php?page=news_item&px=ODMxMQ for more information.

madbiologist (me-again) wrote :

The thermal monitoring support mentioned in comment #11 is included in kernel 2.6.36.
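A rough sketch, in Python, of how such a temperature could be read once the thermal monitoring support is present. It assumes the sensor is exposed through the kernel's generic hwmon sysfs interface (a `name` file plus `temp1_input` in millidegrees Celsius under /sys/class/hwmon); which entry belongs to the Radeon card, and what its `name` file contains, is not stated in this thread and will vary by system. The function name `read_hwmon_temps` is hypothetical.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Map 'hwmonN:<chip name>' to its first temperature reading in deg C."""
    temps = {}
    for chip in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(chip, "name")) as f:
                name = f.read().strip()
            with open(os.path.join(chip, "temp1_input")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            # Chip without a readable name or temp1_input: skip it.
            continue
        temps[f"{os.path.basename(chip)}:{name}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

On a machine without any hwmon entries the function simply returns an empty dict.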

madbiologist (me-again) wrote :

Proper power management for AMD/ATI Radeon R600 and newer hardware is finally available in the upstream 3.11 linux kernel. The first release candidate (3.11-rc1) of the 3.11 kernel is available at http://kernel.ubuntu.com/~kernel-ppa/mainline/ and instructions on how to install and uninstall it are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds

To use this power management for the AMD/ATI Radeon you will need to select it at boot by adding radeon.dpm=1 to your GRUB kernel boot options as described at https://help.ubuntu.com/community/Grub2/Troubleshooting#Editing_the_GRUB_2_Menu_During_Boot
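After rebooting, one way to confirm that the option was actually passed is to look at the kernel command line. The following is a minimal Python sketch; it assumes the running kernel exposes its boot parameters in /proc/cmdline, and the helper names are hypothetical. It only shows that the parameter was requested, not that DPM is active on the card.

```python
def radeon_dpm_requested(cmdline):
    """True if the last radeon.dpm=... token on the command line is '1'."""
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            # Keep the last occurrence (assumed here to be the one that wins).
            value = token.split("=", 1)[1]
    return value == "1"


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line, or an empty string if unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if radeon_dpm_requested(read_cmdline()):
        print("radeon.dpm=1 is on the kernel command line")
    else:
        print("radeon.dpm=1 was not found on the kernel command line")
```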

Changed in linux (Ubuntu):
status: Triaged → Fix Committed
madbiologist (me-again) wrote :

See the blog post at http://www.botchco.com/agd5f/?p=57 for further information.

Unlike the older dynpm method, the new DPM method works with multiple monitors and there shouldn't be any flickering as the performance level changes are handled by dedicated hardware rather than the driver.

Julien Olivier (julo) wrote :

I've just tested the new kernel, and it seems to work well in the sense that it keeps my laptop at around 60°C when idle. The problem is that, even at 60°C, my laptop fan is still very noisy. Is 60°C still too hot? Is it just a hardware problem?

madbiologist (me-again) wrote :

Julien - poor hardware design is a possibility. Also make sure you have removed all dust from inside the computer, particularly from the CPU and GPU fans and heatsinks (if it's a desktop, open the case; if it's a laptop, at the very least run a vacuum cleaner nozzle along the external air vents). You can also put the edge of a book under the back edge of your laptop so that it is raised off the desk at a slight angle, or purchase a laptop cooling pad (with USB-powered fans) to sit the laptop on.

I neglected to mention that using the new power management feature on R700 and newer hardware (other than APUs) requires installing the latest AMD graphics microcode (ucode) files to /lib/firmware/radeon.
These are available at http://people.freedesktop.org/~agd5f/radeon_ucode/
Get the version ending in "smc".
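A small Python sketch for checking whether any such files are already installed. It assumes that "ending in smc" refers to the file name before its extension; the exact naming scheme is an assumption, not something confirmed in this thread, and the function name `find_smc_firmware` is hypothetical.

```python
import os


def find_smc_firmware(firmware_dir="/lib/firmware/radeon"):
    """Return sorted file names whose stem (name without extension) ends in 'smc'."""
    try:
        names = os.listdir(firmware_dir)
    except OSError:
        # Directory missing or unreadable: report nothing found.
        return []
    return sorted(
        name for name in names
        if os.path.splitext(name)[0].lower().endswith("smc")
    )


if __name__ == "__main__":
    found = find_smc_firmware()
    print("\n".join(found) if found else "no smc firmware files found")
```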

Julien Olivier (julo) wrote :

@madbiologist

Thanks for the advice, I'll try them all. The fact that I live in the south of France probably doesn't help either (it's around 30°C here now).

As for the firmware, my card is pretty old: ATI RS880M [Mobility Radeon HD 4225/4250]. I guess I don't need them, right?

madbiologist (me-again) wrote :

According to Wikipedia and http://xorg.freedesktop.org/wiki/RadeonFeature/#index5h2 the Mobility Radeon HD 4225/4250 is a RV620 chip, so you shouldn't need the updated firmware files.

madbiologist (me-again) wrote :

Kernel 3.11.0-1.4 (based on the upstream 3.11-rc4 kernel) is now available in Ubuntu 13.10 "Saucy Salamander". Also, kernel 3.11.0-2.5 is in Saucy-proposed, which is based on the upstream 3.11-rc5 kernel, which has some bugfixes for the new DPM method.

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
madbiologist (me-again) wrote :

The currently under-development 3.13 upstream kernel enables DPM by default (without needing the radeon.dpm=1 boot parameter I mentioned above) for Radeon HD 4000 through Radeon HD 7000 series graphics processors but with some specific ASICs being excluded.
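The kernel version thresholds mentioned in this thread can be summarised in a short Python sketch. It only encodes what is stated above (DPM available from 3.11 with radeon.dpm=1, enabled by default from 3.13 for Radeon HD 4000 through HD 7000 with some ASICs excluded); it does not know which ASICs are excluded, and the function name and return labels are hypothetical.

```python
import re


def radeon_dpm_mode(kernel_release):
    """Classify a release string such as '3.11.0-2-generic' by DPM availability."""
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {kernel_release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    if version < (3, 11):
        return "unavailable"   # the new DPM method is not in these kernels
    if version < (3, 13):
        return "opt-in"        # needs radeon.dpm=1 on the kernel command line
    return "default"           # on by default for supported ASICs only
```

For example, `radeon_dpm_mode(platform.release())` would classify the running kernel, after `import platform`.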

madbiologist (me-again) wrote :

The currently under-development Ubuntu 14.04 "Trusty Tahr" is based on the 3.13 kernel. You can download a pre-release version at http://cdimage.ubuntu.com/daily-live/current/ and the final release is scheduled for April 17th, 2014 as per https://wiki.ubuntu.com/TrustyTahr/ReleaseSchedule
