temperature overheating of cpu and radeon in 12.10 and above

Bug #1166916 reported by codeslinger
This bug affects 3 people
Affects: xserver-xorg-video-ati (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

13.04 Ubuntustudio 64 bit

under moderate load temperature is around 81c

under high load (playing video inside a virtualbox) temperature hit 94c.... at which point I turned it off

I have tested this with all 3 of the radeon drivers, with no significant difference.

The radeon temp is always about 2-3 degrees hotter than the rest, so I suspect that is the source of the problem.

10.10 does not have this problem at all
12.04 runs 5-10 degrees hotter on average than does 10.10

but starting with 12.10 the temperature gets unusably hot

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-14-lowlatency 3.8.0-14.9
ProcVersionSignature: Ubuntu 3.8.0-14.9-lowlatency 3.8.4
Uname: Linux 3.8.0-14-lowlatency x86_64
ApportVersion: 2.9.2-0ubuntu5
Architecture: amd64
CasperVersion: 1.330
Date: Tue Apr 9 17:11:07 2013
LiveMediaBuild: Ubuntu-Studio 13.04 "Raring Ringtail" - Alpha amd64 (20130403)
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-lowlatency
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
codeslinger (codeslinger) wrote :

the above data is from running on the flash drive, but I also get similar results from Beta 1 installed with all updates.

also, on 12.10 I got similar results when running the generic kernel. have not yet tried the generic kernel on 13.04 but would expect it will behave the same based on experience with 12.10.

Revision history for this message
Kaj Ailomaa (zequence) wrote :

Could you please verify whether this also happens on linux-generic? The kernels are so alike that it might make more sense to change this bug to affect linux-generic instead.

I've seen overheating issues myself for AMD cards on regular Ubuntu.

Revision history for this message
Daniel Letzeisen (dtl131) wrote :

The open-source radeon driver is notorious for overheating laptops. Unfortunately, you cannot use fglrx/Catalyst on Ubuntu 12.10 or later for that GPU. Your best bet is to stick with 12.04 and use the proprietary fglrx/Catalyst driver for now.
Another option is using 12.10 and downgrading your Xserver using makson96 PPA, but that is a hack and may cause package management issues down the road (use at own risk): https://launchpad.net/~makson96/+archive/fglrx

Note that AMD is working on releasing better power management code/docs, but it's currently still under legal review: http://www.phoronix.com/scan.php?page=news_item&px=MTM0MjA

Revision history for this message
codeslinger (codeslinger) wrote :

So, I guess you are saying don't buy AMD??? ;-o

on 10.10 the temperature is usually in the mid 50's and seldom hits 70. also the fan hardly ever comes on.

12.04 is usually in the 60's(c) and under high load hits about 78c, the fan is on but it's usually quiet.

on 12.10 & 13.04 the temperature at idle is in the 80's and peaks into the mid 90's (I won't allow it to go higher) that is a huge difference.

The ~actual~ battery life on 10.10 is about 4 hours... on 13.04 beta the battery life is estimated at about 1 hour and the fan has two modes really loud and screaming loud.

This is all on the same laptop, the only difference is the code... The power management on 10.10 works great, much better than the Windows 7 that came with the laptop. So why have we gone so far in the wrong direction, from something that was working to something that is dangerously* broken? If it had always been this way, that would be different. But there is a huge jump between 12.04 and 12.10. There must be something specific that accounts for this change.

* Yes, dangerously broken... the danger is that the laptop could be permanently damaged by such high heat.

--------

After quite a bit of further testing, I'm no longer entirely sure that the video is the only culprit. At idle or light load, the video is indeed 2-3 degrees hotter, but under moderate to high load the cpu gets much hotter than the video, as much as 6 degrees hotter. That's quite a differential.

Thanks for your suggestions, I will take a look at the PPA. For me personally, 12.04 is not a good answer because neither the audio input nor bluetooth works, not much point in having a DAW if it can't make a sound... :-) whereas on 13.04 so far everything seems to work except for the temperature.

Am I the only person in the world with an AMD cpu and an Intel HDMI audio chip???

I will test the generic kernel later today.

Revision history for this message
Daniel Letzeisen (dtl131) wrote :

AMD has a good open-source team that is hamstrung by legal/IP problems. They're getting better on that front though. See the Phoronix.com forums for more support.

Revision history for this message
codeslinger (codeslinger) wrote :

okay, here is the data you have been waiting for...

but the first thing to know is that I have put my laptop on a stand to increase airflow, thus its overall temperature is now about 3 degrees cooler than in the above observations. nonetheless, even with better airflow it still hit 96c and climbing, at 100% load, before I stopped it.

I wrote a program to record the temperature and cpu load data, which I will attach below. But let me just give you the bottom line here.
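
The attached program itself is not reproduced in this thread. As a rough illustration only, a logger of this kind could be sketched as below, assuming CPU load is derived from the aggregate `cpu` line of `/proc/stat` and temperature from a sysfs thermal zone file. The zone path and all function names are assumptions for this example, not the reporter's code.

```python
import time
from pathlib import Path


def parse_cpu_times(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # First eight fields: user nice system idle iowait irq softirq steal.
            # guest/guest_nice are skipped because they are already part of user/nice.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_load_percent(before, after):
    """Percent of non-idle time between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta


def read_temp_c(zone_file):
    """Read a sysfs thermal zone file, which reports millidegrees Celsius."""
    return int(Path(zone_file).read_text().strip()) / 1000.0


def log_samples(count, interval_s=5.0,
                zone_file="/sys/class/thermal/thermal_zone0/temp"):
    """Print 'time temp_c load_percent' lines; the zone path is an assumption."""
    prev = parse_cpu_times(Path("/proc/stat").read_text())
    for _ in range(count):
        time.sleep(interval_s)
        cur = parse_cpu_times(Path("/proc/stat").read_text())
        print(f"{time.strftime('%H:%M:%S')} {read_temp_c(zone_file):.1f} "
              f"{cpu_load_percent(prev, cur):.1f}")
        prev = cur
```

Calling `log_samples(60)` would print one line every five seconds for five minutes. Which thermal zone corresponds to the CPU or GPU varies by machine, so the default path may need changing.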

There is no difference between running the generic kernel and the low latency kernel.

There is minor difference between 10.10 and 12.04

There is a huge difference between 12.04 and 12.10.

On 12.10 I tried both low latency and generic, I also tried all 3 of the available video drivers. I did not see any significant difference with any of these combinations.

Since I no longer have 12.10 installed I can't show you the data. However the performance of 12.10 is essentially identical to 13.04 and I do have the data for 13.04. Except that on 13.04 when I tried to install alternate video drivers it hung. So I could only test with the xorg driver.

Temperature performance of 12.04 is a little bit higher than 10.10 but is satisfactory. On the other hand 12.10 and 13.04 are dramatically hotter. Under max load 12.04 never gets above 82c, under light load it is typically in the low 60's. By contrast, 12.10 and 13.04 seldom get cooler than 80c and easily reach the 90s, which is dangerously close to hardware destruction.

Revision history for this message
codeslinger (codeslinger) wrote :

12.04 low latency 64 bit

idle to max load and back to idle

max temp was 79c

Revision history for this message
codeslinger (codeslinger) wrote :

13.04 beta 2 low latency 64 bit

prolonged idle trying to find the lowest possible temperature

this was from before installing updates, but after installing updates I saw no difference.

Revision history for this message
codeslinger (codeslinger) wrote :

13.04 beta 2 low latency 64 bit

idle to max load and back to idle

note: It did not reach its max peak temperature, I shut down the load once it reached the 90's

Revision history for this message
codeslinger (codeslinger) wrote :

13.04 beta 2 Generic 64 bit - idle to max to idle - temperature and cpu log

general impression is that the generic might be slightly cooler than low latency, but not a significant difference.

again, once it hit the 90's I shut down the load.

also I did some prolonged idling at the start because it was still cooling down from the previous test.

Revision history for this message
codeslinger (codeslinger) wrote :

2 further thoughts...

on 12.10 I also tried the 32 bit low latency kernel with similar high temperatures.

on 13.04 during prolonged idle I noticed an interesting behavior;

when starting from cool... and running at idle, it heats up to 79c at which point the fan kicks into high gear, it then cools back down to about 75c where the temperature stabilizes.

You can see this in the data, just know that there was no change in load, it was idle the whole time. But the fan got quite a bit louder.

Revision history for this message
codeslinger (codeslinger) wrote :

ubuntu 10.10 generic 32 bit

here is the data for 10.10, it's actually closer to 12.04 than I thought it was. The chief difference is that with 12.04 I am always aware of the fan noise even at idle, but with 10.10 at idle the fan is effectively silent.

I had to wait until the computer was cold to run this test

temperature stays in mid to high 50s

Revision history for this message
codeslinger (codeslinger) wrote :

ubuntu 10.10 generic 32 bit

idle to max and then back to idle

note it took several minutes of fumbling around to get a consistent high load... my usual technique is to open several virtualboxes and then have each one play a video, but I have not used this os in so long that the software was not compatible with my existing vms, so I had to create some new ones.

it peaked at 83c, which is actually higher than 12.04, but maybe that was just because I was trying harder to get it under load. The peak temperature under max load on 10.10 is still far less than the peak temp on 12.10/13.04; in fact it is very close to the idle temp of the newer oses.

towards the end of the test, the temperature started going back up... I noticed there was some network activity and suspect some background auto-updater task had kicked in.

Revision history for this message
codeslinger (codeslinger) wrote :

I used lm-sensors to get the temperature data, unfortunately lm-sensors can't read the radeon's temp in the older oses.
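
For anyone wanting to collect comparable numbers, here is a minimal sketch of parsing the text printed by the `sensors` command from lm-sensors. The chip names and readings in `SAMPLE` are made up for illustration, and `parse_sensors` is a hypothetical helper, not part of lm-sensors; real output differs between machines and versions.

```python
import re

# Matches lines such as "temp1:        +81.0°C  (crit = +100.0°C)".
_READING = re.compile(r"^(?P<label>[^:]+):\s+\+?(?P<value>-?\d+(?:\.\d+)?)\s*°C")

# Hypothetical sample; values and chip names are illustrative only.
SAMPLE = """k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +81.0°C  (high = +70.0°C)

radeon-pci-0100
Adapter: PCI adapter
temp1:        +83.5°C
"""


def parse_sensors(text):
    """Map 'chip/label' to degrees Celsius from `sensors`-style text."""
    readings = {}
    chip = None
    for line in text.splitlines():
        if not line.strip():
            chip = None  # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line.strip()  # first line of a block is the chip name
            continue
        match = _READING.match(line)
        if match:
            key = f"{chip}/{match.group('label').strip()}"
            readings[key] = float(match.group("value"))
    return readings
```

If a chip (such as the radeon on the older releases mentioned above) is not exposed to lm-sensors, it simply will not appear in the result.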

summary: - temperature overheating probably radeon
+ temperature overheating of cpu and radeon in 12.10 and above
Kaj Ailomaa (zequence)
affects: linux-lowlatency (Ubuntu) → xserver-xorg-video-ati (Ubuntu)
Revision history for this message
codeslinger (codeslinger) wrote :

very curious... per irc with Kaj...

on 12.04 64 bit UbuntuStudio

installed 3.8.0-18-generic kernel which was backported for testing purposes.
see: https://launchpad.net/~ubuntu-x-swat/+archive/r-lts-backport

overall result is that this kernel is running about 6 degrees hotter than 12.04 with the stock 3.2.0-39-lowlatency kernel

contrast this with 13.04 running the 3.8.0-17-generic kernel, which is about 15 degrees hotter at 100% load and 20 degrees hotter at idle...

Revision history for this message
codeslinger (codeslinger) wrote :

P.S. it was not a totally apples to apples comparison because in the process of installing the kernel I ended up switching the video driver from the amd to the xorg. but the results should be pretty close given that in previous tests on 12.10 I saw no significant difference in temperature between the different drivers.

Revision history for this message
Kaj Ailomaa (zequence) wrote :

I was looking at your HW specs. It seems like a fairly new computer, but something that made a huge difference for me was blowing the dust out of the fan outlet. I wasn't able to see the dust; you only notice it when you blow it out. If you want to try this, please make sure you open the back first, so the dust has somewhere to go. May be worth a shot anyway.

Have you checked whether the CPU itself is prone to large temperature swings under different loads? Your differences seem quite abnormal. Also, have you searched for reports of the Linux kernel not handling that CPU well?

This bug report was initially about graphics card temperatures, which are separate from CPU temperatures. The two are unlikely to be related.

Revision history for this message
codeslinger (codeslinger) wrote :

apparently I am still not doing a very good job of explaining the situation... let me try again.

on 10.10 the best case idle temp is 51C
on 12.04 the best case idle temp is about 57C
on 13.04 the best case idle temp is 75C (that is at idle! and more typically it is low 80's)

under full load:
the max temp of 10.10 is 83C
the max temp of 12.04 is 81C
the max temp of 13.04 is 96C and climbing... (it was shutdown rather than risk meltdown)

it is expected that 12.04 would be a little bit hotter than 10.10 because 10.10 is 32 bits and generic whereas the others are 64 bits and low latency. The difference between 10.10 at full load and 12.04 at full load was probably due to lack of consistency in the load itself, in any case, they are close enough not to matter.

Suppose that blowing the dust out of the fan achieved an unrealistic gain of 10C in cooling. That would still leave 13.04 at 86C (or higher) which really is still too hot. However, blowing the dust out of the fan would be expected to affect all of the oses equally. So that still leaves us with a differential of 75 - 51 = 24C at idle and 96 - 83 = 13C at 100% load, between the oses.
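
The arithmetic above can be checked with a few lines. This only restates the figures reported in this comment (the 12.04 idle figure is given as "about 57C" and the 13.04 maximum was still climbing, so 96 is a lower bound).

```python
# Best-case idle and full-load maxima in degrees C, as reported in this comment.
idle = {"10.10": 51, "12.04": 57, "13.04": 75}
full_load = {"10.10": 83, "12.04": 81, "13.04": 96}


def delta(readings, newer, older):
    """Temperature difference between two releases for one kind of reading."""
    return readings[newer] - readings[older]


idle_gap = delta(idle, "13.04", "10.10")       # 75 - 51
load_gap = delta(full_load, "13.04", "10.10")  # 96 - 83
print(f"idle gap: {idle_gap}C, full-load gap: {load_gap}C")
```

Because dust would shift every release by roughly the same amount, it cancels out of both differences.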

That is a huge difference and the only source of that difference is the software. If I were running these tests years apart, that would be a different thing, and it would be reasonable to blame dust for the difference, but I am not; instead I have a multi-boot setup and I am running the tests within ten minutes of each other, so hardware differences are ruled out because it is the same hardware and the same amount of dust.

Furthermore, on 10.10 when the temp gets down to around 55C, the Fan Shuts Off... therefore however much dust there might be, it is not even a factor for the temperature of 10.10 because under light load or idle it does not even use the fan. With the other os versions the fan never shuts off, but 12.04 can get pretty quiet, 13.04 the fan is always loud even at idle.

so there is indeed a very serious problem here.

I started out thinking this was a video driver issue, because I started out with the observation that at idle up to moderate loads the video was consistently 2 to 3 degrees hotter than the other temps. However, in further testing I observed that at high loads the other temperatures greatly exceeded the video temperature. So at that point I back-tracked on my assumptions about the video driver being the sole culprit.

It has been my experience as a programmer that when one is presented with a complex set of symptoms, it is usually the result of multiple bugs appearing to be a single problem. So, I'm inclined to suspect that there is a problem with the radeon but that there is also a problem with the cpu scheduler too.

Right now I am working on two things. The first thing is that I have done a fresh clean/new install of 12.04, because my main version of 12.04 has had a lot of changes made, so I want a pristine os for testing. The other chief advantage is that on the stock os, lm-sensors is able to read the radeon temperature, but on my main 12.04 lm-sensors is no longer able to read the radeon temp. So, with lm-sensors working properly we can have a direct com...


Revision history for this message
madbiologist (me-again) wrote :

The better power management described in comment #6 is still coming, but in the meantime you might like to try some of the basic ATI/AMD Radeon power management settings described at http://wiki.x.org/wiki/RadeonFeature#KMS_Power_Management_Options
See also http://www.x.org/wiki/radeonBuildHowTo#radeon-KMS_power-management
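
As a rough sketch of what those basic settings look like in practice: the radeon KMS driver exposes `power_method` and `power_profile` files under the card's sysfs device directory. The helper names below are hypothetical, the `card0` path is an assumption (the card index can differ), and writing to the real files requires root.

```python
from pathlib import Path

# Profile names accepted by the radeon "profile" power method.
VALID_PROFILES = {"default", "auto", "low", "mid", "high"}


def read_power_state(device_dir="/sys/class/drm/card0/device"):
    """Return the current power_method and power_profile as a dict."""
    base = Path(device_dir)
    return {name: (base / name).read_text().strip()
            for name in ("power_method", "power_profile")}


def set_power_profile(profile, device_dir="/sys/class/drm/card0/device"):
    """Switch to the profile method and select a profile (needs root on real sysfs)."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    base = Path(device_dir)
    (base / "power_method").write_text("profile\n")
    (base / "power_profile").write_text(profile + "\n")
```

For example, `set_power_profile("low")` forces the lowest clocks, which trades performance for lower temperature. The change does not persist across reboots.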

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed
Revision history for this message
madbiologist (me-again) wrote :

The better power management for AMD/ATI Radeon R600 and newer hardware (as described in comment #6) is finally available in the upstream 3.11 linux kernel. The first release candidate (3.11-rc1) of the 3.11 kernel is available at http://kernel.ubuntu.com/~kernel-ppa/mainline/ and instructions on how to install and uninstall it are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds

To use this power management for the AMD/ATI Radeon you will need to select it at boot by adding radeon.dpm=1 to your GRUB kernel boot options as described at https://help.ubuntu.com/community/Grub2/Troubleshooting#Editing_the_GRUB_2_Menu_During_Boot
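
After rebooting, one way to confirm the option was actually passed is to look at the kernel command line in `/proc/cmdline`. The sketch below uses hypothetical helper names and only checks that the parameter was given; it does not prove that DPM is active on the hardware.

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def dpm_requested(cmdline):
    """True if radeon.dpm=1 appears on the given kernel command line."""
    return kernel_param(cmdline, "radeon.dpm") == "1"


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("radeon.dpm=1 requested:", dpm_requested(f.read()))
```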

Revision history for this message
madbiologist (me-again) wrote :

See the blog post at http://www.botchco.com/agd5f/?p=57 for further information.

Unlike the older dynpm method, the new DPM method works with multiple monitors and there shouldn't be any flickering as the performance level changes are handled by dedicated hardware rather than the driver.

Revision history for this message
madbiologist (me-again) wrote :

I neglected to mention that to use the new power management feature on R700 and newer hardware (other than APUs) requires installation of the latest AMD graphics microcode (ucode) files to /lib/firmware/radeon
These are available at http://people.freedesktop.org/~agd5f/radeon_ucode/
Get the version ending in "smc".
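
A quick way to see which of these files are already installed is sketched below. It assumes the files follow a `<CHIP>_smc.bin` naming pattern; that pattern and the helper name are assumptions for this example, so adjust the glob if your files are named differently.

```python
from pathlib import Path


def smc_firmware(firmware_dir="/lib/firmware/radeon"):
    """List installed firmware files whose names end in '_smc.bin' (assumed pattern)."""
    return sorted(p.name for p in Path(firmware_dir).glob("*_smc.bin"))


if __name__ == "__main__":
    found = smc_firmware()
    print("\n".join(found) if found else "no smc firmware files found")
```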

R700 basically means Radeon HD 4000 series and newer. However note that according to Wikipedia and http://xorg.freedesktop.org/wiki/RadeonFeature/#index5h2 the Mobility Radeon HD 4225/4250 is a RV620 chip, so anyone with one of those shouldn't need the updated firmware files.

Revision history for this message
madbiologist (me-again) wrote :

Kernel 3.11.0-1.4 (based on the upstream 3.11-rc4 kernel) is now available in Ubuntu 13.10 "Saucy Salamander". Also, kernel 3.11.0-2.5 is in Saucy-proposed, which is based on the upstream 3.11-rc5 kernel, which has some bugfixes for the new DPM method.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
madbiologist (me-again) wrote :

The currently under-development 3.13 upstream kernel enables DPM by default (without needing the radeon.dpm=1 boot parameter I mentioned above) for Radeon HD 4000 through Radeon HD 7000 series graphics processors but with some specific ASICs being excluded.

Revision history for this message
codeslinger (codeslinger) wrote :

Hi madbiologist, thank you very much for your persistence. (somehow launchpad no longer shows this bug in my list...)
I did try some of your suggestions above, but nothing really seemed to help.

I am happy to report that UbuntuStudio 14.04 BETA 1 (Alpha) AMD64 is greatly improved compared to the 13.x versions.

it is still nowhere near as cool as it was in 10.10, but the 14.04 temperature is similar to but not quite as good as what I used to see in 12.04. It runs several degrees hotter than 12.04, but a lot cooler than 13.x.

After the recent round of updates, 12.04 is now also running much hotter :-( so I was getting desperate for a fix.

I am happy to say that 14.04 is running cool enough to be usable, the range is mid 60's to low 80's, whereas I was never able to use 13.x due to overheating.

10.10 is proof that it could still run much cooler (low 50's), but 10.10 had other problems (audio, bluetooth).

so overall this is a huge win.
