Critical temperature reached (100 C), shutting down. throttling too little, too late.

Bug #563815 reported by Wolfgang Kufner
This bug affects 8 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

[Workaround 1 (quick)]
As a workaround, the maximum CPU frequency can be limited. As root:
echo 1333000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo 1333000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
To check the current maximum frequency:
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
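
The same limit can be applied to every core in one loop instead of naming cpu0 and cpu1 by hand. This is a minimal sketch, assuming the standard cpufreq sysfs layout. The function name limit_max_freq and the CPU_ROOT override are made up for illustration, and on a real system it has to run as root.

```shell
#!/bin/sh
# Hypothetical helper: cap scaling_max_freq (in kHz) on every CPU that exposes cpufreq.
# CPU_ROOT is an illustrative override so the function can be tried on a scratch directory.
limit_max_freq() {
    khz=$1
    root=${CPU_ROOT:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        echo "$khz" > "$f"
    done
}

# Example (as root): limit_max_freq 1333000
```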

[Workaround 2 (preferred)]
Clean the notebook. In the long run there is no way around this if the notebook is supposed to run at more than 1 GHz under 100% load: dust cakes heavily between the fan and the cooling fins.
Open the main service hatch with a screwdriver, then remove the two screws holding the fan. The fan can then be moved aside and the dust cake easily removed.
Take care, e.g. with regard to static electricity and loss of warranty. The risk is entirely your own.
Result of the cleaning operation: the temperature settles at a mere 70°C at 100% load (at the full 2.17 GHz) on a flat table at about 25°C room temperature.
Conclusion: It would still be nice if temperature control were more aggressive. This would conceivably also help when the notebook is not on a hard, flat surface and cooling is therefore hampered. Temperature control measures at over 90°C would never get in the way of normal operation anyway, since the temperature does not get that high then.

[Report]
Under heavy use the Acer Extensa 5630Z always overheats and shuts down. I noticed this in lucid, but karmic shows the same basic problem.
The main culprit seems to be that the clock frequency is reduced too little and too late.
It should reduce clock frequency more/sooner to avoid the forced shutdowns.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-20-generic 2.6.32-20.30
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-20.30-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-20-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: wolfgang 1230 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf4800000 irq 21'
   Mixer name : 'Intel G45 DEVCTG'
   Components : 'HDA:10ec0268,1025013c,00100101 HDA:14f12c06,10250093,00100000 HDA:80862802,80860101,00100000'
   Controls : 21
   Simple ctrls : 12
Date: Thu Apr 15 15:08:59 2010
HibernationDevice: RESUME=UUID=d4760b83-2892-4b31-bf01-cb6c7d267f1f
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100226)
MachineType: Acer Extensa 5630
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-20-generic root=/dev/sda7 i915.powersave=0 mmc_removable=0 thermal.psv=80 quiet splash
ProcEnviron:
 LANG=en_US.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.33
RfKill:
 0: acer-wireless: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
StagingDrivers: rt2860sta
Title: [STAGING]
dmi.bios.date: 12/05/2008
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: V1.25
dmi.board.name: Homa
dmi.board.vendor: Acer
dmi.board.version: Rev
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrV1.25:bd12/05/2008:svnAcer:pnExtensa5630:pvr0100:rvnAcer:rnHoma:rvrRev:cvnAcer:ct10:cvrN/A:
dmi.product.name: Extensa 5630
dmi.product.version: 0100
dmi.sys.vendor: Acer

Wolfgang Kufner (wolfgangkufner) wrote :

I have stressed the cpu with two instances of:
cat /dev/zero > /dev/null

and logged this till forced shutdown (the stressing begins later than the log):
watch -n 1 "date>>temperature;grep . /proc/acpi/thermal_zone/TZS0/temperature>>temperature;grep . /proc/acpi/thermal_zone/TZS1/temperature>>temperature;grep . /sys/devices/system/cpu/cpu*/cpufreq/*>>temperature;dmesg |grep ACPI>>temperature"

The last log entry shows frequency being reduced to minimum, but by that time it is too late:
scaling_cur_freq:1000000
scaling_max_freq:1299600
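
The watch one-liner above can also be written as a small loop, which is easier to read and adjust. This is a sketch, not what produced the attached logs. It assumes the sysfs thermal interface (/sys/class/thermal/thermal_zone*/temp, reported in millidegrees Celsius) rather than /proc/acpi/thermal_zone, and the names log_sample, THERMAL_ROOT and CPU_ROOT are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print one sample of date, zone temperatures and CPU frequencies.
# THERMAL_ROOT and CPU_ROOT are illustrative overrides for trying this on scratch directories.
log_sample() {
    troot=${THERMAL_ROOT:-/sys/class/thermal}
    croot=${CPU_ROOT:-/sys/devices/system/cpu}
    date
    for z in "$troot"/thermal_zone*/temp; do
        [ -r "$z" ] || continue
        milli=$(cat "$z")
        echo "$z: $((milli / 1000)) C"      # sysfs reports millidegrees Celsius
    done
    for f in "$croot"/cpu[0-9]*/cpufreq/scaling_cur_freq \
             "$croot"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -r "$f" ] || continue
        echo "$f: $(cat "$f")"
    done
}

# Example: append one sample per second until interrupted or shut down.
# while true; do log_sample >> temperature; sleep 1; done
```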

Wolfgang Kufner (wolfgangkufner) wrote :

Forgot to add that I booted with thermal.psv=80 to reduce the trip point. That did not help.

grep . /proc/acpi/thermal_zone/TZS*/*
/proc/acpi/thermal_zone/TZS0/cooling_mode:<setting not supported>
/proc/acpi/thermal_zone/TZS0/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/TZS0/state:state: ok
/proc/acpi/thermal_zone/TZS0/temperature:temperature: 53 C
/proc/acpi/thermal_zone/TZS0/trip_points:critical (S5): 98 C
/proc/acpi/thermal_zone/TZS0/trip_points:passive: 80 C: tc1=0 tc2=50 tsp=0 devices=CPU0 CPU1
/proc/acpi/thermal_zone/TZS1/cooling_mode:<setting not supported>
/proc/acpi/thermal_zone/TZS1/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/TZS1/state:state: ok
/proc/acpi/thermal_zone/TZS1/temperature:temperature: 53 C
/proc/acpi/thermal_zone/TZS1/trip_points:critical (S5): 98 C
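
On kernels where /proc/acpi/thermal_zone is no longer available, the trip points are exposed under /sys/class/thermal instead. A sketch under that assumption follows; the function name show_trip_points and the THERMAL_ROOT override are illustrative, and the values there are in millidegrees Celsius.

```shell
#!/bin/sh
# Hypothetical helper: list each zone's trip points as "zone type temperature".
show_trip_points() {
    troot=${THERMAL_ROOT:-/sys/class/thermal}
    for t in "$troot"/thermal_zone*/trip_point_*_temp; do
        [ -r "$t" ] || continue
        type_file=${t%_temp}_type           # trip_point_N_temp -> trip_point_N_type
        zone=$(basename "$(dirname "$t")")
        kind=unknown
        if [ -r "$type_file" ]; then
            kind=$(cat "$type_file")
        fi
        echo "$zone $kind $(( $(cat "$t") / 1000 )) C"
    done
}
```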

Wolfgang Kufner (wolfgangkufner) wrote :

sudo cat /proc/acpi/dsdt > dstd.dat

Wolfgang Kufner (wolfgangkufner) wrote :

sudo acpidump > acpi.dat

Wolfgang Kufner (wolfgangkufner) wrote :

Kernel log for the overheating session (the bug was filed with ubuntu-bug in the session _after_ that).

Wolfgang Kufner (wolfgangkufner) wrote :

Correction to comment #3: The hand-logged overheat session was _not_ booted with thermal.psv=80. Therefore in that session the trip point was higher:
/proc/acpi/thermal_zone/TZS0/trip_points:passive: 95 C: tc1=0 tc2=50 tsp=0 devices=CPU0 CPU1
Sorry for the confusion.

Wolfgang Kufner (wolfgangkufner) wrote :

Stuff that happened in other sessions:

Quite often the temperature reading seems to be confused. In one session, one of the two readings dropped to 30°C and stayed there until just before the forced shutdown. From that log (attached thermal.ondemand.log):

Wed Apr 14 21:14:16 CEST 2010
temperature: 96 C
temperature: 96 C
Wed Apr 14 21:14:17 CEST 2010
temperature: 30 C
temperature: 96 C
...
Wed Apr 14 21:15:58 CEST 2010
temperature: 30 C
temperature: 99 C
Wed Apr 14 21:15:59 CEST 2010
temperature: 100 C
temperature: 98 C

But it turned out that overheating will also occur without this confused sensor reading.
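
Such implausible jumps can be picked out of a log mechanically. The filter below is only a sketch: the function name flag_jumps and the 20 C threshold are arbitrary choices, and it assumes the log has exactly two "temperature: N C" lines per timestamp (one per zone), as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical check: flag samples where a zone's reading changes by more than 20 C
# between consecutive timestamps. Reads the log on standard input.
flag_jumps() {
    awk '
        /^temperature:/ {
            zone = n % 2            # readings alternate between the two zones
            n++
            if (zone in last) {
                d = $2 - last[zone]
                if (d > 20 || d < -20) print "zone " zone ": " last[zone] " -> " $2
            }
            last[zone] = $2
        }'
}

# Example: flag_jumps < thermal.ondemand.log
```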

Wolfgang Kufner (wolfgangkufner) wrote :

With the powersave governor the temperature never rises above 71°C, even under full load. This shows that with aggressive enough frequency reduction a thermal emergency shutdown need never occur (at least as long as the fan is getting air).
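
Switching governors amounts to writing the governor's name to each core's scaling_governor file. A minimal sketch, assuming the standard cpufreq sysfs layout and root privileges; set_governor and CPU_ROOT are illustrative names. It writes only if the kernel lists the governor in scaling_available_governors.

```shell
#!/bin/sh
# Hypothetical helper: select a cpufreq governor on every core that offers it.
set_governor() {
    gov=$1
    root=${CPU_ROOT:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue
        # Only write if the kernel lists this governor as available for the core.
        if grep -qw "$gov" "$d/scaling_available_governors" 2>/dev/null; then
            echo "$gov" > "$d/scaling_governor"
        else
            echo "governor $gov not available in $d" >&2
        fi
    done
}

# Example (as root): set_governor powersave
```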

ceg (ceg) wrote :

see also Bug #370173

Jeremy Foshee (jeremyfoshee) wrote :

Hi Wolfgang,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
description: updated
Wolfgang Kufner (wolfgangkufner) wrote :

Just did the testing with the mainline build 2.6.35rc1 in maverick and also with the stock maverick kernel.

Same picture as before: scaling_max_freq goes down to 1.7 GHz when things get hot, but that is not enough. The temperature rises further, yet no further automatic reduction of scaling_max_freq occurs (e.g. to 1.3 or even 1.0 GHz), at least not before the very last seconds. A forced overheat shutdown follows.
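
The behaviour asked for here, stepping scaling_max_freq down further as the temperature keeps rising, can be approximated from userspace. The sketch below only illustrates that idea and is not a kernel fix. The thresholds and the function name pick_max_freq are assumptions, and the frequency steps are placeholders loosely based on the values mentioned in this report; real steps should come from scaling_available_frequencies where the driver provides it.

```shell
#!/bin/sh
# Hypothetical policy: map a temperature in whole degrees C to a frequency cap in kHz.
# Thresholds and frequency steps are placeholders, not values taken from the kernel.
pick_max_freq() {
    temp_c=$1
    if   [ "$temp_c" -ge 95 ]; then echo 1000000
    elif [ "$temp_c" -ge 90 ]; then echo 1333000
    elif [ "$temp_c" -ge 85 ]; then echo 1667000
    else                            echo 2167000
    fi
}

# Example loop (as root), reading zone 0 of the sysfs thermal interface:
# while sleep 1; do
#     t=$(( $(cat /sys/class/thermal/thermal_zone0/temp) / 1000 ))
#     cap=$(pick_max_freq "$t")
#     for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq; do
#         echo "$cap" > "$f"
#     done
# done
```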

tags: removed: kj-expired needs-upstream-testing
Changed in linux (Ubuntu):
status: Expired → New
tags: added: maverick
Wolfgang Kufner (wolfgangkufner) wrote :

Just tried boot parameter acpi_osi=Linux on 2.6.35rc1 mainline in maverick. No difference.

Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
tags: added: kernel-needs-review kernel-therm
Andy Whitcroft (apw)
tags: added: kernel-candidate kernel-reviewed
removed: kernel-needs-review
Andy Whitcroft (apw)
summary: - [STAGING] Critical temperature reached (100 C), shutting down.
- throttling too little, too late.
+ Critical temperature reached (100 C), shutting down. throttling too
+ little, too late.
tags: removed: kernel-candidate
description: updated
Wolfgang Kufner (wolfgangkufner) wrote :

I did a run with those new maverick-fwts testing images. "Changing passive trip point seems uneffective for Zone TZS1." looks like it might be relevant. All the rest is attached.

cameleon (el-cameleon-1) wrote :

I got the same problem with a fresh install of Natty on a Dell 1525N notebook, for instance when I watch a flash video within Firefox.

I can read in the syslog:
"Aug 8 22:36:21 Inspiron-1525 kernel: [12219.014818] Critical temperature reached (99 C), shutting down."

I think the user should at least be informed of the problem by a message or notification just before the shutdown, because otherwise there is no way to guess the cause of the crash.

Alexander Hunziker (alex-hunziker) wrote :

I have the same problem on a Thinkpad T400s. I had Lenovo replace the entire fan with a brand new one, but this didn't change the situation at all.

Alexander Hunziker (alex-hunziker) wrote :

In my case it turned out that the mainboard of the machine was faulty. Lenovo installed a new one and the machine behaves now.

Ken Hanks (kleeh) wrote :

I think it's worth noting that in my case (Toshiba A305 Satellite) and in the case of others that I've read on this and other forums, this overheating thing is *not* an issue with Windows 7 or Windows Vista. So am I going to have to revert to trying to use Ruby on Windows?? Please don't make me do that...

madbiologist (me-again) wrote :

Hi Ken. As you have different hardware you might have a different bug. In fact, if your Toshiba A305 Satellite has ATI Radeon 3650 512MB graphics like some other A305s, then a significant amount of the heat might be coming from the graphics hardware. The 2.6.35 kernel used in Ubuntu 10.10 introduced some basic ATI power management which can help reduce heat, but it's still not as good as the proprietary Catalyst (fglrx) driver. See http://www.x.org/wiki/radeonBuildHowTo#radeon-KMS_power-management for instructions on how to enable and use these power management settings.

penalvch (penalvch) wrote :

Wolfgang Kufner, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily kernel folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11.1

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired