Ubuntu
acpi-support package

Bug #22336
Comment #74

Comment 74 for bug 22336

eBobster (ebobster) wrote on 2006-11-07: Re: laptop overheats when performing CPU intensive tasks.

More data for the problem:
Compaq V4000 (laptop), Centrino 1.73 GHz. Max CPU temperature is 100 C (from Intel).
Using Ubuntu Dapper. Under high load the system claims the critical temperature has been reached and halts. dmesg shows the CPU reached 102 C.
Using powersave and kpowersave

For me this problem started out of the blue - not following any kernel change or particular apt-get updates.

I use "watch -n 1 acpitool -tfc" to keep watch over the system at the moment. Noteworthy entries are:

  Fan                    : <not available>
  Throttling control     : no
  Limit interface        : no
  critical (S5):           100 C
  passive:                 95 C: tc1=2 tc2=5 tsp=300 devices=0xdffea660

The first three seem wrong for a Centrino, but I guess that is a problem with the ACPI interface to the BIOS here.

The fan (there is one) responds autonomously, so it is probably BIOS controlled. Does the above really matter, then?

During something like a kernel compile I see the CPU temperature hovering between 80 and 100 C. Passive cooling kicks in every now and again.

The polling interval was set to 2. I changed it to 30 and observed that the CPU temperature sometimes spiked to 102, 105 or 107 C, but for no more than 1 second before dropping straight back below 100. There was no instability, so could it be a glitch?

Even with the interval at 30 seconds, Linux will sometimes catch a 100+ reading at a poll and halt.
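A rough way to see why a longer interval makes the halts rarer but does not remove them: the sketch below estimates the chance that a single poll lands inside one short spike. It assumes each poll is an instantaneous reading and that the spike starts at a random moment relative to the polls; neither assumption comes from the kernel source, and the function name is made up for illustration.

```python
def catch_probability(spike_seconds, poll_interval_seconds):
    """Chance that at least one instantaneous poll falls inside a single spike.

    Assumes the spike begins at a uniformly random offset within the polling
    period (an illustrative model, not how the kernel is documented to behave).
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return min(1.0, spike_seconds / poll_interval_seconds)


# A 1-second spike against the intervals mentioned in this comment.
for interval in (2, 5, 30):
    print(interval, round(catch_probability(1.0, interval), 3))
```

Under these assumptions a 1-second spike is seen about half the time at an interval of 2 and about 3% of the time at 30, which fits with the occasional halt still happening.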

My conclusions:

The polling is far too rigid. Perhaps it should average over a longer interval, or require a sustained critical temperature before halting the system (and make this user configurable under /proc/acpi/, as the rest is). I like the idea of the polling interval being 2, but my system would be fine if the kernel only acted on the critical temperature when the CPU had been at 100+ for more than 3 of these intervals.
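The "sustained critical temperature" idea could look something like the sketch below. This is only an illustration of the proposal, not anything the kernel does; the class name, the parameters and the choice to reset the count on any reading below critical are all assumptions.

```python
class SustainedCriticalCheck:
    """Flag a critical condition only after several consecutive hot polls."""

    def __init__(self, critical_c=100, required_polls=3):
        self.critical_c = critical_c          # critical trip point in degrees C
        self.required_polls = required_polls  # polls that must be exceeded
        self.consecutive = 0                  # current run of hot readings

    def update(self, temp_c):
        """Feed one polled temperature; return True when a halt is warranted."""
        if temp_c >= self.critical_c:
            self.consecutive += 1
        else:
            # Any reading below critical resets the run, so brief spikes are ignored.
            self.consecutive = 0
        # "More than 3 intervals" means the fourth consecutive hot reading trips it.
        return self.consecutive > self.required_polls


check = SustainedCriticalCheck(critical_c=100, required_polls=3)
readings = [95, 102, 105, 99, 101, 101, 101, 101]
print([check.update(t) for t in readings])
```

With these readings the two-poll spike to 102 and 105 is ignored, and only the fourth consecutive reading at or above 100 returns True.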

The passive trip could be wrong, but that depends on the interpretation of the 100+ spikes.

I currently avoid the problem by changing things to:

  echo 5 > /proc/acpi/thermal_zone/THR0/polling_frequency
  echo 5 > /proc/acpi/thermal_zone/THR1/polling_frequency
  echo "110:102:90:60:50:40" > /proc/acpi/thermal_zone/THR0/trip_points

The 110 is meant to ride out the spike (which is rare); the 90 sets the passive trip point, which drops the CPU speed to 1.3 GHz while in the passive region.
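For anyone unsure which number is which in that string, here is a small sketch that splits it into named fields. The field order (critical, hot, passive, then the active levels) is my assumption, based on 110 being the critical value and 90 the passive one above; the function name is made up.

```python
def parse_trip_points(spec):
    """Split a colon-separated trip_points string into named temperatures (C).

    Assumed order: critical:hot:passive:active0:active1:...
    """
    values = [int(part) for part in spec.strip().split(":")]
    if len(values) < 3:
        raise ValueError("expected at least critical:hot:passive")
    critical, hot, passive, *active = values
    return {"critical": critical, "hot": hot, "passive": passive, "active": active}


points = parse_trip_points("110:102:90:60:50:40")
print(points)
```

Read this way, 102 would be the "hot" trip point and 60, 50 and 40 the active cooling levels.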

Powersave and co. (I tried a few) seemed to be doing their job. (Note: Klaptop is the only thing that can successfully suspend to RAM for me.)

I believe my hardware (1+ year old, always working) is showing some minor cracks with the 100+ temperature spikes. But I also think the kernel could be more forgiving of them.

http://www.columbia.edu/~ariel/acpi/acpi_howto.html was a very good read.
