More data for the problem:
Compaq V4000 (laptop), Centrino 1.73 GHz. Max CPU temperature: 100 C (per Intel).
Using Ubuntu Dapper. Under high load the system would report that the critical temperature had been reached and halt. dmesg shows the CPU reached 102 C.
Using powersave and kpowersave.
For me this problem started out of the blue, not following any kernel change or particular apt-get updates.
I use "watch -n 1 acpitool -tfc" to keep an eye on the system at the moment. Noteworthy entries are:
Fan : <not available>
Throttling control : no
Limit interface : no
critical (S5): 100 C
passive: 95 C: tc1=2 tc2=5 tsp=300 devices=0xdffea660
The first three seem wrong for a Centrino, but I guess that is a problem with the ACPI interface to the BIOS here.
The fan (there is one) responds autonomously, so it is probably BIOS-controlled. Does the above really matter, then?
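For reference, the tc1, tc2 and tsp values in the passive line above are the ACPI passive cooling constants. As far as I understand the ACPI spec, once per sampling period (tsp, given in tenths of a second, so 300 means 30 s) the OS evaluates dP = tc1 * (Tn - Tn-1) + tc2 * (Tn - Tt), where Tn is the current temperature, Tn-1 the previous sample and Tt the passive trip point. Linux appears to use only the sign of this value as a trend (positive means throttle a step further). A minimal sketch of that arithmetic; passive_delta and tsp_seconds are hypothetical helpers, not part of acpitool or the kernel:

```shell
#!/bin/sh
# Sketch of the ACPI passive cooling equation (hypothetical helper):
#   dP = tc1 * (Tn - Tn_prev) + tc2 * (Tn - Tt)
# Positive result: throttle harder. Negative result: allowed to relax.
passive_delta() {
    t_now=$1    # current temperature, C
    t_prev=$2   # temperature one sampling period ago, C
    t_trip=$3   # passive trip point, C (95 on this machine)
    tc1=$4      # thermal constant 1 (2 here)
    tc2=$5      # thermal constant 2 (5 here)
    echo $(( tc1 * (t_now - t_prev) + tc2 * (t_now - t_trip) ))
}

# tsp is reported in tenths of a second; convert to whole seconds.
tsp_seconds() {
    echo $(( $1 / 10 ))
}

# Rising from 95 C to 97 C against the 95 C passive trip:
passive_delta 97 95 95 2 5   # prints 14
tsp_seconds 300              # prints 30
```

With this machine's constants, a climb from 95 to 97 C gives a positive trend, so passive cooling would tighten, and it is only re-evaluated every 30 seconds.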
During something like a kernel compile I would see the CPU temperature hovering between 80 and 100 C. Passive cooling would kick in every now and again.
The polling interval was set to 2 seconds. I changed it to 30 and observed that the CPU temperature sometimes spiked to 102, 105 or 107 C, but for no more than 1 second, then immediately dropped back below 100. There was no instability, so could it be a glitch?
Even with the 30-second interval, a poll sometimes lands on a 100+ reading and Linux halts.
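A rough way to see why lengthening the polling interval makes halts rarer but not impossible: assuming a poll lands at a random moment relative to a spike, the chance that a single poll falls inside the spike is about spike duration / polling interval. The sketch below computes that as an integer percentage; catch_percent is a hypothetical helper, and the 1-second duration comes from the observation above.

```shell
#!/bin/sh
# Rough odds that one poll lands inside a short temperature spike,
# assuming the poll phase is random relative to the spike:
#   P(caught) = spike_duration / poll_interval   (for spike <= interval)
# catch_percent is a hypothetical helper for illustration only.
catch_percent() {
    spike=$1      # spike duration in seconds (about 1 s observed here)
    interval=$2   # polling interval in seconds
    if [ "$spike" -ge "$interval" ]; then
        echo 100  # a spike at least as long as the interval is always seen
    else
        echo $(( 100 * spike / interval ))  # integer percent, rounded down
    fi
}

catch_percent 1 2    # prints 50
catch_percent 1 30   # prints 3
```

So a 1-second spike is caught about half the time at a 2-second interval and roughly 3% of the time at 30 seconds; over a long compile with many spikes, one of them still gets sampled eventually.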
My conclusions:
The polling is far too rigid. Perhaps it should average readings over a separate interval, or require a sustained critical temperature before shutting down the system (user-configurable under /proc/acpi/, like the rest). I like a polling interval of 2, but my system would be fine if the kernel only acted on the critical temperature when the CPU stayed at 100+ for more than 3 consecutive intervals.
The passive trip point could be wrong, but that depends on how the 100+ spikes are interpreted.
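The "more than 3 consecutive intervals" idea could look something like the following. It is a minimal sketch of a debounce on the critical trip, assuming one reading arrives per poll; should_halt is a hypothetical name (the kernel has no such setting), the sample readings are illustrative rather than a real log, and the averaging alternative is not shown.

```shell
#!/bin/sh
# Sketch of the proposed policy: only treat the critical trip as real when
# the temperature stays at or above it for more than LIMIT consecutive polls.
# should_halt is hypothetical; nothing like it exists under /proc/acpi/.
# Usage: should_halt CRITICAL LIMIT T1 T2 T3 ...   (readings in poll order)
# Prints "halt" at the first poll where the streak exceeds LIMIT, else "ok".
should_halt() {
    critical=$1; limit=$2; shift 2
    streak=0
    for t in "$@"; do
        if [ "$t" -ge "$critical" ]; then
            streak=$(( streak + 1 ))
        else
            streak=0   # any reading below critical resets the count
        fi
        if [ "$streak" -gt "$limit" ]; then
            echo halt
            return 0
        fi
    done
    echo ok
}

# Isolated one-poll spikes (like the 102/105/107 readings) never build a streak:
should_halt 100 3 98 102 97 105 96 107 99   # prints ok
# A sustained overheat does:
should_halt 100 3 99 101 102 103 104        # prints halt
```

Since a 1-second spike can cover at most one 2-second poll, this rule would ignore the observed spikes while still reacting to a real sustained overheat within about 8 seconds.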
I currently avoid the problem by changing things to:
echo 5 > /proc/acpi/thermal_zone/THR0/polling_frequency
echo 5 > /proc/acpi/thermal_zone/THR1/polling_frequency
echo "110:102:90:60:50:40" > /proc/acpi/thermal_zone/THR0/trip_points
The 110 is meant to ride out the (rare) spike; the 90 sets the passive trip point, which takes the CPU speed down to 1.3 GHz while in the passive region.
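The trip_points string written above packs several trip points into one colon-separated value. As far as I know, the old /proc/acpi/thermal_zone/*/trip_points handler reads the fields in the order critical:hot:passive:active0:active1:..., all in degrees C, which would make the 102 the "hot" trip. The sketch below only labels the fields of such a string and writes nothing to /proc; label_trip_points is a hypothetical helper.

```shell
#!/bin/sh
# Label the fields of a trip_points string, assuming the order
# critical:hot:passive:active0:active1:... (degrees C).
# label_trip_points is hypothetical; it only prints, it writes nothing.
label_trip_points() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # split the argument on ':' into $1, $2, ...
    IFS=$old_ifs
    echo "critical=$1"
    echo "hot=$2"
    echo "passive=$3"
    shift 3
    i=0
    for t in "$@"; do  # whatever remains are the active (fan) trip points
        echo "active$i=$t"
        i=$(( i + 1 ))
    done
}

label_trip_points "110:102:90:60:50:40"
```

For the settings above this prints critical=110, hot=102, passive=90, then active0=60, active1=50 and active2=40, one per line.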
Powersave and co. (I tried a few) seemed to be doing their job. (Note: Klaptop is the only thing that can successfully suspend to RAM for me.)
I believe my hardware (over a year old and always working) is showing some minor cracks with the 100+ temperature spikes. But I also think the kernel could be more forgiving of them.
http://www.columbia.edu/~ariel/acpi/acpi_howto.html was a very good read.