Comment 162 for bug 22336

Len Brown (len-brown) wrote : heat vs light

Thanks Thomas, for pointing out this bug report.

There are a lot of reports here, not all the same.

Note firstly that four people have said their problems went away
when they cleaned out their fan.

A bunch more said that their problems went away when they
disabled powernowd and instead used the in-kernel cpufreq governors.
This may explain the Ubuntu-specific aspect of a bunch
of these reports.

If you have cleaned out your fan, and you are able to reproduce the
issues using the in-kernel governors instead of userspace daemons,
then I urge you to file a bug (one bug per system model please)
in the upstream bugzilla:

http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
in the Power-thermal category.

But a couple more things need clarification:

There are two types of systems here, and the ways to address
them are totally different.

1. ACPI controlled fan that doesn't come on

If /proc/acpi/thermal_zone/*/trip_points
has lines that begin with "active", then
you have active trip-points and ACPI fan control.

If the temperature rises above these trip points
and the fan fails to come on, then you have an
ACPI fan issue. There were several famous ones,
particularly the HP nx6125 and HP nx6325, which
have a very "interesting" BIOS that Linux has only
recently started to learn to deal with.

Note that ACPI thermals should NOT require polling.
If you need to set the polling frequency in /proc/acpi/../thermal_zone/polling_frequency
to anything but 0, then there is a kernel bug
that you are not helping to fix by working around it.

(Indeed, one could argue that the availability of this
 override is itself a bug and that it should be removed.)

2. ACPI passive/critical trip points only

If you don't have active lines in your trip_points file,
but have just passive and critical, then you've
got motherboard controlled fans, and Linux
and ACPI have no influence on them.
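
To tell which of the two cases applies, a small script can scan the
trip_points text for lines that begin with "active". This is only a
sketch: the function name classify_trip_points and the sample file
contents are made up for illustration, and the exact layout of the
file may differ between kernels.

```python
import glob
import re


def classify_trip_points(text):
    """Classify trip_points text by the kinds of trip point it lists.

    Returns "acpi-fan" if any line begins with "active",
    "motherboard-fan" if only passive/critical lines are present,
    and "unknown" if no trip-point lines are recognised.
    """
    kinds = set()
    for line in text.splitlines():
        match = re.match(r"\s*(active|passive|critical)", line)
        if match:
            kinds.add(match.group(1))
    if "active" in kinds:
        return "acpi-fan"
    if kinds:
        return "motherboard-fan"
    return "unknown"


# Illustrative samples only; these are assumptions, not real dumps.
SAMPLE_ACPI = "critical (S5): 100 C\npassive: 95 C\nactive[0]: 80 C\n"
SAMPLE_BOARD = "critical (S5): 100 C\npassive: 95 C\n"

if __name__ == "__main__":
    # On a machine without this /proc interface the loop does nothing.
    for path in glob.glob("/proc/acpi/thermal_zone/*/trip_points"):
        with open(path) as f:
            print(path, classify_trip_points(f.read()))
```

With the samples above, SAMPLE_ACPI is classified as case 1 and
SAMPLE_BOARD as case 2.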

Yes, sometimes a BIOS upgrade changes the firmware
fan control policy. Sometimes there are also BIOS SETUP
options for cool vs quiet as well.

But in both ACPI and motherboard fan cases above,
where the fan is spinning fast
and the temperature continues to climb...
CLEAN YOUR FAN! Many laptops suck air in
from the bottom and blow it through a fine grill.
This grill can get blocked with dust, making your
fan ineffective. (No, this doesn't address the
"windows works, linux doesn't" cases.)

If you've got the thing open already, consider
checking that the cooling solution is properly
bolted down -- and when you check, consider
cleaning it off and using a dab of Arctic Silver
as the TIM when you replace it...

> ACPI: Looking for DSDT ... not found!

Ignore this -- it is Ubuntu "value add" from the patch they use to allow
overriding the DSDT with the initrd method.

> ondemand vs userspace governors

Switching to ondemand may lower the average power under typical use.
However, ondemand is the same as "performance" when
the machine is fully utilized, i.e. it may make a cooling
problem less common, but it doesn't address the worst case.

> customizing the DSDT

Don't bother. DSDT customization is primarily a debugging method.
If Linux can't handle the DSDT that came with your machine as well
as Windows does, then Linux is broken and needs to be fixed.

That said, sometimes, particularly with new machines, it is
a good idea to upgrade to the latest available BIOS
(which brings with it a new ACPI DSDT).
It is rare that any significant BIOS bugs are fixed
more than 6 months after the release of a new machine --
since the important ones are generally flushed
out in that period, and after that period the lead
developers are all busy working on something else:-)

> TM1 and TM2 on Intel processors

These are hardware throttling, and throttling with voltage reduction, respectively.
There is also a hardware critical poweroff feature in case these
are unable to control temperature.

However, if TM1/TM2 are being invoked, then something is _very_ wrong.
They should almost never be invoked on a properly designed system,
and they are invoked only at trip points higher than what ACPI
and Linux see.

> over-riding /proc/acpi/thermal_zone/.../trip_points

This is fiction -- and I'm currently proposing that the ability
to override these trip-points be disabled upstream to put
an end to the fiction.

Firstly, the trip-points you override change how Linux
applies policy to a given temperature, but they do
not actually change the trip points that are acted on
by the hardware. ACPI trip points are READ ONLY.
Only the firmware has the power to change them.

What this means is that if you set the OS copy of
the trip-points to something different from the
ACPI read-only trip-points, your overridden trip points
are NOT going to fire on their own. You would also need
to enable periodic polling for Linux to get temperature
updates around your OS-copy of the trip points.

And when the REAL trip-point fires and the BIOS
updates the trip-points to implement hysteresis,
your modifications will likely be over-written.
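
As a rough illustration of why an overridden trip point stays silent
without polling, here is a toy model. Everything in it is an
assumption made for the sake of the example: the function names, the
temperatures, and the simplification that the OS only reads the
temperature when the reading crosses the firmware trip point. It is
not how the kernel is implemented.

```python
def os_observations(temps, firmware_trip, polling=False):
    """Return the temperature readings the OS gets to see.

    Without polling, the OS is modelled as reading the temperature
    only when the firmware notifies it, i.e. when a reading crosses
    firmware_trip. With polling, the OS sees every reading.
    """
    seen = []
    prev = temps[0]
    for temp in temps[1:]:
        crossed = (prev < firmware_trip) != (temp < firmware_trip)
        if polling or crossed:
            seen.append(temp)
        prev = temp
    return seen


def os_policy_fires(temps, os_trip, firmware_trip, polling=False):
    """True if the OS ever observes a reading at or above os_trip."""
    observed = os_observations(temps, firmware_trip, polling)
    return any(temp >= os_trip for temp in observed)


# Made-up readings: the machine warms up but stays below the
# firmware trip point of 80, while the OS copy was lowered to 65.
READINGS = [50, 60, 70, 75]

if __name__ == "__main__":
    print("no polling:", os_policy_fires(READINGS, 65, 80))
    print("polling:   ", os_policy_fires(READINGS, 65, 80, polling=True))
```

In this model the lowered OS trip point is only ever acted on when
polling is enabled, which is exactly the workaround argued against
earlier.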

Further, if you have ACPI controlled fans, you should be able
to echo 0 (on) and 3 (off) into /proc/acpi/fan/.../state files
to turn them on and off. Note that with multiple fans,
the entries often refer to the same fan at multiple speeds,
and you need to do this in a certain order.
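
For those who would rather script this than type echo by hand, a
minimal sketch follows. set_fan_state is a hypothetical helper, not
an existing tool; the caller has to supply the path to the state
file, since it depends on the machine, and writing to it normally
requires root.

```python
FAN_ON = "0"   # value the state file takes for "on"
FAN_OFF = "3"  # value the state file takes for "off"


def set_fan_state(state_path, on):
    """Write 0 (on) or 3 (off) to the given fan state file.

    Equivalent to `echo 0 > state_path` or `echo 3 > state_path`.
    Returns the value that was written.
    """
    value = FAN_ON if on else FAN_OFF
    with open(state_path, "w") as f:
        f.write(value + "\n")
    return value
```

With several fan entries, call it once per state file in whatever
order your machine needs.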

Finally, a rare machine has the ability to set the cooling_mode:
active or passive. This corresponds to the _SCP method,
and it tells the _firmware_ to change the _real_
trip points to make the fans kick in first (active)
or throttling kick in first (passive).

I hope this helps. I hope it doesn't come off as pedantic or
heavy-handed, but this thread is heavy on gripe session
and light on facts (much heat but little light),
which doesn't get us anywhere.

I hope to see some bug reports in bugzilla.kernel.org
for those of you who can reproduce your issues using
the latest upstream stable kernel (e.g. 2.6.21.y).

thanks,
-Len Brown
Linux Kernel ACPI Maintainer