ACPI: Unable to turn cooling device 'on' (Quadcore-AMD64, Ubuntu64)

Bug #314001 reported by schloegl
This bug affects 2 people
Affects          Status      Importance  Assigned to  Milestone
acpi (Ubuntu)    Invalid     Undecided   Unassigned
acpi (openSUSE)  New         Undecided   Unassigned
linux (Ubuntu)   Incomplete  Undecided   Unassigned

Bug Description

Binary package hint: acpi

kern.log shows this information:

...
Jan 2 09:09:45 schloegl-desktop kernel: [ 136.114080] sd 7:0:0:0: Attached scsi generic sg6 type 0
Jan 2 18:01:35 schloegl-desktop kernel: [32046.051225] usb 6-2: USB disconnect, address 2
Jan 2 18:06:37 schloegl-desktop kernel: [32348.396058] ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state changed
Jan 2 18:06:37 schloegl-desktop kernel: [32348.396083] Please send acpidump to <email address hidden>
Jan 2 18:06:37 schloegl-desktop kernel: [32348.396085] [20080609]
Jan 2 18:06:37 schloegl-desktop kernel: [32348.396540] ACPI: Critical trip point
Jan 2 18:06:37 schloegl-desktop kernel: [32348.396568] Critical temperature reached (71 C), shutting down.
Jan 2 18:06:37 schloegl-desktop kernel: [32348.396598] ACPI: Unable to turn cooling device [ffff88012fa34a60] 'on'
Jan 2 18:06:43 schloegl-desktop kernel: [32354.396226] Critical temperature reached (58 C), shutting down.
Jan 2 18:06:47 schloegl-desktop kernel: [32358.303777] ip6_tables: (C) 2000-2006 Netfilter Core Team
Jan 5 09:01:55 schloegl-desktop kernel: Inspecting /boot/System.map-2.6.27-9-generic
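
The critical-temperature events in a log excerpt like the one above can be pulled out with a short script. This is only a minimal sketch: the function name `critical_events` is made up for illustration, and the pattern assumes the message wording shown in this report.

```python
import re

# Matches kernel lines such as:
#   "... kernel: [32348.396568] Critical temperature reached (71 C), shutting down."
CRITICAL_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"Critical temperature reached \((?P<temp>\d+)\s*C\)"
)


def critical_events(lines):
    """Return (uptime_seconds, temperature_celsius) for each critical-temperature line."""
    events = []
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            events.append((float(match.group("uptime")), int(match.group("temp"))))
    return events
```

It could be fed the lines of kern.log (on Ubuntu usually /var/log/kern.log, which is an assumption here) to see how often the trip point fired and which temperatures were reported.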

Unfortunately I was unable to find the file acpidump.

This happened on my new quad-core amd64 machine running 64-bit Ubuntu, after I started a heavy computing batch (using Matlab, distributing the load across all 4 CPUs) before I left work on Jan 2nd. About 10 minutes later the machine shut itself down.

schloegl (alois-schloegl) wrote :

Attached is the acpidump.

Chris Coulson (chrisccoulson) wrote :

This is not an acpi bug. The acpi source package just contains some utilities for inspecting the ACPI subsystem. This bug should be assigned to the kernel only.

Changed in acpi:
status: New → Invalid
schloegl (alois-schloegl) wrote :

This issue is now discussed here:

http://bugzilla.kernel.org/show_bug.cgi?id=13573

schloegl (alois-schloegl) wrote :

Thanks to the effort of Thomas Renninger and Zhang Rui, we found a solution to the problem described above:

1) Change the BIOS setting "AMD K8 Cool&Quiet control" to "AUTO" (the default setting was "disabled").
2) Download and extract kernel 2.6.30.2, 2.6.31-rc3, or later.
3) Apply this patch: http://bugzilla.kernel.org/show_bug.cgi?id=13573#c37
   Then compile and install the kernel.
4) Edit /boot/grub/menu.lst and add "thermal.psv=65" to every line starting with "kernel /boot/vmlinuz ...", like this one:
        kernel /boot/vmlinuz-2.6.31-rc3-some-string-here ro quiet splash thermal.psv=65
5) Reboot with the new kernel.

6a) Run these commands as root:
    echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
    echo 10 >/proc/acpi/thermal_zone/*/polling_frequency
This is needed after every startup.

6b) To do this automatically at every boot, use the attached script /etc/init.d/cputhrottling
and run as root:
    chmod 0755 /etc/init.d/cputhrottling
    update-rc.d cputhrottling start 50 S .

These steps solved the described problem. An alternative solution might become viable in future with coreboot:
http://www.coreboot.org/pipermail/coreboot/2009-June/050040.html
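
Adding the boot parameter to every kernel line of menu.lst by hand is easy to get wrong, so the edit could be scripted. The following is a rough sketch, not the method used in this report: the helper name `add_kernel_param` is hypothetical, and it assumes GRUB legacy syntax in which active entries start with the word "kernel" and commented lines start with "#".

```python
def add_kernel_param(menu_text, param="thermal.psv=65"):
    """Append `param` to every active GRUB legacy 'kernel' line that lacks it."""
    updated = []
    for line in menu_text.splitlines():
        fields = line.split()
        # Only touch active kernel entries; commented lines begin with '#'.
        if fields and fields[0] == "kernel" and param not in fields[1:]:
            line = line.rstrip() + " " + param
        updated.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(updated) + ("\n" if menu_text.endswith("\n") else "")
```

The function only transforms text, so the result can be inspected before it is written back to /boot/grub/menu.lst as root. Running it twice does not add the parameter a second time.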

What is the take-home message?

A main cause of the problem was the BIOS:
1) According to http://bugzilla.kernel.org/show_bug.cgi?id=13573#c52,
the _PSL in the BIOS refers to a non-existent device.

2) The BIOS does not report a trip point change.

3) The default BIOS setting for "AMD K8 Cool&Quiet control" was "disabled" instead of "AUTO". This broke the ACPI conformance of the BIOS.
The problem became visible through this log message:
[ 4.114549] [Firmware Bug]: powernow-k8: Your BIOS does not provide ACPI
_PSS objects in a way that Linux understands. Please report this to the Linux
ACPI maintainers and complain to your BIOS vendor.
Although I contacted the vendor (where I bought the machine), no solution was found. Neither the vendor, nor the mainboard manufacturer, nor the mainboard manual (which describes the BIOS in detail) pointed this out. How is one supposed to know about this? I found out only after spending hours investigating the various BIOS settings.

This case seems to resemble the issues discussed here [1][2][3][4].
In the present case, it was not an AMI BIOS but a Phoenix Award BIOS. Neither the vendor
nor the mainboard manufacturer was able to provide any helpful support. It seems that they and the users are
held hostage by the BIOS provider. There is a real need to open up the BIOS.

[1] http://ubuntu-virginia.ubuntuforums.org/showthread.php?t=869249
[2] http://ubuntuforums.org/showthread.php?t=871311
[3] https://bugzilla.redhat.com/show_bug.cgi?id=456352
[4] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/251338

jajaX (jajaplanet) wrote :

Hi! (Sorry for my bad English.)

I have had the same "problem" for a short time with kernel 2.6.28-15 on Kubuntu 9.04 with an Intel Q6600.

kern.log =>

Aug 24 23:52:39 quadcore kernel: [53353.941816] ACPI: Critical trip point
Aug 24 23:52:39 quadcore kernel: [53353.941832] Critical temperature reached (121 C), shutting down.
Aug 24 23:52:39 quadcore kernel: [53353.941906] ACPI: Unable to turn cooling device [f6c1bba0] 'on'
Aug 24 23:52:45 quadcore kernel: [53360.140210] ACPI: Critical trip point
Aug 24 23:52:45 quadcore kernel: [53360.140223] Critical temperature reached (121 C), shutting down.
Aug 24 23:52:45 quadcore kernel: [53360.140622] ACPI: Unable to turn cooling device [f6c1bba0] 'on'
Aug 24 23:52:55 quadcore kernel: [53366.340220] ACPI: Critical trip point
Aug 24 23:52:55 quadcore kernel: [53366.340230] Critical temperature reached (121 C), shutting down.
Aug 24 23:52:55 quadcore kernel: [53366.340658] ACPI: Unable to turn cooling device [f6c1bba0] 'on'
Aug 24 23:52:57 quadcore kernel: [53372.540198] ACPI: Critical trip point
Aug 24 23:52:57 quadcore kernel: [53372.540207] Critical temperature reached (121 C), shutting down.
Aug 24 23:52:57 quadcore kernel: [53372.540220] ACPI: Unable to turn cooling device [f6c1bba0] 'on'
Aug 24 23:53:03 quadcore kernel: [53378.740238] ACPI: Critical trip point
Aug 24 23:53:03 quadcore kernel: [53378.740248] Critical temperature reached (121 C), shutting down.
Aug 24 23:53:03 quadcore kernel: [53378.740259] ACPI: Unable to turn cooling device [f6c1bba0] 'on'
Aug 24 23:53:10 quadcore kernel: [53384.940189] ACPI: Critical trip point
Aug 24 23:53:10 quadcore kernel: [53384.940199] Critical temperature reached (121 C), shutting down.
Aug 24 23:53:10 quadcore kernel: [53384.940211] ACPI: Unable to turn cooling device [f6c1bba0] 'on'

computer =>

abit Fatal1ty FP-IN9 SLI
Intel Core 2 Quad Q6600
2048 MB DDR2
2 Asus silent 8600 GT 512 MB
kubuntu Jaunty Jackalope 9.04 (32 bits) kde 4.3

Jeremy Foshee (jeremyfoshee) wrote :

Hi schloegl,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/lucid.

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 314001

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
schloegl (alois-schloegl) wrote : Re: [Bug 314001] Re: ACPI: Unable to turn cooling device 'on' (Quadcore-AMD64, Ubuntu64)

Hi Jeremy,

The problem is solved. This bug can be closed.

Alois

Al Bogner (ubuntu-forum) wrote :

I think the problem isn't solved.

I tried to install from a CD-RW burned from xubuntu-10.04-desktop-i386.iso and couldn't, because of "Critical temperature reached (121C), shutting down".

Since this is an old test machine, I have openSUSE 11.2 installed too. I could finish the install with openSUSE, but after a while I got the same message and the machine halted. Yet according to the BIOS, everything is OK with the temperature (below 50°C). With SUSE 11.2 I use 2.6.31.12-0.2-default.

So I assume it is a general kernel problem. Here are more details of the machine:
http://www.smolts.org/client/show/pub_de5ac227-d011-438c-8f2c-cbfac07a8e04

I also think there is a problem with the installer of xubuntu-10.04-desktop-i386.iso. I couldn't install yesterday from the DVD-ROM drive. When I try to install from the SCSI burner the system boots, but then /dev/sr0 cannot be found. It seems to work best with /dev/sr2. I will try again when it is cooler. I think the room temperature is right at the limit between working and not working.

schloegl (alois-schloegl) wrote :

Did you consider the possibility that the CPU is really overheating? In that case, you should be glad that the machine shut down and avoided further harm.

The hints from http://www.coreboot.org/pipermail/coreboot/2009-June/050104.html were most useful:
- Is the CPU cooler properly mounted, using a grease with good thermal conductivity?
- Is CPU throttling (dynamic frequency scaling) enabled in the BIOS? Make sure it is enabled.
Enabling CPU throttling and fitting a larger cooler solved the problem for me.

Other possibilities are:
- Did you check whether the CPU fan is present and running?
- It's unlikely but possible that the temperature sensor is not working correctly.

Al Bogner (ubuntu-forum) wrote :

Yes, I considered this, but I can't believe that the CPU is too hot. When I switch on the PC after it has been unused for hours, I see this message immediately when I try to install Xubuntu, although it doesn't stop there. I can go on for a while, but then the installation stops with a useless/undefined message and starts the live CD. I have tried it many times and it is always the same with Xubuntu.

I tried element OS, which is based on Xubuntu. The chances of a successful install are a lot better, but the installation may still halt. The last time it stopped during the nvidia install.

With openSUSE I get the message too, but the PC can run for hours. The last time it shut down during an update.

It looks like I am very close to some limit, but I don't know which. Maybe there is a relation to the room temperature. It is 23°C outside now. The room has about the same temperature.

The BIOS always reports temperatures that are far from critical.

After the element OS installation I have a system temperature of 40°C and a CPU temperature of 45°C according to the BIOS.

Do you think the BIOS temperatures are wrong? At least right after I switch the PC on, I trust the temperatures when it says below 40°C, and yet the warning with 121°C also appears. It is always 121°C, which is very strange.
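
One way to compare the BIOS readings with what the kernel sees is to read the thermal zones from a running system. The sketch below assumes a kernel that exposes the generic thermal sysfs interface under /sys/class/thermal, where each zone's `temp` file holds millidegrees Celsius. The function names are made up for illustration, and nothing here was run on the machines in this report.

```python
from pathlib import Path


def millidegrees_to_celsius(raw):
    """Convert a sysfs 'temp' value (millidegrees Celsius, as text) to degrees."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    readings = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            readings[temp_file.parent.name] = millidegrees_to_celsius(
                temp_file.read_text()
            )
        except (OSError, ValueError):
            # Skip zones that cannot be read or that report a non-numeric value.
            continue
    return readings
```

If the kernel reports the same value on every read while the BIOS shows normal temperatures, that would point toward the firmware's ACPI tables or the sensor rather than real overheating, though it would not prove it.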
