Nvidia GPU overheating on Toshiba P100

Bug #484875 reported by JP
This bug affects 5 people
Affects                               Status     Importance  Assigned to             Milestone
NVIDIA Drivers Ubuntu                 New        Undecided   auto-linux-nforce-bugs
  (Nominated for Trunk by Yann)
acpi (Ubuntu)                         Triaged    High        Unassigned
nvidia-graphics-drivers (Ubuntu)      Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-180 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

******* Problem *********
Toshiba Satellite P100 PSP-A3C laptop, Nvidia GeForce Go 7600, running Karmic 2.6.31-15, Nvidia proprietary driver 185.

GNOME-Sensors applet reports CPU stable at about 60 C, GPU climbs slowly up to about 90 C, then the system shuts down.

On 9.04, I had the same problem, but I was able to load a custom DSDT, after which the fan ran properly, and the system stayed cool, GPU around 65 C. Also stays cool under Windows XP.

I understand that it is not possible to load a custom DSDT under Karmic, so I need another solution.

******** Possible Fix **********
After some more hacking in the DSDT, Yann found that fan control is located in the ACPI EC (embedded controller) address space: the VTMP ACPI byte in the EC field, at offset 0x5E.

Yann found similar concerns on some Acer laptops (which have the same WMI-related ACPI methods, located in device AMW0), along with a Perl script that he was able to modify to set a constant fan speed. That Perl script is attached to this report.

Philip Muškovac (yofel)
affects: ubuntu → linux (Ubuntu)
Rakhmad Azhari (rakhmad-azhari) wrote :

I can confirm the same issue here. My laptop is an HP Pavilion dv5162ea with an NVIDIA GeForce Go 7400, running openSUSE 11.2 with proprietary driver 190.42.

CPU temperature climbs up fast from 24C to 50C in just 15 minutes. Programs running are Opera, ChoqoK, pidgin, and JDownloader.

Prior to openSUSE 11.2, I was running openSUSE 11.1 and the temperature was stable around 65C. Never had to load a custom DSDT.

WeatherGod (ben-v-root) wrote :

JP, thank you for taking time to report this issue. To help us better diagnose your issue, please follow the guidance in the following link: https://wiki.ubuntu.com/DebuggingACPI

Changed in linux (Ubuntu):
status: New → Incomplete
WeatherGod (ben-v-root) wrote :

Rakhmad, because you don't have *exactly* the same kind of laptop as JP has, it would be best to file your issue as a separate bug, following the guidance in the link I gave above.

JP (jparker-farcomm) wrote :

Output of uname -a

JP (jparker-farcomm) wrote :

Hibernated, restarted, I believe only the CPU fan is running now at very low speed

No such file as /var/log/kern.log.0

Tried to run:

sudo cp -r /proc/acpi /tmp

got:

cp: cannot open `/proc/acpi/event' for reading: Device or resource busy

JP (jparker-farcomm) wrote :

/proc/acpi/fan is empty and /proc/acpi/thermal_zone/*/trip_points has no active trip points
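
The check described above can be repeated with a short script. This is only a sketch: it assumes the old /proc/acpi/thermal_zone layout used by kernels of this era, where active trip points appear as lines starting with "active[N]:". The function names show_trip_points and has_active_trip are made up for illustration.

```shell
#!/bin/sh
# Print every trip_points file under a thermal-zone root. The root is a
# parameter so the function also works on a saved copy of /proc/acpi.
show_trip_points() {
    root="${1:-/proc/acpi/thermal_zone}"
    found=0
    for f in "$root"/*/trip_points; do
        [ -r "$f" ] || continue
        found=1
        printf '== %s ==\n' "$f"
        cat "$f"
    done
    [ "$found" -eq 1 ] || echo "no thermal zones found under $root"
}

# Succeed only if at least one zone defines an active (fan-driving) trip point.
# Assumption: such lines start with "active", e.g. "active[0]: 55 C: ...".
has_active_trip() {
    root="${1:-/proc/acpi/thermal_zone}"
    grep -qs '^active' "$root"/*/trip_points
}

# Usage:
#   show_trip_points
#   has_active_trip || echo "no active trip points: ACPI will not drive the fan"
```

If has_active_trip fails, as it does on JP's machine, ACPI has no temperature threshold at which it would switch a fan on, which matches the symptoms reported here.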

WeatherGod (ben-v-root) wrote :

JP, thank you for this information. It will be very useful. In the meantime, could you try out the new NVidia driver, version 190.42 that has recently been made available? I don't know if it would fix your problem, but it is worth a shot. Supposedly, there is some new code in there to handle fan speed.

JP (jparker-farcomm) wrote :

Tried 190.42, same result.

Also tried nvclock utility, it reports that my card does not support fan speed control.

WeatherGod (ben-v-root) wrote :

JP, thanks for testing that. Have you tried using the default drivers instead of the proprietary drivers offered by NVidia?

WeatherGod (ben-v-root) wrote :

Moving this bug report to the nvidia drivers package.

affects: linux (Ubuntu) → nvidia-graphics-drivers-180 (Ubuntu)
WeatherGod (ben-v-root) wrote :

JP, before switching over to the free/open drivers, I would like to forward this bug report to the nvidia people. They would like to have a particular log file included in the bug report that can be generated by running: sudo nvidia-bug-report.sh
It will produce a file called nvidia-bug-report.log in your current working directory. If you can upload that log file to this bug report, I can then push this report up to nvidia.

After that, feel free to test out the open/free drivers that are available.

JP (jparker-farcomm) wrote :

The open driver seems to be limited in that I can no longer switch to my external monitor. The fans are very quiet, which leads me to believe they are not running properly. Libsensors can no longer monitor the GPU temp, so I cannot tell when the machine is going to shut down.

Back to Windows XP for now I guess...

WeatherGod (ben-v-root)
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Incomplete → Confirmed
lintunen (lintunen) wrote :

I have had a similar problem since upgrading to Karmic on an Asus G1sN with an NVidia 9500M GS card: it overheats to the point that the machine shuts down. I tried upgrading to the latest version of the NVidia drivers (190.53 at the time), with no difference.

JP (jparker-farcomm) wrote :

Finally got around to installing Fedora 12 on this machine, it does NOT have the same issue running the NVidia proprietary drivers. I am running NVidia 190.45 at a steady comfortable 49 degrees C. This is without any BIOS DSDT hacks also, which were required to get NVidia drivers to work with Ubuntu 9.04.

Bryce Harrington (bryce)
tags: added: karmic
Yann (lostec) wrote :

This bug affects Lucid Lynx and the previous LTS (Hardy, kernel 2.6.24). The DSDT hack is no longer possible, as support for loading a custom DSDT has been removed since kernel 2.6.26.

This will be a major upgrade stopper for many laptops running previous 8.04 LTS...

Bryce Harrington (bryce) wrote :

[This is an automatic notification.]

Hi JP,

This bug was reported against an earlier version of Ubuntu, can you
test if it still occurs on Lucid?

Please note we also provide technical support for older versions of
Ubuntu, but not in the bug tracker. Instead, to raise the issue through
normal support channels, please see:

    http://www.ubuntu.com/support

If you are the original reporter and can still reproduce the issue on
Lucid, please run the following command to refresh the report:

  apport-collect 484875

If you are not the original reporter, please file a new bug report, so
we can work with you as the original reporter instead (you can reference
bug 484875 in your report if you think it may be related):

  ubuntu-bug xorg

If by chance you can no longer reproduce the issue on Lucid or if you
feel it is no longer relevant, please mark the bug report 'Fix Released'
or 'Invalid' as appropriate, at the following URL:

  https://bugs.launchpad.net/ubuntu/+bug/484875

Changed in nvidia-graphics-drivers (Ubuntu):
status: New → Incomplete
tags: added: needs-retested-on-lucid-by-june
Yann (lostec) wrote :

This bug can be solved in two ways:
1) Handling GPU temperature regulation in the nvidia driver, which matters especially on laptops, as OEM-specific drivers do on Windows.
2) DSDT hacks, which are no longer possible with the updated kernel in 10.04 LTS: unlike the one in 8.04, it no longer includes the custom-DSDT patch (for comparison, openSUSE 11.2 reintegrated this patch just after release, once users expressed their concern, and it will be in 11.3).

Option 1 may be quite tricky, as driver fan control can vary a lot depending on the hardware.
Option 2 can be done by users, usually by finding the regulation loop in the ACPI pseudo-code and the fan speed control register. The trick is to write a medium value to this register in an ACPI function that is called at init.

You lose the regulation, but can find a speed that does not cause overheating across a wide range of use (including gaming).

For instance, on my laptop the trick is:
in acpi method _REG,
adding:
Store (0x3C, \_SB.PCI0.LPCB.EC0.VTMP)

0x3C gives a medium speed that is not too noisy and has kept everything cool in Linux for about 4 years...

Cannot send you an apport log, as I cannot upgrade with current kernel: My GPU would overheat in half an hour of desktop use... and fry if I don't stop!

See here about openSuse dsdt patch integrated after release (11.2, kernel 2.6.31):
https://bugzilla.novell.com/show_bug.cgi?id=533555

Changed in nvidia-graphics-drivers (Ubuntu):
status: Incomplete → Confirmed
Yann (lostec) wrote :

After some more hacking in the DSDT, I found that fan control is located in the ACPI EC (embedded controller) address space: the VTMP ACPI byte in the EC field, at offset 0x5E.

I found similar concerns on some Acer laptops (which have the same WMI-related ACPI methods, located in device AMW0), along with a Perl script that I was able to modify to set a constant fan speed...

Here is the script, for 8.04 to 10.04 LTS users that found this issue and that have not already burned their GPU:

file: tosh_p100_ec.pl
***

#!/usr/bin/perl -w

# Version 1.0 (2010-05-27)
#
# *************************************************
# ** Toshiba P100 series GPU fan setup for Linux **
# ** Must be called in init/resume scripts **
# ** This bypasses the need for a custom ACPI kernel, **
# ** no longer possible in the latest distros... **
# *************************************************
#
# Machines used for test (successful users, please update):
# - p100-114, PSPA3E, bios 4.80
# - ...
#
# Copyright (C) 2010 YL ; adapted from acer_ec.pl (0.6.1) by these authors:
#
# Copyright (C) 2007 Michael Kurz michi.kurz (at) googlemail.com
# Copyright (C) 2007 Petr Tomasek tomasek (#) etf,cuni,cz
# Copyright (C) 2007 Carlos Corbacho cathectic (at) gmail.com
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 3
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

require 5.004;

use strict;
use Fcntl;
use POSIX;
use File::Basename;

sub initialize_ioports
{
  sysopen (IOPORTS, "/dev/port", O_RDWR)
    or die "/dev/port: $!\n";
  binmode IOPORTS;
}

sub close_ioports
{
  close (IOPORTS)
    or print "Warning: $!\n";
}

sub inb
{
  my ($res,$nrchars);
  sysseek IOPORTS, $_[0], 0 or return -1;
  $nrchars = sysread IOPORTS, $res, 1;
  return -1 if not defined $nrchars or $nrchars != 1;
  $res = unpack "C",$res ;
  return $res;
}

# $_[0]: value to write
# $_[1]: port to write
# Returns: -1 on failure, 0 on success.
sub outb
{
  if ($_[0] > 0xff)
  {
    my ($package, $filename, $line, $sub) = caller(1);
    print "\n*** Called outb with value=$_[0] from line $line\n",
          "*** (in $sub). PLEASE REPORT!\n",
          "*** Terminating.\n";
    exit(-1);
  }
  my $towrite = pack "C", $_[0];
  sysseek IOPORTS, $_[1], 0 or return -1;
  my $nrchars = syswrite IOPORTS, $towrite, 1;
  return -1 if not defined $nrchars or $nrchars != 1;
  return 0;
}

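# $_[0]: status port to poll
# Waits until bit 1 (0x02, input buffer full) clears, so a byte can be written.
# Returns: -1 if the bit is still set after 10000 polls.
sub wait_write
{
 my $i = 0;
 while ((inb($_[0]) & 0x02) && ($i < 10000)) {
  # sleep() only takes whole seconds, so sleep(0.01) would return at once;
  # select() with a timeout gives a real 10 ms delay.
  select(undef, undef, undef, 0.01);
  $i++;
 }
 return -($i == 10000);
}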

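# $_[0]: status port to poll
# Waits until bit 0 (0x01, output buffer full) is set, so a byte can be read.
sub wait_read
{
 my $i = 0;
 while (!(inb($_[0]) & 0x01) && ($i < 10000)) {
  # sleep() only takes whole seconds; select() gives a real 10 ms delay.
  select(undef, undef, undef, 0.01);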
 ...


Yann (lostec) wrote :

I have added the script file as an attachment; it will be easier to get that way...
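
For readers who want to see what the script's wait loops are polling, here is a small sketch that reads one status byte and decodes the two bits tested by wait_write (0x02) and wait_read (0x01). The port number 0x66 is an assumption: it is the usual ACPI EC command/status port, but the part of the script that names the ports is cut off above. The helper names read_port and ec_status are made up for illustration.

```shell
#!/bin/sh
# Read one byte from an I/O port through a port device file and print it
# in decimal. $1 = port number, $2 = device (default /dev/port, root only).
read_port() {
    dd if="${2:-/dev/port}" bs=1 skip="$(( $1 ))" count=1 2>/dev/null |
        od -An -tu1 | tr -d ' \n'
}

# Decode the two status bits polled by the script's wait loops.
ec_status() {
    st="${1:-0}"
    [ $(( st & 0x01 )) -ne 0 ] && obf=yes || obf=no   # output buffer full
    [ $(( st & 0x02 )) -ne 0 ] && ibf=yes || ibf=no   # input buffer full
    echo "OBF=$obf IBF=$ibf"
}

# Usage on real hardware, as root (0x66 is assumed, see above):
#   ec_status "$(read_port 0x66)"
```

Reading the status port only observes the controller; it is writes to the EC, such as the fan setting, that need the full wait-then-write sequence implemented by the attached script.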

WeatherGod (ben-v-root)
Changed in acpi (Ubuntu):
status: New → Confirmed
description: updated
MarcRandolph (mrand) wrote :

Marking as high: uncertain how many are affected by this, but unscheduled shutdowns are never a good user experience, to say nothing of potential data loss.

Changed in acpi (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Yann (lostec) wrote :

Hello,

If no one cares about this one (a 4-year-old laptop series: if no one cared before, I think that will remain so!)... just set up the script above to run at startup:
- copy it to /usr/local/bin/, which is a good location for such user stuff...
- set up rc.local to launch it at startup:
sudo gedit /etc/rc.local

and add this line before "exit 0":
nohup /usr/local/bin/tosh_p100_ec.pl set_gpu_fan_med &
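
The rc.local edit can also be scripted instead of done by hand in gedit. This is a minimal sketch, assuming rc.local contains the usual "exit 0" line; add_fan_line is a hypothetical helper name, and it needs root when pointed at the real /etc/rc.local.

```shell
#!/bin/sh
# Insert the fan command before the first "exit 0" line of an rc.local file.
# $1 = file to edit (default /etc/rc.local). Safe to run more than once.
add_fan_line() {
    rc="${1:-/etc/rc.local}"
    line='nohup /usr/local/bin/tosh_p100_ec.pl set_gpu_fan_med &'
    # Nothing to do if the line is already there.
    grep -qF "$line" "$rc" && return 0
    tmp=$(mktemp) || return 1
    if awk -v line="$line" '
            /^exit 0/ && !done { print line; done = 1 }
            { print }
            END { if (!done) print line }   # no "exit 0" found: append instead
        ' "$rc" > "$tmp"
    then
        # Write back with cat so the file keeps its owner and mode.
        cat "$tmp" > "$rc"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage (as root):
#   add_fan_line /etc/rc.local
```

Copying the script into place beforehand is still required, for example with "sudo install -m 0755 tosh_p100_ec.pl /usr/local/bin/".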

You can verify that the setup is done, after startup:
sudo tosh_p100_ec.pl regs

You should see value "60" at offset 0x5E (the output is an array of values by offset). If from time to time (games, high ambient temperatures...) you need more cooling, you can also set up rc.local with the set_gpu_fan_hig parameter instead of set_gpu_fan_med.
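
The "60" expected at offset 0x5E is the decimal form of the 0x3C used in the earlier DSDT edit, assuming set_gpu_fan_med writes that same value. The arithmetic below shows the conversion and where 0x5E would sit in a dump laid out 16 bytes per row. That layout is an assumption, since the script's output format is not shown in this report, and both helper names are made up.

```shell
#!/bin/sh
# Convert a hex register value to decimal.
hex_to_dec() { printf '%d\n' "$1"; }

# Row and column of an offset in a dump with 16 bytes per row (assumed layout).
offset_pos() {
    printf 'row 0x%02X col 0x%X\n' $(( $1 / 16 * 16 )) $(( $1 % 16 ))
}

hex_to_dec 0x3C    # prints 60
offset_pos 0x5E    # prints row 0x50 col 0xE
```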

Be careful: at the beginning of my Lucid migration, I thought the problem had been corrected at the driver or ACPI level because the laptop was running cool. In fact, this was just because I had run Windows before, so some (variable) fan speed was already set up and probably not overwritten at Ubuntu startup.

=> Use the script!

Maybe someone could make a deb package of this...

Magister95 (spdeux) wrote :

It seems I have the same bug on lucid with an Asus eeepc 1201NL.
Using the nvidia drivers helps but the laptop is still overheating and shutting down.
I tried the script. For now, the shutdown didn't happen again. Let's hope !
