CPU/GPU Fans suddenly go 100% RPM on Asus ROG G752VT

Bug #1775717 reported by whl2
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

After upgrading from Ubuntu 16.04 to 18.04, I've noticed that the CPU and GPU fans suddenly go full throttle. It doesn't seem to depend on temperature: it can happen when 32 °C is reported. Sometimes the laptop works for a day or two without the issue, but eventually it happens. The only way to stop the fans is to power off the laptop, power it on (the fans go to 100% again at this point), power off during the BIOS splash screen, and power on again. It seems to happen more often after waking from sleep.

Things tried:

Upgrading the BIOS to the latest version available on the ASUS site at the time of writing. No change.

asus-fan kernel module (https://github.com/daringer/asus-fan). Detects both fans and shows correct speed in /sys/class/hwmon/*/fan{1,2}*, but does not seem to control it and does not prevent going full throttle.

4.17.0 kernel from Ubuntu mainline builds. Doesn't seem to affect the issue.

Userspace fan control software (https://github.com/hirschmann/nbfc). Can control the speed of both fans until they go full throttle; after that it has no effect on them.
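
For anyone trying to capture the moment the fans stop responding, the hwmon readings mentioned above can be polled from a script. The following is a minimal sketch and not part of the original report: it assumes the usual hwmon sysfs layout (`fan*_input` in RPM, `temp*_input` in millidegrees Celsius), and the function name `read_hwmon` is hypothetical.

```python
import glob
import os


def read_hwmon(root="/sys/class/hwmon"):
    """Return {relative_sensor_path: value} for fan and temperature inputs.

    Fan values are RPM. Temperature values are converted from
    millidegrees Celsius to degrees Celsius.
    """
    readings = {}
    for pattern, divisor in (("fan*_input", 1), ("temp*_input", 1000)):
        for path in sorted(glob.glob(os.path.join(root, "*", pattern))):
            try:
                with open(path) as f:
                    raw = int(f.read().strip())
            except (OSError, ValueError):
                # Some sensors return I/O errors or non-numeric data; skip them.
                continue
            key = os.path.relpath(path, root)
            readings[key] = raw / divisor if divisor != 1 else raw
    return readings
```

Calling `read_hwmon()` every few seconds and appending the result to a file with a timestamp should give a record of fan speed and temperature leading up to the full-throttle state.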

---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: whale 3068 F.... pulseaudio
 /dev/snd/controlC0: whale 3068 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=UUID=ada9b595-bbf2-40a3-8575-5908492bb969
InstallationDate: Installed on 2014-10-20 (1333 days ago)
InstallationMedia: Ubuntu 14.04.1 LTS "Trusty Tahr" - Release amd64+mac (20140722.2)
MachineType: ASUSTeK COMPUTER INC. G752VT
NonfreeKernelModules: nvidia_modeset nvidia wl
Package: linux (not installed)
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-23-generic root=UUID=f6509d56-6183-464c-90bb-9c3be2c8abdf ro quiet splash acpi_backlight=vendor vt.handoff=1
ProcVersionSignature: Ubuntu 4.15.0-23.25-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-23-generic N/A
 linux-backports-modules-4.15.0-23-generic N/A
 linux-firmware 1.173.1
Tags: bionic
Uname: Linux 4.15.0-23-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dip docker lpadmin plugdev sambashare sudo tty video
_MarkForUpload: True
dmi.bios.date: 06/29/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: G752VT.304
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: G752VT
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: 1.0
dmi.chassis.asset.tag: ATN12345678901234567
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK COMPUTER INC.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrG752VT.304:bd06/29/2017:svnASUSTeKCOMPUTERINC.:pnG752VT:pvr1.0:rvnASUSTeKCOMPUTERINC.:rnG752VT:rvr1.0:cvnASUSTeKCOMPUTERINC.:ct10:cvr1.0:
dmi.product.family: G
dmi.product.name: G752VT
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK COMPUTER INC.

whl2 (whale2-box)
summary: - COU.Fan suddenly goes 100% RPM on Asus ROG G752VT
+ CPU/GPU Fans suddenly go 100% RPM on Asus ROG G752VT
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1775717

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Does the bug go away if you boot back into the 16.04 kernel?

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Revision history for this message
whl2 (whale2-box) wrote :

Hi Joseph,

Thank you for looking into it. I can't answer yet: I'll have to boot the 4.4 kernel and run it for at least two days to make sure. I'm going to start tonight. I will also attach more logs; apport-collect didn't work with the mainline kernel.

Revision history for this message
whl2 (whale2-box) wrote :

Well, the fans just went 100% on kernel 4.4.14-040414-generic after about a day of normal operation.
Any advice on what other information to collect?

Revision history for this message
penalvch (penalvch) wrote :

whl2, please execute the following command only once, as it will automatically gather debugging information, in a terminal:
apport-collect 1775717

Revision history for this message
whl2 (whale2-box) wrote : ProcCpuinfoMinimal.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
whl2 (whale2-box) wrote : ProcEnviron.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote :

Or should it be done with standard Ubuntu kernel, not mainline?

(1 comment hidden)
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Can you run `sudo powertop -C out.csv` and attach out.csv here?

Revision history for this message
whl2 (whale2-box) wrote : AlsaInfo.txt

apport information

description: updated
Revision history for this message
whl2 (whale2-box) wrote : CRDA.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : CurrentDmesg.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : IwConfig.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : Lspci.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : Lsusb.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : ProcEnviron.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : ProcInterrupts.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : ProcModules.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : PulseList.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : RfKill.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : UdevDb.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote : WifiSyslog.txt

apport information

Revision history for this message
whl2 (whale2-box) wrote :

powertop results

penalvch (penalvch)
tags: added: latest-bios-304
tags: added: regression-release
description: updated
tags: added: kernel-bug-exists-upstream-4.17
Revision history for this message
penalvch (penalvch) wrote :

whl2:

1) Regarding the use of non-default kernel parameter:
acpi_backlight=vendor

Is this required for something? If you boot without this parameter does this change anything?

2) If you remove the nvidia proprietary drivers, does this provide a WORKAROUND?

3) If you remove the 3rd party software "asus-fan" and "nbfc" is this still reproducible?

Revision history for this message
whl2 (whale2-box) wrote :

Hi Christopher,

> 1) Regarding the use of non-default kernel parameter:
> acpi_backlight=vendor
>
> Is this required for something? If you boot without this parameter does this change anything?

I added acpi_backlight=vendor to the kernel parameters because the laptop once failed to restore the screen backlight after AC power was plugged back in. That seems to be a completely different story; the fan issue appeared well before this parameter was added.

> 2) If you remove the nvidia proprietary drivers, does this provide a WORKAROUND?

This one I have to test. Will post back when I get some results.

> 3) If you remove the 3rd party software "asus-fan" and "nbfc" is this still reproducible?

Yes, the issue started before. Presence of this software is a result of my (fruitless) attempts to find a workaround.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

The Package C-State stays at PC2, which causes lots of heat. It should be able to hit PC8.
C2 (pc2); 58.3%
C3 (pc3); 0.0%
C6 (pc6); 0.0%
C7 (pc7); 0.0%
C8 (pc8); 0.0%
C9 (pc9); 0.0%
C10 (pc10); 0.0%

Please follow these steps:
- Install r8168-dkms and reboot
- Use kernel parameter "pcie_aspm=force"
- Run this as root "echo powersave > /sys/module/pcie_aspm/parameters/policy"
- Run "powertop --auto-tune"
- Check the C-State in powertop
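
The residency lines quoted above follow a simple `Cn (pcN); percent%` shape, so checking the deepest package C-state actually reached can be scripted. This is a minimal sketch under that assumption only; the function names are hypothetical, and a full powertop CSV report contains many other sections that this does not try to interpret.

```python
import re

# Matches lines such as "C2 (pc2); 58.3%".
_PKG_LINE = re.compile(r"^\s*C\d+\s*\(pc(\d+)\)\s*;\s*([\d.]+)\s*%")


def package_residency(text):
    """Return {state_number: percent} for package C-state lines in text."""
    result = {}
    for line in text.splitlines():
        m = _PKG_LINE.match(line)
        if m:
            result[int(m.group(1))] = float(m.group(2))
    return result


def deepest_state_reached(residency):
    """Return the highest-numbered state with non-zero residency, or None."""
    reached = [state for state, pct in residency.items() if pct > 0.0]
    return max(reached) if reached else None
```

Fed the seven lines quoted in this comment, `deepest_state_reached` returns 2, which matches the observation that the package never gets past PC2.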

Revision history for this message
whl2 (whale2-box) wrote :

Hi Kai-Heng,

Thank you for looking into this. I've done the steps you've described, however I'm not sure everything worked as it should:

# powertop --auto-tune
modprobe cpufreq_stats failedLoaded 0 prior measurements
Cannot load from file /var/cache/powertop/saved_parameters.powertop
File will be loaded after taking minimum number of measurement(s) with battery only
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask 7
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask 7
Devfreq not enabled
glob returned GLOB_ABORTED
Cannot load from file /var/cache/powertop/saved_parameters.powertop
File will be loaded after taking minimum number of measurement(s) with battery only
Leaving PowerTOP

The kernel module is loaded

# lsmod|grep r8168
r8168 524288 0

kernel parameter is present

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.17.0-041700-generic root=UUID=f6509d56-6183-464c-90bb-9c3be2c8abdf ro quiet splash acpi_backlight=vendor pcie_aspm=force vt.handoff=1

Output of powertop without options attached.
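
Two of the checks above, whether a parameter is on the kernel command line and which ASPM policy is active, can also be done programmatically. This is a minimal sketch, assuming the policy file lists every policy with the active one in square brackets (for example `default performance [powersave] powersupersave`); both helper names are hypothetical.

```python
import re


def has_kernel_param(cmdline, param):
    """True if param (e.g. 'pcie_aspm=force') is a whole token in cmdline."""
    return param in cmdline.split()


def active_aspm_policy(policy_text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    m = re.search(r"\[(\w+)\]", policy_text)
    return m.group(1) if m else None
```

The inputs would come from reading `/proc/cmdline` and `/sys/module/pcie_aspm/parameters/policy` respectively; keeping the parsing separate from the file access makes the helpers easy to try on pasted output.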

whl2 (whale2-box)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
penalvch (penalvch) wrote :

whl2:

1) To clarify, when you were using 16.04, were you using the nvidia or nouveau driver? If nvidia, which version precisely?

2) If you uninstall the nvidia driver in your current OS, and use nouveau is there any change?

3) To keep this relevant to upstream, one will want to test the latest mainline kernel (now 4.18-rc3) as it is released. Could you please advise?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
whl2 (whale2-box) wrote :

Christopher,

1) On 16.04 I was using the nvidia driver from Ubuntu packages, but I can't remember which version. I'm not sure whether the package installation history is recorded anywhere.
On the other hand, I can try running 16.04 from a USB stick with several different versions of the driver.

2) Sorry, I still haven't tested it; I will try tomorrow. The one time I tried it before, performance was very poor and the laptop was almost unusable.

3) Testing 4.18 seems much easier; I can do it.

Revision history for this message
whl2 (whale2-box) wrote :

Regarding 2) - I've managed to install the nouveau driver with reasonable performance, and will post in a while whether the issue is reproducible.

Revision history for this message
whl2 (whale2-box) wrote :

With the nouveau driver, and with the 3rd-party software (asus_fan.ko and nbfc) dropped, the issue is the same.

$ lsmod|grep nouveau
nouveau 1716224 14
mxm_wmi 16384 1 nouveau
ttm 106496 1 nouveau
drm_kms_helper 172032 1 nouveau
drm 401408 17 nouveau,ttm,drm_kms_helper
i2c_algo_bit 16384 1 nouveau
wmi 24576 4 asus_wmi,intel_wmi_thunderbolt,mxm_wmi,nouveau
video 45056 2 asus_wmi,nouveau

Now going to test with kernel 4.18-rc3

Revision history for this message
whl2 (whale2-box) wrote :

For the record: I dropped all 3rd-party fan-controlling software, and the issue is still present with mainline kernel 4.16.18 and nvidia driver 396.51.
I wasn't lucky enough to boot 4.18-rc<latest>.

Revision history for this message
penalvch (penalvch) wrote :

whl2:

1) The latest mainline kernel is now 4.18 (not 4.18-rc<latest>). Could you please advise on this?

2) When using 16.04, did you use 3rd party fan software or was it a default install?

3) To further narrow down the regression, could you please test 4.4.5 and respond with the results?

Revision history for this message
whl2 (whale2-box) wrote :

Regarding 2) - no 3rd-party fan software on 16.04.
3) - I'm running 4.4.5-040405-generic now, but not for very long yet, so I can't be sure whether the issue is still present.

Revision history for this message
whl2 (whale2-box) wrote :

To my utter disappointment, it happens in both 4.18.5 and 4.4.5. The latter is really weird: I'm 100% sure that it didn't happen in Ubuntu 16.04. Now I'm going to downgrade the whole distro to 16.04 LTS and see.

penalvch (penalvch)
tags: added: kernel-bug-exists-upstream-4.18.5
removed: kernel-bug-exists-upstream-4.17
Revision history for this message
Eugene Kosogin (ekosogin) wrote :

I also have the same issue with the same laptop and kernel.

penalvch (penalvch)
tags: added: cosmic
(34 comments hidden)
Revision history for this message
whl2 (whale2-box) wrote :

> I think you already have tried to take the fans by fancontrol.

The problem is - after the fans go 100%, they aren't controllable any more.

penalvch (penalvch)
tags: added: disco
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in thermald (Ubuntu):
status: New → Confirmed
Revision history for this message
aarif (aarifkhamdi) wrote :

Is more information needed?

Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
aarif (aarifkhamdi) wrote :

ubuntu 18.04.2 - can't reproduce.
fixed now?

Revision history for this message
aarif (aarifkhamdi) wrote :

oh, sorry.
not fixed.

reproduced after 1 hour

Revision history for this message
Eugene Kosogin (ekosogin) wrote :

ASUS released a new BIOS version - 307 - https://www.asus.com/uk/ROG-Republic-Of-Gamers/ROG-G752VT/HelpDesk_BIOS/
Can someone check it?

Revision history for this message
whl2 (whale2-box) wrote :

> ASUS released a new BIOS version - 307 - https://www.asus.com/uk/ROG-Republic-Of-Gamers/ROG-G752VT/HelpDesk_BIOS/
> Can someone check it?

Upgraded, will post the effect on the fan issue.

Revision history for this message
whl2 (whale2-box) wrote :

Well, no. It didn't help. Full throttle after 12 minutes.

penalvch (penalvch)
tags: added: latest-bios-307
removed: latest-bios-304
Revision history for this message
whl2 (whale2-box) wrote :

Ubuntu 19.10, kernel 5.3.0-19-generic - still the same.

penalvch (penalvch)
tags: added: eoan needs-upstream-testing
Revision history for this message
penalvch (penalvch) wrote :

whl2:

1) As noted in #32, one will want to keep testing the latest upstream kernel (now 5.4-rc5) as it is released, or one risks developers not paying attention to your problem.

2) Please provide the output of the following terminal command (do not perform an apport-collect):
sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date

Revision history for this message
whl2 (whale2-box) wrote :

Regarding 2) :

$ sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date
G752VT.307
04/26/2019

Revision history for this message
whl2 (whale2-box) wrote :

Regarding 1) :

$ uname -a
Linux rog 5.4.0-050400rc5-generic #201910271430 SMP Sun Oct 27 18:33:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

and issue is still present.

Revision history for this message
Eugene Kosogin (ekosogin) wrote :

Does anyone know whether it is reproducible with Ubuntu 20.04?

Revision history for this message
whl2 (whale2-box) wrote :

> Does anyone know is it reproducible with ubuntu 20.04?

I just upgraded to 20.04. Running kernel 5.6.0-050600rc6-generic (tried 5.6.0, but it has wifi problems, so switched back)
The fan issue is still there.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please add 'dyndbg="drivers/pci/* +p"' to the kernel parameter, run "sudo powertop --auto-tune", wait for 5 seconds, then attach dmesg here.

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

This is quite reproducible on all versions of Ubuntu up to 20.04.
Please find the corresponding kernel issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=153281

Revision history for this message
Sergey Ivanov (icegood1980) wrote :
Revision history for this message
Sergey Ivanov (icegood1980) wrote :
Revision history for this message
Sergey Ivanov (icegood1980) wrote :
Revision history for this message
Sergey Ivanov (icegood1980) wrote :
Revision history for this message
Sergey Ivanov (icegood1980) wrote :

I also tried to gather powertop-related data for further analysis, but wasn't able to do it completely:

1) First I enabled the following boot option:
dyndbg="file ec.c +p ; drivers/pci/* +p". I added ACPI's EC because it might be the cause of the fault as well. At least for some models, the Embedded Controller includes a table of fan temperatures, as implemented, for example, in
https://github.com/dominiksalvet/asus-fan-control
Kernel messages from system startup can be found in dmesg_startup.

2) Before gathering data via powertop I tried to trigger maximum fan speed. For me it happens when I
edit something or open a video in the browser. dmesg_change_speed was gathered for this case.

3) Finally, I tried to run powertop and... my keyboard got stuck. It happens in 100% of cases. So I ran the following command:

powertop --auto-tune 2> powertop_start_err > powertop_start_out && sleep 5 && dmesg > dmesg_powertop

then rebooted and attached both powertop_start_err and powertop_start_out. There is no need to attach dmesg_powertop as it is always empty. According to
https://askubuntu.com/questions/1131279/powertop-cpufreq-stats-failed-and-devfreq-not-enabled-running-auto-tone
powertop does not seem to be an appropriate utility for notebooks. Please suggest something else.

==========================================================================
My conclusion so far: this seems to be related somehow to ACPI float regions that overlap with other devices.
P.S. The system info for my ASUS G752VL is in the latest comments of
https://bugzilla.kernel.org/show_bug.cgi?id=153281

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Just dyndbg="file drivers/pci/* +p" please.

Also, please attach the content of `/sys/kernel/debug/pmc_core/package_cstate_show`.

Revision history for this message
Sergey Ivanov (icegood1980) wrote :
Revision history for this message
Sergey Ivanov (icegood1980) wrote :

What about a substitute for powertop?

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

It seems the fan speed becomes uncontrollable once the temperature gets too high, and stays that way even after the temperature drops again. See the attachment.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

PC3 will be really hot, it should normally reach PC8.

Revision history for this message
Max Wittal (madmax43v3r) wrote :

sudo prime-select intel fixed it for my Acer Nitro 5.

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

Any progress? What other information is needed to continue?

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Comment #96? Please don't include EC debug message.

Revision history for this message
Eugene Kosogin (ekosogin) wrote :

Does anyone know whether it is reproducible with Ubuntu 20.10?

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

The issue is still reproducible under 20.10, i.e. under kernel 5.8.0-26-generic.

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

OK, the issue is still reproducible. I apologize that I hadn't gathered this before.
Anyway, below you can find dmesg output with only dyndbg="file drivers/pci/* +p" enabled, for two cases:
1) fans are OK - all files with suffix 1
2) fans are in the max error state - all files with suffix 2

Description and names of files are the same as in comment #95.

More observations:

1) To put the fans into the max state I just have to type letters on the keyboard in any Cyrillic layout (RU, for example) for some time (it takes me no more than 1 min to reproduce the issue). At least, I couldn't reproduce the bug while typing in the EN layout.

2) No additional dmesg messages are generated in dyndbg="file drivers/pci/* +p" mode while reproducing the bug.
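
One way to double-check the second observation is to compare the two captures with the timestamps removed, so that only messages unique to the bad state remain. The following is a minimal sketch, assuming each line starts with the usual `[seconds.micros]` dmesg timestamp; the function names are hypothetical and not taken from any tool mentioned in this thread.

```python
import re

# Strips a leading dmesg timestamp such as "[    5.528682] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamp(line):
    """Return the message part of a dmesg line, without timestamp or newline."""
    return _TIMESTAMP.sub("", line.rstrip("\n"))


def only_in_second(first_lines, second_lines):
    """Return messages found in second_lines but not in first_lines.

    Order of first appearance is preserved and duplicates are dropped.
    """
    seen = {strip_timestamp(line) for line in first_lines}
    unique = []
    for line in second_lines:
        msg = strip_timestamp(line)
        if msg and msg not in seen:
            seen.add(msg)
            unique.append(msg)
    return unique
```

Passing the lines of a "fans OK" capture as `first_lines` and those of a "fans at max" capture as `second_lines` should return an empty list if the bad state really produces no extra kernel messages.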

Revision history for this message
Sergey Ivanov (icegood1980) wrote :
Revision history for this message
Sergey Ivanov (icegood1980) wrote :

Two more words: the bug occurs irrespective of which video driver is present. Now (unlike in comment #95) I reproduced it with the nouveau video driver (it was with nvidia before).

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

[ 5.528682] nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug

Please boot with kernel parameter "blacklist=nouveau modprobe.blacklist=nouveau", run `sudo powertop --auto-tune`, and see if it helps.

no longer affects: thermald (Ubuntu)
Revision history for this message
Sergey Ivanov (icegood1980) wrote :

"blacklist=nouveau modprobe.blacklist=nouveau"
Really? Which video driver am I supposed to use then?

`sudo powertop --auto-tune`
As I said before, and it still happens: my keyboard locks up 100% of the time after this command. I'm not able to run any other commands afterwards.

Here is the start-up dmesg log with the nvidia drivers and... it is still reproducible with the very same steps.

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

Has anyone tested the bug against a language change? It is 100% reproducible in my case:

1) Turn on the notebook, with the fans at normal speed.
2) Switch to the RU language and start typing in some editor.
3) The fans go to maximum speed without any chance to stop them.

For a long time I have avoided typing in RU, and my fans have been OK.

Changed in linux (Ubuntu):
status: Incomplete → New
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Eugene Kosogin (ekosogin) wrote :

Does anyone know whether it is reproducible with Ubuntu 22.04?

Revision history for this message
Sergey Ivanov (icegood1980) wrote :

It is still reproducible for me under 22.04.

Steps:
1) Switch to the RU layout.
2) Start typing Cyrillic text somewhere. For me it is also enough to simply move the mouse pointer and click for 15-20 seconds.

The fans then start running at maximum speed.

Displaying first 40 and last 40 of 114 comments.