Kernel wrong temperature reporting

Bug #1781924 reported by benjamin button
This bug affects 8 people
Affects                Status      Importance   Assigned to   Milestone
OEM Priority Project   Won't Fix   High         Alex Tu
linux (Ubuntu)         Expired     Medium       Unassigned

Bug Description

I've been having thermal trouble with my new laptop ever since I started using kernels newer than 4.0.x. According to psensor and sensord, the CPU temperature jumps above 90°C, stays there for around 1 second, and drops back to 50°C. Each transition takes around 100 ms. I've cleaned and re-applied the thermal paste and checked the connections twice. More interestingly, this doesn't happen when the CPU is under heavy load, only when it is idle. Sometimes sensord reports that the CPU temperature has exceeded the critical threshold, and the device shuts itself down immediately.

Windows 10 is also installed as a dual boot, and I haven't seen this problem there, either when idle or under heavy load.

The dmesg output is full of the following warnings:

[ 1282.296247] CPU0: Core temperature above threshold, cpu clock throttled (total events = 109)
[ 1282.296267] CPU4: Core temperature above threshold, cpu clock throttled (total events = 109)
[ 1282.296269] CPU5: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296269] CPU1: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296271] CPU4: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296272] CPU6: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296273] CPU2: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296274] CPU7: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296275] CPU3: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.296281] CPU0: Package temperature above threshold, cpu clock throttled (total events = 220)
[ 1282.297226] CPU4: Core temperature/speed normal
[ 1282.297227] CPU5: Package temperature/speed normal
[ 1282.297228] CPU0: Core temperature/speed normal
[ 1282.297229] CPU1: Package temperature/speed normal
[ 1282.297229] CPU0: Package temperature/speed normal
[ 1282.297230] CPU4: Package temperature/speed normal
[ 1282.297233] CPU3: Package temperature/speed normal
[ 1282.297233] CPU7: Package temperature/speed normal
[ 1282.297269] CPU2: Package temperature/speed normal
[ 1282.297269] CPU6: Package temperature/speed normal

Because of the wrong temperature reporting, the kernel throttles the CPU and reduces overall performance, which makes for a frustrating user experience.
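
The timestamps in the log above suggest that each throttle event clears again within about a millisecond. A minimal Python sketch (not part of the original report) that pairs each "above threshold" line with the next "normal" line for the same CPU and scope could make that visible. The function name `throttle_durations` is hypothetical, and the regular expression assumes only the message format shown above:

```python
import re

# Matches entries such as:
#   [ 1282.296247] CPU0: Core temperature above threshold, cpu clock throttled ...
#   [ 1282.297226] CPU4: Core temperature/speed normal
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU(?P<cpu>\d+):\s+(?P<scope>Core|Package) "
    r"temperature(?P<rest> above threshold|/speed normal)"
)


def throttle_durations(log_text):
    """Return {(cpu, scope): [seconds, ...]} for each throttle/normal pair."""
    started = {}    # (cpu, scope) -> timestamp of the pending throttle event
    durations = {}  # (cpu, scope) -> list of event lengths in seconds
    # finditer scans the whole text, so entries fused onto one line still match.
    for match in LINE_RE.finditer(log_text):
        key = (int(match["cpu"]), match["scope"])
        ts = float(match["ts"])
        if match["rest"] == " above threshold":
            started[key] = ts
        elif key in started:
            durations.setdefault(key, []).append(ts - started.pop(key))
    return durations


if __name__ == "__main__":
    sample = (
        "[ 1282.296247] CPU0: Core temperature above threshold, "
        "cpu clock throttled (total events = 109)\n"
        "[ 1282.297228] CPU0: Core temperature/speed normal\n"
    )
    for key, values in throttle_durations(sample).items():
        # Print the event lengths in milliseconds.
        print(key, [round(v * 1000, 3) for v in values])
```

Feeding it the output of `dmesg` should show whether the events are consistently this short, which would be hard to reconcile with a real temperature swing.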

I've tried the following and found a temporary workaround:
 - intel_pstate enabled, turbo boost enabled > problem exists
 - intel_pstate enabled, turbo boost disabled > problem occurs less frequently
 - intel_pstate disabled, governor set to anything other than powersave > problem exists
 - intel_pstate disabled, governor set to powersave > problem occurs less frequently
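
To record which of the combinations above is active during a test run, the relevant settings can be read back from sysfs. This is a minimal Python sketch, assuming the usual cpufreq files under /sys/devices/system/cpu; the function name `read_cpu_power_settings` and its `sysfs_root` parameter are hypothetical. The `intel_pstate/no_turbo` file only exists while intel_pstate is the active driver, so the turbo state is reported as unknown otherwise:

```python
from pathlib import Path


def read_cpu_power_settings(sysfs_root="/sys"):
    """Return the cpufreq driver, governor and turbo state found in sysfs."""
    cpu_dir = Path(sysfs_root) / "devices/system/cpu"

    def read(path):
        # A missing file is normal here, e.g. no_turbo without intel_pstate.
        try:
            return path.read_text().strip()
        except OSError:
            return None

    no_turbo = read(cpu_dir / "intel_pstate/no_turbo")
    return {
        "driver": read(cpu_dir / "cpu0/cpufreq/scaling_driver"),
        "governor": read(cpu_dir / "cpu0/cpufreq/scaling_governor"),
        # no_turbo == "1" means turbo boost is disabled; None means unknown.
        "turbo_enabled": None if no_turbo is None else no_turbo == "0",
    }


if __name__ == "__main__":
    print(read_cpu_power_settings())
```

Only cpu0 is inspected, on the assumption that all cores share the same governor.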

I suspect a kernel bug such as:
- https://bugzilla.redhat.com/show_bug.cgi?id=924570
- https://bugzilla.redhat.com/show_bug.cgi?id=1317190
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC0D7p: burak 1547 F...m pulseaudio
 /dev/snd/controlC0: burak 1547 F.... pulseaudio
CurrentDesktop: X-Cinnamon
DistroRelease: Linux Mint 19
HibernationDevice: RESUME=UUID=60a9fd7e-ec69-448a-b19a-93efaa6035ed
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 004: ID 04f3:0903 Elan Microelectronics Corp.
 Bus 001 Device 003: ID 8087:0a2b Intel Corp.
 Bus 001 Device 002: ID 045e:07a5 Microsoft Corp. Wireless Receiver 1461C
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: ASUSTeK COMPUTER INC. UX430UNR
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-24-generic root=UUID=692de7fe-530f-4849-89ec-a5b4413af7dc ro quiet splash intel_pstate=disable vt.handoff=1
ProcVersionSignature: Ubuntu 4.15.0-24.26-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-24-generic N/A
 linux-backports-modules-4.15.0-24-generic N/A
 linux-firmware 1.173.1
Tags: tara
Uname: Linux 4.15.0-24-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 11/28/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: UX430UNR.302
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: UX430UNR
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK COMPUTER INC.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrUX430UNR.302:bd11/28/2017:svnASUSTeKCOMPUTERINC.:pnUX430UNR:pvr1.0:rvnASUSTeKCOMPUTERINC.:rnUX430UNR:rvr1.0:cvnASUSTeKCOMPUTERINC.:ct10:cvr1.0:
dmi.product.family: UX
dmi.product.name: UX430UNR
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK COMPUTER INC.

benjamin button (canavaroski90) wrote :

Added lspci_vnvn.log

benjamin button (canavaroski90) wrote :

Added lsb_release.log

benjamin button (canavaroski90) wrote :

Added linux.log

benjamin button (canavaroski90) wrote :

Added version.log

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1781924

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
benjamin button (canavaroski90) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected tara
description: updated
benjamin button (canavaroski90) wrote : CRDA.txt

apport information

benjamin button (canavaroski90) wrote : CurrentDmesg.txt

apport information

benjamin button (canavaroski90) wrote : IwConfig.txt

apport information

benjamin button (canavaroski90) wrote : Lspci.txt

apport information

benjamin button (canavaroski90) wrote : ProcCpuinfo.txt

apport information

benjamin button (canavaroski90) wrote : ProcCpuinfoMinimal.txt

apport information

benjamin button (canavaroski90) wrote : ProcEnviron.txt

apport information

benjamin button (canavaroski90) wrote : ProcInterrupts.txt

apport information

benjamin button (canavaroski90) wrote : ProcModules.txt

apport information

benjamin button (canavaroski90) wrote : PulseList.txt

apport information

benjamin button (canavaroski90) wrote : RfKill.txt

apport information

benjamin button (canavaroski90) wrote : UdevDb.txt

apport information

benjamin button (canavaroski90) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
benjamin button (canavaroski90) wrote :

The shutdowns seem to have stopped after I disabled the thermald service. Could it be a broken package?

Package: thermald
Architecture: amd64
Version: 1.7.0-5ubuntu1
Priority: optional
Section: admin
Origin: Ubuntu
Maintainer: Colin King <email address hidden>
Original-Maintainer: Colin King <email address hidden>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 583
Depends: libc6 (>= 2.16), libdbus-1-3 (>= 1.9.14), libdbus-glib-1-2 (>= 0.88), libgcc1 (>= 1:3.0), libglib2.0-0 (>= 2.37.3), libstdc++6 (>= 5.2), libxml2 (>= 2.7.4), lsb-base (>= 3.0-6)
Filename: pool/main/t/thermald/thermald_1.7.0-5ubuntu1_amd64.deb
Size: 185220
MD5sum: 842c83b5d474d8366bdc4ffdf201db21
SHA1: 5cdfc914bc11e37bdd47620874e015fb10387ab8
SHA256: 9734e7700262f4b9a03dfd3e0f59044871228339452edb7aa11afd6cfc9743c1
Homepage: https://github.com/01org/thermal_daemon
Description-en: Thermal monitoring and controlling daemon
 Thermal Daemon is a Linux daemon for monitoring and
 controlling platform temperatures. Once the system
 temperature reaches a certain threshold, the Linux daemon
 activates various cooling methods to try to cool the system.
Description-md5: b3957326598bfd50927c3294bfbabcc9
Task: ubuntu-budgie-desktop
Supported: 5y

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc5

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
benjamin button (canavaroski90) wrote :

I tried the 4.17 and 4.18-rcX kernels and the problem still persists. Suspend is also not working correctly: the laptop drains a lot of battery while suspended. If I suspend the device without first setting the CPU governor to "powersave", the CPU runs very hot while suspended.

tags: added: kernel-bug-exists-upstream
Kai-Heng Feng (kaihengfeng) wrote :

So it's a regression? Do you remember the kernel version that doesn't have this issue?

benjamin button (canavaroski90) wrote :

As far as I remember, I didn't have this problem on Ubuntu 16.04, but I don't remember which kernel it was. I installed updates whenever they became available, so I most likely went through a number of different kernel versions. This device is my daily laptop, so installing Ubuntu 16.04 to test is not possible. But I check dmesg frequently, and I'm sure I never noticed this kind of thermal throttling on 16.04. If you can suggest a way to find and test the 16.04 kernels, I'll gladly do it. Currently dmesg is full of throttling messages.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Anner Visser (anner-) wrote :

Also affects me, seems to be a particular problem on Lenovo laptops (specifically the T480)

StefanF (stefan) wrote :

Same problem here with an HP Omen Laptop.

Whenever I run the "sensors" command under high load, all CPU cores are below 90°C. I never saw the critical value of 100°C.

WinEunuchs2Unix (ricklee518) wrote :

This bug has been reported a few times in Ask Ubuntu. Just today (July 3rd, 2019) I was going back through `dmesg` and by chance I noticed on June 22nd, 2019 the errors appear about a dozen times.

I've been on the same kernel version 4.14.114 LTS for a couple/few months now but I did flirt with version 4.14.120 LTS briefly and it is possible I was using that kernel version that day. I would have to check journalctl to find out for sure if someone deems it relevant.

I don't view this bogus throttling error as a problem; it's more of a curiosity, if anything.

Alex Tu (alextu) wrote :

I see a similar problem on an XPS 13 9380 running Bionic.
kernel: 4.15.0-1045-oem
BIOS: 1.3.2

When the kernel messages report overheating, the sensors show nothing close to 100°C:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +44.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 2: +43.0°C (high = +100.0°C, crit = +100.0°C)
Core 3: +42.0°C (high = +100.0°C, crit = +100.0°C)

acpitz-virtual-0
Adapter: Virtual device
temp1: +25.0°C (crit = +107.0°C)

pch_cannonlake-virtual-0
Adapter: Virtual device
temp1: +44.0°C

Alex Tu (alextu) wrote : apport information

ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC0D0p: alextu 2205 F...m pulseaudio
 /dev/snd/controlC0: alextu 2205 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2019-07-20 (2 days ago)
InstallationMedia: Ubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
MachineType: Dell Inc. XPS 13 9380
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1045-oem root=UUID=77efb06b-1ce6-4749-a131-67f57aa8b21b ro quiet splash vt.handoff=1
ProcVersionSignature: Ubuntu 4.15.0-1045.50-oem 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-1045-oem N/A
 linux-backports-modules-4.15.0-1045-oem N/A
 linux-firmware 1.173.9
Tags: bionic
Uname: Linux 4.15.0-1045-oem x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: sudo
WifiSyslog:

_MarkForUpload: True
dmi.bios.date: 03/29/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.3.2
dmi.board.name: 0KTW76
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.3.2:bd03/29/2019:svnDellInc.:pnXPS139380:pvr:rvnDellInc.:rn0KTW76:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: XPS
dmi.product.name: XPS 13 9380
dmi.sys.vendor: Dell Inc.

tags: added: bionic
Alex Tu (alextu) wrote : AlsaInfo.txt

apport information

Alex Tu (alextu) wrote : CRDA.txt

apport information

Alex Tu (alextu) wrote : CurrentDmesg.txt

apport information

Alex Tu (alextu) wrote : IwConfig.txt

apport information

Alex Tu (alextu) wrote : Lspci.txt

apport information

Alex Tu (alextu) wrote : Lsusb.txt

apport information

Alex Tu (alextu) wrote : ProcCpuinfo.txt

apport information

Alex Tu (alextu) wrote : ProcCpuinfoMinimal.txt

apport information

Alex Tu (alextu) wrote : ProcEnviron.txt

apport information

Alex Tu (alextu) wrote : ProcInterrupts.txt

apport information

Alex Tu (alextu) wrote : ProcModules.txt

apport information

Alex Tu (alextu) wrote : PulseList.txt

apport information

Alex Tu (alextu) wrote : RfKill.txt

apport information

Alex Tu (alextu) wrote : UdevDb.txt

apport information

Changed in oem-priority:
importance: Undecided → High
assignee: nobody → Alex Tu (alextu)
Kai-Heng Feng (kaihengfeng) wrote :

Alex,

1) This warning should be harmless. Does it affect actual usage?
2) What temperatures are reported under the thermal_zone sysfs entries?
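
For reference, the thermal zone readings asked about here live under /sys/class/thermal. A minimal Python sketch for dumping them, assuming the standard `thermal_zone*/type` and `thermal_zone*/temp` files, where `temp` is in millidegrees Celsius; the function name `read_thermal_zones` and its `sysfs_root` parameter are hypothetical:

```python
from pathlib import Path


def read_thermal_zones(sysfs_root="/sys"):
    """Return [(zone_name, zone_type, degrees_celsius), ...] from sysfs."""
    zones = []
    for zone in sorted(Path(sysfs_root).glob("class/thermal/thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            # The kernel reports the temperature in millidegrees Celsius.
            celsius = int((zone / "temp").read_text()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read right now
        zones.append((zone.name, zone_type, celsius))
    return zones


if __name__ == "__main__":
    for name, zone_type, celsius in read_thermal_zones():
        print(f"{name} ({zone_type}): {celsius:.1f} C")
```

Running it in a loop while the throttling messages appear in dmesg would show whether any zone actually reports a spike.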

Alex Tu (alextu) wrote : Re: [Bug 1781924] Re: Kernel wrong temperature reporting

1) Yup, the system doesn't hang, but it feels less responsive when this
happens, e.g. the keyboard and mouse stop working for a few seconds. I can
check whether cpufreq changes the next time I have the machine.

2) I didn't see any over-temperature reading from the sensors command. Isn't
that reported from sysfs? If not, I can check it the next time I have the machine.

Kai-Heng Feng <email address hidden> wrote on Wed, Jul 24, 2019 at 4:11 PM:

> Alex,
>
> 1) This warning should be harmless. Does it affect actual usage?
> 2) What temperatures are reported under the thermal_zone sysfs entries?

Kai-Heng Feng (kaihengfeng) wrote :

> 2) I didn't see any over-temperature reading from the sensors command. Isn't
> that reported from sysfs? If not, I can check it the next time I have the machine.

The sensors command reads from hwmon rather than from the thermal zones; on modern Intel platforms, the thermal zones are what should be used.
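
To compare the two sources side by side, the hwmon values can be read directly from /sys/class/hwmon. A minimal Python sketch, assuming the standard hwmon layout (`name`, `temp*_input` in millidegrees Celsius, optional `temp*_label`); the function name `read_hwmon_temps` and its `sysfs_root` parameter are hypothetical:

```python
from pathlib import Path


def read_hwmon_temps(sysfs_root="/sys"):
    """Return [(chip, label, degrees_celsius), ...] from the hwmon class."""
    readings = []
    for chip_dir in sorted(Path(sysfs_root).glob("class/hwmon/hwmon*")):
        try:
            chip = (chip_dir / "name").read_text().strip()
        except OSError:
            chip = chip_dir.name  # fall back to the directory name
        for temp_input in sorted(chip_dir.glob("temp*_input")):
            try:
                # hwmon temperatures are in millidegrees Celsius.
                celsius = int(temp_input.read_text()) / 1000.0
            except (OSError, ValueError):
                continue
            label_file = temp_input.with_name(
                temp_input.name.replace("_input", "_label")
            )
            try:
                label = label_file.read_text().strip()
            except OSError:
                label = temp_input.name  # labels are optional
            readings.append((chip, label, celsius))
    return readings


if __name__ == "__main__":
    for chip, label, celsius in read_hwmon_temps():
        print(f"{chip} {label}: {celsius:.1f} C")
```

If the hwmon values stay flat while the thermal zones spike, or the other way round, that narrows down which interface is reporting the bogus reading.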

Hamish Marson (travellingkiwi) wrote :

Also affects my brand new Thinkpad X1 Extreme Gen 2.

The errors pop up when the CPU is stone cold (i.e. upon waking after an overnight sleep in temps <10C)

benjamin button (canavaroski90) wrote :

I've started to suspect a hardware issue. I'm not sure whether the
sensors are malfunctioning or the readings are being misreported.

On Sun, Oct 20, 2019, 14:40 Hamish Marson <email address hidden>
wrote:

> Also affects my brand new Thinkpad X1 Extreme Gen 2.
>
> The errors pop up when the CPU is stone cold (i.e. upon waking after an
> overnight sleep in temps <10C)

Rex Tsai (chihchun)
tags: added: oem-priority
Rex Tsai (chihchun)
Changed in oem-priority:
status: New → Won't Fix