Frequently getting thermal warnings and cpu throttling messages in syslog

Bug #1851749 reported by Brad Figg
This bug affects 15 people
Affects         Status      Importance  Assigned to         Milestone
linux (Ubuntu)  Incomplete  Medium      Ubuntu Kernel Team

Bug Description

Nov 6 11:34:26 fog kernel: [1129655.443564] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 50300)
Nov 6 11:34:26 fog kernel: [1129655.443565] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 50300)
Nov 6 11:34:26 fog kernel: [1129655.443567] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 58637)
Nov 6 11:34:26 fog kernel: [1129655.443568] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 58637)
Nov 6 11:34:26 fog kernel: [1129655.443569] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 58637)
Nov 6 11:34:26 fog kernel: [1129655.443570] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 58637)
Nov 6 11:34:26 fog kernel: [1129655.446528] mce: CPU2: Core temperature/speed normal
Nov 6 11:34:26 fog kernel: [1129655.446529] mce: CPU0: Core temperature/speed normal
Nov 6 11:34:26 fog kernel: [1129655.446530] mce: CPU1: Package temperature/speed normal
Nov 6 11:34:26 fog kernel: [1129655.446531] mce: CPU3: Package temperature/speed normal
Nov 6 11:34:26 fog kernel: [1129655.446531] mce: CPU0: Package temperature/speed normal
Nov 6 11:34:26 fog kernel: [1129655.446532] mce: CPU2: Package temperature/speed normal
Nov 6 11:40:35 fog kernel: [1130024.427390] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 50316)
Nov 6 11:40:35 fog kernel: [1130024.427391] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 50316)
Nov 6 11:40:35 fog kernel: [1130024.427392] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 58655)
Nov 6 11:40:35 fog kernel: [1130024.427394] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 58655)
Nov 6 11:40:35 fog kernel: [1130024.427424] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 58655)
Nov 6 11:40:35 fog kernel: [1130024.427424] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 58655)
Nov 6 11:40:35 fog kernel: [1130024.434426] mce: CPU0: Core temperature/speed normal
Nov 6 11:40:35 fog kernel: [1130024.434427] mce: CPU3: Package temperature/speed normal
Nov 6 11:40:35 fog kernel: [1130024.434428] mce: CPU2: Core temperature/speed normal
Nov 6 11:40:35 fog kernel: [1130024.434428] mce: CPU1: Package temperature/speed normal
Nov 6 11:40:35 fog kernel: [1130024.434429] mce: CPU2: Package temperature/speed normal
Nov 6 11:40:35 fog kernel: [1130024.434430] mce: CPU0: Package temperature/speed normal
Nov 6 11:45:48 fog kernel: [1130337.433923] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 50358)
Nov 6 11:45:48 fog kernel: [1130337.433925] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 58701)
Nov 6 11:45:48 fog kernel: [1130337.433926] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 50358)
Nov 6 11:45:48 fog kernel: [1130337.433927] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 58701)
Nov 6 11:45:48 fog kernel: [1130337.433928] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 58701)
Nov 6 11:45:48 fog kernel: [1130337.433932] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 58701)
Nov 6 11:45:48 fog kernel: [1130337.434926] mce: CPU0: Core temperature/speed normal
Nov 6 11:45:48 fog kernel: [1130337.434926] mce: CPU2: Core temperature/speed normal
Nov 6 11:45:48 fog kernel: [1130337.434927] mce: CPU3: Package temperature/speed normal
Nov 6 11:45:48 fog kernel: [1130337.434928] mce: CPU1: Package temperature/speed normal
Nov 6 11:45:48 fog kernel: [1130337.434928] mce: CPU2: Package temperature/speed normal
Nov 6 11:45:48 fog kernel: [1130337.434930] mce: CPU0: Package temperature/speed normal
Nov 6 11:47:51 fog kernel: [1130459.924675] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 8218)
Nov 6 11:47:51 fog kernel: [1130459.924676] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 8218)
Nov 6 11:47:51 fog kernel: [1130459.934691] mce: CPU1: Core temperature/speed normal
Nov 6 11:47:51 fog kernel: [1130459.934691] mce: CPU3: Core temperature/speed normal
Nov 6 11:50:54 fog kernel: [1130643.478700] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 50404)
Nov 6 11:50:54 fog kernel: [1130643.478701] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 50404)
Nov 6 11:50:54 fog kernel: [1130643.478702] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 58764)
Nov 6 11:50:54 fog kernel: [1130643.478703] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 58764)
Nov 6 11:50:54 fog kernel: [1130643.478705] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 58764)
Nov 6 11:50:54 fog kernel: [1130643.478708] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 58764)
Nov 6 11:50:54 fog kernel: [1130643.480693] mce: CPU1: Package temperature/speed normal
Nov 6 11:50:54 fog kernel: [1130643.480694] mce: CPU0: Core temperature/speed normal
Nov 6 11:50:54 fog kernel: [1130643.480694] mce: CPU3: Package temperature/speed normal
Nov 6 11:50:54 fog kernel: [1130643.480695] mce: CPU2: Core temperature/speed normal
Nov 6 11:50:54 fog kernel: [1130643.480696] mce: CPU0: Package temperature/speed normal
Nov 6 11:50:54 fog kernel: [1130643.480697] mce: CPU2: Package temperature/speed normal
Nov 6 11:59:14 fog kernel: [1131143.379551] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 50440)
Nov 6 11:59:14 fog kernel: [1131143.379552] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 50440)
Nov 6 11:59:14 fog kernel: [1131143.379554] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 58811)
Nov 6 11:59:14 fog kernel: [1131143.379554] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 58811)
Nov 6 11:59:14 fog kernel: [1131143.379556] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 58811)
Nov 6 11:59:14 fog kernel: [1131143.379557] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 58811)
Nov 6 11:59:14 fog kernel: [1131143.383513] mce: CPU0: Core temperature/speed normal
Nov 6 11:59:14 fog kernel: [1131143.383514] mce: CPU3: Package temperature/speed normal
Nov 6 11:59:14 fog kernel: [1131143.383515] mce: CPU2: Core temperature/speed normal
Nov 6 11:59:14 fog kernel: [1131143.383515] mce: CPU1: Package temperature/speed normal
Nov 6 11:59:14 fog kernel: [1131143.383516] mce: CPU2: Package temperature/speed normal
Nov 6 11:59:14 fog kernel: [1131143.383517] mce: CPU0: Package temperature/speed normal
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: bradf 1575 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 19.10
InstallationDate: Installed on 2019-10-02 (36 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Beta amd64 (20190926.1)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 004: ID 138a:0097 Validity Sensors, Inc.
 Bus 001 Device 003: ID 04f2:b5ce Chicony Electronics Co., Ltd Integrated Camera
 Bus 001 Device 002: ID 8087:0a2b Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 20HRCTO1WW
Package: linux (not installed)
ProcEnviron:
 TERM=tmux-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-19-generic root=UUID=3963ec22-812d-4d52-8df4-7a8418751e15 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-19-generic N/A
 linux-backports-modules-5.3.0-19-generic N/A
 linux-firmware 1.183.1
Tags: eoan
Uname: Linux 5.3.0-19-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 07/04/2017
dmi.bios.vendor: LENOVO
dmi.bios.version: N1MET37W (1.22 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20HRCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1MET37W(1.22):bd07/04/2017:svnLENOVO:pn20HRCTO1WW:pvrThinkPadX1Carbon5th:rvnLENOVO:rn20HRCTO1WW:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad X1 Carbon 5th
dmi.product.name: 20HRCTO1WW
dmi.product.sku: LENOVO_MT_20HR_BU_Think_FM_ThinkPad X1 Carbon 5th
dmi.product.version: ThinkPad X1 Carbon 5th
dmi.sys.vendor: LENOVO

Revision history for this message
Brad Figg (brad-figg) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected eoan
description: updated
Revision history for this message
Brad Figg (brad-figg) wrote : CRDA.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, PulseList.txt, RfKill.txt, UdevDb.txt

apport information

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Brad Figg (brad-figg) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
status: Confirmed → In Progress
Revision history for this message
David Britton (dpb) wrote :

Glad to see this filed; I have the same issue. Note that there is a good discussion of the issue here:

https://www.reddit.com/r/thinkpad/comments/870u0a/t480s_linux_throttling_bug/

And the user made a GitHub repo to address the problem in testing:

https://github.com/erpalma/throttled

I can clearly see the effect of the change using s-tui and monitoring CPU frequency. Before the change, throttling limits me to around 1.5-1.7 GHz under load; after the change, it stays at a consistent 2.7 GHz without the constant throttle messages in syslog/journalctl. I don't have enough depth in this area to understand the technical tradeoffs, but I did want to point out the research that I have used thus far.
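
For anyone without s-tui installed, here is a rough sketch of the same kind of frequency check. It assumes the standard cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq`, values in kHz); the function name is only illustrative and is not part of s-tui or throttled.

```python
from pathlib import Path


def read_cur_freqs_ghz(root="/sys/devices/system/cpu"):
    """Return {cpu_number: current frequency in GHz} from cpufreq sysfs files.

    Assumes each CPU exposes cpuN/cpufreq/scaling_cur_freq (value in kHz).
    """
    freqs = {}
    # "cpu[0-9]*" skips sibling directories such as "cpufreq" and "cpuidle".
    for path in Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu = int(path.parts[-3][3:])          # "cpu12" -> 12
        khz = int(path.read_text().strip())    # sysfs reports kHz
        freqs[cpu] = khz / 1_000_000           # kHz -> GHz
    return freqs
```

Calling `read_cur_freqs_ghz()` every second or so while a load is running should show whether the cores settle in the 1.5-1.7 GHz range or hold a higher clock.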

Revision history for this message
Alex Murray (alexmurray) wrote :

Lenovo upstream are aware of the issue and are looking to resolve it in the BIOS for various models:

https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/m-p/4513821/highlight/true#M13563

https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/m-p/4534109/highlight/true#M13637

throttled is not a great solution, since it requires writing to MSRs, which means users have to disable Secure Boot to use it. Instead, it is best to wait for the official solution from Lenovo, IMO.

Revision history for this message
gene_wood (gene.wood) wrote :
Revision history for this message
Willy Nolan (optonox) wrote :

Sorry, there is a *lot* of information here. I have this same issue; what are we supposed to do as users?

Revision history for this message
Dave Chiluk (chiluk) wrote :

This seems to be wider than just Lenovo machines, as my Dell Precision 5540 is hitting this as well.

Jeff Lane  (bladernr)
tags: added: ubuntu-certified
Revision history for this message
Colin Ian King (colin-king) wrote :

Anyone affected by this bug, can you run the following commands:

sudo dmidecode -H 11
sudo modprobe msr
sudo rdmsr -f 29:24 -d 0x1a2

The rdmsr command reads the TCC activation offset from bits 24..29 of the MSR_TEMPERATURE_TARGET register. This offset should be quite small, but some firmware may set it to a relatively large value, causing thermal throttling to kick in prematurely.
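
To make the bit-field extraction concrete, here is a minimal sketch of what `rdmsr -f 29:24 -d 0x1a2` computes. Beyond the text above, it assumes that bits 23:16 of the same register hold the temperature target in degrees C (my reading of the Intel SDM, worth checking for your CPU model) and that throttling starts at the target minus the offset. The helper names are made up for illustration.

```python
import os
import struct

MSR_TEMPERATURE_TARGET = 0x1A2  # same register the rdmsr command above reads


def bit_field(value, high, low):
    """Return bits high..low (inclusive) of value, like `rdmsr -f high:low`."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_temperature_target(raw):
    """Split a raw 64-bit MSR_TEMPERATURE_TARGET value into its fields."""
    tcc_offset = bit_field(raw, 29, 24)  # TCC activation offset
    target_c = bit_field(raw, 23, 16)    # assumption: temperature target, deg C
    return {
        "tcc_offset": tcc_offset,
        "target_c": target_c,
        "throttle_at_c": target_c - tcc_offset,
    }


def read_msr(register, cpu=0):
    """Read one MSR via /dev/cpu/N/msr (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the register number.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)
```

As root, `decode_temperature_target(read_msr(MSR_TEMPERATURE_TARGET))` should give the same offset as the rdmsr command, plus the temperature at which throttling would begin under the assumptions above.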

Revision history for this message
Colin Ian King (colin-king) wrote :

In case the MSR is being updated by firmware or software, can the rdmsr command also be run while the machine is running hot and the "mce: CPU*: Package temperature above threshold" messages are appearing?

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Revision history for this message
Ian P. Christian (pookey) wrote :

This is the output I'm getting. I'm using a ThinkPad P1

# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
Table at 0x40DBB000.

Handle 0x000B, DMI type 221, 12 bytes
OEM-specific Type
 Header and Data:
  DD 0C 0B 00 01 01 00 04 00 00 00 00
 Strings:
  FSP Binary Version

20

Revision history for this message
Brian Jelinek (brian1864) wrote :

Output on Lenovo Y510P

sudo dmidecode -H 11
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
69 structures occupying 2938 bytes.
Table at 0x000E6DE0.

Handle 0x000B, DMI type 7, 19 bytes
Cache Information
 Socket Designation: L1 Cache
 Configuration: Enabled, Not Socketed, Level 1
 Operational Mode: Write Back
 Location: Internal
 Installed Size: 32 kB
 Maximum Size: 32 kB
 Supported SRAM Types:
  Synchronous
 Installed SRAM Type: Synchronous
 Speed: Unknown
 Error Correction Type: Single-bit ECC
 System Type: Instruction
 Associativity: 8-way Set-associative

sudo modprobe msr
sudo rdmsr -f 29:24 -d 0x1a2
1

Revision history for this message
Luigi Calligaris (luigicalligaris) wrote :

I am affected as well on a Dell P74G Inspiron 7460 with an i7-7500U CPU @ 2.70GHz and an integrated GPU plus an NVIDIA GPU using the NVIDIA drivers.

According to information I found here

https://icecat.biz/p/dell/w56752561pth-gld/inspiron-notebooks-7460-33552288.html

the TDPs of my CPU are:

Thermal Design Power (TDP) 15 W
Configurable TDP-up frequency 2.9 GHz
Configurable TDP-up 25 W
Configurable TDP-down 7.5 W

CPU version as read from /proc/cpuinfo:

vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
stepping : 9
microcode : 0xde

From lspci, detail of the GPUs:

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)

My Kernel:

uname -a
Linux [redacted] 5.4.0-66-generic #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output of the commands recommended above:

The dmidecode output looks odd, returning info about WiFi:

sudo dmidecode -H 11
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
85 structures occupying 5497 bytes.
Table at 0x000E9CA0.

Handle 0x000B, DMI type 8, 9 bytes
Port Connector Information
        Internal Reference Designator: JNGFF1 - WLAN/BT/Wigig CONN
        Internal Connector Type: None
        External Reference Designator: Not Specified
        External Connector Type: None
        Port Type: Other

sudo modprobe msr
sudo rdmsr -f 29:24 -d 0x1a2
2

Reproducing the issue:

stress -c 8
stress: info: [48012] dispatching hogs: 8 cpu, 0 io, 0 vm, 0 hdd

Dmesg output:

[17674.962910] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 3987)
[17674.962910] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 3986)
[17674.962912] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 7491)
[17674.962913] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 7491)
[17674.962914] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 7487)
[17674.962915] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 7491)
[17674.966955] mce: CPU1: Core temperature/speed normal
[17674.966956] mce: CPU3: Core temperature/speed normal
[17674.966957] mce: CPU2: Package temperature/speed normal
[17674.966958] mce: CPU0: Package temperature/speed normal
[17674.966958] mce: CPU3: Package temperature/speed normal
[17674.966960] mce: CPU1: Package temperature/speed normal
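
In case it helps others compare logs, here is a small sketch that pulls the "total events" counters out of lines like the ones above. The regular expression is based only on the message format shown in this report and may need adjusting on other kernel versions; the function name is illustrative.

```python
import re

# Matches the throttle messages as they appear in this report, with either
# a syslog prefix or a bare dmesg timestamp in front.
THROTTLE_RE = re.compile(
    r"mce: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def latest_event_totals(lines):
    """Return {(cpu, scope): last reported total events} for throttle lines."""
    totals = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match:  # "temperature/speed normal" lines are skipped
            key = (int(match["cpu"]), match["scope"])
            totals[key] = int(match["total"])
    return totals
```

Feeding it the output of `dmesg` (one line per list element) before and after a stress run gives a quick per-CPU count of how many new throttle events were logged.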

Turbostat line being used (NOTE: 100 milliseconds interval):
turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,CorWatt,GFXWatt,Time_Of_Day_Seconds --interval 0.1

Output of turbostat across the beginning of stress:

Time_Of_Day_Seconds Busy% Bzy_MHz PkgTmp PkgWatt CorWatt GFXWatt
1622157712.441769 9.23 795 51 1.75 0.20 0.11
1622157712.542624 10.77 711 52 1.84 0.21 0.12
1622157712.643427 8.98 708 51 1.80 0.17 0.07
1622157712.744178 12.97 746 51 ...


Changed in linux (Ubuntu):
assignee: Colin Ian King (colin-king) → Ubuntu Kernel Team (ubuntu-kernel-team)