Significantly lower power and thermal limits on ThinkPad T480s (and probably others) than on Windows

Bug #1763144 reported by Julian Andres Klode
This bug affects 18 people
Affects            Status      Importance  Assigned to  Milestone
linux (Ubuntu)     Invalid     Medium      Unassigned
thermald (Ubuntu)  Incomplete  High        koba

Bug Description

A ThinkPad T480s under Windows has a power limit of 44W, both short and long term, with a thermal maximum of about 93C. Under Linux, the power limits are about 44W (short term) and 15W (long term), and the thermal limit is 80C, causing a significant performance loss.

Looking at MSR and MCHBAR values, we can see that the values are correctly at 44W in the MSR, but the MCHBAR is set to a lower value:

$ sudo rdmsr -a 0x610
42816000dd8160
42816000dd8160
42816000dd8160
42816000dd8160
42816000dd8160
42816000dd8160
42816000dd8160
42816000dd8160
$ sudo /home/jak/Downloads/iotools-1.5/iotools mmio_read64 0xfed159a0
0x0042816000dd8078

Setting the MCHBAR to the same value as the MSR register solves the problem. At some point intel-rapl seems to reduce overall frequency to 600 MHz, though.
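
To make the difference between the two raw values above easier to see, here is a minimal decoding sketch. It assumes the usual package power limit layout (PL1 in bits 14:0, PL2 in bits 46:32, enable bits at 15 and 47), that the MCHBAR register mirrors the MSR layout, and a power unit of 1/8 W. The real unit comes from MSR 0x606, which is not shown in this report, so treat the watt figures as an assumption.

```python
# Decode a package power limit value (layout of MSR 0x610).
# ASSUMPTION: the power unit is 1/8 W; the real unit is stored in
# MSR 0x606 bits 3:0, which this bug report does not show.
POWER_UNIT_W = 1 / 8


def decode_power_limits(raw: int, unit_w: float = POWER_UNIT_W) -> dict:
    """Return PL1 (long term) and PL2 (short term) in watts plus enable bits."""
    pl1_raw = raw & 0x7FFF            # bits 14:0
    pl2_raw = (raw >> 32) & 0x7FFF    # bits 46:32
    return {
        "pl1_watts": pl1_raw * unit_w,
        "pl1_enabled": bool((raw >> 15) & 1),
        "pl2_watts": pl2_raw * unit_w,
        "pl2_enabled": bool((raw >> 47) & 1),
    }


# Values copied from the rdmsr and iotools output in this report.
msr = decode_power_limits(0x42816000DD8160)
mchbar = decode_power_limits(0x0042816000DD8078)
print("MSR   :", msr)
print("MCHBAR:", mchbar)
```

Under those assumptions the MSR decodes to 44 W for both limits, while the MCHBAR value keeps the short-term limit at 44 W but sets the long-term limit to 15 W.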

The thermal limit is configured in MSR register 0x1a2; rdmsr -f 29:24 -d 0x1a2 returns 20. Lowering those bits to 7 raises the limit, resulting in performance comparable to Windows.
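
As a rough illustration of how the offset in bits 29:24 maps to a throttle temperature, the sketch below decodes a value of MSR 0x1a2 and recomputes it with a different offset. The raw value is hypothetical: it assumes TjMax (bits 23:16) is 100, which is not stated in this report, and the helper names are made up for the example.

```python
def decode_temperature_target(raw: int) -> dict:
    """Split a TEMPERATURE_TARGET (MSR 0x1a2) value into its thermal fields."""
    tjmax = (raw >> 16) & 0xFF    # bits 23:16, TjMax in degrees C
    offset = (raw >> 24) & 0x3F   # bits 29:24, TCC activation offset
    return {
        "tjmax_c": tjmax,
        "tcc_offset_c": offset,
        "throttle_at_c": tjmax - offset,
    }


def with_offset(raw: int, offset: int) -> int:
    """Return raw with bits 29:24 replaced by offset (no hardware access)."""
    if not 0 <= offset <= 0x3F:
        raise ValueError("offset must fit in 6 bits")
    return (raw & ~(0x3F << 24)) | (offset << 24)


# HYPOTHETICAL raw value: TjMax = 100 (assumed), offset = 20 (from rdmsr above).
raw = (20 << 24) | (100 << 16)
linux_default = decode_temperature_target(raw)
adjusted = decode_temperature_target(with_offset(raw, 7))
print(linux_default)
print(adjusted)
```

With the assumed TjMax of 100, an offset of 20 gives a throttle point of 80C and an offset of 7 gives 93C, which lines up with the limits described above. Writing the new value back would be done with wrmsr and is not covered here.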

Most of the analysis is based on the analysis in

https://www.reddit.com/r/thinkpad/comments/870u0a/t480s_linux_throttling_bug/

This applies to all bionic kernels I have tested so far, including

Ubuntu 4.15.0-13.14-generic 4.15.10
---
ApportVersion: 2.20.9-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC2D0p: jak 5881 F...m pulseaudio
 /dev/snd/controlC2: jak 5881 F.... pulseaudio
 /dev/snd/controlC1: jak 5881 F.... pulseaudio
 /dev/snd/controlC0: jak 5881 F.... pulseaudio
CurrentDesktop: GNOME
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2018-03-14 (28 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180313)
MachineType: LENOVO 20L8S02D00
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: root=/dev/mapper/ubuntu--vg-root ro rootflags=subvol=@ quiet splash vt.handoff=1
ProcVersionSignature: Ubuntu 4.15.0-13.14-generic 4.15.10
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-13-generic N/A
 linux-backports-modules-4.15.0-13-generic N/A
 linux-firmware 1.173
Tags: bionic
Uname: Linux 4.15.0-13-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip kvm lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 01/22/2018
dmi.bios.vendor: LENOVO
dmi.bios.version: N22ET31W (1.08 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20L8S02D00
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN22ET31W(1.08):bd01/22/2018:svnLENOVO:pn20L8S02D00:pvrThinkPadT480s:rvnLENOVO:rn20L8S02D00:rvrNotDefined:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad T480s
dmi.product.name: 20L8S02D00
dmi.product.version: ThinkPad T480s
dmi.sys.vendor: LENOVO

Julian Andres Klode (juliank) attached apport information:

AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, PulseList.txt, RfKill.txt, UdevDb.txt, WifiSyslog.txt

tags: added: apport-collected bionic
description: updated

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Julian Andres Klode (juliank) wrote : Re: Significantly lower power and thermal limits on T480s (and probably others) than on Windows

An actually useful dmesg log (from journalctl -k)

summary: - Significantly lower power and thermal limits on T480s (and probably
- others) than on Windows
+ Significantly lower power and thermal limits on ThinkPad T480s (and
+ probably others) than on Windows
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Julian Andres Klode (juliank) wrote :

Note that the thermal limit seems to be rewritten by the EC at random times.

The person originally discovering the issue wrote a userspace daemon to write the proper stuff in memory from time to time - https://github.com/erpalma/lenovo-throttling-fix - but this should best get fixed in the kernel (unless it's a bug in the BIOS/EC firmware, then it's out of our hands).

Julian Andres Klode (juliank) wrote :

I have seen this problem on all bionic kernels I have tested. The machine has never seen anything older than bionic. I installed it from a daily image at the end of March.

The mainline kernel does not fix the bug.

tags: added: bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Julian Andres Klode (juliank) wrote :

Sorry, messed up the tag a bit.

tags: added: kernel-bug-exists-upstream
removed: bug-exists-upstream
AaronMa (mapengyu) wrote :

Hi,
The TDP of the Intel U CPU family is defined as 25W.
Please refer to:
https://www.intel.com/content/www/us/en/products/processors/core/i5-processors/i5-8250u.html

The CPU is not meant to go to 44W; that is far beyond what Intel designed.

To allow the optimal operation and long-term reliability of Intel processor-based
systems, the component temperature specification is the applicable TjMax, which is defined in MSR 0x1a2, bits [23:16].
TjMax is factory calibrated and is not user configurable.
The TCC Activation Offset is set in the TEMPERATURE_TARGET (0x1A2) MSR, bits [29:24];
the offset value is subtracted from TjMax.
Please try it if you like overclocking.

When temperature is retrieved by the processor MSR, it is the instantaneous
temperature of the given DTS.
The thermal interrupt in Linux depends on reading the MSR to warn the system, like:
temperature above threshold
Then the EC will make the CPU be full speed to cool down the system; when the CPU temperature is stable, like 80C, the CPU can work at the defined frequency.

This is designed to make the system work reliably at the standard performance regardless of the OS.

Julian Andres Klode (juliank) wrote :

Sure, it's not designed for 44W by Intel. I only measured 27W maximum when configured to 44W. It's just that the laptop is likely configured for an overall thermal limit of 44W because it can house an MX150 with the same cooling system (dual heat pipe). In any case, on Windows it is not bound by 15W long term, but runs higher, averaging at about 30W from what I read.

I think I wrote the same about the thermal point. On Linux the offset is 20; on Windows it seems to be a different value, as it reaches 90C and more.

AaronMa (mapengyu) wrote :

Quoted from Lenovo's feedback:

"
Due to "DPTF" function will act on Windows system by BIOS/Driver support, the system will turn to cool mode when keep high temperature several minutes.
But Linux system didn't have driver support DPTF function, it only have TDP limit on platform.
So user should have different between Windows and Linux system as normal behavior.
"

And my test result:
1, When stress testing on Win10, CPU temperature reaches 98C and 3.2G.
But this status can't keep more than several mins, it is too hot for CPU. So the CPU tried to maintain a 1.1G freq to cool itself.
Then the temperature and freq will go into a loop like 98C/3.2G -> 80C/1.1G -> 98C/3.2G.
2, On Ubuntu 16.04 CPU keeps 80C/2.7G when stressing.

Martin Dünkelmann (nc-duenkekl3-deactivatedaccount) wrote :

I own a T460P with i7-6700HQ cpu.
The thermal throttling makes my pc lag like hell.
Even on the next day, after a night in standby.

Only a restart can fix it.

MMS-Prodeia (mms-prodeia) wrote :

I think we have a problem here that needs to be solved on multiple levels.

* throttling kicks in to keep the dGPU from reaching its fall-off limit, which is 76° (T580, i7-8550u, MX150). The only possible way to stay below this thermal limit is to power down the CPU massively, which leads to =>

* the dGPU isn't throttling itself down enough. In tests (glxsphere) my findings are:
the dGPU runs at about 1600-1700 MHz, not going down further. I guess this is because the GPU itself has a thermal limit of around 95°, so the GPU's own throttling kicks in much later than 76°, and the GPU does not help with cooling by slowing down more.
I wasn't able to get a reading of the power consumed by the GPU; nvidia-smi does not offer this (nvidia-396). Thus leading to =>

* There's an imbalance between the GPU's and the CPU's abilities to adapt thermally. As the GPU isn't able to and the CPU is, the CPU does, which leads to =>

a) a system running on the brakes to manage heat, in order to stay below a possible GPU fall-off (full GPU power and frequency stable over time, with the CPU going down to only 3-5W)
or
b) a system running at the possible performance, using the CPU's full potential, but only being able to use ⅓ of the dGPU's capacity

Conclusion from my perspective:
There's a need to balance :-)

* there's a need to have influence on gpu's abilities to throttle earlier (=> nvidia)
* Lenovo-fix being enhanced to provide some kind of profiles, which are dynamically providing usecase profiles (easiest way to test is: set temp in Lenovo-fix to max 75°. As both, cpu & dGPU producing heat and 76° is fall-off, staying below that keeps system from throttling.)

Julian Andres Klode (juliank) wrote :

This bug was about different power limits for the non-dGPU version compared to Windows. It turned out that the power limits in Windows are broken, and the ones in Linux are correct, so this bug is in fact Invalid.

For the dGPU, I'm not sure what the intention is. Maybe report a bug against the nvidia driver if you think it's wrong?

Changed in linux (Ubuntu):
status: Confirmed → Invalid
MMS-Prodeia (mms-prodeia) wrote :

"It turned out that the power limits in Windows are broken, and the ones in Linux are correct, so this bug is in fact Invalid."
Where do I find that?

luckyrings (d8f2) wrote :

The described behavior with CPU thermal throttling was confirmed by Lenovo. The limits are higher on Windows because "DPTF" works there through driver support and enables the "desk mode". In Linux the "desk mode" is never reached due to the lack of sensor information, and the system operates in the "lap mode", which has a much lower trip temperature.

See the forum thread here:

https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/td-p/4028489

The patches are not published yet for all platforms. Besides the T480s, many other models, e.g. the X1C6, are affected too.

Background info by Lenovo:

https://forums.lenovo.com/lnv/attachments/lnv/Special_Interest_Linux/13642/1/Linux%20Thermal%20throttling.pdf

Technical description by German magazine in German language only: https://www.notebookcheck.com/Lenovo-ThinkPads-haben-mit-CPU-Throttling-unter-Linux-zu-kaempfen-Loesung-in-Arbeit.435573.0.html?fbclid=IwAR0koYya67WfT8SlZmDuNx6F51niWYHzvd4C1MMzZdiyCJLutmC_4p3BJVs

Francois Thirioux (fthx) wrote :

I'm affected too (Ubuntu focal, kernel 5.4).

I found a simple workaround here using thermald:
https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/m-p/4637873#M14378
with help from here:
https://github.com/intel/thermal_daemon/issues/215
It's just a workaround... and I don't really understand how it makes the limits higher, but it's better than nothing.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in thermald (Ubuntu):
status: New → Confirmed
Colin Ian King (colin-king) wrote :

A recent update of thermald 1.9.1-1ubuntu0.5 in focal contains many of the upstream thermald patches that have been backported to support more modern laptops.

The Intel fixes included are as follows:

   - Disable legacy rapl cdev when rapl-mmio is in use
     This will prevent PL1/PL2 power limit from MSR based rapl, which
     may not be the correct one.
   - Delete all trips from zones before psvt install
     Initially zones has all the trips from sysfs, which may have wrong
     settings. Instead of deleting only for matched psvt zones, delete
     for all zones. In this way only zones which are in PSVT will be
     present.
   - Check for alternate names for B0D4 device
     B0D4 can be named as TCPU or B0D4. So search for both names
     if failed to find one.
   - Fix error for condition names
     The current code caps the max name as the last condition name,
     which is "Power_Slider". So any condition more than 56 will be
     printing error, with "Power_Slider" as condition name. For example
     for condition = 57: Unsupported condition 57 (Power_slider)
   - Set a very high RAPL MSR PL1 with --adaptive
     After upgrading Dell Latitude 5420, again noticed performance
     degradation.
     The PPCC power limit for MSR RAPL PL1 is reduced to 15W. Even though
     we disable MSR RAPL with --adaptive option, it is not getting
     disabled. So MSR RAPL limits still playing role.
     To fix that set a very high MSR RAPL PL1 limit so that it never
     causes throttling. All throttling with --adaptive option is done
     using RAPL-MMIO.
   - Special case for default PSVT
     When there are no adaptive tables and only one default PSVT table
     is present with just one entry with MAX type. Add one additional
     entry as done for non default case.
   - Increase power limit for disabled RAPL-MMIO
     Increase 100W to 200W as some desktop platform already have limit
     more than 100W.
   - Use Adaptive PPCC limits for RAPL MMIO
     Set the correct device name as RAPL-MSR so that RAPL-MMIO can
     also set the correct default power limits.

Can folk check if this helps with the issue?
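
One way to check the effective limits before and after the thermald update is to read the RAPL constraints that the kernel exposes through the powercap sysfs interface. The sketch below is only an illustration; the function name is made up, and which zones appear (for example an intel-rapl-mmio zone next to the MSR-based one) depends on the kernel and hardware.

```python
from pathlib import Path


def read_rapl_limits(base: str = "/sys/class/powercap") -> dict:
    """Map 'zone/constraint name' to its configured power limit in watts."""
    limits = {}
    for zone in sorted(Path(base).glob("intel-rapl*")):
        for name_file in sorted(zone.glob("constraint_*_name")):
            # constraint_N_name pairs with constraint_N_power_limit_uw
            limit_file = zone / name_file.name.replace("_name", "_power_limit_uw")
            try:
                name = name_file.read_text().strip()
                microwatts = int(limit_file.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable or missing entry; skip it
            limits[f"{zone.name}/{name}"] = microwatts / 1_000_000
    return limits


if __name__ == "__main__":
    for key, watts in read_rapl_limits().items():
        print(f"{key}: {watts:.1f} W")
```

Comparing the long_term and short_term values of the MSR and MMIO zones while thermald is running should show whether the lower limit is still in effect.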

Changed in thermald (Ubuntu):
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
status: Confirmed → Incomplete
Colin Ian King (colin-king) wrote :

Hi, I updated this bug a couple of months ago to see if recent backport fixes addressed the issue. If this has helped, please add your notes to the bug report so we can close the issue.

Changed in thermald (Ubuntu):
assignee: Colin Ian King (colin-king) → Ubuntu Kernel Team (ubuntu-kernel-team)
Changed in thermald (Ubuntu):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → koba (kobako)