Performance workaround for Dell 7390 2-in-1 Ice Lake

Bug #1874933 reported by Srinivas Pandruvada on 2020-04-24
This bug affects 3 people
Affects            Importance  Assigned to
thermald (Ubuntu)  High        Colin Ian King
Focal              Undecided   Unassigned

Bug Description

== SRU justification focal ==

As reported here:
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/intel-linux/1174225-dell-xps-7390-intel-ice-lake-performance-hit-hard-by-a-linux-kernel-regression?view=stream

This primarily impacts Ubuntu 20.04 LTS (Focal Fossa), as it switched to the 5.4 kernel.
The 5.4 kernel added support for the "Processor thermal device" on Ice Lake, which exposes the power tables (via PPCC).

This system's default "max RAPL long term power limit" is 15W, but the power table specifies 9W, so thermald will limit power to 9W.

If dptfxtract is executed, the power limit will be higher than the power-up value, but most users will run the out-of-the-box setup, so this needs a workaround.

This workaround ignores any power limit less than the power-up power limit.
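
As a rough illustration of the rule above (not the actual thermald code, which is C++), the check can be sketched as below. The function name is hypothetical, and the sketch assumes that "ignoring" an invalid table value means falling back to the limit that was in effect at power-up.

```python
def effective_max_power_uw(ppcc_max_uw, power_up_limit_uw):
    """Return the max power limit to enforce, in microwatts.

    Assumption: a PPCC maximum below the power-up limit is treated as
    invalid, and the power-up limit is used in its place.
    """
    if ppcc_max_uw is None or ppcc_max_uw < power_up_limit_uw:
        return power_up_limit_uw
    return ppcc_max_uw


# Values from this report: the PPCC table says 9W, the power-up default is 15W.
print(effective_max_power_uw(9_000_000, 15_000_000))  # 15000000
```

A table value at or above the power-up limit is passed through unchanged, so well-formed PPCC tables would not be affected under this reading.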

This is addressed in thermald 2.1 with two commits:
https://github.com/intel/thermal_daemon/commit/f7db434293387c965e8d9141608f855893740e3a
https://github.com/intel/thermal_daemon/commit/c3461690eafb7304bf59a39fb02955a5154b3861

I know 20.04 LTS uses 1.9.1. I can assist with a backport if required.

== Fix ==

Two upstream commits to ease backporting:
   - eeadf7d2efe Restore to min state on deactivation without
     depending on hardware state
   - 9a6dc27879a Clean up the code and documentation

Two upstream commits for the fix:
   - f7db4342933 Avoid polling power in non PPCC case
   - c3461690eaf Ignore invalid PPCC max power limit

== Test case ==

Open two terminals:
- In the first terminal, run the following command:
   "sudo turbostat --show PkgWatt"
- In the second terminal, run an all-CPU busy workload, such as stress-ng or mprime

After a few seconds turbostat will show that power is capped at around 9W.

Install the updated thermald, and repeat.

Now with this fix the power should be capped around 15W.
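
If an eyeball comparison is not enough, one option is to save the turbostat output from each run and average the readings. The sketch below assumes the captured text is a PkgWatt header followed by one numeric reading per line. The helper names and the sample readings are made up for illustration and are not part of turbostat or thermald.

```python
def mean_pkg_watts(turbostat_output):
    """Average the numeric PkgWatt readings in captured turbostat text."""
    samples = []
    for line in turbostat_output.splitlines():
        field = line.strip()
        try:
            samples.append(float(field))
        except ValueError:
            continue  # skip the "PkgWatt" header and blank lines
    if not samples:
        raise ValueError("no PkgWatt samples found")
    return sum(samples) / len(samples)


def nearest_cap(mean_watts, caps=(9.0, 15.0)):
    """Return whichever expected cap (9W or 15W) the mean is closest to."""
    return min(caps, key=lambda cap: abs(cap - mean_watts))


# Hypothetical capture from a run with the unfixed package.
CAPTURE_BEFORE = "PkgWatt\n8.9\n9.1\n"
print(nearest_cap(mean_pkg_watts(CAPTURE_BEFORE)))
```

With the fixed package installed, the same check on a new capture would be expected to land on the 15W cap instead.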

== Regression Potential ==

This fix changes the power limits logic, so it may change the throttling
behaviour of other systems with poorly defined PPCC power tables, because
it now ignores power limits less than the power-up limit. Users will see
their machines run faster, and hence active cooling (e.g. fans) may crank
up, but I think the speed improvement outweighs the noise factor.

Note that these changes are already in thermald 2.1 that is now in Ubuntu Groovy 20.10.

---------------------------

description: updated

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1874933

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Francois Thirioux (fthx) wrote :

Does 2.1 also address, in the meantime, the performance bug affecting ThinkPads?

I am not sure what the ThinkPad issue is. Is it something new, or something old that should have been fixed with dptfxtract and thermald?

Power limits from this platform:
labuser@labuser-XPS-13-7390-2-in-1:/$ grep -r . sys/bus/pci/devices/0000\:00\:04.0/power_limits/*
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_max_uw:9000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_min_uw:2500000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_step_uw:100000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmax_us:28000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmin_us:24000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_max_uw:15000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_min_uw:6000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_step_uw:100000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmax_us:28000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmin_us:24000000

You can see 9000000 (microwatts, i.e. 9W) as the max for power limit 0.
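
For anyone comparing several machines, the grep output above can be turned into a small table of values. This is only a sketch: the function name is hypothetical, and the sample text is two lines copied from the listing above.

```python
def parse_power_limits(grep_output):
    """Map each power_limits attribute name to its integer value.

    Expects lines of the form "<sysfs path>:<value>", as printed by
    "grep -r ." on the power_limits directory.
    """
    limits = {}
    for line in grep_output.splitlines():
        # The path itself contains colons (0000:00:04.0), so split on the last one.
        path, sep, value = line.strip().rpartition(":")
        if not sep or not value.isdigit():
            continue  # skip shell prompts and anything that is not path:number
        limits[path.rsplit("/", 1)[-1]] = int(value)
    return limits


SAMPLE = """\
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_max_uw:9000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_max_uw:15000000
"""

limits = parse_power_limits(SAMPLE)
# The _uw suffix means microwatts; divide by one million to get watts.
print(limits["power_limit_0_max_uw"] / 1_000_000)  # 9.0
```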

Please change this to "Confirmed".
As the power limits above show, performance will be limited compared with what you can get at 15W.

Is anything more required for this to be applied?

affects: linux (Ubuntu) → thermald (Ubuntu)
Mitchell Lomme (mlomme) wrote :

Same issue for me on Dell XPS 9300. CPU is i7-1065G7.

Using 20.04 LTS + Kernel 5.6.11 and thermald 1.9.1-1build1.

root@laptop:/home/root# grep -r . /sys/bus/pci/devices/0000\:00\:04.0/power_limits/*
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_max_uw:9000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_min_uw:2500000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_step_uw:100000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmax_us:28000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmin_us:24000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_max_uw:15000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_min_uw:6000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_step_uw:100000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmax_us:28000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmin_us:24000000

Colin Ian King (colin-king) wrote :

Ubuntu 20.10 Groovy will have the latest thermald 2.1, hopefully in the next few hours. I'll backport the fixes and SRU this for focal.

Changed in thermald (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
importance: Medium → High
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

@Srinivas, commit f7db434293387c965e8d9141608f855893740e3a does not apply cleanly; I guess there are some RAPL-related patches that are prerequisites. Would you mind assisting with a backport here? I don't want to miss any important commits that are also required.

description: updated
description: updated
Changed in thermald (Ubuntu):
status: In Progress → Fix Committed
Steve Langasek (vorlon) on 2020-05-22
Changed in thermald (Ubuntu):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote :

For such a big code change I would like to see a clear test case before approving this SRU. Currently the test case is "As reported here: <URL>", which is not very easy to follow. Even in the Phoronix post, without proper context, I can't easily find any reproduction steps. How would one formally check that the performance workaround really works? Can you outline the steps in the bug description?

I don't want to reject this upload from the queue as it is good in principle, but for such a big diff I'd like to have a decent test-case, if possible.

Changed in thermald (Ubuntu Focal):
status: New → Incomplete
Jin-Dong Kim (jindong-kim) wrote :

Is this fix going to be released, or abandoned? I have an XPS 13 7390 2-in-1 and have been waiting for the release of this fix. If necessary, I can help test it.

Robie Basak (racb) wrote :

This is blocked on someone writing a test case as requested by Łukasz in comment 10.

To reproduce this:

Boot Ubuntu 20.04 LTS (Focal Fossa) with the 5.4 kernel.

Open two terminals:
- In the first terminal, run the following command: "turbostat --show PkgWatt"
- In the second terminal, run an all-CPU busy workload, such as stress-ng or mprime

After a few seconds turbostat will show that power is capped at around 9W.
With this fix the power will be capped at around 15W.

So you gain 6W of power headroom for performance.

What else is needed here?

Chris Halse Rogers (raof) wrote :

I've added the testcase to the bug description. That seems like a sensible enough reproducer.

description: updated
Changed in thermald (Ubuntu Focal):
status: Incomplete → Fix Committed
tags: added: verification-needed verification-needed-focal

Hello Srinivas, or anyone else affected,

Accepted thermald into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.9.1-1ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

The attached file contains two screen shots:
- power_limit_before.png (old version thermald/now 1.9.1-1ubuntu0.1 amd64)
- power_limit_after.png (new version thermald/now 1.9.1-1ubuntu0.2 amd64)

Under the "stress" workload, the old version caps the max power consumed below 9W. The new version sustains up to 15W, so the proposed version ignores the PPCC power limit of 9W.

Used version
#apt list | grep thermald

thermald/now 1.9.1-1ubuntu0.2 amd64 [installed,local]

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal

# dpkg -l thermald | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-================-============-=========================================
ii thermald 1.9.1-1ubuntu0.2 amd64 Thermal monitoring and controlling daemon

hardware
Handle 0x0100, DMI type 1, 27 bytes
System Information
 Manufacturer: Dell Inc.
 Product Name: XPS 13 7390 2-in-1
 Version: Not Specified

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.9.1-1ubuntu0.2

---------------
thermald (1.9.1-1ubuntu0.2) focal; urgency=medium

  * Performance workaround for Dell 7390 2-in-1 Ice Lake (LP: #1874933)
   - 5.4 kernel added support for "Processor thermal device" for Ice Lake
     via the PPCC power tables. The power table specified for Dell 7390
     2-in-1 specifies this as 9W so thermald will limit it to this.
     This is a workaround that will ignore power limits less than the
     power up power limit to workaround this throttling. Requires a
     couple of prerequisite patches to apply and final 2 patches for
     the fix.
   - eeadf7d2efe Restore to min state on deactivation without
     depending on hardware state
   - 9a6dc27879a Clean up the code and documentation
   - f7db4342933 Avoid polling power in non PPCC case
   - c3461690eaf Ignore invalid PPCC max power limit

 -- Colin King <email address hidden> Mon, 18 May 2020 09:26:23 +0100

Changed in thermald (Ubuntu Focal):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for thermald has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
