pull in latest thermald bug fixes into thermald
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
thermald (Ubuntu) |
Fix Released
|
Medium
|
Colin Ian King | ||
Focal |
Fix Released
|
Undecided
|
Unassigned | ||
Hirsute |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
== SRU Justification [ HIRSUTE ] ==
The upstream thermald 2.4.6 has been released with some more bug fixes that are pertinent to H/W available in older releases such as Hirsute. These fix a variety of issues found on H/W in the field and such as over-throttling, handling alternate ACPI object names for B0D4, handling trip zones which may have wrong settings and disabling the legacy rapl cdev when rapl-mmio is available.
== The fixes ==
Pull in these fixes:
From 2.4.6:
commit 273a53a11da2a73
Author: Srinivas Pandruvada <email address hidden>
Date: Mon Jun 7 16:47:47 2021 -0700
Use Adaptive PPCC limits for RAPL MMIO
Set the correct device name as RAPL-MSR so that RAPL-MMIO can
also set the correct default power limits.
commit 2dd67300448fa4a
Author: Srinivas Pandruvada <email address hidden>
Date: Mon Jun 7 16:45:14 2021 -0700
Increase power limit for disabled RAPL-MMIO
Increase 100W to 200W as some desktop platform already have limit
more than 100W.
commit 3de1004a49d0d15
Author: Srinivas Pandruvada <email address hidden>
Date: Sun Jun 6 19:19:15 2021 -0700
Special case for default PSVT
When there are no adaptive tables and only one default PSVT table
is present with just one entry with MAX type. Add one additional
entry as done for non default case.
From 2.4.5:
commit 301a89284e9d74a
Author: Srinivas Pandruvada <email address hidden>
Date: Sat May 29 10:50:44 2021 -0700
Set a very high RAPL MSR PL1 with --adaptive
After upgrading Dell Latitude 5420, again noticed performance degradation.
The PPCC power limit for MSR RAPL PL1 is reduced to 15W. Even though we
disable MSR RAPL with --adaptive option, it is not getting disabled. So
MSR RAPL limits still playing role.
To fix that set a very high MSR RAPL PL1 limit so that it never causes
throttling. All throttling with --adaptive option is done using RAPL-MMIO.
From 2.4.4:
commit ea4491971059259
Author: Srinivas Pandruvada <email address hidden>
Date: Thu Mar 11 10:06:15 2021 -0800
Fix error for condition names
The current code caps the max name as the last condition name,
which is "Power_Slider". So any condition more than 56 will be
printing error, with "Power_Slider" as condition name. For example
for condition = 57:
Unsupported condition 57 (Power_slider)
This is confusing during debug, so print "UNKNOWN" for condition
name 56.
commit 9115f2fb0b296a2
Author: Srinivas Pandruvada <email address hidden>
Date: Tue Mar 9 16:36:13 2021 -0800
This is confusing during debug, so print "UNKNOWN" for condition
name 56.
commit 9115f2fb0b296a2
Author: Srinivas Pandruvada <email address hidden>
Date: Tue Mar 9 16:36:13 2021 -0800
Check for alternate names for B0D4 device
B0D4 can be named as TCPU or B0D4. So search for both names
if failed to find one.
commit 660ee6f1f6351e6
Author: Srinivas Pandruvada <email address hidden>
Date: Tue Mar 9 16:33:08 2021 -0800
Delete all trips from zones before psvt install
Initially zones has all the trips from sysfs, which may have wrong
settings. Instead of deleting only for matched psvt zones, delete
for all zones. In this way only zones which are in PSVT will be
present.
commit 1ad03424f7f3d33
Author: Srinivas Pandruvada <email address hidden>
Date: Tue Mar 9 11:36:27 2021 -0800
Disable legacy rapl cdev when rapl-mmio is in use
Explicitly disable legacy rapl based on MSR interface when rapl-mmio
is in use. This will prevent PL1/PL2 power limit from MSR based rapl,
which may not be the correct one.
commit 45832e16290fec7
Author: Colin Ian King <email address hidden>
Date: Thu Mar 18 11:22:38 2021 +0000
Fix spelling mistakes found using codespell
There are a handful of spelling mistakes in comments and literal
strings found by codespell. Fix these.
Signed-off-by: Colin Ian King <email address hidden>
== Test plan ==
Actually this is problematic as the changes affect different H/W in different ways and testing to touch all these changes requires full CPU exercising to try and trip thermal overrun.
test plan is as follows:
1. install -proposed thermald, run with debug logging enabled
2. run stress-ng --cpu 0 for a few hours on various H/W with air vents plugged to try and trigger thermal overrun and exercise thermald thermal throttling.
3. check the thermald logging to check for state change behaviour on thermal changes.
== Where problems could occur ==
These have been already tested on H/W in the field, for example:
https:/
however, all these fixes can alter the functionality of thermald for a range of platforms so the regressions potential is high. The set of changes in thermald are already in upstream thermald in Ubuntu Impish so these fixes will already have been exercised on the development release.
Issues can occur in:
1. H/W supporting RAPL MMIO
2. Devices with TCPU ACPI object names, this will now behave differently (correctly)
3. Systems with both RAPL cdev support and RAPL MMIO support will behave differently, cdev will now be (correctly) disabled.
Changed in thermald (Ubuntu): | |
importance: | Undecided → Medium |
assignee: | nobody → Colin Ian King (colin-king) |
status: | New → In Progress |
description: | updated |
summary: |
- pull in latest thermald bug fixes into thermald 2.4.3 + pull in latest thermald bug fixes into thermald |
Changed in thermald (Ubuntu Focal): | |
status: | Fix Committed → Fix Released |
tags: |
added: verification-done verification-done-focal removed: verification-needed verification-needed-focal |
Changed in thermald (Ubuntu): | |
status: | In Progress → Fix Released |
Hello Colin, or anyone else affected,
Accepted thermald into hirsute-proposed. The package will build now and be available at https:/ /launchpad. net/ubuntu/ +source/ thermald/ 2.4.3-1ubuntu1 in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See https:/ /wiki.ubuntu. com/Testing/ EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification- needed- hirsute to verification- done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification- failed- hirsute. In either case, without details of your testing we will not be able to proceed.
Further information regarding the verification process can be found at https:/ /wiki.ubuntu. com/QATeam/ PerformingSRUVe rification . Thank you in advance for helping!
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.