thermald often limits CPU frequency while on AC

Bug #1940485 reported by Chris Halse Rogers
This bug affects 2 people
Affects: thermald (Ubuntu)
Status: In Progress
Importance: Medium
Assigned to: koba

Bug Description

I'm not entirely sure if this is a thermald issue, but on my laptop (Dell XPS 15" 2-in-1) I find that the CPU frequency, under load on AC, is often throttled to 2.5GHz rather than the base 3.1GHz or 3.8GHz turbo frequency.

Restarting thermald (with `systemctl restart thermald`) will let the system scale up to the expected ~3.8GHz boost frequency.
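
One way to check whether such a cap is in effect is to compare each core's current scaling limit with the hardware maximum reported by cpufreq. The following is a minimal sketch, assuming the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu. The function name `read_cpufreq_limits` and its `root` parameter are illustrative only and are not part of thermald. thermald may also throttle through other mechanisms (for example RAPL power limits), which would not show up in these files.

```python
from pathlib import Path


def read_cpufreq_limits(root="/sys/devices/system/cpu"):
    """Return {cpu_name: {file_name: kHz}} for every core with a cpufreq dir.

    Hypothetical helper for illustration; values are in kHz as exposed by sysfs.
    """
    wanted = ("scaling_cur_freq", "scaling_max_freq", "cpuinfo_max_freq")
    result = {}
    for cpufreq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        values = {}
        for name in wanted:
            try:
                values[name] = int((cpufreq_dir / name).read_text().strip())
            except (OSError, ValueError):
                # File missing, unreadable, or not a number: skip it.
                continue
        result[cpufreq_dir.parent.name] = values
    return result
```

Calling `read_cpufreq_limits()` before and after `systemctl restart thermald` and comparing `scaling_max_freq` against `cpuinfo_max_freq` would show whether the cpufreq limit itself changes, or whether the throttling comes from somewhere else.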

ProblemType: Bug
DistroRelease: Ubuntu 21.10
Package: thermald 2.4.6-1
ProcVersionSignature: Ubuntu 5.11.0-20.21+21.10.1-generic 5.11.21
Uname: Linux 5.11.0-20-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu67
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Thu Aug 19 09:24:56 2021
InstallationDate: Installed on 2021-06-26 (53 days ago)
InstallationMedia: Ubuntu 21.10.0 2021.05.28 amd64 "bcachefs" (20210622)
SourcePackage: thermald
UpgradeStatus: No upgrade log present (probably fresh install)

cpu info:
model : 158
model name : Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz

Colin Ian King (colin-king) wrote :

Two recent thermald releases may have now addressed this issue:

thermald (2.4.3-1ubuntu2) hirsute; urgency=medium

  * Support Jasper Lake. (LP: #1940629)
    - 0014-Added-Jasper-Lake-CPU-model.patch

thermald (2.4.3-1ubuntu1) hirsute; urgency=medium

   - Disable legacy rapl cdev when rapl-mmio is in use
     This will prevent PL1/PL2 power limit from MSR based rapl, which
     may not be the correct one.
   - Delete all trips from zones before psvt install
     Initially zones have all the trips from sysfs, which may have wrong
     settings. Instead of deleting only for matched psvt zones, delete
     for all zones. In this way only zones which are in PSVT will be
     present.
   - Check for alternate names for B0D4 device
     B0D4 can be named as TCPU or B0D4. So search for both names
     if failed to find one.
   - Fix error for condition names
     The current code caps the max name as the last condition name,
     which is "Power_Slider". So any condition more than 56 will be
     printing error, with "Power_Slider" as condition name. For example
     for condition = 57: Unsupported condition 57 (Power_slider)
   - Set a very high RAPL MSR PL1 with --adaptive
     After upgrading Dell Latitude 5420, again noticed performance
     degradation.
     The PPCC power limit for MSR RAPL PL1 is reduced to 15W. Even though
     we disable MSR RAPL with --adaptive option, it is not getting
     disabled. So MSR RAPL limits are still playing a role.
     To fix that set a very high MSR RAPL PL1 limit so that it never
     causes throttling. All throttling with --adaptive option is done
     using RAPL-MMIO.
   - Special case for default PSVT
     When there are no adaptive tables and only one default PSVT table
     is present with just one entry with MAX type. Add one additional
     entry as done for non default case.
   - Increase power limit for disabled RAPL-MMIO
     Increase 100W to 200W as some desktop platform already have limit
     more than 100W.
   - Use Adaptive PPCC limits for RAPL MMIO
     Set the correct device name as RAPL-MSR so that RAPL-MMIO can
     also set the correct default power limits.
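
Since several of these changes concern RAPL PL1/PL2 limits, it can help to look at the limits the kernel currently exposes through the powercap interface. The following is a minimal sketch, assuming the powercap sysfs layout (/sys/class/powercap/intel-rapl:N and, where the platform supports it, intel-rapl-mmio:N) with `constraint_*_name` and `constraint_*_power_limit_uw` files. The function name `read_rapl_limits` and its `root` parameter are illustrative and not part of thermald.

```python
from pathlib import Path


def read_rapl_limits(root="/sys/class/powercap"):
    """Return {zone_name: {constraint_name: watts}} for RAPL powercap zones.

    Hypothetical helper for illustration; covers both MSR-based zones
    (intel-rapl:*) and MMIO-based zones (intel-rapl-mmio:*) when present.
    """
    limits = {}
    for zone in sorted(Path(root).glob("intel-rapl*:*")):
        constraints = {}
        for name_file in sorted(zone.glob("constraint_*_name")):
            # constraint_0_name pairs with constraint_0_power_limit_uw, etc.
            limit_file = name_file.with_name(
                name_file.name.replace("_name", "_power_limit_uw")
            )
            try:
                label = name_file.read_text().strip()
                microwatts = int(limit_file.read_text().strip())
            except (OSError, ValueError):
                continue
            constraints[label] = microwatts / 1_000_000
        if constraints:
            limits[zone.name] = constraints
    return limits
```

On a typical system the `long_term` constraint corresponds to PL1 and `short_term` to PL2. Comparing the MSR and MMIO zones while the system is throttled may show which limit is the one being hit.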

Do you mind re-testing and letting us know if this now fixes the issue?

Changed in thermald (Ubuntu):
status: New → In Progress
status: In Progress → Incomplete
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Chris Halse Rogers (raof) wrote :

Still occurs on current impish:

$ thermald --version
2.4.6

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Try these steps.
First disable thermald:
# systemctl disable thermald

Then reboot.

Then run the script
https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-ubuntu.sh

It will generate a tar file. Upload that.
Also continue to use the system after the test and see if you see the same issue.

Chris Halse Rogers (raof) wrote :

I've run that debugging script. Unfortunately, Launchpad currently OOPSes when I try to attach the output to the bug. I'll keep trying every now and then.

The throttling did not occur during that test, and doesn't seem to have occurred while running the build of thermald from git, but this does not necessarily mean the bug doesn't still occur.

I'll continue running the build of thermald from git. Should I catch it in the bug-state, I'll re-collect the output of that script.
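
Because the bug-state is intermittent, a small watcher could flag the moment to re-run the debug script. The following is a rough sketch, assuming the standard cpufreq sysfs files and using the 2.5GHz ceiling from the original report as the threshold. The names `max_cur_freq_khz`, `stuck_below`, and `watch`, and all default values, are illustrative. The result is only meaningful while a sustained CPU load is running, since an idle system also sits at low frequencies.

```python
import time
from pathlib import Path


def max_cur_freq_khz(root="/sys/devices/system/cpu"):
    """Highest scaling_cur_freq across all cores in kHz, or None if unreadable."""
    freqs = []
    for path in Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        try:
            freqs.append(int(path.read_text().strip()))
        except (OSError, ValueError):
            continue
    return max(freqs) if freqs else None


def stuck_below(samples_khz, ceiling_khz, min_samples):
    """True if the last min_samples readings are all at or below ceiling_khz."""
    recent = samples_khz[-min_samples:]
    return len(recent) >= min_samples and all(s <= ceiling_khz for s in recent)


def watch(ceiling_khz=2_500_000, min_samples=30, interval_s=2.0, max_polls=1000):
    """Poll until the frequency looks capped; return True if that happened."""
    samples = []
    for _ in range(max_polls):
        freq = max_cur_freq_khz()
        if freq is not None:
            samples.append(freq)
        if stuck_below(samples, ceiling_khz, min_samples):
            return True
        time.sleep(interval_s)
    return False
```

Running `watch()` alongside a CPU-bound job (a large build, for instance) and collecting the debug dump when it returns `True` is one possible way to capture logs from the throttled state.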

Colin Ian King (colin-king) wrote :

Any update on this Chris?

Changed in thermald (Ubuntu):
assignee: Colin Ian King (colin-king) → Ubuntu Kernel Team (ubuntu-kernel-team)
Changed in thermald (Ubuntu):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → koba (kobako)
koba (kobako) wrote :

@Chris, could you upload the log?

Chris Halse Rogers (raof) wrote :

Aha! Launchpad was OOPSing when I tried to upload the gzipped tarball, but I've successfully uploaded the files as a zstd-compressed tarball.

koba (kobako) wrote :

@Chris, so the issue didn't occur when you used the upstream thermald? Thanks.

Changed in thermald (Ubuntu):
status: Incomplete → In Progress
Chris Halse Rogers (raof) wrote :

Sorry about the late reply. I do not exactly know how to reproduce this behaviour, so I've been consistently running upstream thermald and collecting the logs while using the laptop normally.

Unfortunately, I have not observed this behaviour with the upstream thermald, so I can't get the requested logs. It's possible that the upstream code contains a fix for whatever my problem is?

koba (kobako) wrote :

@Chris, which upstream version did you use?

Chris Halse Rogers (raof) wrote :

I've been most recently running on git commit 2fc02bbf3a654c4e56fae51518f9983733ccd776, which would be slightly after 2.4.8.

I see that we've got a new thermald version (2.4.7) in the archive. I shall re-enable the system thermald and see if I can still reproduce.

description: updated
koba (kobako) wrote :

@Chris, could you reproduce with thermald 2.4.7? thanks
