system sluggish, thermal keep frequency at 400MHz

Bug #1901266 reported by Alexander Mitsos
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
thermald (Ubuntu)  Fix Released  Medium      koba

Bug Description

This morning I upgraded from 20.04 to 20.10.

The system was quite slow although I have a fast machine. My Windows 10 virtual machine on VirtualBox became unusable. When I tried to keep the virtual machine open, I could not participate properly in a Zoom call (I could still hear the others, but they said that my voice was very choppy).
On 20.04 I was super-happy with the speed and I could have as many apps running as I wanted.

Based on a Google search, I started by looking at
% journalctl --follow
This shows quite a few errors, but they do not repeat often enough to explain the slowdown.

Then I googled some more and found that /boot/efi was being written to and read from.

Then I googled some more and thought I had trouble with GNOME, so I reset it to default
% dconf reset -f /org/gnome/
and disabled the extensions. This made things slightly better, but still far from acceptable.

After lots of searching I checked the frequency of the CPUs and it was at the minimum of 400 MHz (as shown by i7z and also other tools). I tried setting the governor with cpufreqctl and similar methods, but this did not change anything.
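For anyone reproducing this without i7z, the same numbers can be read from sysfs. The following is a minimal sketch, assuming the standard Linux cpufreq sysfs layout; the function name cpu_freq_report and its optional root-directory argument are illustrative and not part of any tool.

```shell
# Print current/min/max frequency (MHz) and governor for every core.
# $1 (optional, hypothetical) overrides the sysfs root, which helps with testing.
cpu_freq_report() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip when the glob matched nothing or the core exposes no cpufreq data.
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu=${dir%/cpufreq}   # strip the trailing /cpufreq
        cpu=${cpu##*/}        # keep only the cpuN component
        cur=$(cat "$dir/scaling_cur_freq")
        min=$(cat "$dir/scaling_min_freq")
        max=$(cat "$dir/scaling_max_freq")
        gov=$(cat "$dir/scaling_governor")
        # sysfs reports frequencies in kHz; convert to MHz for readability.
        echo "$cpu: cur=$((cur / 1000)) MHz min=$((min / 1000)) MHz max=$((max / 1000)) MHz governor=$gov"
    done
}
```

Calling cpu_freq_report with no argument reads the live system. A core stuck at the floor would show cur equal to min no matter which governor is selected.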
I then found an old bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769236 and tried
% sudo systemctl stop thermald
This seems to work. After a few seconds the frequency shown in i7z goes to ~4500 MHz and the virtual machine seems to work fine.
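To compare the state before and after stopping thermald in a scriptable way, a yes/no check can help. This is only a sketch under assumptions: it relies on the standard cpufreq sysfs files, the name cpus_pinned_at_min is made up for illustration, and the 10% tolerance is an arbitrary choice.

```shell
# Succeed (exit 0) when every core runs at or near its hardware minimum.
# $1 (optional, hypothetical) overrides the sysfs root, which helps with testing.
cpus_pinned_at_min() {
    root="${1:-/sys/devices/system/cpu}"
    seen=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_cur_freq" ] && [ -r "$dir/cpuinfo_min_freq" ] || continue
        seen=1
        cur=$(cat "$dir/scaling_cur_freq")
        min=$(cat "$dir/cpuinfo_min_freq")
        # One core more than 10% above its minimum means the CPU is not pinned.
        [ "$cur" -le $((min + min / 10)) ] || return 1
    done
    # With no readable cpufreq data, report "not pinned" rather than guess.
    [ "$seen" -eq 1 ]
}
```

Idle cores legitimately sit at their minimum, so the check is only meaningful while the machine is under load, for example with the virtual machine running, once before and once a few seconds after `sudo systemctl stop thermald`.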

ProblemType: Bug
DistroRelease: Ubuntu 20.10
Package: thermald 2.3-4
ProcVersionSignature: Ubuntu 5.8.0-25.26-generic 5.8.14
Uname: Linux 5.8.0-25-generic x86_64
NonfreeKernelModules: wl
ApportVersion: 2.20.11-0ubuntu50
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Sat Oct 24 01:39:40 2020
DistributionChannelDescriptor:
 # This is the distribution channel descriptor for the OEM CDs
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-bionic-amd64-20180608-47+merion+X66
InstallationDate: Installed on 2019-09-27 (392 days ago)
InstallationMedia: Ubuntu 18.04 "Bionic" - Build amd64 LIVE Binary 20180608-09:38
SourcePackage: thermald
UpgradeStatus: Upgraded to groovy on 2020-10-23 (0 days ago)
mtime.conffile..etc.thermald.thermal-conf.xml: 2020-10-24T01:35:59.781865

Alexander Mitsos (alexandermitsos) wrote :
Colin Ian King (colin-king) wrote :

Thermald 2.3-4 has been updated recently to 2.4.3-1ubuntu2 with some relevant upstream fixes. Would you mind verifying whether this now addresses the issue? The updates contain the following fixes:

thermald (2.4.3-1ubuntu2) hirsute

  * Support Jasper Lake. (LP: #1940629)
    - 0014-Added-Jasper-Lake-CPU-model.patch

thermald (2.4.3-1ubuntu1) hirsute

  * Pull in bug fixes between 2.4.3 and 2.4.6 (LP: #1931565)
   - Disable legacy rapl cdev when rapl-mmio is in use
     This will prevent PL1/PL2 power limit from MSR based rapl, which
     may not be the correct one.
   - Delete all trips from zones before psvt install
     Initially zones have all the trips from sysfs, which may have wrong
     settings. Instead of deleting only for matched psvt zones, delete
     for all zones. In this way only zones which are in PSVT will be
     present.
   - Check for alternate names for B0D4 device
     B0D4 can be named as TCPU or B0D4. So search for both names
     if failed to find one.
   - Fix error for condition names
     The current code caps the max name as the last condition name,
     which is "Power_Slider". So any condition more than 56 will be
     printing error, with "Power_Slider" as condition name. For example
     for condition = 57: Unsupported condition 57 (Power_slider)
   - Set a very high RAPL MSR PL1 with --adaptive
     After upgrading Dell Latitude 5420, performance degradation was
     noticed again.
     The PPCC power limit for MSR RAPL PL1 is reduced to 15W. Even though
     we disable MSR RAPL with the --adaptive option, it is not getting
     disabled, so the MSR RAPL limits still play a role.
     To fix that, set a very high MSR RAPL PL1 limit so that it never
     causes throttling. All throttling with the --adaptive option is done
     using RAPL-MMIO.
   - Special case for default PSVT
     When there are no adaptive tables and only one default PSVT table
     is present with just one entry with MAX type, add one additional
     entry as done for the non-default case.
   - Increase power limit for disabled RAPL-MMIO
     Increase 100W to 200W as some desktop platforms already have a limit
     of more than 100W.
   - Use Adaptive PPCC limits for RAPL MMIO
     Set the correct device name as RAPL-MSR so that RAPL-MMIO can
     also set the correct default power limits.
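
The PL1/PL2 limits these entries refer to can be inspected through the kernel's powercap sysfs interface, where the long_term constraint corresponds to PL1 and short_term to PL2. The sketch below assumes the intel_rapl powercap driver is loaded; the function name rapl_limits and its optional root argument are illustrative only, and reading these files may require root on some systems.

```shell
# List the power limits of every RAPL zone (MSR and MMIO) in watts.
# $1 (optional, hypothetical) overrides the powercap root, which helps with testing.
rapl_limits() {
    root="${1:-/sys/class/powercap}"
    for zone in "$root"/intel-rapl*:*; do
        # Skip when the glob matched nothing or the zone is unreadable.
        [ -r "$zone/name" ] || continue
        zname=$(cat "$zone/name")
        for i in 0 1; do
            [ -r "$zone/constraint_${i}_name" ] || continue
            [ -r "$zone/constraint_${i}_power_limit_uw" ] || continue
            cname=$(cat "$zone/constraint_${i}_name")
            uw=$(cat "$zone/constraint_${i}_power_limit_uw")
            # Limits are stored in microwatts; integer division truncates to whole watts.
            echo "${zone##*/} ($zname) $cname: $((uw / 1000000)) W"
        done
    done
}
```

On an affected machine, a long_term value of about 15 W under intel-rapl:0 would be consistent with the reduced MSR PL1 limit described in the Dell Latitude 5420 entry.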

If this fixes the issue please let us know.
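
Before retesting, it may help to confirm which thermald version is actually installed. A minimal sketch, assuming a Debian-based system with dpkg available: the function name thermald_has_fixes is hypothetical, and the threshold 2.4.3-1ubuntu2 is simply the version named in this comment, so a backport to another release may carry a different version string.

```shell
# Succeed when the given (or installed) thermald version is >= 2.4.3-1ubuntu2.
# $1 (optional) is a version string to check instead of the installed package.
thermald_has_fixes() {
    installed="${1:-$(dpkg-query -W -f='${Version}' thermald 2>/dev/null)}"
    # Exit status 2 signals that no version could be determined.
    [ -n "$installed" ] || return 2
    # dpkg applies Debian version ordering, so 2.3-4 sorts before 2.4.3-1ubuntu2.
    dpkg --compare-versions "$installed" ge "2.4.3-1ubuntu2"
}
```

Running `thermald_has_fixes && echo updated` prints "updated" only when the installed package is at least the version mentioned above.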

summary: - system slagish, thermal keep frequency at 400MHz
+ system sluggish, thermal keep frequency at 400MHz
Changed in thermald (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Changed in thermald (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
Changed in thermald (Ubuntu):
assignee: Colin Ian King (colin-king) → Ubuntu Kernel Team (ubuntu-kernel-team)
Changed in thermald (Ubuntu):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → koba (kobako)
Colin Ian King (colin-king) wrote :

This bug report has not seen any further follow-up for 2+ years. Closing it. If it is still not fixed please re-open this issue.

Changed in thermald (Ubuntu):
status: Incomplete → Fix Released