Comment 10 for bug 2026658

Eli (biblicabeebli) wrote :

I have updates!
- I set /sys/devices/system/cpu/intel_pstate/max_perf_pct to 100 and confirmed that it restores the 4.7/4.8 GHz peak turbo frequencies.
- I ran `stress -c 1`; CPU package temps went up to ~84 degrees C. No change in the grep output, no change in the powerprofilesctl degraded state.
- I ran `stress -c 2`; CPU package temps went up to ~90 degrees C. This triggered the powerprofilesctl degraded state.

The grep:
/sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost:0
/sys/devices/system/cpu/intel_pstate/max_perf_pct:70
/sys/devices/system/cpu/intel_pstate/min_perf_pct:10
/sys/devices/system/cpu/intel_pstate/no_turbo:1
/sys/devices/system/cpu/intel_pstate/status:active
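
The listing above has the path:value shape that grep prints when it is given several files. A minimal sketch for taking the same kind of snapshot, assuming that is how the listing was produced (the function name `dump_pstate` is hypothetical, not something from this report):

```shell
# Hypothetical helper: print every attribute in a directory as path:value,
# one per line. Defaults to the intel_pstate sysfs directory.
dump_pstate() {
    dir="${1:-/sys/devices/system/cpu/intel_pstate}"
    # /dev/null as an extra operand makes grep prefix each line with its
    # file name even if the glob matches only a single file.
    grep . "$dir"/* /dev/null
}
```

Calling `dump_pstate` with no argument reads the real sysfs directory; passing another directory is only useful for trying the function out.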

$ powerprofilesctl
* performance:
    Driver: intel_pstate
    Degraded: yes (high-operating-temperature)
  balanced:
    Driver: intel_pstate
  power-saver:
    Driver: intel_pstate

$ cpupower frequency-info
analyzing CPU 6:
  driver: intel_pstate
  ...
  hardware limits: 400 MHz - 4.80 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 2.40 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  boost state support:
    Supported: yes
    Active: yes

In case it is relevant, all of this was with thermald running.
Temperatures are back down to around 45 degrees C, which is typical, but as stated in the original error report, the system never recovers on its own.

* * *

I have now set those values back to their originals, e.g.
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/no_turbo:0

I will also note that the 400 MHz to 2.40 GHz range indicated by cpupower reverts to the full range when no_turbo is set to 0, and the powerprofilesctl degraded state is also directly based on this value. (So I will stop reporting them!)

From this I can at least write a script that sets these variables back to normal and regain normal functionality without rebooting!
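
A rough sketch of what such a script could look like, assuming the only two values that need resetting are the ones listed above (max_perf_pct back to 100, no_turbo back to 0). The function name `restore_pstate` and the optional directory argument are hypothetical, added so the logic can be tried against a scratch directory first:

```shell
# Hypothetical helper (a sketch, not a tested fix): write the pre-throttle
# values back. Needs root when pointed at the real sysfs directory.
restore_pstate() {
    dir="${1:-/sys/devices/system/cpu/intel_pstate}"
    # Values taken from this comment: full performance range, turbo allowed.
    echo 100 > "$dir/max_perf_pct" || return 1
    echo 0 > "$dir/no_turbo" || return 1
    # Show the result in the same path:value form as the grep above.
    grep . "$dir/max_perf_pct" "$dir/no_turbo"
}
```

From a root shell, `restore_pstate` with no argument targets the real intel_pstate directory. Whether the values stay put while thermald is running is not something this sketch addresses.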

My next step will be to uninstall thermald entirely, reboot, and report back on whether I'm able to trigger either bug. I'm confused about what I experienced before; the reboot is to clear any thermal envelope lock from my script.