thermald spamming kernel log when updating powercap RAPL powerlimit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
thermald (Ubuntu) | Fix Released | High | Colin Ian King |
Trusty | Fix Released | High | Colin Ian King |
Wily | Fix Released | High | Colin Ian King |
Bug Description
[SRU Justification]
thermald causes the kernel to spam the kernel log with frequent "package locked by BIOS, monitoring only" messages.
[Fix]
This issue is fixed with the following upstream commits:
f1a77c5f3b936ba
833245725494eb2
These two fixes have been shown to work on Xenial and apply cleanly to the Trusty and Wily versions of thermald. The risk of regression is low: the fixes add extra sanity checking rather than new functionality, and they are upstream commits that have been available in Xenial for some time.
[Testcase]
Run on a system that reads /sys/devices/
With the fix, this message only appears once, and no more spamming occurs thereafter.
[Regression Potential]
Minimal. The fixes are upstream and have been tested in Xenial for quite a while. The fixes patch cleanly to Trusty and Wily and result in the same upstream code, so the code paths are identical to that of Xenial's thermald.
-------
When thermald updates /sys/devices/
[38458.753468] powercap intel-rapl:0: package locked by BIOS, monitoring only
[38637.993447] powercap intel-rapl:0: package locked by BIOS, monitoring only
[38674.154336] powercap intel-rapl:0: package locked by BIOS, monitoring only
[38691.500619] powercap intel-rapl:0: package locked by BIOS, monitoring only
This message comes from set_power_limit() in drivers/
open("/
write(3, "35000000", 8) = -1 ENODATA (No data available)
So in theory thermald should see this failed write and handle it appropriately.
cthd_sysfs_
if (cdev_sysfs.
However, I believe thermald should check errno after the failed write and disable the RAPL interface if the write fails with ENODATA, to avoid repeated failures and hence repeated spamming of kernel messages.
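The suggestion above could look roughly like the following. This is only a minimal sketch of the idea and not the code from the upstream commits: `GuardedSysfsWriter`, `write_value()` and `note_error()` are hypothetical names made up for illustration, and the only assumption about the kernel side is what the strace output shows, namely that the write fails with ENODATA.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <utility>

// Hypothetical helper (not thermald's real class): wraps one writable
// sysfs attribute and stops writing to it once the kernel has reported
// ENODATA, so the kernel message is triggered at most once.
class GuardedSysfsWriter {
public:
    explicit GuardedSysfsWriter(std::string path) : path_(std::move(path)) {
        assert(!path_.empty());
    }

    // Returns 0 on success or a negative errno value on failure.
    // After the interface has been disabled, returns -ENODATA
    // without opening or writing the file again.
    int write_value(const std::string &value) {
        if (disabled_)
            return -ENODATA;

        int fd = ::open(path_.c_str(), O_WRONLY);
        if (fd < 0)
            return -errno;

        ssize_t n = ::write(fd, value.c_str(), value.size());
        int err = (n < 0) ? errno : 0;  // save errno before close()
        ::close(fd);

        if (err != 0) {
            note_error(err);
            return -err;
        }
        return (static_cast<size_t>(n) == value.size()) ? 0 : -EIO;
    }

    // Treat ENODATA as permanent: the limit is locked, retrying is useless.
    void note_error(int err) {
        if (err == ENODATA)
            disabled_ = true;
    }

    bool disabled() const { return disabled_; }

private:
    std::string path_;
    bool disabled_ = false;
};
```

A caller would create one writer per power limit file and call `write_value()` as before; other errors (for example ENOENT) are returned but do not disable the interface, since only ENODATA is known here to indicate a BIOS-locked package.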
Changed in thermald (Ubuntu): | |
status: | New → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Colin Ian King (colin-king) |
Changed in thermald (Ubuntu): | |
status: | Fix Committed → Fix Released |
Changed in thermald (Ubuntu Trusty): | |
assignee: | nobody → Colin Ian King (colin-king) |
Changed in thermald (Ubuntu Wily): | |
assignee: | nobody → Colin Ian King (colin-king) |
Changed in thermald (Ubuntu Trusty): | |
importance: | Undecided → High |
Changed in thermald (Ubuntu Wily): | |
importance: | Undecided → High |
Changed in thermald (Ubuntu Trusty): | |
status: | New → In Progress |
Changed in thermald (Ubuntu Wily): | |
status: | New → In Progress |
description: | updated |
description: | updated |
tags: |
added: verification-needed removed: verification-done |
This is not the only existing issue flooding journalctl.
It looks like thermald does not check for a valid CPU model, which would avoid that spamming: mine is an old Intel Q9550, and journalctl has logged:
*********** model:stepping 0x6:17:7 (6:23:7) thermal_ rel thermal_ rel cpuid-check
thermald[791]: NO RAPL sysfs present
Feb 08 07:16:23 u64 thermald[791]: 10 CPUID levels; family:
Feb 08 07:16:23 u64 thermald[791]: Need Linux PowerCap sysfs
Feb 08 07:16:23 u64 thermald[791]: failed to open /dev/acpi_
Feb 08 07:16:23 u64 thermald[791]: failed to open /dev/acpi_
Feb 08 07:16:23 u64 thermald[791]: TRT/ART read failed
...
...
Unsupported cpu model, use thermal-conf.xml file or run with --ignore-
THD engine start failed
*************
So I suppose thermal-conf.xml could be extended to blacklist some incompatible hardware.