Comment 34 for bug 2026658

Eli (biblicabeebli) wrote :

I have been able to get thermald log info for bug 2.
(This was accomplished with locked power details, so the computer remained usable over the ~30 hours of uptime before I saw it had triggered.)

The log file itself is over 50MB, so I've zipped it into a 3.8MB file.

Grep output to confirm bug 2:
/sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost:0
/sys/devices/system/cpu/intel_pstate/max_perf_pct:70
/sys/devices/system/cpu/intel_pstate/min_perf_pct:10
/sys/devices/system/cpu/intel_pstate/no_turbo:1
/sys/devices/system/cpu/intel_pstate/status:active
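
For anyone wanting to capture the same snapshot from a script, here is a minimal Python sketch that prints `path:value` lines in roughly the same shape as the grep output above. The exact grep command is not given in this comment, so the helper name `dump_sysfs_dir` and its behaviour are illustrative assumptions, not the command actually used.

```python
from pathlib import Path


def dump_sysfs_dir(directory):
    """Return 'path:value' strings for each readable regular file in directory.

    Hypothetical helper: it roughly mimics `grep . <directory>/*` for
    single-line sysfs attributes.
    """
    lines = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            value = path.read_text().strip()
        except (OSError, UnicodeDecodeError):
            # Some sysfs attributes are write-only or need root; skip them.
            continue
        lines.append(f"{path}:{value}")
    return lines


if __name__ == "__main__":
    pstate_dir = "/sys/devices/system/cpu/intel_pstate"
    # Only meaningful on a machine that uses the intel_pstate driver.
    if Path(pstate_dir).is_dir():
        print("\n".join(dump_sysfs_dir(pstate_dir)))
```

In the output above, `max_perf_pct:70` together with `no_turbo:1` is the state that identifies bug 2.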

To save a lot of bother: from my own system logging script I determined that bug 2 was triggered between 21:37:13 and 21:37:14 on 2023-08-14 (US Eastern). Converting the first of those to a Unix timestamp yields 1692063433.
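
The conversion can be double-checked with a short Python sketch. It assumes US Eastern was on daylight time (EDT, UTC-4) on that date, which is the case for mid-August; the helper name `to_unix` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern is on daylight time (EDT, UTC-4) on 2023-08-14.
EDT = timezone(timedelta(hours=-4), "EDT")


def to_unix(year, month, day, hour, minute, second, tz=EDT):
    """Convert a wall-clock time in the given timezone to a Unix timestamp."""
    return int(datetime(year, month, day, hour, minute, second, tzinfo=tz).timestamp())


trigger_start = to_unix(2023, 8, 14, 21, 37, 13)
trigger_end = to_unix(2023, 8, 14, 21, 37, 14)
print(trigger_start, trigger_end)  # prints: 1692063433 1692063434
```

The thermald entries quoted in this comment are spaced 4 seconds apart (1692063430, 1692063434, 1692063438), so a trigger between 1692063433 and 1692063434 should sit between the first and second of them.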

So, of these 3 logging events, 2 should be the before/after log statements from thermald, unless I've screwed up my math:

[1692063430][DEBUG]poll exit 0 polls_fd event 0 0
[1692063430][DEBUG] energy 1:524286656:772647335 mj: 7965 mw
[1692063430][DEBUG]read_temperature sensor ID 4
[1692063430][DEBUG]Sensor TCPU :temp 48000
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 103050
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 104550
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 106050
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 107050
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 109050
[1692063430][DEBUG]pref 0 type 0 temp 48000 trip 110050
[1692063430][DEBUG]pref 0 type 2 temp 48000 trip 110050
[1692063430][DEBUG]Passive Trip point applicable
[1692063430][DEBUG]Trip point applicable < 1:110050
[1692063430][DEBUG]cdev size for this trippoint 0
[1692063430][DEBUG]pref 0 type 3 temp 48000 trip 90000
[1692063430][DEBUG]Passive Trip point applicable
[1692063430][DEBUG]Trip point applicable < 2:90000
[1692063430][DEBUG]cdev size for this trippoint 4
[1692063430][DEBUG]cdev at index 13:Processor
[1692063430][DEBUG]>>thd_cdev_set_state temperature 90000:48000 index:13 state:0 :zone:4 trip_id:2 target_state_valid:0 target_value :0 force:0 min_state:0 max_state:0
[1692063430][DEBUG]zone_trip_limits.size() 0
[1692063430][DEBUG]def_max_state:0 temp_max_state:0 curr_max_state:0
[1692063430][DEBUG]thd_cdev_set_13:curr state -1657 max state 0
[1692063430][DEBUG]def_min_state:0 curr_min_state:0
[1692063430][INFO]op->device:Processor -1658
[1692063430][DEBUG]set cdev state index 13 state -1658
[1692063430][INFO]sysfs write failed /sys/class/thermal/cooling_device13/cur_state
[1692063430][INFO]Set : threshold:90000, temperature:48000, cdev:13(Processor), curr_state:-1658, max_state:0
[1692063430][DEBUG]<<thd_cdev_set_state 0

[1692063434][DEBUG]poll exit 0 polls_fd event 0 0
[1692063434][DEBUG] energy 1:524286656:772685798 mj: 9615 mw
[1692063434][DEBUG]read_temperature sensor ID 4
[1692063434][DEBUG]Sensor TCPU :temp 90000
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 103050
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 104550
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 106050
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 107050
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 109050
[1692063434][DEBUG]pref 0 type 0 temp 90000 trip 110050
[1692063434][DEBUG]pref 0 type 2 temp 90000 trip 110050
[1692063434][DEBUG]Passive Trip point applicable
[1692063434][DEBUG]Trip point applicable < 1:110050
[1692063434][DEBUG]cdev size for this trippoint 0
[1692063434][DEBUG]pref 0 type 3 temp 90000 trip 90000
[1692063434][DEBUG]Passive Trip point applicable
[1692063434][DEBUG]Trip point applicable > 2:90000
[1692063434][DEBUG]cdev size for this trippoint 4
[1692063434][DEBUG]cdev at index 27:rapl_controller
[1692063434][DEBUG]Need to switch to next cdev target 0
[1692063434][DEBUG]cdev at index 28:intel_pstate
[1692063434][DEBUG]>>thd_cdev_set_state temperature 90000:90000 index:28 state:1 :zone:4 trip_id:2 target_state_valid:0 target_value :0 force:0 min_state:0 max_state:0
[1692063434][DEBUG]def_min_state:0 curr_min_state:0
[1692063434][DEBUG]thd_cdev_set_28:curr state 1 max state 10
[1692063434][INFO]cdev index:28 consecutive call, increment exponentially state 3 (min 0 max 10) (1:1)
[1692063434][DEBUG]def_max_state:10 temp_max_state:0 curr_max_state:10
[1692063434][DEBUG]op->device:intel_pstate 3
[1692063434][DEBUG]set cdev state index 28 state 3 percent 70
[1692063434][INFO]turbo disabled
[1692063434][INFO]Set : threshold:90000, temperature:90000, cdev:28(intel_pstate), curr_state:3, max_state:10
[1692063434][DEBUG]<<thd_cdev_set_state 1

[1692063438][DEBUG]poll exit 0 polls_fd event 0 0
[1692063438][DEBUG] energy 1:524286656:772715101 mj: 7325 mw
[1692063438][DEBUG]read_temperature sensor ID 4
[1692063438][DEBUG]Sensor TCPU :temp 49000
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 103050
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 104550
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 106050
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 107050
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 109050
[1692063438][DEBUG]pref 0 type 0 temp 49000 trip 110050
[1692063438][DEBUG]pref 0 type 2 temp 49000 trip 110050
[1692063438][DEBUG]Passive Trip point applicable
[1692063438][DEBUG]Trip point applicable < 1:110050
[1692063438][DEBUG]cdev size for this trippoint 0
[1692063438][DEBUG]pref 0 type 3 temp 49000 trip 90000
[1692063438][DEBUG]Passive Trip point applicable
[1692063438][DEBUG]Trip point applicable < 2:90000
[1692063438][DEBUG]cdev size for this trippoint 4
[1692063438][DEBUG]cdev at index 13:Processor
[1692063438][DEBUG]>>thd_cdev_set_state temperature 90000:49000 index:13 state:0 :zone:4 trip_id:2 target_state_valid:0 target_value :0 force:0 min_state:0 max_state:0
[1692063438][DEBUG]zone_trip_limits.size() 0
[1692063438][DEBUG]def_max_state:0 temp_max_state:0 curr_max_state:0
[1692063438][DEBUG]thd_cdev_set_13:curr state -1658 max state 0
[1692063438][DEBUG]def_min_state:0 curr_min_state:0
[1692063438][INFO]op->device:Processor -1659
[1692063438][DEBUG]set cdev state index 13 state -1659
[1692063438][INFO]sysfs write failed /sys/class/thermal/cooling_device13/cur_state
[1692063438][INFO]Set : threshold:90000, temperature:49000, cdev:13(Processor), curr_state:-1659, max_state:0
[1692063438][DEBUG]<<thd_cdev_set_state 0
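
For pulling the same three polling events out of the full 50MB log, a minimal Python sketch along these lines should work. It assumes every relevant line starts with `[<unix timestamp>][<LEVEL>]` as in the excerpts above; the function name `lines_in_window` and the 5-second margin are illustrative choices, and the sample lines are copied from the excerpts.

```python
import re

# Matches the "[1692063430][DEBUG]..." prefix seen in the thermald excerpts.
LINE_RE = re.compile(r"^\[(\d+)\]\[(\w+)\](.*)$")


def lines_in_window(lines, start, end):
    """Yield log lines whose leading timestamp lies in [start, end]."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and start <= int(match.group(1)) <= end:
            yield line.rstrip("\n")


# A few lines copied from the excerpts above, standing in for the real file.
sample = [
    "[1692063430][DEBUG]Sensor TCPU :temp 48000",
    "[1692063434][DEBUG]Sensor TCPU :temp 90000",
    "[1692063434][INFO]turbo disabled",
    "[1692063438][DEBUG]Sensor TCPU :temp 49000",
]

trigger = 1692063433
margin = 5  # seconds either side; wide enough to span the 4-second poll interval
for hit in lines_in_window(sample, trigger - margin, trigger + margin):
    print(hit)
```

With the real log, the `sample` list would be replaced by an open file object, since iterating over a file yields lines in the same way.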

And also the output of that ls command:
$ sudo ls /sys/bus/acpi/devices/INTC1041:00/
hid modalias path physical_node power status subsystem uevent uid wakeup

I'm now going to attempt to trigger the bug using that older kernel version, since I know how to reliably manually trigger it using `stress`.