I have been able to get thermald log info for bug 2.
(This was accomplished with locked power details, so the computer remained usable over the ~30 hours of uptime before I saw it had triggered.)
The log file itself is over 50MB, so I've zipped it into a 3.8MB file.
To save a lot of bother: from my own system logging script, I determined that bug 2 was triggered between 21:37:13 and 21:37:14 on 2023-08-14 (US Eastern). Converting the first of those to a Unix timestamp yields 1692063433.
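For anyone who wants to double-check that conversion, here is a minimal sketch in Python. It assumes US Eastern was on daylight time (EDT, UTC-4) on that date; the helper name `to_unix` is just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern was on daylight time (EDT, UTC-4) on 2023-08-14.
EDT = timezone(timedelta(hours=-4))

def to_unix(year, month, day, hour, minute, second, tz=EDT):
    """Return the Unix timestamp for a wall-clock time in the given zone."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=tz)
    return int(moment.timestamp())

start = to_unix(2023, 8, 14, 21, 37, 13)
end = to_unix(2023, 8, 14, 21, 37, 14)
print(start, end)  # 1692063433 1692063434
```

Since thermald seems to log every 4 seconds here, the entries on either side of that window are the ones stamped 1692063430 and 1692063434.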
So, of these 3 logging events, 2 should be the before/after log statements from thermald, unless I've screwed up my math:
[1692063430][DEBUG]poll exit 0 polls_fd event 0 0
[1692063430][DEBUG] energy 1:524286656:772647335 mj: 7965 mw
[1692063430][DEBUG]read_temperature sensor ID 4
[1692063430][DEBUG]Sensor TCPU :temp 48000
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 103050
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 104550
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 106050
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 107050
[1692063430][DEBUG]pref 0 type 4 temp 48000 trip 109050
[1692063430][DEBUG]pref 0 type 0 temp 48000 trip 110050
[1692063430][DEBUG]pref 0 type 2 temp 48000 trip 110050
[1692063430][DEBUG]Passive Trip point applicable
[1692063430][DEBUG]Trip point applicable < 1:110050
[1692063430][DEBUG]cdev size for this trippoint 0
[1692063430][DEBUG]pref 0 type 3 temp 48000 trip 90000
[1692063430][DEBUG]Passive Trip point applicable
[1692063430][DEBUG]Trip point applicable < 2:90000
[1692063430][DEBUG]cdev size for this trippoint 4
[1692063430][DEBUG]cdev at index 13:Processor
[1692063430][DEBUG]>>thd_cdev_set_state temperature 90000:48000 index:13 state:0 :zone:4 trip_id:2 target_state_valid:0 target_value :0 force:0 min_state:0 max_state:0
[1692063430][DEBUG]zone_trip_limits.size() 0
[1692063430][DEBUG]def_max_state:0 temp_max_state:0 curr_max_state:0
[1692063430][DEBUG]thd_cdev_set_13:curr state -1657 max state 0
[1692063430][DEBUG]def_min_state:0 curr_min_state:0
[1692063430][INFO]op->device:Processor -1658
[1692063430][DEBUG]set cdev state index 13 state -1658
[1692063430][INFO]sysfs write failed /sys/class/thermal/cooling_device13/cur_state
[1692063430][INFO]Set : threshold:90000, temperature:48000, cdev:13(Processor), curr_state:-1658, max_state:0
[1692063430][DEBUG]<<thd_cdev_set_state 0
[1692063434][DEBUG]poll exit 0 polls_fd event 0 0
[1692063434][DEBUG] energy 1:524286656:772685798 mj: 9615 mw
[1692063434][DEBUG]read_temperature sensor ID 4
[1692063434][DEBUG]Sensor TCPU :temp 90000
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 103050
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 104550
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 106050
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 107050
[1692063434][DEBUG]pref 0 type 4 temp 90000 trip 109050
[1692063434][DEBUG]pref 0 type 0 temp 90000 trip 110050
[1692063434][DEBUG]pref 0 type 2 temp 90000 trip 110050
[1692063434][DEBUG]Passive Trip point applicable
[1692063434][DEBUG]Trip point applicable < 1:110050
[1692063434][DEBUG]cdev size for this trippoint 0
[1692063434][DEBUG]pref 0 type 3 temp 90000 trip 90000
[1692063434][DEBUG]Passive Trip point applicable
[1692063434][DEBUG]Trip point applicable > 2:90000
[1692063434][DEBUG]cdev size for this trippoint 4
[1692063434][DEBUG]cdev at index 27:rapl_controller
[1692063434][DEBUG]Need to switch to next cdev target 0
[1692063434][DEBUG]cdev at index 28:intel_pstate
[1692063434][DEBUG]>>thd_cdev_set_state temperature 90000:90000 index:28 state:1 :zone:4 trip_id:2 target_state_valid:0 target_value :0 force:0 min_state:0 max_state:0
[1692063434][DEBUG]def_min_state:0 curr_min_state:0
[1692063434][DEBUG]thd_cdev_set_28:curr state 1 max state 10
[1692063434][INFO]cdev index:28 consecutive call, increment exponentially state 3 (min 0 max 10) (1:1)
[1692063434][DEBUG]def_max_state:10 temp_max_state:0 curr_max_state:10
[1692063434][DEBUG]op->device:intel_pstate 3
[1692063434][DEBUG]set cdev state index 28 state 3 percent 70
[1692063434][INFO]turbo disabled
[1692063434][INFO]Set : threshold:90000, temperature:90000, cdev:28(intel_pstate), curr_state:3, max_state:10
[1692063434][DEBUG]<<thd_cdev_set_state 1
[1692063438][DEBUG]poll exit 0 polls_fd event 0 0
[1692063438][DEBUG] energy 1:524286656:772715101 mj: 7325 mw
[1692063438][DEBUG]read_temperature sensor ID 4
[1692063438][DEBUG]Sensor TCPU :temp 49000
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 103050
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 104550
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 106050
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 107050
[1692063438][DEBUG]pref 0 type 4 temp 49000 trip 109050
[1692063438][DEBUG]pref 0 type 0 temp 49000 trip 110050
[1692063438][DEBUG]pref 0 type 2 temp 49000 trip 110050
[1692063438][DEBUG]Passive Trip point applicable
[1692063438][DEBUG]Trip point applicable < 1:110050
[1692063438][DEBUG]cdev size for this trippoint 0
[1692063438][DEBUG]pref 0 type 3 temp 49000 trip 90000
[1692063438][DEBUG]Passive Trip point applicable
[1692063438][DEBUG]Trip point applicable < 2:90000
[1692063438][DEBUG]cdev size for this trippoint 4
[1692063438][DEBUG]cdev at index 13:Processor
[1692063438][DEBUG]>>thd_cdev_set_state temperature 90000:49000 index:13 state:0 :zone:4 trip_id:2 target_state_valid:0 target_value :0 force:0 min_state:0 max_state:0
[1692063438][DEBUG]zone_trip_limits.size() 0
[1692063438][DEBUG]def_max_state:0 temp_max_state:0 curr_max_state:0
[1692063438][DEBUG]thd_cdev_set_13:curr state -1658 max state 0
[1692063438][DEBUG]def_min_state:0 curr_min_state:0
[1692063438][INFO]op->device:Processor -1659
[1692063438][DEBUG]set cdev state index 13 state -1659
[1692063438][INFO]sysfs write failed /sys/class/thermal/cooling_device13/cur_state
[1692063438][INFO]Set : threshold:90000, temperature:49000, cdev:13(Processor), curr_state:-1659, max_state:0
[1692063438][DEBUG]<<thd_cdev_set_state 0
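In case it helps anyone digging through the full 50MB log, this is roughly how entries like the ones above can be pulled out by timestamp. It is a sketch only: it assumes every line starts with `[timestamp][LEVEL]` as in the excerpt, and the function name `lines_in_window` is my own.

```python
import re

# Matches lines shaped like "[1692063434][INFO]turbo disabled".
LINE_RE = re.compile(r"^\[(\d+)\]\[(\w+)\](.*)$")

def lines_in_window(lines, start, end):
    """Yield (timestamp, level, message) for lines with start <= timestamp <= end."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a thermald log line
        timestamp = int(match.group(1))
        if start <= timestamp <= end:
            yield timestamp, match.group(2), match.group(3).strip()

# A few lines copied from the excerpt above.
sample = [
    "[1692063430][DEBUG]Sensor TCPU :temp 48000",
    "[1692063434][DEBUG]Sensor TCPU :temp 90000",
    "[1692063434][INFO]turbo disabled",
    "[1692063438][DEBUG]Sensor TCPU :temp 49000",
]

selected = list(lines_in_window(sample, 1692063433, 1692063434))
for timestamp, level, message in selected:
    print(timestamp, level, message)
```

For the real file, the same function can be fed an open file handle instead of the `sample` list, since it only iterates over lines.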
And here is the output of that ls command:
$ sudo ls /sys/bus/acpi/devices/INTC1041:00/
hid modalias path physical_node power status subsystem uevent uid wakeup
I'm now going to attempt to trigger the bug using that older kernel version, since I know how to reliably manually trigger it using `stress`.
Grep output to confirm bug 2:
/sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost:0
/sys/devices/system/cpu/intel_pstate/max_perf_pct:70
/sys/devices/system/cpu/intel_pstate/min_perf_pct:10
/sys/devices/system/cpu/intel_pstate/no_turbo:1
/sys/devices/system/cpu/intel_pstate/status:active