Comment 8 for bug 1923114

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

While looking for some improvements on cpu-on-off-test.sh, I investigated the path that is taken on azure. One good tool to debug these failures is the cpuhp tracing.

echo 1 > /sys/kernel/debug/tracing/events/cpuhp/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace_pipe &
echo 0 > /sys/devices/system/cpu/cpu1/online

            bash-2660 [001] .... 10306.404476: cpuhp_enter: cpu: 0001 target: 143 step: 213 (cpuhp_kick_ap_work)
         cpuhp/1-14 [001] .... 10306.404506: cpuhp_enter: cpu: 0001 target: 143 step: 212 (sched_cpu_deactivate)
         cpuhp/1-14 [001] .... 10306.420476: cpuhp_exit: cpu: 0001 state: 211 step: 212 ret: 0
         cpuhp/1-14 [001] .... 10306.420502: cpuhp_enter: cpu: 0001 target: 143 step: 189 (msr_device_destroy [msr])
         cpuhp/1-14 [001] .... 10306.420623: cpuhp_exit: cpu: 0001 state: 188 step: 189 ret: 0
         cpuhp/1-14 [001] .... 10306.420629: cpuhp_enter: cpu: 0001 target: 143 step: 187 (mce_cpu_pre_down)
         cpuhp/1-14 [001] .... 10306.420653: cpuhp_exit: cpu: 0001 state: 186 step: 187 ret: 0
         cpuhp/1-14 [001] .... 10306.420654: cpuhp_enter: cpu: 0001 target: 143 step: 184 (hv_synic_cleanup)
         cpuhp/1-14 [001] .... 10306.420655: cpuhp_exit: cpu: 0001 state: 183 step: 184 ret: -16
         cpuhp/1-14 [001] .... 10306.426523: cpuhp_enter: cpu: 0001 target: 213 step: 185 (compute_batch_value)
         cpuhp/1-14 [001] .... 10306.426541: cpuhp_exit: cpu: 0001 state: 185 step: 185 ret: 0
         cpuhp/1-14 [001] .... 10306.426546: cpuhp_enter: cpu: 0001 target: 213 step: 186 (acpi_soft_cpu_online)
         cpuhp/1-14 [001] .... 10306.426553: cpuhp_exit: cpu: 0001 state: 186 step: 186 ret: 0
         cpuhp/1-14 [001] .... 10306.426553: cpuhp_enter: cpu: 0001 target: 213 step: 187 (mce_cpu_online)
         cpuhp/1-14 [001] .... 10306.426597: cpuhp_exit: cpu: 0001 state: 187 step: 187 ret: 0
         cpuhp/1-14 [001] .... 10306.426606: cpuhp_enter: cpu: 0001 target: 213 step: 188 (console_cpu_notify)
         cpuhp/1-14 [001] .... 10306.426607: cpuhp_exit: cpu: 0001 state: 188 step: 188 ret: 0
         cpuhp/1-14 [001] .... 10306.426611: cpuhp_enter: cpu: 0001 target: 213 step: 189 (msr_device_create [msr])
         cpuhp/1-14 [001] .... 10306.426665: cpuhp_exit: cpu: 0001 state: 189 step: 189 ret: 0
         cpuhp/1-14 [001] .... 10306.426667: cpuhp_enter: cpu: 0001 target: 213 step: 212 (sched_cpu_activate)
         cpuhp/1-14 [001] .... 10306.426670: cpuhp_exit: cpu: 0001 state: 212 step: 212 ret: 0

Notice the -16 failure on hv_synic_cleanup. It can fail for two reasons: the cpu is VMBUS_CONNECT_CPU, which is 0, not the case here, or the cpu is the target_cpu for a vmbus_channel.

# grep 1 /sys/bus/vmbus/devices/*/channels/*/cpu
/sys/bus/vmbus/devices/00000000-0001-8899-0000-000000000000/channels/3/cpu:1
/sys/bus/vmbus/devices/000d3a6e-002e-000d-3a6e-002e000d3a6e/channels/15/cpu:1
/sys/bus/vmbus/devices/f8b3781b-1e82-4818-a1c3-63d806ec15bb/channels/13/cpu:1