ubuntu_kernel_selftests: ./cpu-on-off-test.sh: line 94: echo: write error: Device or resource busy

Bug #1923114 reported by Marcelo Cerri

Affects                    Status       Importance  Assigned to          Milestone
ubuntu-kernel-tests        In Progress  Undecided   Krzysztof Kozlowski
linux-azure (Ubuntu)       New          Undecided   Unassigned
  Trusty                   New          Undecided   Unassigned
  Xenial                   New          Undecided   Unassigned
  Bionic                   New          Undecided   Unassigned
  Groovy                   New          Undecided   Unassigned
linux-azure-4.15 (Ubuntu)  New          Undecided   Unassigned
  Trusty                   New          Undecided   Unassigned
  Xenial                   New          Undecided   Unassigned
  Bionic                   New          Undecided   Unassigned
  Groovy                   New          Undecided   Unassigned

Bug Description

Test cpu-hotplug from ubuntu_kernel_selftests fails with bionic:linux-azure-4.15 running on a Basic A2 instance with 2 cores (among other instance types):

selftests: cpu-on-off-test.sh
========================================
pid 28041's current affinity mask: 3
pid 28041's new affinity mask: 1
CPU online/offline summary:
present_cpus = 0-1 present_max = 1
Cpus in online state: 0-1
Cpus in offline state: 0
Limited scope test: one hotplug cpu
(leaves cpu in the original state):
online to offline to online: cpu 1
not ok 1..1 selftests: cpu-on-off-test.sh [FAIL]
./cpu-on-off-test.sh: line 94: echo: write error: Device or resource busy
offline_cpu_expect_success 1: unexpected fail

http://10.246.72.46/4.15.0-1112.124~16.04.1-azure/xenial-linux-azure-azure-amd64-4.15.0-Basic_A2-ubuntu_kernel_selftests/ubuntu_kernel_selftests/results/ubuntu_kernel_selftests.cpu-hotplug/debug/ubuntu_kernel_selftests.cpu-hotplug.DEBUG.html

The problem happens in "autotest-client-tests/ubuntu_kernel_selftests/cpu-on-off-test.sh" when it executes:

 echo 0 > $SYSFS/devices/system/cpu/cpu$1/online
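
To reproduce this by hand outside the test harness, a quick check along these lines makes the errno visible (assuming strace is available; cpu1 is simply the CPU the test picks on a 2-core instance):

 # Attempt the same write the selftest does and surface the errno;
 # on an affected instance the write fails with EBUSY.
 strace -e trace=write sh -c 'echo 0 > /sys/devices/system/cpu/cpu1/online' 2>&1 | grep EBUSY
 # Illustrative output on a failing system:
 #   write(1, "0\n", 2) = -1 EBUSY (Device or resource busy)

If the write succeeds instead, bring the CPU back with "echo 1 > /sys/devices/system/cpu/cpu1/online".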

Marcelo Cerri (mhcerri)
tags: added: 4.15 amd64 azure bionic kqa-blocker sru-20210315 trusty ubuntu-kernel-selftests xenial
Marcelo Cerri (mhcerri) wrote:

Also seen in linux-azure 5.8.0-1027.29 (and later) for groovy

tags: added: 5.8 groovy
Marcelo Cerri (mhcerri) wrote:

Also seen in bionic linux-azure-fips 4.15.0-2024.27 for sru-20210315

tags: added: azure-fips
tags: added: sru-20210412
Ian May (ian-may) wrote:

bionic/linux-azure-fips: 4.15.0-2026.29

Ian May (ian-may) wrote:

bionic/linux-azure-4.15: 4.15.0-1114.127

Ian May (ian-may) wrote:

Found on bionic/linux-azure-4.15: 4.15.0-1115.128

Ian May (ian-may) wrote:

Found on bionic/linux-azure-fips: 4.15.0-2027.30

Krzysztof Kozlowski (krzk) wrote:

Found on bionic/linux-azure: 4.15.0-1119.132

tags: added: sru-20210621
Changed in ubuntu-kernel-tests:
assignee: nobody → Krzysztof Kozlowski (krzk)
status: New → In Progress
Thadeu Lima de Souza Cascardo (cascardo) wrote:

While looking into possible improvements to cpu-on-off-test.sh, I investigated the path taken on Azure. A good tool for debugging these failures is cpuhp tracing:

echo 1 > /sys/kernel/debug/tracing/events/cpuhp/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace_pipe &
echo 0 > /sys/devices/system/cpu/cpu1/online

            bash-2660 [001] .... 10306.404476: cpuhp_enter: cpu: 0001 target: 143 step: 213 (cpuhp_kick_ap_work)
         cpuhp/1-14 [001] .... 10306.404506: cpuhp_enter: cpu: 0001 target: 143 step: 212 (sched_cpu_deactivate)
         cpuhp/1-14 [001] .... 10306.420476: cpuhp_exit: cpu: 0001 state: 211 step: 212 ret: 0
         cpuhp/1-14 [001] .... 10306.420502: cpuhp_enter: cpu: 0001 target: 143 step: 189 (msr_device_destroy [msr])
         cpuhp/1-14 [001] .... 10306.420623: cpuhp_exit: cpu: 0001 state: 188 step: 189 ret: 0
         cpuhp/1-14 [001] .... 10306.420629: cpuhp_enter: cpu: 0001 target: 143 step: 187 (mce_cpu_pre_down)
         cpuhp/1-14 [001] .... 10306.420653: cpuhp_exit: cpu: 0001 state: 186 step: 187 ret: 0
         cpuhp/1-14 [001] .... 10306.420654: cpuhp_enter: cpu: 0001 target: 143 step: 184 (hv_synic_cleanup)
         cpuhp/1-14 [001] .... 10306.420655: cpuhp_exit: cpu: 0001 state: 183 step: 184 ret: -16
         cpuhp/1-14 [001] .... 10306.426523: cpuhp_enter: cpu: 0001 target: 213 step: 185 (compute_batch_value)
         cpuhp/1-14 [001] .... 10306.426541: cpuhp_exit: cpu: 0001 state: 185 step: 185 ret: 0
         cpuhp/1-14 [001] .... 10306.426546: cpuhp_enter: cpu: 0001 target: 213 step: 186 (acpi_soft_cpu_online)
         cpuhp/1-14 [001] .... 10306.426553: cpuhp_exit: cpu: 0001 state: 186 step: 186 ret: 0
         cpuhp/1-14 [001] .... 10306.426553: cpuhp_enter: cpu: 0001 target: 213 step: 187 (mce_cpu_online)
         cpuhp/1-14 [001] .... 10306.426597: cpuhp_exit: cpu: 0001 state: 187 step: 187 ret: 0
         cpuhp/1-14 [001] .... 10306.426606: cpuhp_enter: cpu: 0001 target: 213 step: 188 (console_cpu_notify)
         cpuhp/1-14 [001] .... 10306.426607: cpuhp_exit: cpu: 0001 state: 188 step: 188 ret: 0
         cpuhp/1-14 [001] .... 10306.426611: cpuhp_enter: cpu: 0001 target: 213 step: 189 (msr_device_create [msr])
         cpuhp/1-14 [001] .... 10306.426665: cpuhp_exit: cpu: 0001 state: 189 step: 189 ret: 0
         cpuhp/1-14 [001] .... 10306.426667: cpuhp_enter: cpu: 0001 target: 213 step: 212 (sched_cpu_activate)
         cpuhp/1-14 [001] .... 10306.426670: cpuhp_exit: cpu: 0001 state: 212 step: 212 ret: 0

Notice the -16 (-EBUSY) failure in hv_synic_cleanup; this is the same "Device or resource busy" error the test reports. hv_synic_cleanup can fail for two reasons: either the CPU is VMBUS_CONNECT_CPU (CPU 0, which is not the case here), or the CPU is the target_cpu of a vmbus channel, which is the case here:

# grep 1 /sys/bus/vmbus/devices/*/channels/*/cpu
/sys/bus/vmbus/devices/00000000-0001-8899-0000-000000000000/channels/3/cpu:1
/sys/bus/vmbus/devices/000d3a6e-002e-000d-3a6e-002e000d3a6e/channels/15/cpu:1
/sys/bus/vmbus/devices/f8b3781b-1e82-4818-a1c3-63d806ec15bb/channels/13/cpu:1
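
A possible mitigation, sketched below under an assumption worth flagging: the per-channel "cpu" attribute only became writable with the much newer upstream VMBus channel re-targeting work (around v5.8), so this cannot help on 4.15 itself, where the attribute is read-only:

 # Sketch: re-target every vmbus channel currently pinned to CPU 1
 # over to CPU 0, then retry the offline. Requires a kernel with a
 # writable per-channel "cpu" attribute.
 for ch in /sys/bus/vmbus/devices/*/channels/*; do
     [ "$(cat "$ch/cpu")" = "1" ] || continue
     echo 0 > "$ch/cpu" || echo "could not re-target $ch"
 done
 echo 0 > /sys/devices/system/cpu/cpu1/online

On kernels without writable channel targets, the test would instead have to skip CPUs that are a channel's target_cpu rather than expect the offline to succeed.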

tags: added: 5.4