I found that this issue is limited to the node "dazzle"; other ARM64 nodes such as kopter-kernel and scobee-kernel do not have it.
Also, a manual test shows that the real cause appears to be efivarfs.sh in the efivarfs selftests, which can also be seen failing with a timeout in the RT test report.
Running the test manually on dazzle with "set -x" added to the script:
$ sudo ./efivarfs.sh
+ check_prereqs
+ local 'msg=skip all tests:'
+ '[' 0 '!=' 0 ']'
+ grep -q '^\S\+ /sys/firmware/efi/efivars efivarfs' /proc/mounts
+ rc=0
+ run_test test_create
+ local test=test_create
+ echo --------------------
--------------------
+ echo 'running test_create'
running test_create
+ echo --------------------
--------------------
++ type -t test_create
+ '[' function = function ']'
+ test_create
+ local 'attrs=\x07\x00\x00\x00'
+ local file=/sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27
+ printf '\x07\x00\x00\x00\x00'
The trace stops at this printf: the write that creates the test_create variable never returns.
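To reproduce only the write that hangs, without the rest of efivarfs.sh, a sketch along these lines may help. It re-creates the last trace lines: 4 attribute bytes (\x07\x00\x00\x00) followed by one data byte, written to a new file under /sys/firmware/efi/efivars. A write stuck waiting on the firmware call may sleep uninterruptibly and ignore signals, so the sketch does not rely on timeout(1). Instead it starts the write in the background and reports the task state after a grace period. The function names efi_test_payload and try_efivar_write and the 30-second default are hypothetical choices, not part of the selftest.

```bash
#!/bin/bash
# Hypothetical sketch: redo only the efivarfs write from test_create.
# Assumes root privileges and efivarfs mounted at the path below.

EFIVARFS=/sys/firmware/efi/efivars
TEST_GUID=210be57c-9849-4fc7-a635-e6382d1aec27

# Payload as in the trace: 4-byte little-endian attribute mask 0x7
# (NON_VOLATILE | BOOTSERVICE_ACCESS | RUNTIME_ACCESS) plus one data byte.
efi_test_payload() {
	printf '\x07\x00\x00\x00\x00'
}

# Start the write in the background and check on it after a grace period.
# Returns 0 if it finished, 1 if it is still blocked, 2 if efivarfs is absent.
try_efivar_write() {
	local wait_secs=${1:-30}
	local file=$EFIVARFS/test_create-$TEST_GUID
	local pid state

	if ! grep -q "^\S\+ $EFIVARFS efivarfs" /proc/mounts 2>/dev/null; then
		echo "skip: efivarfs not mounted at $EFIVARFS"
		return 2
	fi

	efi_test_payload > "$file" &
	pid=$!
	sleep "$wait_secs"

	# Strip "pid (comm) " from /proc/PID/stat; the next field is the state.
	state=$(sed 's/^.*) //' "/proc/$pid/stat" 2>/dev/null | cut -d' ' -f1)
	if [ -n "$state" ] && [ "$state" != Z ]; then
		echo "write still blocked after ${wait_secs}s (pid $pid, state $state)"
		return 1
	fi

	if wait "$pid"; then
		echo "write completed"
	else
		echo "write failed"
	fi
	# efivarfs marks most variables immutable; clear the flag before removal.
	chattr -i "$file" 2>/dev/null
	rm -f "$file"
	return 0
}
```

After sourcing this as root, `try_efivar_write 30` would presumably print a "still blocked" message with state D on dazzle and "write completed" on kopter-kernel or scobee-kernel. A blocked background write stays stuck until the firmware call returns, so a reboot may be needed afterwards.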
dmesg:
[ 420.122478] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 420.128599] rcu: 20-...0: (9 GPs behind) idle=1a04/1/0x4000000000000000 softirq=1701/1701 fqs=6790
[ 420.137665] (detected by 30, t=15005 jiffies, g=6805, q=768 ncpus=32)
[ 420.137670] Task dump for CPU 20:
[ 420.137672] task:kworker/u64:0 state:R running task stack:0 pid:9 ppid:2 flags:0x0000000a
[ 420.137680] Workqueue: efi_rts_wq efi_call_rts
[ 420.137691] Call trace:
[ 420.137693] __switch_to+0xbc/0x100
[ 420.137699] 0xffff80002585bb2c
[ 484.991153] INFO: task efivarfs.sh:1786 blocked for more than 120 seconds.
[ 484.998061] Not tainted 6.2.0-20-generic #20-Ubuntu
[ 485.003478] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 485.011332] task:efivarfs.sh state:D stack:0 pid:1786 ppid:1782 flags:0x00000004
[ 485.011339] Call trace:
[ 485.011341] __switch_to+0xbc/0x100
[ 485.011349] __schedule+0x2fc/0x7b0
[ 485.011353] schedule+0x68/0x160
[ 485.011356] schedule_timeout+0x1a8/0x1dc
[ 485.011360] wait_for_completion+0xe0/0x180
[ 485.011364] virt_efi_set_variable+0x16c/0x220
[ 485.011369] efivar_set_variable_locked+0x80/0x120
[ 485.011372] efivar_entry_set_get_size+0xc4/0x180
[ 485.011377] efivarfs_file_write+0xb0/0x1d0
[ 485.011380] vfs_write+0xd0/0x310
[ 485.011385] ksys_write+0x7c/0x130
[ 485.011388] __arm64_sys_write+0x28/0x50
[ 485.011392] invoke_syscall+0x7c/0x124
[ 485.011396] el0_svc_common.constprop.0+0x5c/0x1cc
[ 485.011399] do_el0_svc+0x38/0x60
[ 485.011402] el0_svc+0x30/0xe0
[ 485.011407] el0t_64_sync_handler+0x11c/0x150
[ 485.011411] el0t_64_sync+0x1a8/0x1ac
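Taken together, the two reports suggest a pattern. The kworker stalled on CPU 20 is running efi_call_rts from efi_rts_wq and apparently never returns from the firmware call. Meanwhile, efivarfs.sh sleeps in wait_for_completion() inside virt_efi_set_variable(), waiting for that work. To check other nodes or later runs for the same pattern, a small filter like the one below could be used. The function name efi_rts_hang_signature is hypothetical, and the three strings it looks for are simply taken from the log above rather than from any established detection rule.

```bash
#!/bin/bash
# Hypothetical sketch: report whether a kernel log on stdin contains all
# three markers seen on dazzle: an RCU stall, a task running efi_call_rts
# from efi_rts_wq, and a task blocked in virt_efi_set_variable().
efi_rts_hang_signature() {
	local stall=0 rts=0 setvar=0 line
	while IFS= read -r line; do
		case $line in
			*"rcu_preempt detected stalls"*) stall=1 ;;
			*"Workqueue: efi_rts_wq efi_call_rts"*) rts=1 ;;
			*virt_efi_set_variable*) setvar=1 ;;
		esac
	done
	# Only presence is checked; the order of the markers is ignored.
	if (( stall && rts && setvar )); then
		echo "match"
		return 0
	fi
	echo "no match"
	return 1
}
```

Usage, after sourcing: `sudo dmesg | efi_rts_hang_signature`.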
I will modify the bug title and description accordingly.