mem-on-off-test from memory-hotplug in ubuntu_kernel_selftests fails with blocked task on B/F/G/H

Bug #1923113 reported by Kelsey Steele
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
Triaged
Undecided
Unassigned
linux (Ubuntu)
Incomplete
Undecided
Unassigned
Bionic
Incomplete
Undecided
Unassigned

Bug Description

mem-on-off-test from memory-hotplug fails with timeout and blocked task, although visible is also invalid argument/unexpected success on Bionic 4.15.0-141.145

This is not a regression as it happened on 4.15.0-139.143 too, though passed all tests on 4.15.0-140.144.

This same test is reported as failing, though with different output on: lp:1897764 and lp:1891003

-------------------
03/26 11:09:12 DEBUG| utils:0116| Running 'sudo sh -c 'echo 1 > /proc/sys/net/ipv4/conf/all/accept_local''
03/26 11:09:12 DEBUG| utils:0116| Running 'sudo make -C linux/tools/testing/selftests TARGETS=memory-hotplug run_tests'
03/26 11:09:12 DEBUG| utils:0153| [stdout] make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests'
03/26 11:09:12 DEBUG| utils:0153| [stdout] make[1]: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/memory-hotplug'
03/26 11:09:12 DEBUG| utils:0153| [stdout] make[1]: Nothing to be done for 'all'.
03/26 11:09:12 DEBUG| utils:0153| [stdout] make[1]: Leaving directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/memory-hotplug'
03/26 11:09:12 DEBUG| utils:0153| [stdout] make[1]: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/memory-hotplug'
03/26 11:09:12 DEBUG| utils:0153| [stdout] ./mem-on-off-test.sh -r 2 && echo "selftests: memory-hotplug [PASS]" || echo "selftests: memory-hotplug [FAIL]"
03/26 11:09:12 DEBUG| utils:0153| [stdout] Test scope: 2% hotplug memory
03/26 11:09:16 DEBUG| utils:0153| [stdout] online all hot-pluggable memory in offline state:
03/26 11:09:16 DEBUG| utils:0153| [stdout] SKIPPED - no hot-pluggable memory in offline state
03/26 11:09:20 DEBUG| utils:0153| [stdout] offline 2% hot-pluggable memory in online state
03/26 11:09:20 DEBUG| utils:0153| [stdout] trying to offline 15 out of 748 memory block(s):
03/26 11:09:23 DEBUG| utils:0153| [stdout] online->offline memory10
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory100
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory101
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory102
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory103
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory104
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory105
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory106
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory107
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory108
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory109
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory11
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory110
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory111
03/26 11:09:24 DEBUG| utils:0153| [stdout] online->offline memory112
03/26 11:09:28 DEBUG| utils:0153| [stdout] online all hot-pluggable memory in offline state:
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory10
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory100
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory101
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory102
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory103
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory104
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory105
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory106
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory107
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory108
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory109
03/26 11:09:32 DEBUG| utils:0153| [stdout] offline->online memory11
03/26 11:09:33 DEBUG| utils:0153| [stdout] offline->online memory110
03/26 11:09:33 DEBUG| utils:0153| [stdout] offline->online memory111
03/26 11:09:33 DEBUG| utils:0153| [stdout] offline->online memory112
03/26 11:09:33 DEBUG| utils:0153| [stdout] Test with memory notifier error injection
03/26 11:09:37 ERROR| utils:0153| [stderr] ./mem-on-off-test.sh: line 263: echo: write error: Invalid argument
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 13: unexpected success
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 164: unexpected success
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 185: unexpected success
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 202: unexpected success
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 24: unexpected success
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 267: unexpected success
03/26 11:09:41 ERROR| utils:0153| [stderr] online_memory_expect_fail 321: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 395: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 403: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 410: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 457: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 464: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 50: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 577: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 594: unexpected success
03/26 11:09:42 ERROR| utils:0153| [stderr] online_memory_expect_fail 596: unexpected success
03/26 11:09:46 ERROR| utils:0153| [stderr] ./mem-on-off-test.sh: line 279: echo: write error: Invalid argument
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 10: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 100: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 101: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 102: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 103: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 104: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 105: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 106: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 107: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 108: unexpected success
03/26 11:09:50 ERROR| utils:0153| [stderr] offline_memory_expect_fail 109: unexpected success
-------------------

On Bionic ibm-gt v4.15 kernel only on amd64 metal nodes rizzo and rumford has a timeout of memory-hotplug from ubuntu_kernel_selftests:
-------------------
    14121 14:49:30 ERROR| [stderr] offline_memory_expect_fail 5: unexpected success
    14122 14:56:12 ERROR| [stderr] offline_memory_expect_fail 62: unexpected success
    14123 14:56:12 ERROR| [stderr] offline_memory_expect_fail 63: unexpected success
    14124 15:19:22 INFO | Timer expired (1800 sec.), nuking pid 17231
    14125 15:19:23 INFO | ERROR ubuntu_kernel_selftests.memory-hotplug ubuntu_kernel_selftests.memory-hotplug timestamp=1625066363 localtime=Jun 30 15:19:23
-------------------

In dmesg from a next test:
-------------------
[ 583.473311] Offlined Pages 32768
[ 583.476247] Offlined Pages 32768
[ 583.479028] Offlined Pages 32768
[ 583.481876] Offlined Pages 32768
[ 1088.798252] INFO: task systemd:1 blocked for more than 120 seconds.
[ 1088.798310] Not tainted 4.15.0-1098-ibm-gt #108-Ubuntu
[ 1088.798360] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.798410] systemd D 0 1 0 0x00000000
[ 1088.798413] Call Trace:
[ 1088.798422] __schedule+0x24e/0x890
[ 1088.798426] schedule+0x2c/0x80
[ 1088.798428] schedule_preempt_disabled+0xe/0x10
[ 1088.798431] __mutex_lock.isra.5+0x276/0x4e0
[ 1088.798435] __mutex_lock_slowpath+0x13/0x20
[ 1088.798437] ? __mutex_lock_slowpath+0x13/0x20
[ 1088.798439] mutex_lock+0x2f/0x40
[ 1088.798444] proc_cgroup_show+0x4c/0x2a0
[ 1088.798449] proc_single_show+0x56/0x80
[ 1088.798453] seq_read+0xe5/0x450
[ 1088.798459] __vfs_read+0x1b/0x40
[ 1088.798462] vfs_read+0x8e/0x130
[ 1088.798466] SyS_read+0x5c/0xe0
[ 1088.798470] do_syscall_64+0x73/0x130
[ 1088.798474] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 1088.798477] RIP: 0033:0x7fb5b16cf184
[ 1088.798478] RSP: 002b:00007fff21894aa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1088.798481] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb5b16cf184
[ 1088.798482] RDX: 0000000000000400 RSI: 000056544b965d80 RDI: 0000000000000012
[ 1088.798484] RBP: 000056544b965d80 R08: 0000000000000000 R09: 00007fb5b12cc9ba
[ 1088.798485] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000400
[ 1088.798486] R13: 00007fb5b19a72a0 R14: 000056544b8ad570 R15: 00000000000007ff
[ 1088.798497] INFO: task kworker/11:1:179 blocked for more than 120 seconds.
-------------------

description: updated
description: updated
tags: added: 5.4 hwe
Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

Found on Groovy 5.8.0-51.57

tags: added: groo
tags: added: 5.8 groovy
tags: added: sru-20210412
tags: added: focal hwe-5.8
tags: added: hwe-5.4
tags: added: fips
summary: mem-on-off-test from memory-hotplug fails with invalid
- argument/unexpected success on Bionic
+ argument/unexpected success on B/F/G
summary: mem-on-off-test from memory-hotplug fails with invalid
- argument/unexpected success on B/F/G
+ argument/unexpected success on B/F/G/H
tags: added: 5.11 hirsute sru-20210510
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote : Re: mem-on-off-test from memory-hotplug fails with invalid argument/unexpected success on B/F/G/H

Observed in some Bionic kernels - cycle sru-20210510 .

tags: added: sru-20210531
Po-Hsu Lin (cypressyew)
tags: removed: groo
tags: added: sru-20210621
Sean Feole (sfeole)
tags: added: hinted
Changed in ubuntu-kernel-tests:
status: New → Triaged
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/aws (kernel 4.15), cycle sru-20210621.

tags: added: aws
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/aws-fips (kernel 4.15), cycle sru-20210621.

tags: added: sru-20210719
Revision history for this message
Krzysztof Kozlowski (krzk) wrote : Re: mem-on-off-test from memory-hotplug in ubuntu_kernel_selftests fails with invalid argument/unexpected success on B/F/G/H

Bionic ibm-gt v4.15 kernel only on amd64 metal nodes rizzo and rumford has a timeout of memory-hotplug from ubuntu_kernel_selftests:
-------------------
    14121 14:49:30 ERROR| [stderr] offline_memory_expect_fail 5: unexpected success
    14122 14:56:12 ERROR| [stderr] offline_memory_expect_fail 62: unexpected success
    14123 14:56:12 ERROR| [stderr] offline_memory_expect_fail 63: unexpected success
    14124 15:19:22 INFO | Timer expired (1800 sec.), nuking pid 17231
    14125 15:19:23 INFO | ERROR ubuntu_kernel_selftests.memory-hotplug ubuntu_kernel_selftests.memory-hotplug timestamp=1625066363 localtime=Jun 30 15:19:23
-------------------

In dmesg from next test:
-------------------
[ 583.473311] Offlined Pages 32768
[ 583.476247] Offlined Pages 32768
[ 583.479028] Offlined Pages 32768
[ 583.481876] Offlined Pages 32768
[ 1088.798252] INFO: task systemd:1 blocked for more than 120 seconds.
[ 1088.798310] Not tainted 4.15.0-1098-ibm-gt #108-Ubuntu
[ 1088.798360] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.798410] systemd D 0 1 0 0x00000000
[ 1088.798413] Call Trace:
[ 1088.798422] __schedule+0x24e/0x890
[ 1088.798426] schedule+0x2c/0x80
[ 1088.798428] schedule_preempt_disabled+0xe/0x10
[ 1088.798431] __mutex_lock.isra.5+0x276/0x4e0
[ 1088.798435] __mutex_lock_slowpath+0x13/0x20
[ 1088.798437] ? __mutex_lock_slowpath+0x13/0x20
[ 1088.798439] mutex_lock+0x2f/0x40
[ 1088.798444] proc_cgroup_show+0x4c/0x2a0
[ 1088.798449] proc_single_show+0x56/0x80
[ 1088.798453] seq_read+0xe5/0x450
[ 1088.798459] __vfs_read+0x1b/0x40
[ 1088.798462] vfs_read+0x8e/0x130
[ 1088.798466] SyS_read+0x5c/0xe0
[ 1088.798470] do_syscall_64+0x73/0x130
[ 1088.798474] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 1088.798477] RIP: 0033:0x7fb5b16cf184
[ 1088.798478] RSP: 002b:00007fff21894aa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1088.798481] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb5b16cf184
[ 1088.798482] RDX: 0000000000000400 RSI: 000056544b965d80 RDI: 0000000000000012
[ 1088.798484] RBP: 000056544b965d80 R08: 0000000000000000 R09: 00007fb5b12cc9ba
[ 1088.798485] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000400
[ 1088.798486] R13: 00007fb5b19a72a0 R14: 000056544b8ad570 R15: 00000000000007ff
[ 1088.798497] INFO: task kworker/11:1:179 blocked for more than 120 seconds.
-------------------

This was found on:
2021.06.21: bionic linux-ibm-gt 4.15.0-1095.104 (rizzo)
2021.07.19: bionic linux-ibm-gt 4.15.0-1098.108 (rumford)

Not found on:
2021.06.21: bionic linux-ibm-gt 4.15.0-1096.105 (ricolli)
2021.06.21: bionic linux 4.15.0-150.155 (rumford)

summary: - mem-on-off-test from memory-hotplug fails with invalid
- argument/unexpected success on B/F/G/H
+ mem-on-off-test from memory-hotplug in ubuntu_kernel_selftests fails
+ with invalid argument/unexpected success on B/F/G/H
description: updated
summary: mem-on-off-test from memory-hotplug in ubuntu_kernel_selftests fails
- with invalid argument/unexpected success on B/F/G/H
+ with blocked task on B/F/G/H
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Initial report mentioned "invalid argument/unexpected" because that was the only visible error from the test, but most likely it was not the case. As reported by others (e.g. Observed in B/aws-fips (kernel 4.15), cycle sru-20210621) failure was actually a timeout caused by hung task. See comment above for Bionic ibm-gt v4.15.

Revision history for this message
Krzysztof Kozlowski (krzk) wrote :
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

IMPORTANT:
When reporting more failures related to this case - check for the timeout and look at the dmesg log to be sure this is the real visible effect.

description: updated
description: updated
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1923113

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Revision history for this message
Cory Todd (corytodd) wrote :

Observed in 2022.08.29/focal/linux-fips/5.4.0-1061.69

tags: added: sru-20220829
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.