cfs_bandwidth01 in sched from ubuntu_ltp_stable failed on B-4.15
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ubuntu-kernel-tests |
Confirmed
|
Undecided
|
Unassigned | ||
linux (Ubuntu) |
Fix Released
|
Undecided
|
Unassigned | ||
Bionic |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
[Impact]
Test case cfs_bandwidth01 in LTP sched test suite is a reproducer
of a CFS unthrottle_cfs_rq() issue (fe61468b2cbc2b sched/fair: Fix
enqueue_task_fair warning).
This test triggers a warning on our 4.15 kernel:
LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)
------------[ cut here ]------------
rq->tmp_
WARNING: CPU: 0 PID: 0 at /build/
Modules linked in: input_leds joydev serio_raw mac_hid qemu_fw_cfg kvm_intel kvm irqbypass sch_fq_codel binfmt_misc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-144-generic #148-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:unthrottle
RSP: 0018:ffff989ebf
RAX: 0000000000000000 RBX: ffff989eb4c6ac00 RCX: 0000000000000000
RDX: 0000000000000005 RSI: ffffffffacb63c4d RDI: 0000000000000046
RBP: ffff989ebfc03ea8 R08: 000000af39e61b33 R09: ffffffffacb63c20
R10: 0000000000000000 R11: 0000000000000001 R12: ffff989eb57fe400
R13: ffff989ebfc21900 R14: 0000000000000001 R15: 0000000000000001
FS: 000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055593258d618 CR3: 000000007a044000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
distribute_
sched_
? sched_cfs_
__hrtimer_
hrtimer_
smp_apic_
apic_timer_
</IRQ>
RIP: 0010:native_
RSP: 0018:ffffffffac
RAX: ffffffffabbc9280 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffffffac603e28 R08: 000000af39850067 R09: ffff989e73749d00
R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
? __sched_
default_
arch_cpu_
default_
do_idle+
cpu_startup_
rest_init+
start_
x86_64_
x86_64_
secondary_
Code: 50 09 00 00 49 39 85 60 09 00 00 74 68 80 3d 3a 6e 54 01 00 75 5f 31 db 48 c7 c7 c0 3d 2d ac c6 05 28 6e 54 01 01 e8 11 36 fc ff <0f> 0b 48 85 db 74 43 49 8b 85 78 09 00 00 49 39 85 70 09 00 00
---[ end trace b6b9a70bc2945c0c ]---
[Fix]
Base on the test case description, we will need these fixes:
* fe61468b2cbc2b sched/fair: Fix enqueue_task_fair warning
* b34cb07dde7c23 sched/fair: Fix enqueue_task_fair() warning some more
* 39f23ce07b9355 sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
* 6d4d22468dae3d sched/fair: Reorder enqueue/
* 5ab297bab98431 sched/fair: Fix reordering of enqueue/
Backport needed for Bionic since we're missing some new variables /
coding style changes introduced in the following commits (and their
corresponding fixes):
* 97fb7a0a8944bd sched: Clean up and harmonize the coding style of the scheduler code base
* 9f68395333ad7f sched/pelt: Add a new runnable average signal
* 6212437f0f6043 sched/fair: Fix runnable_avg for throttled cfs
* 43e9f7f231e40e sched/fair: Start tracking SCHED_IDLE tasks count in cfs_rq
I have also searched in the upstream tree to see if there is any other
commit claim to be a fix of these but didn't see any.
[Test]
Test kernel can be found here:
https:/
With these patches applied, the test can pass without triggering this
warning.
<<<test_start>>>
tag=cfs_bandwidth01 stime=1624260713
cmdline=
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
tst_test.c:1313: TINFO: Timeout per run is 0h 05m 00s
tst_buffers.c:55: TINFO: Test is using guarded buffers
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
cfs_bandwidth01
Summary:
passed 10
failed 0
broken 0
skipped 0
warnings 0
I have also run the whole sched test suite in LTP to make sure there
is no other issues caused by this patchset.
[Where problems could occur]
* CFS (Completely Fair Scheduler) is the process scheduling system in
the kernel, if the patch is incorrect it might affect the sched
functionality. Especially system with CONFIG_
CONFIG_
[Other Info]
Test case description:
* Creates a multi-level CGroup hierarchy with the cpu controller
* enabled. The leaf groups are populated with "busy" processes which
* simulate intermittent cpu load. They spin for some time then sleep
* then repeat.
*
* Both the trunk and leaf groups are set cpu bandwidth limits. The
* busy processes will intermittently exceed these limits. Causing
* them to be throttled. When they begin sleeping this will then cause
* them to be unthrottle.
[Original Bug Report]
Issue found on Azure 4.15.0-1116.129 Bionic
This is a new test case added 8 days ago. So this is not a regression:
https:/
Test log:
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
cfs_bandwidth0
tst_test.c:1349: TFAIL: Kernel is now tainted.
HINT: You _MAY_ be missing kernel fixes, see:
https:/
https:/
https:/
https:/
https:/
Summary:
passed 10
failed 1
broken 0
skipped 0
warnings 0
tag=cfs_
Note that this test has failed on the following instances:
* Basic_A2
* Standard_B1ms
* Standard_D48_v3
* Standard_F32s_v2
But passed with:
* Standard_DS15_v2
* Standard_DS5_v2
When this issue happens, kernel will be tainted with warning:
[ 694.761774] LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)
[ 695.801637] ------------[ cut here ]------------
[ 695.801640] rq->tmp_
[ 695.801694] WARNING: CPU: 0 PID: 0 at /build/
[ 695.801695] Modules linked in: input_leds joydev serio_raw mac_hid qemu_fw_cfg kvm_intel kvm irqbypass sch_fq_codel binfmt_misc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
[ 695.801726] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-144-generic #148-Ubuntu
[ 695.801727] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 695.801729] RIP: 0010:unthrottle
[ 695.801730] RSP: 0018:ffff989ebf
[ 695.801732] RAX: 0000000000000000 RBX: ffff989eb4c6ac00 RCX: 0000000000000000
[ 695.801732] RDX: 0000000000000005 RSI: ffffffffacb63c4d RDI: 0000000000000046
[ 695.801734] RBP: ffff989ebfc03ea8 R08: 000000af39e61b33 R09: ffffffffacb63c20
[ 695.801734] R10: 0000000000000000 R11: 0000000000000001 R12: ffff989eb57fe400
[ 695.801735] R13: ffff989ebfc21900 R14: 0000000000000001 R15: 0000000000000001
[ 695.801737] FS: 000000000000000
[ 695.801738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 695.801739] CR2: 000055593258d618 CR3: 000000007a044000 CR4: 00000000000006f0
[ 695.801742] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 695.801743] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 695.801744] Call Trace:
[ 695.801755] <IRQ>
[ 695.801765] distribute_
[ 695.801767] sched_cfs_
[ 695.801768] ? sched_cfs_
[ 695.801775] __hrtimer_
[ 695.801777] hrtimer_
[ 695.801786] smp_apic_
[ 695.801789] apic_timer_
[ 695.801789] </IRQ>
[ 695.801791] RIP: 0010:native_
[ 695.801792] RSP: 0018:ffffffffac
[ 695.801793] RAX: ffffffffabbc9280 RBX: 0000000000000000 RCX: 0000000000000000
[ 695.801794] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 695.801795] RBP: ffffffffac603e28 R08: 000000af39850067 R09: ffff989e73749d00
[ 695.801796] R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000000
[ 695.801796] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 695.801801] ? __sched_
[ 695.801804] default_
[ 695.801813] arch_cpu_
[ 695.801814] default_
[ 695.801821] do_idle+0x172/0x1f0
[ 695.801823] cpu_startup_
[ 695.801825] rest_init+0xae/0xb0
[ 695.801843] start_kernel+
[ 695.801845] x86_64_
[ 695.801847] x86_64_
[ 695.801851] secondary_
[ 695.801852] Code: 50 09 00 00 49 39 85 60 09 00 00 74 68 80 3d 3a 6e 54 01 00 75 5f 31 db 48 c7 c7 c0 3d 2d ac c6 05 28 6e 54 01 01 e8 11 36 fc ff <0f> 0b 48 85 db 74 43 49 8b 85 78 09 00 00 49 39 85 70 09 00 00
[ 695.801875] ---[ end trace b6b9a70bc2945c0c ]---
tags: | added: 4.15 azure bionic sru-20210531 ubuntu-ltp-stable |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
Changed in linux (Ubuntu): | |
status: | New → Fix Released |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
tags: | added: gt |
Changed in ubuntu-kernel-tests: | |
assignee: | nobody → Po-Hsu Lin (cypressyew) |
Changed in linux (Ubuntu Bionic): | |
assignee: | nobody → Po-Hsu Lin (cypressyew) |
status: | New → In Progress |
Changed in ubuntu-kernel-tests: | |
status: | New → In Progress |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
tags: | added: fips |
description: | updated |
tags: | added: sru-20210621 |
Changed in ubuntu-kernel-tests: | |
status: | In Progress → Confirmed |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Confirmed |
Changed in linux (Ubuntu Bionic): | |
assignee: | Po-Hsu Lin (cypressyew) → nobody |
Changed in ubuntu-kernel-tests: | |
assignee: | Po-Hsu Lin (cypressyew) → nobody |
tags: | added: 4.4 sru-20230612 xenial |
This can be found on generic 4.15.0-145.149 as well