linux-image-5.15.0-1032-realtime locks up under scheduler test load

Bug #2024599 reported by Colin Ian King
This bug affects 2 people
Affects             Status       Importance  Assigned to  Milestone
apparmor (Ubuntu)   New          Undecided   Unassigned
  Jammy             New          Undecided   Unassigned
  Kinetic           New          Undecided   Unassigned
  Lunar             Won't Fix    Undecided   Unassigned
  Mantic            New          Undecided   Unassigned
linux (Ubuntu)      Incomplete   Low         Unassigned
  Jammy             Incomplete   Undecided   Unassigned
  Kinetic           New          Undecided   Unassigned
  Lunar             Won't Fix    Undecided   Unassigned
  Mantic            Incomplete   Low         Unassigned

Bug Description

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy

uname -a
Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

free
               total        used        free      shared  buff/cache   available
Mem:         4013888      200984     3439012        1204      373892     3744628
Swap:        4014076           0     4014076

Running in a KVM/QEMU VM with 8 CPUs; CPU model is Intel Core Processor (Skylake, IBRS).

How to reproduce the issue:

git clone https://github.com/ColinIanKing/stress-ng
sudo apt-get update
sudo apt-get build-dep stress-ng
sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev
cd stress-ng
make clean
make -j 8
sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

Wait for all the stressors to be invoked. The system then becomes unresponsive: stress-ng can't be interrupted with ^C, consoles on the VM can't be switched, and the machine appears to be hard locked.

Changed in linux (Ubuntu):
importance: Undecided → Low
summary: - linux-image-5.15.0-1032-realtime locksup under scheduler test load
+ linux-image-5.15.0-1032-realtime locks up under scheduler test load
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2024599

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Jammy):
status: New → Incomplete
Colin Ian King (colin-king) wrote :

I'm working through the stressors to see which ones are causing issues. I did notice that the apparmor stressor eats up memory until the system runs out of memory. This stressor loads illegal apparmor profiles and then removes them. Perhaps there is a memory leak in the loading of profiles that don't pass the verification phase.

To show this issue, run the following. Free memory drops over time until the user gets kicked off due to low memory:

sudo ./stress-ng --apparmor 1 --vmstat 5
stress-ng: info: [1339] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [1339] dispatching hogs: 1 apparmor
stress-ng: info: [1340] vmstat: r b swpd free buff cache si so bi bo in cs us sy id wa st
stress-ng: info: [1340] vmstat: 2 1 0 313824 32776 364352 0 0 16 18 4858 9752 4 25 70 0 0
stress-ng: info: [1340] vmstat: 5 0 0 257848 32776 366528 0 0 0 1091 4573 8435 4 23 72 0 0
stress-ng: info: [1340] vmstat: 5 0 0 198916 32784 368288 0 0 0 20 4642 8681 4 23 71 1 0
stress-ng: info: [1340] vmstat: 2 0 0 139496 32792 370600 0 0 0 16 4612 8500 4 23 71 1 0
stress-ng: info: [1340] vmstat: 2 0 0 85032 32740 363916 0 0 0 1751 4774 8710 4 23 71 1 0
stress-ng: info: [1340] vmstat: 5 0 0 92224 32748 310548 0 0 0 2020 5919 10123 4 24 70 1 0
stress-ng: info: [1340] vmstat: 2 0 0 93380 30068 268484 0 0 0 14 5590 10275 4 26 69 1 0
stress-ng: info: [1340] vmstat: 2 0 0 102152 23648 207872 0 0 0 3346 5277 9303 4 24 70 1 0
stress-ng: info: [1340] vmstat: 5 0 0 99184 18488 169084 0 0 48 2180 5614 9901 4 25 71 0 0
stress-ng: info: [1340] vmstat: 2 0 0 88068 7080 140392 0 0 359 2090 6146 11013 4 27 68 0 0
stress-ng: info: [1340] vmstat: 2 0 0 92368 564 82108 0 0 3568 2534 5899 10308 4 26 67 1 0
stress-ng: info: [1340] vmstat: 7 0 0 83784 100 47356 0 0 99834 4212 8540 14574 4 28 65 2 0
stress-ng: info: [1340] vmstat: 2 0 0 76784 188 44916 0 0 363427 7621 16647 28448 4 37 45 12 0
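
The downward trend in the vmstat samples above can be quantified with a small script. This is only a minimal sketch and not part of stress-ng: it assumes the `stress-ng: info: [pid] vmstat:` line layout shown above, and the names `parse_vmstat` and `free_buff_cache` are made up for illustration.

```python
import re

# Matches only the numeric sample lines, e.g.
# "stress-ng: info: [1340] vmstat: 2 1 0 313824 32776 364352 0 0 ..."
# The column header line does not match because it contains letters.
VMSTAT_RE = re.compile(r"vmstat:\s+((?:\d+\s+)+\d+)\s*$")

# Column order taken from the header line in the output above.
COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
           "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat(lines):
    """Return one dict per numeric vmstat sample; other lines are skipped."""
    samples = []
    for line in lines:
        match = VMSTAT_RE.search(line)
        if not match:
            continue
        values = [int(v) for v in match.group(1).split()]
        if len(values) == len(COLUMNS):
            samples.append(dict(zip(COLUMNS, values)))
    return samples


def free_buff_cache(sample):
    """Sum of the free, buff and cache columns, in the units stress-ng reports."""
    return sample["free"] + sample["buff"] + sample["cache"]
```

Applied to the first and last samples above, `free_buff_cache` gives 710952 and 121888 respectively, so the drop is not just page cache being traded for free memory; all three columns shrink together.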

Colin Ian King (colin-king) wrote :

I've managed to capture where it hangs. It looks like an RCU issue, see the attached screenshot.

Birgit Edel (biredel) wrote :

0.15.06-2 against 6.2.0-1006-kvm also eats memory, and sometimes additionally logs:

13:06:48.136501 AppArmor DFA next/check upper bounds error
13:06:48.204486 AppArmor DFA next/check upper bounds error
13:06:48.228502 AppArmor DFA next/check upper bounds error
13:06:48.476002 AppArmor DFA state with invalid match flags
13:06:48.476068 BUG: kernel NULL pointer dereference, address: 0000000000000030
13:06:48.476526 #PF: supervisor read access in kernel mode
13:06:48.485451 #PF: error_code(0x0000) - not-present page
13:06:48.485583 PGD 0 P4D 0
13:06:48.485653 Oops: 0000 [#1] SMP NOPTI
13:06:48.485696 CPU: 3 PID: 16589 Comm: stress-ng-appar Not tainted 6.2.0-1006-kvm #6-Ubuntu
13:06:48.485729 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
13:06:48.485757 RIP: 0010:0xffffffffb55599e8
13:06:48.485788 Code: b6 48 89 4d d0 0f 42 d8 e8 15 4b e6 ff 85 c0 74 19 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d e9 1c a8 59 00 <4d> 8b 55 30 49 8d 82 a0 00 00 00 4c 89 55 c0 48 89 c7 48 89 45 c8
13:06:48.485814 RSP: 0018:ffffb23b01037c70 EFLAGS: 00010246
13:06:48.485848 RAX: 0000000000000000 RBX: 00000000000041ed RCX: 0000000000000000
13:06:48.485867 RDX: 0000000000033090 RSI: ffffffffb6e6bca8 RDI: 0000000000000000
13:06:48.485898 RBP: ffffb23b01037cb0 R08: 0000000000000000 R09: 0000000000000000
13:06:48.485947 R10: ffffffffb53a8f4b R11: 0000000000000246 R12: ffffffffb5f00026
13:06:48.485981 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
13:06:48.486006 FS: 00007fb551056740(0000) GS:ffff9e3177cc0000(0000) knlGS:0000000000000000
13:06:48.486033 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
13:06:48.486051 CR2: 0000000000000030 CR3: 0000000103c86000 CR4: 0000000000350ee0
13:06:48.486076 Call Trace:
13:06:48.486102 <TASK>
13:06:48.486126 0xffffffffb555e326
13:06:48.486156 0xffffffffb556b7cb
13:06:48.486169 0xffffffffb555bf9f
13:06:48.486193 0xffffffffb555c0e9
13:06:48.486226 0xffffffffb53832fa
13:06:48.486251 ? 0xffffffffb51f8bd0
13:06:48.486271 0xffffffffb53837d2
13:06:48.486295 0xffffffffb5383878
13:06:48.486320 0xffffffffb5addf37
13:06:48.486351 ? 0xffffffffb5ae24f5
13:06:48.486370 ? 0xffffffffb5addf43
13:06:48.486401 ? 0xffffffffb5ae25b1
13:06:48.486427 ? 0xffffffffb5ae1c3f
13:06:48.486446 0xffffffffb5c000ae
13:06:48.486465 RIP: 0033:0x00007fb550d069e4
13:06:48.486490 Code: 15 39 a4 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d fd 2b 0f 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
13:06:48.486508 RSP: 002b:00007ffe81ae16a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
13:06:48.486532 RAX: ffffffffffffffda RBX: 0000556e11d97fc0 RCX: 00007fb550d069e4
13:06:48.486564 RDX: 0000000000010d92 RSI: 0000556e11d9a890 RDI: 0000000000000005
13:06:48.486589 RBP: 0000000000010d92 R08: 0000556e11dc7510 R09: 00000000ffffffff
13:06:48.486608 R10: 0000000000000000 R11: 0000000000000202 R12: 0000556e11d9a890
13:06:48.486633 R13: 0000000000000005 R14: 0000000000000000 R15: 00007fb54f37e000
13:06:48.486657 </TASK>
13:06:48.486676 Modules linked in: ip6t_REJECT nf_reject_ipv6 nft_limit nft_chain_nat nf_nat xt_owner xt_hashlimit xt_...

Colin Ian King (colin-king) wrote :

On 6.2.0-21-generic I also get:

sudo ./stress-ng --apparmor 1 --klog-check

stress-ng: error: [1083] klog-check: alert: [66.442338] 'BUG: kernel NULL pointer dereference, address: 0000000000000030'
stress-ng: error: [1083] klog-check: alert: [66.442538] '#PF: supervisor read access in kernel mode'
stress-ng: error: [1083] klog-check: alert: [66.442718] '#PF: error_code(0x0000) - not-present page'
stress-ng: info: [1083] klog-check: warning: [66.443080] 'Oops: 0000 [#1] PREEMPT SMP PTI'
stress-ng: info: [1083] klog-check: warning: [66.443256] 'CPU: 3 PID: 1088 Comm: stress-ng-appar Not tainted 6.2.0-21-generic #21-Ubuntu'
stress-ng: info: [1083] klog-check: warning: [66.443438] 'Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014'
stress-ng: info: [1083] klog-check: warning: [66.443628] 'RIP: 0010:aafs_create.constprop.0+0x7f/0x130'
stress-ng: info: [1083] klog-check: warning: [66.443819] 'Code: 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 00 00 00 4c 89 55 c0 4c 89 ff e8 8a 59 a1'
stress-ng: info: [1083] klog-check: warning: [66.444227] 'RSP: 0018:ffffbeb940907bd8 EFLAGS: 00010246'
stress-ng: info: [1083] klog-check: warning: [66.444433] 'RAX: 0000000000000000 RBX: 00000000000041ed RCX: 0000000000000000'
stress-ng: info: [1083] klog-check: warning: [66.444646] 'RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000'
stress-ng: info: [1083] klog-check: warning: [66.444862] 'RBP: ffffbeb940907c18 R08: 0000000000000000 R09: 0000000000000000'
stress-ng: info: [1083] klog-check: warning: [66.445074] 'R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff93db8b18'
stress-ng: info: [1083] klog-check: warning: [66.445291] 'R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000'
stress-ng: info: [1083] klog-check: warning: [66.445503] 'FS: 00007f60f5c07740(0000) GS:ffff9578bbcc0000(0000) knlGS:0000000000000000'
stress-ng: info: [1083] klog-check: warning: [66.445721] 'CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033'
stress-ng: info: [1083] klog-check: warning: [66.445939] 'CR2: 0000000000000030 CR3: 0000000124ffa004 CR4: 0000000000370ee0'
stress-ng: info: [1083] klog-check: warning: [66.446163] 'DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000'
stress-ng: info: [1083] klog-check: warning: [66.446387] 'DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400'
stress-ng: info: [1083] klog-check: warning: [66.446608] 'Call Trace:'
stress-ng: info: [1083] klog-check: warning: [66.446829] ' <TASK>'
stress-ng: info: [1083] klog-check: warning: [66.447059] ' __aafs_profile_mkdir+0x3d6/0x480'
stress-ng: info: [1083] klog-check: warning: [66.447290] ' aa_replace_profiles+0x862/0x1270'
stress-ng: info: [1083] klog-check: warning: [66.447518] ' policy_update+0xe0/0x180'
stress-ng: info: [1083] klog-check: warning: [66.447750] ' profile_replace+0xb9/0x150'
stress-ng: info: [1083] klog-check: warning: [66.447981] ' vfs_write+0xc8/0x410'
stress-ng: info: [1083] klog-check: warning: [66.448213] ' ? kmem_cache_free+0x1e/0x3b0'
stress-ng: info: [...

Colin Ian King (colin-king) wrote :

And with 5.19.0-45-generic:

sudo ./stress-ng --apparmor 1 --klog-check
[sudo] password for cking:
stress-ng: info: [1179] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [1179] dispatching hogs: 1 apparmor
stress-ng: info: [1180] klog-check: kernel cmdline: 'BOOT_IMAGE=/vmlinuz-5.19.0-45-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro'
stress-ng: error: [1180] klog-check: error: [93.527396] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [93.827976] 'AppArmor DFA state with invalid match flags'
stress-ng: error: [1180] klog-check: error: [93.991395] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [93.992189] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [94.007400] 'AppArmor DFA state with invalid match flags'
stress-ng: error: [1180] klog-check: error: [94.059345] 'AppArmor DFA state with invalid match flags'
stress-ng: error: [1180] klog-check: error: [94.104414] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: alert: [94.128617] 'BUG: kernel NULL pointer dereference, address: 0000000000000130'
stress-ng: error: [1180] klog-check: alert: [94.128644] '#PF: supervisor read access in kernel mode'
stress-ng: error: [1180] klog-check: alert: [94.128659] '#PF: error_code(0x0000) - not-present page'
stress-ng: info: [1180] klog-check: warning: [94.128685] 'Oops: 0000 [#1] PREEMPT SMP PTI'
stress-ng: info: [1180] klog-check: warning: [94.128698] 'CPU: 7 PID: 1185 Comm: stress-ng-appar Not tainted 5.19.0-45-generic #46-Ubuntu'
stress-ng: info: [1180] klog-check: warning: [94.128722] 'Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014'
stress-ng: info: [1180] klog-check: warning: [94.128745] 'RIP: 0010:aa_unpack+0x11f/0x530'
stress-ng: info: [1180] klog-check: warning: [94.128762] 'Code: 00 48 85 c0 0f 84 15 04 00 00 48 8d 75 a8 48 8d 7d b0 4c 8b 7d c0 e8 80 ec ff ff 48 89 c3 48 3d 00 f0 ff ff 0f 87 00 02 00 00 <4c> 8b b0 30 01 00 00 4d 85 f6 0f 84 38 01 00 00 49 8b 86 c8 00 00'
stress-ng: info: [1180] klog-check: warning: [94.128807] 'RSP: 0018:ffffb1fdc0f57ce0 EFLAGS: 00010207'
stress-ng: info: [1180] klog-check: warning: [94.129378] 'RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000'
stress-ng: info: [1180] klog-check: warning: [94.129928] 'RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000'
stress-ng: info: [1180] klog-check: warning: [94.130443] 'RBP: ffffb1fdc0f57d40 R08: 0000000000000000 R09: 0000000000000000'
stress-ng: info: [1180] klog-check: warning: [94.131056] 'R10: 0000000000000000 R11: 0000000000000000 R12: ffffb1fdc0f57da8'
stress-ng: info: [1180] klog-check: warning: [94.131572] 'R13: ffffb1fdc0f57da0 R14: ffff9da384835962 R15: ffff9da384820010'
stress-ng: info: [1180] klog-check: warning: [94.132090] 'FS: 00007fa65a059740(0000) GS:ffff9da3fbdc0000(0000) knlGS:0000000000000000'
stress-ng: info: [1180] klog-check: warning: [94.132652] 'CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033'
stress-ng: info: [1180] klog-check: warning: [94.133206] 'CR2: 0000000...
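
When comparing oops reports across kernels, as in the comments above, it can help to reduce each report to the fault address, the faulting kernel function and the kernel release. The following is a rough sketch under stated assumptions: it relies on the message wording seen in the logs above (`NULL pointer dereference, address:`, `RIP: 0010:` and `Not tainted`), and `summarize_oops` is a hypothetical helper, not something stress-ng provides.

```python
import re

# Patterns follow the wording of the oops lines quoted above. A tainted
# kernel prints a different string, so KERNEL_RE would not match there.
ADDR_RE = re.compile(r"NULL pointer dereference, address: ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: 0010:([^\s']+)")  # 0010 = kernel code segment
KERNEL_RE = re.compile(r"Not tainted (\S+)")


def summarize_oops(lines):
    """Collect the first fault address, kernel RIP and kernel release seen."""
    summary = {"address": None, "rip": None, "kernel": None}
    patterns = (("address", ADDR_RE), ("rip", RIP_RE), ("kernel", KERNEL_RE))
    for line in lines:
        for key, pattern in patterns:
            match = pattern.search(line)
            if match and summary[key] is None:
                summary[key] = match.group(1)
    return summary
```

It works on both raw dmesg lines and the quoted `klog-check` lines, since the patterns ignore the prefix and the trailing quote. The user-space `RIP: 0033:` line is deliberately not matched.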

Colin Ian King (colin-king) wrote :

5.15.0.75 works fine with no problems, while the 5.19.0-45 kernel crashes, so the issue was introduced between 5.15 and 5.19.
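
A known-good and a known-bad kernel bound the regression window, which could then be narrowed by bisection. Nothing in this report says a bisect was actually run, so the following is only a sketch of the idea: `first_bad` and the `is_bad` callback are hypothetical, and `is_bad` stands in for "boot this kernel, run the apparmor stressor with --klog-check, and see whether it oopses".

```python
def first_bad(versions, is_bad):
    """Binary search for the first version on which the bug reproduces.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once the bug appears it is present in every later version.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # bug present: first bad version is at or before mid
        else:
            good = mid  # bug absent: first bad version is after mid
    return versions[bad]
```

With the mainline releases 5.15 through 5.19 as the list, this needs at most two test boots to find the release that introduced the crash. In practice `git bisect` does the same thing at commit granularity.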

Colin Ian King (colin-king) wrote :

This also occurs in Ubuntu Mantic with 6.3.0-7-generic.

John Johansen (jjohansen) wrote :

This should be fixed by upstream commit

ec6851ae0ab4 apparmor: fix: kzalloc perms tables for shared dfas

Colin Ian King (colin-king) wrote :

Thanks JJ, much appreciated :-)

Brian Murray (brian-murray) wrote :

Ubuntu 23.04 (Lunar Lobster) has reached end of life, so this bug will not be fixed for that specific release.

Changed in apparmor (Ubuntu Lunar):
status: New → Won't Fix
Changed in linux (Ubuntu Lunar):
status: New → Won't Fix