kernel oops in pick_next_task_fair in 6.8.1-1002-realtime kernel

Bug #2068615 reported by Colin Ian King
This bug affects 1 person
Affects          Status  Importance  Assigned to  Milestone
ubuntu-realtime  New     Medium      Unassigned

Bug Description

Ubuntu Noble, Real Time kernel:

cking@noble-amd64-efi:~$ uname -a
Linux noble-amd64-efi 6.8.1-1002-realtime #2-Ubuntu SMP PREEMPT_RT Tue May 21 21:13:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

How to reproduce the issue:

git clone https://github.com/ColinIanKing/stress-ng
cd stress-ng
make clean; make -j 8

sudo ./stress-ng --seq 16 -t 20 --metrics --vmstat 5 -x apparmor,prio-inv --progress --pathological

Note that this oopsed while executing the schedpolicy stressor. Running that stressor on its own does not seem to trigger the oops, so the stressors that run before schedpolicy may be a factor.

...
stress-ng: info: [8315] starting schedpolicy, 224 of 330 (67.88%), 16 instances, finish at 13:11:48 2024-06-06
stress-ng: info: [8318] vmstat: 16 0 70076 3422692 158820 167256 4 0 17 0 65858 266843 34 57 9 0 0
[ 4943.415061] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 4943.415069] #PF: supervisor read access in kernel mode
[ 4943.415071] #PF: error_code(0x0000) - not-present page
[ 4943.415072] PGD 0 P4D 0
[ 4943.415075] Oops: 0000 [#1] PREEMPT_RT SMP PTI
[ 4943.415079] CPU: 4 PID: 690936 Comm: kworker/u33:4 Not tainted 6.8.1-1002-realtime #2-Ubuntu
[ 4943.415083] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.02-2 03/11/2024
[ 4943.415085] Workqueue: 0x0 (ttm)
[ 4943.415158] RIP: 0010:pick_next_task_fair+0x91/0x630
[ 4943.415165] Code: 93 00 00 00 49 81 bd b0 02 00 00 00 d3 8e bd 75 62 4d 89 fe eb 27 4c 89 f7 e8 3b bf ff ff 84 c0 75 41 4c 89 f7 e8 8f 3e ff ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 84 fe 00 00 00 49 8b 46
[ 4943.415168] RSP: 0000:ffffa2ccc8fcfd38 EFLAGS: 00010046
[ 4943.415172] RAX: 0000000000000000 RBX: ffffa2ccc8fcfe10 RCX: 0000000000000000
[ 4943.415173] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 4943.415175] RBP: ffffa2ccc8fcfd78 R08: 0000000000000000 R09: 0000000000000000
[ 4943.415177] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9395fbc35300
[ 4943.415178] R13: ffff93958e0c5200 R14: ffff9395fbc35400 R15: ffff9395fbc35400
[ 4943.415181] FS: 0000000000000000(0000) GS:ffff9395fbc00000(0000) knlGS:0000000000000000
[ 4943.415183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4943.415185] CR2: 00000000000000a0 CR3: 000000014b696006 CR4: 0000000000370ef0
[ 4943.415192] Call Trace:
[ 4943.415194] <TASK>
[ 4943.415196] ? show_regs+0x6d/0x80
[ 4943.415201] ? __die+0x24/0x80
[ 4943.415204] ? page_fault_oops+0x99/0x1c0
[ 4943.415210] ? do_user_addr_fault+0x2ed/0x6b0
[ 4943.415214] ? exc_page_fault+0x83/0x1b0
[ 4943.415218] ? asm_exc_page_fault+0x27/0x30
[ 4943.415225] ? pick_next_task_fair+0x91/0x630
[ 4943.415230] ? pick_next_task_fair+0x91/0x630
[ 4943.415235] pick_next_task+0x5f/0xce0
[ 4943.415242] __schedule+0x124/0x6f0
[ 4943.415246] ? trace_preempt_off+0x1a/0x70
[ 4943.415250] ? process_one_work+0x1a1/0x350
[ 4943.415256] schedule+0x38/0x120
[ 4943.415260] worker_thread+0x1d6/0x440
[ 4943.415265] ? __pfx_worker_thread+0x10/0x10
[ 4943.415269] kthread+0xfe/0x130
[ 4943.415273] ? __pfx_kthread+0x10/0x10
[ 4943.415276] ret_from_fork+0x44/0x70
[ 4943.415280] ? __pfx_kthread+0x10/0x10
[ 4943.415283] ret_from_fork_asm+0x1b/0x30
[ 4943.415290] </TASK>
[ 4943.415291] Modules linked in: snd_seq_dummy tls cuse uhid snd_seq snd_seq_device userio hci_vhci bluetooth ecdh_generic ecc vfio_iommu_type1 vfio iommufd nvram vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap dccp_ipv4 dccp atm pcbc lrw chacha_generic chacha_x86_64 libchacha xxhash_generic xcbc wp512 vmac sm3_generic sm3_avx_x86_64 sm3 poly1305_generic poly1305_x86_64 nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 libpoly1305 michael_mic md4 streebog_generic rmd160 cmac algif_rng twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic fcrypt cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 blowfish_generic blowfish_x86_64 blowfish_common algif_skcipher algif_hash aria_aesni_avx2_x86_64 aria_aesni_avx_x86_64 aria_generic sm4_generic sm4_aesni_avx2_x86_64 sm4_aesni_avx_x86_64 sm4 ccm
[ 4943.415418] des3_ede_x86_64 des_generic libdes authenc aegis128 aegis128_aesni algif_aead af_alg qrtr cfg80211 binfmt_misc intel_rapl_msr intel_rapl_common intel_pmc_core intel_vsec pmt_telemetry pmt_class nls_iso8859_1 kvm_intel kvm irqbypass rapl snd_hda_codec_generic i2c_i801 i2c_smbus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec lpc_ich snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore joydev qxl drm_ttm_helper ttm input_leds mac_hid serio_raw dm_multipath msr efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 xhci_pci psmouse xhci_pci_renesas ahci virtio_rng libahci aesni_intel crypto_simd cryptd
[ 4943.415498] CR2: 00000000000000a0
[ 4943.415501] ---[ end trace 0000000000000000 ]---
[ 4943.450926] hrtimer: interrupt took 5190862 ns
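
For anyone triaging the trace above, the following is a minimal sketch (not part of the original report) that decodes the instruction marked with `<4c>` in the `Code:` line and checks it against the logged registers. It handles only the single encoding seen here (REX.W `8b /r` with a 32-bit displacement and no SIB byte); the helper names `reg_name` and `decode_faulting_load` are hypothetical. The kernel tree's `scripts/decodecode` is the usual tool for a full disassembly.

```python
# Values copied from the oops log above; nothing else is assumed about the kernel.
CODE = (
    "93 00 00 00 49 81 bd b0 02 00 00 00 d3 8e bd 75 62 4d 89 fe eb 27 "
    "4c 89 f7 e8 3b bf ff ff 84 c0 75 41 4c 89 f7 e8 8f 3e ff ff "
    "<4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 84 fe 00 00 00 49 8b 46"
)
RAX = 0x0000000000000000
CR2 = 0x00000000000000A0

LEGACY_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def reg_name(index):
    """Name a 64-bit general-purpose register from its 4-bit encoding."""
    return LEGACY_REGS[index] if index < 8 else f"r{index}"


def decode_faulting_load(code):
    """Decode the marked instruction, assuming the one form seen in this oops."""
    tokens = code.split()
    # The kernel wraps the first byte of the faulting instruction in <>.
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    insn = [int(tok.strip("<>"), 16) for tok in tokens[start:start + 7]]
    rex, opcode, modrm = insn[:3]
    if (rex & 0xF8) != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W-prefixed 'mov r64, r/m64'")
    mod, rm = modrm >> 6, modrm & 7
    if mod != 2 or rm == 4:
        raise ValueError("only the disp32, no-SIB addressing form is handled")
    dest = (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7)  # REX.R extends reg
    base = ((rex & 1) << 3) | rm                         # REX.B extends r/m
    disp = int.from_bytes(bytes(insn[3:7]), "little", signed=True)
    return reg_name(dest), reg_name(base), disp


dest, base, disp = decode_faulting_load(CODE)
fault_address = RAX + disp  # base register is rax in this trace
print(f"mov 0x{disp:x}(%{base}),%{dest}")
print(f"fault address 0x{fault_address:x} matches CR2: {fault_address == CR2}")
```

The decoded load reads from offset 0xa0 of the pointer in RAX, and RAX is 0 in the register dump, which is consistent with the reported fault address of 0xa0. The five bytes just before the marked instruction begin with `e8`, a relative call, so RAX is plausibly the NULL return value of that call; which function that is cannot be determined from the log alone.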

Changed in linux (Ubuntu):
importance: Undecided → Medium
affects: linux (Ubuntu) → ubuntu-realtime
Kevin Becker (kevinbecker) wrote :

Thanks for this one too, Colin. I'll see if I can reproduce this as well. Can you provide more information on the system where you got this kernel oops? Was it a VM? Can you give the VM's configuration?

Kevin Becker (kevinbecker) wrote :

I have not been able to reproduce this one yet on bare metal, in LXD VMs (multipass), or in QEMU VMs, on either amd64 or arm64. I'll keep working on this issue to try to reproduce it.

Colin Ian King (colin-king) wrote :

Any progress?

Kevin Becker (kevinbecker) wrote :

I haven't reproduced this one yet, but I will retry on the latest kernel soon.
