dev test in ubuntu_stress_smoke_test hangs on some nodes with 4.15 (unable to handle kernel NULL pointer dereference)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Stress-ng | Invalid | Undecided | Unassigned |
ubuntu-kernel-tests | Fix Released | Undecided | Unassigned |
Bug Description
Issue found on 4.15.0-144-generic with node "akis"
09:24:25 DEBUG| [stdout] dccp RETURNED 0
09:24:25 DEBUG| [stdout] dccp PASSED
09:24:25 DEBUG| [stdout] dentry STARTING
09:24:27 DEBUG| [stdout] dentry RETURNED 0
09:24:27 DEBUG| [stdout] dentry PASSED
09:24:27 DEBUG| [stdout] dev STARTING
^ The test hangs here.
The dev test hangs, and dmesg shows an Oops:
May 21 09:24:29 akis kernel: [ 366.747371] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
May 21 09:24:29 akis kernel: [ 366.752640] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:29 akis kernel: [ 366.755208] IP: knem_miscdev_
May 21 09:24:29 akis kernel: [ 366.764818] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:29 akis kernel: [ 366.767987] PGD 0 P4D 0
May 21 09:24:29 akis kernel: [ 366.767989] Oops: 0000 [#1] SMP PTI
May 21 09:24:29 akis kernel: [ 366.767990] Modules linked in: cuse snd_seq snd_seq_device snd_timer snd soundcore dccp_ipv4 dccp ipx p8023 psnap atm p8022 llc algif_rng algif_aead anubis fcrypt khazad seed tea cmac md4 michael_mic poly1305_x86_64 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic tgr192 wp512 algif_hash chacha20_x86_64 chacha20_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic des3_ede_x86_64 des_generic salsa20_generic camellia_generic camellia_aesni_avx2 camellia_
May 21 09:24:29 akis kernel: [ 366.779064] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:29 akis kernel: [ 366.782008] skx_edac x86_pkg_
May 21 09:24:29 akis kernel: [ 366.858437] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:29 akis kernel: [ 366.861005] crypto_simd i2c_algo_bit fb_sys_fops mlx_compat(OE) glue_helper cryptd ptp nvme drm pps_core nvme_core uas usb_storage
May 21 09:24:30 akis kernel: [ 366.940077] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 366.951200] CPU: 12 PID: 52783 Comm: stress-ng Tainted: G OE 4.15.0-144-generic #148-Ubuntu
May 21 09:24:30 akis kernel: [ 366.951201] Hardware name: NVIDIA NVIDIA DGX-2/NVIDIA DGX-2, BIOS 0.17 10/11/2018
May 21 09:24:30 akis kernel: [ 366.951203] RIP: 0010:knem_
May 21 09:24:30 akis kernel: [ 366.951204] RSP: 0018:ffffbb0e1c
May 21 09:24:30 akis kernel: [ 366.961357] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 366.968664] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000002
May 21 09:24:30 akis kernel: [ 366.968665] RDX: 0000000000000019 RSI: ffffbb0e1c22fc30 RDI: ffff8ed103c1cd00
May 21 09:24:30 akis kernel: [ 366.968665] RBP: ffffbb0e1c22fad0 R08: ffff8ed103c1cd01 R09: 0000000000000016
May 21 09:24:30 akis kernel: [ 366.968666] R10: ffff8ed103c1cd38 R11: 0000000000000000 R12: 0000000000000000
May 21 09:24:30 akis kernel: [ 366.968666] R13: 0000000000000000 R14: ffffbb0e1c22fb3c R15: ffff8ed103c1cd00
May 21 09:24:30 akis kernel: [ 366.968667] FS: 00007f8b4fea60c
May 21 09:24:30 akis kernel: [ 366.968669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 21 09:24:30 akis kernel: [ 366.976781] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 366.981447] CR2: 0000000000000010 CR3: 000000bc44536002 CR4: 00000000007606e0
May 21 09:24:30 akis kernel: [ 366.981448] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 09:24:30 akis kernel: [ 366.981449] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 21 09:24:30 akis kernel: [ 366.981449] PKRU: 55555554
May 21 09:24:30 akis kernel: [ 366.981450] Call Trace:
May 21 09:24:30 akis kernel: [ 366.981456] do_sys_
May 21 09:24:30 akis kernel: [ 366.988574] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 366.994669] ? misc_open+
May 21 09:24:30 akis kernel: [ 366.994673] ? chrdev_
May 21 09:24:30 akis kernel: [ 367.017883] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 367.023171] ? mntput+0x24/0x40
May 21 09:24:30 akis kernel: [ 367.023173] ? terminate_
May 21 09:24:30 akis kernel: [ 367.023174] ? path_openat+
May 21 09:24:30 akis kernel: [ 367.023179] ? timerqueue_
May 21 09:24:30 akis kernel: [ 367.031643] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 367.038388] ? do_filp_
May 21 09:24:30 akis kernel: [ 367.038390] ? _copy_to_
May 21 09:24:30 akis kernel: [ 367.038393] ? cp_new_
May 21 09:24:30 akis kernel: [ 367.045982] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 367.052125] ? do_vfs_
May 21 09:24:30 akis kernel: [ 367.052127] ? SYSC_newfstat+
May 21 09:24:30 akis kernel: [ 367.052128] SyS_poll+0x9b/0x140
May 21 09:24:30 akis kernel: [ 367.052128] ? SyS_poll+0x9b/0x140
May 21 09:24:30 akis kernel: [ 367.052131] do_syscall_
May 21 09:24:30 akis kernel: [ 367.052137] entry_SYSCALL_
May 21 09:24:30 akis kernel: [ 367.060366] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:30 akis kernel: [ 367.066389] RIP: 0033:0x7f8b4e98acb9
May 21 09:24:30 akis kernel: [ 367.066389] RSP: 002b:00007ffd66
May 21 09:24:30 akis kernel: [ 367.066390] RAX: ffffffffffffffda RBX: 00007ffd6617b938 RCX: 00007f8b4e98acb9
May 21 09:24:30 akis kernel: [ 367.066391] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007ffd6617b938
May 21 09:24:30 akis kernel: [ 367.066391] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000036
May 21 09:24:30 akis kernel: [ 367.066391] R10: 00007ffd6617b8a0 R11: 0000000000000293 R12: 0000000000000000
May 21 09:24:30 akis kernel: [ 367.066394] R13: 0000000000000000 R14: 0000000000000016 R15: 00007ffd6617fe60
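The Oops above is interleaved with unrelated `sr0` "CDROM not ready" messages, which makes the trace harder to follow. The following is a minimal Python sketch, not part of the original report, that drops those lines from a saved log excerpt. The names `NOISE`, `strip_noise`, and `SAMPLE` are hypothetical; the sample lines are copied from the log above, including the truncated `IP:` line.

```python
import re
from typing import List

# Assumption: the only noise to remove is the optical-drive message seen in this log.
NOISE = re.compile(r"\[sr0\] CDROM not ready")


def strip_noise(log_text: str) -> List[str]:
    """Return the log lines with the sr0 'CDROM not ready' messages removed."""
    return [line for line in log_text.splitlines() if not NOISE.search(line)]


# A few lines copied from the report (the IP: line is truncated in the source).
SAMPLE = """\
May 21 09:24:29 akis kernel: [ 366.747371] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
May 21 09:24:29 akis kernel: [ 366.752640] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
May 21 09:24:29 akis kernel: [ 366.755208] IP: knem_miscdev_
May 21 09:24:29 akis kernel: [ 366.767989] Oops: 0000 [#1] SMP PTI
"""

if __name__ == "__main__":
    for line in strip_noise(SAMPLE):
        print(line)
```

The filter only matches the exact message text, so every other kernel line, including the truncated ones, passes through unchanged.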
Related branches
- Ian May (community): Approve
- CE Hyperscale: Pending requested
Diff: 14 lines (+3/-0), 1 file modified: late.sh (+3/-0)
description: updated
tags: added: 4.15 bionic kqa-blocker sru-20210510 ubuntu-stress-smoke-test
The crash here is in knem, a module provided by the Mellanox OFED (MOFED) stack. We have seen crashes in this module before and have not been able to reproduce them without MOFED, so this is unlikely to be a bug in the Ubuntu kernel.
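One quick way to check whether out-of-tree modules are involved is to look at the per-module taint flags in the "Modules linked in" list, where the kernel appends flags such as `(OE)` to a module name (`O` marks an externally built module, `E` an unsigned one). The Python sketch below is an illustration only and assumes the module list is available as text. The names `MODULE_FLAGS`, `out_of_tree_modules`, and `SAMPLE` are hypothetical, and the sample is the fragment of the module list that survives in the log above. Because that list is truncated, knem itself does not appear in it.

```python
import re
from typing import List

# Matches "name(FLAGS)" entries such as "mlx_compat(OE)" in a module list.
MODULE_FLAGS = re.compile(r"(\w+)\(([A-Z+-]+)\)")


def out_of_tree_modules(module_list: str) -> List[str]:
    """Return names of modules whose taint flags include 'O' (out-of-tree)."""
    return [
        name
        for name, flags in MODULE_FLAGS.findall(module_list)
        if "O" in flags
    ]


# Fragment of the module list copied from the Oops in this report.
SAMPLE = (
    "crypto_simd i2c_algo_bit fb_sys_fops mlx_compat(OE) glue_helper "
    "cryptd ptp nvme drm pps_core nvme_core uas usb_storage"
)

if __name__ == "__main__":
    print(out_of_tree_modules(SAMPLE))
```

On this fragment the only flagged module is `mlx_compat`, which is consistent with the `OE` taint reported on the `CPU:` line of the Oops.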