read_all_sys test in ubuntu_ltp triggers "BUG: kernel NULL pointer dereference" with 5.15.0-1058-nvidia on node hidon
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
ubuntu-kernel-tests | New | Undecided | Unassigned | |
Bug Description
Issue found with Jammy 5.15.0-1058-nvidia on node hidon (DGXH100); other NVIDIA nodes are fine.
Steps:
sudo apt install -y automake bison build-essential byacc flex git keyutils libacl1-dev libaio-dev libcap-dev libmm-dev libnuma-dev libsctp-dev libselinux1-dev libssl-dev libtirpc-dev pkg-config quota xfslibs-dev xfsprogs
git clone https:/
cd ltp
git reset HEAD 998df1a5aa5026c
make autotools
./configure
make
sudo make install
# Start watching dmesg output here
sudo /opt/ltp/
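One possible way to watch the kernel log during the run is sketched below. It assumes a second terminal and the util-linux `dmesg --follow` option; the `oops_section` helper and the `dmesg-full.log` file name are hypothetical and only illustrate trimming the log down to the oops itself.

```shell
#!/bin/sh
# Sketch: print only the oops section (from the "BUG:" line through the
# "end trace" marker) out of kernel log text read on stdin.
# oops_section is a hypothetical helper, not part of LTP or dmesg.
oops_section() {
    awk '/BUG: kernel NULL pointer dereference/ { show = 1 }
         show { print }
         /---\[ end trace/ { show = 0 }'
}

# Assumed usage in a second terminal, started before the test run:
#   sudo dmesg --follow | tee dmesg-full.log | oops_section
```

The full log is kept in the file in case lines outside the oops turn out to matter.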
dmesg output:
[ 206.893706] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 206.901552] #PF: supervisor read access in kernel mode
[ 206.907341] #PF: error_code(0x0000) - not-present page
[ 206.913128] PGD 1660ef067 P4D 0
[ 206.916775] Oops: 0000 [#1] SMP NOPTI
[ 206.920909] CPU: 31 PID: 4238 Comm: read_all Tainted: G OE 5.15.0-1058-nvidia #59-Ubuntu
[ 206.931379] Hardware name: NVIDIA DGXH100/DGXH100, BIOS 1.1.3 10/30/2023
[ 206.938925] RIP: 0010:op_
[ 206.945114] Code: 41 57 49 89 f7 41 56 4c 8d 72 10 41 55 49 89 d5 41 54 53 31 db 48 83 ec 18 48 89 7d c0 65 48 8b 04 25 28 00 00 00 48 89 45 d0 <48> 8b 42 18 48 89 45 c8 89 de 4c 8d 45 c8 b9 40 00 00 00 4c 89 ff
[ 206.966209] RSP: 0018:ff3645a473
[ 206.972099] RAX: bf9049f9dfedf700 RBX: 0000000000000000 RCX: 0000000000000000
[ 206.980130] RDX: 0000000000000000 RSI: ff3332a81111e000 RDI: ff3333a809ae6040
[ 206.988158] RBP: ff3645a473cbfc90 R08: ff3333a809ae6040 R09: ff3332a81111e000
[ 206.996183] R10: 000000000000000b R11: 0000000000000000 R12: ffffffffa3cd1fe0
[ 207.004215] R13: 0000000000000000 R14: 0000000000000010 R15: ff3332a81111e000
[ 207.012248] FS: 00007fb56afe374
[ 207.021356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 207.027826] CR2: 0000000000000018 CR3: 000000011086a006 CR4: 0000000000771ee0
[ 207.035858] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 207.043890] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 207.051921] PKRU: 55555554
[ 207.054978] Call Trace:
[ 207.057748] <TASK>
[ 207.060125] ? show_trace_
[ 207.065043] ? show_trace_
[ 207.069952] ? wq_op_config_
[ 207.075253] ? show_regs.
[ 207.079774] ? __die_body.
[ 207.084004] ? __die+0x2b/0x37
[ 207.087454] ? page_fault_
[ 207.092085] ? memcg_slab_
[ 207.097783] ? __wake_up+0x13/0x20
[ 207.101632] ? do_user_
[ 207.106543] ? exc_page_
[ 207.110977] ? asm_exc_
[ 207.115700] ? op_cap_
[ 207.121197] wq_op_config_
[ 207.126303] dev_attr_
[ 207.130333] sysfs_kf_
[ 207.134857] kernfs_
[ 207.139087] seq_read_
[ 207.143321] kernfs_
[ 207.148035] new_sync_
[ 207.152269] vfs_read+
[ 207.156008] ksys_read+0x67/0xf0
[ 207.159653] __x64_sys_
[ 207.163781] x64_sys_
[ 207.168110] do_syscall_
[ 207.172142] ? exit_to_
[ 207.177545] ? syscall_
[ 207.182941] ? x64_sys_
[ 207.187449] ? do_syscall_
[ 207.191676] ? exit_to_
[ 207.197077] ? syscall_
[ 207.202477] ? x64_sys_
[ 207.206998] ? do_syscall_
[ 207.211226] ? do_syscall_
[ 207.215455] entry_SYSCALL_
[ 207.221142] RIP: 0033:0x7fb56b0fa7e2
[ 207.225176] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 207.246269] RSP: 002b:00007ffdb8
[ 207.254787] RAX: ffffffffffffffda RBX: 00007fb56afab028 RCX: 00007fb56b0fa7e2
[ 207.262819] RDX: 00000000000003ff RSI: 00007ffdb852f570 RDI: 0000000000000003
[ 207.270850] RBP: 0000563101392168 R08: 0000000000000000 R09: 00007ffdb852ec30
[ 207.278881] R10: 00007ffdb85ca170 R11: 0000000000000246 R12: 000056310137f012
[ 207.286912] R13: 000056310137f06f R14: 000056310222bab0 R15: 00007fb56afa7000
[ 207.294944] </TASK>
[ 207.297412] Modules linked in: nvidia_uvm(O) nvidia_drm(O) nvidia_modeset(O) intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_
[ 207.297467] sha256_ssse3 mlxfw(OE) sha1_ssse3 sysimgblt psample fb_sys_fops aesni_intel cec tls ixgbe crypto_simd nvme xhci_pci cryptd i2c_i801 rc_core mlx_compat(OE) xfrm_algo dca intel_pmt i2c_ismt i2c_smbus pci_hyperv_intf drm xhci_pci_renesas mdio nvme_core wmi pinctrl_emmitsburg
[ 207.423040] CR2: 0000000000000018
[ 207.426783] ---[ end trace 7e35f51fec2ac5d9 ]---
5.15.0-1054-nvidia looks OK on this system.
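As a cross-check of the trace, the faulting instruction can be decoded by hand. In the kernel `Code:` line the byte in angle brackets is the first byte of the instruction at RIP, so the faulting bytes here start with `48 8b 42 18`. The sketch below decodes only that one encoding (REX.W, opcode `8b`, ModRM with an 8-bit displacement) and is not a general disassembler; the function names are made up for illustration. The kernel tree's `scripts/decodecode` handles the general case.

```shell
#!/bin/sh
# Excerpt of the "Code:" line from the oops above (bytes around the marker).
code_line='Code: 48 89 45 d0 <48> 8b 42 18 48 89 45 c8'

# Print the bytes from the marked one onward, without the angle brackets.
faulting_bytes() {
    printf '%s\n' "$1" | sed -n 's/.*<\([0-9a-f][0-9a-f]\)>\(.*\)/\1\2/p'
}

# Map a 3-bit register number to its 64-bit name (no REX.R/REX.B extension).
reg_name() {
    case "$1" in
        0) echo rax ;; 1) echo rcx ;; 2) echo rdx ;; 3) echo rbx ;;
        4) echo rsp ;; 5) echo rbp ;; 6) echo rsi ;; 7) echo rdi ;;
    esac
}

# $1..$4: REX prefix, opcode, ModRM, disp8 (hex strings without "0x").
# Supports only: REX.W (48) + 8b, mod=01, no SIB byte, positive disp8.
decode_mov_disp8() {
    modrm=$((0x$3))
    mod=$(( modrm >> 6 ))
    reg=$(( (modrm >> 3) & 7 ))
    rm=$(( modrm & 7 ))
    if [ "$1" != 48 ] || [ "$2" != 8b ] || [ "$mod" -ne 1 ] \
        || [ "$rm" -eq 4 ] || [ $((0x$4)) -ge 128 ]; then
        echo "unsupported encoding" >&2
        return 1
    fi
    # AT&T syntax, matching the style of kernel disassembly tools.
    printf 'mov 0x%s(%%%s),%%%s\n' "$4" "$(reg_name "$rm")" "$(reg_name "$reg")"
}

set -- $(faulting_bytes "$code_line")
decode_mov_disp8 "$1" "$2" "$3" "$4"   # prints: mov 0x18(%rdx),%rax
```

With RDX shown as 0000000000000000 in the register dump, a load from `0x18(%rdx)` matches the reported CR2 of 0000000000000018, which is consistent with a NULL structure pointer being read at offset 0x18. Which structure that is cannot be determined from the truncated symbol names in the trace.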