idxd: NULL pointer dereference reading wq op_config attribute
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Undecided | Jacob Martin |
linux (Ubuntu Jammy) | Fix Released | Medium | Jacob Martin |
linux-nvidia (Ubuntu) | Invalid | Undecided | Jacob Martin |
linux-nvidia (Ubuntu Jammy) | Fix Released | Undecided | Jacob Martin |
Bug Description
SRU Justification
[Impact]
Systems that use the Intel Data Accelerator Driver (IDXD) may see a kernel NULL pointer dereference when reading the op_config attribute of an idxd WQ, if WQs do not offer the op_config capability.
On a DGXH100 system, this can be reproduced by running:
$ cat /sys/devices/
This affects 5.15.0-112-generic, and derivative kernels based on that generic version.
[Fix]
Author: Jacob Martin <email address hidden>
Date: Tue Jun 11 11:48:32 2024 -0500
UBUNTU: SAUCE: dmaengine: idxd: set is_visible member of idxd_wq_
BugLink: ...
The backport of commit b0325aefd398 ("dmaengine: idxd: add WQ operation
cap restriction support") for K5.15 omitted a line setting the
is_visible callback of idxd_wq_
idxd_
This results in the op_config attribute being accessible from userspace
when the underlying wq->opcap_bmap pointer used to service reads from it
is uninitialized, leading to a NULL pointer dereference when the
op_config attribute is read. Resolve this by setting the is_visible
callback as the upstream commit does.
Signed-off-by: Jacob Martin <email address hidden>
This patch adds a line setting the is_visible callback of idxd_wq_
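The mechanism described in the commit message can be illustrated with a small user-space model. This is only a sketch, not the driver's code: every `mock_*` name below is hypothetical, and the model merely mimics how sysfs consults an attribute group's `is_visible` callback, assuming that a missing callback means every attribute in the group is exposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for sysfs attribute structures. */
struct mock_attr {
    const char *name;
    unsigned int mode;              /* permission bits, e.g. 0644 */
};

struct mock_wq {
    bool has_op_config_cap;         /* stand-in for the hardware capability */
};

typedef unsigned int (*mock_is_visible_fn)(const struct mock_wq *wq,
                                           const struct mock_attr *attr);

struct mock_attr_group {
    const struct mock_attr *attrs;
    size_t n_attrs;
    mock_is_visible_fn is_visible;  /* NULL: every attribute is exposed */
};

const struct mock_attr mock_wq_attrs[] = {
    { "size", 0644 },
    { "op_config", 0644 },
};

/* Hide op_config when the capability is absent; keep the default mode otherwise. */
unsigned int mock_wq_attr_visible(const struct mock_wq *wq,
                                  const struct mock_attr *attr)
{
    if (strcmp(attr->name, "op_config") == 0 && !wq->has_op_config_cap)
        return 0;
    return attr->mode;
}

/* Model of the decision made when the group's files are created. */
bool mock_attr_exposed(const struct mock_attr_group *grp,
                       const struct mock_wq *wq, const char *name)
{
    for (size_t i = 0; i < grp->n_attrs; i++) {
        const struct mock_attr *a = &grp->attrs[i];

        if (strcmp(a->name, name) != 0)
            continue;
        if (!grp->is_visible)
            return true;            /* no callback: nothing is filtered */
        return grp->is_visible(wq, a) != 0;
    }
    return false;                   /* attribute not in the group */
}

/* Group as in the faulty backport: callback left unset. */
const struct mock_attr_group mock_broken_group = {
    mock_wq_attrs, sizeof(mock_wq_attrs) / sizeof(mock_wq_attrs[0]), NULL
};

/* Group with the callback set, as the fix does. */
const struct mock_attr_group mock_fixed_group = {
    mock_wq_attrs, sizeof(mock_wq_attrs) / sizeof(mock_wq_attrs[0]),
    mock_wq_attr_visible
};

/* Checks the three cases discussed in this report. */
void mock_run_demo(void)
{
    const struct mock_wq no_cap = { false };
    const struct mock_wq with_cap = { true };

    /* Without the callback, op_config is readable even without the capability. */
    assert(mock_attr_exposed(&mock_broken_group, &no_cap, "op_config"));
    /* With the callback, op_config is hidden unless the capability exists. */
    assert(!mock_attr_exposed(&mock_fixed_group, &no_cap, "op_config"));
    assert(mock_attr_exposed(&mock_fixed_group, &with_cap, "op_config"));
}
```

In this model, calling `mock_run_demo()` from a `main()` passes silently. The first assertion corresponds to the state before the fix, where userspace can reach a show handler whose backing data was never allocated.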
[Test Case]
Verified that the patch "UBUNTU: SAUCE: dmaengine: idxd: set is_visible member of idxd_wq_
[Regression Potential]
There is a low risk of regression:
* The change is specific to systems using IDXD.
* The patch brings the Jammy backport closer in line with the upstream change.
[Other]
The Mantic 6.5 and Noble 6.8 kernels already carry the upstream version of commit b0325aefd398 ("dmaengine: idxd: add WQ operation cap restriction support"), which was introduced in v6.1. These kernels set the is_visible callback, so they are unaffected by this issue. Only Jammy K5.15 needs this fix.
-------
On a DGXH100 system, this can be reproduced by running:
$ cat /sys/devices/
$ dmesg
...
[ 236.620986] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 236.628829] #PF: supervisor read access in kernel mode
[ 236.634615] #PF: error_code(0x0000) - not-present page
[ 236.640404] PGD 1eff19067 P4D 0
[ 236.644049] Oops: 0000 [#1] SMP NOPTI
[ 236.648180] CPU: 117 PID: 8857 Comm: cat Tainted: G OE 5.15.0-112-generic #122-Ubuntu
[ 236.658361] Hardware name: NVIDIA DGXH100/DGXH100, BIOS 1.1.3 10/30/2023
[ 236.665901] RIP: 0010:op_
[ 236.672095] Code: 41 57 49 89 f7 41 56 4c 8d 72 10 41 55 49 89 d5 41 54 53 31 db 48 83 ec 18 48 89 7d c0 65 48 8b 04 25 28 00 00 00 48 89 45 d0 <48> 8b 42 18 48 89 45 c8 89 de 4c 8d 45 c8 b9 40 00 00 00 4c 89 ff
[ 236.693194] RSP: 0018:ff85e6e6b4
[ 236.699084] RAX: a91c7a5c2727bd00 RBX: 0000000000000000 RCX: 0000000000000000
[ 236.707118] RDX: 0000000000000000 RSI: ff401fb55265d000 RDI: ff4020b52b7f8040
[ 236.715146] RBP: ff85e6e6b43f3c38 R08: ff4020b52b7f8040 R09: ff401fb55265d000
[ 236.723170] R10: 000000000000000b R11: 0000000000000000 R12: ffffffffb9ad1f60
[ 236.731198] R13: 0000000000000000 R14: 0000000000000010 R15: ff401fb55265d000
[ 236.739221] FS: 00007fdae7c9974
[ 236.748328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 236.754800] CR2: 0000000000000018 CR3: 000000017be08004 CR4: 0000000000771ee0
[ 236.762836] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 236.770863] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 236.778893] PKRU: 55555554
[ 236.781954] Call Trace:
[ 236.784720] <TASK>
[ 236.787098] ? show_trace_
[ 236.792017] ? show_trace_
[ 236.796929] ? wq_op_config_
[ 236.802231] ? show_regs.
[ 236.806743] ? __die_body.
[ 236.810969] ? __die+0x2b/0x37
[ 236.814405] ? page_fault_
[ 236.819029] ? do_user_
[ 236.823937] ? page_counter_
[ 236.829143] ? exc_page_
[ 236.833569] ? asm_exc_
[ 236.838287] ? op_cap_
[ 236.843785] wq_op_config_
[ 236.848891] dev_attr_
[ 236.852925] sysfs_kf_
[ 236.857448] kernfs_
[ 236.861678] seq_read_
[ 236.865910] ? __alloc_
[ 236.870339] kernfs_
[ 236.875058] new_sync_
[ 236.879279] vfs_read+
[ 236.883022] ksys_read+0x67/0xf0
[ 236.886665] __x64_sys_
[ 236.890796] x64_sys_
[ 236.895130] do_syscall_
[ 236.899167] ? handle_
[ 236.903686] ? do_user_
[ 236.908599] ? do_syscall_
[ 236.912828] ? exit_to_
[ 236.918232] ? irqentry_
[ 236.923726] ? irqentry_
[ 236.927957] ? exc_page_
[ 236.932378] entry_SYSCALL_
[ 236.938071] RIP: 0033:0x7fdae7db07e2
[ 236.942107] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 236.963206] RSP: 002b:00007ffd26
[ 236.971720] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fdae7db07e2
[ 236.979747] RDX: 0000000000020000 RSI: 00007fdae792e000 RDI: 0000000000000003
[ 236.987780] RBP: 00007fdae792e000 R08: 00007fdae792d010 R09: 00007fdae792d010
[ 236.995812] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[ 237.003843] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 237.011876] </TASK>
[ 237.014349] Modules linked in: intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_
[ 237.014410] aesni_intel crypto_simd psample cryptd ixgbe cec tls xfrm_algo rc_core mlx_compat(OE) xhci_pci dca i2c_i801 nvme intel_pmt drm pci_hyperv_intf mdio xhci_pci_renesas i2c_smbus i2c_ismt nvme_core wmi pinctrl_emmitsburg
[ 237.134526] CR2: 0000000000000018
[ 237.138263] ---[ end trace 58ef1dd45abd6934 ]---
[ 237.461968] RIP: 0010:op_
[ 237.468149] Code: 41 57 49 89 f7 41 56 4c 8d 72 10 41 55 49 89 d5 41 54 53 31 db 48 83 ec 18 48 89 7d c0 65 48 8b 04 25 28 00 00 00 48 89 45 d0 <48> 8b 42 18 48 89 45 c8 89 de 4c 8d 45 c8 b9 40 00 00 00 4c 89 ff
[ 237.489246] RSP: 0018:ff85e6e6b4
[ 237.495128] RAX: a91c7a5c2727bd00 RBX: 0000000000000000 RCX: 0000000000000000
[ 237.503158] RDX: 0000000000000000 RSI: ff401fb55265d000 RDI: ff4020b52b7f8040
[ 237.511187] RBP: ff85e6e6b43f3c38 R08: ff4020b52b7f8040 R09: ff401fb55265d000
[ 237.519217] R10: 000000000000000b R11: 0000000000000000 R12: ffffffffb9ad1f60
[ 237.527246] R13: 0000000000000000 R14: 0000000000000010 R15: ff401fb55265d000
[ 237.535275] FS: 00007fdae7c9974
[ 237.544379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 237.550845] CR2: 0000000000000018 CR3: 000000017be08004 CR4: 0000000000771ee0
[ 237.558875] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 237.566904] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 237.574933] PKRU: 55555554
Changed in linux (Ubuntu Jammy):
assignee: nobody → Jacob Martin (jacobmartin)
Changed in linux-nvidia (Ubuntu):
assignee: nobody → Jacob Martin (jacobmartin)
Changed in linux-nvidia (Ubuntu Jammy):
assignee: nobody → Jacob Martin (jacobmartin)
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Changed in linux-nvidia (Ubuntu):
status: New → In Progress
Changed in linux-nvidia (Ubuntu Jammy):
status: New → In Progress
Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Changed in linux-nvidia (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-nvidia (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-done-jammy-linux-nvidia; removed: verification-needed-jammy-linux-nvidia
tags: added: verification-done-jammy-linux-nvidia-tegra verification-done-jammy-linux-nvidia-tegra-igx; removed: verification-needed-jammy-linux-nvidia-tegra verification-needed-jammy-linux-nvidia-tegra-igx
This bug is awaiting verification that the linux-nvidia/5.15.0-1060.61 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia' to 'verification-done-jammy-linux-nvidia'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia' to 'verification-failed-jammy-linux-nvidia'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!