Comment 6 for bug 2030978

Marietto (marietto2008) wrote :

I upgraded from Ubuntu 23.04 to 23.10 and I am now hitting the bug that you said should have been fixed.

On Ubuntu 23.10 I'm running kernel 6.5.0-10-generic with the NVIDIA driver version 535.129.03. (My GPU is an NVIDIA RTX 2080 Ti; my CPU is an Intel i9.)

It is not exactly the same bug, because the error in the Ubuntu bug report is for a different kernel module, but the underlying cause is probably the same.

Ubuntu is probably at fault here. Possibly the code for the nvidia-uvm module was written for kernel versions < 6.5. When Ubuntu moved to Linux kernel 6.5, changes to UBSAN in that kernel broke some modules, so modules such as nvidia-uvm now need patches to be compatible with it. Either NVIDIA has not yet provided a version of nvidia-uvm that is compatible with Linux 6.5, or Ubuntu has not applied an updated version from NVIDIA that is.
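For anyone who wants to check whether their kernel falls in the range I suspect, here is a minimal Python sketch. The 6.5 threshold is only my guess from the paragraph above, not a confirmed boundary, and the function name `kernel_at_least` is made up for illustration.

```python
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a release string like '6.5.0-10-generic' is >= major.minor.

    Only the leading 'major.minor' part of the string is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


# The release string below is the one from this report (output of `uname -r`).
# The 6.5 threshold is an assumption, not something NVIDIA or Ubuntu confirmed.
print(kernel_at_least("6.5.0-10-generic", 6, 5))
```

To test another machine, replace the string with the output of `uname -r` there.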

Who is at fault, NVIDIA or the Ubuntu developers? I didn't see this error on Ubuntu 23.04, maybe because it does not use kernel 6.5 by default, whereas 23.10 does.

I see a lot of these errors when I run "dmesg", and audio/video streams don't play at all.

Log:

[ 15.029102] UBSAN: array-index-out-of-bounds in /var/lib/dkms/nvidia/535.129.03/build/nvidia-uvm/uvm_pmm_gpu.c:829:45

[ 15.031655] index 0 is out of range for type 'uvm_gpu_chunk_t *[*]'
[ 15.034248] CPU: 9 PID: 2571 Comm: ffdetect Tainted: P OE 6.5.0-10-generic #10-Ubuntu
[ 15.034249] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO/Z390 AORUS PRO-CF, BIOS F12g GA9 06/08/2020
[ 15.034250] Call Trace:
[ 15.034251] <TASK>
[ 15.034251] dump_stack_lvl+0x48/0x70
[ 15.034255] dump_stack+0x10/0x20
[ 15.034257] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 15.034259] merge_gpu_chunk+0x57/0x1d0 [nvidia_uvm]
[ 15.034293] free_chunk_with_merges+0x13d/0x180 [nvidia_uvm]
[ 15.034325] free_chunk+0xa4/0xd0 [nvidia_uvm]
[ 15.034355] uvm_pmm_gpu_free+0xbf/0xf0 [nvidia_uvm]
[ 15.034386] phys_mem_deallocate+0x33/0xd0 [nvidia_uvm]
[ 15.034422] uvm_page_tree_put_ptes_async+0x4d5/0x580 [nvidia_uvm]
[ 15.034459] uvm_page_table_range_vec_deinit+0x3e/0xd0 [nvidia_uvm]
[ 15.034494] uvm_va_range_destroy+0x14d/0x590 [nvidia_uvm]
[ 15.034527] ? os_release_spinlock+0x1a/0x30 [nvidia]
[ 15.034792] ? uvm_kvfree+0x30/0x70 [nvidia_uvm]
[ 15.034826] destroy_va_ranges.part.0+0x61/0x90 [nvidia_uvm]
[ 15.034857] uvm_user_channel_detach+0x9e/0xe0 [nvidia_uvm]
[ 15.034886] uvm_api_unregister_channel+0xee/0x1a0 [nvidia_uvm]
[ 15.034915] uvm_ioctl+0x1a04/0x1cd0 [nvidia_uvm]
[ 15.034939] ? uvm_api_unregister_channel+0x134/0x1a0 [nvidia_uvm]
[ 15.034968] ? _copy_to_user+0x25/0x70
[ 15.034970] ? uvm_ioctl+0x5cc/0x1cd0 [nvidia_uvm]
[ 15.034994] ? _raw_spin_lock_irqsave+0xe/0x20
[ 15.034996] ? thread_context_non_interrupt_add+0x13a/0x2c0 [nvidia_uvm]
[ 15.035031] uvm_unlocked_ioctl_entry.part.0+0x7b/0xf0 [nvidia_uvm]
[ 15.035055] ? uvm_thread_context_remove+0x39/0x50 [nvidia_uvm]
[ 15.035091] uvm_unlocked_ioctl_entry+0x6b/0x90 [nvidia_uvm]
[ 15.035115] __x64_sys_ioctl+0xa0/0xf0
[ 15.035116] do_syscall_64+0x59/0x90
[ 15.035118] ? __rseq_handle_notify_resume+0x37/0x70
[ 15.035119] ? exit_to_user_mode_loop+0xe0/0x130
[ 15.035122] ? exit_to_user_mode_prepare+0x9b/0xb0
[ 15.035123] ? syscall_exit_to_user_mode+0x37/0x60
[ 15.035125] ? do_syscall_64+0x68/0x90
[ 15.035126] ? syscall_exit_to_user_mode+0x37/0x60
[ 15.035128] ? do_syscall_64+0x68/0x90
[ 15.035129] ? syscall_exit_to_user_mode+0x37/0x60
[ 15.035130] ? do_syscall_64+0x68/0x90
[ 15.035131] ? do_syscall_64+0x68/0x90
[ 15.035133] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 15.035134] RIP: 0033:0x7f9b9e7238ef
[ 15.035144] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 15.035145] RSP: 002b:00007fff0ed9b9a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 15.035147] RAX: ffffffffffffffda RBX: 000000000239f8b8 RCX: 00007f9b9e7238ef
[ 15.035147] RDX: 00007fff0ed9ba10 RSI: 000000000000001c RDI: 0000000000000004
[ 15.035148] RBP: 00007fff0ed9ba50 R08: 000000000000242a R09: 0000000000000007
[ 15.035149] R10: 000000000242a3c0 R11: 0000000000000246 R12: 00007fff0ed9ba10
[ 15.035150] R13: 0000000000000004 R14: 0000000002506600 R15: 000000000237c138
[ 15.035151] </TASK>
[ 15.035152] ================================================================================
[ 15.037818] ================================================================================
[ 15.040413] UBSAN: array-index-out-of-bounds in /var/lib/dkms/nvidia/535.129.03/build/nvidia-uvm/uvm_pmm_gpu.c:857:39

[ 15.043033] index 0 is out of range for type 'uvm_gpu_chunk_t *[*]'
[ 15.045636] CPU: 9 PID: 2571 Comm: ffdetect Tainted: P OE 6.5.0-10-generic #10-Ubuntu
[ 15.045639] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO/Z390 AORUS PRO-CF, BIOS F12g GA9 06/08/2020
[ 15.045640] Call Trace:
[ 15.045641] <TASK>
[ 15.045642] dump_stack_lvl+0x48/0x70
[ 15.045647] dump_stack+0x10/0x20
[ 15.045649] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 15.045652] merge_gpu_chunk+0xc6/0x1d0 [nvidia_uvm]
[ 15.045702] free_chunk_with_merges+0x13d/0x180 [nvidia_uvm]
[ 15.045734] free_chunk+0xa4/0xd0 [nvidia_uvm]
[ 15.045765] uvm_pmm_gpu_free+0xbf/0xf0 [nvidia_uvm]
[ 15.045795] phys_mem_deallocate+0x33/0xd0 [nvidia_uvm]
[ 15.045831] uvm_page_tree_put_ptes_async+0x4d5/0x580 [nvidia_uvm]
[ 15.045868] uvm_page_table_range_vec_deinit+0x3e/0xd0 [nvidia_uvm]
[ 15.045904] uvm_va_range_destroy+0x14d/0x590 [nvidia_uvm]
[ 15.045936] ? os_release_spinlock+0x1a/0x30 [nvidia]
[ 15.046201] ? uvm_kvfree+0x30/0x70 [nvidia_uvm]
[ 15.046236] destroy_va_ranges.part.0+0x61/0x90 [nvidia_uvm]
[ 15.046277] uvm_user_channel_detach+0x9e/0xe0 [nvidia_uvm]
[ 15.046315] uvm_api_unregister_channel+0xee/0x1a0 [nvidia_uvm]
[ 15.046354] uvm_ioctl+0x1a04/0x1cd0 [nvidia_uvm]
[ 15.046388] ? uvm_api_unregister_channel+0x134/0x1a0 [nvidia_uvm]
[ 15.046427] ? _copy_to_user+0x25/0x70
[ 15.046429] ? uvm_ioctl+0x5cc/0x1cd0 [nvidia_uvm]
[ 15.046463] ? _raw_spin_lock_irqsave+0xe/0x20
[ 15.046466] ? thread_context_non_interrupt_add+0x13a/0x2c0 [nvidia_uvm]
[ 15.046511] uvm_unlocked_ioctl_entry.part.0+0x7b/0xf0 [nvidia_uvm]
[ 15.046544] ? uvm_thread_context_remove+0x39/0x50 [nvidia_uvm]
[ 15.046589] uvm_unlocked_ioctl_entry+0x6b/0x90 [nvidia_uvm]
[ 15.046622] __x64_sys_ioctl+0xa0/0xf0
[ 15.046625] do_syscall_64+0x59/0x90
[ 15.046627] ? __rseq_handle_notify_resume+0x37/0x70
[ 15.046629] ? exit_to_user_mode_loop+0xe0/0x130
[ 15.046632] ? exit_to_user_mode_prepare+0x9b/0xb0
[ 15.046634] ? syscall_exit_to_user_mode+0x37/0x60
[ 15.046636] ? do_syscall_64+0x68/0x90
[ 15.046638] ? syscall_exit_to_user_mode+0x37/0x60
[ 15.046639] ? do_syscall_64+0x68/0x90
[ 15.046641] ? syscall_exit_to_user_mode+0x37/0x60
[ 15.046642] ? do_syscall_64+0x68/0x90
[ 15.046644] ? do_syscall_64+0x68/0x90
[ 15.046645] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 15.046648] RIP: 0033:0x7f9b9e7238ef
[ 15.046668] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 15.046669] RSP: 002b:00007fff0ed9b9a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 15.046671] RAX: ffffffffffffffda RBX: 000000000239f8b8 RCX: 00007f9b9e7238ef
[ 15.046672] RDX: 00007fff0ed9ba10 RSI: 000000000000001c RDI: 0000000000000004
[ 15.046673] RBP: 00007fff0ed9ba50 R08: 000000000000242a R09: 0000000000000007
[ 15.046674] R10: 000000000242a3c0 R11: 0000000000000246 R12: 00007fff0ed9ba10
[ 15.046675] R13: 0000000000000004 R14: 0000000002506600 R15: 000000000237c138
[ 15.046677] </TASK>
[ 15.046678] ================================================================================
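Since the log is full of these reports, a small script helps to see how many distinct source locations are involved. This is a minimal Python sketch, assuming every report starts with a line of the form "UBSAN: <kind> in <file>:<line>:<col>" as in the log above. The names `UBSAN_RE` and `summarise_ubsan` are made up for illustration, and the sample text is just three lines copied from my log.

```python
import re
from collections import Counter

# Matches the first line of a UBSAN report as it appears in the log above.
UBSAN_RE = re.compile(r"UBSAN: (?P<kind>[\w-]+) in (?P<where>\S+)")


def summarise_ubsan(dmesg_text: str) -> Counter:
    """Count UBSAN reports per (kind, source location) in dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = UBSAN_RE.search(line)
        if match:
            counts[(match.group("kind"), match.group("where"))] += 1
    return counts


# Three lines copied from the log above, used as sample input.
sample = """\
[ 15.029102] UBSAN: array-index-out-of-bounds in /var/lib/dkms/nvidia/535.129.03/build/nvidia-uvm/uvm_pmm_gpu.c:829:45
[ 15.031655] index 0 is out of range for type 'uvm_gpu_chunk_t *[*]'
[ 15.040413] UBSAN: array-index-out-of-bounds in /var/lib/dkms/nvidia/535.129.03/build/nvidia-uvm/uvm_pmm_gpu.c:857:39
"""

for (kind, where), count in summarise_ubsan(sample).items():
    print(f"{count}x {kind} at {where}")
```

To run it on a real system, save the output of "dmesg" to a file and pass the file contents to `summarise_ubsan` instead of `sample`.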

So who is at fault, NVIDIA or the Ubuntu developers? Everything worked well on Ubuntu 23.04, and upgrading to Ubuntu 23.10 was a mistake. The log file is full of these errors and I can't open any audio/video stream. Fixing this ugly bug is going to take me a lot of time.