nvidia-driver-545 (and version 550) randomly hangs with compositing manager on VT change

Bug #2058824 reported by Mikko Rantalainen
This bug affects 1 person
Affects: nvidia-graphics-drivers-545 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This issue doesn't happen with NVIDIA driver version 525 or 535, but it does happen with 545 and 550.

Steps to reproduce:

- Run a window manager or desktop environment without a compositing manager. For example, in XFCE open Settings – Window Manager Tweaks – Compositor and disable "Enable display compositing". (The problem also occurs with the XFCE compositor, but recovering from the hang is harder with it.)

- The problem seems to be some kind of race condition because it "only" happens about 70% of the time for me. I'm using the package "linux-tools-lowlatency-hwe-22.04" to get a kernel with PREEMPT_DYNAMIC, which may make the race condition easier to trigger.

- Run picom as follows to make sure it has no config of any kind that might avoid the problem:

    picom --config /dev/null --show-all-xerrors --log-level=TRACE & sleep 20; killall picom

  Note that this will emit a lot of log messages and picom will be killed after 20 seconds (leaving an X session without a compositor, which should be usable if not pretty).

- Switch to different virtual terminals with CTRL+ALT+F1, CTRL+ALT+F2, ..., CTRL+ALT+F7 and wait for the display to refresh on each terminal.

- Assuming your initial virtual terminal was VT 7, as usual for a GUI desktop, you should now see a fully black screen with only your usual mouse cursor. Wait for the sleep timer above (20 seconds) to expire and kill picom, which restores your screen. If you had compositing enabled in the XFCE window manager, you would see the same issue, but recovering is much harder: the XFCE window manager implements compositing internally, so when compositing hangs you have to kill the window manager itself to get rid of it!

I'm assuming this also happens with other software that triggers the racy code path during a VT switch, but the above is the best way I've found to reproduce the issue at will. The problem also happens pretty often when rapidly changing to VT 1 and back to VT 7, which may be a faster way to reproduce it.
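The VT 1 / VT 7 round trip could be scripted roughly as below. This is only a sketch: it assumes that `chvt` (from the kbd package) exercises the same driver path as the CTRL+ALT+Fn shortcuts, which this report has not verified. The function name `vt_cycle_plan` and its arguments are made up for illustration. The function only prints the commands, so the plan can be inspected before it is run as root.

```shell
# Print the commands for switching away from the GUI terminal and back.
#   $1: VT that hosts the X session (7 in this report)
#   $2: number of round trips
#   $3: seconds to wait after each switch (default 1)
# The body runs in a subshell so its variables do not leak to the caller.
vt_cycle_plan() (
    gui_vt=$1
    rounds=$2
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo "chvt 1"
        echo "sleep $delay"
        echo "chvt $gui_vt"
        echo "sleep $delay"
        i=$((i + 1))
    done
)

# To actually run the plan (needs root, and assumes chvt is installed):
#   vt_cycle_plan 7 5 | sudo sh
```

Start picom with the command from the steps above first, then run the plan within the 20 second window.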

When picom hangs, the last line it outputs to stderr is as follows:

[ 2024-03-23 22:24:17.069 draw_callback_impl TRACE ] Render start, frame 416

Normally this should be followed by

[ 2024-03-23 22:24:17.069 draw_callback_impl TRACE ] Render end

within the same millisecond or one millisecond later, but when the NVIDIA driver hangs on VT switch, this never happens.
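To check a captured log for this pattern automatically, something like the following might work. It is a sketch that assumes the log format shown above and that picom's stderr was redirected to a file; the function name `last_render_unfinished` and the file name `picom.log` are hypothetical. A log captured while a frame is legitimately in progress would also match, so treat a hit as a hint rather than proof.

```shell
# Read a picom TRACE log on stdin. If the last "Render start" line was never
# followed by a "Render end" line, print that line and return 0 (looks hung).
# Otherwise print nothing and return 1.
last_render_unfinished() {
    awk '
        /draw_callback_impl TRACE \] Render start/ { pending = 1; line = $0 }
        /draw_callback_impl TRACE \] Render end/   { pending = 0 }
        END {
            if (pending) { print line; exit 0 }
            exit 1
        }
    '
}

# Example capture and check (picom.log is a hypothetical file name):
#   picom --config /dev/null --show-all-xerrors --log-level=TRACE 2> picom.log & sleep 20; killall picom
#   last_render_unfinished < picom.log && echo "render never finished"
```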

As noted above, this bug doesn't occur with NVIDIA driver version 535, so it was introduced somewhere between versions 535 and 545. I cannot debug this further because I don't have the source code.
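For anyone comparing machines, the branch observations in this report can be encoded as a small helper. This is a sketch based only on the four branches mentioned here (525, 535, 545, 550); the function name `classify_driver_branch` is made up, and reading the loaded version from `/sys/module/nvidia/version` is an assumption about where the driver exposes it.

```shell
# Classify a driver version string by its major branch, using only the
# observations from this report: 525 and 535 work, 545 and 550 hang.
classify_driver_branch() {
    case "${1%%.*}" in
        525|535) echo "not affected per this report" ;;
        545|550) echo "affected per this report" ;;
        *)       echo "not covered by this report" ;;
    esac
}

# Usage on a machine with the driver loaded (the path is an assumption):
#   classify_driver_branch "$(cat /sys/module/nvidia/version)"
```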

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: nvidia-driver-545 (not installed)
ProcVersionSignature: Ubuntu 6.5.0-26.26.1~22.04.1-lowlatency 6.5.13
Uname: Linux 6.5.0-26-lowlatency x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: XFCE
Date: Sat Mar 23 22:08:21 2024
EcryptfsInUse: Yes
InstallationDate: Installed on 2019-01-05 (1904 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
SourcePackage: nvidia-graphics-drivers-545
UpgradeStatus: Upgraded to jammy on 2023-08-11 (225 days ago)
modified.conffile..etc.init.d.apport: [modified]
mtime.conffile..etc.init.d.apport: 2022-05-19T12:50:20.029158

Mikko Rantalainen (mira) wrote :

Here are the driver parameters (these should all be defaults):

# grep . /sys/module/nvidia*/parameters/*
/sys/module/nvidia_drm/parameters/fbdev:N
/sys/module/nvidia_drm/parameters/modeset:Y
/sys/module/nvidia_modeset/parameters/config_file:(null)
/sys/module/nvidia_modeset/parameters/disable_hdmi_frl:N
/sys/module/nvidia_modeset/parameters/disable_vrr_memclk_switch:N
/sys/module/nvidia_modeset/parameters/fail_malloc:-1
/sys/module/nvidia_modeset/parameters/hdmi_deepcolor:N
/sys/module/nvidia_modeset/parameters/malloc_verbose:N
/sys/module/nvidia_modeset/parameters/opportunistic_display_sync:Y
/sys/module/nvidia_modeset/parameters/output_rounding_fix:Y
/sys/module/nvidia_modeset/parameters/vblank_sem_control:N
/sys/module/nvidia_uvm/parameters/uvm_ats_mode:1
/sys/module/nvidia_uvm/parameters/uvm_block_cpu_to_cpu_copy_with_ce:0
/sys/module/nvidia_uvm/parameters/uvm_channel_gpfifo_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_channel_gpput_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_channel_num_gpfifo_entries:1024
/sys/module/nvidia_uvm/parameters/uvm_channel_pushbuffer_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_conf_computing_channel_iv_rotation_limit:2147483648
/sys/module/nvidia_uvm/parameters/uvm_cpu_chunk_allocation_sizes:2166784
/sys/module/nvidia_uvm/parameters/uvm_debug_enable_push_acquire_info:0
/sys/module/nvidia_uvm/parameters/uvm_debug_enable_push_desc:0
/sys/module/nvidia_uvm/parameters/uvm_debug_prints:0
/sys/module/nvidia_uvm/parameters/uvm_disable_hmm:N
/sys/module/nvidia_uvm/parameters/uvm_downgrade_force_membar_sys:1
/sys/module/nvidia_uvm/parameters/uvm_enable_builtin_tests:0
/sys/module/nvidia_uvm/parameters/uvm_enable_debug_procfs:0
/sys/module/nvidia_uvm/parameters/uvm_enable_va_space_mm:1
/sys/module/nvidia_uvm/parameters/uvm_exp_gpu_cache_peermem:0
/sys/module/nvidia_uvm/parameters/uvm_exp_gpu_cache_sysmem:0
/sys/module/nvidia_uvm/parameters/uvm_fault_force_sysmem:0
/sys/module/nvidia_uvm/parameters/uvm_force_prefetch_fault_support:0
/sys/module/nvidia_uvm/parameters/uvm_global_oversubscription:1
/sys/module/nvidia_uvm/parameters/uvm_leak_checker:0
/sys/module/nvidia_uvm/parameters/uvm_page_table_location:(null)
/sys/module/nvidia_uvm/parameters/uvm_peer_copy:phys
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_batch_count:256
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_mimc_migration_enable:-1
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_momc_migration_enable:-1
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_threshold:256
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_batch_count:256
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_coalesce:1
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_max_batches_per_service:20
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_max_throttle_per_service:5
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_replay_policy:2
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_replay_update_put_ratio:50
/sys/module/nvidia_uvm/parameters/uvm_perf_map_remote_on_eviction:1
/sys/module/nvidia_uvm/parameters/uvm_perf_map_remote_on_native_atomics_fault:0
/sys/module/nvidia_uvm/parameters/uvm_perf_migrate_cpu_preunmap_block_order:2
/sys/module/nvidia_uvm...
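Since the hang appears with 545 but not with 535, it may be worth saving this dump under each driver version and comparing the two. The following sketch prints every line of the second dump that is missing from the first, so it shows parameters that are new or have a different value. The function name `changed_params` and the dump file names are hypothetical, and it assumes the first dump is not empty.

```shell
# Print lines of the second dump that do not appear in the first one.
# Each dump is the saved output of: grep . /sys/module/nvidia*/parameters/*
changed_params() {
    awk '
        NR == FNR { seen[$0] = 1; next }   # first file: remember every line
        !($0 in seen)                      # second file: print unseen lines
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   grep . /sys/module/nvidia*/parameters/* > params-535.txt   # under 535
#   grep . /sys/module/nvidia*/parameters/* > params-545.txt   # under 545
#   changed_params params-535.txt params-545.txt
```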
