Xorg Black Screen when upgrading packages (after CUDA installation)

Bug #2027614 reported by Baris Basturk
This bug affects 9 people
Affects: nvidia-graphics-drivers-535 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I am trying to set up CUDA on my system. I followed the instructions and everything seemed to work; the nvcc command runs. After a reboot, I tried to update packages with `apt update && apt full-upgrade`. The following is the output shown before I proceeded:

The following NEW packages will be installed:
  cpp-12 dctrl-tools dkms gcc-12 libasan8 libegl-mesa0:i386 libegl1:i386 libgbm1:i386 libgcc-12-dev libgles2:i386 libopengl0:i386 libtsan2 libwayland-client0:i386 libwayland-server0:i386 nvidia-dkms-535
The following packages have been kept back:
  alsa-ucm-conf gir1.2-adw-1 gjs libadwaita-1-0 libgjs0g libspeechd2 python3-speechd speech-dispatcher speech-dispatcher-audio-plugins speech-dispatcher-espeak-ng ubuntu-advantage-tools
The following packages will be upgraded:
  libnvidia-cfg1-535 libnvidia-common-535 libnvidia-compute-535 libnvidia-compute-535:i386 libnvidia-decode-535 libnvidia-decode-535:i386 libnvidia-encode-535 libnvidia-encode-535:i386 libnvidia-extra-535
  libnvidia-fbc1-535 libnvidia-fbc1-535:i386 libnvidia-gl-535 libnvidia-gl-535:i386 libxnvctrl0 nvidia-compute-utils-535 nvidia-driver-535 nvidia-kernel-common-535 nvidia-kernel-source-535 nvidia-utils-535
  xserver-xorg-video-nvidia-535

After a while, the screen went black and I had to do a hard reboot. I tried this twice on a fresh Ubuntu 22.04 installation and the same thing happened both times. I am ready to provide any further information if needed.
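
Before committing to an upgrade like the one listed above, it can help to see which NVIDIA packages it would touch. The following is a minimal sketch, not an official tool: `nvidia_packages_in_simulation` is a hypothetical helper name, and it assumes the `Inst <package> ...` line format printed by `apt-get -s` (simulation mode, which changes nothing on disk).

```shell
# Hypothetical helper: reads the output of an apt-get simulation on stdin
# and prints the name of every package containing "nvidia" that would be
# installed or upgraded, one per line, without duplicates.
nvidia_packages_in_simulation() {
    awk '$1 == "Inst" && $2 ~ /nvidia/ { print $2 }' | LC_ALL=C sort -u
}
```

A possible invocation is `apt-get -s dist-upgrade | nvidia_packages_in_simulation`. If the list is non-empty, the upgrade will replace driver components while the desktop session is still using them, which is the situation in which the black screen was reported here.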

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: xorg 1:7.7+23ubuntu2
ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-46-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: path was not a regular file.
.proc.driver.nvidia.capabilities.mig: Error: path was not a regular file.
.proc.driver.nvidia.gpus.0000.01.00.0: Error: path was not a regular file.
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.54.03 Tue Jun 6 22:20:39 UTC 2023
 GCC version:
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: pass
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Wed Jul 12 21:06:45 2023
DistUpgraded: Fresh install
DistroCodename: jammy
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GpuHangFrequency: This is the first time
GraphicsCard:
 Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:7d86]
 NVIDIA Corporation Device [10de:2782] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. Device [1043:88e5]
InstallationDate: Installed on 2023-07-12 (0 days ago)
InstallationMedia: Ubuntu 22.04.2 LTS "Jammy Jellyfish" - Release amd64 (20230223)
MachineType: Micro-Star International Co., Ltd. MS-7D86
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.19.0-46-generic root=UUID=a9c91108-ed9d-4352-a779-fc46a70f4215 ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/23/2023
dmi.bios.release: 5.27
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: 1.30
dmi.board.asset.tag: Default string
dmi.board.name: MEG Z790 ACE (MS-7D86)
dmi.board.vendor: Micro-Star International Co., Ltd.
dmi.board.version: 1.0
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Micro-Star International Co., Ltd.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvr1.30:bd03/23/2023:br5.27:svnMicro-StarInternationalCo.,Ltd.:pnMS-7D86:pvr1.0:rvnMicro-StarInternationalCo.,Ltd.:rnMEGZ790ACE(MS-7D86):rvr1.0:cvnMicro-StarInternationalCo.,Ltd.:ct3:cvr1.0:skuDefaultstring:
dmi.product.family: Default string
dmi.product.name: MS-7D86
dmi.product.sku: Default string
dmi.product.version: 1.0
dmi.sys.vendor: Micro-Star International Co., Ltd.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.113-2~ubuntu0.22.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 22.2.5-0ubuntu0.1~22.04.3
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.4-2ubuntu1.7~22.04.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Baris Basturk (realbarisbasturk) wrote :
Baris Basturk (realbarisbasturk) wrote (last edit ):

Now I get the following error if I run `apt full-upgrade`:

The following packages have unmet dependencies:
 nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (= 535.54.03-0ubuntu1) but 535.54.03-0ubuntu0.22.04.1 is installed
 nvidia-driver-535 : Depends: libnvidia-extra-535 (= 535.54.03-0ubuntu1) but 535.54.03-0ubuntu0.22.04.1 is installed
                     Depends: nvidia-compute-utils-535 (= 535.54.03-0ubuntu1) but 535.54.03-0ubuntu0.22.04.1 is installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

I feel like I never should've been prompted to install any updates in the first place after CUDA install.
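
The error above names two different builds of the same driver release (`535.54.03-0ubuntu1` wanted, `535.54.03-0ubuntu0.22.04.1` installed). One possibility, not confirmed in this report, is that the two builds come from different package sources. The sketch below only extracts the conflicting pairs from apt's message so they are easier to compare; `list_version_conflicts` is a hypothetical helper name, and the pattern assumes the exact English wording shown above.

```shell
# Hypothetical helper: reads apt's "unmet dependencies" text on stdin and
# prints one line per conflict in the form
#   <dependency> wanted=<version> installed=<version>
# Lines that do not match the "Depends: X (= V) but W is installed" wording
# are ignored.
list_version_conflicts() {
    sed -n 's/.*Depends: \([^ ]*\) (= \([^)]*\)) but \([^ ]*\) is installed.*/\1 wanted=\2 installed=\3/p'
}
```

For example, `LC_ALL=C apt-get -s dist-upgrade 2>&1 | list_version_conflicts` feeds a simulated run into the helper. Running `apt-cache policy nvidia-kernel-common-535` afterwards shows which configured repository offers each version.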

Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report. Next time the screen goes black and after rebooting, please run:

  journalctl -b-1 > prevboot.txt

and attach the resulting text file here.

affects: xorg (Ubuntu) → nvidia-graphics-drivers-535 (Ubuntu)
Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: New → Incomplete
Baris Basturk (realbarisbasturk) wrote :

Hi Daniel,

I created this bug report with the ubuntu-bug command on the first reboot after the issue occurred. Do you still need prevboot.txt? If so, can I somehow retrieve it without going through a full clean installation?

Daniel van Vugt (vanvugt) wrote :

If the black screen still occurs then please follow the instructions in comment #3. You do not need to reinstall.

Baris Basturk (realbarisbasturk) wrote :

Hi Daniel,

I have attached the file you requested, after repeating the same steps as when I first filed the bug:
clean Ubuntu install, CUDA install, reboot, apt full-upgrade. It took me about 1h 30m, so I hope this helps.

Baris Basturk (realbarisbasturk) wrote :

The same thing happens (black screen) if I run apt --fix-broken install. I have attached the related logs.

Daniel van Vugt (vanvugt) wrote :

If any log messages are related to the black screen then it might be these:

Tem 13 22:00:47 ms-7d86 kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Tem 13 22:00:47 ms-7d86 kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Tem 13 22:00:47 ms-7d86 kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Tem 13 22:00:47 ms-7d86 kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership

But those messages are common enough in other people's logs that I'm skeptical they're related. More likely, I think, you just have a broken package combination.
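
To judge how often such messages occur in a saved log like prevboot.txt, the distinct NVIDIA-related kernel errors can be counted. This is a rough sketch under stated assumptions: `count_nvidia_errors` is a hypothetical helper name, and it assumes journal lines in the `<timestamp> <host> kernel: <message>` layout quoted above.

```shell
# Hypothetical helper: reads journal text on stdin, keeps lines that contain
# the literal marker *ERROR* and mention "nvidia" (case-insensitive), strips
# the timestamp/host prefix, and prints each distinct message with its count,
# most frequent first.
count_nvidia_errors() {
    grep -F '*ERROR*' \
        | grep -i 'nvidia' \
        | sed 's/^.* kernel: //' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

It can be run as `count_nvidia_errors < prevboot.txt`, or directly on the previous boot's kernel messages with `journalctl -b -1 -k | count_nvidia_errors`. A high count alone does not show that a message is the cause of the black screen.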

Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: Incomplete → New
Baris Basturk (realbarisbasturk) wrote :

Hi Daniel,

The apt update command already tells me that I have a broken package combination, which I said I suspected in my previous comments. The problem is that this is not supposed to happen, since I didn't do anything out of the ordinary that would cause it. This is a clean installation plus a CUDA installation, so I think anyone with similar enough hardware will face/reproduce this issue.

I hope you can look into why this happens and provide a better explanation. Your comment above doesn't help me at all, nor does it help me resolve the issue. I'll be using Docker for CUDA until this issue is resolved. Please let me know if you need any further assistance; I'd be happy to help.

Best regards,
Baris.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: New → Confirmed
Kevin Yeh (kevinyeh)
Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: Confirmed → Won't Fix
status: Won't Fix → Confirmed
Tod Hagan (tod222) wrote (last edit ):

Confirmed. The last two (or three) routine system upgrades that had new nvidia packages have black screened during the upgrade.

This is a desktop. I can still log in via ssh from a laptop. Running 'top' shows:

Tasks: 627 total, 4 running, 623 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 11.8 sy, 0.0 ni, 87.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64209.0 total, 17847.4 free, 19228.6 used, 27133.0 buff/cache
MiB Swap: 976.0 total, 856.4 free, 119.6 used. 42589.7 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    513 root 20 0 0 0 0 R 100.0 0.0 9:42.51 nvidia-modeset/
   4193 gdm 20 0 4108568 204204 66536 R 100.0 0.3 12:40.42 gnome-shell
   4795 root 20 0 24.7g 415668 262976 R 100.0 0.6 650:08.69 Xorg

Running 'ps' and grepping 'update':

  11859 ? Sl 0:10 update-notifier
1803241 ? Ss 0:00 /bin/sh /usr/lib/update-notifier/update-notifier-crash
1884567 pts/13 S+ 0:00 grep --color=auto -E update
3857507 ? SNl 0:05 /usr/bin/python3 /usr/bin/update-manager --no-update --no-focus-on-map
m

Doing 'shutdown -r now' logs out ssh but doesn't reboot the machine. After waiting, pressing the hardware reset button on the case has no effect. It doesn't restart until a power down and cold boot.

I created a prevboot.txt from the instructions in an earlier comment; it is attached. It looks interesting, but I'll leave the analysis to the experts.
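
When the desktop is black but ssh still works, as described above, the spinning processes can be picked out of top's batch output. This is only a sketch: `busy_processes` is a hypothetical helper name, and it assumes the default column order shown above (PID first, %CPU ninth, COMMAND twelfth).

```shell
# Hypothetical helper: reads `top -b -n 1` output on stdin and prints
# "PID %CPU COMMAND" for every process at or above a CPU threshold
# (first argument, default 90). Summary and header lines are skipped
# because their first field is not a numeric PID. Only the first word of
# the command is printed.
busy_processes() {
    awk -v min="${1:-90}" \
        '$1 ~ /^[0-9]+$/ && $9 ~ /^[0-9.]+$/ && $9 + 0 >= min + 0 { print $1, $9, $12 }'
}
```

Over ssh this could be used as `LC_ALL=C top -b -n 1 | busy_processes 90`. On the output quoted above it would report nvidia-modeset/, gnome-shell and Xorg.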

Luis Alvarado (luisalvarado) wrote :

Yes, when installing Nvidia (prior to 525 just in case) turn off the monitor. You would need to time how long it takes to install the driver, update the Nvidia kernel modules and more, then force a reboot.

This only started happening after Nvidia 525 (around February 2023); every new version after that creates this problem.

Ernst Persson (ernstp) wrote :

This also happened to me with a 2070 when upgrading the Nvidia driver from 535.xy to 535.104.05.
That was a Wayland session, dual monitors...
I can probably dig up some logs.

Ernst Persson (ernstp) wrote :

This is what
  * debian/rules:
    - Do not start the systemd services on installation or upgrade, as this can
      lead to a black screen.
is supposed to fix, right?

I updated to nvidia-graphics-drivers-535 (535.104.05-0ubuntu4) on Mantic (yes I know it's in -proposed, I wanted to test the fix) and it still happened during the update.

Simon Davis (davis-decent) wrote :

I can confirm exactly the same problem and behavior that Tod Hagan (tod222) described in comment #11 on 2023-08-06.

Bas van den Heuvel (2-bas) wrote :

I too have exactly the same problem as Tod Hagan (#11) described above.

dorpm (dorpmueller) wrote :

The same happened on my PC yesterday with the upgrade to nvidia-firmware-535-535.113.01.

Baris Basturk (realbarisbasturk) wrote :

I encountered a black screen during an NVIDIA driver/firmware update this morning. This time, it had nothing to do with CUDA. Fortunately, the issue was resolved after a forced shutdown.
