[Lenovo Legion Pro 7 16IRX8H] Boots slowly or not at all when NVIDIA driver is installed

Bug #2055153 reported by Alvin Jinsung Choi
This bug affects 3 people
Affects                               Status     Importance  Assigned to  Milestone
linux-hwe-6.5 (Ubuntu)                Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-525 (Ubuntu)  Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-535 (Ubuntu)  Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-545 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

I have a boot-related issue every time I install the NVIDIA driver on Ubuntu 22.04. The main problem is that every time I boot, Ubuntu will either:

(1) boot fine (25% of the time)
(2) take a long time to boot, about 2 minutes (50% of the time)
(3) fail to boot (25% of the time)

The symptom disappears when I uninstall the NVIDIA driver, so I'm pretty sure it is related to the NVIDIA driver, but I don't know how to fix it.

For case (2) (slow boot), a black screen showing

'/dev/nvme0n1p6: clean, *** files, *** blocks'

hangs for a long time.

For case (3) (boot fails), a black screen shows

'iwlwifi : invalid buffer destination'
'ACPI BIOS Error (bug): could not resolve symbol [\_TZ.ETMD], AE_NOT_FOUND'
'ACPI Error: Aborting method \_SB.IETM._OSC due to previous error (AE_NOT_FOUND)'
'Bluetooth: hci0: Malformed MSFT vendor event: 0x02'
'INFO: task plymouthd: *** blocked for more than *** seconds'
'"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message'
'INFO: task gpu-manager: *** blocked for more than *** seconds'
'"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message'

Either this screen appears and the system does not boot, or a black screen with a cursor appears and the system does not boot. It happens quite randomly, which is really frustrating, and I want to resolve it. I've tried various things but none of them worked. Here is what I tried:

- deleting and reinstalling the NVIDIA driver with 'purge, autoremove, apt install'
- installing different driver versions (525, 535, 545)
- trying the boot-repair program
- disabling nouveau by modifying /etc/modprobe.d/blacklist.conf
- setting 'nomodeset' by modifying /etc/default/grub (this made it worse: every boot failed)
- trying various kernel versions with various Ubuntu versions (my laptop is a very new device, so 20.04 had issues with the touchpad, wifi, etc.; therefore I mainly tried 22.04)

However, none of them resolved the problem. The symptoms varied from time to time, but almost all of them were boot-related. I am dual-booting with Windows.

I am kind of lost and kindly ask for help. Here are some specifications that may be helpful:

Laptop: Lenovo Legion Pro 7 16IRX8H

$ uname -a
Linux alvin-Legion-Pro-7-16IRX8H 6.5.0-21-generic #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ lspci
00:00.0 Host bridge: Intel Corporation Device a702 (rev 01)
00:01.0 PCI bridge: Intel Corporation Device a70d (rev 01)
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S UHD Graphics (rev 04)
00:04.0 Signal processing controller: Intel Corporation Raptor Lake Dynamic Platform and Thermal Framework Processor Participant (rev 01)
00:06.0 PCI bridge: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port (rev 01)
00:0a.0 Signal processing controller: Intel Corporation Raptor Lake Crashlog and Telemetry (rev 01)
00:14.0 USB controller: Intel Corporation Raptor Lake USB 3.2 Gen 2x2 (20 Gb/s) XHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Raptor Lake-S PCH Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Raptor Lake-S PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #2 (rev 11)
00:16.0 Communication controller: Intel Corporation Raptor Lake CSME HECI #1 (rev 11)
00:19.0 Serial bus controller: Intel Corporation Device 7a7c (rev 11)
00:19.1 Serial bus controller: Intel Corporation Device 7a7d (rev 11)
00:1a.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #25 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #17 (rev 11)
00:1b.5 PCI bridge: Intel Corporation Device 7a45 (rev 11)
00:1d.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #9 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 7a0c (rev 11)
00:1f.3 Audio device: Intel Corporation Raptor Lake High Definition Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Raptor Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Raptor Lake SPI (flash) Controller (rev 11)
01:00.0 VGA compatible controller: NVIDIA Corporation AD104M [GeForce RTX 4080 Max-Q / Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)
06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
07:00.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
08:00.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
08:01.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
08:02.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
08:03.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
09:00.0 USB controller: Intel Corporation Device 1134
3d:00.0 USB controller: Intel Corporation Device 1135
76:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd000027E0sv000017AAsd00003B64bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-525 - distro non-free recommended
driver : nvidia-driver-535-server-open - distro non-free
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-545-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-535 - third-party non-free
driver : xserver-xorg-video-nouveau - distro free builtin

$ nvidia-smi
Mon Feb 26 12:51:30 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 39C P3 N/A / 80W | 6MiB / 12282MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2048 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: xorg 1:7.7+23ubuntu2
ProcVersionSignature: Ubuntu 6.5.0-21.21~22.04.1-generic 6.5.8
Uname: Linux 6.5.0-21-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: path was not a regular file.
.proc.driver.nvidia.capabilities.mig: Error: path was not a regular file.
.proc.driver.nvidia.gpus.0000.01.00.0: Error: path was not a regular file.
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.147.05 Wed Oct 25 20:27:35 UTC 2023
 GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: pass
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Tue Feb 27 21:07:08 2024
DistUpgraded: Fresh install
DistroCodename: jammy
DistroVariant: ubuntu
DkmsStatus: nvidia/525.147.05, 6.5.0-21-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GpuHangFrequency: Continuously
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 Intel Corporation Raptor Lake-S UHD Graphics [8086:a788] (rev 04) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:3b64]
 NVIDIA Corporation AD104M [GeForce RTX 4080 Max-Q / Mobile] [10de:27e0] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:3b64]
InstallationDate: Installed on 2024-02-25 (1 days ago)
InstallationMedia: Ubuntu 22.04.4 LTS "Jammy Jellyfish" - Release amd64 (20240220)
MachineType: LENOVO 82WQ
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.5.0-21-generic root=UUID=97f8e2ee-d221-478b-8921-2b2450d8bb6d ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/15/2023
dmi.bios.release: 1.42
dmi.bios.vendor: LENOVO
dmi.bios.version: KWCN42WW
dmi.board.asset.tag: NO Asset Tag
dmi.board.name: LNVNB161216
dmi.board.vendor: LENOVO
dmi.board.version: SDK0T76461 WIN
dmi.chassis.asset.tag: NO Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Legion Pro 7 16IRX8H
dmi.ec.firmware.release: 1.42
dmi.modalias: dmi:bvnLENOVO:bvrKWCN42WW:bd09/15/2023:br1.42:efr1.42:svnLENOVO:pn82WQ:pvrLegionPro716IRX8H:rvnLENOVO:rnLNVNB161216:rvrSDK0T76461WIN:cvnLENOVO:ct10:cvrLegionPro716IRX8H:skuLENOVO_MT_82WQ_BU_idea_FM_LegionPro716IRX8H:
dmi.product.family: Legion Pro 7 16IRX8H
dmi.product.name: 82WQ
dmi.product.sku: LENOVO_MT_82WQ_BU_idea_FM_Legion Pro 7 16IRX8H
dmi.product.version: Legion Pro 7 16IRX8H
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.113-2~ubuntu0.22.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 23.2.1-1ubuntu3.1~22.04.2
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.4-2ubuntu1.7~22.04.8
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Alvin Jinsung Choi (alvinjinsung) wrote :
Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report.

Each time you experience a failed boot followed by a successful boot please run:

  journalctl -b-1 > failedboot.txt

And each time you experience a slow boot please run:

  journalctl -b0 > slowboot.txt

and attach the resulting text files here.

summary: - Xorg freeze
+ Boots slowly or not at all when NVIDIA driver is installed
affects: xorg (Ubuntu) → nvidia-graphics-drivers-535 (Ubuntu)
tags: added: nvidia
Changed in nvidia-graphics-drivers-525 (Ubuntu):
status: New → Incomplete
Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: New → Incomplete
Changed in nvidia-graphics-drivers-545 (Ubuntu):
status: New → Incomplete
Alvin Jinsung Choi (alvinjinsung) wrote : Re: Boots slowly or not at all when NVIDIA driver is installed
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Daniel van Vugt (vanvugt) wrote :

The slow boots seem to be due to the Nvidia kernel driver interfering in backlight control, which might also be a kernel bug:

Feb 28 14:13:24 alvin-Legion-Pro-7-16IRX8H systemd-udevd[515]: 0000:00:02.0: Worker [560] processing SEQNUM=4649 is taking a long time
Feb 28 14:13:56 alvin-Legion-Pro-7-16IRX8H systemd[1]: systemd-backlight@backlight:nvidia_0.service: start operation timed out. Terminating.
Feb 28 14:13:56 alvin-Legion-Pro-7-16IRX8H systemd[1]: systemd-backlight@backlight:nvidia_0.service: Main process exited, code=killed, status=15/TERM
Feb 28 14:13:56 alvin-Legion-Pro-7-16IRX8H systemd[1]: systemd-backlight@backlight:nvidia_0.service: Failed with result 'timeout'.
Feb 28 14:13:56 alvin-Legion-Pro-7-16IRX8H systemd[1]: Failed to start Load/Save Screen Backlight Brightness of backlight:nvidia_0.
Feb 28 14:13:57 alvin-Legion-Pro-7-16IRX8H systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:nvidia_0...
Feb 28 14:13:57 alvin-Legion-Pro-7-16IRX8H systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:nvidia_0.

The failed boots are some kind of failure to start gdm, although in one case there was also an Nvidia kernel driver crash. Please check for crashes in /var/crash and if found then report them using the ubuntu-bug command. If not found then please enable the debug section in /etc/gdm3/custom.conf, reboot and collect more logs from failed boots.
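
The two checks above could be sketched as follows. This is a minimal sketch, not an official procedure: the helper names `enable_gdm_debug` and `list_crash_reports` are hypothetical, and it assumes the stock Ubuntu /etc/gdm3/custom.conf, where the [debug] section ships with a commented-out `#Enable=true` line. The path arguments exist only so the edit can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: uncomment "Enable=true" in a gdm3 custom.conf.
# Assumes the stock Ubuntu layout, where [debug] contains "#Enable=true".
enable_gdm_debug() {
    conf="${1:-/etc/gdm3/custom.conf}"
    # Only a commented-out "Enable=true" is touched; other keys such as
    # "#WaylandEnable=false" are left alone. Uses GNU sed's -i option.
    sed -i 's/^#[[:space:]]*Enable=true[[:space:]]*$/Enable=true/' "$conf"
}

# Hypothetical helper: list crash reports saved by apport (may print nothing).
list_crash_reports() {
    dir="${1:-/var/crash}"
    ls -1 "$dir"/*.crash 2>/dev/null || true
}
```

Editing the real /etc/gdm3/custom.conf needs root, and a reboot is needed before the extra gdm logging shows up in the journal.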

summary: - Boots slowly or not at all when NVIDIA driver is installed
+ [Lenovo Legion Pro 7 16IRX8H] Boots slowly or not at all when NVIDIA
+ driver is installed
Alvin Jinsung Choi (alvinjinsung) wrote :

There seems to be a _usr_bin_nautilus.1000.crash file in /var/crash. Are you saying I should open a new bug report (separate from this page) with the file attached? Also, I checked /etc/gdm3/custom.conf and debug is not enabled. Do I still need to enable the debug section?

Daniel van Vugt (vanvugt) wrote :

The nautilus crash is not relevant here, thanks.

Yes please enable debug in /etc/gdm3/custom.conf

Alvin Jinsung Choi (alvinjinsung) wrote :

Enabled debug in /etc/gdm3/custom.conf
I will collect more failed logs and upload here. Thanks!

Alvin Jinsung Choi (alvinjinsung) wrote :

I tried to run the command

ubuntu-bug /var/crash/YOURFILE.crash

as you proposed but it is not working.
It shows 'The application Files has closed unexpectedly'

Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Alvin Jinsung Choi (alvinjinsung) wrote :
Daniel van Vugt (vanvugt) wrote :

The only explanation I can see for the failed boots is:

Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (WW) NVIDIA(G0): Failed to set the display configuration
Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (WW) NVIDIA(G0): - Setting a mode on head 0 failed: Insufficient permissions
Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (WW) NVIDIA(G0): - Setting a mode on head 1 failed: Insufficient permissions
Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (WW) NVIDIA(G0): - Setting a mode on head 2 failed: Insufficient permissions
Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (WW) NVIDIA(G0): - Setting a mode on head 3 failed: Insufficient permissions
Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (II) NVIDIA(GPU-0): Deleting GPU-0
Feb 28 21:20:49 alvin-Legion-Pro-7-16IRX8H /usr/libexec/gdm-x-session[1356]: (II) Server terminated successfully (0). Closing log file.

The slow boots still seem to be the backlight issue in comment #10.
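
To check a saved journal (for example failedboot.txt or slowboot.txt from the journalctl commands earlier) for the two signatures quoted in this report, something like the following could be used. It is only a sketch: the function name `scan_boot_log` and the labels it prints are made up for illustration, and the patterns are copied from the log excerpts in this bug, so they may not match other driver versions.

```shell
#!/bin/sh
# Hypothetical helper: report which of the two signatures quoted in this
# bug appear in a saved journal file.
# Usage: scan_boot_log failedboot.txt
scan_boot_log() {
    log="$1"
    # Slow boot: the systemd-backlight unit timed out while starting.
    if grep -q 'systemd-backlight@.*start operation timed out' "$log"; then
        echo "backlight-timeout"
    fi
    # Failed boot: the NVIDIA X driver could not set a display mode.
    if grep -q 'NVIDIA(G0): Failed to set the display configuration' "$log"; then
        echo "nvidia-modeset-failed"
    fi
}
```

An empty result only means neither of these two messages was found; it does not rule out other causes.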

Changed in nvidia-graphics-drivers-525 (Ubuntu):
status: Incomplete → New
Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: Incomplete → New
Changed in nvidia-graphics-drivers-545 (Ubuntu):
status: Incomplete → New
Alvin Jinsung Choi (alvinjinsung) wrote :

So how can I fix it?

Daniel van Vugt (vanvugt) wrote :

For the slow boots, try experimenting with kernel parameters like acpi_backlight=vendor and for more detailed info see https://wiki.ubuntu.com/Kernel/Debugging/Backlight

For the failed boots I would also recommend trying the kernel parameter: nvidia-drm.modeset=0

For both issues try the new experimental Nvidia driver version 550: https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
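
One way to try a kernel parameter persistently is to append it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The sketch below assumes the usual Ubuntu form of that line (a single double-quoted value); `add_kernel_param` is a hypothetical helper name, and the file path is a parameter so the edit can be tested on a copy.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# unless it is already there.
# Usage: add_kernel_param acpi_backlight=vendor [/path/to/grub/defaults]
add_kernel_param() {
    param="$1"
    grub_file="${2:-/etc/default/grub}"
    current=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" || true)
    # Do nothing if the parameter is already on the line.
    case "$current" in
        *"$param"*) return 0 ;;
    esac
    # Insert the parameter just before the closing quote (GNU sed -i).
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$grub_file"
}
```

After changing the real file (as root), run `sudo update-grub` and reboot; `cat /proc/cmdline` then shows whether the parameter took effect. Trying one parameter at a time makes it easier to tell which one helped.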

Fabian Gasser (gasfab) wrote :

@alvinjinsung were you able to resolve this issue with any of the suggestions above? I'm experiencing the same issue with my new Lenovo Legion Pro 5 16IRX8 on Ubuntu 22.04 when using nvidia-driver-535 (proprietary, tested). When I switch back to the open-source driver the issue disappears, but then my external monitor isn't detected.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe-6.5 (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-525 (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-545 (Ubuntu):
status: New → Confirmed
Alvin Jinsung Choi (alvinjinsung) wrote :

@Fabian Gasser
After setting the kernel parameter 'acpi_backlight=vendor' in the /etc/default/grub file, the problem has not appeared again so far. I will let you know if it happens again.

@Daniel Van Vugt
May I ask what 'acpi_backlight=vendor' does?

Daniel van Vugt (vanvugt) wrote :

It changes the backlight behaviour in the kernel.
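
As far as the kernel's parameter documentation describes it, acpi_backlight=vendor asks the kernel to prefer a vendor-specific backlight driver over the generic ACPI video one, which can change which devices appear under /sys/class/backlight. A rough way to compare the backlight devices before and after setting the parameter is sketched below; `list_backlights` is a hypothetical helper, and the sysfs root is a parameter only so it can be tried on a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: print each backlight device with its current and
# maximum brightness, one per line, as "name current/max".
list_backlights() {
    root="${1:-/sys/class/backlight}"
    for dev in "$root"/*; do
        # Skip the unexpanded glob (no devices) and unreadable entries.
        [ -r "$dev/brightness" ] || continue
        printf '%s %s/%s\n' "$(basename "$dev")" \
            "$(cat "$dev/brightness")" "$(cat "$dev/max_brightness")"
    done
}
```

On the machine in this report, the journal shows a device named nvidia_0; whether that entry is still present after booting with acpi_backlight=vendor is the thing to look for.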
