System freeze when plugging USB-C monitor into a hybrid Intel/Nvidia laptop: [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to get memory pages for NvKmsKapiMemory

Bug #1960865 reported by Aleksandr Panzin
This bug affects 2 people
Affects                                Status     Importance  Assigned to  Milestone
Mutter                                 New        Unknown
nvidia-graphics-drivers-470 (Ubuntu)   Confirmed  High        Unassigned

Bug Description

I recently upgraded my NVidia drivers on my ThinkPad P1 Gen 3.

If I start Ubuntu in a Wayland session and plug in my USB-C monitor, the system reliably freezes completely. The freeze is also reproducible if the monitor is already plugged in during startup and I try to log in to a Wayland session.

Only the Wayland session crashes, and the crash happens ONLY if the USB-C monitor is plugged in.

The USB-C monitor is also a USB3 hub.

The hardware seems to work fine, as Windows shows no issues with the setup.

ProblemType: Bug
DistroRelease: Ubuntu 21.10
Package: xorg 1:7.7+22ubuntu2
ProcVersionSignature: Ubuntu 5.13.0-30.33-generic 5.13.19
Uname: Linux 5.13.0-30-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: path was not a regular file.
.proc.driver.nvidia.capabilities.mig: Error: path was not a regular file.
.proc.driver.nvidia.gpus.0000.01.00.0: Error: path was not a regular file.
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 470.103.01 Thu Jan 6 12:10:04 UTC 2022
 GCC version: gcc version 11.2.0 (Ubuntu 11.2.0-7ubuntu2)
ApportVersion: 2.20.11-0ubuntu71
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: unknown
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Mon Feb 14 16:39:34 2022
DistUpgraded: 2021-10-14 18:36:11,564 DEBUG Running PostInstallScript: '/usr/lib/ubuntu-advantage/upgrade_lts_contract.py'
DistroCodename: impish
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GpuHangFrequency: Continuously
GpuHangReproducibility: Yes, I can easily reproduce it
GpuHangStarted: Within the last few days
GraphicsCard:
 Intel Corporation CometLake-H GT2 [UHD Graphics] [8086:9bc4] (rev 05) (prog-if 00 [VGA controller])
   Subsystem: Lenovo CometLake-H GT2 [UHD Graphics] [17aa:22c1]
 NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: Lenovo TU117GLM [Quadro T1000 Mobile] [17aa:22c1]
InstallationDate: Installed on 2020-10-05 (497 days ago)
InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
MachineType: LENOVO 20THCT01WW
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.13.0-30-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: Upgraded to impish on 2021-10-14 (122 days ago)
dmi.bios.date: 12/27/2021
dmi.bios.release: 1.21
dmi.bios.vendor: LENOVO
dmi.bios.version: N2VET36W (1.21 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20THCT01WW
dmi.board.vendor: LENOVO
dmi.board.version: 0B98417 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.10
dmi.modalias: dmi:bvnLENOVO:bvrN2VET36W(1.21):bd12/27/2021:br1.21:efr1.10:svnLENOVO:pn20THCT01WW:pvrThinkPadP1Gen3:rvnLENOVO:rn20THCT01WW:rvr0B98417WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20TH_BU_Think_FM_ThinkPadP1Gen3:
dmi.product.family: ThinkPad P1 Gen 3
dmi.product.name: 20THCT01WW
dmi.product.sku: LENOVO_MT_20TH_BU_Think_FM_ThinkPad P1 Gen 3
dmi.product.version: ThinkPad P1 Gen 3
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.107-8ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 21.2.6-0ubuntu0.1
version.libgl1-mesa-glx: libgl1-mesa-glx 21.2.6-0ubuntu0.1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.13-1ubuntu1.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2build1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200714-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-1build1

Aleksandr Panzin (jalexoid) wrote :
affects: ubuntu → xorg (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report.

1. If you experience a crash then please follow these instructions: https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment

2. If you experience a freeze without any crash reports then it might be related to bug 1959888, even if not exactly the same.

3. You seem to have a lot of packages from "origin: unknown" shown in Dependencies.txt, so that might be a factor. But if we can confirm the cause of this problem is #1 or #2 instead then we won't have to worry about the package versions right now.

affects: xorg (Ubuntu) → ubuntu
Changed in ubuntu:
status: New → Incomplete
Aleksandr Panzin (jalexoid) wrote :
  • syslog (23.6 KiB, application/octet-stream)

@vanvugt

1. I can literally reproduce the crash. It's 100% reproducible.

2. The whole system freezes and the logs fill up with a run of zeros. (Attached is an extract from syslog covering the moment I attach the screen through to the crash.)

Daniel van Vugt (vanvugt) wrote :

It looks like the main issue just before the zeros is a kernel crash:

Feb 14 16:32:46 alex-power kernel: [ 67.119401] [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to get memory pages for NvKmsKapiMemory 0x0000000076ba531b
Feb 14 16:32:46 alex-power kernel: [ 67.119416] BUG: kernel NULL pointer dereference, address: 000000000000000c
Feb 14 16:32:46 alex-power kernel: [ 67.119418] #PF: supervisor read access in kernel mode
Feb 14 16:32:46 alex-power kernel: [ 67.119419] #PF: error_code(0x0000) - not-present page
Feb 14 16:32:46 alex-power kernel: [ 67.119420] PGD 0 P4D 0
Feb 14 16:32:46 alex-power kernel: [ 67.119422] Oops: 0000 [#1] SMP NOPTI
Feb 14 16:32:46 alex-power kernel: [ 67.119424] CPU: 3 PID: 4291 Comm: gnome-shell Tainted: P OE 5.13.0-30-generic #33-Ubuntu
Feb 14 16:32:46 alex-power kernel: [ 67.119427] Hardware name: LENOVO 20THCT01WW/20THCT01WW, BIOS N2VET36W (1.21 ) 12/27/2021
Feb 14 16:32:46 alex-power kernel: [ 67.119428] RIP: 0010:drm_gem_map_dma_buf+0x43/0xb0 [drm]
Feb 14 16:32:46 alex-power kernel: [ 67.119455] Code: 00 00 83 fe 03 74 6f 48 8b 87 40 01 00 00 48 8b 40 38 48 85 c0 74 72 41 89 f5 ff d0 0f 1f 00 49 89 c4 48 3d 00 f0 ff ff 77 21 <8b> 50 0c 48 8b 7b 08 41 b8 20 00 00 00 44 89 e9 48 8b 30 e8 55 ca
Feb 14 16:32:46 alex-power kernel: [ 67.119457] RSP: 0018:ffffb921c72838d8 EFLAGS: 00010207
Feb 14 16:32:46 alex-power kernel: [ 67.119458] RAX: 0000000000000000 RBX: ffff9799510f37e0 RCX: 0000000000000000
Feb 14 16:32:46 alex-power kernel: [ 67.119460] RDX: 0000000000000000 RSI: ffff979c0bfa09c0 RDI: ffff979c0bfa09c0
Feb 14 16:32:46 alex-power kernel: [ 67.119461] RBP: ffffb921c72838f0 R08: 0000000000000000 R09: ffffb921c7283610
Feb 14 16:32:46 alex-power kernel: [ 67.119462] R10: ffffb921c7283608 R11: ffffffff92355428 R12: 0000000000000000
Feb 14 16:32:46 alex-power kernel: [ 67.119463] R13: 0000000000000000 R14: 00000000dfd20881 R15: 0000000000000800
Feb 14 16:32:46 alex-power kernel: [ 67.119464] FS: 00007fe6b9d9bd80(0000) GS:ffff979c0bf80000(0000) knlGS:0000000000000000
Feb 14 16:32:46 alex-power kernel: [ 67.119466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 16:32:46 alex-power kernel: [ 67.119467] CR2: 000000000000000c CR3: 000000011cbae002 CR4: 00000000007706e0
Feb 14 16:32:46 alex-power kernel: [ 67.119468] PKRU: 55555554
Feb 14 16:32:46 alex-power kernel: [ 67.119469] Call Trace:
Feb 14 16:32:46 alex-power kernel: [ 67.119470] <TASK>
Feb 14 16:32:46 alex-power kernel: [ 67.119473] dma_buf_map_attachment+0x8c/0x100
Feb 14 16:32:46 alex-power kernel: [ 67.119477] i915_gem_object_get_pages_dmabuf+0x1c/0x70 [i915]
Feb 14 16:32:46 alex-power kernel: [ 67.119535] __i915_gem_object_get_pages+0x4d/0x80 [i915]

As a workaround, please try logging into an Xorg session. It's the one called 'Ubuntu on Xorg' or, if that's not listed, just 'Ubuntu' on the login screen.
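
For anyone triaging a similar freeze, the key fields of the oops above can be pulled out mechanically. What follows is only a minimal Python sketch and not part of the original report: the names summarize_oops, PATTERNS and SAMPLE are hypothetical, and the sample lines are copied from the log above. In that log the fault address is 0xc while RAX is 0, which is consistent with a read at offset 0xc through a NULL pointer inside drm_gem_map_dma_buf, right after nvidia-drm reported that it failed to get memory pages.

```python
import re

# Regular expressions for the fields of interest (hypothetical helper, not an official tool).
PATTERNS = {
    "nvidia_error": re.compile(r"\[nvidia-drm\] \[GPU ID (0x[0-9a-fA-F]+)\] (.+)$"),
    "fault_address": re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-fA-F]+)"),
    "process": re.compile(r"Comm: (\S+)"),
    "rip": re.compile(r"RIP: [0-9a-fA-F]+:(\S+) \[(\w+)\]"),
}

# Sample lines copied from the syslog extract quoted in this bug.
SAMPLE = [
    "Feb 14 16:32:46 alex-power kernel: [ 67.119401] [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to get memory pages for NvKmsKapiMemory 0x0000000076ba531b",
    "Feb 14 16:32:46 alex-power kernel: [ 67.119416] BUG: kernel NULL pointer dereference, address: 000000000000000c",
    "Feb 14 16:32:46 alex-power kernel: [ 67.119424] CPU: 3 PID: 4291 Comm: gnome-shell Tainted: P OE 5.13.0-30-generic #33-Ubuntu",
    "Feb 14 16:32:46 alex-power kernel: [ 67.119428] RIP: 0010:drm_gem_map_dma_buf+0x43/0xb0 [drm]",
]


def summarize_oops(lines):
    """Return the first match of each pattern found in the given log lines."""
    summary = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if name in summary:
                continue  # keep only the first occurrence of each field
            match = pattern.search(line)
            if match:
                summary[name] = match.groups()
    return summary


if __name__ == "__main__":
    for field, values in summarize_oops(SAMPLE).items():
        print(field, values)
```

To use it on a real log, pass the lines of the attached syslog extract (for example from open("syslog").read().splitlines()) instead of SAMPLE.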

tags: added: hybrid multigpu
tags: added: nvidia
tags: added: wayland wayland-session
summary: - System freeze
+ System freeze when plugging USB-C monitor into a hybrid Intel/Nvidia
+ laptop
summary: System freeze when plugging USB-C monitor into a hybrid Intel/Nvidia
- laptop
+ laptop: [drm:__nv_drm_gem_nvkms_memory_prime_get_sg_table [nvidia_drm]]
+ *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to get memory pages for
+ NvKmsKapiMemory
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → New
Daniel van Vugt (vanvugt) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in mutter (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in nvidia-graphics-drivers-470 (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Daniel van Vugt (vanvugt) wrote :

The strange thing to me is that Nvidia shouldn't be involved here. A USB-C port is likely to be wired to the Intel chips only. As a DisplayPort the USB-C is wired to the Intel GPU. And gnome-shell by default will use the integrated Intel GPU only for the desktop.

Please reproduce the freeze again, then reboot and run:

  journalctl -b-1 > prevboot.txt

and attach the resulting text file here.

Daniel van Vugt (vanvugt) wrote :

This sounds like a simple bug in nvidia-drm-470 or kernel 5.13. Both have been upgraded in Ubuntu 22.04 already so there's a chance this issue is already fixed.

Still, please follow the instruction in comment #7 because I would like to understand more about the system.

Changed in mutter (Ubuntu):
status: Confirmed → Opinion
TheBestTheBestTheBest (thebestthebestthebest) wrote :

> The strange thing to me is that Nvidia shouldn't be involved here. A USB-C port is likely to be
> wired to the Intel chips only. As a DisplayPort the USB-C is wired to the Intel GPU. And gnome-
> shell by default will use the integrated Intel GPU only for the desktop.

FWIW, I'm also experiencing the issue, and I recall reading (I can't remember where) that the Thunderbolt port on the left side of my laptop is wired directly to the NVIDIA GPU, even when Optimus is enabled. This is not the case for the other USB-C ports on my laptop.

This seems to be confirmed by the fact that, on X11, a display attached to the Thunderbolt port also shows up in the NVIDIA control panel, which presents a reduced view and feature set because the NVIDIA GPU is acting as the secondary GPU. The main laptop display is not shown there, as it is handled by the Intel GPU.

Aleksandr Panzin (jalexoid) wrote :

@vanvugt - The USB-C port seems to be wired to the NVIDIA GPU on this laptop. If I switch to the Intel GPU (`prime-select intel`), I cannot use the monitor: the system doesn't even detect that a monitor was connected.
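
One way to check which GPU a given output belongs to is to look at the DRM connectors the kernel exposes in sysfs. The following is a rough Python sketch, not something that was run in this report: list_connectors is a hypothetical helper, and it assumes the usual /sys/class/drm layout (cardN-CONNECTOR/status files and a cardN/device/driver symlink). With the proprietary driver, the NVIDIA connectors may only be listed when nvidia-drm modesetting is enabled.

```python
import os
from pathlib import Path


def list_connectors(drm_root="/sys/class/drm"):
    """Map each DRM connector name to (kernel driver, connection status)."""
    result = {}
    root = Path(drm_root)
    for entry in sorted(root.glob("card*-*")):
        card = entry.name.split("-", 1)[0]  # e.g. "card1" from "card1-DP-2"
        driver_link = root / card / "device" / "driver"
        if driver_link.exists():
            # The symlink target's last component is the driver name, e.g. "i915".
            driver = os.path.basename(os.path.realpath(driver_link))
        else:
            driver = "unknown"
        status_file = entry / "status"
        status = status_file.read_text().strip() if status_file.is_file() else "unknown"
        result[entry.name] = (driver, status)
    return result


if __name__ == "__main__":
    for name, (driver, status) in list_connectors().items():
        print(f"{name}: driver={driver} status={status}")
```

If the external monitor's connector shows up as connected under the card whose driver is nvidia rather than i915, that would match the wiring described in the comments above.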

Aleksandr Panzin (jalexoid) wrote :

I'll reproduce later today after hours.

Will also try to install 22.04 to see if it helps.

Daniel van Vugt (vanvugt) wrote :

Great, thanks for the explanation. So blaming just the nvidia-drm kernel module seems most reasonable right now. It is what's crashing after all.

Most likely the fix will come in the form of a new Nvidia driver. Please try installing it with apt install nvidia-driver-510 or via the 'Additional Drivers' app. You don't need to upgrade all of Ubuntu just yet.
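
After switching drivers, it can help to confirm which driver branch is actually loaded. The sketch below is a hedged illustration only: nvidia_module_version and driver_branch are hypothetical helper names, and the parsing assumes the "NVRM version" line format that Apport captured from /proc/driver/nvidia/version earlier in this report.

```python
import re
from pathlib import Path

# Path of the procfs file that Apport captured in the bug description.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def nvidia_module_version(text):
    """Extract the kernel module version string, or None if it is not found."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def driver_branch(version):
    """Return the major branch number, e.g. 470 for '470.103.01'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    if VERSION_FILE.is_file():
        version = nvidia_module_version(VERSION_FILE.read_text())
        if version:
            print(f"Loaded NVIDIA module {version} (branch {driver_branch(version)})")
        else:
            print("Could not parse the NVIDIA module version")
    else:
        print("NVIDIA kernel module does not appear to be loaded")
```

On the system described in this report, the captured text would yield version 470.103.01, that is, the 470 branch.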

no longer affects: mutter (Ubuntu)
no longer affects: linux (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

Would I be correct in thinking that the Nvidia-510 driver turned this into bug 1964037 and bug 1959888 instead? If so then we could just declare "Won't Fix" for nvidia-470.

Changed in mutter:
status: Unknown → New
Aleksandr Panzin (jalexoid) wrote :

An update:

With recent driver updates it seems that there's no problem anymore.

To get to this stage, I had to clear out any and all configurations and run update-initramfs.

Clean install works as well...

With one caveat: Alt-DP only works on one of the two USB-C (Thunderbolt) ports.
