[SRU] xserver crashes when hyperv_drm kernel module is loaded on azure NV series instances w/ nvidia grid driver
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
xorg-server (Ubuntu) | Fix Released | Undecided | Unassigned | |
Focal | Fix Released | Undecided | Mustafa Kemal Gilor | |
Bug Description
[ Impact ]
* Microsoft Azure NV-series instances with NVIDIA GRID drivers started to experience X server crashes after being set up according to Microsoft's official guide to installing NVIDIA drivers [1].
* Root cause analysis showed that it was due to having a device with BusID "PCI:0@
* Either removing the BusID specification or unloading the hyperv_drm kernel module avoids the crash.
* The crash happens while the X server is enumerating PCI devices: it dereferences a NULL pointer when it tries to access the PCI device info.
* The reason why it only happens while the hyperv_drm kernel module is loaded is that the hyperv_drm module does not expose PCI hardware information since it's a virtual device.
* The upstream patch [2] addresses the issue and it's confirmed that the xserver with the patch does not experience the crash.
* Ubuntu Focal `xorg-server` package does not include the patch [2] at the moment (xserver-
[1]: https:/
[2]: https:/
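The two conditions named in the root-cause analysis (the hyperv_drm module being loaded and an active BusID line in the X configuration) can be checked before starting the server. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and the default paths `/proc/modules` and `/etc/X11/xorg.conf` are assumptions that can be overridden through the arguments.

```shell
#!/bin/sh
# Sketch only: helper names and default paths are illustrative assumptions.

# Succeeds if the named module appears in a /proc/modules-style listing.
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    # The first field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit found ? 0 : 1 }' "$modules_file"
}

# Succeeds if the config file has a BusID line that is not commented out.
has_active_busid() {
    conf=${1:-/etc/X11/xorg.conf}
    grep -Eq '^[[:space:]]*BusID[[:space:]]' "$conf"
}

# Succeeds only when both conditions hold.
# $1 = X config file, $2 = modules listing (both optional).
crash_preconditions_met() {
    module_loaded hyperv_drm "${2:-/proc/modules}" &&
        has_active_busid "${1:-/etc/X11/xorg.conf}"
}

# Example use:
#   crash_preconditions_met && echo "this setup matches the reported conditions"
```

A successful return only means the setup matches the conditions described in this report; it does not prove that the installed xserver build is unpatched.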
[ Test Plan ]
Part (a) is quoted from Microsoft's official guide [1].
Part (a):
* Spawn a Microsoft Azure NV-series instance with an NVIDIA GRID-supported GPU
- e.g. `NV36adms A10`
* Install updates, required tooling, and the desktop environment:
- sudo apt-get update
- sudo apt-get upgrade -y
- sudo apt-get dist-upgrade -y
- sudo apt-get install build-essential ubuntu-desktop -y
- sudo apt-get install linux-azure -y
* Disable nouveau kernel driver:
# Create a blacklist file /etc/modprobe.
blacklist nouveau
blacklist lbm-nouveau
* Reboot the VM, reconnect, and then stop the X server:
- sudo reboot
# wait for the reboot, reconnect, and continue:
- sudo systemctl stop lightdm.service
* Download and install the NVIDIA GRID driver:
- wget -O NVIDIA-
- chmod +x NVIDIA-
- sudo ./NVIDIA-
- # When the setup asks whether you want to run the nvidia-xconfig utility to update your X configuration file, select Yes.
* Copy /etc/nvidia/
- sudo cp /etc/nvidia/
* Edit /etc/nvidia/
- sudo nano /etc/nvidia/
# Append the following lines:
IgnoreSP=FALSE
EnableUI=FALSE
# Remove this line if present:
FeatureType=0
# And save.
* Reboot the VM
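The configuration edit in Part (a) (append `IgnoreSP=FALSE` and `EnableUI=FALSE`, remove `FeatureType=0`) can also be scripted instead of done in an editor. This is a minimal sketch under stated assumptions: the function name is hypothetical, the file path must be supplied by the caller, and dropping earlier `IgnoreSP=`/`EnableUI=` lines so that the function can be re-run safely is an addition that the guide itself does not describe.

```shell
#!/bin/sh
# Sketch only: update_grid_conf is a hypothetical helper; pass the path of the
# NVIDIA configuration file edited in Part (a) as the first argument.
update_grid_conf() {
    conf=$1
    work=$(mktemp) || return 1
    # Keep every line except FeatureType=0 and (assumption, for idempotency)
    # any earlier IgnoreSP=/EnableUI= settings.
    grep -Ev '^[[:space:]]*(FeatureType=0[[:space:]]*$|IgnoreSP=|EnableUI=)' "$conf" > "$work"
    # Append the two settings named in the test plan.
    printf 'IgnoreSP=FALSE\nEnableUI=FALSE\n' >> "$work"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$work" > "$conf" && rm -f "$work"
}
```

The function needs write access to the target file, so on a real system it would have to run as root.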
Part (b):
* Ensure that the hyperv_drm kernel module is loaded:
- sudo modprobe hyperv_drm
* Replace /etc/X11/xorg.conf with the attached xorg.conf file
* Try to start the `xserver`:
- sudo startx
* `xserver` should crash with a similar output to the following:
X.Org X Server 1.20.13
X Protocol Version 11, Revision 0
Build Operating System: linux Ubuntu
Current Operating System: Linux a10test 5.15.0-1031-azure #38~20.04.1-Ubuntu SMP Mon Jan 9 18:23:48 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=
Build Date: 07 February 2023 12:48:13PM
xorg-server 2:1.20.
Current version of pixman: 0.38.4
Before reporting problems, check http://
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/
(==) Using config file: "/etc/X11/
(==) Using system config directory "/usr/share/
(EE)
(EE) Backtrace:
(EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+
(EE) 1: /lib/x86_
(EE) 2: /usr/lib/xorg/Xorg (xf86PlatformDe
(EE) 3: /usr/lib/xorg/Xorg (xf86PlatformMa
(EE) 4: /usr/lib/xorg/Xorg (xf86CallDriver
(EE) 5: /usr/lib/xorg/Xorg (xf86BusConfig+
(EE) 6: /usr/lib/xorg/Xorg (InitOutput+0x90b) [0x55e7786a59eb]
(EE) 7: /usr/lib/xorg/Xorg (InitFonts+0x1d4) [0x55e778667ea4]
(EE) 8: /lib/x86_
(EE) 9: /usr/lib/xorg/Xorg (_start+0x2e) [0x55e778651ace]
(EE)
(EE) Segmentation fault at address 0x124
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE)
(EE)
Please consult the The X.Org Foundation support
at http://
for help.
(EE) Please also check the log file at "/var/log/
(EE)
(EE) Server terminated with error (1). Closing log file.
^Cxinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: unexpected signal 2
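When the test plan is run repeatedly, the crash signature shown above can be detected from a saved log rather than by eye. The helpers below are a sketch with hypothetical names; they assume the log contains the same `(EE)` lines as the output above and take the log file path as an argument, since that path is whatever the server reports in its own output.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Succeeds if the log contains the segmentation-fault lines seen in the report.
xorg_log_has_segfault() {
    log=$1
    grep -Eq '\(EE\) (Caught signal 11|Segmentation fault at address)' "$log"
}

# Prints the numbered backtrace frames, e.g. "(EE) 6: /usr/lib/xorg/Xorg (...".
xorg_log_backtrace() {
    log=$1
    grep -E '\(EE\) [0-9]+: ' "$log"
}

# Example use (the path is a placeholder):
#   if xorg_log_has_segfault path/to/Xorg.log; then
#       xorg_log_backtrace path/to/Xorg.log
#   fi
```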
# To verify that the patch fixes the issue:
* Enable the following PPA that includes the fix:
- sudo add-apt-repository ppa:mustafakema
- sudo apt update
* Install the package
- sudo apt install xserver-
* Try to start xserver:
- sudo startx
* xserver should not crash.
[ Where problems could occur ]
* The regression risk is low: the patch is well isolated and only adds a NULL check for a pointer that the existing code assumed was always valid.
[ Other Info ]
* workaround #1: unload hyperv_drm kernel module:
- sudo modprobe -r hyperv_drm
* workaround #2: Comment out BusID line in /etc/X11/xorg.conf [Device] section:
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
# BusID "PCI:0@32828:0:0"
Option "HardDPMS" "false"
Option "CustomEDID" "DFP-0:
EndSection
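Workaround #2 can be applied without opening an editor. The sketch below uses a hypothetical helper name; it keeps a `.bak` copy next to the file (an assumption, not something the report asks for) and comments out every active BusID line, not only the one in the "Device0" section.

```shell
#!/bin/sh
# Sketch only: comment_out_busid is a hypothetical helper.
# $1 = X configuration file, e.g. /etc/X11/xorg.conf
comment_out_busid() {
    conf=$1
    # Keep a backup so the change can be reverted.
    cp -p "$conf" "$conf.bak" || return 1
    # Prefix each active BusID line with "# ", preserving its indentation.
    # Lines that are already commented out do not match and stay unchanged.
    sed -E 's/^([[:space:]]*)(BusID[[:space:]].*)$/\1# \2/' "$conf.bak" > "$conf"
}
```

Editing /etc/X11/xorg.conf requires root, and the X server has to be restarted for the change to take effect.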
Related branches
- Ubuntu-X: Pending requested
- Diff: 85 lines (+63/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/lp2007746-fix-pdev-null-deref.patch (+55/-0)
debian/patches/series (+1/-0)
no longer affects: | xorg-server (Ubuntu Bionic) |
no longer affects: | xorg-server (Ubuntu Jammy) |
no longer affects: | xorg-server (Ubuntu Kinetic) |
no longer affects: | xorg-server (Ubuntu Lunar) |
Changed in xorg-server (Ubuntu Focal): | |
status: | New → In Progress |
assignee: | nobody → Mustafa Kemal Gilor (mustafakemalgilor) |
description: | updated |
tags: | added: focal |
Changed in xorg-server (Ubuntu): | |
status: | New → Fix Released |
tags: | added: se-sponsor-dgadomski |
tags: | added: se sts |
The relevant function is absent in Bionic, and Jammy is based on an upstream version that contains the fix, so I presume Focal is the only affected series right now.