[SRU] xserver crashes when hyperv_drm kernel module is loaded on azure NV series instances w/ nvidia grid driver

Bug #2007746 reported by Mustafa Kemal Gilor
This bug affects 1 person
Affects               Status        Importance  Assigned to          Milestone
xorg-server (Ubuntu)  Fix Released  Undecided   Unassigned
Focal                 Fix Released  Undecided   Mustafa Kemal Gilor

Bug Description

[ Impact ]

 * Microsoft Azure NV-series instances with NVidia GRID drivers experience xserver crashes when set up by following Microsoft's official guide to installing NVidia drivers [1].

 * Root cause analysis showed that the crash occurs when a device is configured with BusID "PCI:0@<domain_id>:0:0", where the domain ID is >= 32767, while the hyperv_drm kernel module is loaded.

 * Either removing the BusID specification or unloading the hyperv_drm kernel module appears to avoid the crash.

 * The crash happens while the X server is enumerating PCI devices: it dereferences a NULL pointer while trying to access the PCI device info.

 * The reason why it only happens while the hyperv_drm kernel module is loaded is that the hyperv_drm module does not expose PCI hardware information since it's a virtual device.

 * The upstream patch [2] addresses the issue, and an xserver built with the patch has been confirmed not to crash.

 * Ubuntu Focal `xorg-server` package does not include the patch [2] at the moment (xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6).

 [1]: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-drivers-on-nv-or-nvv3-series-vms
 [2]: https://github.com/freedesktop/xorg-xserver/commit/0d93bbfa2cfacbb73741f8bed0e32fa1a656b928
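
The two crash conditions described above (a BusID with a PCI domain >= 32767 and a loaded hyperv_drm module) can be checked before starting the X server. The following is only a minimal sketch: the function names are hypothetical, the BusID pattern assumes the "PCI:<bus>@<domain>:<dev>:<func>" form shown in this report, and the module list is assumed to be in /proc/modules format.

```shell
# Hypothetical helper: print the PCI domain from the first active
# (not commented-out) BusID line of an xorg.conf-style file.
busid_domain() {
    # Matches e.g.:   BusID "PCI:0@32828:0:0"
    sed -n 's/^[[:space:]]*BusID[[:space:]]*"PCI:[0-9]*@\([0-9]*\):.*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style file (module name is the first field).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Succeed only if both conditions named in this report are present.
crash_conditions_met() {
    conf=$1
    modules=${2:-/proc/modules}
    domain=$(busid_domain "$conf")
    [ -n "$domain" ] && [ "$domain" -ge 32767 ] && module_loaded hyperv_drm "$modules"
}
```

For example, `crash_conditions_met /etc/X11/xorg.conf && echo "affected configuration"` would report whether the current machine matches the conditions; it does not test whether the installed xserver already carries the fix.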

[ Test Plan ]

Part (a) is quoted from Microsoft's official guide [1].

Part (a):

 * Spawn a Microsoft Azure NV-series instance with an NVidia GRID-supported GPU
   - e.g. `NV36adms A10`
 * Install updates, required tooling, and the desktop environment:
   - sudo apt-get update
   - sudo apt-get upgrade -y
   - sudo apt-get dist-upgrade -y
   - sudo apt-get install build-essential ubuntu-desktop -y
   - sudo apt-get install linux-azure -y
 * Disable nouveau kernel driver:
   # Create a blacklist file /etc/modprobe.d/nouveau.conf with the following contents:
   blacklist nouveau
   blacklist lbm-nouveau
 * Reboot the VM, re-connect, and then stop X server:
   - sudo reboot
   # wait for the reboot, reconnect, and continue:
   - sudo systemctl stop lightdm.service
 * Download and install the NVidia GRID driver:
   - wget -O NVIDIA-Linux-x86_64-grid.run https://go.microsoft.com/fwlink/?linkid=874272
   - chmod +x NVIDIA-Linux-x86_64-grid.run
   - sudo ./NVIDIA-Linux-x86_64-grid.run
   - # When the setup asks whether you want to run the nvidia-xconfig utility to update your X configuration file, select Yes.
 * Copy /etc/nvidia/gridd.conf.template to /etc/nvidia/gridd.conf
   - sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
 * Edit /etc/nvidia/gridd.conf
   - sudo nano /etc/nvidia/gridd.conf
   # Append the following lines:
   IgnoreSP=FALSE
   EnableUI=FALSE
   # Remove this line if present:
   FeatureType=0
   # And save.
 * Reboot the VM
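
The two file edits in Part (a) (the nouveau blacklist and the gridd.conf changes) can also be applied without an interactive editor. This is a sketch under assumptions, not part of Microsoft's guide: the function names are hypothetical, the functions need to run as root when pointed at the real paths, and, unlike the manual steps, update_gridd_conf also drops earlier copies of the two appended keys so that it can be re-run safely.

```shell
# Write the nouveau blacklist file with the contents listed in the steps above.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\nblacklist lbm-nouveau\n' > "${1:-/etc/modprobe.d/nouveau.conf}"
}

# Apply the gridd.conf edits: remove any FeatureType=0 line (and earlier
# copies of the two keys), then append IgnoreSP=FALSE and EnableUI=FALSE.
update_gridd_conf() {
    conf=${1:-/etc/nvidia/gridd.conf}
    tmp=$(mktemp) || return 1
    sed -e '/^FeatureType=0[[:space:]]*$/d' \
        -e '/^IgnoreSP=/d' \
        -e '/^EnableUI=/d' "$conf" > "$tmp" || return 1
    printf 'IgnoreSP=FALSE\nEnableUI=FALSE\n' >> "$tmp"
    # Overwrite in place so the original file's owner and mode are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Both functions take the target path as an optional argument, which makes it possible to try them on a copy before touching the files under /etc.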

 Part (b):

  * Ensure that the hyperv_drm kernel module is loaded:
    - sudo modprobe hyperv_drm
  * Use the attached xorg.conf file to override /etc/X11/xorg.conf file
  * Try to start the `xserver`:
    - sudo startx
  * `xserver` should crash with output similar to the following:
  X.Org X Server 1.20.13
  X Protocol Version 11, Revision 0
  Build Operating System: linux Ubuntu
  Current Operating System: Linux a10test 5.15.0-1031-azure #38~20.04.1-Ubuntu SMP Mon Jan 9 18:23:48 UTC 2023 x86_64
  Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-azure root=PARTUUID=4cac852b-afba-447b-b3e7-c002155c1305 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 panic=-1
  Build Date: 07 February 2023 12:48:13PM
  xorg-server 2:1.20.13-1ubuntu1~20.04.6 (For technical support please see http://www.ubuntu.com/support)
  Current version of pixman: 0.38.4
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
  Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
  (==) Log file: "/var/log/Xorg.1.log", Time: Sat Feb 18 10:54:26 2023
  (==) Using config file: "/etc/X11/xorg.conf"
  (==) Using system config directory "/usr/share/X11/xorg.conf.d"
  (EE)
  (EE) Backtrace:
  (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x55e7787c5ecc]
  (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7f9576cac420]
  (EE) 2: /usr/lib/xorg/Xorg (xf86PlatformDeviceCheckBusID+0xa7) [0x55e7786c4db7]
  (EE) 3: /usr/lib/xorg/Xorg (xf86PlatformMatchDriver+0x700) [0x55e7786bf1b0]
  (EE) 4: /usr/lib/xorg/Xorg (xf86CallDriverProbe+0x5c) [0x55e7786971dc]
  (EE) 5: /usr/lib/xorg/Xorg (xf86BusConfig+0x43) [0x55e778697b23]
  (EE) 6: /usr/lib/xorg/Xorg (InitOutput+0x90b) [0x55e7786a59eb]
  (EE) 7: /usr/lib/xorg/Xorg (InitFonts+0x1d4) [0x55e778667ea4]
  (EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf3) [0x7f9576ac8083]
  (EE) 9: /usr/lib/xorg/Xorg (_start+0x2e) [0x55e778651ace]
  (EE)
  (EE) Segmentation fault at address 0x124
  (EE)
  Fatal server error:
  (EE) Caught signal 11 (Segmentation fault). Server aborting
  (EE)
  (EE)
  Please consult the The X.Org Foundation support
     at http://wiki.x.org
   for help.
  (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
  (EE)
  (EE) Server terminated with error (1). Closing log file.
  ^Cxinit: giving up
  xinit: unable to connect to X server: Connection refused
  xinit: unexpected signal 2
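
When running the test plan repeatedly, the crash signature above can be detected from the log instead of being read by eye. A minimal sketch, assuming the log file named in the output (/var/log/Xorg.1.log here) contains the same backtrace and signal lines as the console; the function name is hypothetical.

```shell
# Succeed if the given Xorg log shows both the faulting function from the
# backtrace and the segmentation fault message quoted in this report.
log_has_busid_crash() {
    log=${1:-/var/log/Xorg.1.log}
    grep -qF 'xf86PlatformDeviceCheckBusID' "$log" &&
        grep -qF 'Caught signal 11 (Segmentation fault)' "$log"
}
```

A call such as `log_has_busid_crash /var/log/Xorg.1.log && echo "crash reproduced"` can then serve as the pass/fail check for Part (b).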

# To verify that the patch fixes the issue:
* Enable the following PPA that includes the fix:
  - sudo add-apt-repository ppa:mustafakemalgilor/lp2007746
  - sudo apt update
* Install the package
  - sudo apt install xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6ubuntu1
* Try to start xserver:
  - sudo startx
* xserver should not crash.

[ Where problems could occur ]

 * The regression risk is low, given that the patch is well-isolated and basically adds a null check that is already assumed to be there in the first place.

[ Other Info ]

 * workaround #1: unload hyperv_drm kernel module:
   - sudo modprobe -r hyperv_drm
 * workaround #2: Comment out BusID line in /etc/X11/xorg.conf [Device] section:
   Section "Device"
      Identifier "Device0"
      Driver "nvidia"
      VendorName "NVIDIA Corporation"
      # BusID "PCI:0@32828:0:0"
      Option "HardDPMS" "false"
      Option "CustomEDID" "DFP-0:/etc/X11/vdisplay.edid"
   EndSection
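
Workaround #2 can be applied non-interactively as well. The following is a sketch only: the function name and the ".bak" backup suffix are assumptions, and the function needs root when used on /etc/X11/xorg.conf. Lines that are already commented out are left untouched.

```shell
# Comment out every active BusID line in an xorg.conf-style file,
# keeping a backup copy next to it.
comment_out_busid() {
    conf=${1:-/etc/X11/xorg.conf}
    cp "$conf" "$conf.bak" &&
        sed 's/^\([[:space:]]*\)\(BusID[[:space:]]\)/\1# \2/' "$conf.bak" > "$conf"
}
```

Running it a second time makes no further changes to the file contents, although it does overwrite the backup with the already-edited file.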

Revision history for this message
Mustafa Kemal Gilor (mustafakemalgilor) wrote :
no longer affects: xorg-server (Ubuntu Bionic)
no longer affects: xorg-server (Ubuntu Jammy)
no longer affects: xorg-server (Ubuntu Kinetic)
no longer affects: xorg-server (Ubuntu Lunar)
Changed in xorg-server (Ubuntu Focal):
status: New → In Progress
assignee: nobody → Mustafa Kemal Gilor (mustafakemalgilor)
Revision history for this message
Mustafa Kemal Gilor (mustafakemalgilor) wrote :

The relevant function is absent in Bionic, and Jammy is based on an upstream version that already contains the fix, so I presume Focal is the only affected series right now.

description: updated
tags: added: focal
Changed in xorg-server (Ubuntu):
status: New → Fix Released
tags: added: se-sponsor-dgadomski
tags: added: se sts
Revision history for this message
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Mustafa, or anyone else affected,

Accepted xorg-server into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/xorg-server/2:1.20.13-1ubuntu1~20.04.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in xorg-server (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Mustafa Kemal Gilor (mustafakemalgilor) wrote :

Tested the proposed package (2:1.20.13-1ubuntu1~20.04.7) with the test plan; it no longer crashes and behaves as expected.

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.20.13-1ubuntu1~20.04.8

---------------
xorg-server (2:1.20.13-1ubuntu1~20.04.8) focal-security; urgency=medium

  * SECURITY UPDATE: Overlay Window Use-After-Free
    - debian/patches/CVE-2023-1393.patch: fix use-after-free of the COW in
      composite/compwindow.c.
    - CVE-2023-1393

 -- Marc Deslauriers <email address hidden> Wed, 29 Mar 2023 08:53:02 -0400

Changed in xorg-server (Ubuntu Focal):
status: Fix Committed → Fix Released