Activity log for bug #2007746

Date Who What changed Old value New value Message
2023-02-18 11:47:38 Mustafa Kemal Gilor bug added bug
2023-02-18 11:47:38 Mustafa Kemal Gilor attachment added xorg.conf https://bugs.launchpad.net/bugs/2007746/+attachment/5648222/+files/xorg.conf
2023-02-18 11:51:27 Mustafa Kemal Gilor nominated for series Ubuntu Kinetic
2023-02-18 11:51:27 Mustafa Kemal Gilor bug task added xorg-server (Ubuntu Kinetic)
2023-02-18 11:51:27 Mustafa Kemal Gilor nominated for series Ubuntu Focal
2023-02-18 11:51:27 Mustafa Kemal Gilor bug task added xorg-server (Ubuntu Focal)
2023-02-18 11:51:27 Mustafa Kemal Gilor nominated for series Ubuntu Jammy
2023-02-18 11:51:27 Mustafa Kemal Gilor bug task added xorg-server (Ubuntu Jammy)
2023-02-18 11:51:27 Mustafa Kemal Gilor nominated for series Ubuntu Lunar
2023-02-18 11:51:27 Mustafa Kemal Gilor bug task added xorg-server (Ubuntu Lunar)
2023-02-18 11:51:27 Mustafa Kemal Gilor nominated for series Ubuntu Bionic
2023-02-18 11:51:27 Mustafa Kemal Gilor bug task added xorg-server (Ubuntu Bionic)
2023-02-18 11:53:16 Mustafa Kemal Gilor bug task deleted xorg-server (Ubuntu Bionic)
2023-02-18 11:53:21 Mustafa Kemal Gilor bug task deleted xorg-server (Ubuntu Jammy)
2023-02-18 11:53:25 Mustafa Kemal Gilor bug task deleted xorg-server (Ubuntu Kinetic)
2023-02-18 11:53:29 Mustafa Kemal Gilor bug task deleted xorg-server (Ubuntu Lunar)
2023-02-18 11:53:36 Mustafa Kemal Gilor xorg-server (Ubuntu Focal): status New In Progress
2023-02-18 11:53:38 Mustafa Kemal Gilor xorg-server (Ubuntu Focal): assignee Mustafa Kemal Gilor (mustafakemalgilor)
2023-02-18 12:48:37 Mustafa Kemal Gilor bug added subscriber SE SRU ("STS") Sponsors
2023-02-18 13:07:36 Launchpad Janitor merge proposal linked https://code.launchpad.net/~mustafakemalgilor/ubuntu/+source/xorg-server/+git/xorg-server/+merge/437541
2023-02-18 13:18:20 Mustafa Kemal Gilor description
[ Impact ]

 * Microsoft Azure NV-series instances with NVIDIA GRID drivers, set up by following Microsoft's official driver installation guide [1], started to experience xserver crashes.
 * Root cause analysis showed that the crash is triggered by a device with BusID "PCI:0@<domain_id>:0:0", where the domain ID is >= 32767, while the hyperv_drm kernel module is loaded.
 * Either removing the BusID specification or unloading the hyperv_drm kernel module seems to fix the crash.
 * The crash happens while the X server is enumerating PCI devices: it dereferences a NULL pointer while trying to access the PCI device info.
 * It only happens while the hyperv_drm kernel module is loaded because hyperv_drm is a virtual device and does not expose PCI hardware information.
 * The upstream patch [2] addresses the issue, and it is confirmed that an xserver with the patch does not crash.
 * The Ubuntu Focal `xorg-server` package does not include the patch [2] at the moment (xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6).

[1]: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-drivers-on-nv-or-nvv3-series-vms
[2]: https://github.com/freedesktop/xorg-xserver/commit/0d93bbfa2cfacbb73741f8bed0e32fa1a656b928

[ Test Plan ]

Part (a) is quoted from Microsoft's official guide [1].

Part (a):

 * Spawn a Microsoft Azure NV-series instance with an NVIDIA GRID-supported GPU
   - e.g. `NV36adms A10`
 * Install updates, required tooling, and the desktop environment:
   - sudo apt-get update
   - sudo apt-get upgrade -y
   - sudo apt-get dist-upgrade -y
   - sudo apt-get install build-essential ubuntu-desktop -y
   - sudo apt-get install linux-azure -y
 * Disable the nouveau kernel driver:
   # Create a blacklist file /etc/modprobe.d/nouveau.conf with the following contents:
   blacklist nouveau
   blacklist lbm-nouveau
 * Reboot the VM, reconnect, and then stop the X server:
   - sudo reboot
   # wait for the reboot, reconnect, and continue:
   - sudo systemctl stop lightdm.service
 * Download and install the NVIDIA GRID driver:
   - wget -O NVIDIA-Linux-x86_64-grid.run https://go.microsoft.com/fwlink/?linkid=874272
   - chmod +x NVIDIA-Linux-x86_64-grid.run
   - sudo ./NVIDIA-Linux-x86_64-grid.run
   - # When the setup asks whether you want to run the nvidia-xconfig utility to update your X configuration file, select Yes.
 * Copy /etc/nvidia/gridd.conf.template to /etc/nvidia/gridd.conf
   - sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
 * Edit /etc/nvidia/gridd.conf
   - sudo nano /etc/nvidia/gridd.conf
   # Append the following lines:
   IgnoreSP=FALSE
   EnableUI=FALSE
   # Remove this line if present:
   FeatureType=0
   # And save.
 * Reboot the VM

Part (b):

 * Ensure that the hyperv_drm kernel module is loaded:
   - sudo modprobe hyperv_drm
 * Use the attached xorg.conf file to override the /etc/X11/xorg.conf file
 * Try to start the `xserver`:
   - sudo startx
 * `xserver` should crash with output similar to the following:

  X.Org X Server 1.20.13
  X Protocol Version 11, Revision 0
  Build Operating System: linux Ubuntu
  Current Operating System: Linux a10test 5.15.0-1031-azure #38~20.04.1-Ubuntu SMP Mon Jan 9 18:23:48 UTC 2023 x86_64
  Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-azure root=PARTUUID=4cac852b-afba-447b-b3e7-c002155c1305 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 panic=-1
  Build Date: 07 February 2023 12:48:13PM
  xorg-server 2:1.20.13-1ubuntu1~20.04.6 (For technical support please see http://www.ubuntu.com/support)
  Current version of pixman: 0.38.4
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
  Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
  (==) Log file: "/var/log/Xorg.1.log", Time: Sat Feb 18 10:54:26 2023
  (==) Using config file: "/etc/X11/xorg.conf"
  (==) Using system config directory "/usr/share/X11/xorg.conf.d"
  (EE)
  (EE) Backtrace:
  (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x55e7787c5ecc]
  (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7f9576cac420]
  (EE) 2: /usr/lib/xorg/Xorg (xf86PlatformDeviceCheckBusID+0xa7) [0x55e7786c4db7]
  (EE) 3: /usr/lib/xorg/Xorg (xf86PlatformMatchDriver+0x700) [0x55e7786bf1b0]
  (EE) 4: /usr/lib/xorg/Xorg (xf86CallDriverProbe+0x5c) [0x55e7786971dc]
  (EE) 5: /usr/lib/xorg/Xorg (xf86BusConfig+0x43) [0x55e778697b23]
  (EE) 6: /usr/lib/xorg/Xorg (InitOutput+0x90b) [0x55e7786a59eb]
  (EE) 7: /usr/lib/xorg/Xorg (InitFonts+0x1d4) [0x55e778667ea4]
  (EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf3) [0x7f9576ac8083]
  (EE) 9: /usr/lib/xorg/Xorg (_start+0x2e) [0x55e778651ace]
  (EE)
  (EE) Segmentation fault at address 0x124
  (EE)
  Fatal server error:
  (EE) Caught signal 11 (Segmentation fault). Server aborting
  (EE)
  (EE)
  Please consult the The X.Org Foundation support
     at http://wiki.x.org
   for help.
  (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
  (EE)
  (EE) Server terminated with error (1). Closing log file.
  ^Cxinit: giving up
  xinit: unable to connect to X server: Connection refused
  xinit: unexpected signal 2

[ Where problems could occur ]

 * The regression risk is low, given that the patch is well-isolated and basically adds a null check that is already assumed to be there in the first place.

[ Other Info ]

 * workaround #1: unload the hyperv_drm kernel module:
   - sudo modprobe -r hyperv_drm
 * workaround #2: comment out the BusID line in the "Device" section of /etc/X11/xorg.conf:
   Section "Device"
       Identifier "Device0"
       Driver "nvidia"
       VendorName "NVIDIA Corporation"
       # BusID "PCI:0@32828:0:0"
       Option "HardDPMS" "false"
       Option "CustomEDID" "DFP-0:/etc/X11/vdisplay.edid"
   EndSection

[ Impact ]

 * Microsoft Azure NV-series instances with NVIDIA GRID drivers, set up by following Microsoft's official driver installation guide [1], started to experience xserver crashes.
 * Root cause analysis showed that the crash is triggered by a device with BusID "PCI:0@<domain_id>:0:0", where the domain ID is >= 32767, while the hyperv_drm kernel module is loaded.
 * Either removing the BusID specification or unloading the hyperv_drm kernel module seems to fix the crash.
 * The crash happens while the X server is enumerating PCI devices: it dereferences a NULL pointer while trying to access the PCI device info.
 * It only happens while the hyperv_drm kernel module is loaded because hyperv_drm is a virtual device and does not expose PCI hardware information.
 * The upstream patch [2] addresses the issue, and it is confirmed that an xserver with the patch does not crash.
 * The Ubuntu Focal `xorg-server` package does not include the patch [2] at the moment (xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6).

[1]: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-drivers-on-nv-or-nvv3-series-vms
[2]: https://github.com/freedesktop/xorg-xserver/commit/0d93bbfa2cfacbb73741f8bed0e32fa1a656b928

[ Test Plan ]

Part (a) is quoted from Microsoft's official guide [1].

Part (a):

 * Spawn a Microsoft Azure NV-series instance with an NVIDIA GRID-supported GPU
   - e.g. `NV36adms A10`
 * Install updates, required tooling, and the desktop environment:
   - sudo apt-get update
   - sudo apt-get upgrade -y
   - sudo apt-get dist-upgrade -y
   - sudo apt-get install build-essential ubuntu-desktop -y
   - sudo apt-get install linux-azure -y
 * Disable the nouveau kernel driver:
   # Create a blacklist file /etc/modprobe.d/nouveau.conf with the following contents:
   blacklist nouveau
   blacklist lbm-nouveau
 * Reboot the VM, reconnect, and then stop the X server:
   - sudo reboot
   # wait for the reboot, reconnect, and continue:
   - sudo systemctl stop lightdm.service
 * Download and install the NVIDIA GRID driver:
   - wget -O NVIDIA-Linux-x86_64-grid.run https://go.microsoft.com/fwlink/?linkid=874272
   - chmod +x NVIDIA-Linux-x86_64-grid.run
   - sudo ./NVIDIA-Linux-x86_64-grid.run
   - # When the setup asks whether you want to run the nvidia-xconfig utility to update your X configuration file, select Yes.
 * Copy /etc/nvidia/gridd.conf.template to /etc/nvidia/gridd.conf
   - sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
 * Edit /etc/nvidia/gridd.conf
   - sudo nano /etc/nvidia/gridd.conf
   # Append the following lines:
   IgnoreSP=FALSE
   EnableUI=FALSE
   # Remove this line if present:
   FeatureType=0
   # And save.
 * Reboot the VM

Part (b):

 * Ensure that the hyperv_drm kernel module is loaded:
   - sudo modprobe hyperv_drm
 * Use the attached xorg.conf file to override the /etc/X11/xorg.conf file
 * Try to start the `xserver`:
   - sudo startx
 * `xserver` should crash with output similar to the following:

  X.Org X Server 1.20.13
  X Protocol Version 11, Revision 0
  Build Operating System: linux Ubuntu
  Current Operating System: Linux a10test 5.15.0-1031-azure #38~20.04.1-Ubuntu SMP Mon Jan 9 18:23:48 UTC 2023 x86_64
  Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-azure root=PARTUUID=4cac852b-afba-447b-b3e7-c002155c1305 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 panic=-1
  Build Date: 07 February 2023 12:48:13PM
  xorg-server 2:1.20.13-1ubuntu1~20.04.6 (For technical support please see http://www.ubuntu.com/support)
  Current version of pixman: 0.38.4
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
  Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
  (==) Log file: "/var/log/Xorg.1.log", Time: Sat Feb 18 10:54:26 2023
  (==) Using config file: "/etc/X11/xorg.conf"
  (==) Using system config directory "/usr/share/X11/xorg.conf.d"
  (EE)
  (EE) Backtrace:
  (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x55e7787c5ecc]
  (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7f9576cac420]
  (EE) 2: /usr/lib/xorg/Xorg (xf86PlatformDeviceCheckBusID+0xa7) [0x55e7786c4db7]
  (EE) 3: /usr/lib/xorg/Xorg (xf86PlatformMatchDriver+0x700) [0x55e7786bf1b0]
  (EE) 4: /usr/lib/xorg/Xorg (xf86CallDriverProbe+0x5c) [0x55e7786971dc]
  (EE) 5: /usr/lib/xorg/Xorg (xf86BusConfig+0x43) [0x55e778697b23]
  (EE) 6: /usr/lib/xorg/Xorg (InitOutput+0x90b) [0x55e7786a59eb]
  (EE) 7: /usr/lib/xorg/Xorg (InitFonts+0x1d4) [0x55e778667ea4]
  (EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf3) [0x7f9576ac8083]
  (EE) 9: /usr/lib/xorg/Xorg (_start+0x2e) [0x55e778651ace]
  (EE)
  (EE) Segmentation fault at address 0x124
  (EE)
  Fatal server error:
  (EE) Caught signal 11 (Segmentation fault). Server aborting
  (EE)
  (EE)
  Please consult the The X.Org Foundation support
     at http://wiki.x.org
   for help.
  (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
  (EE)
  (EE) Server terminated with error (1). Closing log file.
  ^Cxinit: giving up
  xinit: unable to connect to X server: Connection refused
  xinit: unexpected signal 2

# To verify that the patch fixes the issue:

 * Enable the following PPA that includes the fix:
   - sudo add-apt-repository ppa:mustafakemalgilor/lp2007746
   - sudo apt update
 * Install the package
   - sudo apt install xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6ubuntu1
 * Try to start xserver:
   - sudo startx
 * xserver should not crash.

[ Where problems could occur ]

 * The regression risk is low, given that the patch is well-isolated and basically adds a null check that is already assumed to be there in the first place.

[ Other Info ]

 * workaround #1: unload the hyperv_drm kernel module:
   - sudo modprobe -r hyperv_drm
 * workaround #2: comment out the BusID line in the "Device" section of /etc/X11/xorg.conf:
   Section "Device"
       Identifier "Device0"
       Driver "nvidia"
       VendorName "NVIDIA Corporation"
       # BusID "PCI:0@32828:0:0"
       Option "HardDPMS" "false"
       Option "CustomEDID" "DFP-0:/etc/X11/vdisplay.edid"
   EndSection
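
The two crash conditions named in the description (hyperv_drm loaded, plus an active BusID whose PCI domain is >= 32767) could be checked before starting the X server. The following is a minimal shell sketch and not part of the bug report: the function names are hypothetical, the 32767 threshold and the "PCI:<bus>@<domain>:<device>:<function>" layout are taken from the description above, and the keyword match is case-sensitive as a simplification.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions, not an existing tool.

# Print the PCI domain of every active (uncommented) BusID line of the form
#   BusID "PCI:<bus>@<domain>:<device>:<function>"
# found in the xorg.conf file given as $1.
busid_domains() {
    sed -n 's/^[[:space:]]*BusID[[:space:]]*"PCI:[0-9]*@\([0-9]*\):.*/\1/p' "$1"
}

# Succeed if module $1 is listed in a /proc/modules-style file ($2, optional).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Succeed if both conditions from the root cause analysis hold:
# hyperv_drm is loaded and some active BusID has a domain >= 32767.
crash_conditions_met() {
    conf=$1
    modules=${2:-/proc/modules}
    module_loaded hyperv_drm "$modules" || return 1
    for d in $(busid_domains "$conf"); do
        [ "$d" -ge 32767 ] && return 0
    done
    return 1
}
```

For example, `crash_conditions_met /etc/X11/xorg.conf && echo "affected: apply workaround #1 or #2"` would flag a VM that matches both conditions; a commented-out BusID line (workaround #2) is ignored by the check.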
2023-02-20 05:26:18 Daniel van Vugt tags focal
2023-02-20 05:26:23 Daniel van Vugt xorg-server (Ubuntu): status New Fix Released
2023-02-20 05:26:29 Daniel van Vugt bug added subscriber Daniel van Vugt
2023-03-01 15:33:46 Dariusz Gadomski tags focal focal se-sponsor-dgadomski
2023-03-02 16:27:08 Dariusz Gadomski bug added subscriber Dariusz Gadomski
2023-03-15 11:21:04 Dariusz Gadomski bug added subscriber Ubuntu Sponsors Team
2023-03-15 11:21:08 Dariusz Gadomski removed subscriber Ubuntu Sponsors Team
2023-03-15 15:22:39 Dariusz Gadomski tags focal se-sponsor-dgadomski focal se se-sponsor-dgadomski sts
2023-03-29 07:16:14 Chris Halse Rogers xorg-server (Ubuntu Focal): status In Progress Fix Committed
2023-03-29 07:16:16 Chris Halse Rogers bug added subscriber Ubuntu Stable Release Updates Team
2023-03-29 07:16:18 Chris Halse Rogers bug added subscriber SRU Verification
2023-03-29 07:16:23 Chris Halse Rogers tags focal se se-sponsor-dgadomski sts focal se se-sponsor-dgadomski sts verification-needed verification-needed-focal
2023-03-29 12:50:01 Mustafa Kemal Gilor tags focal se se-sponsor-dgadomski sts verification-needed verification-needed-focal focal se se-sponsor-dgadomski sts verification-done-focal
2023-03-29 18:08:18 Launchpad Janitor xorg-server (Ubuntu Focal): status Fix Committed Fix Released
2023-03-29 18:08:18 Launchpad Janitor cve linked 2023-1393