Xorg crashes in stdio functions under pci_device_vgaarb_set_target() from VGAGet() from VGAarbiterSpriteMoveCursor()

Bug #1839174 reported by Wes
This bug affects 3 people
Affects                          Status        Importance  Assigned to  Milestone
X.Org X server                   Fix Released  Medium
xorg-server (Ubuntu)             Fix Released  Undecided   Unassigned
  Bionic                         Won't Fix     Low         Unassigned
  Disco                          Fix Released  Undecided   Unassigned
  Eoan                           Fix Released  Undecided   Unassigned
xorg-server-hwe-18.04 (Ubuntu)   Fix Released  Undecided   Unassigned
  Bionic                         Fix Released  Undecided   Unassigned

Bug Description

Issue
Xorg crashes with no repeatable trigger. It happens while just watching Hulu or YouTube, while using Citrix, and while playing Steam games.

Steps to reproduce
Use Xorg for a while.

Expected behaviour
Xorg does not crash.

Other information
The issue only happens when using PCIe passthrough to QEMU virtual machines.

This has been an issue since I set up PCIe passthrough. I have reinstalled Linux Mint (from the XFCE version), but the error persists.
I upgraded the BIOS to the most recent version, reset it to factory defaults, and changed numerous BIOS settings; none of this made a difference.
I tried the latest Linux 5.x kernels from ukuu; same issue.
The virtual machine has no problems: it never crashes and keeps running when Xorg crashes on the host OS.

System: Host: drac Kernel: 4.15.0-55-generic x86_64 bits: 64 compiler: gcc v: 7.4.0 Desktop: Cinnamon 4.2.3
           Distro: Linux Mint 19.2 Tina base: Ubuntu 18.04 bionic
Machine: Type: Server System: Supermicro product: Super Server v: 0123456789 serial: <filter>
           Mobo: Supermicro model: X10SRL-F v: 1.01B serial: <filter> UEFI: American Megatrends v: 3.1c date: 05/02/2019
Graphics: Device-1: NVIDIA GP104 [GeForce GTX 1070] vendor: Micro-Star MSI driver: nvidia v: 430.34 bus ID: 03:00.0
           Device-2: NVIDIA GK104 [GeForce GTX 770] vendor: eVga.com. driver: vfio-pci v: 0.2 bus ID: 04:00.0
           Device-3: ASPEED Graphics Family vendor: Super Micro driver: ast v: kernel bus ID: 0b:00.0
           Display: x11 server: X.Org 1.19.6 driver: nvidia resolution: 1920x1080~60Hz, 1920x1080~60Hz
           OpenGL: renderer: GeForce GTX 1070/PCIe/SSE2 v: 4.6.0 NVIDIA 430.34 direct render: Yes
CPU: Topology: 6-Core model: Intel Xeon E5-1650 v4 bits: 64 type: MT MCP arch: Broadwell rev: 1 L2 cache: 15.0 MiB
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 86397
           Speed: 1203 MHz min/max: N/A Core speeds (MHz): 1: 1203 2: 1206 3: 1330 4: 1200 5: 1201 6: 1201 7: 1200 8: 1200
           9: 2046 10: 1202 11: 1202 12: 1201

Not using PPA builds
Linux Mint 19.2 (Upgraded from 19.1) 64bit

root@drac:/vms# apt-cache policy xorg
xorg:
  Installed: 1:7.7+19ubuntu7.1
  Candidate: 1:7.7+19ubuntu7.1
  Version table:
 *** 1:7.7+19ubuntu7.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:7.7+19ubuntu7 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

1070GTX - NVIDIA-Linux-x86_64-430.34.run
The left screen is a 60 Hz monitor; the right screen is a 120 Hz monitor.
A GTX 770 is also bound to vfio for virtual machine passthrough.
I previously used the PPA from https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa, but changed to Nvidia's .run installer to see if it made a difference; it did not.

[xsession-errors.old.upload.txt](https://github.com/linuxmint/cinnamon/files/3471789/xsession-errors.old.upload.txt)
[xsession-errors.upload.txt](https://github.com/linuxmint/cinnamon/files/3471791/xsession-errors.upload.txt)
[lspci.txt](https://github.com/linuxmint/cinnamon/files/3471859/lspci.txt)

Syslog:
[kernel.log.txt](https://github.com/linuxmint/cinnamon/files/3471804/kernel.log.txt)

Grub: having mitigations on or off makes no difference
GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=1 drm.debug=14 log_buf_len=16M mitigations=on intel_iommu=on vfio-pci.ids=8086:8d26,10de:1184,10de:0e0a modprobe.blacklist=snd_hda_intel,snd_hda_core,snd_hda_codec,snd_hda_codec_hdmi,nouveau"

Xorg crashdump traces: /var/crash/usr_lib_xorg_Xorg.0.crash

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -noliste'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f34b8f87432 in __GI__IO_default_xsputn (f=0x7f34b0427610, data=0x7f34b90b5030 <zeroes>, n=3) at genops.c:389
389 genops.c: No such file or directory.
[Current thread is 1 (Thread 0x7f34b0c26700 (LWP 1320))]
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f34bbc10600 (LWP 1196))]
#0 0x00007f34b92fb2b7 in __libc_write (fd=14, buf=0x7fffd039b2c0, nbytes=23) at ../sysdeps/unix/sysv/linux/write.c:27
27 ../sysdeps/unix/sysv/linux/write.c: No such file or directory.

Thread 1 backtrace:
#0 0x00007f34b8f87432 in __GI__IO_default_xsputn (f=0x7f34b0427610, data=0x7f34b90b5030 <zeroes>, n=3) at genops.c:389
#1 0x00007f34b8f7937e in __GI__IO_padn (fp=fp@entry=0x7f34b0427610, pad=pad@entry=48, count=count@entry=3) at iopadn.c:64
#2 0x00007f34b8f55f20 in _IO_vfprintf_internal (s=s@entry=0x7f34b0427610, format=format@entry=0x7f34bac5f55c "target PCI:%04x:%02x:%02x.%x", ap=ap@entry=0x7f34b0427790)
    at vfprintf.c:1642
#3 0x00007f34b902b169 in ___vsnprintf_chk (s=0x7f34b0427890 "target PCI:\020\200y\360.0\323\b3\330U", maxlen=<optimized out>, flags=1, slen=<optimized out>,
    format=0x7f34bac5f55c "target PCI:%04x:%02x:%02x.%x", args=args@entry=0x7f34b0427790) at vsnprintf_chk.c:63
#4 0x00007f34b902b095 in ___snprintf_chk (s=<optimized out>, maxlen=<optimized out>, flags=<optimized out>, slen=<optimized out>, format=<optimized out>) at snprintf_chk.c:34
#5 0x00007f34bac5d68a in pci_device_vgaarb_set_target () from /usr/lib/x86_64-linux-gnu/libpciaccess.so.0
#6 0x000055d830d0d038 in VGAGet (pScreen=0x55d8330adcd0) at ../../../../../../hw/xfree86/common/xf86VGAarbiterPriv.h:102
#7 VGAarbiterSpriteMoveCursor (pDev=0x55d8334d1360, pScreen=0x55d8330adcd0, x=1982, y=654) at ../../../../../../hw/xfree86/common/xf86VGAarbiter.c:948
#8 0x000055d830d0d04f in VGAarbiterSpriteMoveCursor (pDev=0x55d8334d1360, pScreen=0x55d8330adcd0, x=1982, y=654) at ../../../../../../hw/xfree86/common/xf86VGAarbiter.c:949
***Last event #8 repeats***

Thread 2 backtrace:
#0 0x00007f34b92fb2b7 in __libc_write (fd=14, buf=0x7fffd039b2c0, nbytes=23) at ../sysdeps/unix/sysv/linux/write.c:27
#1 0x00007f34bac5d4c0 in ?? () from /usr/lib/x86_64-linux-gnu/libpciaccess.so.0
#2 0x00007f34bac5d69f in pci_device_vgaarb_set_target () from /usr/lib/x86_64-linux-gnu/libpciaccess.so.0
#3 0x000055d830d0d038 in VGAGet (pScreen=0x55d8330adcd0) at ../../../../../../hw/xfree86/common/xf86VGAarbiterPriv.h:102
#4 VGAarbiterSpriteMoveCursor (pDev=0x55d8334d1360, pScreen=0x55d8330adcd0, x=1990, y=664) at ../../../../../../hw/xfree86/common/xf86VGAarbiter.c:948
#5 0x000055d830d0d04f in VGAarbiterSpriteMoveCursor (pDev=0x55d8334d1360, pScreen=0x55d8330adcd0, x=1990, y=664) at ../../../../../../hw/xfree86/common/xf86VGAarbiter.c:949
***Last event #5 repeats***
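
Both traces end in the same frame repeating, which is the part worth checking first in a dump like this. A rough sketch for spotting that pattern in gdb output saved as text follows. The helper names `frame_functions` and `trailing_repeat` are made up for illustration, and the regular expression only assumes frame lines shaped like the ones quoted above. The sample holds only the frames pasted in this report, so the count reflects the excerpt, not the full depth of the stack.

```python
import re

# Matches gdb frame lines such as "#6 0x... in VGAGet (" or "#7 VGAarbiterSpriteMoveCursor (".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def frame_functions(backtrace_text):
    """Return the function name of each gdb frame line, in order."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def trailing_repeat(names):
    """Return (function, count) for the run of identical names at the end."""
    if not names:
        return None, 0
    last = names[-1]
    count = 0
    for name in reversed(names):
        if name != last:
            break
        count += 1
    return last, count


# Frames #5 to #8 of thread 1, copied from the report above.
sample = """\
#5 0x00007f34bac5d68a in pci_device_vgaarb_set_target () from /usr/lib/x86_64-linux-gnu/libpciaccess.so.0
#6 0x000055d830d0d038 in VGAGet (pScreen=0x55d8330adcd0) at ../../../../../../hw/xfree86/common/xf86VGAarbiterPriv.h:102
#7 VGAarbiterSpriteMoveCursor (pDev=0x55d8334d1360, pScreen=0x55d8330adcd0, x=1982, y=654) at ../../../../../../hw/xfree86/common/xf86VGAarbiter.c:948
#8 0x000055d830d0d04f in VGAarbiterSpriteMoveCursor (pDev=0x55d8334d1360, pScreen=0x55d8330adcd0, x=1982, y=654) at ../../../../../../hw/xfree86/common/xf86VGAarbiter.c:949
"""

print(trailing_repeat(frame_functions(sample)))  # prints ('VGAarbiterSpriteMoveCursor', 2)
```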

Zygfryd (kat-zygfryd) wrote :

Created attachment 133165
gdb log of Xorg execution and crash stacktrace

Steps to reproduce:

1) Have an AMD GPU as your display device (radeon or modesetting, doesn't matter)
2) Have an Intel iGPU, unused
3) Start an X session, grab a window corner and keep moving your mouse for a couple minutes, or just use your computer normally for up to a couple hours.

Xorg was compiled using GCC 4.9.4 with CFLAGS="-march=core-avx2 -O2 -pipe -ggdb"

Keith Packard (keithp) wrote :

I've posted a proposed patch for this, although I have no way to test to see if it helps. What it does is prevent the main thread and input thread from scrambling the pointer private structures used by the VGA arbiter.

Ajax-a (ajax-a) wrote :

commit cf7517675d988c2d1ff967d6d162a17acbdad466
Author: Keith Packard <email address hidden>
Date: Wed Aug 2 21:34:52 2017 -0700

    xfree86: Hold input_lock across SPRITE functions in VGA arbiter

    Avoid scrambling the sprite functions wrapper.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101995
    Signed-off-by: Keith Packard <email address hidden>
    Reviewed-by: Adam Jackson <email address hidden>
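
The idea in the commit message can be modelled outside the X server. The following is a small Python model, not the server's C code. It assumes the arbiter's sprite wrappers follow the usual unwrap, call, rewrap pattern (the macros themselves are not quoted in this report), and every name in it (`Screen`, `prolog`, `epilog`, `lock`) is hypothetical. Under that assumption, two threads interleaving their unwrap and rewrap steps can leave the saved pointer aimed at the wrapper itself, which would be consistent with the endlessly repeating VGAarbiterSpriteMoveCursor frame in the traces. Holding one lock across the whole wrapper rules that interleaving out.

```python
import threading


class Screen:
    """Stand-in for the per-screen state that holds the sprite function pointers."""

    def __init__(self):
        self.moves = 0
        self.saved = self.real_move    # function saved by the wrapper layer
        self.current = self.wrapper    # function currently installed for dispatch
        self.lock = threading.Lock()   # hypothetical analogue of the server's input lock

    def real_move(self, x, y):
        self.moves += 1

    def prolog(self):
        self.current = self.saved      # unwrap: install the saved function

    def epilog(self):
        self.saved = self.current      # save whatever is installed right now
        self.current = self.wrapper    # rewrap

    def wrapper(self, x, y):
        # The lock is held across unwrap, call and rewrap.
        with self.lock:
            self.prolog()
            self.current(x, y)
            self.epilog()


def unlocked_interleaving():
    """Replay, step by step, two unlocked callers overlapping; True if state is scrambled."""
    s = Screen()
    s.prolog()   # caller A unwraps
    s.prolog()   # caller B unwraps
    s.epilog()   # caller A rewraps: saved is still real_move
    s.epilog()   # caller B rewraps: saved now refers to the wrapper
    return s.saved == s.wrapper


def locked_run(threads=4, iterations=2000):
    """Drive the locked wrapper from several threads; return (moves, saved pointer intact)."""
    s = Screen()

    def worker():
        for i in range(iterations):
            s.wrapper(i, i)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return s.moves, s.saved == s.real_move


print(unlocked_interleaving())  # prints True
print(locked_run())             # prints (8000, True)
```

Once the saved pointer refers to the wrapper, every later call through it would re-enter the wrapper instead of the real function. The model only replays that final state and does not execute the recursion.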

Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report. I can't seem to find any similar crash reports on:

https://errors.ubuntu.com/?release=Ubuntu%2018.04&package=xorg-server&period=year

Can you please follow these instructions to try and generate a formal crash report?

https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment

I would like that to confirm exactly what is crashing and where.

affects: xorg (Ubuntu) → xorg-server (Ubuntu)
tags: added: bionic
Sebastien Bacher (seb128) wrote :
Changed in xorg-server (Ubuntu):
importance: Undecided → High
status: New → Fix Released
Daniel van Vugt (vanvugt) wrote :

Thanks Seb. That means the fix is in the HWE packages for 18.04, not the regular packages for 18.04.

summary: - Xorg crashes - segfault error 6 in libc-2.27.so
+ Xorg crashes in stdio functions under pci_device_vgaarb_set_target()
+ from VGAGet() from VGAarbiterSpriteMoveCursor()
Changed in xorg-server (Ubuntu):
status: Fix Released → Confirmed
Changed in xorg-server-hwe-18.04 (Ubuntu):
status: New → Fix Released
importance: Undecided → High
Changed in xorg-server (Ubuntu Disco):
status: New → Fix Released
Changed in xorg-server (Ubuntu Eoan):
status: Confirmed → Fix Released
Changed in xorg-server-hwe-18.04 (Ubuntu Bionic):
status: New → Fix Released
no longer affects: xorg-server-hwe-18.04 (Ubuntu Disco)
no longer affects: xorg-server-hwe-18.04 (Ubuntu Eoan)
Changed in xorg-server (Ubuntu Bionic):
status: New → Confirmed
Changed in xorg-server (Ubuntu Eoan):
importance: High → Undecided
Changed in xorg-server-hwe-18.04 (Ubuntu):
importance: High → Undecided
Sebastien Bacher (seb128) wrote :

The issue doesn't seem common enough to justify an SRU in the non-HWE version at this point, so I'm setting that one to Won't Fix.

Changed in xorg-server (Ubuntu Bionic):
importance: Undecided → Low
status: Confirmed → Won't Fix
Daniel van Vugt (vanvugt) wrote :

If you want the fix and still want to stick with version 18.04 then you should get it by installing 18.04.2 instead:

  http://releases.ubuntu.com/18.04/

Alternatively you can install the HWE packages, but I don't have instructions prepared for how to do that reliably.

Wes (teva678) wrote :

Installing the HWE packages resolved the issue on Linux Mint 19.2, thank you.
Command: apt install --install-recommends xserver-xorg-hwe-18.04

Changed in xorg-server:
importance: Unknown → Medium
status: Unknown → Fix Released