Bug #1815889 “qemu-system-x86_64 crashed with signal 31 in __pth...” : Bugs : mesa package : Ubuntu

Joseph Maillardet (jokx) wrote on 2019-02-14:

#1

CoreDump.gz (1.5 MiB, application/x-gzip)
CurrentDmesg.txt (70.7 KiB, text/plain; charset="utf-8")
Dependencies.txt (7.1 KiB, text/plain; charset="utf-8")
Disassembly.txt (1.1 KiB, text/plain; charset="utf-8")
JournalErrors.txt (33.3 KiB, text/plain; charset="utf-8")
Lspci.txt (29.8 KiB, text/plain; charset="utf-8")
Lsusb.txt (578 bytes, text/plain; charset="utf-8")
ProcCmdline.txt (3.4 KiB, text/plain; charset="utf-8")
ProcCpuinfo.txt (4.2 KiB, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (1.1 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt (3.4 KiB, text/plain; charset="utf-8")
ProcMaps.txt (64.7 KiB, text/plain; charset="utf-8")
ProcModules.txt (4.9 KiB, text/plain; charset="utf-8")
ProcStatus.txt (1.3 KiB, text/plain; charset="utf-8")
Registers.txt (1.0 KiB, text/plain; charset="utf-8")
RelatedPackageVersions.txt (5.2 KiB, text/plain; charset="utf-8")
Stacktrace.txt (1.1 KiB, text/plain; charset="utf-8")
ThreadStacktrace.txt (7.4 KiB, text/plain; charset="utf-8")
UdevDb.txt (254.3 KiB, text/plain; charset="utf-8")

Apport retracing service (apport) wrote on 2019-02-14:

#2

StacktraceTop:
__pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
?? () from /tmp/apport_sandbox_8_pwkx51/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
?? ()
?? ()
?? ()

Apport retracing service (apport) wrote on 2019-02-14: Stacktrace.txt

#3

Stacktrace.txt (1.7 KiB, text/plain)

Apport retracing service (apport) wrote on 2019-02-14: StacktraceSource.txt

#4

StacktraceSource.txt (963 bytes, text/plain)

Apport retracing service (apport) wrote on 2019-02-14: ThreadStacktrace.txt

#5

ThreadStacktrace.txt (12.1 KiB, text/plain)

tags:	added: apport-failed-retrace
tags:	removed: need-amd64-retrace

Christian Ehrhardt  (paelzer) wrote on 2019-02-15:

#6

I can confirm the reported issue.

Changed in qemu (Ubuntu):
status:	New → Confirmed

Christian Ehrhardt  (paelzer) wrote on 2019-02-15:

#7

Trace looks similar:
--- stack trace ---
#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
        __arg2 = 128
        _a3 = 139730004883072
        _a1 = 22587
        resultvar = <optimized out>
        __arg3 = 139730004883072
        __arg1 = 22587
        _a2 = 128
        pd = <optimized out>
        res = <optimized out>
#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
No symbol table info available.
#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
No symbol table info available.
#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139730004887296, -2085932122569588158, 140733496626446, 140733496626447, 0, 139730004883520, 2100820740254843458, 2100830499542516290}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--- source code stack trace ---
#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
  [Error: pthread_setaffinity.c was not found in source tree]
#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486
  [Error: pthread_create.c was not found in source tree]
#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  [Error: clone.S was not found in source tree]
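Frame #0 is glibc's versioned pthread_setaffinity_np implementation (__pthread_setaffinity_new) called with cpusetsize=128, which matches sizeof(cpu_set_t) with glibc's default CPU_SETSIZE of 1024 CPUs. A minimal sketch of a call with that shape, assuming the driver's worker thread asks to run on every CPU; set_full_thread_affinity is a hypothetical name and this is not Mesa's actual code:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper (not Mesa's code): allow the calling thread to
 * run on every CPU that a cpu_set_t can describe. Returns 0 on success
 * or an error number, as pthread_setaffinity_np does. */
static int set_full_thread_affinity(void)
{
   cpu_set_t set;
   CPU_ZERO(&set);
   for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
      CPU_SET(cpu, &set);
   /* sizeof(set) becomes the cpusetsize argument seen in frame #0:
    * glibc's CPU_SETSIZE is 1024 bits, i.e. 128 bytes. */
   return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Outside a sandbox this normally returns 0. Under a seccomp filter that traps the underlying sched_setaffinity syscall, the same call instead raises SIGSYS, which is signal 31 on x86_64 Linux, the signal named in the bug title.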

Christian Ehrhardt  (paelzer) wrote on 2019-02-15:

#8

libvirt XML that was generated:
<domain type="kvm">
  <name>fedora29-wor</name>
  <uuid>2f4e83f7-18ed-45e2-bbf7-eef9f1c6c6c0</uuid>
  <title>Fedora 29 Workstation</title>
  <metadata>
    <boxes:gnome-boxes xmlns:boxes="https://wiki.gnome.org/Apps/Boxes">
      <os-state>live</os-state>
      <media-id>http://fedoraproject.org/fedora/29:0</media-id>
      <media>/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso</media>
    </boxes:gnome-boxes>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://fedoraproject.org/fedora/29"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">2097152</memory>
  <currentMemory unit="KiB">2097152</currentMemory>
  <vcpu placement="static">2</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-3.1">hvm</type>
    <boot dev="cdrom"/>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="2" threads="1"/>
  </cpu>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="writeback"/>
      <source file="/home/paelzer/.local/share/gnome-boxes/images/fedora29-wor"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso" startupPolicy="mandatory"/>
      <target dev="hdc" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="2"/>
    </disk>
    <controller type="usb" index="0" model="ich9-ehci1">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x7"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci1">
      <master startport="0"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x0" multifunction="on"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci2">
      <master startport="2"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x1"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci3">
      <master startport="4"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x2"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="ccid" index="0">
      <address type="usb" bus="0" port="1"/>
    </controller>
    <interface type="user">
      <mac address="52:54:00:ee:17:af"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <smartcard mode="passthrough" type="spicevmc">
      <address type="ccid" controller="0" slot="0"/>
    </smartcard>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <channel type="spicevmc">
      <target type="virtio" name="com.redhat.spice.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <channel type="spiceport">
      <source channel="org.spice-space.webdav.0"/>
      <target type="virtio" name="org.spice-space.webdav.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="2"/>
    </channel>
    <input type="tablet" bus="usb">
      <address type="usb" bus="0" port="2"/>
    </input>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice">
      <listen type="none"/>
      <image compression="off"/>
      <gl enable="yes"/>
    </graphics>
    <sound model="ich9">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
    </sound>
    <video>
      <model type="virtio" heads="1" primary="yes">
        <acceleration accel3d="yes"/>
      </model>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </video>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="3"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="4"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="5"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="6"/>
    </redirdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>

Christian Ehrhardt  (paelzer) wrote on 2019-02-15:

#9

Interestingly, the Ubuntu 18.10 image works.
So is it really an attribute of the guest that breaks it?

BTW: Arr, why does it spawn its own libvirtd?!
Dear GNOME Boxes, what are you doing?
0 1000 21610 1 20 0 85807204 68912 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess 2 15
0 1000 21612 1 20 0 85772584 34132 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitNetworkProcess 3 15
0 1000 21649 1 20 0 1391464 39144 poll_s Sl ? 0:00 /usr/sbin/libvirtd --timeout=30

Thanks to "lsof +fg -p", here are some important paths:

The guest log is in /home/paelzer/.cache/libvirt/qemu/log/ubuntu18.10.log
Control sockets are at:
/run/user/1000/libvirt/libvirt-sock
/run/user/1000/libvirt/libvirt-admin-sock

Now let's try to poke at it without that UI around it...

Christian Ehrhardt  (paelzer) wrote on 2019-02-15:

#10

The following gets me to that libvirt instance without going through the Boxes UI (the URI is quoted so the shell does not treat "?" as a glob):
$ virsh -c 'qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock' list --all

For now I'll assume that it does NOT depend on the guest, but let's modify the working Ubuntu guest step by step to become more like the F29 guest and see.

1. different disks/ISOs/MAC (obviously)
2. F29 has gl enabled on the Spice graphics
3. video: virtio on F29, qxl on Ubuntu
4. the F29 video device has <acceleration accel3d='yes'/> set

Those are all the differences, so it looks 3D-related to me.

First change:
<model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
to
<model type='virtio' heads='1' primary='yes'>
=> still working

Second change, enable gl:
<gl enable='no'/>
to
<gl enable='yes'/>

=> Broken

Let's revert the first change but keep the second.
=> still broken.

So it is the gl enablement, which I have been working on recently anyway (some AppArmor changes to make it work in my former setup).

Thanks for reporting this bug. I need to analyze in more depth what is wrong here, and that might take a while.

Note: since your guest crashed on start, the crash has no private data, so I am marking the bug public.

For the time being, as a workaround:
virsh -c 'qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock' edit fedora29-wor
(assuming that is your guest name as well)
and switch off the gl enablement.
That gives me a perfectly working guest; I hope it helps you until a real fix is found.

Changed in qemu (Ubuntu):
status:	Confirmed → Triaged
information type:	Private → Public

Christian Ehrhardt  (paelzer) wrote on 2019-02-15:

#11

FTR: this guest XML (not created by gnome-boxes) works on the very same host system.
It runs qxl + gl=yes as well and does not fail.
We need to find the difference between those as well.

<domain type='kvm'>
  <name>ubuntu18.04</name>
  <uuid>2f6bde7c-1d3d-498a-b96c-8920f165fa4c</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/18.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/ubuntu18.04.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:8c:31:fc'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice'>
      <listen type='none'/>
      <image compression='off'/>
      <gl enable='yes'/>
    </graphics>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </rng>
  </devices>
</domain>

P.S. I'm on a trip next week, so further responses might take a while, sorry.

In freedesktop.org Bugzilla #109695, Ahzo (ahzo) wrote on 2019-02-20:

#27

Since upgrading Mesa from 18.2 to 18.3, launching a QEMU virtual machine with Spice OpenGL enabled (for virgl) causes QEMU to crash with SIGSYS inside the radeonsi driver. The reason is that the QEMU sandbox option 'resourcecontrol=deny' disables the sched_setaffinity syscall called in pthread_setaffinity_np, which the radeonsi driver now uses.

A simple way to reproduce this problem is:
$ gdb --batch --ex run --ex bt --args qemu-system-x86_64 -spice gl=on -sandbox on,resourcecontrol=deny
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff45aa700 (LWP 23432)]
[New Thread 0x7ffff08e5700 (LWP 23433)]
[New Thread 0x7fffe3fff700 (LWP 23434)]
[New Thread 0x7fffe37fe700 (LWP 23435)]

Thread 4 "qemu-system-x86" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7fffe3fff700 (LWP 23434)]
0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
34	../sysdeps/unix/sysv/linux/pthread_setaffinity.c: No such file or directory.
#0  0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
#1  0x00007ffff12ba2b3 in util_queue_thread_func (input=input@entry=0x55555640b1f0) at ../src/util/u_queue.c:252
#2  0x00007ffff12b9c17 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
#3  0x00007ffff68c1fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007ffff67f280f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The problematic code at src/util/u_queue.c:252 was added in the following commit:
commit d877451b48a59ab0f9a4210fc736f51da5851c9a
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Mon Oct 1 15:51:06 2018 -0400

util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
    
    Initial version discussed with Rob Clark under a different patch name.
    This approach leaves his driver unaffected.

Since setting the thread affinity seems non-essential here, the failing syscall should be handled gracefully, for example by setting a signal handler to ignore the SIGSYS signal.
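The mechanism described above can be reproduced without QEMU or Mesa. The sketch below assumes a hand-written seccomp filter that returns SECCOMP_RET_TRAP for sched_setaffinity and allows everything else; QEMU's real resourcecontrol=deny rules are built with libseccomp, cover more syscalls and may use a different action, so install_affinity_trap and run_sandboxed_affinity_call are hypothetical helpers. The affinity call runs in a forked child so that the SIGSYS only kills the child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for QEMU's resourcecontrol=deny rules: trap
 * sched_setaffinity with SIGSYS and allow every other syscall.
 * It does not check seccomp_data.arch, so it is only a demo for
 * native syscalls of the host ABI. */
static int install_affinity_trap(void)
{
   struct sock_filter filter[] = {
      /* Load the syscall number. */
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
      /* sched_setaffinity: fall through to TRAP; otherwise skip it. */
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_sched_setaffinity, 0, 1),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
   };
   struct sock_fprog prog = {
      .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
      .filter = filter,
   };
   /* Lets an unprivileged process install a seccomp filter. */
   if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
      return -1;
   return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Forks a child that installs the filter and then tries to set its
 * CPU affinity. Returns the signal that killed the child, 0 if it
 * exited normally, or -1 on any setup failure. */
static int run_sandboxed_affinity_call(void)
{
   pid_t pid = fork();
   if (pid < 0)
      return -1;
   if (pid == 0) {
      /* The SIGSYS is expected; RLIMIT_CORE of 1 suppresses the dump. */
      struct rlimit no_core = { .rlim_cur = 1, .rlim_max = 1 };
      setrlimit(RLIMIT_CORE, &no_core);
      if (install_affinity_trap() != 0)
         _exit(2);
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(0, &set);
      /* glibc's pthread_setaffinity_np ends in this same syscall. */
      sched_setaffinity(0, sizeof(set), &set);
      _exit(0);
   }
   int status;
   while (waitpid(pid, &status, 0) < 0) {
      if (errno != EINTR)
         return -1;
   }
   if (WIFSIGNALED(status))
      return WTERMSIG(status);
   return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

On Linux, run_sandboxed_affinity_call() should report SIGSYS, mirroring the crash; a result of 0 would mean the filter did not take effect.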

In freedesktop.org Bugzilla #109695, Marek Olšák (maraeo) wrote on 2019-02-20:

#28

Mesa needs a way to query that it can't set thread affinity.

In freedesktop.org Bugzilla #109695, Ahzo (ahzo) wrote on 2019-02-21:

#29

To check for the availability of the syscall, one can try it in a child process and see if the child is terminated by a signal, e.g. like this:

#include <stdbool.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>

static bool
can_set_affinity()
{
   pid_t pid = fork();
   int status = 0;
   if (!pid) {
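      /* Disable coredumps, because a SIGSYS crash is expected.
       * A limit of 1 (rather than 0) also stops the kernel from piping
       * the core to a handler such as apport. */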
      struct rlimit limit = { 0 };
      limit.rlim_cur = 1;
      limit.rlim_max = 1;
      setrlimit(RLIMIT_CORE, &limit);
      /* Test the syscall in the child process. */
      syscall(SYS_sched_setaffinity, 0, 0, 0);
      _exit(0);
   } else if (pid < 0) {
      return false;
   }
   if (waitpid(pid, &status, 0) < 0) {
      return false;
   }
   if (WIFSIGNALED(status)) {
      /* The child process was terminated by a signal,
       * thus the syscall cannot be used.
       */
      return false;
   }
   return true;
}

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#12

Since my domain ran gl fine I was eliminating more differences one by one, keeping <gl enable='yes'/> to check if there is a second ingredient needed.

- do not set acceleration on virtio video dev
- machine type q35 -> i440fx (and all pcie->pci that comes with that)
- 1 instead of 4 vcpus
- no host passthrough
- no boot from CD
- add pae feature
- remove rtc/pit/hpet clock attributes
- usb ich9-[eu]hci1 -> piix3-uhci
- no smartcard entry
- no usb tablet
- use cirrus video card
- virtio channel
- no PM config
- console virtio serial
- no soundcard
- reduce memory

None of it makes it work, but the files are nearly identical now

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#13

That left only the actual disk+iso of fedora vs ubuntu cloudimg based qcow and that the boxes VM used userspace networking. Still the issue remained.

But I realized there is one more difference: the Boxes VM runs in user context, while mine is a system-level VM (qemu:///system) running the gl essentially headless until one connects to the local spice port.
The GNOME Boxes VM, on the other hand, had its UI up and connected immediately once available.

So I defined the XML of the gnome-boxes VM in my qemu:///system libvirt context (I copied the files to /var/lib/libvirt/images and adapted the paths).
As expected, this makes it work, which is at least some lead to follow.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#14

I can make the viewers (virt-viewer / virt-manager) crash when attaching to it semi-remotely - but that might be a broken setup for a local only spice definition.

When attaching viewers locally it works just fine.

In none of those cases does qemu crash, so it clearly isn't the same issue. Both fail with glib errors, which makes sense since I am trying to use local-only features remotely (through ssh).

So to summarize:
- crash with gl enabled
- only triggers if run in user context
- gl works in system context (local viewers can attach and it works)

I'm out of obvious "change the config to check what it is" options.
But since it is at least reproducible I'll focus on the qemu backtrace itself next ...

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#15

Stack trace with slightly more info, as all debug symbols and sources are installed here.

--- stack trace ---
#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
        __arg2 = 128
        _a3 = 139788870899328
        _a1 = 17325
        resultvar = <optimized out>
        __arg3 = 139788870899328
        __arg1 = 17325
        _a2 = 128
        pd = <optimized out>
        res = <optimized out>
#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252
        cpuset = {__bits = {18446744073709551615 <repeats 16 times>}}
        queue = 0x55a59a8952d0
        thread_index = 0
        __PRETTY_FUNCTION__ = "util_queue_thread_func"
#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
        pack = {func = 0x7f23227aba70 <util_queue_thread_func>, arg = 0x55a59a695bd0}
#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139788870903552, 9195723382052266688, 140723610455422, 140723610455423, 0, 139788870899776, -9089523756422225216, -9089514281776799040}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--- source code stack trace ---
#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
  [Error: pthread_setaffinity.c was not found in source tree]
#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252
  [Error: u_queue.c was not found in source tree]
#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
  [Error: threads_posix.h was not found in source tree]
#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486
  [Error: pthread_create.c was not found in source tree]
#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  [Error: clone.S was not found in source tree]
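The locals in frame #1 explain the cpusetsize=128 argument: cpuset.__bits holds 16 words of 18446744073709551615 (2^64 - 1, every bit set), and 16 x 64 = 1024 bits = 128 bytes, which is sizeof(cpu_set_t) with glibc's default CPU_SETSIZE of 1024. The sketch below rebuilds a mask with the same contents, assuming glibc on a 64-bit target; the helper name make_full_cpuset is hypothetical and this is not Mesa's actual code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: build a mask with every CPU bit set, matching the
 * cpuset value gdb printed in frame #1. Returns the byte size that would
 * be passed as cpusetsize to the affinity call. */
static size_t make_full_cpuset(cpu_set_t *set)
{
    CPU_ZERO(set);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        CPU_SET(cpu, set);
    /* On x86_64 this is 16 words of 0xffffffffffffffff. */
    assert(CPU_COUNT(set) == CPU_SETSIZE);
    return sizeof(*set);
}
```

Such a mask asks for all 1024 possible CPUs, i.e. it widens the thread's affinity to everything the kernel will accept rather than keeping the mask the process inherited.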

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#16

Ultimately it is a "Program terminated with signal SIGSYS, Bad system call".
So we need to find what is bad about it.

(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7f2321fe6700 (LWP 17325) 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
  2 Thread 0x7f2323ad3500 (LWP 17322) 0x00007f2326fe0fb7 in dri_bind_extensions (dri=dri@entry=0x55a59a7583e0, matches=matches@entry=0x7f2326fec340 <dri_core_extensions>, extensions=<optimized out>) at ../src/gbm/backends/dri/gbm_dri.c:286
  3 Thread 0x7f2323acf700 (LWP 17323) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

A discussion with the kernel team pointed to seccomp at first:
...
<apw> grep it appears that seccomp is the only thing which triggers that signal

In the failing cases, the stack uses this by default:
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny

resourcecontrol is defined as:
"Disable process affinity and schedular priority"
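To see why a blocked sched_setaffinity shows up as "signal 31", here is a minimal sketch of a seccomp filter with a kill action for that one syscall, installed in a forked child. It is not QEMU's actual filter (QEMU builds its rules with libseccomp and covers many more syscalls); the raw BPF program, the helper name affinity_kill_signal, and skipping the architecture check a real filter needs are all assumptions made to keep the example short. On x86_64 Linux, SIGSYS is signal 31.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS SECCOMP_RET_KILL /* older kernel headers */
#endif

/* Hypothetical helper: fork a child that installs a filter killing it on
 * sched_setaffinity, make the call, and return the signal that terminated
 * the child (-1 if it was not killed by a signal). */
static int affinity_kill_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* A core limit of 1 suppresses the core dump that SIGSYS causes. */
        struct rlimit core = { 1, 1 };
        setrlimit(RLIMIT_CORE, &core);

        struct sock_filter filter[] = {
            /* Demo only: a real filter must check seccomp_data.arch first. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_sched_setaffinity, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        /* Unprivileged processes need no_new_privs before installing a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);

        /* The filter acts before the kernel looks at the arguments. */
        syscall(SYS_sched_setaffinity, 0, 0, NULL);
        _exit(0); /* reached only if the call was allowed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Called from a test program, affinity_kill_signal() is expected to return SIGSYS: the process never sees an error code it could handle, it is simply terminated.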

Interestingly, that is the global default; the qemu:///system qemu also runs with the same options.
I'd assume that
libgl1-mesa-dri:amd64: /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
behaves differently depending on whether it is in a local UI session or not,
and that it gets punished as soon as it tries to set the affinity, which it might only do in that case.

Implemented by
- https://git.qemu.org/?p=qemu.git;a=commit;h=24f8cdc5722476e12d8e39d71f66311b4fa971c1
Similar issue being fixed last year
- https://git.qemu.org/?p=qemu.git;a=commit;h=056de1e894155fbb99e7b43c1c4382d4920cf437

Libvirt has no means to fine-control it (yet), only to switch the whole sandboxing feature on/off.

That matches what we see - it fails on init when spawning threads - most likely there it will set the affinity.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#17

From Ubuntu's POV this is rather new as the code in Mesa came in with the fresh 18.3.0_rc4-1
It is possible that no one else saw it so far ...
It is in mesa upstream since
https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a

But opinions might differ ...
I'll subscribe upstream qemu to this bug and then post a summary here.
This will mirror the bug updates to the mailing list. If there is no harsh feedback, I'll propose a patch to remove sched_setaffinity from the list of blocked calls.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-27:

#18

Summary:
- qemu crash when using GL
- "sched_setaffinity" is the syscall that is seccomp blocked and kills qemu
- the mesa i915 drivers (and your radeon as well) make that call
- it is blocked by the current qemu -sandbox on,...,resourcecontrol=deny, which is libvirt's default
- Implemented by qemu 24f8cdc572
- Similar issue being fixed last year qemu 056de1e894
- new code in mesa 18.3 since mesa d877451b48

I think we just need to allow sched_setaffinity with these new mesa drivers in the wild.
The alternative, detecting gl usage in libvirt and only then allowing resourcecontrol, IMHO seems over-engineered (it needs internals to pass down which seccomp subsets need to be switched) and not better (more syscalls would then be unblocked, as the -sandbox interface isn't fine-grained).

OTOH the man page literally says "... Disable process affinity ...", so I'm not sure we can just remove it. Maybe split resourcecontrol in two, put *affinity* in the new group and make it not blocked by default, so that upper layers like libvirt keep working until one explicitly states ... -sandbox on,affinity=on, which no one wanting to use GL would do. That again seems too much.
Either way, the discussion will happen here on the ML/bug or later when submitting an RFC for it.

Revision history for this message

Daniel Berrange (berrange) wrote on 2019-02-27:

#19

IMHO that mesa change is not valid. It is setting its affinity to run on all threads, which is definitely *NOT* something we want to be allowed. Management applications want to control which CPUs QEMU runs on, and as such Mesa should honour the CPU placement that the QEMU process has.

This is a great example of why QEMU wants to use seccomp to block affinity changes to prevent something silently trying to use more CPUs than are assigned to this QEMU.
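One way to honour the inherited placement is to treat it as an upper bound: intersect whatever mask a library would like with the mask the thread already has, and skip the affinity call entirely when that changes nothing. The sketch below does this with glibc's pthread_getaffinity_np and pthread_setaffinity_np; the helper name restrict_to_inherited is hypothetical, it is not Mesa's code, and by itself it would not avoid QEMU's filter if that filter also covers the query syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: apply `wanted` only within the CPUs this thread is
 * already allowed to use, so the inherited placement is never widened.
 * On return *changed says whether an affinity call was made.
 * Returns 0 or a pthread error number. */
static int restrict_to_inherited(const cpu_set_t *wanted, int *changed)
{
    cpu_set_t allowed, effective;
    *changed = 0;

    int err = pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed);
    if (err != 0)
        return err;

    CPU_AND(&effective, wanted, &allowed);
    /* Nothing to do if the intersection is empty or equals the current mask. */
    if (CPU_COUNT(&effective) == 0 || CPU_EQUAL(&effective, &allowed))
        return 0;

    /* effective is a strict subset of allowed here: only ever narrow. */
    assert(CPU_COUNT(&effective) < CPU_COUNT(&allowed));
    *changed = 1;
    return pthread_setaffinity_np(pthread_self(), sizeof(effective), &effective);
}
```

Under this scheme, asking for every CPU (the all-ones cpuset seen in the trace) becomes a no-op, because its intersection with the inherited mask is exactly the inherited mask.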

Revision history for this message

elmarco (marcandre-lureau) wrote on 2019-02-27:

#20

(I reported that issue a few days ago too: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg06066.html)

Perhaps we can teach mesa to not change CPU affinity (some option, or environment variable, or seccomp check).

Daniel, when virgl/mesa will be running in a separate process (thanks to vhost-user-gpu), I suppose the rendering process will be free to change the CPU affinity. Does that make a difference if mesa thread is in qemu or a separate process, in this case?

Revision history for this message

Daniel Berrange (berrange) wrote on 2019-02-27:

#21

As & when libvirt & QEMU support the external vhost processes for this, I expect they will still restrict the CPU affinity and apply seccomp filters that are likely to be at least as strict as they are today.

Revision history for this message

Daniel Berrange (berrange) wrote on 2019-02-27:

#22

I did wonder if we could set the action for some syscalls to be "errno" instead of "kill process", but I worry that could then result in silent misbehaviour if processes fail to check the return value because they blindly assume the call cannot fail.

We should probably talk with mesa developers about providing a config option to prevent this affinity change. An env variable is workable if there's no other mechanism they can expose.

Revision history for this message

elmarco (marcandre-lureau) wrote on 2019-02-27:

#23

See also mesa bug:
https://bugs.freedesktop.org/show_bug.cgi?id=109695

Revision history for this message

In freedesktop.org Bugzilla #109695, Dan-freedesktop (dan-freedesktop) wrote on 2019-02-27:

#30

(In reply to Ahzo from comment #2)
> To check for the availability of the syscall, one can try it in a child
> process and see if the child is terminated by a signal, e.g. like this:

Afraid not, QEMU's seccomp filter blocks use of fork() too :-)

Revision history for this message

In freedesktop.org Bugzilla #109695, Dan-freedesktop (dan-freedesktop) wrote on 2019-02-27:

#31

(In reply to Ahzo from comment #0)
> The problematic code at src/util/u_queue.c:252 was added in the following
> commit:
> commit d877451b48a59ab0f9a4210fc736f51da5851c9a
> Author: Marek Olšák <email address hidden>
> Date: Mon Oct 1 15:51:06 2018 -0400
>
> util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
>
> Initial version discussed with Rob Clark under a different patch name.
> This approach leaves his driver unaffected.
>
>
> Since setting the thread affinity seems non-essential here, the failing
> syscall should be handled gracefully, for example by setting a signal
> handler to ignore the SIGSYS signal.

I'm curious what motivated this change to start with ? Even if QEMU was not enforcing seccomp filters, I think I'd consider it a bug for mesa to be setting its process affinity in this way. The mgmt application or sysadmin has decided that the process must have a certain affinity, based on how it/they want the host CPUs utilized. Why is mesa wanting to override this administrative policy decision to restrict CPU usage ?

Revision history for this message

In freedesktop.org Bugzilla #109695, Alexdeucher (alexdeucher) wrote on 2019-02-27:

#32

(In reply to Daniel P. Berrange from comment #4)
>
> I'm curious what motivated this change to start with ? Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way. The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?

To improve performance on modern multi-core NUMA architectures.

Revision history for this message

In freedesktop.org Bugzilla #109695, elmarco (marcandre-lureau) wrote on 2019-02-27:

#33

Sent a quick RFC for an env variable workaround on the ML "[PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering".

Revision history for this message

In freedesktop.org Bugzilla #109695, Marek Olšák (maraeo) wrote on 2019-02-28:

#34

(In reply to Daniel P. Berrange from comment #4)
> I'm curious what motivated this change to start with ? Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way. The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?

The correct solution is to fix pthread_setaffinity such that it returns an error code instead of crashing.

An even better solution would be to have a virtual thread affinity that only the application can see and change, which should be silently masked by administrative policies not visible to the application.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-28:

#24

Thanks Daniel and MarcAndre for chiming in here.
After thinking more about it, I agree with Daniel that mesa should actually honor and stick with its affinity assignment.

For documentation purpose: the solution proposed on the ML is at https://lists.freedesktop.org/archives/mesa-dev/2019-February/215926.html
I also added the freedesktop bug as a task here.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-02-28:

#25

@Ubuntu-Desktop Team (now subscribed) - is there a chance we can revert [1] in mesa for now, before it is released with Disco? That would be needed until an accepted solution throughout the libvirt/qemu/mesa stack is found.
Otherwise using GL-backed qemu graphics will fail as outlined in the bug.

Once such a cross-package solution to the problem is found we can (if needed at all) SRU back the set of changes to all components required.

[1]: https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a

Revision history for this message

In freedesktop.org Bugzilla #109695, Michel Dänzer (michel-daenzer) wrote on 2019-02-28:

#35

(In reply to Marek Olšák from comment #7)
> An even better solution would be to have a virtual thread affinity that only
> the application can see and change, which should be silently masked by
> administrative policies not visible to the application.

Mesa doesn't really need explicit thread affinity at all. All it wants is that certain sets of threads run on the same CPU module; it doesn't care which particular CPU module that is. What's really needed is an API to express this affinity between threads, instead of to specific CPU cores.

Revision history for this message

Will Cooke (willcooke) wrote on 2019-02-28:

#26

Adding Timo, who maintains mesa.

Bug Watch Updater (bug-watch-updater) on 2019-02-28

Changed in mesa:
importance:	Unknown → High
status:	Unknown → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #109695, Ahzo (ahzo) wrote on 2019-03-02:

#36

(In reply to Daniel P. Berrange from comment #3)
> (In reply to Ahzo from comment #2)
> > To check for the availability of the syscall, one can try it in a child
> > process and see if the child is terminated by a signal, e.g. like this:
>
> Afraid not, QEMU's seccomp filter blocks use of fork() too :-)

Maybe it should, at least when using the spawn=deny option, but currently it doesn't. That option only blocks the fork, vfork and execve syscalls, but glibc's fork() function uses the clone syscall, and thus continues to work.
However, that behavior might be different when using other C library implementations, so it wouldn't be correct to rely on this.
One could use clone() instead of fork(), but future versions of qemu might block the clone syscall, as well.
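The point about fork() can be checked with a small sketch: a child installs a filter that kills on the fork and vfork syscalls only (a stand-in for what spawn=deny is described as blocking above, leaving out execve, and not QEMU's actual rule set), then calls glibc's fork(). If glibc implements fork() via clone, the grandchild is created and the helper returns 1. The helper names and the raw BPF filter are assumptions; on architectures without fork/vfork syscall numbers the filter blocks nothing, and with a C library that does use the fork syscall the child would be killed and the result would be 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/syscall.h> /* provides __NR_fork / __NR_vfork where they exist */
#include <sys/wait.h>
#include <unistd.h>

/* Wait for a specific child, retrying on EINTR. Returns 0 or -1. */
static int wait_child(pid_t pid, int *status)
{
    while (waitpid(pid, status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return 0;
}

/* Hypothetical helper: 1 if fork() still works under a filter that kills
 * the fork/vfork syscalls, 0 if the child was killed, -1 on other errors. */
static int fork_survives_spawn_filter(void)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit core = { 1, 1 }; /* suppress a possible SIGSYS core */
        setrlimit(RLIMIT_CORE, &core);
        struct sock_filter filter[] = {
            /* Demo only: no seccomp_data.arch check. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_fork
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_fork, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
#endif
#ifdef __NR_vfork
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_vfork, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
#endif
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);

        pid_t grandchild = fork(); /* the call under test */
        if (grandchild == 0)
            _exit(0);
        if (grandchild < 0 || wait_child(grandchild, &status) != 0)
            _exit(3);
        _exit(WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : 3);
    }
    if (wait_child(pid, &status) != 0)
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    if (WIFSIGNALED(status))
        return 0;
    return WEXITSTATUS(status) == 0 ? 1 : -1;
}
```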

Unfortunately, I'm not aware of a proper solution for this bug short of adding a new API to the kernel.

Timo Aaltonen (tjaalton) on 2019-03-04

Changed in mesa (Ubuntu):
importance:	Undecided → Medium
status:	New → Confirmed

Christian Ehrhardt  (paelzer) on 2019-03-04

no longer affects:	qemu (Ubuntu Disco)
Changed in mesa (Ubuntu Disco):
status:	Confirmed → Triaged
assignee:	nobody → Timo Aaltonen (tjaalton)
milestone:	none → ubuntu-19.04

Revision history for this message

Timo Aaltonen (tjaalton) wrote on 2019-03-04:

#37

You can test 19.0~rc6 with this reverted on a ppa:

ppa:canonical-x/x-staging

should be built in 30min

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-03-04:

#38

Hi Timo,
I tried to test with the mesa from ppa:canonical-x/x-staging
But there is a dependency issue in that PPA - I can't install all packages from there.
It seems most of the X* packages will need a transition for the new mesa and those are not in this ppa right now.

Installing all that I can from the PPA doesn't resolve the issue, is there something more you need to upload to the PPA - or are there other things I'd need to do to install all of mesa?

This is the current mix of rc5/6 it gave me :-/
libegl-mesa0:amd64 19.0.0~rc5-1ubuntu0.1
libegl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1
libgl1-mesa-dri:amd64 19.0.0~rc5-1ubuntu0.1
libgl1-mesa-glx:amd64 19.0.0~rc6-1ubuntu0.1
libglapi-mesa:amd64 19.0.0~rc5-1ubuntu0.1
libglx-mesa0:amd64 19.0.0~rc5-1ubuntu0.1
libwayland-egl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1
mesa-va-drivers:amd64 19.0.0~rc5-1ubuntu0.1
mesa-vdpau-drivers:amd64 19.0.0~rc5-1ubuntu0.1

Revision history for this message

Timo Aaltonen (tjaalton) wrote on 2019-03-04:

#39

I don't have that issue in a chroot, so you should at least tell me why it would refuse to upgrade them all; apt should show an error.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-03-04:

#40

The PPA was built against -proposed so I had to enable that to install all libs.
That done the 19.0.0~rc6-1ubuntu0.1 with the set affinity change reverted works quite nicely.

It would be great to get that into Ubuntu 19.04 until the involved upstreams agree on how to proceed; we can then sort out what to do in which package. That might after all be after the cutoff, and thus in 19.10.

Thanks Timo, let me know if you need another verification on this at any point to drive it into 19.04.

Revision history for this message

In freedesktop.org Bugzilla #109695, Baker-dylan-c (baker-dylan-c) wrote on 2019-03-06:

#41

We're getting down to just a few bugs blocking 19.0, so I'm pinging those bugs to see what the progress is?

Revision history for this message

In freedesktop.org Bugzilla #109695, Baker-dylan-c (baker-dylan-c) wrote on 2019-03-11:

#42

I'm removing this from the 19.0 blocking tracker. Generally we don't add bugs to block a release if they were present in the previous release, additionally there doesn't seem to be any consensus on a solution, at this moment. If there is a fix implemented I'd be happy to pull that into a later 19.0 release.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2019-03-15:

#43

This bug was fixed in the package mesa - 19.0.0-1ubuntu1

---------------
mesa (19.0.0-1ubuntu1) disco; urgency=medium

* Merge from Debian. (LP: #1818516)
* revert-set-full-thread-affinity.diff: Fix qemu crash. (LP: #1815889)

-- Timo Aaltonen <email address hidden> Thu, 14 Mar 2019 18:48:18 +0200

Changed in mesa (Ubuntu Disco):
status:	Triaged → Fix Released

Revision history for this message

In freedesktop.org Bugzilla #109695, Marek Olšák (maraeo) wrote on 2019-04-02:

#44

(In reply to Michel Dänzer from comment #8)
> Mesa doesn't really need explicit thread affinity at all. All it wants is
> that certain sets of threads run on the same CPU module; it doesn't care
> which particular CPU module that is. What's really needed is an API to
> express this affinity between threads, instead of to specific CPU cores.

I think the thread affinity API is a correct way to optimize for CPU cache topologies. pthread is a basic user API. Security policies shouldn't disallow pthread functions.

Revision history for this message

Daniel Berrange (berrange) wrote on 2019-04-02:

#45

FYI the QEMU change merged in the following pull request changed to return an EPERM errno for the thread affinity syscalls:

commit 12f067cc14b90aef60b2b7d03e1df74cc50a0459
Merge: 84bdc58c06 035121d23a
Author: Peter Maydell <email address hidden>
Date: Thu Mar 28 12:04:52 2019 +0000

Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20190327' into staging

pull-seccomp-20190327

    # gpg: Signature made Wed 27 Mar 2019 12:12:39 GMT
    # gpg: using RSA key DF32E7C0F0FFF9A2
    # gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <email address hidden>" [full]
    # Primary key fingerprint: D67E 1B50 9374 86B4 0723 DBAB DF32 E7C0 F0FF F9A2

    * remotes/otubo/tags/pull-seccomp-20190327:
      seccomp: report more useful errors from seccomp
      seccomp: don't kill process for resource control syscalls

Signed-off-by: Peter Maydell <email address hidden>

IOW, mesa's usage of these syscalls will still be blocked, but it will no longer kill the process.
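With an errno action, the affinity call fails instead of killing the process, so Mesa-style code can carry on with its inherited mask. The sketch below shows that from the caller's side: a forked child installs a minimal filter that answers sched_setaffinity with EPERM (mirroring the behaviour described above, not QEMU's actual libseccomp rules), then requests a full mask via pthread_setaffinity_np and treats the failure as non-fatal. Note that pthread_setaffinity_np reports failure through its return value, not errno. The helper name and the raw BPF filter without an architecture check are assumptions for the demo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the error number pthread_setaffinity_np
 * reported in a child running under an EPERM filter (0 on success),
 * or -1 if the child did not exit normally. */
static int affinity_error_under_errno_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_sched_setaffinity, 0, 1),
            /* Fail the call with EPERM instead of killing the process. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(255);

        cpu_set_t all;
        CPU_ZERO(&all);
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
            CPU_SET(cpu, &all);
        /* Non-fatal: on failure the thread simply keeps its inherited mask. */
        int err = pthread_setaffinity_np(pthread_self(), sizeof(all), &all);
        _exit(err & 0xff);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This is also where Daniel's earlier concern applies: the process survives, but only code that checks the return value notices that its affinity request was refused.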

Changed in qemu:
status:	New → Fix Committed

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-04-03:

#46

Thank you Daniel,
we will most likely keep Disco as-is for now and merge this in 19.10 where then mesa can drop the revert. I tagged it for 19.10 to be revisited.

tags:	added: qemu-19.10
Changed in qemu (Ubuntu):
status:	Triaged → Won't Fix
status:	Won't Fix → Invalid

Revision history for this message

In freedesktop.org Bugzilla #109695, Ahzo (ahzo) wrote on 2019-04-13:

#47

This problem was solved by qemu [1], so this mesa bug can be closed.

[1] https://git.qemu.org/git/qemu.git/?a=commitdiff;h=9a1565a03b79d80b236bc7cc2dbce52a2ef3a1b8

Bug Watch Updater (bug-watch-updater) on 2019-04-14

Changed in mesa:
status:	Confirmed → Won't Fix

Thomas Huth (th-huth) on 2019-04-24

Changed in qemu:
status:	Fix Committed → Fix Released

Sebastien Bacher (seb128) on 2019-05-07

Changed in mesa (Ubuntu Eoan):
status:	Triaged → Fix Released

Revision history for this message

Sebastien Bacher (seb128) wrote on 2019-05-07:

#48

Reopening/assigning to Timo for eoan, since there is a patch which can be dropped once qemu is fixed.

Changed in mesa (Ubuntu Eoan):
status:	Fix Released → Triaged
assignee:	nobody → Timo Aaltonen (tjaalton)

Revision history for this message

Timo Aaltonen (tjaalton) wrote on 2019-10-21:

#49

I believe this was fixed by qemu 4.0 in eoan.

Changed in qemu (Ubuntu Eoan):
status:	Triaged → Fix Released

Timo Aaltonen (tjaalton) on 2019-10-21

Changed in mesa (Ubuntu):
milestone:	ubuntu-19.04 → none
status:	Triaged → In Progress

Timo Aaltonen (tjaalton) on 2019-10-28

Changed in mesa (Ubuntu Eoan):
status:	Triaged → Won't Fix

Revision history for this message

Launchpad Janitor (janitor) wrote on 2019-11-25:

#50

This bug was fixed in the package mesa - 19.2.4-1ubuntu1

---------------
mesa (19.2.4-1ubuntu1) focal; urgency=medium

  * Merge from Debian.
  * revert-set-full-thread-affinity.diff: Dropped, qemu is fixed now in
    eoan and up. (LP: #1815889)

-- Timo Aaltonen <email address hidden> Wed, 20 Nov 2019 20:17:00 +0200

Changed in mesa (Ubuntu):
status:	In Progress → Fix Released

Ubuntu
mesa package

qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()

	Status	Importance	Assigned to	Milestone
Mesa	Won't Fix	High	freedesktop-bugs #109695
QEMU	Fix Released	Undecided	Unassigned
mesa (Ubuntu)	Fix Released	Medium	Timo Aaltonen
Disco	Fix Released	Medium	Timo Aaltonen	Ubuntu ubuntu-19.04
Eoan	Won't Fix	Undecided	Timo Aaltonen
qemu (Ubuntu)	Invalid	Undecided	Unassigned
Eoan	Fix Released	Undecided	Christian Ehrhardt 
