qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()

Bug #1815889 reported by Joseph Maillardet on 2019-02-14
This bug affects 4 people
Affects          Status                  Importance  Assigned to         Milestone
Mesa             Won't Fix               High
QEMU                                     Undecided   Unassigned
mesa (Ubuntu)    Status tracked in Eoan
  Disco                                  Medium      Timo Aaltonen
  Eoan                                   Undecided   Timo Aaltonen
qemu (Ubuntu)    Status tracked in Eoan
  Eoan                                   Undecided   Christian Ehrhardt

Bug Description

Unable to launch Default Fedora 29 images in gnome-boxes

ProblemType: Crash
DistroRelease: Ubuntu 19.04
Package: qemu-system-x86 1:3.1+dfsg-2ubuntu1
ProcVersionSignature: Ubuntu 4.19.0-12.13-generic 4.19.18
Uname: Linux 4.19.0-12-generic x86_64
ApportVersion: 2.20.10-0ubuntu20
Architecture: amd64
Date: Thu Feb 14 11:00:45 2019
ExecutablePath: /usr/bin/qemu-system-x86_64
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
MachineType: Dell Inc. Precision T3610
ProcEnviron: PATH=(custom, user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.19.0-12-generic root=UUID=939b509b-d627-4642-a655-979b44972d17 ro splash quiet vt.handoff=1
Signal: 31
SourcePackage: qemu
StacktraceTop:
 __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
 () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
 () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
 start_thread (arg=<optimized out>) at pthread_create.c:486
 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Title: qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()
UpgradeStatus: Upgraded to disco on 2018-11-14 (91 days ago)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo video
dmi.bios.date: 11/14/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A18
dmi.board.name: 09M8Y8
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA18:bd11/14/2018:svnDellInc.:pnPrecisionT3610:pvr00:rvnDellInc.:rn09M8Y8:rvrA01:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision T3610
dmi.product.sku: 05D2
dmi.product.version: 00
dmi.sys.vendor: Dell Inc.

Joseph Maillardet (jokx) wrote :

StacktraceTop:
 __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
 ?? () from /tmp/apport_sandbox_8_pwkx51/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
 ?? ()
 ?? ()
 ?? ()

tags: added: apport-failed-retrace
tags: removed: need-amd64-retrace

I can confirm the reported issue

Changed in qemu (Ubuntu):
status: New → Confirmed

Trace looks similar:
--- stack trace ---
#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
        __arg2 = 128
        _a3 = 139730004883072
        _a1 = 22587
        resultvar = <optimized out>
        __arg3 = 139730004883072
        __arg1 = 22587
        _a2 = 128
        pd = <optimized out>
        res = <optimized out>
#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
No symbol table info available.
#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
No symbol table info available.
#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139730004887296, -2085932122569588158, 140733496626446, 140733496626447, 0, 139730004883520, 2100820740254843458, 2100830499542516290}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--- source code stack trace ---
#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
  [Error: pthread_setaffinity.c was not found in source tree]
#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486
  [Error: pthread_create.c was not found in source tree]
#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  [Error: clone.S was not found in source tree]

libvirt XML that was generated:
<domain type="kvm">
  <name>fedora29-wor</name>
  <uuid>2f4e83f7-18ed-45e2-bbf7-eef9f1c6c6c0</uuid>
  <title>Fedora 29 Workstation</title>
  <metadata>
    <boxes:gnome-boxes xmlns:boxes="https://wiki.gnome.org/Apps/Boxes">
      <os-state>live</os-state>
      <media-id>http://fedoraproject.org/fedora/29:0</media-id>
      <media>/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso</media>
    </boxes:gnome-boxes>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://fedoraproject.org/fedora/29"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">2097152</memory>
  <currentMemory unit="KiB">2097152</currentMemory>
  <vcpu placement="static">2</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-3.1">hvm</type>
    <boot dev="cdrom"/>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="2" threads="1"/>
  </cpu>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="writeback"/>
      <source file="/home/paelzer/.local/share/gnome-boxes/images/fedora29-wor"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso" startupPolicy="mandatory"/>
      <target dev="hdc" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="2"/>
    </disk>
    <controller type="usb" index="0" model="ich9-ehci1">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x7"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci1">
      <master startport="0"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x0" multifunction="on"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci2">
      <master startport="2"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x1"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci3">
      <master startport="4"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x2"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000...

Interestingly, the Ubuntu 18.10 image works.
So is it really an attribute of the guest that breaks it?

BTW - Arr, why does it spawn its own libvirtd ?!
Dear gnome boxes what are you doing?
0 1000 21610 1 20 0 85807204 68912 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess 2 15
0 1000 21612 1 20 0 85772584 34132 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitNetworkProcess 3 15
0 1000 21649 1 20 0 1391464 39144 poll_s Sl ? 0:00 /usr/sbin/libvirtd --timeout=30

Thanks to "lsof +fg -p" some important paths:

The guest log is in /home/paelzer/.cache/libvirt/qemu/log/ubuntu18.10.log
Control sockets are at
/run/user/1000/libvirt/libvirt-sock
/run/user/1000/libvirt/libvirt-admin-sock

Now let's try to poke at it without that UI around it ...

The following gets me to non boxy libvirt:
$ virsh -c qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock list --all

For now I'll assume that it does NOT depend on the guest, but let's modify the working Ubuntu guest step by step to become more like the F29 guest and we will see.

1. different disks/ISOs/MAC (obviously)
2. F29 has gl enabled on the spice graphics
3. video F29: virtio Ubuntu: qxl
4. video has <acceleration accel3d='yes'/> set

That is all the difference, so it seems 3d'ish to me.

First change
<model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
to
<model type='virtio' heads='1' primary='yes'>
=> still working

Second change: enable gl
<gl enable='no'/>
to
<gl enable='yes'/>

=> Broken

Let's revert the first change and keep only the second.
=> still broken.

So it is the enablement of gl, which I have been working on recently anyway (some AppArmor changes to make it work in my former setup).

Thanks for sharing this bug. I need to analyze in more depth what is wrong here, and that might take a while.

Note: Since your guest crashed on start the crash has no private data - marking the bug public ...

For the time being as a workaround:
 virsh -c qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock edit fedora29-wor
(assuming that is your guest name as well)
and switch off the gl enablement.
That gives me a perfectly working guest; I hope it helps you for now until a real fix is found.

Changed in qemu (Ubuntu):
status: Confirmed → Triaged
information type: Private → Public

FTR: this guest XML (not out of gnome-boxes) works on the very same Host system.
This runs qxl + gl=yes as well and does not fail.
We need to find what the difference between those two is as well.

<domain type='kvm'>
  <name>ubuntu18.04</name>
  <uuid>2f6bde7c-1d3d-498a-b96c-8920f165fa4c</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/18.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/ubuntu18.04.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </contr...

Since upgrading Mesa from 18.2 to 18.3, launching a QEMU virtual machine with Spice OpenGL enabled (for virgl), causes QEMU to crash with SIGSYS inside the radeonsi driver. The reason for this is that the QEMU sandbox option 'resourcecontrol=deny' disables the sched_setaffinity syscall called in pthread_setaffinity_np, which is now used by the radeonsi driver.

A simple way to reproduce this problem is:
$ gdb --batch --ex run --ex bt --args qemu-system-x86_64 -spice gl=on -sandbox on,resourcecontrol=deny
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff45aa700 (LWP 23432)]
[New Thread 0x7ffff08e5700 (LWP 23433)]
[New Thread 0x7fffe3fff700 (LWP 23434)]
[New Thread 0x7fffe37fe700 (LWP 23435)]

Thread 4 "qemu-system-x86" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7fffe3fff700 (LWP 23434)]
0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
34 ../sysdeps/unix/sysv/linux/pthread_setaffinity.c: No such file or directory.
#0 0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
#1 0x00007ffff12ba2b3 in util_queue_thread_func (input=input@entry=0x55555640b1f0) at ../src/util/u_queue.c:252
#2 0x00007ffff12b9c17 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
#3 0x00007ffff68c1fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4 0x00007ffff67f280f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The problematic code at src/util/u_queue.c:252 was added in the following commit:
commit d877451b48a59ab0f9a4210fc736f51da5851c9a
Author: Marek Olšák <email address hidden>
Date: Mon Oct 1 15:51:06 2018 -0400

    util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY

    Initial version discussed with Rob Clark under a different patch name.
    This approach leaves his driver unaffected.

Since setting the thread affinity seems non-essential here, the failing syscall should be handled gracefully, for example by setting a signal handler to ignore the SIGSYS signal.

Mesa needs a way to query that it can't set thread affinity.

To check for the availability of the syscall, one can try it in a child process and see if the child is terminated by a signal, e.g. like this:

#include <stdbool.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>

static bool
can_set_affinity(void)
{
   pid_t pid = fork();
   int status = 0;
   if (!pid) {
      /* Disable coredumps, because a SIGSYS crash is expected.
       * A limit of 1 (not 0) also stops the kernel from piping the
       * dump to a core_pattern crash handler.
       */
      struct rlimit limit = { 0 };
      limit.rlim_cur = 1;
      limit.rlim_max = 1;
      setrlimit(RLIMIT_CORE, &limit);
      /* Test the syscall in the child process. */
      syscall(SYS_sched_setaffinity, 0, 0, 0);
      _exit(0);
   } else if (pid < 0) {
      return false;
   }
   if (waitpid(pid, &status, 0) < 0) {
      return false;
   }
   if (WIFSIGNALED(status)) {
      /* The child process was terminated by a signal,
       * thus the syscall cannot be used.
       */
      return false;
   }
   return true;
}

Since my domain ran gl fine I was eliminating more differences one by one, keeping <gl enable='yes'/> to check if there is a second ingredient needed.

- do not set acceleration on virtio video dev
- machine type q35 -> i440fx (and all pcie->pci that comes with that)
- 1 instead of 4 vcpus
- no host passthrough
- no boot from CD
- add pae feature
- remove rtc/pit/hpet clock attributes
- usb ich9-[eu]hci1 -> piix3-uhci
- no smartcard entry
- no usb tablet
- use cirrus video card
- virtio channel
- no PM config
- console virtio serial
- no soundcard
- reduce memory

None of it makes it work, but the files are nearly identical now.

That left only the actual disk+iso of fedora vs ubuntu cloudimg based qcow and that the boxes VM used userspace networking. Still the issue remained.

But I realized there is one more difference, the Boxes VM runs in user context while mine is a system level VM (qemu:///system) running the gl essentially headless until one connects to the local spice port.
But the gnome boxes VM was having the UI up immediately connecting to it once available.

So I defined the XML of the gnome-boxes VM in my qemu:///system libvirt context (I copied the files to /var/lib/libvirt/images and adapted the paths).
As expected, this makes it work, which is at least some lead to follow.

I can make the viewers (virt-viewer / virt-manager) crash when attaching to it semi-remotely - but that might be a broken setup for a local only spice definition.

When attaching viewers locally it works just fine.

In none of those cases does qemu crash, so it clearly isn't the same issue. Both fail with some glib errors, which makes sense since I am trying to use local-only features remotely (through ssh).

So to summarize:
- crash with gl enabled
- only triggers if run in user context
- gl works in system context (local viewers can attach and it works)

I'm out of obvious "change the config to check what it is" options.
But since it is at least reproducible I'll focus on the qemu backtrace itself next ...

Stack trace with slightly more info, as all debug symbols and sources are installed here.

--- stack trace ---
#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
        __arg2 = 128
        _a3 = 139788870899328
        _a1 = 17325
        resultvar = <optimized out>
        __arg3 = 139788870899328
        __arg1 = 17325
        _a2 = 128
        pd = <optimized out>
        res = <optimized out>
#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252
        cpuset = {__bits = {18446744073709551615 <repeats 16 times>}}
        queue = 0x55a59a8952d0
        thread_index = 0
        __PRETTY_FUNCTION__ = "util_queue_thread_func"
#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
        pack = {func = 0x7f23227aba70 <util_queue_thread_func>, arg = 0x55a59a695bd0}
#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139788870903552, 9195723382052266688, 140723610455422, 140723610455423, 0, 139788870899776, -9089523756422225216, -9089514281776799040}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--- source code stack trace ---
#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
  [Error: pthread_setaffinity.c was not found in source tree]
#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252
  [Error: u_queue.c was not found in source tree]
#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
  [Error: threads_posix.h was not found in source tree]
#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486
  [Error: pthread_create.c was not found in source tree]
#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  [Error: clone.S was not found in source tree]
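
For reference, the frame #1 locals above show a cpuset whose 16 64-bit words are all ones, and frame #0 shows cpusetsize=128. With glibc on x86_64 a cpu_set_t holds CPU_SETSIZE (1024) bits, i.e. 128 bytes, so the call looks like a request for every possible CPU. A minimal sketch of building a mask of that shape follows. It is an illustration only, not the Mesa source, and the helper names fill_full_cpuset and count_cpus are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: set every bit of a cpu_set_t, which is the
 * shape of the mask visible in frame #1 of the trace. */
static void
fill_full_cpuset(cpu_set_t *set)
{
   assert(set != NULL);
   CPU_ZERO(set);
   for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
      CPU_SET(cpu, set);
}

/* Hypothetical helper: number of CPUs present in a mask. */
static int
count_cpus(const cpu_set_t *set)
{
   assert(set != NULL);
   return CPU_COUNT(set);
}
```

Passing such a mask with sizeof(cpu_set_t) as the size would give the cpusetsize=128 seen in frame #0.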

In the end it is a "Program terminated with signal SIGSYS, Bad system call".
So we need to find what is bad about it.

(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7f2321fe6700 (LWP 17325) 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
  2 Thread 0x7f2323ad3500 (LWP 17322) 0x00007f2326fe0fb7 in dri_bind_extensions (dri=dri@entry=0x55a59a7583e0, matches=matches@entry=0x7f2326fec340 <dri_core_extensions>, extensions=<optimized out>) at ../src/gbm/backends/dri/gbm_dri.c:286
  3 Thread 0x7f2323acf700 (LWP 17323) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

A discussion with the kernel team pointed to seccomp at first:
...
<apw> grep it appears that seccomp is the only thing which triggers that signal

The stack in the breaking cases uses this by default
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny

resourcecontrol is defined as:
"Disable process affinity and schedular priority"

Interestingly that is the global default; the qemu:///system qemu also runs with the same options.
I'd assume that:
  libgl1-mesa-dri:amd64: /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
behaves differently depending on whether it is in a local UI session or not.
And it gets punished as soon as it tries to set the affinity, which it might only do in that case.

Implemented by
- https://git.qemu.org/?p=qemu.git;a=commit;h=24f8cdc5722476e12d8e39d71f66311b4fa971c1
Similar issue being fixed last year
- https://git.qemu.org/?p=qemu.git;a=commit;h=056de1e894155fbb99e7b43c1c4382d4920cf437

Libvirt has no means to fine-control it (yet), only to switch the whole sandboxing feature on or off.

That matches what we see - it fails on init when spawning threads - most likely there it will set the affinity.

From Ubuntu's POV this is rather new as the code in Mesa came in with the fresh 18.3.0_rc4-1
It is possible that no one else saw it so far ...
It is in mesa upstream since
  https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a

But opinions might differ ...
I'll subscribe upstream qemu to this bug and then post a summary here.
This will mirror the bug updates to the Mailing List, if there is no harsh feedback I'll propose a patch to remove sched_setaffinity from the list of blocked calls.

Summary:
- qemu crashes when using GL
- "sched_setaffinity" is the syscall that is blocked by seccomp and kills qemu
- the mesa i965 driver (and your radeon as well) will do that call
- it is blocked by the current qemu -sandbox on,...,resourcecontrol=deny which is libvirt's default
- Implemented by qemu 24f8cdc572
- Similar issue being fixed last year qemu 056de1e894
- new code in mesa 18.3 since mesa d877451b48

I think we just need to allow sched_setaffinity with these new mesa drivers in the wild.
The alternative, detecting gl usage in libvirt and only then allowing resourcecontrol, IMHO seems over-engineered (it needs internals to pass on which seccomp subsets have to be switched) and not better (more syscalls will be unblocked then, as the seccomp interface isn't fine grained).

OTOH the man page literally says "... Disable process affinity ...", so I'm not sure we can just remove it. Maybe split resourcecontrol in two, put *affinity* in the new one and make the default not blocked, so that upper layers like libvirt will work until one explicitly states ... -sandbox on,affinity=on which no one wanting to use GL would do. That again seems too much.
Well, the discussion will happen either here on the ML/bug or later when submitting an RFC for it.

Daniel Berrange (berrange) wrote :

IMHO that mesa change is not valid. It is setting its affinity to run on all threads, which is definitely *NOT* something we want to be allowed. Management applications want to control which CPUs QEMU runs on, and as such Mesa should honour the CPU placement that the QEMU process has.

This is a great example of why QEMU wants to use seccomp to block affinity changes to prevent something silently trying to use more CPUs than are assigned to this QEMU.
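
To illustrate what "honouring the CPU placement" could mean in code, here is a minimal sketch that clamps a wanted mask to the CPUs the process is already allowed to run on. It is not taken from Mesa or QEMU, and the helper name clamp_to_allowed is made up. It also assumes that querying the mask with sched_getaffinity() is itself permitted, which this thread does not establish for a sandboxed QEMU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: restrict the CPUs a library would like to use
 * to the CPUs the process is already allowed to run on.
 * Assumption: sched_getaffinity() is not blocked by a filter. */
static bool
clamp_to_allowed(const cpu_set_t *wanted, cpu_set_t *result)
{
   cpu_set_t allowed;

   assert(wanted != NULL && result != NULL);

   /* pid 0 addresses the calling thread. */
   if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
      return false;

   /* Keep only CPUs that are both wanted and allowed. */
   CPU_AND(result, wanted, &allowed);

   /* An empty intersection means the request cannot be honoured. */
   return CPU_COUNT(result) > 0;
}
```

With a "wanted" mask that has every bit set, the result is simply the inherited mask, so nothing outside the administrator's placement is requested.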

elmarco (marcandre-lureau) wrote :

(I reported that issue a few days ago too: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg06066.html)

Perhaps we can teach mesa to not change CPU affinity (some option, or environment variable, or seccomp check).
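
One possible shape of such a "seccomp check", as a rough sketch only: Linux exposes a "Seccomp:" field in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). The helper name read_seccomp_mode is made up for this example. Note that mode 2 only says that some filter is installed, not which syscalls it blocks, so a caller could at best use it as a hint to stay conservative.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the "Seccomp:" mode found in a status
 * file (0 = disabled, 1 = strict, 2 = filter), or -1 if the file
 * cannot be read or the field is missing. */
static int
read_seccomp_mode(const char *status_path)
{
   char line[256];
   int mode = -1;
   FILE *f;

   assert(status_path != NULL);

   f = fopen(status_path, "r");
   if (!f)
      return -1;

   while (fgets(line, sizeof(line), f)) {
      /* Matches "Seccomp:<whitespace><number>" only. */
      if (sscanf(line, "Seccomp: %d", &mode) == 1)
         break;
   }
   fclose(f);
   return mode;
}
```

A caller would pass "/proc/self/status" and could skip optional affinity changes whenever the result is not 0.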

Daniel, when virgl/mesa will be running in a separate process (thanks to vhost-user-gpu), I suppose the rendering process will be free to change the CPU affinity. Does that make a difference if mesa thread is in qemu or a separate process, in this case?

Daniel Berrange (berrange) wrote :

As and when libvirt & QEMU support the external vhost processes for this, I expect they will still restrict the CPU affinity and apply seccomp filters that are likely to be at least as strict as they are today.

Daniel Berrange (berrange) wrote :

I did wonder if we could set the action for some syscalls to be "errno" instead of "kill process", but I worry that could then result in silent mis-behaviour as processes fail to check return value as they blindly assume the call cannot fail.

We should probably talk with mesa developers about providing a config option to prevent this affinity change. An env variable is workable if there's no other mechanism they can expose.
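
The worry about unchecked return values can be illustrated with a small sketch of the caller side. It assumes a filter that makes the syscall fail with an error instead of killing the process, which is not how the sandbox discussed in this bug behaves. The helper name try_set_affinity is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: try to apply an affinity mask to the calling
 * thread and report failure instead of assuming success.
 * On failure *err_out holds the errno value, otherwise 0. */
static bool
try_set_affinity(const cpu_set_t *set, int *err_out)
{
   assert(set != NULL && err_out != NULL);

   /* pid 0 addresses the calling thread. */
   if (sched_setaffinity(0, sizeof(*set), set) != 0) {
      *err_out = errno;
      return false;
   }
   *err_out = 0;
   return true;
}
```

A caller that treats affinity as an optimization would simply carry on with the inherited mask when this returns false.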

(In reply to Ahzo from comment #2)
> To check for the availability of the syscall, one can try it in a child
> process and see if the child is terminated by a signal, e.g. like this:

Afraid not, QEMU's seccomp filter blocks use of fork() too :-)

(In reply to Ahzo from comment #0)
> The problematic code at src/util/u_queue.c:252 was added in the following
> commit:
> commit d877451b48a59ab0f9a4210fc736f51da5851c9a
> Author: Marek Olšák <email address hidden>
> Date: Mon Oct 1 15:51:06 2018 -0400
>
> util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
>
> Initial version discussed with Rob Clark under a different patch name.
> This approach leaves his driver unaffected.
>
>
> Since setting the thread affinity seems non-essential here, the failing
> syscall should be handled gracefully, for example by setting a signal
> handler to ignore the SIGSYS signal.

I'm curious what motivated this change to start with ? Even if QEMU was not enforcing seccomp filters, I think I'd consider it a bug for mesa to be setting its process affinity in this way. The mgmt application or sysadmin has decided that the process must have a certain affinity, based on how it/they want the host CPUs utilized. Why is mesa wanting to override this administrative policy decision to restrict CPU usage ?

(In reply to Daniel P. Berrange from comment #4)
>
> I'm curious what motivated this change to start with ? Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way. The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?

To improve performance on modern multi-core NUMA architectures.

Sent a quick RFC for an env variable workaround on the ML "[PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering".
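
The RFC itself is not reproduced in this report, so the following is only a rough sketch of what an environment variable opt-out could look like. The variable name EXAMPLE_NO_THREAD_AFFINITY and the helper affinity_disabled_by_env are invented for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define NO_AFFINITY_ENV "EXAMPLE_NO_THREAD_AFFINITY"

/* Hypothetical helper: true if the named variable asks the library
 * to leave thread affinity alone. Unset, empty, or "0" means that
 * affinity changes stay enabled. */
static bool
affinity_disabled_by_env(const char *name)
{
   const char *value;

   assert(name != NULL);

   value = getenv(name);
   return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}
```

The thread start code would then check affinity_disabled_by_env(NO_AFFINITY_ENV) once and skip the affinity call when it returns true.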

(In reply to Daniel P. Berrange from comment #4)
> I'm curious what motivated this change to start with ? Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way. The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?

The correct solution is to fix pthread_setaffinity such that it returns an error code instead of crashing.

An even better solution would be to have a virtual thread affinity that only the application can see and change, which should be silently masked by administrative policies not visible to the application.

Thanks Daniel and MarcAndre for chiming in here.
After thinking more about it I agree with Daniel that mesa should actually honor and stick with its affinity assignment.

For documentation purposes: the solution proposed on the ML is at https://lists.freedesktop.org/archives/mesa-dev/2019-February/215926.html
I also added a tracker for the freedesktop bug as a task.

@Ubuntu-Desktop Team (now subscribed) - is there a chance we can revert [1] in mesa for now, before it is released with Disco? That would be needed until an accepted solution throughout the libvirt/qemu/mesa stack is found.
Otherwise using GL-backed qemu graphics will fail as outlined in the bug.

Once such a cross-package solution to the problem is found we can (if needed at all) SRU back the set of changes to all components required.

[1]: https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a

(In reply to Marek Olšák from comment #7)
> An even better solution would be to have a virtual thread affinity that only
> the application can see and change, which should be silently masked by
> administrative policies not visible to the application.

Mesa doesn't really need explicit thread affinity at all. All it wants is that certain sets of threads run on the same CPU module; it doesn't care which particular CPU module that is. What's really needed is an API to express this affinity between threads, instead of to specific CPU cores.

Will Cooke (willcooke) wrote :

Adding Timo, who maintains mesa.

Changed in mesa:
importance: Unknown → High
status: Unknown → Confirmed

(In reply to Daniel P. Berrange from comment #3)
> (In reply to Ahzo from comment #2)
> > To check for the availability of the syscall, one can try it in a child
> > process and see if the child is terminated by a signal, e.g. like this:
>
> Afraid not, QEMU's seccomp filter blocks use of fork() too :-)

Maybe it should, at least when using the spawn=deny option, but currently it doesn't. That option only blocks the fork, vfork and execve syscalls, but glibc's fork() function uses the clone syscall, and thus continues to work.
However, that behavior might be different when using other C library implementations, so it wouldn't be correct to rely on this.
One could use clone() instead of fork(), but future versions of qemu might block the clone syscall, as well.

Unfortunately, I'm not aware of a proper solution for this bug short of adding a new API to the kernel.

Timo Aaltonen (tjaalton) on 2019-03-04
Changed in mesa (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
no longer affects: qemu (Ubuntu Disco)
Changed in mesa (Ubuntu Disco):
status: Confirmed → Triaged
assignee: nobody → Timo Aaltonen (tjaalton)
milestone: none → ubuntu-19.04
Timo Aaltonen (tjaalton) wrote :

You can test 19.0~rc6 with this reverted on a ppa:

ppa:canonical-x/x-staging

should be built in 30min

Hi Timo,
I tried to test with the mesa from ppa:canonical-x/x-staging
But there is a dependency issue in that PPA - I can't install all packages from there.
It seems most of the X* packages will need a transition for the new mesa and those are not in this ppa right now.

Installing all that I can from the PPA doesn't resolve the issue, is there something more you need to upload to the PPA - or are there other things I'd need to do to install all of mesa?

This is the current mix of rc5/6 it gave me :-/
libegl-mesa0:amd64 19.0.0~rc5-1ubuntu0.1
libegl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1
libgl1-mesa-dri:amd64 19.0.0~rc5-1ubuntu0.1
libgl1-mesa-glx:amd64 19.0.0~rc6-1ubuntu0.1
libglapi-mesa:amd64 19.0.0~rc5-1ubuntu0.1
libglx-mesa0:amd64 19.0.0~rc5-1ubuntu0.1
libwayland-egl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1
mesa-va-drivers:amd64 19.0.0~rc5-1ubuntu0.1
mesa-vdpau-drivers:amd64 19.0.0~rc5-1ubuntu0.1

Timo Aaltonen (tjaalton) wrote :

I don't have that issue on a chroot, so you should at least tell me why it would refuse to upgrade them all.. apt should show an error

The PPA was built against -proposed so I had to enable that to install all libs.
With that done, 19.0.0~rc6-1ubuntu0.1 with the set affinity change reverted works quite nicely.

It would be great to get that into Ubuntu 19.04 until the upstreams involved agree on how to proceed; we can then sort out what to do in which package. That might well happen after the cutoff, and so land in 19.10.

Thanks Timo, let me know if you need another verification on this at any point to drive it into 19.04.

We're getting down to just a few bugs blocking 19.0, so I'm pinging those bugs to see what the progress is.

I'm removing this from the 19.0 blocking tracker. Generally we don't add bugs to block a release if they were present in the previous release; additionally, there doesn't seem to be any consensus on a solution at this moment. If a fix is implemented, I'd be happy to pull it into a later 19.0 release.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mesa - 19.0.0-1ubuntu1

---------------
mesa (19.0.0-1ubuntu1) disco; urgency=medium

  * Merge from Debian. (LP: #1818516)
  * revert-set-full-thread-affinity.diff: Fix qemu crash. (LP: #1815889)

 -- Timo Aaltonen <email address hidden> Thu, 14 Mar 2019 18:48:18 +0200

Changed in mesa (Ubuntu Disco):
status: Triaged → Fix Released

(In reply to Michel Dänzer from comment #8)
> Mesa doesn't really need explicit thread affinity at all. All it wants is
> that certain sets of threads run on the same CPU module; it doesn't care
> which particular CPU module that is. What's really needed is an API to
> express this affinity between threads, instead of to specific CPU cores.

I think the thread affinity API is a correct way to optimize for CPU cache topologies. pthread is a basic user API. Security policies shouldn't disallow pthread functions.

Daniel Berrange (berrange) wrote :

FYI, the QEMU change merged in the following pull request makes the seccomp filter return an EPERM errno for the thread affinity syscalls:

commit 12f067cc14b90aef60b2b7d03e1df74cc50a0459
Merge: 84bdc58c06 035121d23a
Author: Peter Maydell <email address hidden>
Date: Thu Mar 28 12:04:52 2019 +0000

    Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20190327' into staging

    pull-seccomp-20190327

    # gpg: Signature made Wed 27 Mar 2019 12:12:39 GMT
    # gpg: using RSA key DF32E7C0F0FFF9A2
    # gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <email address hidden>" [full]
    # Primary key fingerprint: D67E 1B50 9374 86B4 0723 DBAB DF32 E7C0 F0FF F9A2

    * remotes/otubo/tags/pull-seccomp-20190327:
      seccomp: report more useful errors from seccomp
      seccomp: don't kill process for resource control syscalls

    Signed-off-by: Peter Maydell <email address hidden>

IOW, mesa's usage of these syscalls will still be blocked, but it will no longer kill the process.
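
From the caller's side, the new behaviour means an affinity request can simply fail with EPERM and be ignored. The Python sketch below shows that pattern under stated assumptions: `try_set_affinity` is a hypothetical name, and this is not Mesa's actual code.

```python
import errno
import os


def try_set_affinity(cpus) -> bool:
    """Try to restrict the calling thread to `cpus`; return False if refused.

    Hypothetical helper: affinity is treated as an optional optimization,
    so an EPERM from a sandbox is not an error for the caller.
    """
    try:
        # pid 0 means "the calling thread" on Linux.
        os.sched_setaffinity(0, cpus)
        return True
    except OSError as exc:
        if exc.errno == errno.EPERM:
            # Blocked by policy (for example a seccomp filter returning EPERM).
            return False
        # Anything else, such as an invalid CPU set, is a real error.
        raise
```

Calling `try_set_affinity(os.sched_getaffinity(0))` re-applies the current mask, and returns False instead of raising when the request is refused with EPERM.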

Changed in qemu:
status: New → Fix Committed

Thank you Daniel,
we will most likely keep Disco as-is for now and merge this in 19.10, where mesa can then drop the revert. I tagged it for 19.10 to be revisited.

tags: added: qemu-19.10
Changed in qemu (Ubuntu):
status: Triaged → Won't Fix
status: Won't Fix → Invalid

This problem was solved by qemu [1], so this mesa bug can be closed.

[1] https://git.qemu.org/git/qemu.git/?a=commitdiff;h=9a1565a03b79d80b236bc7cc2dbce52a2ef3a1b8

Changed in mesa:
status: Confirmed → Won't Fix
Thomas Huth (th-huth) on 2019-04-24
Changed in qemu:
status: Fix Committed → Fix Released
Changed in mesa (Ubuntu Eoan):
status: Triaged → Fix Released
Sebastien Bacher (seb128) wrote :

Reopening/Assigning to Timo for eoan since there is a patch which can be dropped once qemu is fixed

Changed in mesa (Ubuntu Eoan):
status: Fix Released → Triaged
assignee: nobody → Timo Aaltonen (tjaalton)