qxl_send_events: spice-server bug: guest stopped, ignoring

Bug #1964777 reported by Christian Ehrhardt 
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
spice-xpi
Fix Released
Unknown
qemu (Ubuntu)
Invalid
Undecided
Unassigned
spice (Ubuntu)
Fix Released
Medium
Christian Ehrhardt 

Bug Description

When starting KVM guest I saw this error on recent Ubuntu:
"qxl_send_events: spice-server bug: guest stopped, ignoring"

Note: the common way how libvirt 8.0 sets up a guest by default also triggers
this (this is what got me started to search for it).

Since graphic related things could be odd on servers I connected an X stack.
But still, when using `-vga qxl` I see this message.

The message happens for all display types -display [vnc gtk none sdl]
Also if vnc and spice are present (for the sake of initializing backends)

$ qemu-system-x86_64 -cpu host -machine q35,accel=kvm -m 512M \
  -kernel /boot/vmlinuz \
  -vga qxl -display vnc=:0 -spice port=5903,addr=127.0.0.1
...
qxl_send_events: spice-server bug: guest stopped, ignoring

This was working prior to qemu 6.2

The fail seems to be non-fatal, the guest worked to present boot console
just fine. Booting the same into a UI e.g. an Ubuntu install ISO TODO.

qemu-system-x86_64 -smp 2 -cpu host -machine ubuntu,accel=kvm -m 1024M \
  -vga qxl -display vnc=:0 \
  -boot d -cdrom ubuntu-20.04.4-desktop-amd64.iso

^^ works including bootsplash graphics as well as the UI stack in gnome is
happy with QXL. This is true for local windows (e.g. gtk) as well as
connections via spice client.
So it is not fatal, but disturbing it is still ...

I found quite some web-hits of this error signature, but mostly in regard to
un-raid. I failed to find why they faced it and what their fix was. mostly
discussions and hints to reboot to fix.

I also found some very old bugs and references:
- https://bugzilla.redhat.com/show_bug.cgi?id=912218
- https://patchwork.ozlabs.org/project<email address hidden>/

But while not seeing an issue about the error itself I see the error message
present on a few recent issue reports - but in none of these it was the topic
that was up for debate:
- https://gitlab.com/qemu-project/qemu/-/issues/681
- https://gitlab.com/qemu-project/qemu/-/issues/867
- https://forum.manjaro.org/t/virtual-machine-manager-error-qemu/95314/4

Tags: server-todo

Related branches

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Looking at the code that warning is from here:
```
 static void qxl_send_events(PCIQXLDevice *d, uint32_t events)
 {
     uint32_t old_pending;
     uint32_t le_events = cpu_to_le32(events);

     trace_qxl_send_events(d->id, events);
     if (!qemu_spice_display_is_running(&d->ssd)) {
         /* spice-server tracks guest running state and should not do this */
         fprintf(stderr, "%s: spice-server bug: guest stopped, ignoring\n",
                 __func__);
         trace_qxl_send_events_vm_stopped(d->id, events);
         return;
     }
```

That explains a bit why it isn't fatal at first, all that happens is the
error message followed by stopping the trace again.
But it does an early return from qxl_send_events and thereby that will
never work. Here a list of later calls that I now wonder if they'd fail as well:
  1 qxl.c 129 qxl_send_events(qxl, QXL_INTERRUPT_ERROR);
  2 qxl.c 663 qxl_send_events(qxl, QXL_INTERRUPT_DISPLAY);
  3 qxl.c 725 qxl_send_events(d, QXL_INTERRUPT_DISPLAY);
  4 qxl.c 822 qxl_send_events(qxl, QXL_INTERRUPT_CURSOR);
  5 qxl.c 925 qxl_send_events(qxl, QXL_INTERRUPT_IO_CMD);
  6 qxl.c 1019 qxl_send_events(qxl, QXL_INTERRUPT_CLIENT);
  7 qxl.c 1123 qxl_send_events(qxl, QXL_INTERRUPT_CLIENT_MONITORS_CONFIG);
  8 qxl.c 1634 qxl_send_events(d, QXL_INTERRUPT_IO_CMD);
  9 qxl.c 1833 qxl_send_events(d, QXL_INTERRUPT_IO_CMD);

Checking related libs:

spice-protocol | 0.14.3-1 | jammy | source
spice | 0.15.0-2ubuntu3 | jammy | source
=> That doesn't seem too outdated.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I wanted to start bisecting to identify the offending commit.
But what I've found is that even old build e.g. v5.2.0 are broken
in this Ubuntu 22.04 test system.

In return I checked older Ubuntu systems if I might just have missed the
message before. But up until 21.10 with qemu 6.0 it was just fine.

In addition I found that when I copied over qemu 5.2.0 (21.04) and 6.0.0 (21.10)
builds to the jammy system that they would there expose the same issue.

So it isn't a qemu change that we are hunting.
Instead something in Jammy changed triggering this!

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

An -O0 build later for debugging...

SimpleSpiceDisplay *ssd is ignored, the value qemu_spice_display_is_running
returns is the global spice_display_is_running of ui/spice-core.c

$6 = (int *) 0x55555650d5b4 <spice_display_is_running>
That is always 0 up to the point this checks it.

A watch on ~/work/qemu/qemu.git/ui/spice-core.c spice_display_is_running did
not see it enabled either.

Cross checks by Athos and Andreas confirmed it is not just a silly setup of
mine - the problem happens the same way for them.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Compare the same git build between Impish and Jammy:

~/qemu/build$ gdb ./qemu-system-x86_64
(gdb) watch spice_display_is_running
(gdb) b qxl_send_events
(gdb) b qemu_spice_display_is_running
(gdb) b qemu_spice_display_start
(gdb) handle SIGUSR1 nostop noprint pass
(gdb) run -cpu qemu64 -machine q35,accel=kvm -m 512M -vga qxl -display none -spice port=5903,addr=127.0.0.1

In Impish I see it checking `qemu_spice_display_is_running` a few times
then we see `qemu_spice_display_start` come by and enable the display.
Only later does qxl_send_events happen and then trigger the error.

```
Starting program: /home/ubuntu/qemu/build/qemu-system-x86_64 -cpu qemu64 -machine q35,accel=kvm -m 512M -vga qxl -display none -spice port=5903,addr=127.0.0.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6243640 (LWP 721583)]
[New Thread 0x7ffff53c2640 (LWP 721584)]
[New Thread 0x7fffcfdff640 (LWP 721585)]

Thread 1 "qemu-system-x86" hit Breakpoint 2, qemu_spice_display_is_running (ssd=0x555557191bc0) at ../ui/spice-core.c:1015
1015 return spice_display_is_running;
(gdb) c
Continuing.

Thread 1 "qemu-system-x86" hit Breakpoint 2, qemu_spice_display_is_running (ssd=0x555557191bc0) at ../ui/spice-core.c:1015
1015 return spice_display_is_running;
(gdb) c
Continuing.

Thread 1 "qemu-system-x86" hit Breakpoint 2, qemu_spice_display_is_running (ssd=0x555557191bc0) at ../ui/spice-core.c:1015
1015 return spice_display_is_running;
(gdb) c
Continuing.

Thread 1 "qemu-system-x86" hit Breakpoint 2, qemu_spice_display_is_running (ssd=0x555557191bc0) at ../ui/spice-core.c:1015
1015 return spice_display_is_running;
(gdb) c
Continuing.

Thread 1 "qemu-system-x86" hit Breakpoint 4, qemu_spice_display_start () at ../ui/spice-core.c:995
995 if (spice_display_is_running) {
(gdb) c
Continuing.

Thread 1 "qemu-system-x86" hit Hardware watchpoint 1: spice_display_is_running

Old value = 0
New value = 1
qemu_spice_display_start () at ../ui/spice-core.c:1000
1000 spice_server_vm_start(spice_server);
(gdb) c
Continuing.
[Switching to Thread 0x7fffcfdff640 (LWP 721585)]

Thread 4 "SPICE Worker" hit Breakpoint 3, qxl_send_events (d=0x555557191210, events=16) at ../hw/display/qxl.c:1867
```

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We see that on Jammy the initialization ordering is different, here we get a
call to `qxl_send_events` before qemu_spice_display_start happened and
therefore triggering the error message.
Later on the init happens and that might explain why I see no fatal issues
later on on when using spice.

Nevertheless why is this initializing in different orders?

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Good case (qemu_spice_display_start before qxl_send_events)

Thread 1 "qemu-system-x86" hit Breakpoint 4, qemu_spice_display_start () at ../ui/spice-core.c:995
995 if (spice_display_is_running) {
(gdb) bt
#0 qemu_spice_display_start () at ../ui/spice-core.c:995
#1 0x0000555555a8b6b7 in qemu_spice_display_init_done () at ../ui/spice-core.c:635
#2 0x0000555555a8fe25 in qemu_spice_display_init () at ../ui/spice-display.c:1198
#3 0x0000555555b38646 in qemu_init_displays () at ../softmmu/vl.c:2635
#4 0x0000555555b3b1ad in qemu_init (argc=13, argv=0x7fffffffe4a8, envp=0x7fffffffe518) at ../softmmu/vl.c:3777
#5 0x000055555580bd81 in main (argc=13, argv=0x7fffffffe4a8, envp=0x7fffffffe518) at ../softmmu/main.c:49
(gdb) c
Continuing.

Thread 4 "SPICE Worker" hit Breakpoint 3, qxl_send_events (d=0x555557191210, events=16) at ../hw/display/qxl.c:1867
1867 uint32_t le_events = cpu_to_le32(events);
(gdb) bt
#0 qxl_send_events (d=0x555557191210, events=16) at ../hw/display/qxl.c:1867
#1 0x0000555555a9a5d5 in interface_set_client_capabilities (sin=0x555557191c08, client_present=0 '\000', caps=0x7fffcfdfe6b0 "") at ../hw/display/qxl.c:1019
#2 0x00007ffff7e12231 in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#3 0x00007ffff7e3c943 in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#4 0x00007ffff7e0a255 in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#5 0x00007ffff7e0ae51 in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#6 0x00007ffff7a7f7c4 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x00007ffff7ad2f08 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#8 0x00007ffff7a7ee43 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9 0x00007ffff7e3d1ca in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#10 0x00007ffff76b3927 in start_thread (arg=<optimized out>) at pthread_create.c:435
#11 0x00007ffff77439e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Bad case (qemu_spice_display_start after qxl_send_events)

Thread 4 "SPICE Worker" hit Breakpoint 3, qxl_send_events (d=0x555557194210, events=16) at ../hw/display/qxl.c:1867
1867 uint32_t le_events = cpu_to_le32(events);
(gdb) bt
#0 qxl_send_events (d=0x555557194210, events=16) at ../hw/display/qxl.c:1867
#1 0x0000555555a9a5d5 in interface_set_client_capabilities (sin=0x555557194c08, client_present=0 '\000', caps=0x7fffcfbfe6d0 "") at ../hw/display/qxl.c:1019
#2 0x00007ffff7e09b29 in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#3 0x00007ffff7e32b8f in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#4 0x00007ffff7e0581a in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#5 0x00007ffff7e05ea1 in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#6 0x00007ffff7a65c24 in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x00007ffff7aba6e8 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#8 0x00007ffff7a65293 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9 0x00007ffff7e339b8 in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#10 0x00007ffff768ab43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007ffff771ca00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 "qemu-system-x86" hit Breakpoint 4, qemu_spice_display_start () at ../ui/spice-core.c:995
995 if (spice_display_is_running) {
(gdb) bt
#0 qemu_spice_display_start () at ../ui/spice-core.c:995
#1 0x0000555555a8b6b7 in qemu_spice_display_init_done () at ../ui/spice-core.c:635
#2 0x0000555555a8fe25 in qemu_spice_display_init () at ../ui/spice-display.c:1198
#3 0x0000555555b38646 in qemu_init_displays () at ../softmmu/vl.c:2635
#4 0x0000555555b3b1ad in qemu_init (argc=13, argv=0x7fffffffe2a8, envp=0x7fffffffe318) at ../softmmu/vl.c:3777
#5 0x000055555580bd81 in main (argc=13, argv=0x7fffffffe2a8, envp=0x7fffffffe318) at ../softmmu/main.c:49

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.6 KiB)

In the bad case the spice worker is at the fist qxl_send_events while at the same time the main process initializing is still not done yet.

(gdb) info thread
  Id Target Id Frame
  1 Thread 0x7ffff67dbc40 (LWP 144970) "qemu-system-x86" qemu_spice_display_is_running (ssd=0x555557194bc0) at ../ui/spice-core.c:1016
  2 Thread 0x7ffff5e2e640 (LWP 144971) "qemu-system-x86" futex_wait (private=0, expected=2, futex_word=0x55555650df40 <qemu_global_mutex>) at ../sysdeps/nptl/futex-internal.h:146
  3 Thread 0x7ffff4d6e640 (LWP 144973) "qemu-system-x86" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555567d6c98)
    at ./nptl/futex-internal.c:57
* 4 Thread 0x7fffcfbff640 (LWP 144975) "SPICE Worker" qxl_send_events (d=0x555557194210, events=16) at ../hw/display/qxl.c:1867

At the time this event is sent the intializer still isn't done yet.

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff67dbc40 (LWP 144970))]
#0 qemu_spice_display_is_running (ssd=0x555557194bc0) at ../ui/spice-core.c:1016
1016 }
(gdb) bt
#0 qemu_spice_display_is_running (ssd=0x555557194bc0) at ../ui/spice-core.c:1016
#1 0x0000555555a9abfa in qxl_check_state (d=0x555557194210) at ../hw/display/qxl.c:1200
#2 0x0000555555a9acb3 in qxl_reset_state (d=0x555557194210) at ../hw/display/qxl.c:1210
#3 0x0000555555a9d767 in qxl_realize_common (qxl=0x555557194210, errp=0x7fffffffd9e0) at ../hw/display/qxl.c:2186
#4 0x0000555555a9d9bd in qxl_realize_primary (dev=0x555557194210, errp=0x7fffffffda38) at ../hw/display/qxl.c:2218
#5 0x00005555559d1479 in pci_qdev_realize (qdev=0x555557194210, errp=0x7fffffffdab0) at ../hw/pci/pci.c:2143
#6 0x0000555555be14bf in device_set_realized (obj=0x555557194210, value=true, errp=0x7fffffffdbc0) at ../hw/core/qdev.c:531
#7 0x0000555555beb318 in property_set_bool (obj=0x555557194210, v=0x55555718f370, name=0x555555ee76a9 "realized", opaque=0x5555565ba460, errp=0x7fffffffdbc0) at ../qom/object.c:2268
#8 0x0000555555be92e7 in object_property_set (obj=0x555557194210, name=0x555555ee76a9 "realized", v=0x55555718f370, errp=0x7fffffffdbc0) at ../qom/object.c:1403
#9 0x0000555555bed785 in object_property_set_qobject (obj=0x555557194210, name=0x555555ee76a9 "realized", value=0x555557193e30, errp=0x5555565137d0 <error_fatal>) at ../qom/qom-qobject.c:28
#10 0x0000555555be9652 in object_property_set_bool (obj=0x555557194210, name=0x555555ee76a9 "realized", value=true, errp=0x5555565137d0 <error_fatal>) at ../qom/object.c:1472
#11 0x0000555555be0d53 in qdev_realize (dev=0x555557194210, bus=0x555556937860, errp=0x5555565137d0 <error_fatal>) at ../hw/core/qdev.c:333
#12 0x0000555555be0d80 in qdev_realize_and_unref (dev=0x555557194210, bus=0x555556937860, errp=0x5555565137d0 <error_fatal>) at ../hw/core/qdev.c:340
#13 0x00005555559d176a in pci_realize_and_unref (dev=0x555557194210, bus=0x555556937860, errp=0x5555565137d0 <error_fatal>) at ../hw/pci/pci.c:2210
#14 0x00005555559d17b6 in pci_create_simple_multifunction (bus=0x555556937860, devfn=-1, multifunction=false, name=0x555555e6fb8f "qxl-vga") at ../hw/pci/pci.c:2218
#15 0x00005555559d17ea in...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.1 KiB)

Thread spawn for the spice worker:

Thread 1 "qemu-system-x86" hit Breakpoint 5, __pthread_create_2_1 (newthread=0x5555572d1690, attr=0x0, start_routine=0x7ffff7e3d140, arg=0x5555572d1690) at pthread_create.c:597
597 in pthread_create.c
(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7ffff6bb5c40 (LWP 730722) "qemu-system-x86" __pthread_create_2_1 (newthread=0x5555572d1690, attr=0x0, start_routine=0x7ffff7e3d140, arg=0x5555572d1690)
    at pthread_create.c:597
  2 Thread 0x7ffff6243640 (LWP 731428) "qemu-system-x86" futex_wait (private=0, expected=2, futex_word=0x55555650df40 <qemu_global_mutex>) at ../sysdeps/nptl/futex-internal.h:146
  3 Thread 0x7ffff53c2640 (LWP 731429) "qemu-system-x86" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555567d4928)
    at futex-internal.c:57
(gdb) bt
#0 __pthread_create_2_1 (newthread=0x5555572d1690, attr=0x0, start_routine=0x7ffff7e3d140, arg=0x5555572d1690) at pthread_create.c:597
#1 0x00007ffff7e362f0 in spice_server_add_interface () from /lib/x86_64-linux-gnu/libspice-server.so.1
#2 0x0000555555a8c078 in qemu_spice_add_interface (sin=0x555557191c08) at ../ui/spice-core.c:874
#3 0x0000555555a8c2d1 in qemu_spice_add_display_interface (qxlin=0x555557191c08, con=0x5555567d3450) at ../ui/spice-core.c:944
#4 0x0000555555a9d533 in qxl_realize_common (qxl=0x555557191210, errp=0x7fffffffdbd0) at ../hw/display/qxl.c:2167
#5 0x0000555555a9d9bd in qxl_realize_primary (dev=0x555557191210, errp=0x7fffffffdc28) at ../hw/display/qxl.c:2218
#6 0x00005555559d1479 in pci_qdev_realize (qdev=0x555557191210, errp=0x7fffffffdca0) at ../hw/pci/pci.c:2143
#7 0x0000555555be14bf in device_set_realized (obj=0x555557191210, value=true, errp=0x7fffffffddb0) at ../hw/core/qdev.c:531
#8 0x0000555555beb318 in property_set_bool (obj=0x555557191210, v=0x55555718ee70, name=0x555555ee76a9 "realized", opaque=0x5555565a94e0, errp=0x7fffffffddb0) at ../qom/object.c:2268
#9 0x0000555555be92e7 in object_property_set (obj=0x555557191210, name=0x555555ee76a9 "realized", v=0x55555718ee70, errp=0x7fffffffddb0) at ../qom/object.c:1403
#10 0x0000555555bed785 in object_property_set_qobject (obj=0x555557191210, name=0x555555ee76a9 "realized", value=0x55555718edd0, errp=0x5555565137d0 <error_fatal>) at ../qom/qom-qobject.c:28
#11 0x0000555555be9652 in object_property_set_bool (obj=0x555557191210, name=0x555555ee76a9 "realized", value=true, errp=0x5555565137d0 <error_fatal>) at ../qom/object.c:1472
#12 0x0000555555be0d53 in qdev_realize (dev=0x555557191210, bus=0x555556935580, errp=0x5555565137d0 <error_fatal>) at ../hw/core/qdev.c:333
#13 0x0000555555be0d80 in qdev_realize_and_unref (dev=0x555557191210, bus=0x555556935580, errp=0x5555565137d0 <error_fatal>) at ../hw/core/qdev.c:340
#14 0x00005555559d176a in pci_realize_and_unref (dev=0x555557191210, bus=0x555556935580, errp=0x5555565137d0 <error_fatal>) at ../hw/pci/pci.c:2210
#15 0x00005555559d17b6 in pci_create_simple_multifunction (bus=0x555556935580, devfn=-1, multifunction=false, name=0x555555e6fb8f "qxl-vga") at ../hw/pci/pci.c:2218
#16 0x00005555559d17ea in pci_create_simp...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It seems that when spawning the "SPICE worker" in the bad case
the worker passes events before the other side is ready.

In both cases the spawn goes through the same control path

#0 __pthread_create_2_1 (newthread=0x5555572d4f10, attr=0x0, start_routine=0x7ffff7e33950, arg=0x5555572d4f10) at ./nptl/pthread_create.c:621
#1 0x00007ffff7eb11e6 in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#2 0x00007ffff7e2c4eb in spice_server_add_interface () at /lib/x86_64-linux-gnu/libspice-server.so.1
#3 0x0000555555a8c078 in qemu_spice_add_interface (sin=0x555557194c08) at ../ui/spice-core.c:874
#4 0x0000555555a8c2d1 in qemu_spice_add_display_interface (qxlin=0x555557194c08, con=0x555557133c00) at ../ui/spice-core.c:944
...

#0 __pthread_create_2_1 (newthread=0x5555572d1690, attr=0x0, start_routine=0x7ffff7e3d140, arg=0x5555572d1690) at pthread_create.c:597
#1 0x00007ffff7e362f0 in spice_server_add_interface () from /lib/x86_64-linux-gnu/libspice-server.so.1
#2 0x0000555555a8c078 in qemu_spice_add_interface (sin=0x555557191c08) at ../ui/spice-core.c:874
#3 0x0000555555a8c2d1 in qemu_spice_add_display_interface (qxlin=0x555557191c08, con=0x5555567d3450) at ../ui/spice-core.c:944
...

Maybe it is libspice-server.so ?

 libspice-server1 | 0.14.3-2.1ubuntu1 | impish | amd64, arm64, armhf, ppc64el, riscv64
 libspice-server1 | 0.15.0-2ubuntu3 | jammy | amd64, arm64, armhf, ppc64el, riscv64

They have the same soname, so they could be interchangeable

And indeed replacing the lib with the older one it works as it did in the past.
So we are actually looking for a libspice-server change.

Updating the bug-tasks ...

Changed in spice (Ubuntu):
status: New → Confirmed
Changed in qemu (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With the old lib, when the `qemu_spice_display_start` happens in the main
thread the spawned "SPICE worker" still is at:

(gdb) bt
#0 0x00007ffff770ad7f in __GI___poll (fds=0x7fffb80031b0, nfds=2, timeout=2147483647) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff7ab6686 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007ffff7a61293 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff7e311ca in () at /lib/x86_64-linux-gnu/libspice-server.so.1
#4 0x00007ffff7686b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5 0x00007ffff7718a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

That means it is pretty much still waiting at spawn-time.

But the TL;DR so far is:
- affects different qemu versions
- some change in spice causes it
- this clearly is a race
- the display initializes later and from there things are ok
- being a race in general explains why I've found it in various unrelated logs

TODO: find why it seems to be happen consistently in Jammy with the new spice version

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With debuginfo into libspice this looks like:

(gdb) bt
#0 __pthread_create_2_1 (newthread=0x5555572d4f10, attr=0x0, start_routine=0x7ffff7e33950 <red_worker_main(void*)>, arg=0x5555572d4f10) at ./nptl/pthread_create.c:621
#1 0x00007ffff7eb11e6 in red_worker_run.isra.0 (worker=worker@entry=0x5555572d4f10) at ../server/red-worker.cpp:1127
#2 0x00007ffff7e2c4eb in red_qxl_init (qxl=0x555557194c08, reds=<optimized out>) at ../server/red-qxl.cpp:583
#3 spice_server_add_interface(SpiceServer*, SpiceBaseInstance*) (reds=<optimized out>, sin=0x555557194c08) at ../server/reds.cpp:3248
#4 0x0000555555a8c078 in qemu_spice_add_interface (sin=0x555557194c08) at ../ui/spice-core.c:874
#5 0x0000555555a8c2d1 in qemu_spice_add_display_interface (qxlin=0x555557194c08, con=0x555557133c00) at ../ui/spice-core.c:944

Now that we know that much this changelog entry in spice is suspicious
  "Fix some potential thread race condition in RedClient"
But none of the commits was immediately obvious as being "the one"

Gladly I can reproduce this from spice git and spice builds fast,
so we can bisect spice to be sure what change it was without making
too many assumptions.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Identified commit that causes this:

commit c23cbd6fa821fea8ac4ed97ca679afebe2333c8c
Author: Marc-André Lureau <email address hidden>
Date: Thu Jan 28 13:59:24 2021 +0400

    reds: start QXL devices if VM is running

    Like char devices, QXL devices need to be explicily started.

    For some historical reason, char devices are started when in running
    state. See commi bf1d9007b. Reading that commit comments, there was a
    plan to provide an API to stop/start devices invidually, but that never
    happened. Whether that API would really be useful now, I wonder.

    For now, just follow the char devices behaviour and start QXL devices
    added when vm_running.

    Signed-off-by: Marc-André Lureau <email address hidden>
    Acked-by: Frediano Ziglio <email address hidden>

 server/reds.cpp | 3 +++
 1 file changed, 3 insertions(+)

Yeah and that seems to match all that I've seen so far.
It changes the `spice_server_add_interface` which we have seen in backtraces
to now always start the QXL device if the VM is running.
Our VM is running, but the receiving QXL/Spice side is not yet.

I'll need to file a bug what we'd loose reverting this change for now until a
better coordinated start (e.g. check if the other end is ready instead of VM
running) does exist.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In fact - as always - when you know what exactly you search for you find it.
Related are:
- https://gitlab.freedesktop.org/spice/spice/-/issues/64
- https://bugzilla.redhat.com/show_bug.cgi?id=2007363
- https://bugs.archlinux.org/task/71473

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI I chimed in on https://gitlab.freedesktop.org/spice/spice/-/issues/64 outlining what I've found. I'll check back later if there is an easy conclusion and follow up now that the people that know more about this code see what happens.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In terms of annoyance this is "Critical", but since it correctly initializes just a bit later and then works fine I can't consider it worse than "Medium" at the moment.

tags: added: server-todo server-triage-discuss
Changed in spice (Ubuntu):
importance: Undecided → Medium
tags: removed: server-triage-discuss
Changed in spice (Ubuntu):
assignee: nobody → Lena Voytek (lvoytek)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I did some more case analysis on this - this is complex and I might be wrong - but for now it seems to me that a revert of the offending commit is the best short term option. I've updated the upstream issue and submitted a PR:

- https://gitlab.freedesktop.org/spice/spice/-/issues/64
- https://gitlab.freedesktop.org/spice/spice/-/merge_requests/202

Prepping this for a Ubuntu-Jammy fix:

- PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4808/+packages
- MP: https://code.launchpad.net/~paelzer/ubuntu/+source/spice/+git/spice/+merge/417153

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

MP approved, tests good, upstream ok with the change (some CI issues, but nothing really stopping it) => uploaded the fix for Jammy.

Changed in spice (Ubuntu):
assignee: Lena Voytek (lvoytek) → Christian Ehrhardt  (paelzer)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package spice - 0.15.0-2ubuntu4

---------------
spice (0.15.0-2ubuntu4) jammy; urgency=medium

  * d/p/Revert-reds-start-QXL-devices-if-VM-is-running-fix-r.patch: fix race
    on spice init (LP: #1964777)

 -- Christian Ehrhardt <email address hidden> Mon, 21 Mar 2022 09:23:33 +0100

Changed in spice (Ubuntu):
status: Confirmed → Fix Released
Changed in spice-xpi:
status: Unknown → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.