qemu 2.7.0 segfaults in qemu_co_queue_run_restart()

Bug #1671876 reported by Mohammed Gamal
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

Hi,

I've been experiencing frequent segfaults lately with qemu 2.7.0 running Ubuntu 16.04 guests. The crash usually happens in qemu_co_queue_run_restart(). I haven't seen this so far with any other guests or distros.

Here is one back trace I obtained from one of the crashing VMs.

--------------------------------------------------------------------------
(gdb) bt
#0 qemu_co_queue_run_restart (co=0x7fba8ff05aa0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59
#1 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8ff05aa0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#2 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd20430) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#3 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd20430) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#4 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd14ea0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#5 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd14ea0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#6 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba80c11dc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#7 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba80c11dc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#8 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd0bd70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#9 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd0bd70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#10 0x000055c1656f3fa0 in qemu_co_enter_next (queue=queue@entry=0x55c1669e75e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106
#11 0x000055c165692060 in timer_cb (blk=0x55c1669e7590, is_write=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400
#12 0x000055c16564f615 in timerlist_run_timers (timer_list=0x55c166a53e80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528
#13 0x000055c16564f679 in timerlistgroup_run_timers (tlg=tlg@entry=0x55c167c81cf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564
#14 0x000055c16564ff47 in aio_dispatch (ctx=ctx@entry=0x55c167c81bb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357
#15 0x000055c1656500e8 in aio_poll (ctx=0x55c167c81bb0, blocking=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479
#16 0x000055c1654b1c79 in iothread_run (opaque=0x55c167c81960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46
#17 0x00007fbc4b64f0a4 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x0) at allocatestack.c:416
#18 __pthread_create_2_1 (newthread=<error reading variable: Cannot access memory at address 0xffffffffffffff48>, attr=<error reading variable: Cannot access memory at address 0xffffffffffffff40>,
    start_routine=<error reading variable: Cannot access memory at address 0xffffffffffffff58>, arg=<error reading variable: Cannot access memory at address 0xffffffffffffff50>) at pthread_create.c:539
Backtrace stopped: Cannot access memory at address 0x8
--------------------------------------------------------------------------

The code that crashes is this
--------------------------------------------------------------------------
void qemu_co_queue_run_restart(Coroutine *co)
{
    Coroutine *next;

    trace_qemu_co_queue_run_restart(co);
    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next); <-Crash
        qemu_coroutine_enter(next);
    }
}
--------------------------------------------------------------------------

Expanding the macro QSIMPLEQ_REMOVE_HEAD gives us
--------------------------------------------------------------------------
#define QSIMPLEQ_REMOVE_HEAD(head, field) do { \
    if (((head)->sqh_first = (head)->sqh_first->field.sqe_next) == NULL)\
        (head)->sqh_last = &(head)->sqh_first; \
} while (/*CONSTCOND*/0)
--------------------------------------------------------------------------

which corrsponds to
--------------------------------------------------------------------------
if (((&co->co_queue_wakeup)->sqh_first = (&co->co_queue_wakeup)->sqh_first->co_queue_next.sqe_next) == NULL)\
        (&co->co_queue_wakeup)->sqh_last = &(&co->co_queue_wakeup)->sqh_first;
--------------------------------------------------------------------------

Debugging the list we see
--------------------------------------------------------------------------
(gdb) print *(&co->co_queue_wakeup->sqh_first)
$6 = (struct Coroutine *) 0x1000
(gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next)
Cannot access memory at address 0x1030
--------------------------------------------------------------------------

So the data in co->co_queue_wakeup->sqh_first is corrupted and represents an invalid address. Any idea why is that?

description: updated
summary: - qemu segfaults in qemu_co_queue_run_restart()
+ qemu 2.7.0 segfaults in qemu_co_queue_run_restart()
description: updated
description: updated
Revision history for this message
Mohammed Gamal (m-gamal005) wrote :

Another stack trace

---------------------------------------------------------------------
(gdb) bt
#0 qemu_co_queue_run_restart (co=0x7f668be15260) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59
#1 0x0000564cb19f59a9 in qemu_coroutine_enter (co=0x7f668be15260) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#2 0x0000564cb19f5fa0 in qemu_co_enter_next (queue=queue@entry=0x564cb35e55e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106
#3 0x0000564cb1994060 in timer_cb (blk=0x564cb35e5590, is_write=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400
#4 0x0000564cb1951615 in timerlist_run_timers (timer_list=0x564cb3651e80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528
#5 0x0000564cb1951679 in timerlistgroup_run_timers (tlg=tlg@entry=0x564cb487fcf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564
#6 0x0000564cb1951f47 in aio_dispatch (ctx=ctx@entry=0x564cb487fbb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357
#7 0x0000564cb19520e8 in aio_poll (ctx=0x564cb487fbb0, blocking=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479
#8 0x0000564cb17b3c79 in iothread_run (opaque=0x564cb487f960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46
#9 0x00007f684b0b30a4 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x0) at allocatestack.c:416
#10 __pthread_create_2_1 (newthread=<error reading variable: Cannot access memory at address 0xffffffffffffff48>, attr=<error reading variable: Cannot access memory at address 0xffffffffffffff40>,
    start_routine=<error reading variable: Cannot access memory at address 0xffffffffffffff58>, arg=<error reading variable: Cannot access memory at address 0xffffffffffffff50>) at pthread_create.c:539
Backtrace stopped: Cannot access memory at address 0x8
-----------------------------------------------------------------------------------------------

Here is a bit of examination of the data
-----------------------------------------------------------------------------------------------
(gdb) print *(&co->co_queue_wakeup->sqh_first)
$1 = (struct Coroutine *) 0xc54b578
(gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next)
Cannot access memory at address 0xc54b5a8
-----------------------------------------------------------------------------------------------

Again seems to be pointing at an invalid address. It's worth noting here that it the number of restarted and re-run co-routines is much smaller.

Revision history for this message
Mohammed Gamal (m-gamal005) wrote :
Download full text (7.7 KiB)

A third stack trace

It generates the following stack trace
---------------------------------------------------------------------
(gdb) bt
#0 qemu_co_queue_run_restart (co=0x7f75ed30dbc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59
#1 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed30dbc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#2 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75f1c0f200) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#3 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75f1c0f200) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#4 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75ed304870) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#5 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75ed304870) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#6 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800fcd0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#7 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800fcd0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#8 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800fac0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#9 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800fac0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#10 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800f8b0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#11 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800f8b0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#12 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75fbf05570) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#13 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75fbf05570) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#14 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8009b70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#15 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8009b70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#16 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800b5d0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#17 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800b5d0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#18 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e8008910) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#19 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e8008910) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#20 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75e800f6a0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#21 0x00005619274829a9 in qemu_coroutine_enter (co=0x7f75e800f6a0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#22 0x0000561927482e74 in qemu_co_queue_run_restart (co=0x7f75fbf05100) at /buil...

Read more...

Revision history for this message
Mohammed Gamal (m-gamal005) wrote :

The VMs were running with the following arguments
---------------------------------------------------------------------
-m 1024,slots=255,maxmem=256G -M pc-i440fx-2.7 -enable-kvm -nodefconfig -nodefaults -rtc base=utc -netdev tap,ifname=n020133f0895e,id=hostnet6,vhost=on,vhostforce=on,vnet_hdr=off,script=no,downscript=no -device virtio-net-pci,netdev=hostnet6,id=net6,mac=02:01:33:f0:89:5e,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:94 -vga qxl -cpu Haswell,+vmx -smp 6,sockets=32,cores=1,maxcpus=64,threads=2 -drive file=/dev/md10,if=none,id=drive-virtio-disk5,format=raw,snapshot=off,aio=native,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk5,num-queues=3,id=virtio-disk5,bootindex=1 -S
---------------------------------------------------------------------

Revision history for this message
Thomas Huth (th-huth) wrote :

Could you please retry with the latest stable version (either 2.8.0 or 2.7.1) ... maybe the problem is already fixed there.

description: updated
Revision history for this message
Mohammed Gamal (m-gamal005) wrote :

Unfortunately it'd not be possible to use another version at the moment. Is it possible that someone takes a look at the stack traces?

Revision history for this message
Mohammed Gamal (m-gamal005) wrote :

Fixed by commit 528f449f590829b53ea01ed91817a695b540421d

Changed in qemu:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.