Hi,
I've been experiencing frequent segfaults lately with qemu 2.7.0 running Ubuntu 16.04 guests. The crash usually happens in qemu_co_queue_run_restart(). I haven't seen this so far with any other guests or distros.
Here is a backtrace from one of the crashing VMs.
--------------------------------------------------------------------------
(gdb) bt
#0 qemu_co_queue_run_restart (co=0x7fba8ff05aa0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59
#1 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8ff05aa0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#2 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd20430) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#3 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd20430) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#4 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd14ea0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#5 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd14ea0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#6 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba80c11dc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#7 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba80c11dc0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#8 0x000055c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd0bd70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#9 0x000055c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd0bd70) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#10 0x000055c1656f3fa0 in qemu_co_enter_next (queue=queue@entry=0x55c1669e75e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106
#11 0x000055c165692060 in timer_cb (blk=0x55c1669e7590, is_write=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400
#12 0x000055c16564f615 in timerlist_run_timers (timer_list=0x55c166a53e80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528
#13 0x000055c16564f679 in timerlistgroup_run_timers (tlg=tlg@entry=0x55c167c81cf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564
#14 0x000055c16564ff47 in aio_dispatch (ctx=ctx@entry=0x55c167c81bb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357
#15 0x000055c1656500e8 in aio_poll (ctx=0x55c167c81bb0, blocking=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479
#16 0x000055c1654b1c79 in iothread_run (opaque=0x55c167c81960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46
#17 0x00007fbc4b64f0a4 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x0) at allocatestack.c:416
#18 __pthread_create_2_1 (newthread=<error reading variable: Cannot access memory at address 0xffffffffffffff48>, attr=<error reading variable: Cannot access memory at address 0xffffffffffffff40>,
start_routine=<error reading variable: Cannot access memory at address 0xffffffffffffff58>, arg=<error reading variable: Cannot access memory at address 0xffffffffffffff50>) at pthread_create.c:539
Backtrace stopped: Cannot access memory at address 0x8
--------------------------------------------------------------------------
The code that crashes is this:
--------------------------------------------------------------------------
void qemu_co_queue_run_restart(Coroutine *co)
{
Coroutine *next;
trace_qemu_co_queue_run_restart(co);
while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next); /* <- crash here */
qemu_coroutine_enter(next);
}
}
--------------------------------------------------------------------------
The macro QSIMPLEQ_REMOVE_HEAD is defined as
--------------------------------------------------------------------------
#define QSIMPLEQ_REMOVE_HEAD(head, field) do { \
if (((head)->sqh_first = (head)->sqh_first->field.sqe_next) == NULL)\
(head)->sqh_last = &(head)->sqh_first; \
} while (/*CONSTCOND*/0)
--------------------------------------------------------------------------
which, for this call, expands to
--------------------------------------------------------------------------
if (((&co->co_queue_wakeup)->sqh_first = (&co->co_queue_wakeup)->sqh_first->co_queue_next.sqe_next) == NULL)
(&co->co_queue_wakeup)->sqh_last = &(&co->co_queue_wakeup)->sqh_first;
--------------------------------------------------------------------------
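To make the failure mode concrete, here is a minimal standalone sketch of the same head-removal logic. The `sq_*` names and `struct node` are hypothetical stand-ins written for this illustration; the real macros are the QSIMPLEQ ones in QEMU's include/qemu/queue.h. It only shows that the first step of the removal reads through `sqh_first`, so a garbage `sqh_first` would fault at exactly that statement.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QSIMPLEQ head and entry. */
struct node {
    int id;
    struct node *sqe_next;      /* next element, NULL at the tail */
};

struct sq_head {
    struct node *sqh_first;     /* first element, NULL when empty */
    struct node **sqh_last;     /* address of the last next pointer */
};

static void sq_init(struct sq_head *h)
{
    h->sqh_first = NULL;
    h->sqh_last = &h->sqh_first;
}

static void sq_insert_tail(struct sq_head *h, struct node *n)
{
    n->sqe_next = NULL;
    *h->sqh_last = n;
    h->sqh_last = &n->sqe_next;
}

/* Same two steps as QSIMPLEQ_REMOVE_HEAD. The assignment dereferences
 * h->sqh_first to read sqe_next, so an invalid sqh_first (such as 0x1000)
 * would fault on this line. */
static void sq_remove_head(struct sq_head *h)
{
    if ((h->sqh_first = h->sqh_first->sqe_next) == NULL)
        h->sqh_last = &h->sqh_first;
}

/* Drain loop shaped like the one in qemu_co_queue_run_restart(), minus the
 * coroutine entry. Returns the number of elements removed. */
static int sq_drain(struct sq_head *h)
{
    int removed = 0;

    while (h->sqh_first != NULL) {
        sq_remove_head(h);
        removed++;
    }
    return removed;
}
```

With a well-formed queue the loop terminates once `sqh_first` becomes NULL. The loop condition only rejects NULL, so any other bogus value passes the check and is then dereferenced inside the removal.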
Inspecting the list in gdb shows:
--------------------------------------------------------------------------
(gdb) print *(&co->co_queue_wakeup->sqh_first)
$6 = (struct Coroutine *) 0x1000
(gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next)
Cannot access memory at address 0x1030
--------------------------------------------------------------------------
So the data in co->co_queue_wakeup->sqh_first is corrupted and points to an invalid address. Any idea why that is?
Another stack trace:
--------------------------------------------------------------------------
(gdb) bt
#0 qemu_co_queue_run_restart (co=0x7f668be15260) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59
#1 0x0000564cb19f59a9 in qemu_coroutine_enter (co=0x7f668be15260) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#2 0x0000564cb19f5fa0 in qemu_co_enter_next (queue=queue@entry=0x564cb35e55e0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106
#3 0x0000564cb1994060 in timer_cb (blk=0x564cb35e5590, is_write=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400
#4 0x0000564cb1951615 in timerlist_run_timers (timer_list=0x564cb3651e80) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528
#5 0x0000564cb1951679 in timerlistgroup_run_timers (tlg=tlg@entry=0x564cb487fcf8) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564
#6 0x0000564cb1951f47 in aio_dispatch (ctx=ctx@entry=0x564cb487fbb0) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357
#7 0x0000564cb19520e8 in aio_poll (ctx=0x564cb487fbb0, blocking=<optimized out>) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479
#8 0x0000564cb17b3c79 in iothread_run (opaque=0x564cb487f960) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46
#9 0x00007f684b0b30a4 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x0) at allocatestack.c:416
#10 __pthread_create_2_1 (newthread=<error reading variable: Cannot access memory at address 0xffffffffffffff48>, attr=<error reading variable: Cannot access memory at address 0xffffffffffffff40>,
start_routine=<error reading variable: Cannot access memory at address 0xffffffffffffff58>, arg=<error reading variable: Cannot access memory at address 0xffffffffffffff50>) at pthread_create.c:539
Backtrace stopped: Cannot access memory at address 0x8
--------------------------------------------------------------------------
Here is a bit of examination of the data:
--------------------------------------------------------------------------
(gdb) print *(&co->co_queue_wakeup->sqh_first)
$1 = (struct Coroutine *) 0xc54b578
(gdb) print *(&co->co_queue_wakeup->sqh_first->co_queue_next)
Cannot access memory at address 0xc54b5a8
--------------------------------------------------------------------------
Again, sqh_first seems to be pointing at an invalid address. It's worth noting that in this trace the number of restarted and re-entered coroutines is much smaller.
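One small observation from the two gdb sessions: in both cases the address gdb failed to read is the bogus sqh_first value plus 0x30. That would be consistent with co_queue_next sitting at offset 0x30 inside struct Coroutine in this build, though this is an inference from the gdb output and has not been checked against the actual struct layout. The helper below is a hypothetical name used only to spell out the arithmetic.

```c
#include <stdint.h>

/* Distance between the address gdb could not read and the corrupted
 * sqh_first value it was derived from. fault_offset() is a hypothetical
 * helper for this illustration, not part of QEMU. */
static uint64_t fault_offset(uint64_t sqh_first, uint64_t fault_addr)
{
    return fault_addr - sqh_first;
}
```

Applied to the values above, fault_offset(0x1000, 0x1030) and fault_offset(0xc54b578, 0xc54b5a8) both give 0x30, so the two crashes fault on the same member access and differ only in the garbage pointer value.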