sig-abort / coredump observed from aio_ctx_finalize
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
QEMU | Fix Released | Undecided | Unassigned |
Bug Description
I am observing an occasional SIGABRT with QEMU built from the v5.2.0 tag. The VMM is configured for the Kata use case: it launches with an nvdimm/pmem-based rootfs and runs a set of workloads that make heavy use of virtio-fs.
Sample qemu-cmdline:
/usr/bin/
-name sandbox-
-uuid cd58d78d-
-machine pc,accel=
-cpu host,pmu=off
-qmp unix:/run/
-m 2048M,slots=
-device pci-bridge,
-device virtio-
-device virtconsole,
-chardev socket,
-device nvdimm,
-object memory-
-object rng-random,
-device virtio-
-device vhost-vsock-
-chardev socket,
-device vhost-user-
-netdev tap,id=
-device driver=
-rtc base=utc,
-global kvm-pit.
-vga none
-no-user-config
-nodefaults
-nographic
--no-reboot
-daemonize
-object memory-
-numa node,memdev=dimm1
-kernel /usr/share/
-append tsc=reliable no_timer_check rcupdate.
-pidfile /run/vc/
-smp 1,cores=
From the core file I was able to obtain a backtrace:
```
(gdb) info thread
Id Target Id Frame
6 Thread 0x7f92feffd700 (LWP 14678) 0x00007f93b23a0a35 in pthread_
5 Thread 0x7f92fffff700 (LWP 13860) 0x00007f93b23a0a35 in pthread_
4 Thread 0x7f930dcff700 (LWP 13572) 0x00007f93b23a0a35 in pthread_
3 Thread 0x7f92ff7fe700 (LWP 14179) 0x00007f93b23a0a35 in pthread_
2 Thread 0x7f93aed03700 (LWP 13565) 0x00007f93b20bfd19 in syscall () from /lib64/libc.so.6
* 1 Thread 0x7f93c718dcc0 (LWP 13564) 0x00007f93b1ffd3d7 in raise () from /lib64/libc.so.6
(gdb) bt trace
No symbol table is loaded. Use the "file" command.
(gdb) bt
#0 0x00007f93b1ffd3d7 in raise () from /lib64/libc.so.6
#1 0x00007f93b1ffeac8 in abort () from /lib64/libc.so.6
#2 0x00007f93b1ff61a6 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f93b1ff6252 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000007c6955 in aio_ctx_finalize ()
#5 0x00007f93c64223d1 in g_source_
#6 0x00007f93c64225f5 in g_source_iter_next () from /lib64/
#7 0x00007f93c642362d in g_main_
#8 0x00007f93c6425628 in g_main_loop_unref () from /lib64/
#9 0x00000000006dbaa0 in iothread_
#10 0x00000000006c01e9 in object_unref ()
#11 0x00000000006be647 in object_
#12 0x000000000075ad79 in monitor_cleanup ()
#13 0x0000000000630635 in qemu_cleanup ()
#14 0x000000000040fed3 in main ()
```
I *think* we're hitting this assert: https:/
```
(gdb) up
#4 0x00000000007c6955 in aio_ctx_finalize ()
```
The error is relatively infrequent, but when it occurs it is a catastrophic core dump nonetheless.
Please let me know if there's more I can pull from the core, or more info I can share to help facilitate debugging this error.
Changed in qemu:
status: Fix Committed → Fix Released
Please install debuginfo and run "p *ctx" in GDB from the aio_ctx_finalize frame. That should show ctx->scheduled_coroutines, ctx->bh_slice_list, etc.
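If it helps when collecting this from several core files, here is a minimal Python sketch that reads the text of a `bt` listing, finds the frame number for aio_ctx_finalize, and prints the GDB commands to type. The helper names `find_frame` and `gdb_commands` are made up for illustration and are not part of QEMU or GDB; the two field names are the ones mentioned in the request above. It only generates commands; it does not talk to GDB itself.

```python
import re

# Matches backtrace lines such as:
#   #4 0x00000000007c6955 in aio_ctx_finalize ()
# The address part is optional because GDB omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)")


def find_frame(backtrace, function):
    """Return the number of the first frame named `function`, or None."""
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(2) == function:
            return int(match.group(1))
    return None


def gdb_commands(backtrace, function="aio_ctx_finalize",
                 fields=("scheduled_coroutines", "bh_slice_list")):
    """Build the GDB commands that select the frame and print ctx."""
    frame = find_frame(backtrace, function)
    if frame is None:
        raise ValueError("no frame named %s in backtrace" % function)
    commands = ["frame %d" % frame, "p *ctx"]
    # Print the individual fields too, in case the full struct dump is long.
    commands += ["p ctx->%s" % field for field in fields]
    return commands


if __name__ == "__main__":
    sample = """\
#3 0x00007f93b1ff6252 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000007c6955 in aio_ctx_finalize ()
#10 0x00000000006c01e9 in object_unref ()
"""
    for command in gdb_commands(sample):
        print("(gdb) " + command)
```

Note that `p *ctx` only works once debuginfo is installed, since the backtrace above shows aio_ctx_finalize () without argument or line information.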