Comment 23 for bug 1743637

Rafael David Tinoco (rafaeldtinoco) wrote :

By fixing the previous bug, we continued executing the subsequent logic and found another issue. I'll consider them all part of the same bug, since they are all related to vhost-user interface shutdown in QEMU.

And yes, this is also related to proper memory mapping cleanup (unmapping) for the vhost-user interface, and it *could* be related to some other bugs (if core dumps are left behind), such as memory leaks.

From the stack trace:

Thread #1 13162 (Suspended : Signal : SIGSEGV:Segmentation fault)

__GI___pthread_mutex_lock() at pthread_mutex_lock.c:66 0x7f5cc3f88404
qemu_mutex_lock() at qemu-thread-posix.c:73 0x55606bd5f5a9
qemu_chr_fe_write_all() at qemu-char.c:205 0x55606bb23420 -> CharDriverState *s is NULL and there are no checks
vhost_user_write() at vhost-user.c:195 0x55606ba8e3d3
vhost_user_get_vring_base() at vhost-user.c:364 0x55606ba8f06c
vhost_virtqueue_stop() at vhost.c:895 0x55606ba8af40
vhost_dev_stop() at vhost.c:1,262 0x55606ba8d894
vhost_net_stop_one() at vhost_net.c:293 0x55606ba76248
vhost_net_stop() at vhost_net.c:371 0x55606ba76dfb
virtio_net_vhost_status() at virtio-net.c:150 0x55606ba726e5
virtio_net_set_status() at virtio-net.c:162 0x55606ba726e5
virtio_set_status() at virtio.c:624 0x55606ba873cc
vm_state_notify() at vl.c:1,605 0x55606bb2b1c2
do_vm_stop() at cpus.c:724 0x55606ba2c54a
vm_stop() at cpus.c:1,407 0x55606ba2c54a
main_loop_should_exit() at vl.c:1,883 0x55606b9f8060
main_loop() at vl.c:1,931 0x55606b9f8060
main() at vl.c:4,683 0x55606b9f8060

The segfault happens because, during the vhost_user_write() call made to obtain the vring base for the virtio shutdown, the character device is already gone (likely closed by the other side and/or by another thread). As a result, qemu_chr_fe_write_all() tries to write to a character device that no longer exists (more precisely, it tries to lock the device object's mutex, but the object is gone), causing the segfault.
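To make the failure mode concrete, here is a minimal, self-contained model of it. It assumes a simplified stand-in for CharDriverState; the type ToyChr and both functions are hypothetical and are not QEMU code. The point is that the write lock lives inside the driver object, so taking it requires a valid pointer: the unchecked variant dereferences NULL before writing anything, which matches the top frames of the trace above, while the checked variant shows the kind of guard that was missing at that point.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CharDriverState: the write lock is embedded
 * in the object, so locking it requires a valid object pointer. */
typedef struct ToyChr {
    pthread_mutex_t write_lock;
    char out[64];   /* pretend output channel */
    size_t used;
} ToyChr;

/* Same shape as the pre-fix path: no NULL check before taking the lock.
 * With s == NULL, &s->write_lock is derived from a NULL base; this is
 * undefined behaviour and in practice faults inside pthread_mutex_lock(). */
static int toy_write_all_unchecked(ToyChr *s, const char *buf, size_t len)
{
    pthread_mutex_lock(&s->write_lock);
    size_t room = sizeof(s->out) - s->used;
    size_t n = len < room ? len : room;   /* copy only what fits */
    memcpy(s->out + s->used, buf, n);
    s->used += n;
    pthread_mutex_unlock(&s->write_lock);
    return (int)n;
}

/* Guarded variant: refuse to touch a driver that is already gone. */
static int toy_write_all_checked(ToyChr *s, const char *buf, size_t len)
{
    if (s == NULL) {
        return -1;  /* report an error instead of dereferencing NULL */
    }
    return toy_write_all_unchecked(s, buf, len);
}
```

The guard only helps when the pointer is actually cleared (set to NULL) once the device goes away; a dangling, non-NULL pointer to freed memory would still crash.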

The following upstream commits appear to fix this erratic behaviour:

commit fa394ed625731c18f904578903718bf16617fe92
Author: Marc-André Lureau <email address hidden>
Date: Sat Oct 22 12:52:59 2016 +0300

char: make some qemu_chr_fe skip if no driver

Fixes the issue by checking the "CharBackend" (instead of the CharDriverState) for NULL.

commit 5345fdb4467816c44f6752b3a1f4e73aa25919f9
Author: Marc-André Lureau <email address hidden>
Date: Sat Oct 22 12:52:55 2016 +0300

char: use qemu_chr_fe* functions with CharBackend argument

Creates the "CharBackend" mechanism for qemu-char.
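To illustrate why checking the backend rather than the driver helps, here is a simplified sketch. It assumes a front end that holds a small wrapper around the driver pointer; the names ToyDriver, ToyBackend, toy_fe_attach(), toy_fe_detach() and toy_fe_write_all() are hypothetical and only loosely modelled on QEMU's CharBackend / qemu_chr_fe_* split, not copied from it. When the driver is detached, the wrapper's pointer is cleared, so every front-end call can test it and skip the operation instead of dereferencing a missing driver.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical driver object: owns the lock, like CharDriverState. */
typedef struct ToyDriver {
    pthread_mutex_t write_lock;
    size_t bytes_written;
} ToyDriver;

/* Hypothetical front-end handle: the front end always holds this wrapper,
 * even when no driver is attached (chr == NULL). */
typedef struct ToyBackend {
    ToyDriver *chr;
} ToyBackend;

static void toy_fe_attach(ToyBackend *be, ToyDriver *drv) { be->chr = drv; }

/* Detaching clears the pointer, so later calls can notice it. */
static void toy_fe_detach(ToyBackend *be) { be->chr = NULL; }

/* "Skip if no driver": with nothing attached, write nothing and return 0. */
static int toy_fe_write_all(ToyBackend *be, const void *buf, size_t len)
{
    ToyDriver *s = be->chr;
    if (s == NULL) {
        return 0;
    }
    (void)buf;  /* payload not stored in this sketch; only counted */
    pthread_mutex_lock(&s->write_lock);
    s->bytes_written += len;
    pthread_mutex_unlock(&s->write_lock);
    return (int)len;
}
```

A caller that compares the return value against the requested length then sees a short write and can fail the request instead of crashing. Note that this sketch assumes detach and write never run concurrently; if another thread can clear the pointer in the middle of a write, the check alone is not enough and extra synchronization is needed.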

I'm still verifying whether it's safe to backport only those commits, or whether other conditions must also be satisfied for proper vhost-user shutdown handling.