qemu 3.1/i386 crashes/guest hangs when MTTCG is enabled

Bug #1811244 reported by Jakub Jermar on 2019-01-10
This bug affects 1 person
Affects: QEMU
Importance: Undecided
Assigned to: Unassigned

Bug Description

When multi-threaded TCG (MTTCG) is enabled, QEMU 3.1.0 intermittently crashes when running the following command line:

qemu-system-i386 \
    -kernel /home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/bootstrap \
    -append bootstrap \
    -initrd "/home/jermar/work/software/l4/fiasco/.build-i386/fiasco -serial_esc,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/sigma0 ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/moe rom/ahci.cfg,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/ned ,test_env.lua ,/home/jermar/Kernkonzept/software/l4/pkg/ahci-driver/examples/md5sum/ahci.cfg ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/l4re ,/home/jermar/Kernkonzept/software/l4/pkg/ahci-driver/examples/md5sum/ahci.io ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/io ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/ahci-drv ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/ahci-md5-sync" \
    -smp 4 \
    -accel tcg,thread=multi \
    -device ahci,id=ahci0 \
    -drive if=none,file=/home/jermar/Kernkonzept/software/l4/.build-i386/pkg/ahci-driver/test/examples/test_ahci.img,format=raw,id=drive-sata0-0-0 \
    -device ide-drive,bus=ahci0.0,drive=drive-sata0-0-0,id=sata0-0-0 \
    -serial stdio \
    -nographic \
    -monitor none

The host is x86_64.

Stack trace at the time of the crash (the core dump and debug binary are attached to the bug):

Core was generated by `qemu-system-i386 -kernel /home/jermar/Kernkonzept/software/l4/.build-i386/bin/x'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 io_writex (env=env@entry=0x565355ca0140, iotlbentry=iotlbentry@entry=0x565355ca9120, mmu_idx=2, val=val@entry=0, addr=addr@entry=3938451632, retaddr=retaddr@entry=140487132809203, recheck=false, size=4)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cputlb.c:791
791 if (mr->global_locking && !qemu_mutex_iothread_locked()) {
[Current thread is 1 (Thread 0x7fc5af7fe700 (LWP 3625719))]
Missing separate debuginfos, use: dnf debuginfo-install SDL2-2.0.9-1.fc29.x86_64 at-spi2-atk-2.30.0-1.fc29.x86_64 at-spi2-core-2.30.0-2.fc29.x86_64 atk-2.30.0-1.fc29.x86_64 bzip2-libs-1.0.6-28.fc29.x86_64 cairo4
(gdb) bt
#0 0x0000565354f5f365 in io_writex
    (env=env@entry=0x565355ca0140, iotlbentry=iotlbentry@entry=0x565355ca9120, mmu_idx=2, val=val@entry=0, addr=addr@entry=3938451632, retaddr=retaddr@entry=140487132809203, recheck=false, size=4)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cputlb.c:791
#1 0x0000565354f621b2 in io_writel (recheck=<optimized out>, retaddr=140487132809203, addr=3938451632, val=0, index=0, mmu_idx=2, env=0x565355ca0140)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/softmmu_template.h:310
#2 0x0000565354f621b2 in helper_le_stl_mmu (env=0x565355ca0140, addr=<optimized out>, val=0, oi=34, retaddr=140487132809203)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/softmmu_template.h:310
#3 0x00007fc5b5a587f3 in code_gen_buffer ()
#4 0x0000565354f75fd0 in cpu_tb_exec (itb=<optimized out>, cpu=0x7fc5b5a5aa40 <code_gen_buffer+12266006>) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cpu-exec.c:171
#5 0x0000565354f75fd0 in cpu_loop_exec_tb (tb_exit=<synthetic pointer>, last_tb=<synthetic pointer>, tb=<optimized out>, cpu=0x7fc5b5a5aa40 <code_gen_buffer+12266006>)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cpu-exec.c:615
#6 0x0000565354f75fd0 in cpu_exec (cpu=cpu@entry=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cpu-exec.c:725
#7 0x0000565354f33b1f in tcg_cpu_exec (cpu=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/cpus.c:1429
#8 0x0000565354f35e83 in qemu_tcg_cpu_thread_fn (arg=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/cpus.c:1733
#9 0x0000565354f35e83 in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/cpus.c:1707
#10 0x00005653552ec5da in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:498
#11 0x00007fc5b858a58e in start_thread () at /lib64/libpthread.so.0
#12 0x00007fc5b84b96a3 in clone () at /lib64/libc.so.6

Another symptom, which occurs more often than the crash, is that the guest hangs while one CPU waits for another CPU to complete a cross-CPU call. Disabling MTTCG makes both symptoms go away.

Jakub Jermar (jakub) wrote :

As for the other outcome, in which the guest hangs instead of QEMU crashing: the guest CPUs that block forward progress are halted in the idle loop (HLT=1) with interrupts enabled (EFL=00000202), and each has a pending timer interrupt (vector 248) and a pending software IPI (vector 250) in its IRR. Another timer interrupt (vector 248) appears to still be in service (ISR 248), even though the CPU is idling.

(qemu) cpu 1
(qemu) info registers
EAX=ff8b7000 EBX=ff8b7000 ECX=00000003 EDX=00000003
ESI=00000001 EDI=ff8b5240 EBP=ff8b7000 ESP=ff8b7fac
EIP=f0029707 EFL=00000202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=1
ES =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
GS =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0068 efbfe280 00003d80 00008900 DPL=0 TSS32-avl
GDT= ffbd8400 00000077
IDT= eacfe000 000007ff
CR0=8001003b CR2=00000000 CR3=03fde000 CR4=00000690
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
(qemu) info lapic
dumping local APIC state for CPU 1

LVT0 0x0001003f active-hi edge masked Fixed (vec 63)
LVT1 0x0001003f active-hi edge masked Fixed (vec 63)
LVTPC 0x000100ff active-hi edge masked Fixed (vec 255)
LVTERR 0x000000fb active-hi edge Fixed (vec 251)
LVTTHMR 0x000100ff active-hi edge masked Fixed (vec 255)
LVTT 0x000200f8 active-hi edge periodic Fixed (vec 248)
Timer DCR=0xb (divide by 1) initial_count = 997376
SPIV 0x00000107 APIC enabled, focus=off, spurious vec 7
ICR 0x00000000 physical edge de-assert no-shorthand
ICR2 0x00000000 cpu 0 (APIC ID)
ESR 0x00000000
ISR 248
IRR 248 250

APR 0x00 TPR 0x00 DFR 0x0f LDR 0x00 PPR 0xf0

(gdb) set $eip=0xf0029707
(gdb) set $esp=0xff8b7fac
(gdb) bt
#0 0xf0029707 in Proc::halt () at /home/jermar/Kernkonzept/software/l4/fiasco/src/drivers/ia32/processor-ia32.cpp:47
#1 0xf00193b8 in Kernel_thread::idle_op (this=this@entry=0xffb66da4) at /home/jermar/Kernkonzept/software/l4/fiasco/src/kern/kernel_thread.cpp:134
#2 0xf001bc11 in call_ap_bootstrap (this=0xffb66da4, resume=0xf001bc11) at /home/jermar/Kernkonzept/software/l4/fiasco/src/kern/app_cpu_thread.cpp:111
#3 0x00000001 in ?? ()
