qemu 3.1/i386 crashes/guest hangs when MTTCG is enabled

Bug #1811244 reported by Jakub Jermar
This bug affects 1 person
Affects: QEMU
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When MTTCG is enabled, QEMU 3.1.0 sometimes crashes when running the following command line:

qemu-system-i386 -kernel /home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/bootstrap -append bootstrap -initrd "/home/jermar/work/software/l4/fiasco/.build-i386/fiasco -serial_esc,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/sigma0 ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/moe rom/ahci.cfg,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/ned ,test_env.lua ,/home/jermar/Kernkonzept/software/l4/pkg/ahci-driver/examples/md5sum/ahci.cfg ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/l4re ,/home/jermar/Kernkonzept/software/l4/pkg/ahci-driver/examples/md5sum/ahci.io ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/io ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/ahci-drv ,/home/jermar/Kernkonzept/software/l4/.build-i386/bin/x86_gen/l4f/ahci-md5-sync" -smp 4 -accel tcg,thread=multi -device ahci,id=ahci0 -drive if=none,file=/home/jermar/Kernkonzept/software/l4/.build-i386/pkg/ahci-driver/test/examples/test_ahci.img,format=raw,id=drive-sata0-0-0 -device ide-drive,bus=ahci0.0,drive=drive-sata0-0-0,id=sata0-0-0 -serial stdio -nographic -monitor none

The host is x86_64.

The stack at the time of the crash (core dump and debug binary attached to the bug):

Core was generated by `qemu-system-i386 -kernel /home/jermar/Kernkonzept/software/l4/.build-i386/bin/x'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 io_writex (env=env@entry=0x565355ca0140, iotlbentry=iotlbentry@entry=0x565355ca9120, mmu_idx=2, val=val@entry=0, addr=addr@entry=3938451632, retaddr=retaddr@entry=140487132809203, recheck=false, size=4)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cputlb.c:791
791 if (mr->global_locking && !qemu_mutex_iothread_locked()) {
[Current thread is 1 (Thread 0x7fc5af7fe700 (LWP 3625719))]
Missing separate debuginfos, use: dnf debuginfo-install SDL2-2.0.9-1.fc29.x86_64 at-spi2-atk-2.30.0-1.fc29.x86_64 at-spi2-core-2.30.0-2.fc29.x86_64 atk-2.30.0-1.fc29.x86_64 bzip2-libs-1.0.6-28.fc29.x86_64 cairo4
(gdb) bt
#0 0x0000565354f5f365 in io_writex
    (env=env@entry=0x565355ca0140, iotlbentry=iotlbentry@entry=0x565355ca9120, mmu_idx=2, val=val@entry=0, addr=addr@entry=3938451632, retaddr=retaddr@entry=140487132809203, recheck=false, size=4)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cputlb.c:791
#1 0x0000565354f621b2 in io_writel (recheck=<optimized out>, retaddr=140487132809203, addr=3938451632, val=0, index=0, mmu_idx=2, env=0x565355ca0140)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/softmmu_template.h:310
#2 0x0000565354f621b2 in helper_le_stl_mmu (env=0x565355ca0140, addr=<optimized out>, val=0, oi=34, retaddr=140487132809203)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/softmmu_template.h:310
#3 0x00007fc5b5a587f3 in code_gen_buffer ()
#4 0x0000565354f75fd0 in cpu_tb_exec (itb=<optimized out>, cpu=0x7fc5b5a5aa40 <code_gen_buffer+12266006>) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cpu-exec.c:171
#5 0x0000565354f75fd0 in cpu_loop_exec_tb (tb_exit=<synthetic pointer>, last_tb=<synthetic pointer>, tb=<optimized out>, cpu=0x7fc5b5a5aa40 <code_gen_buffer+12266006>)
    at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cpu-exec.c:615
#6 0x0000565354f75fd0 in cpu_exec (cpu=cpu@entry=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/accel/tcg/cpu-exec.c:725
#7 0x0000565354f33b1f in tcg_cpu_exec (cpu=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/cpus.c:1429
#8 0x0000565354f35e83 in qemu_tcg_cpu_thread_fn (arg=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/cpus.c:1733
#9 0x0000565354f35e83 in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x565355c97e90) at /home/jermar/software/HelenOS/helenos.git/contrib/qemu/qemu-3.1.0/cpus.c:1707
#10 0x00005653552ec5da in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:498
#11 0x00007fc5b858a58e in start_thread () at /lib64/libpthread.so.0
#12 0x00007fc5b84b96a3 in clone () at /lib64/libc.so.6
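
gdb prints `addr` and `retaddr` in frame #0 in decimal, which makes them hard to compare with the rest of the trace. The following is a minimal Python sketch that converts them to hex and splits the guest address into page and offset. The helper name `split_page` and the 4 KiB page size are assumptions made for illustration; nothing here comes from QEMU itself.

```python
# Values copied from frame #0 of the backtrace above (gdb shows them in decimal).
ADDR = 3938451632          # guest virtual address being written (val=0, size=4)
RETADDR = 140487132809203  # host return address into the generated code

PAGE_MASK = 0xFFF  # assumption: 4 KiB pages


def split_page(addr):
    """Return (page base, offset within the page) for an address."""
    return addr & ~PAGE_MASK, addr & PAGE_MASK


page, offset = split_page(ADDR)
print(f"addr    = {ADDR:#x} (page {page:#x}, offset {offset:#x})")
print(f"retaddr = {RETADDR:#x}")
```

The converted `retaddr` is 0x7fc5b5a587f3, the same address as frame #3 in `code_gen_buffer`, so the store was issued from translated guest code. The guest address is 0xeac000b0. Offset 0xb0 is where the xAPIC register layout places the EOI register, so if the guest maps its local APIC page at 0xeac00000 (an assumption the report does not confirm), the faulting access would be a 4-byte EOI write of 0.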

Another symptom that occurs more often than this crash is that the guest hangs while waiting for another CPU to complete a cross-CPU call. Disabling MTTCG makes both symptoms go away.

Tags: i386 core mttcg
Jakub Jermar (jakub) wrote :
description: updated
summary: - qemu 3.1/i386 crashes when MTTCG is enabled
+ qemu 3.1/i386 crashes/guest hangs when MTTCG is enabled
Jakub Jermar (jakub) wrote :

As for the other outcome, where the guest hangs instead of QEMU crashing: the guest CPUs that block forward progress are halted in the idle loop with interrupts enabled, and have a timer interrupt (vector 248) and a software IPI (vector 250) pending in the IRR. The ISR shows another timer interrupt (vector 248) still in service, even though the CPU is idling.

(qemu) cpu 1
(qemu) info registers
EAX=ff8b7000 EBX=ff8b7000 ECX=00000003 EDX=00000003
ESI=00000001 EDI=ff8b5240 EBP=ff8b7000 ESP=ff8b7fac
EIP=f0029707 EFL=00000202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=1
ES =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
GS =0023 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0068 efbfe280 00003d80 00008900 DPL=0 TSS32-avl
GDT= ffbd8400 00000077
IDT= eacfe000 000007ff
CR0=8001003b CR2=00000000 CR3=03fde000 CR4=00000690
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
(qemu) info lapic
dumping local APIC state for CPU 1

LVT0 0x0001003f active-hi edge masked Fixed (vec 63)
LVT1 0x0001003f active-hi edge masked Fixed (vec 63)
LVTPC 0x000100ff active-hi edge masked Fixed (vec 255)
LVTERR 0x000000fb active-hi edge Fixed (vec 251)
LVTTHMR 0x000100ff active-hi edge masked Fixed (vec 255)
LVTT 0x000200f8 active-hi edge periodic Fixed (vec 248)
Timer DCR=0xb (divide by 1) initial_count = 997376
SPIV 0x00000107 APIC enabled, focus=off, spurious vec 7
ICR 0x00000000 physical edge de-assert no-shorthand
ICR2 0x00000000 cpu 0 (APIC ID)
ESR 0x00000000
ISR 248
IRR 248 250

APR 0x00 TPR 0x00 DFR 0x0f LDR 0x00 PPR 0xf0
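
The register and LAPIC dump above can be checked against the architectural xAPIC priority rules. The following is a simplified Python sketch, not QEMU's implementation: it assumes only fixed-delivery interrupts and ignores NMI, ExtINT and similar cases. The function names are hypothetical.

```python
# Values copied from the "info registers" and "info lapic" output above.
EFLAGS = 0x202
TPR = 0x00
ISR = [248]        # vectors currently in service
IRR = [248, 250]   # vectors requested but not yet delivered

IF_BIT = 1 << 9    # EFLAGS.IF (interrupt enable flag)


def priority_class(value):
    """Upper four bits of a vector or priority register."""
    return (value >> 4) & 0xF


def compute_ppr(tpr, isr):
    """PPR is TPR if its class is at least that of the highest
    in-service vector; otherwise that vector's class with a zero low nibble."""
    isrv = max(isr, default=0)
    if priority_class(tpr) >= priority_class(isrv):
        return tpr & 0xFF
    return isrv & 0xF0


def deliverable(irr, ppr):
    """Pending vectors whose class is strictly above the PPR class."""
    return [v for v in irr if priority_class(v) > priority_class(ppr)]


ppr = compute_ppr(TPR, ISR)
print(f"IF set: {bool(EFLAGS & IF_BIT)}")
print(f"PPR = {ppr:#x}")
print(f"deliverable now: {deliverable(IRR, ppr)}")
```

The computed PPR is 0xf0, which agrees with the `PPR 0xf0` line in the dump. Vectors 248 and 250 are both in priority class 0xf, so under this model neither can be delivered while vector 248 remains in the ISR, that is, until an EOI for it reaches the local APIC. That would be consistent with the CPU staying halted even though EFLAGS.IF is set.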

(gdb) set $eip=0xf0029707
(gdb) set $esp=0xff8b7fac
(gdb) bt
#0 0xf0029707 in Proc::halt () at /home/jermar/Kernkonzept/software/l4/fiasco/src/drivers/ia32/processor-ia32.cpp:47
#1 0xf00193b8 in Kernel_thread::idle_op (this=this@entry=0xffb66da4) at /home/jermar/Kernkonzept/software/l4/fiasco/src/kern/kernel_thread.cpp:134
#2 0xf001bc11 in call_ap_bootstrap (this=0xffb66da4, resume=0xf001bc11) at /home/jermar/Kernkonzept/software/l4/fiasco/src/kern/app_cpu_thread.cpp:111
#3 0x00000001 in ?? ()

Thomas Huth (th-huth) wrote :

The QEMU project is currently considering moving its bug tracking to another system. For this we need to know which bugs are still valid and which can already be closed. We are therefore setting older bugs to "Incomplete" now.
If you think this bug report is still valid, please switch the state back to "New" within the next 60 days; otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has already been solved in a newer version of QEMU. Thank you, and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired