-icount increases boot time by >10x

Bug #1774677 reported by Andreas Gustafsson
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

When I specify the -icount option, some guest operations such as booting a Linux kernel take more than 10 times longer than otherwise. For example, the following will boot Aboriginal Linux to the login prompt about 6 seconds on my system (using TCG, not KVM):

wget http://landley.net/aboriginal/downloads/old/binaries/1.4.5/system-image-i686.tar.gz
gunzip <system-image-i686.tar.gz | tar xfv -
cd system-image-i686
sh run-emulator.sh

If I replace the last line with

QEMU_EXTRA="-icount shift=auto" sh run-emulator.sh

booting to the login prompt takes about 1 minute 20 seconds.

I have tried different values for "shift" other than the "auto" used above, but have not been able to find one that gives reasonable performance. Specifying "sleep=off" also did not help.

During the slow boots, qemu appears to spend most of its time sleeping, not using the host CPU.

I see this with multiple versions of qemu, including current git sources (c181ddaa176856b3cd2dfd12bbcf25fa9c884a97), and on multiple host OSes, including Debian 9 on x86_64.

Revision history for this message
Peter Maydell (pmaydell) wrote :

Yes, I can reproduce this. Also, interestingly, with the -icount version when I typed 'exit' to quit the guest I got this abort:

qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7fcd0aae636e
EAX=c88053a4 EBX=00000069 ECX=c00211a4 EDX=00000000
ESI=c7812180 EDI=c78126f0 EBP=0000000f ESP=c7a0de44
EIP=c11ada30 EFL=00000287 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
FS =0000 00000000 00000000 00000000
GS =0033 080a130c ffffffff 00dff300 DPL=3 DS [-WA]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0080 c12fbe60 0000206b 00008900 DPL=0 TSS32-avl
GDT= c12f6000 000000ff
IDT= ffffc000 000007ff
CR0=8005003b CR2=09d40ffc CR3=07a08000 CR4=00000690
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=c8800000 CCD=c8805200 CCO=SUBL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=ec00000000000000 4006
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted (core dumped)

Changed in qemu:
status: New → Confirmed
Revision history for this message
Peter Maydell (pmaydell) wrote :

...ah, that's the bug fixed by this patch: https://patchwork.ozlabs.org/patch/940058/

Revision history for this message
Peter Maydell (pmaydell) wrote :

The stalls in the test kernel seem to happen at and after the guest prints

tsc: Refined TSC clocksource calibration: 1000.022 MHz
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0xe6a2bd9634, max_idle_ns: 440795211s
hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
clocksource: Switched to clocksource tsc

so I suspect a bad interaction between what icount is doing and in particular our emulation of the x86 timestamp counter and what Linux does with it.

Revision history for this message
Andreas Gustafsson (gson) wrote :

A couple of comments... First, the problem is not limited to Linux guests. In fact, I originally ran across it with a NetBSD guest, but then reproduced it with a Linux guest for the bug report, because in my experience, qemu bug reports involving non-Linux guests tend to be ignored.

Second, the qemu man page says that "With sleep=on|off, the virtual time will jump to the next timer deadline instantly whenever the virtual cpu goes to sleep mode". Although the text is confusing (see bug 1774412), to me it suggests that when the mode in case is selected, qemu should execute the guest as fast as possible and only provide it with an illusion of the passage of time, and there should be no reason for qemu itself to ever sleep. Yet, regardless of whether I specify "sleep=on" or "sleep=off", qemu sleeps.

Revision history for this message
Andreas Gustafsson (gson) wrote :

I bisected this using the following test case (after the setup shown in the original bug report, and replacing "e1000" by "ne2k_pci" in the run-emulator.sh script to avoid a panic in the e1000 driver with some qemu versions):

QEMU_EXTRA="-icount shift=3,sleep=off" sh run-emulator.sh

The bisection identified the following commit:

281b2201e4e18d5b9a26e1e8d81b62b5581a13be is the first bad commit
commit 281b2201e4e18d5b9a26e1e8d81b62b5581a13be
Author: Pavel Dovgalyuk <email address hidden>
Date: Thu Mar 10 14:56:03 2016 +0300

    icount: remove obsolete warp call

    qemu_clock_warp call in qemu_tcg_wait_io_event function is not needed
    anymore, because it is called in every iteration of main_loop_wait.

    Reviewed-by: Paolo Bonzini <email address hidden>

    Signed-off-by: Pavel Dovgalyuk <email address hidden>
    Message-Id: <20160310115603.4812.67559.stgit@PASHA-ISP>
    Signed-off-by: Paolo Bonzini <email address hidden>

With revision 281b2201e4e18d5b9a26e1e8d81b62b5581a13be, the test case takes 141 seconds to boot. With the previous revision, 33577b47c64435fcc2a1bc01c7e82534256f1fc3, it takes 1.7 seconds.

Revision history for this message
Pavel Dovgalyuk (dovgalyuk) wrote :

I can reproduce it. But there is one interesting thing.
When I replace "-nographic" with "-serial stdio" everything is ok,
icount works normally.

But "-nographic" runs are very slow in icount mode.
It seems that GUI produces some events that force main loop to iterate.

Revision history for this message
Paolo Bonzini (bonzini) wrote :

Does reverting the commit fix it? It was just a cleanup I think.

Revision history for this message
Pavel Dovgalyuk (dovgalyuk) wrote :

Yes, I tried to revert (with some updates to the current code base) and the bug was fixed:

diff --git a/cpus.c b/cpus.c
index 181ce33..e2fc972 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1172,6 +1172,13 @@ static void qemu_tcg_rr_wait_io_event(CPUState *cpu)
 {
     while (all_cpu_threads_idle()) {
         stop_tcg_kick_timer();
+
+ qemu_mutex_unlock_iothread();
+ replay_mutex_lock();
+ qemu_mutex_lock_iothread();
+ qemu_start_warp_timer();
+ replay_mutex_unlock();
+
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }

Revision history for this message
Andreas Gustafsson (gson) wrote :

I tested Pavel's patch, applying it to master (c447afd5783b9237fa51b7a85777007d8d568bfc), but I'm afraid it only made things worse - qemu has now been booting the test kernel for 30 minutes but the boot has still not completed. The last console messages printed were:

piix 0000:00:01.1: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xc100-0xc107
    ide1: BM-DMA at 0xc108-0xc10f

Running strace -p on the qemu process shows it calling ppoll once per second:

ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)

Revision history for this message
Pavel Dovgalyuk (dovgalyuk) wrote :

We figured out that there are races or something.
I run QEMU under Ubuntu in VirtualBox (which has 1 CPU) and it works normally.
On the real machine with multiple cores QEMU stalls with this patch.

Revision history for this message
Andreas Gustafsson (gson) wrote :

Looks like the bug was fixed by this commit:

commit 013aabdc665e4256b38d8875a1a7b5e030ba98f1
Author: Clement Deschamps <email address hidden>
Date: Sun Oct 21 16:21:03 2018 +0200

    icount: fix deadlock when all cpus are sleeping

    When all cpus are sleeping (e.g in WFI), to avoid a deadlock
    in the main_loop, wake it up in order to start the warp timer.

    Signed-off-by: Clement Deschamps <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>

Changed in qemu:
status: Confirmed → Fix Committed
Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.