qemu loses serial console data on EAGAIN

Bug #1335444 reported by Andreas Gustafsson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

When running a guest OS with a serial console under "qemu-system-i386
-nographic", parts of the serial console output are sometimes lost.

This happens when a write() to standard output by qemu returns EAGAIN,
as may be the case when the guest is generating console output faster
than the tty (or pty/pipe/socket, etc.) connected to qemu's standard
output accepts it. The bug affects all releases of qemu since 1.5,
which was the first version to set stdout to O_NONBLOCK mode. Version
1.4.2 and earlier work correctly.

To reproduce the bug, you will need a guest OS configured with a
serial console, and a host with a slow tty. Two different methods
of setting this up are outlined below.

Method 1

This fully automated test uses the "pexpect" Python module to run qemu
under a pty, with an Aboriginal Linux guest. A "seq" command is sent
to the guest to generate 100,000 lines of output containing sequential
integers, and the output is checked for gaps. The script limits the
tty output rate by occasionally sleeping for 1/10 of a second.

Run the following commands in a Bourne shell:

wget http://landley.net/aboriginal/downloads/binaries/system-image-i686.tar.bz2
bunzip2 <system-image-i686.tar.bz2 | tar xf -
cd system-image-i686
cat <<\END >test.py
#!/usr/bin/python
import sys
import time
import pexpect
n = 100000
child = pexpect.spawn('./run-emulator.sh', logfile = sys.stderr)
child.expect("/home #")
child.send("seq -f '<%%08.0f>' 0 %d\r" % (n - 1))
for i in range(n):
    child.expect("<([0-9]+)>", timeout = 5)
    got = int(child.match.group(1))
    if got != i:
        print >>sys.stderr, "\nFAIL: expected %d, got %d" % (i, got)
        sys.exit(1)
    if i % 100 == 0:
        time.sleep(0.1)
child.send("exit")
print >>sys.stderr, "\nPASS"
sys.exit(0)
END
python test.py

This will output PASS if the console output contains the 100,000
sequential integers as expected, or FAIL if parts of the output
are missing due to the bug.

Method 2

This method does not require Python or pexpect. Instead, the qemu source
is modified to simulate a simulate a slow tty by forcing an EAGAIN return
from every other write(). If qemu were working correctly, this
change would not cause any data loss, because the writes would be
retried, but in actuality, they are not retried, and the end result is
that every other character in the guest output will be missing.

Apply the following patch to the qemu source (this is against 2.0.0):

--- ../qemu-2.0.0.orig/qemu-char.c 2014-04-17 16:44:45.000000000 +0300
+++ ./qemu-char.c 2014-06-20 16:47:18.000000000 +0300
@@ -779,6 +779,17 @@
     size_t offset = 0;
     GIOStatus status = G_IO_STATUS_NORMAL;

+ /*
+ * Simulate a tty with a limited output buffer by returning
+ * EAGAIN on every second call.
+ */
+ static unsigned int toggle = 0;
+ toggle++;
+ if (toggle & 1) {
+ errno = EAGAIN;
+ return -1;
+ }
+
     while (offset < len && status == G_IO_STATUS_NORMAL) {
         gsize bytes_written = 0;

Build and install qemu.

Run any serial console guest. You could use the same Aboriginal Linux image
as in Method 1, or for example the PLD RescueCD:

  wget http://rescuecd.pld-linux.org/download/2011-02-12/x86/RCDx86_11_02.iso
  qemu-system-i386 -nographic -cdrom RCDx86_11_02.iso

If this command is run with an unmodified qemu, a set of boot messages
will appear, starting with:

  ISOLINUX 4.03 2010-10-22 Copyright (C) 1994-2010 H. Peter Anvin et al

When qemu has been patched to simulate EAGAIN returns, every other
character in the boot messages will be missing, so that the first line
of output will instead read:

  SLNX40 001-2 oyih C 9421 .PtrAvne l

Revision history for this message
Kirill Batuzov (batuzovk) wrote :
Revision history for this message
Andreas Gustafsson (gson) wrote :

Kirill - thank you for looking into the problem. I reran the test of "Method 1"
with your patch, and it is still failing, but the blocks of missing data
seem to be smaller than before.

Here is an extract from the output of the "Method 1" test without your patch.
In this case, the test failed because360 consecutive lines of output were missing:

<00001071>
<00001072>
<00001433>
<00001434>

With your patch, it still failed, but only a single character was missing:

<00001073>
0001074>
<00001075>

Revision history for this message
Kirill Batuzov (batuzovk) wrote :

I see. There is an issue with retry logic of serial port emulation. I've send a patch to the mailing list that should help.

http://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg01976.html

With it both methods pass the test for me. Note that in the first method you'll need to disable networking or rogue line about it will mess the output short after test start.

Revision history for this message
Andreas Gustafsson (gson) wrote :

With both patches applied, qemu works as expected. Thank you!

Revision history for this message
Kirill Batuzov (batuzovk) wrote :

v2.1.0-rc2 has both patches.

Changed in qemu:
status: New → Fix Committed
Revision history for this message
Andreas Gustafsson (gson) wrote :

Although the bug has been fixed in qemu-system-i386 and qemu-system-x86_64, it is still present in qemu-system-sparc. I'm attaching an updated version of the "Method 1" shell script which reproduces the problem with qemu 2.1.0.

When I run it, the last output is:

<00000919>
<00000920>
<0000092964>
<00000965>
<00000966>
<00000967>
FAIL: expected 921, got 92964

Changed in qemu:
status: Fix Committed → New
Revision history for this message
Paolo Bonzini (bonzini) wrote :

Please open a separate bug. Each serial device implementation must support flow control separately.

Changed in qemu:
status: New → Fix Released
Revision history for this message
Andreas Gustafsson (gson) wrote :

A separate bug report has now been filed for the sparc case as requested: bug #1399943.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.