qemu hangs when guest is using linux kernel 4.16+

Bug #1794950 reported by Filipe Manana
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

I have been using qemu on daily basis 5+ years in order to do btrfs development and testing and it always worked perfectly, until I upgraded the linux kernel of the guests to 4.16. With 4.16+ kernels, when running all the fstests (previously called xfstests), the qemu process hangs (console unresponsive, can't ping or ssh the guest anymore, etc) and stays in a state Sl+ according to 'ps'.

This happens on two different physical machines, one running openSUSE tumbleweed (which I don't access at the moment to check kernel version) and another running xubuntu (tried kernels 4.15.0-32-generic and vanilla 4.18.0). Using any kernel from 4.16 to 4.19-rc5 in the guests (they use different debian versions) makes qemu hang running the fstests suite (after about 30 to 40 minutes, either at test generic/299 or test generic/451).

I tried different qemu versions, 2.11.2, 2.12.1 and 3.0.0, and it happens with all of them (all built from the sources available at https://www.qemu.org/download/#source).

I built 3.0.0 with debug enabled, using the following parameters for 'configure':

--prefix=/home/fdmanana/qemu-3.0.0 --enable-tools --enable-linux-aio --enable-kvm --enable-vnc --enable-vnc-png --enable-debug --extra-cflags="-O0 -g3 -fno-omit-frame-pointer"

And captured a coredump of the qemu process, available at:

https://www.dropbox.com/s/d1tlsimahykwhla/core_dump_debug.tar.xz?dl=0

the stack traces of all threads, for a quick look:

https://friendpaste.com/zqkz2pD0WgSdeSKITHPDf

qemu is being invoked with the following script:

#!/bin/bash -x

sudo modprobe tun
sudo modprobe kvm
sudo modprobe kvm-intel

sudo tunctl -t tap5 -u fdmanana
sudo ifconfig tap5 up
sudo brctl addif br0 tap5

sudo umount /mnt/tmp5
sudo mkdir -p /mnt/tmp5
sudo mount -t tmpfs -o size=14G tmpfs /mnt/tmp5
for ((i = 2; i <= 7; i++)); do
        sudo qemu-img create -f qcow2 /mnt/tmp5/disk$i 13G
done
sudo chown fdmanana /mnt/tmp5/disk*

qemu-system-x86_64 -m 4G \
        -device virtio-scsi-pci \
        -boot c \
\
        -drive if=none,file=debian5.qcow2,cache=none,aio=native,cache.direct=on,format=qcow2,id=drive0,discard=on \
        -device scsi-hd,drive=drive0,bus=scsi.0 \
\
        -drive if=none,file=/mnt/tmp5/disk2,cache=writeback,format=qcow2,id=drive1,discard=on \
        -device scsi-hd,drive=drive1,bus=scsi.0 \
\
        -drive if=none,file=/mnt/tmp5/disk3,cache=writeback,format=qcow2,id=drive2,discard=on \
        -device scsi-hd,drive=drive2,bus=scsi.0 \
\
        -drive if=none,file=/mnt/tmp5/disk4,cache=writeback,format=qcow2,id=drive3,discard=on \
        -device scsi-hd,drive=drive3,bus=scsi.0 \
\
        -drive if=none,file=/mnt/tmp5/disk5,cache=writeback,format=qcow2,id=drive4,discard=on \
        -device scsi-hd,drive=drive4,bus=scsi.0 \
\
        -drive if=none,file=/mnt/tmp5/disk6,cache=writeback,format=qcow2,id=drive5,discard=on \
        -device scsi-hd,drive=drive5,bus=scsi.0 \
\
        -drive if=none,file=/mnt/tmp5/disk7,cache=writeback,format=qcow2,id=drive6,discard=on \
        -device scsi-hd,drive=drive6,bus=scsi.0 \
\
        -drive if=none,file=disk8,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive7,discard=on \
        -device scsi-hd,drive=drive7,bus=scsi.0 \
\
        -drive if=none,file=disk9,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive8,discard=on \
        -device scsi-hd,drive=drive8,bus=scsi.0 \
\
        -drive if=none,file=disk10,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive9,discard=on \
        -device scsi-hd,drive=drive9,bus=scsi.0 \
\
        -net nic,macaddr=52:54:00:12:34:fa -net tap,ifname=tap5,script=no,downscript=no \
        -rtc base=localtime -enable-kvm -machine accel=kvm -smp 4 -cpu host \
        -k pt -serial tcp:127.0.0.1:9997 -display vnc=:5

Is there anything else I can provided to help debug this?

Thanks.

Revision history for this message
Thomas Huth (th-huth) wrote :

The QEMU project is currently considering to move its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now.
If you still think this bug report here is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has been solved with a newer version of QEMU already. Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.