2020-11-04 23:54:13 |
John Snow |
description |
Since upgrading to QEMU 2.8.0, my Windows 7 64-bit virtual machines
started crashing due to the assertion quoted in the summary failing.
The assertion in question was added by commit 9972354856 ("block: add
BDS field to count in-flight requests"). My tests show that setting
discard=unmap is needed to reproduce the issue. Speaking of
reproduction, it is a bit flaky, because I have been unable to come up
with specific instructions that would allow the issue to be triggered
outside of my environment, but I do have a semi-sane way of testing that
appears to depend on a specific initial state of data on the underlying
storage volume, actions taken within the VM and waiting for about 20
minutes.
Here is the shortest QEMU command line that I managed to reproduce the
bug with:
qemu-system-x86_64 \
-machine pc-i440fx-2.7,accel=kvm \
-m 3072 \
-drive file=/dev/lvm/qemu,format=raw,if=ide,discard=unmap \
-netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
-device virtio-net-pci,netdev=hostnet0 \
-vnc :0
The underlying storage (/dev/lvm/qemu) is a thin LVM snapshot.
QEMU was compiled using:
./configure --python=/usr/bin/python2.7 --target-list=x86_64-softmmu
make -j3
My virtualization environment is not really a critical one and
reproduction is not that much of a hassle, so if you need me to gather
further diagnostic information or test patches, I will be happy to help. |
Maintainer Edit:
The functions in dma-helpers mismanage misaligned IO, badly enough to cause an infinite loop where no progress can be made. This allows the IDE state machine to get wedged such that cancelling DMA can fail; because the DMA helpers have bodged the state of the DMA transfer. See Comment #15 for the in-depth analysis.
I've updated the name of this bug to reflect the current status as I understand it.
--js
Original report:
Since upgrading to QEMU 2.8.0, my Windows 7 64-bit virtual machines
started crashing due to the assertion quoted in the summary failing.
The assertion in question was added by commit 9972354856 ("block: add
BDS field to count in-flight requests"). My tests show that setting
discard=unmap is needed to reproduce the issue. Speaking of
reproduction, it is a bit flaky, because I have been unable to come up
with specific instructions that would allow the issue to be triggered
outside of my environment, but I do have a semi-sane way of testing that
appears to depend on a specific initial state of data on the underlying
storage volume, actions taken within the VM and waiting for about 20
minutes.
Here is the shortest QEMU command line that I managed to reproduce the
bug with:
qemu-system-x86_64 \
-machine pc-i440fx-2.7,accel=kvm \
-m 3072 \
-drive file=/dev/lvm/qemu,format=raw,if=ide,discard=unmap \
-netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
-device virtio-net-pci,netdev=hostnet0 \
-vnc :0
The underlying storage (/dev/lvm/qemu) is a thin LVM snapshot.
QEMU was compiled using:
./configure --python=/usr/bin/python2.7 --target-list=x86_64-softmmu
make -j3
My virtualization environment is not really a critical one and
reproduction is not that much of a hassle, so if you need me to gather
further diagnostic information or test patches, I will be happy to help. |
|