virtio block read errors
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
QEMU |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
Context :
- Gentoo Linux distribution on host and guests.
- qemu-kvm-0.12.5-r1
- 2.6.34-gentoo-r11 host kernel
- 2.6.29-gentoo-r5 guest kernels
- VM boots from and uses a single virtio block device.
On the old kvm bugtracker there was a discussion about a bug with virtio block devices :
http://
I was affected (user gyver in the above discussion) and believed that the problem was fully solved : we had the read error problems on 4 physical hosts . I migrated 3 of the 4 hosts to Gentoo's qemu-kvm-0.12.5-r1 which fixed the problems and allowed us to use virtio block devices instead of emulated PIIX.
It seems there's a corner case left or another bug with similar consequences.
I just used a maintenance window on the last physical host (one hard disk switched for repair in a RAID1 array) to migrate the ancient kvm-85 (which worked for us with virtio) to 0.12.5. The read errors in virtio block mode came back instantly.
We have 3 VMs on this 4th host, 2 are x86, 1 is x86_64. All of them try to boot from a virtio block device and fail to do so with Gentoo's qemu-kvm-0.12.5-r1. They report read errors on /dev/vda and remount the root fs read-only. Reconfiguring them to use emulated PIIX works. There's something interesting about PIIX mode that I'm not sure I've seen before though: there are errors reported by the ATA stack during the boot and the guest kernels switch to PIO after resetting the ide0 interface. More on that later.
Booting all these VMs works properly with Gentoo's 0.11.1-r1 with virtio block.
Two details that might help :
1/
We use DRBD devices for all our virtual disks (on all 4 physical hosts),
2/
The "failing" host has different hardware, main differences :
- Core2 Duo architecture instead of Core i7,
- hardware RAID controller: a 3ware 8006-2LP with two SATA disks in RAID-1 mode instead of plain AHCI SATA controllers and software raid 1.
Currently the controller on the "failing" host is rebuilding the array after we switched a failing disk with a brand new one. Although there's no read error on the physical host as far as its kernel is concerned, read performance is suffering : 5MB/s top from the guest point of view with 0.11.1-r1 and virtio block with a dd if=/dev/vda (only one VM running and host idle to avoid interferences other than the RAID rebuild), ...
This poor read performance might explain the problem we saw in the guest kernel with PIIX emulation on qemu-kvm-0.12.5-r1 (slow reads might be confused with buggy DMA transfers explaining the PIO fallback). We didn't have time to test PIIX emulation after the RAID array was fully synchronized but can do on request.
Triaging old bug tickets ... can you still recreate this issue with the latest version of QEMU, or can we close this bug nowadays?