qemu-nbd corrupts files
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
QEMU | Fix Released | Undecided | Unassigned | |
qemu (Ubuntu) | Fix Released | Undecided | Unassigned | |
Trusty | Fix Released | High | Unassigned | |
Bug Description
[Impact]
A race condition in the VDI block driver of QEMU leads to image (and thus file system) corruption under certain circumstances.
This makes using the QEMU tools (qemu-img, qemu-nbd) on VDI-formatted images particularly dangerous.
The bug fix introduces locks to prevent this race condition.
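As a generic illustration of why a lock prevents this kind of race, here is a minimal shell sketch. It is not QEMU code and does not reproduce the VDI driver's logic: the counter file, the lock file and the `increment_locked` function are hypothetical, and it assumes the `flock` utility from util-linux is installed. Several background jobs perform a read-modify-write on a shared file; the exclusive lock makes each update atomic with respect to the others.

```shell
#!/bin/sh
# Hypothetical illustration only: not taken from the QEMU fix.
counter_file=$(mktemp)
lock_file=$(mktemp)
echo 0 > "$counter_file"

# Read-modify-write on the shared counter, serialized by an exclusive lock.
increment_locked() {
    (
        flock -x 9                      # block until we hold the lock on fd 9
        n=$(cat "$counter_file")
        echo $((n + 1)) > "$counter_file"
    ) 9>"$lock_file"                    # lock is released when fd 9 closes
}

# Start eight concurrent updaters, then wait for all of them.
for i in 1 2 3 4 5 6 7 8; do
    increment_locked &
done
wait

final_count=$(cat "$counter_file")
echo "final count: $final_count"
rm -f "$counter_file" "$lock_file"
```

Without the `flock` call, two jobs could read the same value and one increment would be lost, which is the same class of lost-update problem that a race in a block driver can cause.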
[Test Case]
A simple test case was provided in comment #5 (https:/
$ ./qemu-img create -f vdi test.vdi 2G
Formatting 'test.vdi', fmt=vdi size=2147483648 static=off
$ ./qemu-img create -f raw test.raw 2G
Formatting 'test.raw', fmt=raw size=2147483648
$ x86_64-
blkverify: read sector_num=810976 nb_sectors=256 contents mismatch in sector 811008
Operations in the guest:
$ dd if=/dev/vdb of=/dev/vda
$ dd if=/dev/vda of=/dev/null
[Regression Potential]
In case of bugs affecting the way locks are used, deadlocks could be a regression, but they would only affect VDI images.
Original bug report:
Dear all,
On Trusty, in certain situations, trying to copy files to a file system mounted through qemu-nbd leads to write errors (and thus, file corruption).
Here is the last example I tried:
-> virtual disk is a VDI disk
-> It has only one partition, in FAT
Here is my mount process:
# modprobe nbd max_part=63
# qemu-nbd -c /dev/nbd0 "virtual_disk.vdi"
# partprobe /dev/nbd0
# mount /dev/nbd0p1 /tmp/mnt/
Partition is properly mounted at that point:
/dev/nbd0p1 on /tmp/mnt type vfat (rw)
Now, when I copy a file (rather big, ~28MB):
# cp file_to_copy /tmp/mnt/ ; sync
# md5sum /tmp/mnt/
2efc9f32e426778
# umount /tmp/mnt
# mount /dev/nbd0p1 /tmp/mnt/
# md5sum /tmp/mnt/
42b0a3bf73f704d
The first hash was obviously the right one.
On a previous attempt, I spotted with vbindiff that parts of the file were just filled with 0s instead of actual data.
It will randomly work after several attempts to write.
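The check described above (hash the source, hash the copy, then look at where they differ) can be sketched as two small shell helpers. The function names `same_md5` and `count_differing_bytes` are hypothetical, and the sketch assumes `md5sum` and `cmp` from a typical GNU/Linux system.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a source file with its copy.

# Return 0 if both files have the same MD5 digest, non-zero otherwise.
same_md5() {
    [ "$(md5sum < "$1" | cut -d' ' -f1)" = "$(md5sum < "$2" | cut -d' ' -f1)" ]
}

# Print how many byte positions differ between the two files.
# cmp -l prints one line per differing byte.
count_differing_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `same_md5 file_to_copy /tmp/mnt/<copied file> || count_differing_bytes file_to_copy /tmp/mnt/<copied file>` would report the size of the damage after a remount; the path of the copied file is a placeholder here.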
Version information:
# qemu-nbd --version
qemu-nbd version 0.0.1
Written by Anthony Liguori.
Cheers,
Changed in qemu (Ubuntu Trusty):
importance: Medium → High
assignee: nobody → Serge Hallyn (serge-hallyn)
Changed in qemu (Ubuntu Trusty):
assignee: Serge Hallyn (serge-hallyn) → nobody
tags: added: verification-done removed: verification-needed
Hi Pierre,
I can reproduce the bug with a 2 GB VDI image with a single FAT32-formatted partition (on git master):
# cp src.vdi test.vdi
# ./qemu-nbd -c /dev/nbd0 test.vdi
# dd if=/dev/urandom of=/dev/nbd0 bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 3.34091 s, 20.1 MB/s
# md5sum /dev/nbd0
bfa6726d0d8fe752c0c7dccbf770fae6  /dev/nbd0
# sync
# echo 1 > /proc/sys/vm/drop_caches
# md5sum /dev/nbd0
cb4762769e09ed6da5e327710bfb3996  /dev/nbd0
# ./qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected
With a qcow2 image, or without NBD, I cannot reproduce the issue. Converting a qcow2 image to VDI makes the issue appear again.
With an empty VDI image, or one filled with random data, the issue does not appear either.
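The write, flush and re-read sequence used above can be wrapped in one function for repeated runs. This is only a sketch: the name `write_and_recheck` and the `DROP_CACHES` variable are hypothetical, and it assumes GNU `dd` and `md5sum`. It works on any writable path, so it can be tried on a regular file before being pointed at a device such as /dev/nbd0. Dropping the page cache needs root and is therefore opt-in.

```shell
#!/bin/sh
# Hypothetical helper: write random data to a target, hash it, flush,
# then hash it again. Prints "match" or "mismatch".
# Returns 0 on match, 1 on mismatch, 2 if the write itself failed.
write_and_recheck() {
    target=$1
    megabytes=${2:-64}

    dd if=/dev/urandom of="$target" bs=1M count="$megabytes" 2>/dev/null || return 2
    before=$(md5sum < "$target" | cut -d' ' -f1)

    sync
    # Opt in with DROP_CACHES=1 (requires root) to force a re-read from disk.
    if [ "${DROP_CACHES:-0}" = 1 ]; then
        echo 1 > /proc/sys/vm/drop_caches
    fi

    after=$(md5sum < "$target" | cut -d' ' -f1)
    if [ "$before" = "$after" ]; then
        echo match
    else
        echo mismatch
        return 1
    fi
}
```

Run as root against a connected device, `DROP_CACHES=1 write_and_recheck /dev/nbd0 64` would follow the same steps as the manual transcript and report a mismatch where the two checksums above differ.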
I have attached a qcow2 image for others to test:
# ./qemu-img convert -O vdi src.qcow2 test.vdi; ./qemu-nbd -c /dev/nbd0 test.vdi; dd if=/dev/urandom of=/dev/nbd0 bs=1M count=64; md5sum /dev/nbd0; sync; echo 1 > /proc/sys/vm/drop_caches; md5sum /dev/nbd0; ./qemu-nbd -d /dev/nbd0
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 3.33071 s, 20.1 MB/s
9f683b4a58cecdd8da04ec2f1b7abc4a  /dev/nbd0
efb1cdd5ebe1dd326056eb2f2e500944  /dev/nbd0
/dev/nbd0 disconnected
Unfortunately, I do not yet know the cause of this issue.
Max