qcow2 image logical corruption after host crash
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
QEMU |
Won't Fix
|
Undecided
|
Unassigned |
Bug Description
Description of problem:
In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
My detailed setup :
Kernel 2.6.32-
qemu-kvm-
Used via libvirt libvirt-
The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).
qemu-img check finds no corruption;
qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
I encountered this issue on two different machines, in both cases i was not able to recover the data.
Image was qcow2, thin provisioned, created like this :
qemu-img create -f qcow2 -o cluster_size=2M imagename.img
While addressing the root cause in order to not have this issue repeat would be the ideal scenario, a temporary workaround to run on the affected qcow2 image to "patch" it and recover the data (eventually after a full fsck/recovery inside the guest) would also be good. Otherwise we are basically losing data on a large scale when using qcow2.
Version-Release number of selected component (if applicable):
Kernel 2.6.32-
qemu-kvm-
Used via libvirt libvirt-
How reproducible:
I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
Additional info:
I can privately provide an image displaying the corruption.
The reported problem has actually two aspects : first is the cause that eventually produces this issue.
The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.
On Tue, Nov 12, 2013 at 09:17:34AM -0000, Blue wrote:
Please post the qemu command-line (ps aux | grep qemu) for the affected
VM.
What kind of workload is accessing the disk? Guest OS and version?
Please also confirm that nothing else is accessing the image file while
the VM is running. It is not safe to use tools like qemu-img on the
file while the VM has it open.
> Image was qcow2, thin provisioned, created like this :
> qemu-img create -f qcow2 -o cluster_size=2M imagename.img
Interesting note, you are setting a custom cluster size. It should work
but it's not the default configuration that is well-tested.
> I can privately provide an image displaying the corruption.
Please send a link to <email address hidden> and <email address hidden>. We can
help you inspect the image to determine the nature of the corruption and
what data can be restored.
Stefan