data corruption in storage attached to VM using KVM

Bug #1189926 reported by Chris J Arges
This bug affects 1 person
Affects            Status        Importance  Assigned to    Milestone
qemu-kvm (Ubuntu)  Fix Released  Undecided   Unassigned
  Precise          Fix Released  High        Chris J Arges
  Quantal          Fix Released  High        Unassigned

Bug Description

[Impact]

When using qemu-kvm-1.0, qcow2 disks are occasionally corrupted.

[Test Case]

A test case can be downloaded here:
    http://people.canonical.com/~arges/lp1189926/lnv-382.tgz
Extract the contents and run ./create-disk, then ./test-kvm to test the currently installed KVM. Keep in mind that 100GB+ of disk is required to run the test.
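
The steps above could be wrapped in a small pre-flight check so that the test is not started on a filesystem that is too small. This is only a sketch: the script names and the 100GB figure come from the report, while the function names and the assumption that the scripts sit directly in the extracted directory are hypothetical.

```python
import shutil
import subprocess

# The report says 100GB+ of disk is required; decimal gigabytes assumed here.
REQUIRED_BYTES = 100 * 10**9


def has_enough_space(path: str, required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required


def run_test_case(workdir: str) -> None:
    """Run ./create-disk and then ./test-kvm inside `workdir`.

    Assumes `workdir` is the directory the tarball was extracted into and
    that both scripts are located directly in it (not confirmed by the report).
    """
    if not has_enough_space(workdir):
        raise RuntimeError("less than 100GB free in " + workdir)
    subprocess.run(["./create-disk"], cwd=workdir, check=True)
    subprocess.run(["./test-kvm"], cwd=workdir, check=True)
```

With `check=True`, a non-zero exit status from either script raises `subprocess.CalledProcessError`, so a failing run is not silently ignored.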

[Regression Potential]

The patch is a backport of upstream commit 143550a83ef4eef86a847d00023d148e1f59f743, which changes how the number of available clusters is counted. The original patch assumes that certain functions are available; the backport had to change these so that it applies to v1.0.

--

When running io_perf under qemu-kvm-1.0, disk corruption can be detected.

This affects the qemu-kvm version in Precise, but is fixed in Quantal and beyond.

It is expected that running the test case results in no disk corruption; however we detect corruption on Precise.

Running git bisect on this test finds that commit 68d100e905453ebbeea8e915f4f18a2bd4339fe8 introduced the problem between v0.15.0 and v1.0.
Running git bisect in reverse finds that commit b7ab0fea37c15ca9e249c42c46f5c48fd1a0943c fixes the issue between v1.1.2 and v1.2.0.
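
Conceptually, both bisections are a binary search over an ordered commit history for the point where the test result flips. The sketch below illustrates that idea only; it is not how git bisect is implemented, and the commit labels and predicate are hypothetical.

```python
from typing import Callable, Sequence


def first_changed(commits: Sequence[str], changed: Callable[[str], bool]) -> str:
    """Return the first commit for which changed() is True.

    Assumes commits are ordered oldest to newest, changed() is False for
    commits[0] and True for commits[-1], and the result flips exactly once.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] unchanged, commits[hi] changed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if changed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history in which the behaviour flips at "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_changed(history, lambda c: history.index(c) >= 5)
```

For the forward search the predicate would be "the test detects corruption"; for the reverse search it would be "the test passes", so the same routine locates the fixing commit.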

However, b7ab0fea cannot be easily backported to v1.0, and 68d100e9 is a large change to revert. Both changes do indicate that the problem lies in the qcow2 parts of the code.

Bug 1040033 seems to be a related issue.

Chris J Arges (arges)
Changed in qemu-kvm (Ubuntu Precise):
importance: Undecided → High
Changed in qemu-kvm (Ubuntu):
importance: High → Undecided
status: New → Fix Released
Changed in qemu-kvm (Ubuntu Quantal):
status: New → Fix Released
description: updated
Changed in qemu-kvm (Ubuntu Precise):
status: New → Triaged
Changed in qemu-kvm (Ubuntu Quantal):
importance: Undecided → High
Chris J Arges (arges)
Changed in qemu-kvm (Ubuntu Precise):
assignee: nobody → Chris J Arges (arges)
status: Triaged → In Progress
Chris J Arges (arges) wrote :

This patch applied against v1.0 fixes the issue.

Chris J Arges (arges) wrote :

The attached debdiff fixes the issue for precise.

A test build of this package is available here:
http://people.canonical.com/~arges/lp1189926/

description: updated
Adam Conrad (adconrad) wrote : Please test proposed package

Hello Chris, or anyone else affected,

Accepted qemu-kvm into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu-kvm/1.0+noroms-0ubuntu14.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu-kvm (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Dmitry Shachnev (mitya57) wrote :

Unsubscribing sponsors, as the fix has been uploaded.

Chris J Arges (arges) wrote :

Please do NOT promote this package; there is a better fix that can be used, and it is currently being tested. I'll attach the new debdiff here soon.

Chris J Arges (arges) wrote :

This patch contains the proper solution to this issue.

Serge Hallyn (serge-hallyn) wrote :

Hi Chris,

Do you mind if the patch comment gets updated to something like:

While searching for available clusters, if we detect an ongoing AIO
write request, then we restart after the other has completed. By not
re-setting i to 0, we fail to re-check clusters which may no longer be
available.
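
The effect described in this comment can be illustrated with a toy model. It is not QEMU code: the cluster list, the callback standing in for the other AIO write, and all names are hypothetical. It only shows why a scan index that is kept across a restart can report clusters that are no longer available.

```python
FREE, USED = "free", "used"


def count_available(clusters, wait_for_other_request, reset_on_restart):
    """Count leading free clusters, restarting once after a simulated wait.

    Toy model only: `wait_for_other_request` stands in for waiting on an
    ongoing write request, which may change cluster states before we resume.
    """
    i = 0
    waited = False
    while True:
        if reset_on_restart:
            i = 0  # re-check every cluster after a restart
        while i < len(clusters) and clusters[i] == FREE:
            i += 1
        if waited:
            return i
        waited = True
        wait_for_other_request(clusters)  # then loop back and search again


def other_request_completes(clusters):
    """Simulate the other request taking cluster 1 while we were waiting."""
    clusters[1] = USED


stale = count_available([FREE] * 4, other_request_completes, reset_on_restart=False)
fresh = count_available([FREE] * 4, other_request_completes, reset_on_restart=True)
```

Without the reset, the second pass starts from the old index and still reports four clusters, although cluster 1 is now in use. With the reset, the scan starts again at 0 and stops at the first used cluster.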

Chris J Arges (arges) wrote :

Made modifications to the patch based on your comments. I also tested this in a loop this weekend, and it passed every time.

Serge Hallyn (serge-hallyn) wrote :

Setting verification-failed to indicate we'd like 1.0+noroms-0ubuntu14.9 dropped. 1.0+noroms-0ubuntu14.10 is in Unapproved, and has the better fix.

tags: added: verification-failed
removed: verification-needed
Clint Byrum (clint-fewbar) wrote :

Hello Chris, or anyone else affected,

Accepted qemu-kvm into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu-kvm/1.0+noroms-0ubuntu14.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-failed
tags: added: verification-needed
Chris J Arges (arges) wrote :

I have verified these packages and they do pass the tests.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu-kvm - 1.0+noroms-0ubuntu14.10

---------------
qemu-kvm (1.0+noroms-0ubuntu14.10) precise-proposed; urgency=low

  * remove 9004-qcow2-Simplify-count_cow_clusters.patch, which may or may
    not have actually fixed bug 1189926. Replace it with:
    9004-qcow2-start-at-0-when-counting-cow-clusters.patch: Fixes corruption
    issues with qcow2. (LP: #1189926)

qemu-kvm (1.0+noroms-0ubuntu14.9) precise-proposed; urgency=low

  * 9004-qcow2-Simplify-count_cow_clusters.patch: fixes corruption
    with qcow2. (LP: #1189926)
 -- Chris J Arges <email address hidden> Mon, 17 Jun 2013 10:11:38 -0500

Changed in qemu-kvm (Ubuntu Precise):
status: Fix Committed → Fix Released
Colin Watson (cjwatson) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
