data corruption in storage attached to VM using KVM

Bug #1189926 reported by Chris J Arges on 2013-06-11
This bug affects 1 person
Affects            Status  Importance  Assigned to    Milestone
qemu-kvm (Ubuntu)  -       Undecided   Unassigned     -
Precise            -       High        Chris J Arges  -
Quantal            -       High        Unassigned     -

Bug Description

[Impact]

When using qemu-kvm-1.0, qcow2 disks are occasionally corrupted.

[Test Case]

A test case can be downloaded here:
    http://people.canonical.com/~arges/lp1189926/lnv-382.tgz
Extract the contents and run ./create-disk, then ./test-kvm to test the currently installed KVM. Keep in mind that 100GB+ of disk is required to run the test.

[Regression Potential]

The patch is a backport of upstream commit 143550a83ef4eef86a847d00023d148e1f59f743, which changes the way the number of available clusters is counted. The original patch assumes that certain functions are available, so the backport had to change these in order to apply to v1.0.

--

When running io_perf on qemu-kvm-1.0, disk corruption can be detected.
A test case can be downloaded here:
    http://people.canonical.com/~arges/lp1189926/lnv-382.tgz
Extract the contents and run ./create-disk, then ./test-kvm to test the currently installed KVM. Keep in mind that 100GB+ of disk is required to run the test.

This affects the qemu-kvm version in Precise, but is fixed in Quantal and beyond.

Running the test case should result in no disk corruption; however, corruption is detected on Precise.

Running git bisect on this test finds that commit 68d100e905453ebbeea8e915f4f18a2bd4339fe8 introduced the problem between v0.15.0 and v1.0.
Running git bisect in reverse finds that commit b7ab0fea37c15ca9e249c42c46f5c48fd1a0943c fixes the issue between v1.1.2 and v1.2.0.
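
For readers unfamiliar with the technique, the two bisections above amount to a binary search over the commit history. The following is a minimal sketch, assuming a linear history and a test result that flips exactly once within the searched range; the commit names `c0`..`c9`, the `corrupts` predicate, and the `first_flip` helper are hypothetical stand-ins, not the real QEMU history or test harness.

```python
def first_flip(commits, predicate):
    """Return the first commit for which predicate(commit) is True.

    Assumes predicate is False for a prefix of `commits` and True for
    the rest, and that it is True for the last commit in the list.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(commits[mid]):
            hi = mid          # the flip is at mid or earlier
        else:
            lo = mid + 1      # the flip is after mid
    return commits[lo]


# Hypothetical linear history: corruption appears at c3 and disappears at c7.
history = [f"c{n}" for n in range(10)]


def corrupts(commit):
    """Stand-in for building the commit and running the corruption test."""
    return 3 <= int(commit[1:]) < 7


# Forward bisect: known-good start, known-bad end, find the first bad commit.
introduced = first_flip(history[:5], corrupts)

# Reverse bisect: known-bad start, known-good end, find the first fixed commit.
fixed = first_flip(history[5:], lambda c: not corrupts(c))

print(introduced, fixed)  # prints "c3 c7"
```

The reverse case uses the same search with the predicate negated, which is why it identifies the fixing commit rather than the breaking one.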

However, b7ab0fea cannot be easily backported to v1.0, and 68d100e9 is a large change to revert. Both changes nevertheless indicate that the problem lies in the qcow2 parts of the code.

bug 1040033 seems to be a related issue.

Chris J Arges (arges) on 2013-06-11
Changed in qemu-kvm (Ubuntu Precise):
importance: Undecided → High
Changed in qemu-kvm (Ubuntu):
importance: High → Undecided
status: New → Fix Released
Changed in qemu-kvm (Ubuntu Quantal):
status: New → Fix Released
description: updated
Changed in qemu-kvm (Ubuntu Precise):
status: New → Triaged
Changed in qemu-kvm (Ubuntu Quantal):
importance: Undecided → High
Chris J Arges (arges) on 2013-06-12
Changed in qemu-kvm (Ubuntu Precise):
assignee: nobody → Chris J Arges (arges)
status: Triaged → In Progress
Chris J Arges (arges) wrote :

This patch applied against v1.0 fixes the issue.

Chris J Arges (arges) wrote :

The attached debdiff fixes the issue for precise.

A test build of this package is available here:
http://people.canonical.com/~arges/lp1189926/

description: updated

Hello Chris, or anyone else affected,

Accepted qemu-kvm into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu-kvm/1.0+noroms-0ubuntu14.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu-kvm (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Dmitry Shachnev (mitya57) wrote :

Unsubscribing sponsors, as the fix has been uploaded.

Chris J Arges (arges) wrote :

Please do NOT promote this package. There is a better fix that can be used, and it is currently being tested. I'll attach the new debdiff here soon.

Chris J Arges (arges) wrote :

This patch contains the proper solution to this issue.

Serge Hallyn (serge-hallyn) wrote :

Hi Chris,

Do you mind if the patch comment gets updated to something like:

While searching for available clusters, if we detect an ongoing AIO
write request, then we restart after the other has completed. By not
re-setting i to 0, we fail to re-check clusters which may no longer be
available.
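
The failure mode described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not the QEMU code: `count_available`, the `in_flight` set, the `wait_for` callback, and the `reset_on_retry` switch are all hypothetical names, and the completion of the other write request is modeled as a callback that may change which clusters are available.

```python
def count_available(free, in_flight, wait_for, reset_on_retry):
    """Count contiguous available clusters starting at index 0.

    free           -- list of bools, True if the cluster is available
    in_flight      -- set of cluster indices with a pending write
    wait_for       -- callback that blocks until the pending write on a
                      cluster completes; it may mutate `free`
    reset_on_retry -- if True, restart the scan from 0 after waiting
    """
    i = 0
    while i < len(free):
        if i in in_flight:
            wait_for(i)
            in_flight.discard(i)
            if reset_on_retry:
                i = 0  # re-check clusters that were already counted
            continue
        if not free[i]:
            break
        i += 1
    return i


def run_scan(reset_on_retry):
    """Scan 4 clusters while a pending write on cluster 2 completes."""
    free = [True, True, True, True]
    in_flight = {2}

    def wait_for(idx):
        # Hypothetical side effect: the completed write took cluster 0.
        free[0] = False

    return count_available(free, in_flight, wait_for, reset_on_retry)


print(run_scan(reset_on_retry=False))  # prints 4: cluster 0 is counted although it is taken
print(run_scan(reset_on_retry=True))   # prints 0: the rescan sees cluster 0 is taken
```

Without the reset, the scan resumes at the index where it stopped and never revisits the clusters it had already counted, which is the stale result the proposed comment describes.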

Chris J Arges (arges) wrote :

Made modifications to the patch based on your comments. I also tested this in a loop this weekend, and it passed every time.

Serge Hallyn (serge-hallyn) wrote :

Setting verification-failed to indicate we'd like 1.0+noroms-0ubuntu14.9 dropped. 1.0+noroms-0ubuntu14.10 is in Unapproved, and has the better fix.

tags: added: verification-failed
removed: verification-needed
Clint Byrum (clint-fewbar) wrote :

Hello Chris, or anyone else affected,

Accepted qemu-kvm into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu-kvm/1.0+noroms-0ubuntu14.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-failed
tags: added: verification-needed
Chris J Arges (arges) wrote :

I have verified these packages and they do pass the tests.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu-kvm - 1.0+noroms-0ubuntu14.10

---------------
qemu-kvm (1.0+noroms-0ubuntu14.10) precise-proposed; urgency=low

  * remove 9004-qcow2-Simplify-count_cow_clusters.patch, which may or may
    not have actually fixed bug 1189926. Replace it with:
    9004-qcow2-start-at-0-when-counting-cow-clusters.patch: Fixes corruption
    issues with qcow2. (LP: #1189926)

qemu-kvm (1.0+noroms-0ubuntu14.9) precise-proposed; urgency=low

  * 9004-qcow2-Simplify-count_cow_clusters.patch: fixes corruption
    with qcow2. (LP: #1189926)
 -- Chris J Arges <email address hidden> Mon, 17 Jun 2013 10:11:38 -0500

Changed in qemu-kvm (Ubuntu Precise):
status: Fix Committed → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
