qemu-img convert intermittently corrupts output images

Bug #1368815 reported by Michael Steffens
36
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Cinder
Won't Fix
High
Tony Breeds
OpenStack Compute (nova)
Won't Fix
High
Tony Breeds
QEMU
Fix Released
Undecided
Tony Breeds
qemu (Ubuntu)
Fix Released
High
Unassigned
Trusty
Fix Released
High
Unassigned
Utopic
Fix Released
High
Unassigned
Vivid
Fix Released
Undecided
Unassigned

Bug Description

==========================================================
Impact: occasional image corruption (any format on local filesystem)
Test case: see the qemu-img command below
Regression potential: this cherrypicks a patch from upstream to a not-insignificantly older qemu source tree. While the cherrypick seems sane, it's possible that there are subtle interactions with the other delta. I'd really like for a full qa-regression-test qemu testcase to be run against this package.
==========================================================

-- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on Ubuntu 14.04 using Ext4 filesystems.

The command

  qemu-img convert -O raw inputimage.qcow2 outputimage.raw

intermittently creates corrupted output images, when the input image is not yet fully synchronized to disk. While the issue has actually been discovered in operation of of OpenStack nova, it can be reproduced "easily" on command line using

  cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH $DST_PATH && cksum $DST_PATH

on filesystems exposing this behavior. (The difficult part of this exercise is to prepare a filesystem to reliably trigger this race. On my test machine some filesystems are affected while other aren't, and unfortunately I haven't found the relevant difference between them, yet. Possible it's timing issues completely out of userspace control ...)

The root cause, however, is the same as in

  http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html

and it can be solved the same way as suggested in

  http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change

    f.fm.fm_flags = 0;

to

    f.fm.fm_flags = FIEMAP_FLAG_SYNC;

As discussed in the thread mentioned above, retrieving a page cache coherent map of file extents is possible only after fsync on that file.

See also

  https://bugs.launchpad.net/nova/+bug/1350766

In that bug report filed against nova, fsync had been suggested to be performed by the framework invoking qemu-img. However, as the choice of fiemap -- implying this otherwise unneeded fsync of a temporary file -- is not made by the caller but by qemu-img, I agree with the nova bug reviewer's objection to put it into nova. The fsync should instead be triggered by qemu-img utilizing the FIEMAP_FLAG_SYNC, specifically intended for that purpose.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Is there a minimum version of qemu that would be required to use the FIEMAP_FLAG_SYNC flag?

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

The affected code was introduced with version 1.2.0. However, due to https://bugs.launchpad.net/qemu/+bug/1193628 I can't build these old releases to verify whether they actually expose the same behaviour.

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

It seems the dust settles a bit: Found the relevant difference between my various filesystems, and how to reproduce the failure: Susceptible filesystems don't have the extent feature of ext4 enabled.

You can create such a filesystem using

  mke2fs -t ext4 -O ^extent /dev/...
  mount /mnt /dev/...

Adapting the command line example provided above you can see

  rm -f /mnt/tmp.qcow2
  cat $SRC_PATH > /mnt/tmp.qcow2 && qemu-img convert -O raw /mnt/tmp.qcow /mnt/tmp.qcow
  cksum /mnt/tmp.qcow

creating corrupt (usually nullified) result images. By inserting a sleep of at least 33 seconds between the cat command and the qemu-img invocation I'm getting proper output.

To me it's unclear now, where the actual defect is located. Creating ext4 filesystems with certain features disabled (such as the exetent tree) is apparently supported and ok. Is the fiemap ioctl supposed to handle this gracefully, for example by assuming FIEMAP_FLAG_SYNC in absence of an extent tree? Or are clients such as qemu-img supposed to always FIEMAP_FLAG_SYNC to be safe?

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

I see seek hole is supported in the latest qemu-img so I would reorder so that's tried first like:

    if lseek(SEEK_HOLE) == ENOTSUP
        use_that
        if fiemap(FIEMAP_FLAG_SYNC)
            use_that

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

The fallback cascade Pádraig mentions is already implemented in qemu-2.1.0, in function raw_co_get_block_status. Just swap

  ret = try_fiemap( ... )

and

  ret = try_seek_hole( ... )

to reverse the order. I can confirm that it works just fine on 3.13 kernel (all version since 3.1, according to lseek(2)), while older versions will fall back to fiemap, which needs to be protected with FIEMAP_FLAG_SYNC in try_fiemap, to be safe.

This should work under all conditions, and avoid redundant syncs where possible, right?

Revision history for this message
Matt Riedemann (mriedem) wrote :

Marking as High since duplicate bug 1350766 was marked High.

Changed in nova:
importance: Medium → High
Changed in qemu (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in qemu (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → High
Revision history for this message
Tony Breeds (o-tony) wrote :
Changed in nova:
assignee: nobody → Tony Breeds (o-tony)
Changed in qemu:
assignee: nobody → Tony Breeds (o-tony)
status: New → In Progress
Changed in qemu (Ubuntu):
assignee: nobody → Tony Breeds (o-tony)
Changed in nova:
status: Triaged → In Progress
Changed in qemu (Ubuntu):
assignee: Tony Breeds (o-tony) → nobody
Revision history for this message
Tony Breeds (o-tony) wrote :

FWIW the following 2 commits in qemu master resolve the issue for qemu-img.

  http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0aea3e1264c86e282d99560330adf2b6e25
  http://git.qemu.org/?p=qemu.git;a=commit;h=7c15903789953ead14a417882657d52dc0c19a24

If possible they should be back ported to trusty and utopic.

You'll also need something like:

  http://git.qemu.org/?p=qemu.git;a=commit;h=4f11aa8a40351b28c0e67c7276e0003b38cc46ac

before my 2 patches.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for the information. Looks like we can apply these in debian too.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu Vivid):
status: New → Confirmed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.1+dfsg-4ubuntu7

---------------
qemu (2.1+dfsg-4ubuntu7) vivid; urgency=medium

  * Apply two patches to fix intermittent qemu-img corruption
    (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap
 -- Serge Hallyn <email address hidden> Wed, 29 Oct 2014 22:31:43 -0500

Changed in qemu (Ubuntu Vivid):
status: Confirmed → Fix Released
description: updated
Revision history for this message
Tony Breeds (o-tony) wrote :

Hi Serge,

Is there any chance these fixes will go into trusty?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi Tony,

yes, I've uploaded a proposed fix for trusty-proposed earlier today. It should be available for testing as soon as it is accepted.

Revision history for this message
Tony Breeds (o-tony) wrote :

Awesome.

Thanks!

description: updated
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted qemu into utopic-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu/2.1+dfsg-4ubuntu6.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Utopic):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in qemu (Ubuntu Trusty):
status: Triaged → Fix Committed
Revision history for this message
Chris J Arges (arges) wrote :

Hello Michael, or anyone else affected,

Accepted qemu into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg-2ubuntu1.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

Tested qemu-utils 2.0.0+dfsg-2ubuntu1.8. Successful.

tags: added: verification-done-trusty
removed: verification-needed
Mathew Hodson (mhodson)
tags: added: verification-needed-utopic
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.8

---------------
qemu (2.0.0+dfsg-2ubuntu1.8) trusty-proposed; urgency=medium

  * debian/qemu-system-x86.qemu-kvm.upstart: create /dev/kvm in a
    container. (LP: #1370199)
  * Cherrypick upstream patch to fix intermittent qemu-img corruption
    (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - (note - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap (which was
      also needed in utopic) appears to be unneeded here as the code being
      changed has not yet been switched to using try_fiemap)
 -- Serge Hallyn <email address hidden> Thu, 20 Nov 2014 11:24:51 -0600

Changed in qemu (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@Michael,

by any chance would you be albe to test on utopic?

Changed in mos:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → MOS Linux (mos-linux)
milestone: none → 6.0
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I couldn't reproduce the bug on the old qemu myself, however Michael has verified the (same) fix on trusty, and the full qa-regression-test passed for me on utopic-proposed. So I would request that we call this verification-done.

Revision history for this message
Chris J Arges (arges) wrote :

Looking at the fixes, I also see the following commits remove the above changes, which could mean we might encounter this again:
c4875e5 raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
d1f06fe raw-posix: The SEEK_HOLE code is flawed, rewrite it

Note there is also a related issue:
bug 1292234
So far testing with the proposed qemu version or upstream I still encounter issues on ext4 w/ ^extent and ext3 filesystems.

Chris J Arges (arges)
tags: added: verification-done-utopic
removed: verification-needed-utopic
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Filed a separate issue for MOS https://bugs.launchpad.net/mos/+bug/1401261

no longer affects: mos
Revision history for this message
Tony Breeds (o-tony) wrote :

Hi Chris,
Markus' rework will not reintroduce this bug as it completely removes all fiemap code.

bug 129224 is a different issue, I'll comment on that bug.

You say: you encounter issues with upstream with ^extent and ext3 filesystems. Just to be clear: Are you saying that *this* bug is still a problem for you?

if it's a different bug then I write it up and I'll take a look.

Revision history for this message
Chris J Arges (arges) wrote :

Tony,

Yea, its a different bug. I tested with the above patched package and upstream qemu from git, and I can still hit bug 129224. I was hoping this also fixed my issue, but unfortunately it seems to be a different issue that occurs when using the same types of filesystems. I have a solid reproducer on my desk so let me know which experiments / areas of code / etc I should look at.

Revision history for this message
Chris J Arges (arges) wrote :

Just to clarify it's bug 1292234 in the previous comment.

Revision history for this message
Tony Breeds (o-tony) wrote :

Chris,
I've read through 1292234 and I'll have a play with your reproducer locally and see if I can gain any insight.

I'm sorry my fix didn't help 1292234, but glad you can't hit 1368815 with upstream, I was kinda having kittens here ;P

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.1+dfsg-4ubuntu6.2

---------------
qemu (2.1+dfsg-4ubuntu6.2) utopic-proposed; urgency=medium

  * Apply two patches to fix intermittent qemu-img corruption
    (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap
 -- Serge Hallyn <email address hidden> Thu, 20 Nov 2014 16:33:09 -0600

Changed in qemu (Ubuntu Utopic):
status: Fix Committed → Fix Released
Revision history for this message
Tony Breeds (o-tony) wrote :

I'm happy to tackle to also fix cinder with a version of the nova fix (for consistency). I propose waiting until the nova fix lands

Changed in cinder:
assignee: nobody → Tony Breeds (o-tony)
Revision history for this message
Tony Breeds (o-tony) wrote :

I'd elevate this to high so it matches nova and ubuntu but I don't have permissions to do so.

Changed in cinder:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/141259

Changed in cinder:
assignee: Tony Breeds (o-tony) → John Griffith (john-griffith)
Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → High
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
> - (note - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap (which was
> also needed in utopic) appears to be unneeded here as the code being
> changed has not yet been switched to using try_fiemap)

Actually such a enforces fsync and drastically reduces the performance of conversion.
I propose to use seek_hole instead of FIEMAP (which is basically what
 502-block-raw-posic-use-seek-hole-ahead-of-fiemap does).

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The second part of the fix (which does not reduce the performance) for qemu 2.0 (apparently uploading two patches at once is not so easy)

Revision history for this message
Tony Breeds (o-tony) wrote :

Patchg 0500-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch appears to be part of a bigger re-write of the related code. and is ON TOP of the patches already applied in this bug.

No doubt the rewirtten code is "better" but backporting it contains more risk than the 2 simple fixes I already nominated.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Patch 0500-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch appears to be part of a bigger re-write
> of the related code. and is ON TOP of the patches already applied in this bug.

Yep, sorry for not mentioning this. As far as I understand qemu-2.1 package contains this partially rewritten
code too (without any recent changes like disabling FIEMAP completely and rewriting the code using SEEK_HOLE).

> No doubt the rewirtten code is "better" but backporting it contains more risk than the 2 simple fixes I already nominated.

Can we completely disable the FIEMAP code and pretend that all blocks are allocated? I'm afraid fsync'ing 100+ GB
files might be even slower than ignoring the sparseness.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by John Griffith (<email address hidden>) on branch: master
Review: https://review.openstack.org/141259

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/143575

Changed in cinder:
assignee: John Griffith (john-griffith) → Tony Breeds (o-tony)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/143575
Reason: 1 month, no update.

Mike Perez (thingee)
Changed in cinder:
status: In Progress → Triaged
assignee: Tony Breeds (o-tony) → nobody
Tony Breeds (o-tony)
Changed in cinder:
assignee: nobody → Tony Breeds (o-tony)
Tony Breeds (o-tony)
Changed in nova:
milestone: none → kilo-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Tony Breeds (<email address hidden>) on branch: master
Review: https://review.openstack.org/123957
Reason: The main distros we care about have landed or are in progress.

Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-2 → kilo-3
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Marking as Wont-Fix.

Changed in nova:
status: In Progress → Won't Fix
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/143575
Reason: No activity for over a month.

Revision history for this message
Eric Harney (eharney) wrote :

Closing based on the assumption that a working qemu-img is available now.

Changed in cinder:
status: Triaged → Won't Fix
Revision history for this message
Thomas Huth (th-huth) wrote :

According to comment #8 the fixes have been included in the upstream QEMU repository, so setting the status to "Fix released" now.

Changed in qemu:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.