qemu-img convert on Mac OSX creates corrupt images

Bug #1776920 reported by Waldemar Kozaczuk on 2018-06-14
30
This bug affects 3 people
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Unassigned

Bug Description

An image created by qemu-img create, then modified by another program is converted to bad/corrupt image when using convert sub command on Mac OSX. The same convert works on Linux. The version of qemu-img is 2.12.

Can this be done with like a 1M example file that you could copy off in all stages.

Provide the commands you use like
1. create
2. do ??
3. convert

Then for Mac and Linux you'd have M1/M2/M3 L1/L2/L3 files that can all be attached here to be evaluated for what might be broken.

IMHO In the current state there is neither enough data for good debugging, not enough steps to reproduce what exactly you faced.

I will provide all necessary info. Unfortunately the smallest image I can
provide is around 10M.

What is M1/M2/M3 L1/L2/L3 file?

Waldek

On Thu, Jun 14, 2018 at 10:46 AM,  Christian Ehrhardt  <
<email address hidden>> wrote:

> Can this be done with like a 1M example file that you could copy off in
> all stages.
>
> Provide the commands you use like
> 1. create
> 2. do ??
> 3. convert
>
> Then for Mac and Linux you'd have M1/M2/M3 L1/L2/L3 files that can all
> be attached here to be evaluated for what might be broken.
>
> IMHO In the current state there is neither enough data for good
> debugging, not enough steps to reproduce what exactly you faced.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1776920
>
> Title:
> qemu-img convert on Mac OSX creates corrupt images
>
> Status in QEMU:
> New
>
> Bug description:
> An image created by qemu-img create, then modified by another program
> is converted to bad/corrupt image when using convert sub command on
> Mac OSX. The same convert works on Linux. The version of qemu-img is
> 2.12.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions
>

>
> I will provide all necessary info. Unfortunately the smallest image I can
> provide is around 10M.
>
> What is M1/M2/M3 L1/L2/L3 file?
>

Just a suggestion how you could name the files
Linux-at-step-1 would be L1 and similar.

John Snow (jnsnow) wrote :

Are you converting to a destination on APFS? If yes, do you have the ability to easily reproduce this by converting to a non-APFS destination and seeing if that breaks too?

Thanks,
--js

Waldemar Kozaczuk (wkozaczuk) wrote :

I believe I have distilled entire process to few repeatable steps that can be fully reproduced on my Mac. The binary source files - - boot.bin and lzloader.elf - were created on my Linux VM running in VirtualBox on same Mac but I do not think it matters as the execution completely happens on Mac.

The steps executed on my mac:
1. dd if=boot.bin of=image.img > /dev/null 2>&1
2. dd if=lzloader.elf of=image.img conv=notrunc seek=128 > /dev/null 2>&1
3. qemu-img convert image.img -O qcow2 image.qemu
4. qemu-img convert image.qemu -O qcow2 image2.qemu

The end result:
ll image*
-rw-r--r-- 1 *** *** 6684672 Jun 14 17:17 image.img
-rw-r--r-- 1 *** *** 7012352 Jun 14 17:40 image.qemu
-rw-r--r-- 1 *** *** 196616 Jun 14 17:40 image2.qemu

The result of regular compare:
qemu-img compare image.qemu image2.qemu
Images are identical.

The result of strict compare:
qemu-img compare -s image.qemu image2.qemu
Strict mode: Offset 0 block status mismatch!

Images are clearly different.

The same 4 steps executed on my Linux VM behave correctly - image.qemu is TRULY identical with image2.qemu.

Qemu-img on my Mac:
qemu-img --version
qemu-img version 2.12.0
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

Details about Mac and OSX version:
Model Name: MacBook Pro
  Model Identifier: MacBookPro13,3
  Processor Name: Intel Core i7
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 8 MB
  Memory: 16 GB

Mac filesystem:
Mount Point: /
  File System: APFS
  Writable: Yes
  Ignore Ownership: No
  BSD Name: disk1s1
  Physical Drive:
  Device Name: APPLE SSD SM0512L
  Media Name: AppleAPFSMedia
  Medium Type: SSD
  Protocol: PCI-Express
  Internal: Yes
  Partition Map Type: Unknown

I am also attaching both source files and images for examination.

Waldemar Kozaczuk (wkozaczuk) wrote :

Source file 1

Waldemar Kozaczuk (wkozaczuk) wrote :

Source file 2

Waldemar Kozaczuk (wkozaczuk) wrote :

Raw image created by dd in steps 1 and 2.

Waldemar Kozaczuk (wkozaczuk) wrote :

Original qcow2 image converted from raw image in step 3.

Waldemar Kozaczuk (wkozaczuk) wrote :

The corrupt qcow2 image created by converting image.qemu in step 4.

Waldemar Kozaczuk (wkozaczuk) wrote :

Also if I use the same image.qemu file and convert to vmdk format I get even smaller file which for sure is wrong as well:

qemu-img convert image.qemu -O vmdk image2.vbox

ll image*
-rw-r--r-- 1 *** *** 6684672 Jun 14 17:17 image.img
-rw-r--r-- 1 *** *** 7012352 Jun 14 17:40 image.qemu
-rw-r--r-- 1 *** *** 196616 Jun 14 17:40 image2.qemu
-rw-r--r-- 1 *** *** 65536 Jun 14 18:00 image2.vbox

Waldemar Kozaczuk (wkozaczuk) wrote :

Have I provided all necessary data and other details?

John Snow (jnsnow) wrote :

I haven't had the time to look just yet.

If there's a developer out there with a Mac has the bandwidth to take a look at this I'd be grateful. (I don't have access to one presently.)

I see that the target filesystem is APFS however -- I think we might have a bug in our APFS support. Do you have the ability to try it on your Mac but on a non-APFS destination?

It looks like the image is pretty small (~6MB?) so maybe you can try with a non-APFS formatted thumb drive?

My hunch is that:
- APFS source to FAT32 destination might work correctly.
- FAT32 source to FAT32 destination will definitely work correctly.

I have done more tests based on your suggestion and I think it is the opposite. The problem happens is the source is APFS.

Following failed:
APFS -> ExFAT
APFS -> Fat32

Following worked:
ExFAT -> APFS
FAT32 -> APFS

So for now my workaround is to use USB stick formatted with FAT32 or ExFAT, copy the source images there and then convert to somewhere or my main drive with APFS.

My regards,
Waldek

Sent from my iPhone

> On Jun 20, 2018, at 11:54, John Snow <email address hidden> wrote:
>
> I haven't had the time to look just yet.
>
> If there's a developer out there with a Mac has the bandwidth to take a
> look at this I'd be grateful. (I don't have access to one presently.)
>
> I see that the target filesystem is APFS however -- I think we might
> have a bug in our APFS support. Do you have the ability to try it on
> your Mac but on a non-APFS destination?
>
> It looks like the image is pretty small (~6MB?) so maybe you can try
> with a non-APFS formatted thumb drive?
>
> My hunch is that:
> - APFS source to FAT32 destination might work correctly.
> - FAT32 source to FAT32 destination will definitely work correctly.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1776920
>
> Title:
> qemu-img convert on Mac OSX creates corrupt images
>
> Status in QEMU:
> New
>
> Bug description:
> An image created by qemu-img create, then modified by another program
> is converted to bad/corrupt image when using convert sub command on
> Mac OSX. The same convert works on Linux. The version of qemu-img is
> 2.12.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions

John Snow (jnsnow) wrote :

Great, thanks for the additional info!

It sounds like maybe something to do with our sparse detection on APFS is broken. Maybe lseek calls are failing? This is a bad problem because it means that any read from an APFS source could break similarly.

I'm still a bit tied down with other work at the moment, but I will try to address this prior to the 3.0 release if nobody else gets to it first.

Waldemar Kozaczuk (wkozaczuk) wrote :

I am not familiar with QEMU source code but I might have some cycles to look into it. Where would I look - https://github.com/qemu/qemu/blob/master/qemu-img.c? Or somewhere else? Any suggestions would be appreciated.

John Snow (jnsnow) wrote :

My hunch is that we're handling zero regions incorrectly and we might be skipping data regions we ought to be copying.

...I'm looking at the `image2.qemu` file you've uploaded and it's empty! we didn't copy *anything* from your source file.

Try converting with '-S 0` to disable sparse protection and see if that might help.

  '-S' indicates the consecutive number of bytes (defaults to 4k) that must
       contain only zeros for qemu-img to create a sparse image during
       conversion. If the number of bytes is 0, the source will not be scanned for
       unallocated or zero sectors, and the destination image will always be
       fully allocated

If that helps, I'd take a look at img_convert in qemu-img.c ...

convert_iteration_sectors looks relevant, it appears to help us know which sectors to copy when called by convert_co_do_copy.

Tracing around "convert_co_read" in the same function might also help us to know which portions of the image we're even deciding to read; and we could probably track backwards from there to figure out which condition(s) are disqualifying us if we're deciding not to copy data.

My current hunch is that bdrv_block_status[_above] is returning something wrong when backed by APFS.

Waldemar Kozaczuk (wkozaczuk) wrote :

Bingo! Adding '-S 0' makes convert work. But it is not perfect as the end result is fully allocated image.

So with qcow2 like this:

image: mysql-example.qemu
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 50M
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

and do this:

qemu-img convert mysql-example.qemu -S 0 -O vmdk mysql-example.vbox

I get 10G vmdk file:
10738794496 Jul 6 18:27 mysql-example.vbox

John Snow (jnsnow) wrote :

I marked https://bugs.launchpad.net/qemu/+bug/1738840 as a duplicate of this bug, even though that bug was older, this bug has a slightly more active thread.

I'm still in need of either an APFS capable machine that I can reproduce on, or another mac dev willing to help a bit, though.

Yan-Jie Wang (jaywang0tw) wrote :

I have done some experiments and find out that
the behavior of lseek with whence set to SEEK_DATA is different from the behavior of Linux's lseek.

If the supplied offset is in the middle of a data region, it returns the start of the next data region. There may be many data regions in a big file even though it has no hole.

return value of lseek with whence set to SEEK_DATA:

|--(offset)--Data----|(return value)----Data----|
|--(offset)--Data----|----Hole----|(return value)----Data----|

On 09/07/2018 01:04 PM, Yan-Jie Wang wrote:
> I have done some experiments and find out that
> the behavior of lseek with whence set to SEEK_DATA is different from the behavior of Linux's lseek.
>
> If the supplied offset is in the middle of a data region, it returns the
> start of the next data region. There may be many data regions in a big
> file even though it has no hole.
>
> return value of lseek with whence set to SEEK_DATA:
>
> |--(offset)--Data----|(return value)----Data----|
> |--(offset)--Data----|----Hole----|(return value)----Data----|
>
>
> ** Patch added: "macOS-lseek.patch"
> https://bugs.launchpad.net/qemu/+bug/1776920/+attachment/5186138/+files/macOS-lseek.patch

While a developer can chase a URL, our CI tools can't. Can you please
also send that patch directly to <email address hidden>, so that it gets
the same level of review as other patches?

But I am very reluctant to take your patch. MacOS is flat-out buggy and
in violation of the specification of lseek(SEEK_DATA), so making all
applications work around their bug is not going to scale as well as
having them fix their bug in the first place.

Here's the proposed POSIX specification for what lseek(SEEK_DATA) is
supposed to do:
http://austingroupbugs.net/view.php?id=415#c862

That text has been present for 7 years now - so anyone implementing
SEEK_DATA should really be paying attention to interoperability with
that text.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org

Sven R (mmoole) wrote :

Hi,

I recently ran into problems and after a long time trying to find out the cause landed here, I got in trouble using a CentOs Cloud image:
https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1905.qcow2.xz
which extracts to a .qcow2 image with sha256 of:
b376afdc0150601f15e53516327d9832216e01a300fecfc302066e36e2ac2b39

image: CentOS-7-x86_64-GenericCloud-1905.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 898M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

I use this command on a Mac, OS X 10.13.6 (17G7024), qemu installed via brew:
qemu-img convert -f qcow2 -O vmdk -o adapter_type=lsilogic,subformat=streamOptimized,compat6=on -p CentOS-7-x86_64-GenericCloud-1905.qcow2 -S 0 result.vmdk

941359104 21 Jul 17:11 CentOS-7-x86_64-GenericCloud-1905.qcow2 - original image

Converting this gives these results:
214551040 23 Jul 20:45 conv_mac_v3_1_mit_s_0.vmdk - doesnt work, made on Mac (APFS) with -S 0
402303488 23 Jul 20:50 conv_mac_v3_1_auf_exfat.vmdk - works and is bootable, made on same Mac, on external drive, exFAT formatted.
149743104 23 Jul 21:21 conv_mac_v4_0.vmdk - doesnt work, made on Mac (APFS) without -S 0
214551040 23 Jul 21:20 conv_mac_v4_0mit_S0.vmdk - doesnt work, made on Mac (APFS) with -S 0

converting on non-Mac also works fine:
402303488 23 Jul 18:48 conv_centos7_v1-5-3.vmdk - works and is bootable, made on Centos, qemu-img version 1.5.3

So it seems that '-S 0' didn't fix it for me, or is that only in the development branch?

Best Regards

John Snow (jnsnow) wrote :

Hi, there isn't really a development branch; if '-S 0' didn't help alleviate the problem there may be other problems at hand, or the APFS implementation of SEEK_DATA is causing us even more problems in new versions.

You could try Yan-Jie Wang's patch that was posted in #20, but until it's posted to the mailing list with a Signed-off-by tag, we can't include it ourselves.

However, it would still be nice to know if it fixes your problem.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.