qemu-img convert on Mac OSX creates corrupt images

Bug #1776920 reported by Waldemar Kozaczuk
38
This bug affects 4 people
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

An image created by qemu-img create, then modified by another program is converted to bad/corrupt image when using convert sub command on Mac OSX. The same convert works on Linux. The version of qemu-img is 2.12.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Can this be done with like a 1M example file that you could copy off in all stages.

Provide the commands you use like
1. create
2. do ??
3. convert

Then for Mac and Linux you'd have M1/M2/M3 L1/L2/L3 files that can all be attached here to be evaluated for what might be broken.

IMHO In the current state there is neither enough data for good debugging, not enough steps to reproduce what exactly you faced.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote : Re: [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images

I will provide all necessary info. Unfortunately the smallest image I can
provide is around 10M.

What is M1/M2/M3 L1/L2/L3 file?

Waldek

On Thu, Jun 14, 2018 at 10:46 AM,  Christian Ehrhardt  <
<email address hidden>> wrote:

> Can this be done with like a 1M example file that you could copy off in
> all stages.
>
> Provide the commands you use like
> 1. create
> 2. do ??
> 3. convert
>
> Then for Mac and Linux you'd have M1/M2/M3 L1/L2/L3 files that can all
> be attached here to be evaluated for what might be broken.
>
> IMHO In the current state there is neither enough data for good
> debugging, not enough steps to reproduce what exactly you faced.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1776920
>
> Title:
> qemu-img convert on Mac OSX creates corrupt images
>
> Status in QEMU:
> New
>
> Bug description:
> An image created by qemu-img create, then modified by another program
> is converted to bad/corrupt image when using convert sub command on
> Mac OSX. The same convert works on Linux. The version of qemu-img is
> 2.12.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions
>

Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: [Qemu-devel] [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images

>
> I will provide all necessary info. Unfortunately the smallest image I can
> provide is around 10M.
>
> What is M1/M2/M3 L1/L2/L3 file?
>

Just a suggestion how you could name the files
Linux-at-step-1 would be L1 and similar.

Revision history for this message
John Snow (jnsnow) wrote :

Are you converting to a destination on APFS? If yes, do you have the ability to easily reproduce this by converting to a non-APFS destination and seeing if that breaks too?

Thanks,
--js

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

I believe I have distilled entire process to few repeatable steps that can be fully reproduced on my Mac. The binary source files - - boot.bin and lzloader.elf - were created on my Linux VM running in VirtualBox on same Mac but I do not think it matters as the execution completely happens on Mac.

The steps executed on my mac:
1. dd if=boot.bin of=image.img > /dev/null 2>&1
2. dd if=lzloader.elf of=image.img conv=notrunc seek=128 > /dev/null 2>&1
3. qemu-img convert image.img -O qcow2 image.qemu
4. qemu-img convert image.qemu -O qcow2 image2.qemu

The end result:
ll image*
-rw-r--r-- 1 *** *** 6684672 Jun 14 17:17 image.img
-rw-r--r-- 1 *** *** 7012352 Jun 14 17:40 image.qemu
-rw-r--r-- 1 *** *** 196616 Jun 14 17:40 image2.qemu

The result of regular compare:
qemu-img compare image.qemu image2.qemu
Images are identical.

The result of strict compare:
qemu-img compare -s image.qemu image2.qemu
Strict mode: Offset 0 block status mismatch!

Images are clearly different.

The same 4 steps executed on my Linux VM behave correctly - image.qemu is TRULY identical with image2.qemu.

Qemu-img on my Mac:
qemu-img --version
qemu-img version 2.12.0
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

Details about Mac and OSX version:
Model Name: MacBook Pro
  Model Identifier: MacBookPro13,3
  Processor Name: Intel Core i7
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 8 MB
  Memory: 16 GB

Mac filesystem:
Mount Point: /
  File System: APFS
  Writable: Yes
  Ignore Ownership: No
  BSD Name: disk1s1
  Physical Drive:
  Device Name: APPLE SSD SM0512L
  Media Name: AppleAPFSMedia
  Medium Type: SSD
  Protocol: PCI-Express
  Internal: Yes
  Partition Map Type: Unknown

I am also attaching both source files and images for examination.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Source file 1

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Source file 2

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Raw image created by dd in steps 1 and 2.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Original qcow2 image converted from raw image in step 3.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

The corrupt qcow2 image created by converting image.qemu in step 4.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Also if I use the same image.qemu file and convert to vmdk format I get even smaller file which for sure is wrong as well:

qemu-img convert image.qemu -O vmdk image2.vbox

ll image*
-rw-r--r-- 1 *** *** 6684672 Jun 14 17:17 image.img
-rw-r--r-- 1 *** *** 7012352 Jun 14 17:40 image.qemu
-rw-r--r-- 1 *** *** 196616 Jun 14 17:40 image2.qemu
-rw-r--r-- 1 *** *** 65536 Jun 14 18:00 image2.vbox

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Have I provided all necessary data and other details?

Revision history for this message
John Snow (jnsnow) wrote :

I haven't had the time to look just yet.

If there's a developer out there with a Mac has the bandwidth to take a look at this I'd be grateful. (I don't have access to one presently.)

I see that the target filesystem is APFS however -- I think we might have a bug in our APFS support. Do you have the ability to try it on your Mac but on a non-APFS destination?

It looks like the image is pretty small (~6MB?) so maybe you can try with a non-APFS formatted thumb drive?

My hunch is that:
- APFS source to FAT32 destination might work correctly.
- FAT32 source to FAT32 destination will definitely work correctly.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote : Re: [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images

I have done more tests based on your suggestion and I think it is the opposite. The problem happens is the source is APFS.

Following failed:
APFS -> ExFAT
APFS -> Fat32

Following worked:
ExFAT -> APFS
FAT32 -> APFS

So for now my workaround is to use USB stick formatted with FAT32 or ExFAT, copy the source images there and then convert to somewhere or my main drive with APFS.

My regards,
Waldek

Sent from my iPhone

> On Jun 20, 2018, at 11:54, John Snow <email address hidden> wrote:
>
> I haven't had the time to look just yet.
>
> If there's a developer out there with a Mac has the bandwidth to take a
> look at this I'd be grateful. (I don't have access to one presently.)
>
> I see that the target filesystem is APFS however -- I think we might
> have a bug in our APFS support. Do you have the ability to try it on
> your Mac but on a non-APFS destination?
>
> It looks like the image is pretty small (~6MB?) so maybe you can try
> with a non-APFS formatted thumb drive?
>
> My hunch is that:
> - APFS source to FAT32 destination might work correctly.
> - FAT32 source to FAT32 destination will definitely work correctly.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1776920
>
> Title:
> qemu-img convert on Mac OSX creates corrupt images
>
> Status in QEMU:
> New
>
> Bug description:
> An image created by qemu-img create, then modified by another program
> is converted to bad/corrupt image when using convert sub command on
> Mac OSX. The same convert works on Linux. The version of qemu-img is
> 2.12.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions

Revision history for this message
John Snow (jnsnow) wrote :

Great, thanks for the additional info!

It sounds like maybe something to do with our sparse detection on APFS is broken. Maybe lseek calls are failing? This is a bad problem because it means that any read from an APFS source could break similarly.

I'm still a bit tied down with other work at the moment, but I will try to address this prior to the 3.0 release if nobody else gets to it first.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

I am not familiar with QEMU source code but I might have some cycles to look into it. Where would I look - https://github.com/qemu/qemu/blob/master/qemu-img.c? Or somewhere else? Any suggestions would be appreciated.

Revision history for this message
John Snow (jnsnow) wrote :

My hunch is that we're handling zero regions incorrectly and we might be skipping data regions we ought to be copying.

...I'm looking at the `image2.qemu` file you've uploaded and it's empty! we didn't copy *anything* from your source file.

Try converting with '-S 0` to disable sparse protection and see if that might help.

  '-S' indicates the consecutive number of bytes (defaults to 4k) that must
       contain only zeros for qemu-img to create a sparse image during
       conversion. If the number of bytes is 0, the source will not be scanned for
       unallocated or zero sectors, and the destination image will always be
       fully allocated

If that helps, I'd take a look at img_convert in qemu-img.c ...

convert_iteration_sectors looks relevant, it appears to help us know which sectors to copy when called by convert_co_do_copy.

Tracing around "convert_co_read" in the same function might also help us to know which portions of the image we're even deciding to read; and we could probably track backwards from there to figure out which condition(s) are disqualifying us if we're deciding not to copy data.

My current hunch is that bdrv_block_status[_above] is returning something wrong when backed by APFS.

Revision history for this message
Waldemar Kozaczuk (wkozaczuk) wrote :

Bingo! Adding '-S 0' makes convert work. But it is not perfect as the end result is fully allocated image.

So with qcow2 like this:

image: mysql-example.qemu
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 50M
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

and do this:

qemu-img convert mysql-example.qemu -S 0 -O vmdk mysql-example.vbox

I get 10G vmdk file:
10738794496 Jul 6 18:27 mysql-example.vbox

Revision history for this message
John Snow (jnsnow) wrote :

I marked https://bugs.launchpad.net/qemu/+bug/1738840 as a duplicate of this bug, even though that bug was older, this bug has a slightly more active thread.

I'm still in need of either an APFS capable machine that I can reproduce on, or another mac dev willing to help a bit, though.

Revision history for this message
Yan-Jie Wang (jaywang0tw) wrote :

I have done some experiments and find out that
the behavior of lseek with whence set to SEEK_DATA is different from the behavior of Linux's lseek.

If the supplied offset is in the middle of a data region, it returns the start of the next data region. There may be many data regions in a big file even though it has no hole.

return value of lseek with whence set to SEEK_DATA:

|--(offset)--Data----|(return value)----Data----|
|--(offset)--Data----|----Hole----|(return value)----Data----|

Revision history for this message
Eric Blake (eblake) wrote : Re: [Qemu-devel] [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images

On 09/07/2018 01:04 PM, Yan-Jie Wang wrote:
> I have done some experiments and find out that
> the behavior of lseek with whence set to SEEK_DATA is different from the behavior of Linux's lseek.
>
> If the supplied offset is in the middle of a data region, it returns the
> start of the next data region. There may be many data regions in a big
> file even though it has no hole.
>
> return value of lseek with whence set to SEEK_DATA:
>
> |--(offset)--Data----|(return value)----Data----|
> |--(offset)--Data----|----Hole----|(return value)----Data----|
>
>
> ** Patch added: "macOS-lseek.patch"
> https://bugs.launchpad.net/qemu/+bug/1776920/+attachment/5186138/+files/macOS-lseek.patch

While a developer can chase a URL, our CI tools can't. Can you please
also send that patch directly to <email address hidden>, so that it gets
the same level of review as other patches?

But I am very reluctant to take your patch. MacOS is flat-out buggy and
in violation of the specification of lseek(SEEK_DATA), so making all
applications work around their bug is not going to scale as well as
having them fix their bug in the first place.

Here's the proposed POSIX specification for what lseek(SEEK_DATA) is
supposed to do:
http://austingroupbugs.net/view.php?id=415#c862

That text has been present for 7 years now - so anyone implementing
SEEK_DATA should really be paying attention to interoperability with
that text.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org

Revision history for this message
Sven R (mmoole) wrote :

Hi,

I recently ran into problems and after a long time trying to find out the cause landed here, I got in trouble using a CentOs Cloud image:
https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1905.qcow2.xz
which extracts to a .qcow2 image with sha256 of:
b376afdc0150601f15e53516327d9832216e01a300fecfc302066e36e2ac2b39

image: CentOS-7-x86_64-GenericCloud-1905.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 898M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

I use this command on a Mac, OS X 10.13.6 (17G7024), qemu installed via brew:
qemu-img convert -f qcow2 -O vmdk -o adapter_type=lsilogic,subformat=streamOptimized,compat6=on -p CentOS-7-x86_64-GenericCloud-1905.qcow2 -S 0 result.vmdk

941359104 21 Jul 17:11 CentOS-7-x86_64-GenericCloud-1905.qcow2 - original image

Converting this gives these results:
214551040 23 Jul 20:45 conv_mac_v3_1_mit_s_0.vmdk - doesnt work, made on Mac (APFS) with -S 0
402303488 23 Jul 20:50 conv_mac_v3_1_auf_exfat.vmdk - works and is bootable, made on same Mac, on external drive, exFAT formatted.
149743104 23 Jul 21:21 conv_mac_v4_0.vmdk - doesnt work, made on Mac (APFS) without -S 0
214551040 23 Jul 21:20 conv_mac_v4_0mit_S0.vmdk - doesnt work, made on Mac (APFS) with -S 0

converting on non-Mac also works fine:
402303488 23 Jul 18:48 conv_centos7_v1-5-3.vmdk - works and is bootable, made on Centos, qemu-img version 1.5.3

So it seems that '-S 0' didn't fix it for me, or is that only in the development branch?

Best Regards

Revision history for this message
John Snow (jnsnow) wrote :

Hi, there isn't really a development branch; if '-S 0' didn't help alleviate the problem there may be other problems at hand, or the APFS implementation of SEEK_DATA is causing us even more problems in new versions.

You could try Yan-Jie Wang's patch that was posted in #20, but until it's posted to the mailing list with a Signed-off-by tag, we can't include it ourselves.

However, it would still be nice to know if it fixes your problem.

Revision history for this message
Thomas Huth (th-huth) wrote :

The QEMU project is currently considering to move its bug tracking to
another system. For this we need to know which bugs are still valid
and which could be closed already. Thus we are setting older bugs to
"Incomplete" now.

If you still think this bug report here is valid, then please switch
the state back to "New" within the next 60 days, otherwise this report
will be marked as "Expired". Or please mark it as "Fix Released" if
the problem has been solved with a newer version of QEMU already.

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
Revision history for this message
Juan Niño (ruluk) wrote (last edit ):

Hey there! I tested @wkozaczuk's suggested minimal steps and THEY WORKED FOR ME!!

The steps executed on my mac:
1. dd if=boot.bin of=image.img > /dev/null 2>&1
2. dd if=lzloader.elf of=image.img conv=notrunc seek=128 > /dev/null 2>&1
3. qemu-img convert image.img -O qcow2 image.qemu
4. qemu-img convert image.qemu -O qcow2 image2.qemu

The end result:
-rw-r--r-- 1 *** *** 6684672 Jun 22 14:19 image.img
-rw-r--r-- 1 *** *** 7012352 Jun 22 14:20 image.qemu
-rw-r--r-- 1 *** *** 7012352 Jun 22 14:20 image2.qemu
-rw-r--r-- 1 *** *** 6750208 Jun 22 14:22 image2.vbox

The result of regular compare:
qemu-img compare image.qemu image2.qemu
Images are identical.

The result of strict compare:
qemu-img compare -s image.qemu image2.qemu
Images are identical.

Qemu-img on my Mac:
qemu-img --version
qemu-img version 6.0.0
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Hardware Overview:

  Model Name: MacBook Pro
  Model Identifier: MacBookPro16,1
  Processor Name: 8-Core Intel Core i9
  Processor Speed: 2,4 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 16 MB
  Hyper-Threading Technology: Enabled
  Memory: 64 GB
  Activation Lock Status: Enabled

Storage:

  Mount Point: /
  File System: APFS
  Writable: No
  Ignore Ownership: No
  BSD Name: disk1s1
  Physical Drive:
  Device Name: APPLE SSD AP1024N
  Media Name: AppleAPFSMedia
  Medium Type: SSD
  Protocol: PCI-Express
  Internal: Yes
  Partition Map Type: Unknown
  S.M.A.R.T. Status: Verified

System Software Overview:

  System Version: macOS 10.15.7 (19H1217)
  Kernel Version: Darwin 19.6.0
  Boot Volume: Macintosh HD
  Boot Mode: Normal
  Secure Virtual Memory: Enabled
  System Integrity Protection: Enabled

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.