> After reading some related code, I have more questions than before, but
> let's see... As more qcow2 code was merged since, I would suggest that
> we debug the problem on commit 69f4750 (the bisection result) rather
> than on anything newer.

Okay, for all of the following I did a fresh compile of qemu 69f4750 and
ran all commands with that build.

> First of all: Michael, you didn't specify explicitly how your images
> were created, but can I assume that the test image is not preallocated
> (in contrast to Laszlo's)?

Actually, those images were converted with qemu-img from vmdk files of
former VMware Fusion VMs. To rule out any breakage that conversion may
have introduced, I created a fresh image using this command:

$ bin/qemu-bisect/bin/qemu-img create -f qcow2 qtest.qcow2 20G
Formatting 'qtest.qcow2', fmt=qcow2 size=21474836480 cluster_size=65536 lazy_refcounts=off refcount_bits=16

$ bin/qemu-bisect/bin/qemu-img info qtest.qcow2
image: qtest.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

$ ls -la qtest.qcow2
-rw-r--r-- 1 m m 196928 Oct 21 22:43 qtest.qcow2

$ du -sk qtest.qcow2
196     qtest.qcow2

So I guess that means it's not preallocated.

Then I installed a minimal Debian buster into it, accepting the default
values throughout:

$ bin/qemu-bisect/bin/qemu-system-x86_64 -machine pc-q35-3.1,accel=kvm -m 4096 -chardev stdio,id=charmonitor -mon chardev=charmonitor -drive file=qtest.qcow2,id=d -cdrom Downloads/mini.iso

After that the image reported:

$ bin/qemu-bisect/bin/qemu-img check qtest.qcow2
No errors were found on the image.
26443/327680 = 8.07% allocated, 17.10% fragmented, 0.00% compressed clusters
Image end offset: 1734148096

Then I prepared it for the automatic reproducer by running the following
command inside the guest and saving the running state as snapshot "foo"
using savevm:

$ while true ; do dd if=/dev/zero of=t bs=1024k count=4000 ; done

Then I ran the reproducer using this command:

$ while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | bin/qemu-bisect/bin/qemu-system-x86_64 -machine pc-q35-3.1,accel=kvm -m 4096 -chardev stdio,id=charmonitor -mon chardev=charmonitor -drive file=qtest.qcow2,id=d -display none -S ; done

It took nine iterations for the image to corrupt. After that, qemu-img
check reports:

$ bin/qemu-bisect/bin/qemu-img check qtest.qcow2 2>&1 | sed -e s,Leaked.*,Leaked, | uniq
Leaked
ERROR cluster 163840 refcount=3 reference=4
ERROR cluster 163841 refcount=3 reference=4
ERROR cluster 163848 refcount=1 reference=2
ERROR cluster 163850 refcount=1 reference=2
ERROR cluster 163921 refcount=1 reference=2
ERROR cluster 163957 refcount=3 reference=4
ERROR cluster 163958 refcount=3 reference=4
Leaked
ERROR cluster 163962 refcount=1 reference=2
Leaked
ERROR cluster 163968 refcount=1 reference=2
Leaked
ERROR cluster 163974 refcount=1 reference=2
Leaked

10 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

129130 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
253326/327680 = 77.31% allocated, 1.77% fragmented, 0.00% compressed clusters
Image end offset: 18906611712

> Another caller I see in the code, but didn't get run in my guest, is
> qcow2_co_pwrite_zeroes(). This is not discard, but maybe the discard
> mount option does cause a write_zeroes call (WRITE SAME in SCSI)
> sometimes? But then, your reproducer seems to use AHCI and I can't see
> a write_zeroes call in the AHCI or IDE device emulation.
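As an aside, checking whether the guest mounts anything with the discard
option comes down to scanning the options field of /proc/mounts. A
minimal sketch of such a check (the helper name and the demo file are
mine, not from qemu or the guest; it runs against a synthetic mount
table rather than the real /proc/mounts):

```shell
#!/bin/sh
# Hypothetical helper: report via exit status whether a mount table in
# /proc/mounts format lists the discard option on any filesystem.
# Field 4 of /proc/mounts holds the comma-separated mount options.
has_discard() {
    awk '$4 ~ /(^|,)discard(,|$)/ { found = 1 } END { exit !found }' "$1"
}

# Demo against a synthetic mount table instead of the real /proc/mounts.
cat > /tmp/mounts.demo <<'EOF'
/dev/vda1 / ext4 rw,relatime 0 0
/dev/vda2 /home ext4 rw,noatime 0 0
EOF

if has_discard /tmp/mounts.demo; then
    echo "discard enabled"
else
    echo "discard not enabled"
fi
```

On a real guest one would point the helper at /proc/mounts directly.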
In the above test I had not knowingly configured any discard in the
guest. Neither /etc/fstab nor /proc/mounts contained the discard option.
The image also did not shrink when deleting files, nor when explicitly
calling fstrim /, for that matter - presumably because unmap on discard
is disabled by default. So I'd postulate that discard plays at most an
aggravating role here, but is not necessary for the problem to occur.

> I think it might be worth a try reproducing with the
> handle_alloc_space() call commented out. If that doesn't fix/hide the
> bug, I

I commented out the call to handle_alloc_space() in
block/qcow2.c:qcow2_co_pwritev() and that certainly hid the bug. The
reproducer ran for a quarter of an hour without any corruption. The
image was fine after that:

$ bin/qemu-bisect/bin/qemu-img check qtest.qcow2
No errors were found on the image.
253376/327680 = 77.32% allocated, 2.00% fragmented, 0.00% compressed clusters
Image end offset: 16909860864

Commenting handle_alloc_space() back in, recompiling, reinstalling and
rerunning the reproducer took a single iteration to violently corrupt
the image. So I guess it's safe to say that the bug occurs in the
handle_alloc_space() code path.

This quick corruption made me think that maybe the level of
preallocation has something to do with it. So I filled up all disk space
in the guest by writing zeroes to a file using dd. This yielded an
allocation above 80%:

No errors were found on the image.
266793/327680 = 81.42% allocated, 3.92% fragmented, 0.00% compressed clusters
Image end offset: 21343764480

Running the reproducer again, the image took five iterations to corrupt.
I'd call that inconclusive.
> Michael, would you like to give it a try and figure out in which code path
> qcow2_detect_metadata_preallocation() is even called in your reproducer

After letting the VM run for about ten seconds with gdb attached, a
breakpoint on qcow2_detect_metadata_preallocation triggers and I get
this backtrace:

(gdb) bt
#0  0x0000555555d22bfd in qcow2_detect_metadata_preallocation (bs=0x5555567c69e0) at block/qcow2-refcount.c:3449
#1  0x0000555555d124b8 in qcow2_co_block_status (bs=0x5555567c69e0, want_zero=false, offset=2148532224, count=4096, pnum=0x7ffee0ae2b28, map=0x7ffee0ae28a0, file=0x7ffee0ae28a8) at block/qcow2.c:1899
#2  0x0000555555d6124a in bdrv_co_block_status (bs=0x5555567c69e0, want_zero=false, offset=2148532224, bytes=4096, pnum=0x7ffee0ae2b28, map=0x0, file=0x0) at block/io.c:2081
#3  0x0000555555d6166d in bdrv_co_block_status_above (bs=0x5555567c69e0, base=0x0, want_zero=false, offset=2148532224, bytes=4096, pnum=0x7ffee0ae2b28, map=0x0, file=0x0) at block/io.c:2190
#4  0x0000555555d61753 in bdrv_block_status_above_co_entry (opaque=0x7ffee0ae2a10) at block/io.c:2220
#5  0x0000555555d6187e in bdrv_common_block_status_above (bs=0x5555567c69e0, base=0x0, want_zero=false, offset=2148532224, bytes=4096, pnum=0x7ffee0ae2b28, map=0x0, file=0x0) at block/io.c:2255
#6  0x0000555555d61ae9 in bdrv_is_allocated (bs=0x5555567c69e0, offset=2148532224, bytes=4096, pnum=0x7ffee0ae2b28) at block/io.c:2285
#7  0x0000555555d61b7b in bdrv_is_allocated_above (top=0x5555567c69e0, base=0x0, offset=2148532224, bytes=4096, pnum=0x7ffee0ae2b80) at block/io.c:2323
#8  0x0000555555d12d48 in is_unallocated (bs=0x5555567c69e0, offset=2148532224, bytes=4096) at block/qcow2.c:2151
#9  0x0000555555d12dbc in is_zero_cow (bs=0x5555567c69e0, m=0x555556ed0520) at block/qcow2.c:2162
#10 0x0000555555d12e9c in handle_alloc_space (bs=0x5555567c69e0, l2meta=0x555556ed0520) at block/qcow2.c:2188
#11 0x0000555555d13310 in qcow2_co_pwritev (bs=0x5555567c69e0, offset=2148536320, bytes=61440, qiov=0x7fffe83507a0,
    flags=0) at block/qcow2.c:2301
#12 0x0000555555d5e6c4 in bdrv_driver_pwritev (bs=0x5555567c69e0, offset=2148536320, bytes=61440, qiov=0x7fffe83507a0, flags=0) at block/io.c:1043
#13 0x0000555555d6013a in bdrv_aligned_pwritev (child=0x55555675cf80, req=0x7ffee0ae2e50, offset=2148536320, bytes=61440, align=1, qiov=0x7fffe83507a0, flags=0) at block/io.c:1670
#14 0x0000555555d60d66 in bdrv_co_pwritev (child=0x55555675cf80, offset=2148536320, bytes=61440, qiov=0x7fffe83507a0, flags=0) at block/io.c:1897
#15 0x0000555555d47cb6 in blk_co_pwritev (blk=0x5555567c6730, offset=2148536320, bytes=61440, qiov=0x7fffe83507a0, flags=0) at block/block-backend.c:1183
#16 0x0000555555d48499 in blk_aio_write_entry (opaque=0x7fffe8350820) at block/block-backend.c:1382
#17 0x0000555555e3ff80 in coroutine_trampoline (i0=-402575600, i1=32767) at util/coroutine-ucontext.c:116
#18 0x00007ffff5fc61a0 in () at /lib64/libc.so.6
#19 0x00007ffff17c5920 in ()
#20 0x0000000000000000 in ()

A savevm command does not trigger the breakpoint.

Hope this helps in narrowing down the culprit.

--
Michael