Comment 17 for bug 1846427

Revision history for this message
Kevin Wolf (kwolf-redhat) wrote :

> To avoid any suspicion as to what that may have brought with it in
> breakage I just created a fresh image using this command: [...]

I tried to reproduce the problem locally, on the same commit, with the steps you described, but I wasn't lucky. I tried keeping the image on my home directory (XFS), on tmpfs, and finally on a newly created ext4 filesystem on a spare LVM volume, but the image just wouldn't break even after letting the loop run for a quite a while.

> So I'd postulate that discard does at most play an aggravating role here
> but is not necessary for the problem to occur.

That makes sense to me because you have internal snapshots. Both discard and snapshots mean that the next write to the block will trigger a cluster allocation (and with it a handle_alloc_space() call) again.

> So I guess it's safe to say that the bug occurs in the
> handle_alloc_space() codepath.

This is an important finding.

It's a bit odd because the only related thing handle_alloc_space() calls is bdrv_is_allocated_above(), which only cares about BDRV_BLOCK_ALLOCATED. I don't think the commit in question should make any difference as to whether this flag is set or cleared. The only possible difference should be BDRV_BLOCK_ZERO, and we don't even check that flag.

So as the next step I would like to test my theory that the problem isn't bdrv_co_block_status() returning a different value after the commit, but that qcow2_detect_metadata_preallocation() even runs. I think the easiest way to do this would be modifying handle_alloc_space() so that it performs the checks, but skips its optimisation regardless of the is_zero_cow() return value:

        if (!is_zero_cow(bs, m) || true) {
            continue;
        }

Unfortunately, as long as I can't reproduce the problem, I'll have to rely on you to actually run the tests I come up with after each step. If you'd prefer some more real-time interaction, feel free to ping me on IRC (kwolf on OFTC or Freenode).