QEMU

Bug #1847793
Comment #9

Comment 9 for bug 1847793

Revision history for this message

Max Reitz (xanclic) wrote on 2019-10-24:

I suppose that the problem described in bug 1846427 can also affect guest data, so I think it makes sense to divide based on whether there are only data corruptions or both data and metadata corruptions.

So far, I don’t know of a report of pure guest data corruptions (without qcow2 metadata being affected) that didn’t happen on XFS, so I assume there is an issue that affects both data and metadata on all filesystems (described by 1846427; Kevin has sent a patch series upstream ot address it), and another one that only affects guest data and only occurs on XFS (this one).

Actually, there are two problems we know of on XFS:

The first one was a bug in qemu that has been fixed upstream by b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2. People that don’t use master but the 4.1 release instead are likely to hit that problem instead of the other one.

The second one seems to be a kernel bug. When fallocating (writing zeroes in our case) and writing to a file in parallel, the write is discarded if:
- The fallocated area begins at or after the EOF,
- The written area begins after the fallocated area,
- The write is submitted through the AIO interface (io_submit()),
- The write and the fallocate operation are submitted before either one finishes (i.e. concurrently),
- The fallocate operation finishes after the write.

In qemu, this happens only with aio=native, and then most of the time when an FALLOC_FL_ZERO_RANGE happens after the EOF while a write after that range is ongoing.

Claus as the reporter didn’t use aio=native, so if he’s indeed on XFS, he can’t have hit this second bug. If he’s on XFS, he will most likely have hit the first one that’s already fixed in master.

Still, we need to fix the second bug. As for how… It looks to me like a kernel bug, so in qemu we can’t do anything to fix it. But we should probably work around it. Kevin has proposed making zero-writes on XFS serializing until infinity, basically (i.e. UINT64_MAX in practice). That gives us some layering problems (either the file-posix block driver needs access to the TrackedRequest to extend its length, or the generic block layer needs to know whether a file-posix node is on XFS), and it yields the question of how to detect whether the bug has been fixed in the kernel.

Max

Actually, there are two problems we know of on XFS:

The first one was a bug in qemu that has been fixed upstream by b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2.  People that don’t use master but the 4.1 release instead are likely to hit that problem instead of the other one.

The second one seems to be a kernel bug.  When fallocating (writing zeroes in our case) and writing to a file in parallel, the write is discarded if:
- The fallocated area begins at or after the EOF,
- The written area begins after the fallocated area,
- The write is submitted through the AIO interface (io_submit()),
- The write and the fallocate operation are submitted before either one finishes (i.e. concurrently),
- The fallocate operation finishes after the write.

In qemu, this happens only with aio=native, and then most of the time when an FALLOC_FL_ZERO_RANGE happens after the EOF while a write after that range is ongoing.

Claus as the reporter didn’t use aio=native, so if he’s indeed on XFS, he can’t have hit this second bug.  If he’s on XFS, he will most likely have hit the first one that’s already fixed in master.

Still, we need to fix the second bug.  As for how…  It looks to me like a kernel bug, so in qemu we can’t do anything to fix it.  But we should probably work around it.  Kevin has proposed making zero-writes on XFS serializing until infinity, basically (i.e. UINT64_MAX in practice).  That gives us some layering problems (either the file-posix block driver needs access to the TrackedRequest to extend its length, or the generic block layer needs to know whether a file-posix node is on XFS), and it yields the question of how to detect whether the bug has been fixed in the kernel.

Max