QEMU hits assertion when virtual disk is stored on NFS server and is not 4 KiB aligned

Bug #1921665 reported by Matthew Ruffell on 2021-03-29
Affects        Importance  Assigned to
qemu (Ubuntu)  Undecided   Unassigned
Focal          Medium      Matthew Ruffell
Groovy         Medium      Matthew Ruffell

Bug Description

[Impact]

QEMU can hit an assertion and crash when attempting to write to a virtual disk image when the following conditions are met:

1. disk type is "raw"
2. disk cache type set to "none"
3. disk is shared over NFS
4. disk size is not a multiple of 4 KiB

In this case, QEMU assumes that requests to the image must be aligned to 4 KiB. A write near the end of an image whose size is not a multiple of 4 KiB is rounded out past the end of the image, and hits the following assertion:

qemu-system-x86_64: /build/qemu-AB62EU/qemu-4.2/block/io.c:1885: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

This is easy to hit with, for example, a disk image of 10000000000 bytes on the NFS server, since that size is not a multiple of 4096. You can work around the problem by making the disk image size a multiple of 4 KiB, so that the assertion is never reached.
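
The arithmetic behind the assertion can be sketched as follows. This is only an illustrative model, not QEMU code: the helper names are made up for this example, and it assumes 512-byte sectors and a 4096-byte request alignment.

```shell
#!/bin/sh
# Illustrative model only: mirrors the arithmetic behind the assertion,
# not QEMU's actual code. All names here are assumptions for the example.
SECTOR_SIZE=512   # the block layer counts in 512-byte sectors
ALIGNMENT=4096    # request alignment assumed for this O_DIRECT setup

# Print the image size in whole sectors.
total_sectors() {
    echo $(( $1 / SECTOR_SIZE ))
}

# Print the end sector of a write to the last block of the image after
# the request has been rounded out to ALIGNMENT.
aligned_end_sector() {
    end=$(( ($1 + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT ))
    echo $(( end / SECTOR_SIZE ))
}

size=10000000000
echo "remainder:     $(( size % ALIGNMENT ))"
echo "total_sectors: $(total_sectors "$size")"
echo "end_sector:    $(aligned_end_sector "$size")"
```

For the 10000000000-byte image this prints a remainder of 1024, total_sectors of 19531250 and end_sector of 19531256, so `end_sector <= bs->total_sectors` does not hold.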

[Test case]

This bug is straightforward to reproduce on Focal and Groovy.

Start with a fresh install of Ubuntu Server and install the KVM stack:

$ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
$ sudo reboot

Next, install and configure an NFS server:

$ sudo -s
$ sudo apt install nfs-kernel-server
$ mkdir -p /export
$ chown libvirt-qemu:kvm /export
$ echo "/export 127.0.0.1(rw,sync,no_subtree_check)" >> /etc/exports
$ systemctl restart nfs-server

Create a disk image:

$ truncate -s 10000000000 /export/reproducer-centos.img
$ chown libvirt-qemu:kvm /export/reproducer-centos.img
$ chmod 666 /export/reproducer-centos.img

Mount the NFS export on /mnt:

$ mount 127.0.0.1:/export /mnt -o bg,noacl,noatime,nolock,proto=udp,vers=3,noauto

(On Groovy, which has the 5.8 kernel, drop the proto=udp option.)

Download the CentOS image:

$ wget https://vault.centos.org/7.2.1511/isos/x86_64/CentOS-7-x86_64-Minimal-1511.iso

Start the VM:

$ qemu-system-x86_64 -cdrom ./CentOS-7-x86_64-Minimal-1511.iso -m 1024 -blockdev '{"driver":"file","filename":"/mnt/reproducer-centos.img","node-name":"disk0","cache":{"direct":true}}' -device virtio-blk-pci,drive=disk0 -vnc 0.0.0.0:0 -enable-kvm

Connect to the VM with Remmina, configured for a VNC connection to the server on <ip address>:5900.

Click Continue at the language/keyboard selection screen.
Click Installation Destination.
Click Done (no changes needed on that screen).
Click Begin Installation.

It will crash after displaying "Setting up the installation environment" (which is the second thing printed) or the message about creating the disk label. If it gets any farther than that (i.e. starts creating filesystems), it's going to work and you can stop the test.

This is the error I see:

qemu-system-x86_64: /build/qemu-AB62EU/qemu-4.2/block/io.c:1885: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

Test packages are available for Focal and Groovy in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf298252-test

When you use the QEMU packages from this PPA, the issue is fixed and the CentOS installation works properly.

[Where problems could occur]

There are two places where problems could occur.

The first is in the handling of unaligned disk images in the block subsystem of QEMU. A new check tests whether we have the write permission but not the resize permission; if so, and the image size is not aligned, an error is returned. This error is more graceful than hitting an assert, but it introduces new error handling and with it some risk of regression.
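
As a rough model of that check, the sketch below refuses the write permission when the resize permission is missing and the image size is not a multiple of the request alignment. The function name, its arguments and the error text are assumptions made for illustration; the real check lives in QEMU's C code.

```shell
#!/bin/sh
# Simplified model of the new permission check. The function name,
# arguments and message are illustrative assumptions, not QEMU's API.
# Usage: check_write_perm SIZE_BYTES REQUEST_ALIGNMENT HAS_WRITE HAS_RESIZE
#   HAS_WRITE and HAS_RESIZE are "yes" or "no".
check_write_perm() {
    if [ "$3" = yes ] && [ "$4" = no ] && [ $(( $1 % $2 )) -ne 0 ]; then
        echo "error: write without resize needs an aligned image size" >&2
        return 1
    fi
    return 0
}

# Unaligned image, 4096-byte alignment, write but no resize: refused.
check_write_perm 10000000000 4096 yes no || echo "refused"
# Same image with byte alignment (the NFS case): accepted.
check_write_perm 10000000000 1 yes no && echo "accepted"
```

The point of the design is that the failure moves to the moment the permission is requested, where it can be reported as an error, instead of surfacing later as an assertion on the first unaligned write.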

The second is that NFS is explicitly allowed to use byte-aligned writes. This is enforced by checking the magic number of the filesystem that the disk image is loaded from. Because the check is on the magic number, no other filesystem type can be mistaken for NFS and allowed byte-aligned writes that it does not support, which reduces the risk of regression.
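
From the host side, the same question ("is this image on NFS?") can be approximated with GNU stat, which prints a filesystem type name derived from the filesystem's magic number. This is a sketch that assumes GNU coreutils; the helper names are made up for the example and are not part of QEMU.

```shell
#!/bin/sh
# Host-side approximation of the NFS test. Assumes GNU coreutils stat;
# the helper names are hypothetical.

# Pure helper: succeed only if the given filesystem type name is NFS.
fs_type_is_nfs() {
    case "$1" in
        nfs) return 0 ;;
        *)   return 1 ;;
    esac
}

# Succeed if the file at PATH lives on an NFS mount.
# "stat -f -c %T" prints the filesystem type in human-readable form.
image_is_on_nfs() {
    fs_type_is_nfs "$(stat -f -c %T -- "$1")"
}

# Example (not run here):
#   image_is_on_nfs /mnt/reproducer-centos.img && echo "on NFS"
```

With the test case from this report, the image opened through /mnt should be reported as NFS, while the same file opened through /export on the server's local filesystem should not.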

If a regression were to occur, it would likely affect only users whose disk images are not 4 KiB aligned. A workaround would be to resize the virtual disk image to a multiple of 4 KiB, or to create new VMs with disk images whose size is a multiple of 4 KiB.
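
A minimal sketch of that workaround for a raw image, assuming GNU coreutils truncate (where a `%` prefix on the size means "round up to a multiple of") and assuming the VM using the image is shut down first. The function name is hypothetical.

```shell
#!/bin/sh
# Round a raw disk image up to the next multiple of 4096 bytes.
# Assumes GNU coreutils truncate and that no VM is using the image.
# The function name is an assumption made for this example.
pad_image_to_4k() {
    truncate -s %4096 -- "$1"
}

# Example (not run here):
#   pad_image_to_4k /export/reproducer-centos.img
#   stat -c %s /export/reproducer-centos.img    # size in bytes afterwards
```

Rounding up only appends zero bytes to the end of the file, so existing data in a raw image is left in place; an image that is already aligned is not changed.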

[Other]

The commits which fix the problem landed in QEMU 5.1, and are:

commit 9c60a5d1978e6dcf85c0e01b50e6f7f54ca09104
From: Kevin Wolf <email address hidden>
Date: Thu, 16 Jul 2020 16:26:00 +0200
Subject: block: Require aligned image size to avoid assertion failure
Link: https://git.qemu.org/?p=qemu.git;a=commit;h=9c60a5d1978e6dcf85c0e01b50e6f7f54ca09104

commit 5edc85571e7b7269dce408735eba7507f18ac666
From: Kevin Wolf <email address hidden>
Date: Thu, 16 Jul 2020 16:26:01 +0200
Subject: file-posix: Allow byte-aligned O_DIRECT with NFS
Link: https://git.qemu.org/?p=qemu.git;a=commit;h=5edc85571e7b7269dce408735eba7507f18ac666

Mailing list discussion:
https://<email address hidden>/msg721982.html

Changed in qemu (Ubuntu):
status: New → Fix Released
Changed in qemu (Ubuntu Focal):
status: New → In Progress
Changed in qemu (Ubuntu Groovy):
status: New → In Progress
Changed in qemu (Ubuntu Focal):
importance: Undecided → Medium
Changed in qemu (Ubuntu Groovy):
importance: Undecided → Medium
Changed in qemu (Ubuntu Focal):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in qemu (Ubuntu Groovy):
assignee: nobody → Matthew Ruffell (mruffell)
tags: added: focal groovy sts
Matthew Ruffell (mruffell) wrote :

Attached is a debdiff for Focal which fixes this bug.

Matthew Ruffell (mruffell) wrote :

Attached is a debdiff for Groovy which fixes this bug.

Hi,
this overall LGTM (some minimal file name issues and changelog line length things).
I've fixed those when applying and will make it part of a set of SRUs that is incoming.

Thanks @Matthew!

I verified the test instructions, working as outlined:

root@node-horsea:/home/ubuntu# qemu-system-x86_64 -cdrom ./CentOS-7-x86_64-Minimal-1511.iso -m 1024 -blockdev '{"driver":"file","filename":"/mnt/reproducer-centos.img","node-name":"disk0","cache":{"direct":true}}' -device virtio-blk-pci,drive=disk0 -vnc 0.0.0.0:0 -enable-kvm
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: /build/qemu-RSjBPs/qemu-5.0/block/io.c:1887: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.

FYI - fixes (combined with other inbound SRUs) build in PPA and verified vs the testcase.
Fix works as expected.

Matthew Ruffell (mruffell) wrote :

Thanks for the quick work Christian!

I'll keep an eye out and be ready to test once builds enter -proposed.

FYI - uploaded to the -unapproved queue yesterday. Now on the SRU team to evaluate.

Hello Matthew, or anyone else affected,

Accepted qemu into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu9.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-groovy
Changed in qemu (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
Robie Basak (racb) wrote :

Hello Matthew, or anyone else affected,

Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.15 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

@mruffell - will you verify these?

All autopkgtests for the newly accepted qemu (1:4.2-3ubuntu6.15) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

casper/1.445.1 (amd64, ppc64el)
systemd/245.4-4ubuntu3.6 (amd64)
ubuntu-image/1.11+20.04ubuntu1 (armhf, amd64, s390x)
livecd-rootfs/2.664.19 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

All autopkgtests for the newly accepted qemu (1:5.0-5ubuntu9.7) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/246.6-1ubuntu1.3 (ppc64el)
cloud-utils/0.31-29-ge0792e3d-0ubuntu1 (s390x)
open-iscsi/2.1.1-1ubuntu2 (amd64)
ubuntu-image/1.11+20.10ubuntu1 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/groovy/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Matthew Ruffell (mruffell) wrote :

Performing verification for Groovy.

I installed qemu 5.0-5ubuntu9.6 from -updates with the usual KVM stack command:

$ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
$ sudo reboot

From there I installed and configured an NFS server:

$ sudo -s
$ sudo apt install nfs-kernel-server
$ mkdir -p /export
$ chown libvirt-qemu:kvm /export
$ echo "/export 127.0.0.1(rw,sync,no_subtree_check)" >> /etc/exports
$ systemctl restart nfs-server

Created a disk image:

$ truncate -s 10000000000 /export/reproducer-centos.img
$ chown libvirt-qemu:kvm /export/reproducer-centos.img
$ chmod 666 /export/reproducer-centos.img

and mounted the NFS server:

$ mount 127.0.0.1:/export /mnt -o bg,noacl,noatime,nolock,vers=3,noauto

I downloaded the CentOS image:

$ wget https://vault.centos.org/7.2.1511/isos/x86_64/CentOS-7-x86_64-Minimal-1511.iso

and started the VM with the following QEMU command line:

$ sudo qemu-system-x86_64 -cdrom ./CentOS-7-x86_64-Minimal-1511.iso -m 1024 -blockdev '{"driver":"file","filename":"/mnt/reproducer-centos.img","node-name":"disk0","cache":{"direct":true}}' -device virtio-blk-pci,drive=disk0 -vnc 0.0.0.0:0 -enable-kvm

On my host I opened Remmina and connected over VNC to <ip address>:5900.

The CentOS installer eventually showed up, and I followed the steps:

Click Continue at the language/keyboard selection screen.
Click Installation Destination.
Click Done (no changes needed on that screen).
Click Begin Installation.

A few seconds later QEMU crashed with:

qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: /build/qemu-RSjBPs/qemu-5.0/block/io.c:1887: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted

I then enabled -proposed, and installed qemu 5.0-5ubuntu9.7:

$ sudo apt install qemu-kvm qemu-system-data qemu-system-gui qemu-system-x86

I then re-started the qemu VM with:

$ sudo qemu-system-x86_64 -cdrom ./CentOS-7-x86_64-Minimal-1511.iso -m 1024 -blockdev '{"driver":"file","filename":"/mnt/reproducer-centos.img","node-name":"disk0","cache":{"direct":true}}' -device virtio-blk-pci,drive=disk0 -vnc 0.0.0.0:0 -enable-kvm

This time the VM managed to format the disk, and successfully install CentOS.

The packages in -proposed fix the issue, and I am happy to mark the bug as verified.

tags: added: verification-done-groovy
removed: verification-needed-groovy
Matthew Ruffell (mruffell) wrote :

Performing verification for Focal.

I installed qemu 4.2-3ubuntu6.14 from -updates with the usual KVM stack command:

$ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
$ sudo reboot

From there I installed and configured an NFS server:

$ sudo -s
$ sudo apt install nfs-kernel-server
$ mkdir -p /export
$ chown libvirt-qemu:kvm /export
$ echo "/export 127.0.0.1(rw,sync,no_subtree_check)" >> /etc/exports
$ systemctl restart nfs-server

Created a disk image:

$ truncate -s 10000000000 /export/reproducer-centos.img
$ chown libvirt-qemu:kvm /export/reproducer-centos.img
$ chmod 666 /export/reproducer-centos.img

and mounted the NFS server:

$ mount 127.0.0.1:/export /mnt -o bg,noacl,noatime,nolock,proto=udp,vers=3,noauto

I downloaded the CentOS image:

$ wget https://vault.centos.org/7.2.1511/isos/x86_64/CentOS-7-x86_64-Minimal-1511.iso

and started the VM with the following QEMU command line:

$ sudo qemu-system-x86_64 -cdrom ./CentOS-7-x86_64-Minimal-1511.iso -m 1024 -blockdev '{"driver":"file","filename":"/mnt/reproducer-centos.img","node-name":"disk0","cache":{"direct":true}}' -device virtio-blk-pci,drive=disk0 -vnc 0.0.0.0:0 -enable-kvm

On my host I opened Remmina and connected over VNC to <ip address>:5900.

The CentOS installer eventually showed up, and I followed the steps:

Click Continue at the language/keyboard selection screen.
Click Installation Destination.
Click Done (no changes needed on that screen).
Click Begin Installation.

A few seconds later QEMU crashed with:

qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: /build/qemu-AB62EU/qemu-4.2/block/io.c:1885: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted

I then enabled -proposed, and installed qemu 4.2-3ubuntu6.15:

$ sudo apt install qemu-kvm qemu-system-data qemu-system-gui qemu-system-x86

I then re-started the qemu VM with:

$ sudo qemu-system-x86_64 -cdrom ./CentOS-7-x86_64-Minimal-1511.iso -m 1024 -blockdev '{"driver":"file","filename":"/mnt/reproducer-centos.img","node-name":"disk0","cache":{"direct":true}}' -device virtio-blk-pci,drive=disk0 -vnc 0.0.0.0:0 -enable-kvm

This time the VM managed to format the disk, and successfully install CentOS.

The packages in -proposed fix the issue, and I am happy to mark the bug as verified.

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal

Thank you mruffell for the verifications!

FYI - autopkgtest issues resolved as well now (as assumed it was due to flaky tests)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:5.0-5ubuntu9.7

---------------
qemu (1:5.0-5ubuntu9.7) groovy; urgency=medium

  * d/p/u/lp-1921468-*: fix issues handling boot menu index on s390x
    (LP: #1921468)
  * d/p/u/lp-1887535-configure-replace-enable-disable-git-update-with-wit.patch,
    d/rules: Backport --with-git-submodules param so building from git repo
    doesn't fail (LP: #1887535)
  * Fix byte aligned writes when writing to image stored on NFS
    server, as they aren't required to be 4kib aligned. (LP: #1921665)
    - d/p/u/lp-1921665-1-block-Require-aligned-image-size-to-avoid-assert.patch
    - d/p/u/lp-1921665-2-file-posix-Allow-byte-aligned-O_DIRECT-with-NFS.patch

 -- Christian Ehrhardt <email address hidden> Fri, 26 Mar 2021 10:36:31 +0100

Changed in qemu (Ubuntu Groovy):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu6.15

---------------
qemu (1:4.2-3ubuntu6.15) focal; urgency=medium

  * d/p/u/lp-1921468-*: fix issues handling boot menu index on s390x
    (LP: #1921468)
  * d/p/u/lp-1887535-configure-replace-enable-disable-git-update-with-wit.patch,
    d/rules: Backport --with-git-submodules param so building from git repo
    doesn't fail (LP: #1887535)
  * Fix byte aligned writes when writing to image stored on NFS
    server, as they aren't required to be 4kib aligned. (LP: #1921665)
    - d/p/u/lp-1921665-1-block-Require-aligned-image-size-to-avoid-assert.patch
    - d/p/u/lp-1921665-2-file-posix-Allow-byte-aligned-O_DIRECT-with-NFS.patch

 -- Christian Ehrhardt <email address hidden> Fri, 26 Mar 2021 10:38:47 +0100

Changed in qemu (Ubuntu Focal):
status: Fix Committed → Fix Released