qemu 2.10 locks images with no feature flag

Bug #1716028 reported by Ryan Harper
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Opinion
Undecided
Unassigned
libvirt (Ubuntu)
Fix Released
Medium
Unassigned
Artful
Won't Fix
Low
Unassigned
qemu (Ubuntu)
Opinion
Undecided
Unassigned
Artful
Won't Fix
Undecided
Unassigned

Bug Description

1) % lsb_release -rd
Description: Ubuntu Artful Aardvark (development branch)
Release: 17.10

2) % apt-cache policy qemu-system-x86
qemu-system-x86:
  Installed: 1:2.10~rc3+dfsg-0ubuntu1
  Candidate: 1:2.10+dfsg-0ubuntu1
  Version table:
     1:2.10+dfsg-0ubuntu1 500
        500 http://archive.ubuntu.com//ubuntu devel/main amd64 Packages
 *** 1:2.10~rc3+dfsg-0ubuntu1 100
        100 /var/lib/dpkg/status

3) qemu locks image files with no way to discover this feature nor how to disable it

4) qemu provides a way to query if it supports image locking, and what the default value is, and how to disable the locking via cli

qemu 2.10 now will lock image files and warn if an image is currently locked. This prevent qemu from running (and possibly corrupting said image).

However, qemu does not provide any way to determine if a qemu binary actually has this capability. Normally behavior changing features are exposed via some change to the qemu help menu or QMP/QAPI output of capabilities.

I believe this slipped through since libvirt already does image locking, but direct cli users will be caught by this change.

In particular, we have a use-case where we simulate multipath disks by creating to disks which point to the same file which now breaks without adding the 'file.locking=off' to the -drive parameters; which is also completely undocumented and unexposed.

Some parts of the cli like -device allow querying of settable options (qemu-system-x86 -device scsi_hd,?) but nothing equivalent exists for -drive parameters.

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: qemu-system-x86 1:2.10~rc3+dfsg-0ubuntu1
ProcVersionSignature: Ubuntu 4.12.0-11.12-generic 4.12.5
Uname: Linux 4.12.0-11-generic x86_64
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
ApportVersion: 2.20.6-0ubuntu7
Architecture: amd64
Date: Fri Sep 8 12:56:53 2017
JournalErrors:
 Hint: You are currently not seeing messages from other users and the system.
       Users in groups 'adm', 'systemd-journal' can see all messages.
       Pass -q to turn off this notice.
 -- Logs begin at Mon 2017-01-30 11:56:02 CST, end at Fri 2017-09-08 12:56:46 CDT. --
 -- No entries --
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
MachineType: HP ProLiant DL360 Gen9
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.12.0-11-generic root=UUID=45354276-e0c0-4bf6-9083-f130b89411cc ro --- console=ttyS1,115200
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/05/2015
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP89:bd03/05/2015:svnHP:pnProLiantDL360Gen9:pvr:cvnHP:ct23:cvr:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.sys.vendor: HP

Related branches

Revision history for this message
Ryan Harper (raharper) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Ryan for filing this bug, as I said in IRC all the file locking is rather new and I think this is for upstream qemu to address.
Might just be a missed use case for them as well.

I'll be adding an upstream qemu task to get their attention.
@Rayn - It would be nice if - from the multipath tests - you could copy in here a commandline that worked before but fails now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Added a qemu task, this seems to be a use case affected by the file locking.
In particular a (formerly working) case for multipath tests that use the same image files multiple times intentionally.

So far the workaround is to set file.locking=off for all these, which might be just the right thing to do.
But then as outlined by the reporter there currently is no way to query if file locking is supported, nor if it is the current default - so at least adding that might be required to be added in qemu.

Revision history for this message
Kevin Wolf (kwolf-redhat) wrote :

The correct way to query whether file locking is supported is 'query-qmp-schema', which will expose the 'locking' option for the 'file' branch of the 'blockdev-add' command then.

As a first comment, in your setup, completely disabling file locking is probably a too big hammer. It is preferable to just allow multiple writers on the same image by setting share-rw=on for the block device (e.g. '-device virtio-blk-pci,drive=foo,share-rw=on'). This will allow all guest devices to share the same image, but will still protect the image against other users like an incorrect qemu-img invocation while the VM is running.

However, note that opening the same image file twice is not the best approach to the problem anyway. This happens to work with raw images (except for the locking), but it would cause corruption for any other image format.

The better solution (though it may require more changes to your management application) is to open the image once and assign a node-name to it, and then let two guest devices point to the same node. Like above, this requires that share-rw=on be set for guest devices, but it will also work with non-raw image formats because the image file is now opened only once and the sharing is done internally in qemu.

An example command line fragment might look like this:

  -blockdev node-name=disk,driver=file,filename=hd.img
  -device virtio-blk,drive=disk,share-rw=on
  -device virtio-blk,drive=disk,share-rw=on

Technically, you can also keep using -drive instead of -blockdev, but it will result in a mixed state of modern node-name based block device management and the traditional drive based configuration, which might be confusing at times.

Revision history for this message
Ryan Harper (raharper) wrote :

Kevin,

Thanks for the information. A couple of points for feedback:

1) there doesn't appear to be a way to run qmp query-schema without spawning a qemu instance in -qmp mode and having a second client issue the query-schema; certainly a qemu-system-$arch -qmp-schema would be quite useful when examining feature availability. While I know the QMP/QAPI introspection is where most of the work has gone to describing how to interact with qemu it's quite cumbersome at best:

searching for blockdev-add, find arg-type: 48, read arg-type-48, find list of 'variants', know that locking feature is part of 'file', find type: 207, see member 'locking' in list[A], which is of type 296, find type: 296, which is an enum of 'on', 'off', 'auto'

A. Interesting enough, qapi says the default is None, however qemu certainly locks files by default which would seem to imply a mismatch between qapi defaults and qemu behavior.

That's pretty heavy; Maybe that warning message qemu prints could provide some hints as to what a user could do (or refer to a manpage on locking?).

2) share-rw appears to be a blockdev parameter (I see it available via most block devices via qemu-system-$arch -device {scsi-hd,ide-hd,virtio-blk}? However there is no equivalent -blockdev for dumping the default options that a -blockdev parameter takes. The qmp-schema also does not include any information about 'share-rw' w.r.t what values are available that I could find after dumping the schema.

Thanks smoser:

#!/bin/sh
qemu-system-x86_64 -S -nodefaults -nographic \
   -serial none -monitor none -qmp stdio <<EOF |
{ "execute": "qmp_capabilities" }
{ "execute": "query-qmp-schema" }
{ "execute": "quit" }
EOF
   python3 -c '
import json, sys
data = json.loads(sys.stdin.read().splitlines()[2])
print(json.dumps(data, indent=1, sort_keys=True,
                 separators=(",", ": ")))'

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :

In our multipath case, the initial open succeeds (it's not locked by anything else) and the second lock attempt fails, however, IIUC the fcntl structure[1] includes the locking pid, which should be our invocation of QEMU;

Can we not check if the locking pid matches the current pid and not fail? This appears to be the original goal of protecting locked images from being manipulated by external processes.

Revision history for this message
Daniel Berrange (berrange) wrote :

Having a single QEMU process open the same qcow2 file twice is just as dangerous as having two separate QEMU processes open the same qcow2 file concurrently. In both cases you have qcow2 state cached in a BlockDriverState and nothing is able to invalidate the cache in the other BlockDriverState. So short-circuiting locking if the current pid matches would be wrong.

Revision history for this message
Scott Moser (smoser) wrote :

Kevin,
Thanks for the suggestion of share-rw=on. I figured I'd try to change our 'xkvm' wrapper around qemu to use that.

Unfortunately, it looks like , at least in our version of qemu (QEMU emulator
version 2.10.0(Debian 1:2.10+dfsg-0ubuntu1)), that this does not work
with the -drive path.

$ qemu-img create -f qcow2 disk1.img 1G
$ qemu-system-x86_64 \
   -device virtio-net-pci,netdev=net00 \
   -netdev type=user,id=net00 \
   -drive id=drive01,file=disk1.img,format=qcow2,share-rw=on \
   -device drive=drive01,serial=sn-drive01,driver=virtio-blk,index=1 \
   -device drive=drive01,serial=sn-drive01,driver=virtio-blk,index=2
qemu-system-x86_64: -drive id=drive01,file=disk1.img,format=qcow2,share-rw=on: Block format 'qcow2' does not support the option 'share-rw'

I had thought you were suggesting the above, right?

Revision history for this message
Kevin Wolf (kwolf-redhat) wrote :

The share-rw=on option belongs to -device, not to -drive/-blockdev.

Revision history for this message
Scott Moser (smoser) wrote :

Your example does work (using -blockdev), but I can't get it to work with
-drive.

$ qemu-system-x86_64 \
   -drive id=d01,file=disk1.img,format=qcow2 \
   -device drive=d01,serial=s01,driver=virtio-blk,index=1,share-rw=on \
   -device drive=d01,serial=s01,driver=virtio-blk,index=2,share-rw=on
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
qemu-system-x86_64: -device drive=d01,serial=s01,driver=virtio-blk,index=1,share-rw=on: Drive 'd01' is already in use because it has been automatically connected to another device (did you need 'if=none' in the drive options?)

## ok, fix that error, add 'if=none' to the -drive.

$ qemu-system-x86_64 \
  -drive id=d01,file=disk1.img,format=qcow2,if=none \
  -device virtio-blk,drive=d01,serial=s01,index=1,share-rw=on \
  -device virtio-blk,drive=d01,serial=s01,index=2,share-rw=on
qemu-system-x86_64: -device drive=d01,serial=s01,driver=virtio-blk,index=1,share-rw=on: Property '.index' not found

## ok, index belongs on the -drive (which I should have known from
## the past, but which seems not the right place). Try that anyway.

$ qemu-system-x86_64 \
  -drive id=d01,file=disk1.img,format=qcow2,if=none,index=1 \
  -device virtio-blk,drive=d01,serial=s01,share-rw=on \
  -device virtio-blk,drive=d01,serial=s01,share-rw=on
qemu-system-x86_64: -device drive=d01,serial=s01,driver=virtio-blk,share-rw=on: Drive 'd01' is already in use by another device

## Huh? Isn't that what I said to explicitly allow with share-rw=on?

Note that I've also tried with 'format=raw'. Is there something I'm
missing to try to use -drive and -device ?

Lastly (if you're still reading), how do you specify the format of
the file to -blockdev ? adding 'format=qcow2' makes qemu complain that
"'format' is unexpected".

Thanks for your time.

Revision history for this message
Scott Moser (smoser) wrote :

I see the answer to my question above. 'format' is now driver=qcow2.

Revision history for this message
Kevin Wolf (kwolf-redhat) wrote :

The important difference between your -drive command line and my -blockdev example is that I used the node-name to reference the image. You can specify a node-name with -drive, too (having both id and node-name is one of the main things that I meant what I said mixing both styles can be confusing).

I also don't think that index=1 does anything useful when used with if=none, so you can leave that out.

Putting everything together, we get this:

$ qemu-system-x86_64 \
  -drive node-name=d01,file=disk1.img,format=qcow2,if=none \
  -device virtio-blk,drive=d01,serial=s01,share-rw=on \
  -device virtio-blk,drive=d01,serial=s01,share-rw=on

Revision history for this message
Scott Moser (smoser) wrote :

Kevin,
thanks again. You've provided enough support for me at this point. I had looked at trying to coalesce multiple -drive values into a single one, and that can definitely be made to work with the newer qemu, but i'm not sure I can make it work with older.

the goal there would be to do something like:
$ qemu-system-x86_64 \
   -drive id=d01,file.filename=disk1.img,format=qcow2,if=none -\
   -device virtio-blk,drive=d01,serial=s01 \
   -device virtio-blk,drive=d01,serial=s02

on newer qemu, that works if i change 'id=' to 'node-name', but on older qemu I can't convince it to let me have 1 drive associated to multiple -device.

What we ended up doing is at
  https://code.launchpad.net/~smoser/curtin/trunk.lp1716028-hack-file-locking-in-qemu/+merge/330456

Revision history for this message
Kevin Wolf (kwolf-redhat) wrote :

Note that 'share-rw' was introduced earlier (commit dabd18f6, qemu 2.9) than 'locking' (commit 16b48d5d, qemu 2.10), so if qemu 2.9 is relevant for you, your hacky check doesn't work.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We had a discussion today on how to workaround if you drive qmeu via libvirt.
While discussions often and up at the correctness of such setups they exists, I think it is worth to document until libvirt supports that officially.

TL:DR:
- no native libvirt feature yet
- discussions if <shareable/> should set it
- workaround available via cmdline

Details on the workaround:
- To use the workaround you have to check your log usually in /var/log/libvirt/qemu/<guestname>.log
- Get the id of the device that matters to you
- then use [1] to tweak qemu cmdline
- example is from [2] to [3]

[1]: http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
[2]: https://gist.github.com/anonymous/a2ce3cbf7878995537212f0dafd06d99
[3]: https://gist.github.com/anonymous/07a357af9e34172b60c83d410fe63fdd

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Note: Depending on the case you might also get lucky with "shareable" and/or "readonly" disk attributes.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Umm, the latter is the official way to go, but only in latter libvirt versions.
https://libvirt.org/git/?p=libvirt.git;a=commit;h=28907b0043fbf71085a798372ab9c816ba043b93

I'm adding a libvirt bug task for the bionic merge to pick that up "explicitly"

Changed in qemu (Ubuntu):
status: New → Won't Fix
Changed in libvirt (Ubuntu):
status: New → Triaged
tags: added: libvirt-18.04
Changed in qemu:
status: New → Opinion
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Also on the qemu side this is "works as intended" and workarounds are documented in this bug.
So we should set that bug status as well - looking back given it was mostly a discussion I guess "opinion" is the best close status in this case.

Changed in qemu (Ubuntu):
status: Won't Fix → Opinion
Changed in libvirt (Ubuntu):
importance: Undecided → Medium
Scott Moser (smoser)
tags: added: qemu-file-locking
Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Libvirt will make this "easier" to be avoided by
commit 28907b0043fbf71085a798372ab9c816ba043b93
Author: Peter Krempa <email address hidden>
Date: Wed Nov 15 15:21:14 2017 +0100

    qemu: command: Mark <shared/> disks as such in qemu

    Qemu has now an internal mechanism for locking images to fix specific
    cases of disk corruption. This requires libvirt to mark the image as
    shared so that qemu lifts certain restrictions.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1378242

commit 860a3c4bea1d24773d8a495f213d5de3ac48a462
Author: Peter Krempa <email address hidden>
Date: Wed Nov 15 15:02:58 2017 +0100

    qemu: caps: Add capability for 'share-rw' disk option

    'share-rw' for the disk device configures qemu to allow concurrent
    access to the backing storage.

    The capability is checked in various supported disk frontend buses since
    it does not make sense to partially backport it.

Once this is complete in bionic by taking the new upstream I'll take a look at how backportable that is (changes look ok, but might need some testing in the SRU spirit of things).

Changed in libvirt (Ubuntu Artful):
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (13.5 KiB)

This bug was fixed in the package libvirt - 4.0.0-1ubuntu1

---------------
libvirt (4.0.0-1ubuntu1) bionic; urgency=medium

  * Merged with Debian unstable (4.0)
    This closes several bugs:
    - Error generating apparmor profile when hostname contains spaces
      (LP: #799997)
    - qemu 2.10 locks files, libvirt shared now sets share-rw=on (LP: #1716028)
    - libvirt usb passthrough throws apparmor denials related to
      /run/udev/data/+usb (LP: #1727311)
    - AppArmor denies access to /sys/block/*/queue/max_segments (LP: #1729626)
    - iohelper improvements to let bypass-cache work without opening up the
      apparmor isolation (LP: #1719579)
    - nodeinfo on s390x to contain more CPU info (LP: #1733688)
    - Upgrade libvirt >= 4.0 (LP: #1745934)
  * Remaining changes:
    - Disable libssh2 support (universe dependency)
    - Disable firewalld support (universe dependency)
    - Disable selinux
    - Set qemu-group to kvm (for compat with older ubuntu)
    - Additional apport package-hook
    - Modifications to adapt for our delayed switch away from libvirt-bin (can
      be dropped >18.04).
      + d/p/ubuntu/libvirtd-service-add-bin-alias.patch: systemd: define alias
        to old service name so that old references work
      + d/p/ubuntu/libvirtd-init-add-bin-alias.patch: sysv init: define alias
        to old service name so that old references work
      + d/control: transitional package with the old name and maintainer
        scripts to handle the transition
    - Backwards compatible handling of group rename (can be dropped >18.04).
    - config details and autostart of default bridged network. Creating that is
      now the default in general, yet our solution provides the following on
      top as of today:
      + autostart the default network by default
      + do not autostart if subnet is already taken (e.g. in guests).
    - d/p/ubuntu/Allow-libvirt-group-to-access-the-socket.patch: This is
      the group based access to libvirt functions as it was used in Ubuntu
      for quite long.
      + d/p/ubuntu/daemon-augeas-fix-expected.patch fix some related tests
        due to the group access change.
    - ubuntu/parallel-shutdown.patch: set parallel shutdown by default.
    - d/p/ubuntu/enable-kvm-spice.patch: compat with older Ubuntu qemu/kvm
      which provided a separate kvm-spice.
    - d/p/ubuntu/ubuntu-libxl-qemu-path.patch: this change was split. The
      section that adapts the path of the emulator to the Debian/Ubuntu
      packaging is kept.
    - d/p/ubuntu/ubuntu-libxl-Fix-up-VRAM-to-minimum-requirements.patch: auto
      set VRAM to minimum requirements
    - d/p/ubuntu/xen-default-uri.patch: set default URI on xen hosts
    - Add libxl log directory
    - libvirt-uri.sh: Automatically switch default libvirt URI for users on
      Xen dom0 via user profile (was missing on changelogs before)
    - d/p/ubuntu/apibuild-skip-libvirt-common.h: drop libvirt-common.h from
      included_files to avoid build failures due to duplicate definitions.
    - Update README.Debian with Ubuntu changes
    - Convert libvirt0, libnss_libvirt and libvirt-dev to multi-arch.
    - Enable some additional features on ppc...

Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Changed in qemu (Ubuntu Artful):
status: New → Won't Fix
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The avoidance for libvirt driven cases to map </shared> onto share-rw is complete in Bionic now, so I started to look into a potential SRU.

But looking back it seems in the meantime everybody learned how to use it.
I have not seen any more dups coming in recently.
The same is true for openstack and likely everyone else.

Therefore I'll set Won't Fix for Artful, we would have to wait for an upcoming security fix to be out of the queue anyway. If until that is done someone steps up and explains why this really still is needed in Artful I'll backport them, but otherwise I think we are good with Won't fix.

Changed in libvirt (Ubuntu Artful):
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.