KVM instances fail at creation with: Failed to get shared "write" lock Is another process using the image?

Bug #1816170 reported by Pedro Guimarães
This bug affects 5 people
Affects           Status        Importance  Assigned to      Milestone
Canonical Juju    Fix Released  High        Joseph Phillips  2.5.3
libvirt (Ubuntu)  Fix Released  Undecided   Unassigned       -
qemu (Ubuntu)     Fix Released  Undecided   Unassigned       -

Bug Description

Juju version: 2.5.0
Qemu version: qemu-system-x86 2.11+dfsg-1ubuntu7.9

When creating multiple KVM instances using Juju, some fail with "exit status 13".

Looking at the unit/machine logs, I can see what seems to be a race condition between the qemu-img create and virsh start commands:

2019-02-15 16:31:27 DEBUG juju.container.kvm run_linux.go:24 running: qemu-img [create -b /var/lib/juju/kvm/guests/bionic-amd64-backing-file.qcow -f qcow2 /var/lib/juju/kvm/guests/juju-720d5c-1-kvm-3.qcow 8G]
2019-02-15 16:31:27 DEBUG juju.container.kvm run_linux.go:25 running as uid: 64055, gid: 117
2019-02-15 16:31:27 DEBUG juju.container.kvm run_linux.go:36 output: Formatting '/var/lib/juju/kvm/guests/juju-720d5c-1-kvm-3.qcow', fmt=qcow2 size=8589934592 backing_file=/var/lib/juju/kvm/guests/bionic-amd64-backing-file.qcow cluster_size=65536 lazy_refcounts=off refcount_bits=16

2019-02-15 16:31:27 DEBUG juju.container.kvm wrappedcmds.go:481 create root image: Formatting '/var/lib/juju/kvm/guests/juju-720d5c-1-kvm-3.qcow', fmt=qcow2 size=8589934592 backing_file=/var/lib/juju/kvm/guests/bionic-amd64-backing-file.qcow cluster_size=65536 lazy_refcounts=off refcount_bits=16

2019-02-15 16:31:27 DEBUG juju.container.kvm run.go:19 virsh [define /var/lib/juju/containers/juju-720d5c-1-kvm-3/juju-720d5c-1-kvm-3.xml]
2019-02-15 16:31:27 DEBUG juju.container.kvm run.go:21 output: Domain juju-720d5c-1-kvm-3 defined from /var/lib/juju/containers/juju-720d5c-1-kvm-3/juju-720d5c-1-kvm-3.xml

2019-02-15 16:31:27 DEBUG juju.container.kvm wrappedcmds.go:204 created domain: Domain juju-720d5c-1-kvm-3 defined from /var/lib/juju/containers/juju-720d5c-1-kvm-3/juju-720d5c-1-kvm-3.xml

2019-02-15 16:31:27 DEBUG juju.container.kvm run.go:19 virsh [start juju-720d5c-1-kvm-3]
2019-02-15 16:31:29 DEBUG juju.container.kvm run.go:21 output: error: Failed to start domain juju-720d5c-1-kvm-3
error: internal error: qemu unexpectedly closed the monitor: 2019-02-15T16:31:28.826568Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
2019-02-15T16:31:28.839983Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get shared "write" lock
Is another process using the image?

2019-02-15 16:31:29 ERROR juju.provisioner.kvm kvm-broker.go:152 failed to start container: kvm container creation failed: failed to start domain "juju-720d5c-1-kvm-3": exit status 1
2019-02-15 16:31:29 WARNING juju.provisioner provisioner_task.go:1282 failed to start machine 1/kvm/3 (kvm container creation failed: failed to start domain "juju-720d5c-1-kvm-3": exit status 1), retrying in 10s (10 more attempts)

It never succeeds, on any of the 10 retry attempts.

There are many reports of qemu-img requesting a write lock even though it is only reading image info:
https://bugs.launchpad.net/qemu/+bug/1721788
https://bugs.launchpad.net/nova/+bug/1718295
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1718133
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1716028

One solution seems to be passing --force-share to the qemu-img command. That may be risky in a scenario where Juju AND other things are using qemu at the same time, as discussed in: https://bugs.launchpad.net/qemu/+bug/1721788/comments/1

Can we create a new image file for every VM we deploy? Or is --force-share an alternative?

Either way, there seems to be no workaround at the moment besides changing Juju.

Tags: cpe-onsite
summary: - KVM containers fail at creation with: Failed to get shared "write" lock
+ KVM instances fail at creation with: Failed to get shared "write" lock
Is another process using the image?
description: updated
description: updated
Revision history for this message
Pedro Guimarães (pguimaraes) wrote :

I am setting this bug to field-high as we have two projects that would greatly benefit from a fix.

tags: added: cpe-onsite
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

/var/log from a host with a bunch of failed kvm "containers":
https://private-fileshare.canonical.com/~dima/juju-dumps/kvm-host-2019-03-14.tar.gz

Relevant VM logs: /var/log/libvirt/qemu/*

Some VMs have started successfully, some have not:

pgrep -af qemu | grep -ioP 'qemu.*?guest=.*?\s'
qemu-system-x86_64 -enable-kvm -name guest=juju-5ad6c1-6-kvm-0,debug-threads=on
qemu-system-x86_64 -enable-kvm -name guest=juju-5ad6c1-6-kvm-2,debug-threads=on
qemu-system-x86_64 -enable-kvm -name guest=juju-5ad6c1-6-kvm-3,debug-threads=on

root@control-7:~# pgrep -af qemu | grep -ioP 'drive file=.*?\s'
drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-0.qcow,format=qcow2,if=none,id=drive-virtio-disk0
drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-0-ds.iso,format=raw,if=none,id=drive-virtio-disk1
drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-2.qcow,format=qcow2,if=none,id=drive-virtio-disk0
drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-2-ds.iso,format=raw,if=none,id=drive-virtio-disk1
drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-3.qcow,format=qcow2,if=none,id=drive-virtio-disk0
drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-3-ds.iso,format=raw,if=none,id=drive-virtio-disk1

root@control-7:~# ls /var/lib/juju/kvm/guests/
bionic-amd64-backing-file.qcow juju-5ad6c1-6-kvm-1-ds.iso juju-5ad6c1-6-kvm-2.qcow juju-5ad6c1-6-kvm-4-ds.iso juju-5ad6c1-6-kvm-5.qcow
juju-5ad6c1-6-kvm-0-ds.iso juju-5ad6c1-6-kvm-1.qcow juju-5ad6c1-6-kvm-3-ds.iso juju-5ad6c1-6-kvm-4.qcow
juju-5ad6c1-6-kvm-0.qcow juju-5ad6c1-6-kvm-2-ds.iso juju-5ad6c1-6-kvm-3.qcow juju-5ad6c1-6-kvm-5-ds.iso

root@control-7:~# grep -RiP 'Failed' /var/log/libvirt/qemu/
/var/log/libvirt/qemu/juju-5ad6c1-6-kvm-4.log:2019-03-14T19:44:09.095078Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get shared "write" lock
/var/log/libvirt/qemu/juju-5ad6c1-6-kvm-4.log:2019-03-14 19:44:09.184+0000: shutting down, reason=failed
/var/log/libvirt/qemu/juju-5ad6c1-6-kvm-5.log:2019-03-14T19:44:11.875801Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get shared "write" lock
/var/log/libvirt/qemu/juju-5ad6c1-6-kvm-5.log:2019-03-14 19:44:12.160+0000: shutting down, reason=failed
/var/log/libvirt/qemu/juju-5ad6c1-6-kvm-1.log:2019-03-14T19:44:09.219746Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get shared "write" lock
/var/log/libvirt/qemu/juju-5ad6c1-6-kvm-1.log:2019-03-14 19:44:09.497+0000: shutting down, reason=failed

Each VM uses the following files (an ISO and a qcow image):

-drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-0.qcow
-drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-0-ds.iso

ls -il /var/lib/juju/kvm/guests/
total 1519192
125175532 -rw-r--r-- 1 root root 2361393152 Mar 14 19:46 bionic-amd64-backing-file.qcow
125175524 -rw-r--r-- 1 libvirt-qemu kvm 391168 Mar 14 19:44 juju-5ad6c1-6-kvm-0-ds.iso
125175525 -rw-r--r-- 1 libvirt-qemu kvm 393216...


Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

After manually starting the VMs that had not started due to the above error coming from QEMU:

juju-5ad6c1-6-kvm-0: stuck in initramfs
juju-5ad6c1-6-kvm-1: shows a login prompt (juju-5ad6c1-6-kvm-1 login:)
juju-5ad6c1-6-kvm-2: shows a login prompt
juju-5ad6c1-6-kvm-3: stuck in initramfs
juju-5ad6c1-6-kvm-4: shows a login prompt
juju-5ad6c1-6-kvm-5: shows a login prompt

(initramfs) Gave up waiting for root file system device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! LABEL=cloudimg-rootfs does not exist. Dropping to a shell!

(initramfs) cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-46-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0

If ./init is run by hand from the console in this initramfs, "Gave up waiting for root file system device" is encountered again.

(initramfs) blkid
/dev/vdb: UUID="2019-03-14-19-44-16-00" LABEL="cidata" TYPE="iso9660"

(initramfs) ls /dev/vd*
/dev/vdb /dev/vda

(initramfs) mount | grep /dev/vd ; echo $?
1

http://manpages.ubuntu.com/manpages/bionic/man8/initramfs-tools.8.html
"If a label is used, as in root=LABEL=rootPart the initrd will search all available devices for a filesystem with the appropriate label, and mount that device as the root filesystem."

Revision history for this message
Bogdan Suchok (bogdan-sk) wrote :

subscribed ~field-critical as this bug affects a customer deployment.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Targeting qemu for input from that end.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

From what I see, the fix would be to use --force-share when invoking `qemu-img create -b <backing-image-path> -f qcow2 <imgpath>` as of qemu 2.10, which is not currently done:

https://github.com/juju/juju/blob/juju-2.5.2/container/kvm/wrappedcmds.go#L474-L485

    out, err := params.runCmd(
        "qemu-img",
        "create",
        "-b", backingPath,
        "-f", "qcow2",
        imgPath,
        fmt.Sprintf("%dG", params.RootDisk))

https://git.qemu.org/?p=qemu.git;a=commit;h=335e9937844449a3fc172dd5e64549ad7d5ad1c2

git --no-pager tag --contains=335e9937844449a3fc172dd5e64549ad7d5ad1c2 | grep -v rc
v2.10.0
v2.10.1
v2.10.2
v2.11.0
v2.11.1
v2.11.2
v2.12.0
v2.12.1
v3.0.0
v3.1.0

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
looking at this from qemu's POV.
But it seems you already identified most of the related history; I'll add one more (bug 1716028) to the description and then summarize.

description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Yes, qemu had too many cases in the past where opening the same disk multiple times led to data corruption, so image locking was introduced.
The locking has caused quite some pain (as you found in the old bugs) but turned out to make things much more reliable.

The different entry points use different options to achieve the same end, which is to explicitly allow sharing the disks in the rare cases where that applies:
1. manual qemu usage: the command line needs to set share-rw=on
2. usage through libvirt: sets share-rw=on if "<shareable/>" is set (see bug 1716028 and refs from there)
3. tools like qemu-img: need to set --force-share, as you already identified (see the sketch after this list)
4. openstack: this uses #2 (and has to set shareable as needed) and #3, which is fixed (by bug 1718295 and bug 1718133 that you already found)
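
As a small illustration of entry point #3 (a sketch, not Juju code; the image path below is just a placeholder), qemu-img's read-only subcommands such as `info` accept --force-share / -U since 2.10, so metadata can be read while a running guest holds the write lock:

    package main

    import (
        "fmt"
        "log"
        "os/exec"
    )

    func main() {
        // Placeholder path: any qcow2 image currently opened read-write by a guest.
        imgPath := "/var/lib/juju/kvm/guests/juju-720d5c-1-kvm-3.qcow"

        // --force-share (-U) tells qemu-img not to take the write lock,
        // so this succeeds even while the VM is running.
        out, err := exec.Command("qemu-img", "info",
            "--force-share", "--output=json", imgPath).CombinedOutput()
        if err != nil {
            log.Fatalf("qemu-img info failed: %v\n%s", err, out)
        }
        fmt.Printf("%s", out)
    }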

All of that should be in the virt stack on Bionic and later (where the qemu with the change is) and in the related Cloud Archive versions.

The TL;DR is:
- images really shared (like an ISO) need to be marked shareable
- things not really meant to be opened multiple times, e.g. the same root disk file (which only worked by chance before), should not be used that way

I don't see in the comments above what would be left: are there charms using qemu-img that need to be adapted (that might be what comment #6 says), or what exactly is the missing bit here?

For qemu/libvirt, everything I know to be needed for this is Fix Released; setting the status accordingly unless we identify something new that is needed.

Changed in libvirt (Ubuntu):
status: New → Fix Released
Changed in qemu (Ubuntu):
status: New → Fix Released
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Christian,

In this case Juju (not charms) uses the following command in its code to create a VM image based on a common cloud image:

    out, err := params.runCmd(
        "qemu-img",
        "create",
        "-b", backingPath,
        "-f", "qcow2",
        imgPath,
        fmt.Sprintf("%dG", params.RootDisk))

This is done when charms are requested to be deployed into a VM provisioned by a juju machine agent (--to kvm:<host-machine-id> syntax analogous to --to lxd:<host-machine-id>).

The reported problem specifically occurs when multiple KVM guests are provisioned on the same host machine.

My understanding is:

1) a Redirect-on-Write type of snapshot is safe to create (even if multiple copies of qemu-img are using the same backing image at the same moment) because the backing image does not change in the process;
2) it is safe to run multiple VMs using different snapshots that share the same base image (because of the Redirect-on-Write behavior); see the sketch below.
https://wiki.qemu.org/Documentation/CreateSnapshot
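
To make (2) concrete, here is a minimal sketch (not Juju code; the directory, backing-file name and guest names are placeholders) of the layout in question: one shared, read-only backing file and a separate qcow2 overlay per guest:

    package main

    import (
        "fmt"
        "log"
        "os/exec"
        "path/filepath"
    )

    func main() {
        guestsDir := "/var/lib/juju/kvm/guests"
        backingPath := filepath.Join(guestsDir, "bionic-amd64-backing-file.qcow") // shared base image

        // Each guest gets its own copy-on-write overlay; the backing file is
        // only ever opened read-only, so many overlays can reference it at once.
        for _, name := range []string{"kvm-0", "kvm-1", "kvm-2"} {
            overlay := filepath.Join(guestsDir, name+".qcow")
            out, err := exec.Command("qemu-img", "create",
                "-b", backingPath,
                "-f", "qcow2",
                overlay, "8G").CombinedOutput()
            if err != nil {
                log.Fatalf("creating overlay %s: %v\n%s", overlay, err, out)
            }
            fmt.Printf("%s", out)
        }
    }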

I initially thought that it might be a race between multiple parallel `qemu-img create -b bfile -f qcow2 <targetimg>` invocations, but the errors are present in the libvirt domain log, so it looks like I might be wrong.

Given that each VM has its own snapshot based on the cloud image and its own ISO, I am not sure why the write lock is necessary:

-drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-0.qcow
-drive file=/var/lib/juju/kvm/guests/juju-5ad6c1-6-kvm-0-ds.iso

Any thoughts about that?

I included `ls -il /var/lib/juju/kvm/guests/` output in comment #2 to show whether there were any hard links (none, based on what I see in the first column).

The patch for snapshots and backing images seems to be included in qemu/bionic already, so the qemu-img -b (...) invocation shouldn't be a problem:

➜ qemu git:(ubuntu/bionic) grep -RiP ' Open backing image in force share mode'
debian/patches/ubuntu/qemu-stable-2.11.1.patch:Subject: [PATCH 09/80] block: Open backing image in force share mode for size

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg02816.html
https://git.qemu.org/?p=qemu.git;a=commit;h=cc954f01e3c004aad081aa36736a17e842b80211

git --no-pager tag --contains=cc954f01e3c004aad081aa36736a17e842b80211 | grep -v rc
v2.12.0
v2.12.1
v3.0.0
v3.1.0

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I have done some more lab testing and here is what I found:

1) If one VM is provisioned via Juju first, and other VMs are created only after that provisioning is fully complete (i.e. the first VM is started), they are created successfully.

This means that there is no issue with using the same base image for snapshots of multiple VMs at the same time, and that QEMU does not require a backing image to be write-lock protected when multiple VMs are started.

juju add-machine # i
juju deploy cs:ubuntu -n 1 --to kvm:<i>
# wait until 0/kvm/<i> starts
juju add-unit ubuntu --to kvm:<i>
# everything is ok

2) If multiple VMs are provisioned at the same time (without any prior VM provisioning) issues start to appear:

# juju.container and juju.container.kvm tracing is enabled
juju model-config logging-config
<root>=WARNING;juju.container.kvm=TRACE;juju.container=TRACE;unit=DEBUG

juju add-machine # <i>
juju deploy cs:ubuntu -n 6 --to kvm:<i>,kvm:<i>,kvm:<i>,kvm:<i>,kvm:<i>,kvm:<i>
# errors: https://paste.ubuntu.com/p/DrsVw6xhzf/

https://private-fileshare.canonical.com/~dima/juju-dumps/kvm-host-var-log-17-03-2019.tar.gz (/var/log from the kvm host)

Intermediate Juju status shows that image downloads for multiple VMs on the same machine start concurrently:

https://paste.ubuntu.com/p/CXKPCdDZnm/ (multiple consecutive juju status commands showing that there is no synchronization between cloud image downloads)

ubuntu@maas:~$ juju status
#...
Machine State DNS Inst id Series AZ Message
3 started 10.232.24.16 node-2712d152-5daf-11e4-93f3-525400a6ab07 bionic default Deployed
3/kvm/0 pending pending bionic starting
3/kvm/1 pending pending bionic copying http://cloud-images.ubuntu.com/server/releases/bionic/release-20190307/ubuntu-18.04-server-cloudimg-amd64.img 1% (576KB/s)
3/kvm/2 pending pending bionic copying http://cloud-images.ubuntu.com/server/releases/bionic/release-20190307/ubuntu-18.04-server-cloudimg-amd64.img 47% (5.2MB/s)
3/kvm/3 pending pending bionic copying http://cloud-images.ubuntu.com/server/releases/bionic/release-20190307/ubuntu-18.04-server-cloudimg-amd64.img 34% (4.6MB/s)
3/kvm/4 pending pending bionic starting
3/kvm/5 pending pending bionic copying http://cloud-images.ubuntu.com/server/releases/bionic/release-20190307/ubuntu-18.04-server-cloudimg-amd64.img 37% (2.7MB/s)

From the machine log I can see that concurrent image downloads also result in concurrent executions of `qemu-img convert` to the same (!!!) destination file:

machine.log:
https://paste.ubuntu.com/p/fkmDv69pR8/

2019-03-17 13:48:05 DEBUG juju.container.kvm run.go:19 qemu-img [convert -f qcow2 /tmp/juju-kvm-ubuntu-18.04-server-cloudimg-amd64.img-720678751 /var/lib/juju/kvm/guests/bionic-amd64-backing-file.qcow]
2019-03-17 13:48:09 DEBUG juju.container.kvm...


Tim Penhey (thumper)
Changed in juju:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Richard Harding (rharding)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Answering comment #9 to be complete.
    #1 creating a snapshot should be safe, and in fact a qemu-img create works fine on a base file
       that currently has a running VM associated with it

    #2 running multiple VMs with images referring to the same backing file is safe as well

    Yes, if each has its own disk (.qcow) and its own .iso then the locks will be necessary, but each
    VM should be able to get its locks as nothing else holds them.

    Also, yes, the patch for qemu-img is in Bionic as of 2.11.1.

Ok, in comment #10 you came to the same conclusions (\o/) and identified an issue in the concurrent pull & convert of the initial image. That would make sense, as the convert needs the write lock (which also denies further read locks), and the concurrent actions potentially corrupt the file anyway.

So yeah, I'd conclude similarly to you and would assume that any "image synchronize + convert" actions by Juju should be mutually exclusive to each other, at least per target image file.

You already assigned to Rick for Juju which I think is the right next step, thanks!
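
For illustration, a minimal sketch of that mutual exclusion (not the actual change later proposed for Juju; ensureBackingFile and the fetch callback are hypothetical) would serialise the fetch-and-convert step per target image path within the machine agent process:

    package kvm

    import (
        "fmt"
        "os"
        "os/exec"
        "sync"
    )

    var (
        // backingMu guards backingLocks; backingLocks holds one mutex per
        // target image path so unrelated images do not block each other.
        backingMu    sync.Mutex
        backingLocks = map[string]*sync.Mutex{}
    )

    func lockFor(path string) *sync.Mutex {
        backingMu.Lock()
        defer backingMu.Unlock()
        if _, ok := backingLocks[path]; !ok {
            backingLocks[path] = &sync.Mutex{}
        }
        return backingLocks[path]
    }

    // ensureBackingFile produces targetPath exactly once, even when several
    // guests are provisioned in parallel. fetch stands in for the image
    // download step and should return the path of a temporary source image.
    func ensureBackingFile(targetPath string, fetch func() (string, error)) error {
        mu := lockFor(targetPath)
        mu.Lock()
        defer mu.Unlock()

        // A concurrent provisioning attempt may already have produced the file.
        if _, err := os.Stat(targetPath); err == nil {
            return nil
        }

        src, err := fetch()
        if err != nil {
            return err
        }
        defer os.Remove(src)

        // Only one writer per targetPath reaches this point, so qemu-img can
        // take its exclusive write lock on the destination without contention.
        if out, err := exec.Command("qemu-img", "convert",
            "-f", "qcow2", src, targetPath).CombinedOutput(); err != nil {
            return fmt.Errorf("qemu-img convert: %v: %s", err, out)
        }
        return nil
    }

This only covers a single machine-agent process; if more than one process could write the same backing file, an OS-level lock (e.g. flock on a sidecar lock file) would be needed instead of an in-memory mutex.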

Changed in juju:
status: Triaged → In Progress
assignee: Richard Harding (rharding) → Joseph Phillips (manadart)
Revision history for this message
Joseph Phillips (manadart) wrote :

I have proposed https://github.com/juju/juju/pull/9898 to address this.

Many thanks to all involved in reproducing and delving into this issue. Really nice to have the analysis side handed to the core team on a plate :)

Changed in juju:
milestone: none → 2.5.3
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → In Progress
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Tried the first branch:

https://github.com/juju/juju/pull/9898

Had positive results for the images themselves, but there seems to be another issue with the datasource (-ds.iso) ISOs, as described in the comment above (https://bugs.launchpad.net/juju/+bug/1816170/comments/10) and in the comment here: https://github.com/juju/juju/pull/9898#issuecomment-474345689

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Tried the latest PR:

https://github.com/juju/juju/pull/9912

➜ juju git:(2.5) ✗ git --no-pager branch --contains=f73e98eeb6971ade1da020e6e58448469fc0ad00
* 2.5

➜ juju git:(2.5) ✗ grep -RiP 1ca489a5e41e9bcb51f0ba8d76616a785c9ce843
parts/juju/state/pull: source-commit: 1ca489a5e41e9bcb51f0ba8d76616a785c9ce843

Managed to reproduce the behavior in #13 on one node:

https://pastebin.canonical.com/p/fvcvjz69Bb/

Changed in juju:
status: Fix Committed → Fix Released