libvirt: blockcommit fails - disk not ready for pivot yet

Bug #1681839 reported by Patrick Best on 2017-04-11
This bug affects 6 people
Affects            Importance  Assigned to
libvirt (Ubuntu)   Medium      Unassigned
Xenial             Medium      Matthew Ruffell
Artful             Undecided   Unassigned
Bionic             Medium      Unassigned

Bug Description

[Impact]

On xenial, if you manually invoke blockcommit through virsh, the command immediately fails: it reports the blockcommit as supposedly 100% complete, yet says the disk is not ready for pivot:

root@xenial-apparmor:~# virsh blockcommit snapvm vda --active --verbose --pivot --wait
Block commit: [100 %]
error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

However, if you look at the status of the active blockjob, you can see that the blockcommit is still active in the background:

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [0 %]

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [2 %]

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [6 %]

The job progresses until it reaches 100%, where it gets stuck. To un-stick things, you must then manually --abort the blockjob.

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [100 %]

This happens in VMs which are experiencing load, and is caused by a race condition in libvirt. Users are not able to commit their snapshots to disk.
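Until fixed packages are installed, the only way out is the manual --abort described above. A rough sketch of automating that recovery follows; the function names are mine and not part of any libvirt tooling, and the polling interval is arbitrary:

```shell
# Sketch of a manual workaround: poll the blockjob and abort it once it
# reports 100 %, since the stuck job never finishes on its own.

# parse_percent: extract the numeric progress from a virsh blockjob line,
# e.g. "Active Block Commit: [42 %]" -> "42".
parse_percent() {
    printf '%s\n' "$1" | sed -n 's/.*\[ *\([0-9][0-9]*\) *%\].*/\1/p'
}

# abort_when_done: poll until the job shows 100 %, then abort it so the
# guest is no longer left with a hung block copy.
abort_when_done() {
    domain=$1 disk=$2
    while :; do
        pct=$(parse_percent "$(virsh blockjob "$domain" "$disk" --info)")
        [ "$pct" = "100" ] && break
        sleep 5
    done
    virsh blockjob "$domain" "$disk" --abort
}
```

Note that aborting an active blockcommit leaves the guest on the overlay image, so the commit has to be retried afterwards.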

[Test Case]

Credit goes to Fabio Martins, who determined how to reproduce this issue.

On an Ubuntu 16.04 host with libvirt 1.3.1-1ubuntu10.27:

1) Create a VG and define an LVM pool:

root@xenial-apparmor:~# cat lvmpool.xml
<pool type="logical">
<name>LVMpool_vg</name>
<source>
<device path="/dev/sdb"/>
</source>
<target>
<path>/dev/LVMpool_vg</path>
</target>
</pool>

# virsh pool-define lvmpool.xml
# virsh pool-start LVMpool_vg
# virsh pool-autostart LVMpool_vg

2) Create a config file to use as a cdrom device with the new VM (to be created in the next steps), just to inject a password with cloud-init:

# cat > config <<EOF
> #cloud-config
> password: passw0rd
> chpasswd: { expire: False }
> ssh_pwauth: True
> EOF

# apt install cloud-image-utils

# cloud-localds config.img config

# mv config.img /var/lib/libvirt/images/
# chown libvirt-qemu:kvm /var/lib/libvirt/images/config.img
# chmod 664 /var/lib/libvirt/images/config.img

3) Create one VM using this pool:

# virt-install --connect=qemu:///system --name snapvm --ram 2048 --vcpus=1 --os-type=linux --disk pool=LVMpool_vg,size=15,bus=virtio --disk /var/lib/libvirt/images/config.img,device=cdrom --network network=kvm-br0 --graphics none --import --noautoconsole

4) Stop the VM

# virsh destroy snapvm

5) Download an Ubuntu cloud image, convert it to raw, and restore it into the LV used as a disk by our VM:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
# qemu-img convert ./bionic-server-cloudimg-amd64.img ./bionic-server-cloudimg-amd64.raw
# dd if=./bionic-server-cloudimg-amd64.raw of=/dev/LVMpool_vg/snapvm bs=8M conv=sparse

6) Start the VM and connect to it in another window

# virsh start snapvm

7) Check that the VM is using the LV as the disk:

root@xenial-apparmor:~# virsh domblklist snapvm
Target Source
------------------------------------------------
vda /dev/LVMpool_vg/snapvm
hda /var/lib/libvirt/images/config.img

8) Create a snapshot and check that the new domblklist points to the snapshot file:

# virsh snapshot-create-as --domain snapvm --diskspec vda,file=/var/lib/libvirt/images/xenial-snapvm.qcow2,snapshot=external --disk-only --atomic

root@xenial-apparmor:~# virsh domblklist snapvm
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/xenial-snapvm.qcow2
hda /var/lib/libvirt/images/config.img

9) Connect to your VM and start an I/O intensive job. In this case I'm starting a 'dd' writing zeroes to a file until it gets to 10GBs:

ubuntu@ubuntu:~$ dd if=/dev/zero of=file.txt count=1024 bs=10240000

10) Back on the host, monitor the snapshot file and let it grow until it is at least a bit more than 1GB, as in the example below (where the file has reached 3.9G):

root@xenial-apparmor:~# ls -lh /var/lib/libvirt/images/
total 5.2G
-rw-rw-r-- 1 libvirt-qemu kvm 329M Sep 3 03:18 bionic-server-cloudimg-amd64.img
-rw-r--r-- 1 root root 10G Sep 3 03:28 bionic-server-cloudimg-amd64.raw
-rw-rw-r-- 1 libvirt-qemu kvm 366K Sep 3 03:19 config.img
-rw------- 1 libvirt-qemu kvm 3.9G Sep 3 04:41 xenial-snapvm.qcow2

11) Start a blockcommit job with --active --verbose --pivot --wait; you will hit the error when the job gets to 100%:

root@xenial-apparmor:~# virsh blockcommit snapvm vda --active --verbose --pivot --wait
Block commit: [100 %]
error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

12) The blockjob will continue in the background, and its status increments:

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [0 %]

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [2 %]

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [6 %]

13) The blockjob then shows it is stuck at 100% until you --abort it:

root@xenial-apparmor:~# virsh blockjob snapvm vda --info
Active Block Commit: [100 %]

I have created a test package with the commits needed to solve the problem; it is available here:

https://launchpad.net/~mruffell/+archive/ubuntu/sf242822-test

What should happen:

If you install the test libvirt-bin and libvirt0 packages from the above PPA and run through the test case, blockcommit will not fail immediately when invoked; instead, it will continue until it reaches 100%, at which point the blockjob completes successfully.

[Regression Potential]

While four commits are required to fix this issue, all of them are fairly minor and only modify how the current status percentage is counted and how states are changed upon reaching 100% blockcommit. All changes are localised to one file.

Most of the commits are limited to blockcommit; in the event of a regression, only blockcommit and, by extension, some blockjobs would be impacted.

The commits have been present upstream for a long time, have been well tested by the community, and come from a release of libvirt with a very small delta to the one in xenial (1.3.2 versus 1.3.1). I believe there is little risk of regression.

[Other Info]

The following commits were identified in the upstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1197592

which are also listed in comment #6.

commit 86c4df83b913dd73b79caeed2038291374384dc5
Author: Michael Chapman <email address hidden>
Date: Wed Jan 27 13:24:54 2016 +1100
Subject: virsh: improve waiting for block job readiness

commit 8fa216bbb40df33e7fce5d727aa3dc334480878a
Author: Michael Chapman <email address hidden>
Date: Wed Jan 27 13:24:53 2016 +1100
Subject: virsh: ensure SIGINT action is reset on all errors

commit 15dee2ef24f2f19f6dcd30d997b81c8a14582361
Author: Michael Chapman <email address hidden>
Date: Wed Jan 27 13:24:52 2016 +1100
Subject: virsh: be consistent with style of loop exit

commit 704dfd6b0fafe7eafca93a03793389239f8ab869
Author: Michael Chapman <email address hidden>
Date: Wed Jan 27 13:24:51 2016 +1100
Subject: virsh: avoid unnecessary progress updates

These fix the problem and were introduced upstream in libvirt 1.3.2. All commits are clean cherry-picks, and the code is still present in B, D, E and F.
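For reference, applying those four commits (oldest first, per their author dates) onto the 1.3.1 source could be sketched as below. The repository URL is an assumption on my part, and the actual SRU carries the fixes as patches in debian/patches rather than via a direct git tree:

```shell
# Sketch: cherry-pick the four fixes, oldest first, onto the libvirt 1.3.1
# tree. Hash order follows the author dates listed above.
COMMITS="704dfd6b 15dee2ef 8fa216bb 86c4df83"   # oldest -> newest

backport_fixes() {
    # Assumed upstream mirror; the canonical location may differ.
    git clone https://gitlab.com/libvirt/libvirt.git &&
    cd libvirt &&
    git checkout v1.3.1 &&
    for c in $COMMITS; do
        git cherry-pick "$c" || return 1
    done
}
```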

Joshua Powers (powersj) on 2017-04-12
Changed in libvirt (Ubuntu):
status: New → Incomplete
status: Incomplete → New
Joshua Powers (powersj) wrote :

Hi and thanks for reporting this bug! I am going to see if someone else from the team can also take a look at this to see how big of a change this would require.

Also, sorry for marking this as incomplete and then new again as I was on the wrong tab.

many thanks!

On Wed, Apr 12, 2017 at 2:38 PM, Joshua Powers <email address hidden>
wrote:

> Hi and thanks for reporting this bug! I am going to see if someone else
> from the team can also take a look at this to see how big of a change
> this would require.
>
> Also, sorry for marking this as incomplete and then new again as I was
> on the wrong tab.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1681839
>
> Title:
> libvirt - disk not ready for pivot yet
>
> Status in libvirt package in Ubuntu:
> New
>
> Bug description:
> root@thewind:/home/bestpa/scripts# virsh blockcommit mail vda --active
> --verbose --pivot
> Block commit: [100 %]error: failed to pivot job for disk vda
> error: block copy still active: disk 'vda' not ready for pivot yet
>
> found related bugfix at redhat... can i get 1.3.2 pushed into ubuntu
> 16.04 release?
>
> bestpa@thewind:~$ cat /etc/os-release
> NAME="Ubuntu"
> VERSION="16.04.2 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.2 LTS"
>
> bestpa@thewind:~$ libvirtd --version
> libvirtd (libvirt) 1.3.1
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/
> 1681839/+subscriptions
>

Info on the repro I tried:
# create a simple system via uvtool-libvirt
$ uvt-kvm create [...]
$ virsh dumpxml <guest> > t1.xml
$ virsh undefine <guest>

# need to be transient for the blockcopy test
$ virsh create t1.xml

# Now we have a transient domain, and can copy them around:

$ virsh domblklist xenial-zfspool-libvirt
Target Source
------------------------------------------------
vda /var/lib/uvtool/libvirt/images/xenial-zfspool-libvirt.qcow
vdb /var/lib/uvtool/libvirt/images/xenial-zfspool-libvirt-ds-clone.qcow

# Since the referred bug reported this as being racy, I tried in a loop:
$ for idx in $(seq 1 20); do virsh blockcopy xenial-zfspool-libvirt vdb /var/lib/uvtool/libvirt/images/xenial-zfspool-libvirt-ds-clone${idx}.qcow --pivot --verbose --wait; done

It worked fine in 20/20 cases for me - I also checked on the bigger vda image but it worked as well.
That might only be due to less load, a smaller file, or whatever else defines the race window.
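A variant of that loop which stops at the first failure might make a race easier to catch in the act; a sketch, with the guest and path names taken from the attempt above and the function name being mine:

```shell
# run_until_failure: repeat the blockcopy up to $1 times and stop with a
# nonzero status as soon as one invocation fails, leaving the failing
# state in place for inspection with `virsh blockjob ... --info`.
run_until_failure() {
    max=$1 idx=1
    while [ "$idx" -le "$max" ]; do
        virsh blockcopy xenial-zfspool-libvirt vdb \
            "/var/lib/uvtool/libvirt/images/xenial-zfspool-libvirt-ds-clone${idx}.qcow" \
            --pivot --verbose --wait || {
                echo "failed at iteration $idx"
                return 1
            }
        idx=$((idx + 1))
    done
}
```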

You reported your issue on commit rather than copy, as in the RH bug.
So I looked into that more specifically.

$ virsh snapshot-create-as --domain testguest snap1 --diskspec vda,file=/var/lib/uvtool/libvirt/images/vda-snap1.qcow2 --disk-only --atomic --no-metadata
# touch a file in guest
$ virsh snapshot-create-as --domain testguest snap2 --diskspec vda,file=/var/lib/uvtool/libvirt/images/vda-snap2.qcow2 --disk-only --atomic --no-metadata
# touch a file in guest

This gave me a two-stage snapshot chain:
$ sudo qemu-img info --backing-chain /var/lib/uvtool/libvirt/images/vda-snap2.qcow2
image: /var/lib/uvtool/libvirt/images/vda-snap2.qcow2
[...]
backing file: /var/lib/uvtool/libvirt/images/vda-snap1.qcow2
[...]
image: /var/lib/uvtool/libvirt/images/vda-snap1.qcow2
[...]
backing file: /var/lib/uvtool/libvirt/images/testguest-clone5.qcow
backing file format: qcow2
[...]
image: /var/lib/uvtool/libvirt/images/testguest-clone5.qcow

Committing those onto the base worked as well:

virsh blockcommit testguest vda --active --verbose --pivot
Block commit: [100 %]
Successfully pivoted

In "virsh domblklist testguest" this moved me back from:
vda /var/lib/uvtool/libvirt/images/vda-snap1.qcow2
to
vda /var/lib/uvtool/libvirt/images/testguest-clone5.qcow

On the changes:
The first set of patches is in 1.2.18, so we already have those:
faa14391 virsh: Refactor block job waiting in cmdBlockCopy
74084035 virsh: Refactor block job waiting in cmdBlockCommit
2e782763 virsh: Refactor block job waiting in cmdBlockPull
eae59247 qemu: Update state of block job to READY only if it actually is ready

The second set went upstream in 1.3.2 and would need to be backported:
86c4df83 virsh: improve waiting for block job readiness
8fa216bb virsh: ensure SIGINT action is reset on all errors
15dee2ef virsh: be consistent with style of loop exit
704dfd6b virsh: avoid unnecessary progress updates

This set seems almost backportable at first look, but I didn't check all of the dependencies that are not so obvious.

On the original request: we won't just move to 1.3.2 in Xenial, as that would be against the SRU policy [1], which protects stability for many/all other use cases.
Instead, we could work on this together to backport, test, and verify a fix for Xenial.
Or you could use the Ubuntu Cloud Archive [2], which is like a special backport pocket that provides the latest cloud/virtualization-related packages. With that you could get the libvirt/qemu stack of Yakkety or Zesty and should be good as well.

If you pick the former, someone needs to find the time to create the backports.
I don't know if I will immediately get to this, given that it is a rather rarely used use-case and also only triggers on a race, as it seems to me. If you want to work with me on preparing it, I'll try to help as best I can - and there is also the USBSD [3] that might help if you are unsure.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates
[2]: https://wiki.ubuntu.com/OpenStack/CloudArchive
[3]: https://naccblog.wordpress.com/2017/03/24/usbsd-1-goals-inaugural-ubuntu-server-bug-squashing-day/

For now I'll mark it as incomplete, waiting for any further info you can provide.
To better triage and confirm your case, I'd like to understand whether:
- you can reliably trigger this (if you have steps to do so, please report them as well)
- it was a one-time failure
- it happens every now and then in your environment

Also, if you have identified anything about the creation of the images (size, format, ...) that affects the chance to reproduce, please let us know.

Changed in libvirt (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Patrick Best (bestpa) wrote :

Well, I'll be. It seems to work with no problem on subsequent tries.
Here's my methodology and run-through.

virsh # list
 Id Name State
----------------------------------------------------

 17 mail running

virsh #
virsh #
virsh #
virsh # domblklist mail
Target Source
------------------------------------------------
vda /images2/mail.img
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

virsh #
virsh #
virsh #
virsh #
virsh # list
 Id Name State
----------------------------------------------------

 17 mail running

virsh # domblklist mail
Target Source
------------------------------------------------
vda /images2/mail.img
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

virsh # snapshot-list mail
 Name Creation Time State
------------------------------------------------------------

virsh # snapshot-create-as --domain mail mail-snap1 --disk-only --atomic
Domain snapshot mail-snap1 created
virsh # snapshot-list mail
 Name Creation Time State
------------------------------------------------------------
 mail-snap1 2017-04-20 16:03:06 -0400 disk-snapshot

virsh # domblklist mail
Target Source
------------------------------------------------
vda /images2/mail.mail-snap1
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

virsh # blockcommit mail vda --active --verbose --pivot
Block commit: [100 %]
Successfully pivoted
virsh # domblklist mail
Target Source
------------------------------------------------
vda /images2/mail.img
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

virsh # snapshot-list mail
 Name Creation Time State
------------------------------------------------------------
 mail-snap1 2017-04-20 16:03:06 -0400 disk-snapshot

virsh # snapshot-delete mail mail-snap1 --metadata
Domain snapshot mail-snap1 deleted

virsh # snapshot-list mail
 Name Creation Time State
------------------------------------------------------------

virsh # domblklist mail
Target Source
------------------------------------------------
vda /images2/mail.img
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

So as you see, I had no problems doing this. I wish I could delete the snapshot files, but for some reason we can only use --metadata and must delete the files at the filesystem level.

Patrick Best (bestpa) wrote :

At this point I'll consider it a one-time occurrence, but I'm thinking it may have happened other times, given a jammed-up backup script I see once in a while. I don't wish to pursue it further until it's too infuriating; then I'll just run a host OS on a different long-term distro with a fresher version available.

Thanks for looking.

P

Thanks for reporting back.
There are a few races with block jobs that we are looking into at the moment which might well affect this.
Unfortunately - as it always is with races - they are hard to trigger/confirm, and I had hoped you might have found a case to reliably trigger it.

If you happen to find any coincidence like the jammed backup that helps to somewhat reliably recreate please reach out.

Otherwise, as I said, anything from Yakkety onward already has the fixes. And while this was not the purpose it was meant for, I've seen people choose [1] sometimes just to run the base LTS with a newer virt stack.

[1]: https://wiki.ubuntu.com/OpenStack/CloudArchive

Patrick Best (bestpa) wrote :

Happened again. Same VM, too. My backup script made it through 3 of these, one hda, and one vda as well. Then this:

-------------BEGIN backup for VM called mail
Sat Apr 22 00:07:51 EDT 2017
current snapshots mail - should be empty
 Name Creation Time State
------------------------------------------------------------

initial blklist and snapshot list mail
Target Source
------------------------------------------------
vda /images2/mail.img
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

 Name Creation Time State
------------------------------------------------------------

block type is vda
image location is /images2/mail.img
creating snapshot for mail
Domain snapshot mail-snap1 created
current snapshots for mail
 Name Creation Time State
------------------------------------------------------------
 mail-snap1 2017-04-22 00:07:52 -0400 disk-snapshot

  performing FIRST TIME SPARSE rsync for mail
mail.img

sent 77.70G bytes received 35 bytes 75.11M bytes/sec
total size is 77.68G speedup is 1.00
I am done with the rsync.
current blklist mail
Target Source
------------------------------------------------
vda /images2/mail.mail-snap1
hdb /home/bestpa/iso/ubuntu-14.04.5-server-amd64.iso

blockcommit mail
Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

Patrick Best (bestpa) wrote :

what's the proper way to keep running the base LTS with a newer virt stack? Do i need to point to a particular repo? Don't even know where to start with this...

Hi Patrick,
sorry to see you ran into it again - but for now I consider it a great chance to find something that allows us to reproduce and catch the issue.

You said:
"My backup script made it through 3 of these, one hda, and one vda as well. Then this"
It seems your backup script does:
1. check old snapshots
2. create a snapshot
3. copies off the now stable base image
4. blockcommits image and snapshot together
=> Is your comment saying that three of these cycles worked (like one a day or such), but then on the fourth you triggered the bug again?
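That four-step cycle could be sketched as below; this is a hypothetical reconstruction, since the actual script had not been posted at this point in the thread, and all names are illustrative:

```shell
# backup_cycle: hypothetical reconstruction of the backup cycle described
# above. Arguments: domain, disk target, base image path, backup destination.
backup_cycle() {
    dom=$1 disk=$2 base=$3 dest=$4
    virsh snapshot-list "$dom"                    # 1. check old snapshots
    virsh snapshot-create-as --domain "$dom" "${dom}-snap1" \
        --disk-only --atomic                      # 2. snapshot; writes go to overlay
    rsync --sparse "$base" "$dest"                # 3. copy the now-stable base image
    virsh blockcommit "$dom" "$disk" \
        --active --verbose --pivot                # 4. commit overlay back and pivot
    virsh snapshot-delete "$dom" "${dom}-snap1" --metadata
}
```

It is step 4 that fails intermittently in this report, leaving the guest running on the overlay.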

To improve the chance of recreating it, might I ask you a bunch of questions about your disk/system setup:
1. could you share your guest xml
2. could you share your backup script
3. could you elaborate on your base filesystem setup on the Host
4. is your system overall under a lot of CPU consumption - if so what kind of load?
5. is your system overall under a lot of Disk I/O - if so what kind of load?
6. is your guest that is failing under a lot of CPU consumption - if so what kind of load?
7. is your guest that is failing under a lot of Disk I/O - if so what kind of load?
8. Any changes coming to your mind that explain why this happens recently - is there any new HW/Software/Scripts or a changed workload in place now?

On your question about using the Ubuntu Cloud Archive: as I mentioned, people sometimes "mis-use" it just for a newer virtualization stack, but one has to keep in mind that this is not its original purpose.
If you want to give it a try to check whether the newer releases in there give you the stability you need for your use-case, go to [1]. It explains the basics; as a TL;DR, it is a special PPA [2]. Therefore, to "use" it you add that PPA, and an apt update/upgrade will then pull in the newer software packages. Given that you seem to be on a production system, you might want to test that ahead of time, almost as you would a major OS upgrade.

[1]: https://wiki.ubuntu.com/OpenStack/CloudArchive#The_Ubuntu_Cloud_Archive
[2]: https://help.launchpad.net/Packaging/PPA

Patrick Best (bestpa) wrote :
Download full text (16.1 KiB)

Happy to share what I can.

I should have mentioned that the backup script goes through all my VMs, and my ambiguous comment meant that it went through 3 of the VMs before stalling on this, the fourth. The system is low utilisation for RAM, CPU and disk: a ProLiant G5, dual-chip quad core (no HT), using a P400 RAID card (transparent to the system), with /images on a RAID-1 SATA spindle and /images2 on a RAID-1 SATA SSD. While there are some I/O wait indicators on my hypervisor (very low though), there's no steal time recorded on any of my VMs.
The ten or so VMs are low-utilisation, administrative machines (my mail server, a zabbix server, a landscape server, etc). The system is LTS with no tweaks, kept up to date on a regular basis.

/backup has been an NFS mount point and an external USB drive; I witnessed the failure condition on both.
The failing VM is on my SSD RAID drive at /images2.

Smart Array P400 in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: PA2240J9SU5360
   Cache Serial Number: PA2270D9SU21FK
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.18
   Rebuild Priority: Low
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Elevator Sort: Enabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 100% Read / 0% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   Total Cache Memory Available: 464 MB
   No-Battery Write Cache: Disabled
   Battery/Capacitor Count: 0
   SATA NCQ Supported: False
   Number of Ports: 2 Internal only
   Driver Name: cciss
   Driver Version: 3.6.26
   PCI Address (Domain:Bus:Device.Function): 0000:06:00.0
   Host Serial Number: 2UX70501S6
   Sanitize Erase Supported: False

   Array: A
      Interface Type: SATA
      Unused Space: 0 MB (0.0%)
      Used Space: 931.5 GB (100.0%)
      Status: OK
      Array Type: Data

   Array: B
      Interface Type: SATA
      Unused Space: 0 MB (0.0%)
      Used Space: 447.1 GB (100.0%)
      Status: OK
      Array Type: Data

      logicaldrive 1 (465.7 GB, RAID 1, OK)
      logicaldrive 2 (223.5 GB, RAID 1, OK)
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SATA, 500 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SATA, 500 GB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SATA, 240.0 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SATA, 250 GB, OK)

root@thewind:~#

root@thewind:~# top
top - 09:10:18 up 38 days, 14:09, 1 user, load average: 4.81, 4.53, 4.60
Tasks: 280 total, 1 running, 279 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.4 us, 13.1 sy, 0.0 ni, 75.5 id, 1.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 64943112 total, 394580 free, 25359596 used, 39188936 buff/cache
KiB Swap: 66056188 total, 64023212 free, 2032976 used. 37839620 avail Mem

root@thewind:~# cat /proc/cpuinfo | grep Xe
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
model name : Intel(R) Xe...

Thanks Patrick,
unfortunately I haven't found anything in there that was missing from my attempt to recreate :-/

Btw: I think this should be $i and not mail, although if the guests are all the same it doesn't matter:
"IMAGE_DIR=`virsh domblklist mail" -> "IMAGE_DIR=`virsh domblklist $i"

What is the amount of data you sync each time - maybe it is far bigger in your case than in mine.
Do you happen to know how much that is? I mean, we only talk about the I/O that sums up while you do the backup; how long is that - a few minutes? It should not be too much, right?

If you end up trying the newer libvirt, let me know if that solves your issue at least.
There would also be a way in between - I have a ppa where we tried to backport some of the blockjob changes here:
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2619

Chances are that this might affect you, yet OTOH, since this is your production system, I'd not use it as it is experimental. That said, while I fail to reproduce: do you have a test env where this triggers as well and which you could use to try such experimental libvirt packages?

kritek (kritek) wrote :

I can reproduce this reliably on Server 16.04.3 LTS.

virsh version
Compiled against library: libvirt 1.3.1
Using library: libvirt 1.3.1
Using API: QEMU 1.3.1
Running hypervisor: QEMU 2.5.0

I have 4 VMs; the one that consistently fails is write-heavy - it's a carbon/graphite server.
Steps to reproduce:

virsh snapshot-create-as --domain centos7-graphite centos7-graphite-SNAP1 --diskspec vda,file=/var/lib/libvirt/images/centos7-graphite.img-SNAP1 --disk-only --atomic

sleep 300 (approximate time of rsync of base img to destination)

virsh blockcommit centos7-graphite vda --active --pivot --shallow --verbose
This is where it fails:

Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

I end up having to shut down the VM, delete the snapshot metadata, delete the disk attachment (the SNAP disk), re-attach the original disk, and then boot the VM again to restore it.
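The manual recovery described above might be sketched as follows; this is a hypothetical function of mine, the paths follow the naming in the reproduction steps, and details like the disk subdriver would need checking against the actual guest:

```shell
# recover_guest: sketch of the manual recovery described above - shut the
# guest down, drop the snapshot metadata, swap the SNAP overlay back for
# the original disk, and boot again. Names are illustrative.
recover_guest() {
    dom=$1
    virsh destroy "$dom"                                    # shut down the VM
    virsh snapshot-delete "$dom" "${dom}-SNAP1" --metadata  # drop snapshot metadata
    virsh detach-disk "$dom" vda --config                   # remove the SNAP disk
    virsh attach-disk "$dom" \
        "/var/lib/libvirt/images/${dom}.img" vda \
        --config --subdriver raw                            # re-attach original disk
    virsh start "$dom"                                      # boot the VM again
}
```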

I increased the write load in my reproducer, but still can't trigger it here :-/

Did you have any chance to try the Cloud Archive versions mentioned in c#13, or (I know it is based on an older version, but you could force it in) the ppa from c#15?

From the comments it seems this is your production environment; any chance you can set up an equivalent test environment to test the packages I mentioned, without causing too much trouble for your main load?

Dominik Psenner (dpsenner) wrote :

We see this same issue on one of our production systems. The live backup scripts fail every few days, and it is then necessary to manually run a blockjob abort; a subsequent blockcommit usually passes. The backup scripts can be found here: https://github.com/dpsenner/libvirt-administration-tools

On <email address hidden> a dev suggested upgrading libvirt to a newer version. He indicated that virsh 1.3.1 is ancient and that artful is actually already at 3.6. It would seem that they have addressed several issues and fixed several race conditions, as indicated in earlier comments. Unfortunately there's no way to upgrade the production systems' OS to a newer Ubuntu release just for gigs. It would however help if a newer version of libvirt were backported to 16.04 LTS. Are there any dependency issues that prevent the backport of a newer libvirt?

Hi Dominik,
thanks for the links.

Yes, 1.3.1 is ancient in the sense of "as old as the 16.04 Ubuntu release", plus fix backports as they are identifiable and qualify for the SRU process [1].

We have three options here, but at the moment not all are feasible:

1. Backport the fix to Xenial
I beg your pardon, but for this particular case I have so far been unable to recreate it in order to debug further on my side or to identify the fix it would need for the SRU - I provided a test ppa in c#15 to help with that, but I understand that it can be unwanted to shove that onto production systems to test.

2. Update the packages in Xenial to newer versions
The complexity of the virtualization stack (and all the potential regressions from just upgrading the versions for everyone out there) can be high, so simply bumping those in Xenial to the level we have in e.g. artful does not qualify for an SRU update.

3. Use a backport
Since the virtualization stack is one of those places where the "newer vs stable" issue comes up often, and even more so because newer OpenStack releases should go along with a newer virt stack, there is a way out, just as you assume. Via the Ubuntu Cloud Archive [2] you can get access to a backport of the most recent versions on the latest LTS. While it wasn't created for that purpose, I'd think this is the "backport" to 16.04 LTS you might be looking for.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates
[2]: https://wiki.ubuntu.com/OpenStack/CloudArchive

Dominik Psenner (dpsenner) wrote :

Hi Christian,

thanks for your insights. The Ubuntu Cloud Archive is completely new to me. Am I right in the assumption that adding the ocata cloud-archive repository with 'sudo add-apt-repository cloud-archive:ocata' and a subsequent 'apt update && apt upgrade' would effectively upgrade libvirt to 2.5.0, respectively 3.5.0 if I would add cloud-archive:pike? What implications does this have when upgrading the production system to a newer LTS in roughly two years? Will a dist-upgrade even work out fine without actually bashing the production system or would you advice to plan the reinstallation of the hosting machine from scratch?

pike would be 3.6, not 3.5 - other than that, yes.

In general upgrading from 16.04+UCA-Pike -> 18.04 shouldn't be very different to 17.10->18.04.
It is supposed to work.

As you know, it is generally good advice to go with test systems, backups, phased upgrades, ... as there always could be something - but in general, yeah, there is nothing blocking your usual upgrade path.

If you want to test how that will be when 18.04 is out: on a test system, take Trusty + UCA-Mitaka (which is on the level of 16.04) and then upgrade to Ubuntu 16.04.
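The release-to-version mapping discussed in this thread could be captured in a small helper; the versions below are as quoted here and not independently verified against the archive, and the enablement commands are the standard Cloud Archive flow:

```shell
# uca_libvirt_version: hypothetical helper encoding the UCA-release ->
# libvirt-version mapping as quoted in this thread.
uca_libvirt_version() {
    case "$1" in
        mitaka) echo 1.3.1 ;;   # "on the level of 16.04", per the comment above
        ocata)  echo 2.5.0 ;;
        pike)   echo 3.6 ;;
        *)      return 1 ;;
    esac
}

# Enabling a pocket is then, e.g. (requires root; not run here):
#   sudo add-apt-repository cloud-archive:pike
#   sudo apt update && sudo apt upgrade
```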

Dominik Psenner (dpsenner) wrote :

Thanks for pointing out that pike would be 3.6. To me it is still hard to track which version each UCA release includes, because those resources are actually quite hard to find.

Given the proposed solution [3], do you consider the Ubuntu Cloud Archive package repository the recommended way of getting a more recent libvirt package on an Ubuntu LTS? Or would you rather recommend a dist-upgrade to the next stable Ubuntu release? As of today that would be 17.10. Of course, that would require us to schedule major updates more frequently.

<personal_opinion>
On "static" systems I'm usually a slow upgrader; on other systems I use daily cloud images right away.
So if you have a complex (custom/manual) setup, I'd likely go with LTS+UCA.
It means fewer major changes to your system than doing a release upgrade every 6 months, but would keep your virt stack up to date.
</personal_opinion>

Note: I'm not sure whether for official support (read: Ubuntu Advantage) there are special constraints around that.

falstaff (falstaff) wrote :

Observed the same issue on Ubuntu 16.04.4 with a Dell R440 and a RAID 5 consisting of 3 10k SAS disks. Using 16.04+UCA-Pike resolved the issue just fine.

I’ve given up on qcow and on ubuntu for my hypervisor needs. See ya!


Thanks Falstaff - yes, we knew the fixes are in later releases; they were just hard to backport while keeping the general regression risk low (for all other users).
UCA, as you used it, is a valid way to get fixes onto the last LTS ahead of time. Thanks for verifying this again.

@bestpa - sad to hear :-/ but see ya another day on another case.

Changed in libvirt (Ubuntu Bionic):
status: Incomplete → Fix Released
Changed in libvirt (Ubuntu Artful):
status: New → Fix Released
Changed in libvirt (Ubuntu Xenial):
status: New → Won't Fix
summary: - libvirt - disk not ready for pivot yet
+ libvirt: blockcommit fails - disk not ready for pivot yet
description: updated
tags: added: sts
Changed in libvirt (Ubuntu Xenial):
status: Won't Fix → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
Matthew Ruffell (mruffell) wrote :

Attached is the debdiff for xenial to fix this issue.

I was not sure if the patches in debian/patches should be placed in the debian/patches/ubuntu directory or not, so I left them outside. Feel free to move them if necessary.

Thanks++
Now that I had (new) steps to reproduce, I could work on those.
I wondered if an LVM is really strictly needed - dropping it would also ease the initialization.
So I simplified the steps to:

$ apt install uvtool-libvirt
$ uvt-simplestreams-libvirt sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=xenial
$ uvt-kvm create xsnaptest arch=amd64 release=xenial label=daily
# depending on your apparmor config you might want to add something like this TEMPORARY to /etc/apparmor.d/abstractions/libvirt-qemu '/var/lib/uvtool/libvirt/images/* rwk,'
$ virsh snapshot-create-as --domain xsnaptest --diskspec vda,file=/var/lib/libvirt/images/xsnaptest-snapshot.qcow2,snapshot=external --disk-only --atomic

I started a pair of loops: one dirtying the snapshot, the other committing and pivoting it.
# make dirty:
$ while /bin/true; do uvt-kvm ssh --insecure xsnaptest "dd if=/dev/urandom of=file.txt count=4096 bs=1M"; done
# snapshot, wait and pivot blockcommit
$ while virsh blockcommit xsnaptest vda --active --verbose --pivot --wait; do rm /var/lib/libvirt/images/xsnaptest-snapshot.qcow2; sleep 2s; virsh snapshot-create-as --domain xsnaptest --diskspec vda,file=/var/lib/libvirt/images/xsnaptest-snapshot.qcow2,snapshot=external --disk-only --atomic; sleep $(( RANDOM % 30 ))s; ll -h /var/lib/libvirt/images/xsnaptest-snapshot.qcow2; done

The snapshots to commit were about 200M to 6.9G, but none triggered the issue (about 40 tries in the loop).
So maybe it really only happens (or is much more likely) when the original backing image being written back is an LVM volume.
Glad you found that - it makes your test a reliable reproducer.

For the sake of seeing it trigger at least once, I redeployed a machine with Xenial and created LVMs on a free /dev/sdb disk, as your example needs.
# create guest
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=xenial
$ uvt-kvm create xsnaptest arch=amd64 release=xenial label=daily

# create Volume
$ sudo pvcreate /dev/sdb
$ sudo vgcreate LVMpool_vg /dev/sdb
$ cat > lvmpool.xml <<EOF
<pool type="logical">
<name>LVMpool_vg</name>
<source>
<device path="/dev/sdb"/>
</source>
<target>
<path>/dev/LVMpool_vg</path>
</target>
</pool>
EOF
$ virsh pool-define lvmpool.xml
$ virsh pool-start LVMpool_vg
$ virsh vol-create-as LVMpool_vg lvvol1 15G

# Use volume in the guest
$ cat > lvmdisk.xml <<EOF
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/LVMpool_vg/lvvol1'/>
  <target dev='vdc' bus='virtio'/>
</disk>
EOF
$ virsh attach-device xsnaptest lvmdisk.xml

# Prep initial snapshot
$ virsh snapshot-create-as --domain xsnaptest --diskspec vdc,file=/var/lib/libvirt/images/xsnaptest-snapshot.qcow2,snapshot=external --disk-only --atomic

# Check snapshot being backed by lvmdisk
$ sudo qemu-img info /var/lib/libvirt/images/xsnaptest-snapshot.qcow2
image: /var/lib/libvirt/images/xsnaptest-snapshot.qcow2
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 196K
cluster_size: 65536
backing file: /dev/LVMpool_vg/lvvol1
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# dump I/O onto that device from inside the guest
$ while /bin/true; do uvt-kvm ssh --insecure xsnaptest "sudo dd if=/dev/urandom of=/dev/vdc count=8192 bs=1M"; done

# Iterate on it while the disk/snapshot keeps getting dirty
$ while virsh blockcommit xsnaptest vdc --active --verbose --pivot --wait; do sudo rm /var/lib/libvirt/images/xsnaptest-snapshot.qcow2; sleep 2s; virsh snapshot-create-as --domain xsnaptest --diskspec vdc,file=/var/lib/libvirt/images/xsnaptest-snapshot.qcow2,snapshot=external --disk-only --atomic; sleep $(( RANDOM % 30 + 20 ))s; sudo ls -laFh /var/lib/libvirt/images/xsnaptest-snapshot.qcow2; done

Finally I saw it in action:
Block commit: [100 %]error: failed to pivot job for disk vdc
error: block copy still active: disk 'vdc' not ready for pivot yet

I retried and this was reproducible.

I upgraded to the PPA (more about that later) and ran my loop.
It reached 100% and then the pivot itself got slow (due to the ongoing I/O).
I needed to either wait quite a while or slow down the ongoing I/O a bit.

I ran the loop about 10 times, and with the fix it never failed again (snapshots sized between 519M and 7.1G).
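Since the pivot can lag well behind the reported 100 % under sustained guest I/O, a tiny helper can check whether a `virsh blockjob ... --info` line actually reports completion before a pivot is attempted. This is a hypothetical sketch, not part of the scripts above; `job_ready` and the sample strings are illustrative:

```shell
# Hypothetical helper: succeed once a `virsh blockjob ... --info`
# line reports 100 %, i.e. the job has converged enough to try a pivot.
job_ready() {
  case "$1" in
    *"[100 %]"*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Usage with captured output; in practice the argument would come from
# "$(virsh blockjob xsnaptest vdc --info)".
if job_ready "Active Block Commit: [100 %]"; then echo ready; else echo waiting; fi
```

Note that, as this bug shows, on unfixed xenial even a job at 100 % may refuse the pivot while new writes keep arriving, so this check reduces but does not eliminate the race.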

Hi Matthew,
thanks for picking up the torch again on this issue, which affected quite a few people but so far never reached a state where it was really fixable.

## Verification ##
Most important was getting a reproducer for test and verification.
This was formerly a big issue - it affected plenty of people on the bug - but we never reached a state where we could reliably reproduce it to verify a fix. I have read your update to the description - thanks for adding all that.
I gave the new repro steps outlined there a try, and as you have seen above I can confirm that they are good \o/

## Patches ##
You included 4 patches which exactly match what I identified a while ago - thanks.
As I said back then in comment #6, they seemed rather backportable; thanks for doing that in the attached debdiff.
They are missing proper DEP-3 tagging, but I can fix that up ahead of sponsoring for you.
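For reference, DEP-3 tagging means each file in debian/patches carries a header block along these lines (values here are illustrative, not the actual headers used):

```
Description: blockjob: don't time out once the commit has reached 100%
Origin: upstream, <commit hash/url>
Bug-Ubuntu: https://bugs.launchpad.net/bugs/1681839
Last-Update: 2019-10-31
```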

## Regression Risk ##
I don't fully agree with the regression assessment. You didn't say anything wrong, but from lessons learned in the past, blockjobs have turned out to be a source of unexpected and sometimes strange regressions.
I agree that it should (tm) be safe, but we should be extra cautious as well.
Once it is in -proposed we'll want more time there and should probably do some extra tests.

## Testing in proposed ##
1) I can provide some regression testing on my own, with a focus on (but not exclusively) migration. My tests aren't that heavy on snapshots, which is where this change has the biggest chance of an impact.

2) @Matthew - if you could provide more tests (maybe SEG has some on top) for regressions in general, that would be great.

3) @Matthew - we might consider asking e.g. the OpenStack team to run a test set on it as well, just to be on the safe side. Will you ping and ask them, or should I?

## PPA ##
The old PPA I had is long dead.
I opened a new one (like yours but with my minimal patch header updates and builds on all architectures) at:
=> https://launchpad.net/~paelzer/+archive/ubuntu/bug-1681839-blockjob-timeout-xenial/+packages

## Sponsoring ##
This LGTM as-is from the patches, but as mentioned we should do tests 1+2+3.
The SRU team can already take a look at accepting it; we can test either from the PPA or against xenial-proposed once accepted. Only the actual verification of the case strictly has to be done on -proposed.

Tagged and sponsored to Xenial-unapproved.
Now it is up to the SRU Team.

@Matthew - please try to get as much testing in place as possible.
As I said, all but the final verification can be done either on the PPA in advance or once it is in -proposed - whatever fits your time and setup.

I'll setup a test on my own as I mentioned ...

FYI: my pre-checks on the PPA build 1.3.1-1ubuntu10.29~ppa1 look good.

prep (x86_64) : Pass 25 F/S/N 0/0/0 - RC 0 (17 min 84141 lin)
migrate (x86_64) : Pass 232 F/S/N 0/12/0 - RC 0 (63 min 104302 lin)
cross (x86_64) : Pass 64 F/S/N 0/1/0 - RC 0 (70 min 94458 lin)
misc (x86_64) : Pass 76 F/S/N 0/0/0 - RC 0 (18 min 29606 lin)

Hello Patrick, or anyone else affected,

Accepted libvirt into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/1.3.1-1ubuntu10.29 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
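For anyone verifying: a sketch (assuming a Xenial host) of enabling -proposed and installing just the fixed build, so nothing else is pulled from -proposed; remove the source entry again afterwards:

```shell
# Sketch, assuming Ubuntu 16.04: enable xenial-proposed and install only
# the fixed libvirt build; drop the sources entry again when done.
echo "deb http://archive.ubuntu.com/ubuntu xenial-proposed main" | \
  sudo tee /etc/apt/sources.list.d/xenial-proposed.list
sudo apt-get update
sudo apt-get install libvirt-bin=1.3.1-1ubuntu10.29
```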

Changed in libvirt (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial

I thank you for all your work.

I have since moved away from this block architecture and am no longer able
to verify with an existing configuration.


@mruffell / @fabiomartins - would you be so kind as to do the SRU verification on this one?

Matthew Ruffell (mruffell) wrote :

The following is verification performed by Fabio in a lab:

- Tested with the original libvirt to make sure I was able to reproduce:

root@ubuntu:~# apt-cache policy libvirt-bin
libvirt-bin:
Installed: 1.3.1-1ubuntu10.27
Candidate: 1.3.1-1ubuntu10.27
Version table:
*** 1.3.1-1ubuntu10.27 500
500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
100 /var/lib/dpkg/status
1.3.1-1ubuntu10 500
500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages

root@ubuntu:~# virsh blockcommit testvm vda --active --verbose --pivot --wait
Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

- Upgraded to proposed and tested again, and problem is gone:

root@ubuntu:~# apt-cache policy libvirt-bin
libvirt-bin:
Installed: 1.3.1-1ubuntu10.29
Candidate: 1.3.1-1ubuntu10.29
Version table:
*** 1.3.1-1ubuntu10.29 500
500 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 Packages
100 /var/lib/dpkg/status
1.3.1-1ubuntu10.27 500
500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
1.3.1-1ubuntu10 500
500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages

root@ubuntu:~# virsh blockcommit testvm vda --active --verbose --pivot --wait
Block commit: [100 %]
Successfully pivoted

End of test by Fabio.

The package is looking good. We have also asked the customer to install the test package and verify that it works under their workload. We might just wait for their confirmation before marking this as verified, in order to give this a little more time to soak in -proposed.

Will update again soon.

Matthew Ruffell (mruffell) wrote :

The customer has been unresponsive in testing the package in -proposed in their environment, so we will move on with verification.

In my previous comment on 2019-11-27, we showed that libvirt 1.3.1-1ubuntu10.29 can successfully execute a blockcommit on an LVM-backed volume with virsh.

This still holds today, and this bug has had ample time to soak in -proposed, so I am happy to mark it as verified.

tags: added: verification-done-xenial
removed: verification-needed verification-needed-xenial

The verification of the Stable Release Update for libvirt has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.3.1-1ubuntu10.29

---------------
libvirt (1.3.1-1ubuntu10.29) xenial; urgency=medium

  * debian/patches/lp1681839-*.patch: Fix block commit timeout
    races, and ensure that once commit has reached 100%, timeouts
    no longer apply. (LP: #1681839)

 -- Matthew Ruffell <email address hidden> Thu, 31 Oct 2019 10:52:41 +1300

Changed in libvirt (Ubuntu Xenial):
status: Fix Committed → Fix Released