qcow2 image corruption on non-extent filesystems (ext3)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | | High | Chris J Arges | |
| Trusty | | High | Chris J Arges | |
| Vivid | | High | Unassigned | |
| linux-lts-utopic (Ubuntu) | | Undecided | Unassigned | |
| Trusty | | High | Unassigned | |
Bug Description
[Impact]
Users of non-extent ext4 filesystems (ext4 ^extents, or ext3 w/ CONFIG_
[Test Case]
1) Setup ext4 ^extents, or ext3 filesystem with CONFIG_
2) Create and install a VM using a qcow2 image and store the file on the filesystem
3) Snapshot the image with qemu-img
4) Boot the image and do some disk operations (fio, etc.)
5) Shut down the image and delete the snapshot
6) Repeat 3-5 until the VM no longer boots due to image corruption. This generally takes a few iterations, depending on the disk operations.
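Steps 3-5 of the test case can be sketched as a single shell function. This is only an illustration: the `QEMU_IMG` override and the `exercise_vm` hook are hypothetical names that do not appear in the report, and the hook must be replaced with whatever boots the guest and generates disk I/O.

```bash
#!/bin/bash
# Sketch of one iteration of steps 3-5. QEMU_IMG and exercise_vm are
# illustrative assumptions, not part of the original report.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Placeholder for step 4: replace with code that boots the guest, runs
# disk operations, shuts it down, and returns non-zero if it fails to boot.
exercise_vm() {
    echo "TODO: boot $1 and generate disk I/O" >&2
}

snapshot_cycle() {
    local image=$1 snap=$2
    "$QEMU_IMG" snapshot -c "$snap" "$image" || return 1   # step 3: create snapshot
    exercise_vm "$image" || return 2                        # step 4: boot and do I/O
    "$QEMU_IMG" snapshot -d "$snap" "$image" || return 3   # step 5: delete snapshot
}
```

Calling `snapshot_cycle` in a loop until it returns non-zero corresponds to step 6, provided the `exercise_vm` replacement reports a guest that no longer boots.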
[Fix]
commit 6f30b7e37a8239f
This has been discussed upstream here:
http://
A temporary fix would be to disable punch_hole for non-extent filesystems. This is how the normal ext3 module handles it, leaving userspace to handle the failure. With this change I was able to run the test case for 600 iterations over 3 days, whereas most failures otherwise occur within the first 2-20 iterations.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5653fa4..e14cdfe 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3367,6 +3367,10 @@ int ext4_punch_
offset, loff_t length)
unsigned int credits;
int ret = 0;
+ /* EXTENTS required */
+ if (!(ext4_
+ return -EOPNOTSUPP;
+
if (!S_ISREG(
return -EOPNOTSUPP;
--
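With the temporary fix, a punch-hole request on a non-extent file fails with EOPNOTSUPP and userspace has to cope. A minimal sketch of probing for this from a shell follows, assuming the util-linux `fallocate` tool is installed; the function name and the probe file name are made up for illustration.

```bash
#!/bin/bash
# Probe whether hole punching works for files created in directory $1.
# Returns 0 if it works, 1 if the fallocate call fails (for example with
# "Operation not supported"), 2 if the probe itself cannot be run.
supports_punch_hole() {
    local dir=${1:-.} probe rc
    command -v fallocate >/dev/null 2>&1 || return 2
    probe=$(mktemp "$dir/punch-probe.XXXXXX") || return 2
    # Write 64 KiB so there is an allocated range to deallocate.
    head -c 65536 /dev/zero > "$probe"
    if fallocate --punch-hole --offset 0 --length 65536 "$probe" 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -f "$probe"
    return "$rc"
}
```

Running `supports_punch_hole /var/lib/libvirt/images` (path given only as an example) before and after installing a patched kernel shows whether the filesystem still accepts the operation.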
The security team uses a tool (http://
qemu-kvm 2.0~git-
$ cat /proc/version_
Ubuntu 3.13.0-
$ qemu-img info ./forhallyn-
image: ./forhallyn-
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 4.0G
cluster_size: 65536
Format specific information:
compat: 0.10
Steps to reproduce:
1. create a virtual machine. For a simplified reproducer, I used virt-manager with:
OS type: Linux
Version: Ubuntu 14.04
Memory: 768
CPUs: 1
Select managed or existing (Browse, new volume)
Create a new storage volume:
qcow2
Max capacity: 8192
Allocation: 0
Advanced:
NAT
kvm
x86_64
firmware: default
2. install a VM. I used trusty-
3. Backup the image file somewhere since steps 1 and 2 take a while :)
4. Execute the following commands which are based on what our uvt tool does:
$ virsh snapshot-create-as forhallyn-
$ virsh snapshot-current --name forhallyn-
pristine
$ virsh start forhallyn-
$ virsh snapshot-list forhallyn-
in guest:
sudo apt-get update
sudo apt-get dist-upgrade
780 upgraded...
shutdown -h now
$ virsh snapshot-delete forhallyn-
$ virsh snapshot-create-as forhallyn-
$ virsh start forhallyn-
The idea behind the above is to create a new VM with a pristine snapshot that we could revert later if we wanted. Instead, we boot the VM, run apt-get dist-upgrade, cleanly shut down and then remove the old 'pristine' snapshot and create a new 'pristine' snapshot. The intention is to update the VM and the pristine snapshot so that when we boot the next time, we boot from the updated VM and can revert back to the updated VM.
After running 'virsh start' after doing snapshot-
This does not seem to be related to the machine type used. Ie, pc-i440fx-1.5, pc-i440fx-1.7 and pc-i440fx-2.0 all fail with qemu 2.0, pc-i440fx-1.5 and pc-i440fx-1.7 fail with qemu 1.7 and pc-i440fx-1.5 works fine with qemu 1.5.
The only workaround I know of is to downgrade qemu to 1.5.0+dfsg-
| summary: |
- qcow2 image corruption in trusty (qemu 1.7) + qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) |
| Changed in qemu (Ubuntu): | |
| importance: | Undecided → High |
| Serge Hallyn (serge-hallyn) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #1 |
| Jamie Strandboge (jdstrand) wrote : | #2 |
I don't, I just used the options that our uvt command uses. I downgraded to saucy's qemu in the meantime so I can do my work. Do you need me to try some new test?
I'm not sure it makes any difference, but note that I am using a trusty host and kernel.
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #3 |
Quoting Jamie Strandboge (<email address hidden>):
> I don't, I just used the options that our uvt command uses. I downgraded
> to saucy's qemu in the meantime so I can do my work. Do you need me to
> try some new test?
sigh, maybe.
I will keep trying.
> I'm not sure it makes any difference, but note that I am using a trusty
> host and kernel.
Right, that's what I'm using.
Have others on your team (who are not on the same thinkpad model :) seen
this as well? Have you seen it on different types of machines? Does
it happen more often if the machine is already working hard?
I wonder if I can reproduce it manually with qemu-img and qemu-nbd.
| Jamie Strandboge (jdstrand) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #4 |
Did you try with the image on https:/
Ie:
$ virsh snapshot-create-as forhallyn-
$ virsh snapshot-current --name forhallyn-
pristine
$ virsh start forhallyn-
$ virsh snapshot-list forhallyn-
in guest:
sudo apt-get update
sudo apt-get dist-upgrade
780 upgraded...
shutdown -h now
$ virsh snapshot-delete forhallyn-
$ virsh snapshot-create-as forhallyn-
$ virsh start forhallyn-
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #5 |
Quoting Jamie Strandboge (<email address hidden>):
> Did you try with the image on
> https:/
Yup! I wget that, create the snapshot, upgrade, remove and create
the snapshot, then start the vm. The upgrades take a long time
so I've only tested it 3 times so far. How likely is the failure?
Should I just keep going?
| Serge Hallyn (serge-hallyn) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #6 |
I've not yet been able to definitively reproduce this. (On a bad nested qemu setup I had some issues which I think were unrelated.) I've tried on a trusty laptop, and on a faster machine with a trusty container on a trusty kernel, starting with the images you posted for me each time.
| Seth Arnold (seth-arnold) wrote : | #7 |
I believe I just tripped this bug; I compressed some qcow2 images using this:
for f in sec-{lucid,
do echo $f ;
qemu-img convert -s pristine -p -f qcow2 -O qcow2 $f.qcow2 reclaimed.qcow2 ;
mv reclaimed.qcow2 $f.qcow2 ;
virsh snapshot-delete $f --snapshotname pristine ;
uvt snapshot $f ;
done
The 'uvt snapshot' command makes a snapshot named 'pristine'.
AMD64 guests:
sec-lucid-amd64 booted without trouble.
sec-precise-amd64 reports:
Booting from Hard Disk...
Boot failed: not a bootable disk
No bootable device.
sec-quantal-amd64 reports:
Booting from Hard Disk...
error: file `/boot/
grub rescue>
sec-saucy-amd64 reports:
Booting from Hard Disk...
error: file `/boot/
Entering rescue mode...
grub rescue>
sec-trusty-amd64 reports:
Booting from Hard Disk...
Boot failed: not a bootable disk
No bootable device.
i386 guests:
sec-lucid-i386, sec-precise-i386, sec-quantal-i386, sec-saucy-i386 all booted fine.
sec-trusty-i386 reports:
Booting from Hard Disk...
Boot failed: not a bootable disk
No bootable device.
I use the i386 VMs significantly less often than the amd64 VMs.
| Launchpad Janitor (janitor) wrote : | #8 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in qemu (Ubuntu): | |
| status: | New → Confirmed |
| Jamie Strandboge (jdstrand) wrote : | #9 |
FYI, I periodically use and follow the same procedure that Seth described (in fact, I did it yesterday) and had no problems with qemu 1.5.0+dfsg-
| description: | updated |
| Changed in qemu (Ubuntu): | |
| assignee: | nobody → Serge Hallyn (serge-hallyn) |
| Serge Hallyn (serge-hallyn) wrote : | #10 |
I have a clean install of trusty on an intel laptop. I added the following upstart job in the forhallyn-
#######
description "update and shutdown"
author "Serge Hallyn <email address hidden>"
start on runlevel [2345]
script
sleep 5s
apt-get update
DEBIAN_
sleep 5s
shutdown -h now
end script
#######
Then on the host I run this script:
#######
#!/bin/bash
cp orig-with-
virsh snapshot-create-as forhallyn-
virsh start forhallyn-
sleep 20s
while [ 1 ]; do
virsh list | grep -q forhallyn || break
sleep 20s
done
# guest has updated. check the image file and fs here
qemu-img check forhallyn-
if [ $? -ne 0 ]; then
echo "image check failed after shutdown"
exit 1
fi
qemu-nbd -c /dev/nbd0 forhallyn-
fsck -a /dev/nbd0p1
if [ $? -ne 0 ]; then
echo "fs bad after shutdown"
qemu-nbd -d /dev/nbd0
exit 1
fi
qemu-nbd -d /dev/nbd0
# now tweak the snapshots
virsh snapshot-delete forhallyn-
virsh snapshot-create-as forhallyn-
# and check the image file and fs again
qemu-img check forhallyn-
if [ $? -ne 0 ]; then
echo "image check failed after snapshot remove/create"
exit 1
fi
qemu-nbd -c /dev/nbd0 forhallyn-
fsck -a /dev/nbd0p1
if [ $? -ne 0 ]; then
echo "fs bad after snapshot remove/create"
qemu-nbd -d /dev/nbd0
exit 1
fi
qemu-nbd -d /dev/nbd0
# all seems well
exit 0
#######
I'll run that in a loop and see if it fails after 10 tries.
If you see anything there that I am NOT doing which would help to reproduce,
please let me know.
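Running the host script "in a loop" until it fails can be sketched with a small wrapper. The wrapper name is made up for illustration, and it assumes the host script above has been saved as an executable file that exits non-zero on failure, as it does.

```bash
#!/bin/bash
# Run a command up to $1 times and stop at the first failure.
# Prints which iteration failed and returns 1, or returns 0 if every
# iteration succeeded.
run_until_failure() {
    local max=$1 i
    shift
    for (( i = 1; i <= max; i++ )); do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
    done
    echo "no failure in $max iterations"
    return 0
}
```

For example, `run_until_failure 10 ./check-cycle.sh`, where `check-cycle.sh` is a hypothetical file name for the host script.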
| tags: | added: qcow2 |
| Serge Hallyn (serge-hallyn) wrote : | #11 |
As far as I know, everyone who has experienced this has been using a
thinkpad. I've first experienced this myself last week, on a new
thinkpad running utopic.
Two curious things I noticed, beside this being a thinkpad:
1. I could not start the VM with the bad image at all. Until I rebooted.
Then the image was fine, and fsck-clean. This suggests a possible problem
with the page cache on the host.
2. I then disabled KSM. I have not seen this problem since then, however I
also have not hit a vm quite as hard yet. Will have to see whether a series
of package builds manages to make this happen again with KSM disabled.
| Jamie Strandboge (jdstrand) wrote : | #12 |
On utopic amd64, I tried the new qemu 2.1 packages and disabled KSM. They seemed to be ok for a while, but after using 'uvt update' today (which under the hood does what is described in the bug description), I lost 6 VMs to this bug. A reboot did not solve it. I've downgraded to saucy again. Unfortunately, the saucy packages are no longer supported and have stopped getting security updates. This is getting rather dire for me....
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #13 |
Hi Jamie,
just to make sure, did you permanently disable ksm? Does
cat /sys/kernel/
still show 0?
I've so far never seen a case where a reboot did not fix the issue,
nor have I seen an issue (other than suspending the host sometimes
causing the VM to hang so that I have to destroy it) with ksm
disabled.
I had hoped to do some large parallel upgrade tests this week, but
network at linuxcon is not up to the task (even with apt-cacher-ng!)
If I can find a better room I'll see about trying there.
| Jamie Strandboge (jdstrand) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #14 |
I disabled KSM by setting /etc/default/
KSM_ENABLED=0
and did 'sudo restart qemu-kvm'. I also rebooted before seeing the problem. Since then, I downgraded to saucy's qemu-kvm which reset KSM_ENABLED=1. I didn't specifically check /sys/kernel/
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #15 |
Ok - thanks Jamie.
| Ryan Harper (raharper) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #16 |
For the reproducers, something worth trying is external snapshots (instead of the internal ones that snapshot-create-as creates without flags).
instead run: snapshot-create-as --disk-only
which will basically do qemu-img create -b your_original_qcow2 -f qcow2 pristine
And store the snapshot delta in a separate file.
| Ryan Harper (raharper) wrote : | #17 |
I've been running the scripts from comment #10. I have two VMs each running simultaneously; I've completed 24 hours of this sequence, about 50 total cycles with zero errors in the qcow2 images.
We're missing something; possibly hardware specific?
Host machine is an Intel NUC on Trusty.
Linux kriek 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
I'll see about increasing concurrency next.
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #18 |
There are 69 commits to block/qcow* between 1.5.0 and 1.7.0.
I have compiled binaries of qemu-system-x86_64 and qemu-img
at each of those commits and pushed them to
http://
through
http://
Note that binaries.0 is the *latest* commit.
So to bisect with these you could start with binaries.34, then
if that shows corruption, try binaries.51, or if it does not,
try binaries.17 etc. 6 steps should get us to a single commit.
It's not certain that one of these commits caused the
regression, but it seems a reasonable place to start.
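The bisection order described above (34, then 51 or 17, and so on) is a binary search over the binaries.N index, where lower numbers are newer commits. A rough sketch follows. The function name and the `is_bad` callback are hypothetical, and it assumes that binaries.0 shows the corruption and that every build older than the culprit is good.

```bash
#!/bin/bash
# Find the highest (oldest) index whose build still shows corruption.
# $1 = lowest index (assumed bad), $2 = highest index,
# $3 = name of a command that exits 0 when binaries.<index> corrupts the image.
bisect_binaries() {
    local lo=$1 hi=$2 is_bad=$3 mid
    # Invariant: the culprit index lies in [lo, hi].
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "$is_bad" "$mid"; then
            lo=$mid              # still bad: culprit is this commit or older
        else
            hi=$(( mid - 1 ))    # good: culprit is a newer commit
        fi
    done
    echo "$lo"
}
```

With `bisect_binaries 0 68 is_bad` the first probe is index 34, followed by 51 or 17, matching the sequence given in the comment.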
| Ryan Harper (raharper) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #19 |
I'm also starting work on updating uvt to use external snapshots instead; this would be an alternative to use while chasing down the bug in internal snapshots.
| Jamie Strandboge (jdstrand) wrote : | #20 |
I tried to reproduce this many different ways with 2.1+dfsg-3ubuntu3 over the weekend and could not trigger the issue (with ksm enabled too). I don't know what version I had in comment #12. 2.1+dfsg-3ubuntu2 is plausible based on the date of the comment and the publication of this version, though I can't guarantee it wasn't 2.1+dfsg-2ubuntu2 or even 2.1+dfsg-2ubuntu1 though I did specifically mention I used 2.1. I don't see anything in the changes that jumps out that qcow2 corruption bugs were fixed since my comment, so I'm worried I just haven't been able to reproduce....
| Jamie Strandboge (jdstrand) wrote : | #21 |
I just had this happen to me with 2.1+dfsg-3ubuntu3 on utopic. I had a VM I had been using for days, then did a 'uvt stop -rf ...' followed by 'uvt update sec-utopic-amd64' and I was dropped to a grub rescue. :\
I'll downgrade again and regenerate the VM.
| Jamie Strandboge (jdstrand) wrote : | #22 |
This happened again with an important VM. I still don't have a reproducer for testing the bisect packages.... :(
| Serge Hallyn (serge-hallyn) wrote : | #23 |
On my main server (3.13.0-32-generic with precise userspace) I installed a trusty container with ext3 (LVM) backing store. There I installed uvt and created 4 VMs, 2 precise amd64 and 2 precise i386. I several times did:
ubuntu@uvttest:~$ cat list
p-precise-
p-precise-
q-precise-
q-precise-
ubuntu@uvttest:~$ for n in `cat list`; do uvt start -fr $n; done
ubuntu@uvttest:~$ for n in `cat list`; do tmux splitw -p 25 -t $TMUX_PANE "expect vmupgrade.expect $n"; done
where vmupgrade.expect is:
=======
#!/usr/bin/expect
set container [lrange $argv 0 0]
spawn ssh $container
#expect "assword:"
#send -- "ubuntu\r"
expect "$container:~$"
send -- "export DEBIAN_
send -- "sudo sed -i 's/never/lts/' /etc/update-
expect "assword for ubuntu:"
send -- "ubuntu\r"
expect "$container:~$"
send -- "sudo apt-get update\r"
expect "$container:~$"
send -- "sudo do-release-upgrade -f DistUpgradeView
set timeout 11000
expect "$container:~$"
send -- "sudo reboot\r"
=======
Then I find /lib -name xxx; sudo reboot; find /lib -name xxx; and look
through dmesg for errors, then do
ubuntu@uvttest:~$ for n in `cat list`; do uvt stop -fr $n; done
Alas I've seen no corruption yet. The goal here isn't just to reproduce
it, but to do so reliably enough to be able to bisect - this isn't it :(
| Jamie Strandboge (jdstrand) wrote : | #24 |
FYI, I was able to reproduce this last night and uploaded forhallyn-
| Chris J Arges (arges) wrote : | #25 |
$ od -x -N 72 forhallyn-
refcount_
0000000 4651 fb49 0000 0200 0000 0000 0000 0000
0000020 0000 0000 0000 1000 0000 0200 0000 0000
0000040 0000 0000 0000 1000 0000 0000 0300 0000
0000060 0000 0000 0100 0000 0000 0100 0000 0100
0000100 0000 0000 0500 0000
nb_snapshots = 0000 0100
snapshots_offset = 0000 0000 0500 0000
$ od -x -N 72 forhallyn-
0000000 4651 fb49 0000 0200 0000 0000 0000 0000
0000020 0000 0000 0000 1000 0000 0200 0000 0000
0000040 0000 0000 0000 1000 0000 0000 0300 0000
0000060 0000 0000 0100 0000 0000 0100 0000 0000
0000100 0000 0000 0000 0000
nb_snapshots = 0000 0000
snapshots_offset = 0000 0000 0000 0000
Looking at just the QCowHeader (and not de-scrambling BE format), I see the following differences; however I think this looks 'ok', I'll need to examine the rest of the file.
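For reading the two snapshot fields without de-scrambling the 16-bit words by hand, a small helper can decode them as big-endian integers. This is a sketch with made-up function names. It assumes the field positions of the qcow2 header layout (nb_snapshots as 4 bytes at byte offset 60, snapshots_offset as 8 bytes at byte offset 64), which agree with the 72-byte dumps above.

```bash
#!/bin/bash
# Print the big-endian unsigned integer stored in $3 bytes at byte
# offset $2 of file $1. Fails if the file is too short.
qcow2_be_field() {
    local file=$1 offset=$2 length=$3 hex
    # -t x1 dumps single bytes in file order, so no word swapping occurs.
    hex=$(od -A n -t x1 -j "$offset" -N "$length" "$file" | tr -d ' \n')
    [ "${#hex}" -eq $(( length * 2 )) ] || return 1
    echo $(( 16#$hex ))
}

# Assumed offsets: nb_snapshots at byte 60 (4 bytes),
# snapshots_offset at byte 64 (8 bytes).
qcow2_snapshot_fields() {
    echo "nb_snapshots=$(qcow2_be_field "$1" 60 4)"
    echo "snapshots_offset=$(qcow2_be_field "$1" 64 8)"
}
```

Values with the top bit of an 8-byte field set would print as negative numbers, because bash arithmetic is signed 64-bit.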
| Changed in qemu (Ubuntu): | |
| assignee: | Serge Hallyn (serge-hallyn) → Chris J Arges (arges) |
| Chris J Arges (arges) wrote : | #26 |
Ok, I think I can reproduce this: after running some disk operations (bonnie++ and splitting a 100MB file), if I shut down and try to boot the VM, the disk cannot be booted and I'm presented with the grub menu.
However this reproducer is not yet 100% reliable. Next week I'll work on bisecting it down after testing latest upstream.
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #27 |
Awesome - thank you Chris.
| Ryan Harper (raharper) wrote : Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #28 |
Can we confirm what filesystems and options are enabled when reproducing (i.e., ext4 +extent mapping)[1]? Bug 1368815 sounds very much like this. If the reproducing systems have ext4 extent mapping enabled, one could create an ext4 fs without extent mapping[2] and see if this still reproduces.
If it is related to the ext4 extents, the rate of memory pressure and speed of the underlying device would determine whether or not the file ends up being corrupt which might explain the difficulty of reproducing.
1. % sudo tune2fs -l /dev/disk/
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
2. mke2fs -t ext4 -O ^extent /dev/<device>
| Chris J Arges (arges) wrote : | #29 |
Ryan,
The host's root filesystem is ext3/LVM (per Jamie's original configuration):
sudo tune2fs -l /dev/disk/
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
| Jamie Strandboge (jdstrand) wrote : | #30 |
Actually, for me it is just ext3 without LVM.
$ sudo tune2fs -l /dev/sda3 | grep -i features
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
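Checking the feature list by eye is easy to get wrong, so the check can be scripted. A minimal sketch follows: the function name is made up, and it reads `tune2fs -l` output on stdin and looks for the `extent` feature word.

```bash
#!/bin/bash
# Read `tune2fs -l` output on stdin. Exit 0 if the "extent" feature is
# listed on the "Filesystem features:" line, 1 otherwise.
has_extent_feature() {
    awk -F: '
        /^Filesystem features:/ {
            n = split($2, feats, " ")
            for (i = 1; i <= n; i++)
                if (feats[i] == "extent") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, `sudo tune2fs -l /dev/sda3 | has_extent_feature || echo "no extents"`. Going by the reports in this bug, filesystems without the feature are the ones that hit the corruption.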
| Chris J Arges (arges) wrote : | #31 |
Attached is a reproducer for this issue, here is what needs to be done to setup the reproducer:
1) The host machine's filesystem needs to be ext3
2) Install a VM (via virsh) and use a qcow2 disk
3) Ensure you can ssh without a password and the VM has bonnie++ installed
4) Adjust the variables in the script before running
5) Run the script a couple of times
While this doesn't reproduce 100% of the time, I can usually get a failure within 1-3 trials. However, when running this on an ext4 host filesystem, I've been unable to reproduce the issue.
| Chris J Arges (arges) wrote : | #32 |
Also I've been able to reproduce this with the latest master in qemu, and even with the latest daily 3.18-rcX kernel on the host.
| Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1292234] Re: qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) | #33 |
Excellent!
Any chance you can start bisecting with http://
| Chris J Arges (arges) wrote : | #34 |
Serge,
So I was able to just compile my own qemu and test with that.
I did attempt a reverse bisect, and was able to reproduce as early as v1.1 and also reproduce on master HEAD.
v1.0 was inconclusive because the qcow2 image I made with the newer binary seemed to be incompatible with v1.0; however, from Jamie's testing this seems to be a working version, so I'd say the original change that enabled this issue lies somewhere between v1.0.0 and v1.1.0. As I've been unable to reproduce this without virsh, reverse bisecting with older qemu versions is a bit challenging, as machine types change, features virsh wants to use aren't available, etc.
Another interesting thing I tested today was I was able to reproduce with ext4 with extents disabled; maybe that gives more clues. Just to make sure I wasn't crazy, mkfs'd the partition to vanilla ext4 and iterated for most of the afternoon with no failures.
My next steps are going to be enabling verbose output for qcow2, looking more deeply into what gets corrupted in the file, and turning on host filesystem debugging.
--chris
| Changed in qemu (Ubuntu): | |
| status: | Confirmed → In Progress |
| Chris J Arges (arges) wrote : | #35 |
FWIW, just re-reproduced this with latest upstream kernel / qemu / fresh qcow2 image.
| summary: |
- qcow2 image corruption in trusty (qemu 1.7 and 2.0 candidate) + qcow2 image corruption on non-extent filesystems (ext3) |
| no longer affects: | qemu |
| Changed in linux (Ubuntu): | |
| assignee: | nobody → Chris J Arges (arges) |
| importance: | Undecided → High |
| status: | New → In Progress |
| Changed in qemu (Ubuntu): | |
| status: | In Progress → Invalid |
| assignee: | Chris J Arges (arges) → nobody |
| importance: | High → Undecided |
| description: | updated |
| Chris J Arges (arges) wrote : | #36 |
Sent e-mail upstream about this issue: http://
| Josep M. Perez (josep-m-perez) wrote : | #37 |
Apparently this bug is also present in Debian. In my case the corrupted image was a Windows one. When I run qemu-img check on it, it complains about lots of clusters, and if I pass it the repair flag, it ends up crashing with the following message:
$ qemu-img check -r all windows.img
Repairing cluster 0 refcount=0 reference=1
Repairing cluster 1 refcount=0 reference=1
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
Repairing cluster 2 refcount=0 reference=1
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
Repairing cluster 3 refcount=0 reference=1
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
Repairing cluster 4 refcount=0 reference=1
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
Repairing cluster 5 refcount=0 reference=1
qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
Repairing cluster 6 refcount=0 reference=1
...
Repairing OFLAG_COPIED data cluster: l2_entry=
Repairing OFLAG_COPIED data cluster: l2_entry=
Repairing OFLAG_COPIED data cluster: l2_entry=
The following inconsistencies were found and repaired:
0 leaked clusters
97850 corruptions
Double checking the fixed image now...
[1] 27716 segmentation fault (core dumped) qemu-img check -r all windows.img
Has anyone else tried this over a copy of the corrupted image?
| description: | updated |
| description: | updated |
| description: | updated |
| Chris J Arges (arges) wrote : | #38 |
@josep-m-perez
Yes, this is an upstream bug. So it affects anyone using the right filesystem and CONFIGs. Once we fix this upstream, then it will be submitted as a stable kernel update and make its way into all stable kernels as applicable.
| Launchpad Janitor (janitor) wrote : | #39 |
This bug was fixed in the package linux - 3.18.0-13.14
---------------
linux (3.18.0-13.14) vivid; urgency=low
[ Andy Whitcroft ]
* hyper-v -- fix comment handing in /etc/network/
- LP: #1413020
[ Chris J Arges ]
* [Config] Add ibmvfc to d-i
- LP: #1416001
* SAUCE: ext4: disable ext4_punch_hole for indirect filesystems
- LP: #1292234
[ Leann Ogasawara ]
* rebase to v3.18.5
* [Config] CONFIG_
* Release Tracking Bug
- LP: #1417475
-- Leann Ogasawara <email address hidden> Thu, 05 Feb 2015 09:58:20 +0200
| Changed in linux (Ubuntu): | |
| status: | In Progress → Fix Released |
| Chris J Arges (arges) wrote : | #40 |
Note there currently is a patch upstream:
https:/
This fixes the original bug correctly without having to disable ext4_punch_hole for indirect filesystems. Once this lands in Linus' tree, I'll file an SRU to get this fixed across the board.
| Jamie Strandboge (jdstrand) wrote : | #41 |
Woohoo! *Huge* thanks. This was a tricky one :)
| description: | updated |
| Chris J Arges (arges) wrote : | #42 |
Sent email to upstream stable to apply this fix to affected kernels.
| Changed in linux (Ubuntu): | |
| status: | Fix Released → Confirmed |
| Changed in linux (Ubuntu): | |
| status: | Confirmed → Fix Committed |
| Launchpad Janitor (janitor) wrote : | #43 |
This bug was fixed in the package linux - 3.19.0-12.12
---------------
linux (3.19.0-12.12) vivid; urgency=low
[ Andy Whitcroft ]
* [Packaging] do_common_tools should always be on
* [Packaging] Provides: virtualbox-
- LP: #1434579
[ Chris J Arges ]
* Revert "SAUCE: ext4: disable ext4_punch_hole for indirect filesystems"
- LP: #1292234
[ Leann Ogasawara ]
* Release Tracking Bug
- LP: #1439803
[ Timo Aaltonen ]
* SAUCE: i915_bpo: Provide a backport driver for Skylake & Cherryview
graphics
- LP: #1420774
* SAUCE: i915_bpo: Update intel_ips.h file location
- LP: #1420774
* SAUCE: i915_bpo: Only support Skylake and Cherryview with the backport
driver
- LP: #1420774
* SAUCE: i915_bpo: Rename the backport driver to i915_bpo
- LP: #1420774
* i915_bpo: [Config] Enable CONFIG_
- LP: #1420774
* SAUCE: i915_bpo: Add i915_bpo_*() calls for ubuntu/i915
- LP: #1420774
* SAUCE: i915_bpo: Revert "drm/i915: remove unused
power_
- LP: #1420774
* SAUCE: i915_bpo: Add i915_bpo specific power well calls
- LP: #1420774
* SAUCE: Backport I915_PARAM_
- LP: #1420774
* SAUCE: Partial backport of drm/i915: Add ioctl to set per-context
parameters
- LP: #1420774
* SAUCE: drm/i915: Specify bsd rings through exec flag
- LP: #1420774
* SAUCE: drm/i915: add I915_PARAM_HAS_BSD2 to i915_getparam
- LP: #1420774
* SAUCE: drm/i915: add component support
- LP: #1420774
* SAUCE: drm/i915: Add tiled framebuffer modifiers
- LP: #1420774
* SAUCE: Backport new displayable tiling formats
- LP: #1420774
* SAUCE: Backport drm_crtc_
- LP: #1420774
* SAUCE: drm/i915: Add I915_PARAM_REVISION
- LP: #1420774
* SAUCE: drm/i915: Export total subslice and EU counts
- LP: #1420774
* SAUCE: i915_bpo: Revert drm/mm: Support 4 GiB and larger ranges
- LP: #1420774
[ Upstream Kernel Changes ]
* drm/i915/skl: Split the SKL PCI ids by GT
- LP: #1420774
* drm: Reorganize probed mode validation
- LP: #1420774
* drm: Perform basic sanity checks on probed modes
- LP: #1420774
* drm: Do basic sanity checks for user modes
- LP: #1420774
* drm/atomic-helper: Export both plane and modeset check helpers
- LP: #1420774
* drm/atomic-helper: Again check modeset *before* plane states
- LP: #1420774
* drm/atomic: Introduce state->obj backpointers
- LP: #1420774
* drm: allow property validation for refcnted props
- LP: #1420774
* drm: store property instead of id in obj attachment
- LP: #1420774
* drm: get rid of direct property value access
- LP: #1420774
* drm: add atomic_set_property wrappers
- LP: #1420774
* drm: tweak getconnector locking
- LP: #1420774
* drm: add atomic_get_property
- LP: #1420774
* drm: Remove unneeded braces for single statement blocks
- LP: #1420774
* drm: refactor getproperties/
- LP: #1420774
* drm: add atomic properties
- LP: #1420774
* drm/atomic: atomic_check functions
- LP: #1420774
* drm: s...
| Changed in linux (Ubuntu): | |
| status: | Fix Committed → Fix Released |
| Seth Arnold (seth-arnold) wrote : | #44 |
Is this still open against the 14.04.1 LTS kernel?
Thanks
| Chris J Arges (arges) wrote : | #45 |
The fix is the following:
$ git describe --contains 6f30b7e37a8239f
v4.0-rc1~1^2
I thought this was going to be queued up for stable, but it doesn't look like that happened.
If this still affects you in 3.13, 3.16, I can backport this patch. Let me know.
| Seth Arnold (seth-arnold) wrote : | #46 |
Chris, please do, I just recreated the issue with the "uvt update -rf" recipe from earlier; four of six VMs couldn't boot to a login: prompt, presumably from this bug.
Linux hunt 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
(I know, it misses this week's update. I can't keep up on this treadmill...)
Thanks
| no longer affects: | qemu (Ubuntu) |
| no longer affects: | qemu (Ubuntu Trusty) |
| no longer affects: | qemu (Ubuntu Vivid) |
| Changed in linux-lts-utopic (Ubuntu): | |
| status: | New → Invalid |
| Changed in linux (Ubuntu Trusty): | |
| assignee: | nobody → Chris J Arges (arges) |
| Changed in linux (Ubuntu Vivid): | |
| assignee: | nobody → Chris J Arges (arges) |
| Changed in linux-lts-utopic (Ubuntu Trusty): | |
| assignee: | nobody → Chris J Arges (arges) |
| Changed in linux (Ubuntu Trusty): | |
| importance: | Undecided → High |
| Changed in linux (Ubuntu Vivid): | |
| importance: | Undecided → High |
| Changed in linux-lts-utopic (Ubuntu Trusty): | |
| importance: | Undecided → High |
| Changed in linux (Ubuntu Trusty): | |
| status: | New → In Progress |
| Changed in linux (Ubuntu Vivid): | |
| status: | New → In Progress |
| Changed in linux-lts-utopic (Ubuntu Trusty): | |
| status: | New → In Progress |
| Chris J Arges (arges) wrote : | #47 |
Ok verified that this fix is in 3.16, 3.19+ kernels. Sent Trusty backport to ML.
| Changed in linux (Ubuntu Vivid): | |
| status: | In Progress → Fix Released |
| Changed in linux-lts-utopic (Ubuntu Trusty): | |
| status: | In Progress → Fix Released |
| Changed in linux (Ubuntu Vivid): | |
| assignee: | Chris J Arges (arges) → nobody |
| Changed in linux-lts-utopic (Ubuntu Trusty): | |
| assignee: | Chris J Arges (arges) → nobody |
| Changed in linux (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| Luis Henriques (henrix) wrote : | #48 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
| tags: | added: verification-needed-trusty |
| tags: |
added: verification-failed removed: verification-needed-trusty |
| Seth Arnold (seth-arnold) wrote : | #49 |
I was unable to test this specific modification due to significant regressions in the proposed kernel: https:/
| tags: |
added: verification-failed-trusty removed: verification-failed |
| Seth Arnold (seth-arnold) wrote : | #50 |
Henrix pointed out that I also needed the linux-image-extras package. I'm now able to test this, and will report back when I've had a chance to create the VM images.
Thanks
| tags: |
added: verification-needed-trusty removed: verification-failed-trusty |
| Seth Arnold (seth-arnold) wrote : | #51 |
I've tested several dozen VM snapshot and revert operations; previously, I'd have expected all my VMs to be dead by this time. This update makes libvirt / qemu / with qcow2 images usable again for me. Thanks!
| tags: |
added: verification-done-trusty removed: verification-needed-trusty |
| Launchpad Janitor (janitor) wrote : | #52 |
This bug was fixed in the package linux - 3.13.0-70.113
---------------
linux (3.13.0-70.113) trusty; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1516733
[ Upstream Kernel Changes ]
* arm64: errata: use KBUILD_
- LP: #1516682
linux (3.13.0-69.112) trusty; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1514858
[ Joseph Salisbury ]
* SAUCE: storvsc: use small sg_tablesize on x86
- LP: #1495983
[ Luis Henriques ]
* [Config] updateconfigs after 3.13.11-ckt28 and 3.13.11-ckt29 stable
updates
[ Upstream Kernel Changes ]
* ext4: fix indirect punch hole corruption
- LP: #1292234
* x86/hyperv: Mark the Hyper-V TSC as unstable
- LP: #1498206
* namei: permit linking with CAP_FOWNER in userns
- LP: #1498162
* iwlwifi: pci: add a few more PCI subvendor IDs for the 7265 series
- LP: #1510616
* Drivers: hv: vmbus: Increase the limit on the number of pfns we can
handle
- LP: #1495983
* sctp: fix race on protocol/netns initialization
- LP: #1514832
* [media] v4l: omap3isp: Fix sub-device power management code
- LP: #1514832
* [media] rc-core: fix remove uevent generation
- LP: #1514832
* xtensa: fix threadptr reload on return to userspace
- LP: #1514832
* ARM: OMAP2+: DRA7: clockdomain: change l4per2_7xx_clkdm to SW_WKUP
- LP: #1514832
* mac80211: enable assoc check for mesh interfaces
- LP: #1514832
* PCI: Add dev_flags bit to access VPD through function 0
- LP: #1514832
* PCI: Add VPD function 0 quirk for Intel Ethernet devices
- LP: #1514832
* usb: dwc3: ep0: Fix mem corruption on OUT transfers of more than 512
bytes
- LP: #1514832
* serial: 8250_pci: Add support for Pericom PI7C9X795[1248]
- LP: #1514832
* KVM: MMU: fix validation of mmio page fault
- LP: #1514832
* auxdisplay: ks0108: fix refcount
- LP: #1514832
* devres: fix devres_get()
- LP: #1514832
* iio: adis16400: Fix adis16448 gyroscope scale
- LP: #1514832
* iio: Add inverse unit conversion macros
- LP: #1514832
* iio: adis16480: Fix scale factors
- LP: #1514832
* iio: industrialio-
- LP: #1514832
* iio: event: Remove negative error code from iio_event_poll
- LP: #1514832
* NFSv4: don't set SETATTR for O_RDONLY|O_EXCL
- LP: #1514832
* unshare: Unsharing a thread does not require unsharing a vm
- LP: #1514832
* ASoC: adav80x: Remove .read_flag_mask setting from
adav80x_
- LP: #1514832
* drivers: usb :fsl: Implement Workaround for USB Erratum A007792
- LP: #1514832
* drivers: usb: fsl: Workaround for USB erratum-A005275
- LP: #1514832
* serial: 8250: don't bind to SMSC IrCC IR port
- LP: #1514832
* staging: comedi: adl_pci7x3x: fix digital output on PCI-7230
- LP: #1514832
* blk-mq: fix buffer overflow when reading sysfs file of 'pending'
- LP: #1514832
* xtensa: fix kernel register spilling
- LP: #1514832
* NFS: nfs_set_pgio_error sometimes misses errors
- LP: #1514832
* NFS: Fix a NULL pointer dereference of migration...
| Changed in linux (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |


Have not yet been able to reproduce this. I'm considering adding an upstart job to your image which updates and shuts down, so I can test this in a loop.
Do you know whether (a) the --children option to snapshot delete or (b) using the same name for the new snapshot as the one you just deleted are crucial to reproducing this?