On VM restart: Could not open 'poppy.qcow2': Could not read snapshots: File too large

Bug #1354167 reported by Todd
This bug affects 6 people
Affects: QEMU
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm unable to restart a VM. virt-manager is giving me:

Error starting domain: internal error: process exited while connecting to monitor: qemu-system-x86_64: -drive file=/var/lib/libvirt/images/poppy.qcow2,if=none,id=drive-virtio-disk0,format=qcow2: could not open disk image /var/lib/libvirt/images/poppy.qcow2: Could not read snapshots: File too large

From the command line trying to check the image also gives me:
qemu-img check poppy.qcow2
qemu-img: Could not open 'poppy.qcow2': Could not read snapshots: File too large

This bug appears with both the default install of qemu for Ubuntu 14.04:
qemu-img version 2.0.0, Copyright (c) 2004-2008 Fabrice Bellard

and the latest version:
qemu-img version 2.1.50, Copyright (c) 2004-2008 Fabrice Bellard

Host:
Dual E5-2650 v2 @ 2.60GHz
32GB Memory
4TB Disk space (2.1TB Free)

Host OS: Ubuntu 14.04.1 LTS 64bit

Guest:
Ubuntu 14.04 64bit
Storage Size: 500 GB

Gary Roberts (wangel) wrote :

I too am getting this bug.

I get the same error message as Todd, word for word.

Running the check from the command prompt yields the same error.

I have QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.2)

Thank you

Todd (toddfx) wrote :

Just putting this up here for anyone unlucky enough to hit this. This isn't a fix but it may rescue your borked qcow2 image.

Download and compile the 1.7.2 version of qemu, but don't install it. For example, create a directory called qemutemp, download qemu-1.7.2.tar.bz2, untar it, and run ./configure and then make (but don't run make install).

Then you should be able to convert your image from qcow2 to raw with:

 ./qemu-img convert /var/lib/libvirt/images/yourimagename.qcow2 /var/lib/libvirt/images/yourimagenamefixed.raw

then
virsh edit yourvmname

and change yourimagename.qcow2 to yourimagenamefixed.raw and change the driver type from qcow2 to raw.

Cross your fingers and boot the copy.
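
The conversion step above can also be scripted. The following is a minimal sketch in Python, not part of the original workaround: the function names are hypothetical, and it assumes the old qemu-img binary was built in a scratch directory and is passed in by path. It names the formats explicitly with -f and -O; when -O is omitted, qemu-img convert writes raw output regardless of the destination file name.

```python
import subprocess
from pathlib import Path


def build_convert_command(qemu_img, source, dest, out_fmt="raw"):
    """Return the argv for converting a qcow2 image with a given qemu-img binary."""
    # -f names the input format, -O the output format (raw is also the default).
    return [str(qemu_img), "convert", "-f", "qcow2", "-O", out_fmt,
            str(source), str(dest)]


def rescue_image(qemu_img, source, dest, out_fmt="raw"):
    """Run the conversion, refusing to overwrite an existing destination."""
    if Path(dest).exists():
        raise FileExistsError(dest)
    # check=True raises CalledProcessError if qemu-img exits non-zero.
    subprocess.run(build_convert_command(qemu_img, source, dest, out_fmt),
                   check=True)
```

Calling build_convert_command("./qemu-img", "old.qcow2", "fixed.raw") only builds the argument list, so it can be inspected before anything touches the image. The damaged original is never modified; keep it until the converted copy has booted.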

Note: This bug spooked management and now I'm mandated to only use vmware. And I was soooo close to escape too.

Tristão van de Haar Pinto (tristaopinto) wrote :

@Todd, thank you for posting the work-around. I'm experiencing the same problem. I'll try your method in an attempt to repair my image files.

Nenad Cuturic (nenad-cuturic) wrote :

Todd, thank you for your post and advice. It helped me fix the same problem with one of my virtual disks, which became corrupted after the Ubuntu host release upgrade to 14.04.1 LTS.
I first tried installing qemu 2.1.2 from source, but it made no difference.
qemu 1.7.2 could convert the image to raw format, and my VM is up and running now.
Your workaround seems to be the only one available on the Internet, so thanks a lot :)

Kevin Wolf (kwolf-redhat) wrote :

Can one of you give some more information about the qcow2 image that can't be
opened any more? What is new is a sanity check of the snapshot metadata (name,
date, etc.) size. It is currently limited to 64 MB, which should be plenty of
space for any real use cases.

I suspect that your qcow2 got somehow corrupted (before the update; the update
only made the problem apparent) and this is not real snapshot data.

If you have the qemu sources, you can run 'tests/qemu-iotests/qcow2.py
/tmp/test.qcow2 dump-header' (with your image file instead of test.qcow2,
of course) and check the values of nb_snapshots and snapshot_offset. If you
could post a hexdump of a few kilobytes starting at $snapshot_offset, that
could be helpful as well.

Nenad Cuturic (nenad-cuturic) wrote :

From what I can see, the snapshot offset points somewhere into an Apache log file.
Because of the recently discovered bash vulnerability I rebooted the VM last week, and again before the release upgrade. All VMs were shut down during the upgrade, though if I remember correctly, after the upgrade and before the reboot I found that two of the VMs had been restarted, but not this particular one.
Anyhow, here is the requested info:

header:

magic 0x514649fb
version 2
backing_file_offset 0x0
backing_file_size 0x0
cluster_bits 16
size 1052771352576
crypt_method 0
l1_size 1961
l1_table_offset 0x30000
refcount_table_offset 0x10000
refcount_table_clusters 1
nb_snapshots 1
snapshot_offset 0x66090000
incompatible_features 0x0
compatible_features 0x0
autoclear_features 0x0
refcount_order 4
header_length 72

hexdump -s 0x66090000 -n 2048 xyz.qcow2
66090000 3231 2e37 2e30 2e30 2031 202d 2d20 2020
66090010 305b 2f31 6553 2f70 3032 3331 313a 3a32
66090020 3833 303a 2036 302b 3030 5d30 2220 4f50
66090030 5453 2f20 6f73 726c 732f 6c65 6365 2074
66090040 5448 5054 312f 312e 2022 3032 2030 3336
66090050 3733 0a20 3231 2e37 2e30 2e30 2031 202d
66090060 2d20 2020 305b 2f31 6553 2f70 3032 3331
66090070 313a 3a32 3833 303a 2036 302b 3030 5d30
66090080 2220 4547 2054 732f 6c6f 2f72 6461 696d
66090090 2f6e 6966 656c 3f2f 6966 656c 733d 6863
660900a0 6d65 2e61 6d78 206c 5448 5054 312f 312e
660900b0 2022 3032 2030 3031 3931 2038 310a 3732
660900c0 302e 302e 312e 2d20 2020 202d 5b20 3130
660900d0 532f 7065 322f 3130 3a33 3231 333a 3a38
660900e0 3730 2b20 3030 3030 205d 5022 534f 2054
660900f0 732f 6c6f 2f72 6573 656c 7463 4820 5454
66090100 2f50 2e31 2231 3220 3030 3120 3632 3637
66090110 0a20 3231 2e37 2e30 2e30 2031 202d 2d20
66090120 2020 305b 2f31 6553 2f70 3032 3331 313a
66090130 3a32 3833 303a 2037 302b 3030 5d30 2220
66090140 4f50 5453 2f20 6f73 726c 732f 6c65 6365
66090150 2074 5448 5054 312f 312e 2022 3032 2030
66090160 3534 2037 310a 3732 302e 302e 312e 2d20
66090170 2020 202d 5b20 3130 532f 7065 322f 3130
66090180 3a33 3231 333a 3a38 3730 2b20 3030 3030
66090190 205d 5022 534f 2054 732f 6c6f 2f72 6573
660901a0 656c 7463 4820 5454 2f50 2e31 2231 3220
660901b0 3030 3320 3738 0a20 3231 2e37 2e30 2e30
660901c0 2031 202d 2d20 2020 305b 2f31 6553 2f70
660901d0 3032 3331 313a 3a32 3833 303a 2037 302b
660901e0 3030 5d30 2220 4f50 5453 2f20 6f73 726c
660901f0 732f 6c65 6365 2074 5448 5054 312f 312e
66090200 2022 3032 2030 3534 2037 310a 3732 302e
66090210 302e 312e 2d20 2020 202d 5b20 3130 532f
66090220 7065 322f 3130 3a33 3231 333a 3a38 3730
66090230 2b20 3030 3030 205d 5022 534f 2054 732f
66090240 6c6f 2f72 6573 656c 7463 4820 5454 2f50
66090250 2e31 2231 3220 3030 3320 3436 0a20 3231
66090260 2e37 2e30 2e30 2031 202d 2d20 2020 305b
66090270 2f31 6553 2f70 3032 3331 313a 3a32 3833
66090280 303a 2037 302b 3030 5d30 2220 4f50 5453
66090290 2f20 6f73 726c 732f 6c65 6365 2074 5448
660902a0 5054 312f 312e 2022 3032 2030 3534 2037
660902b0 310a 3732 302e 302e 312e 2d20 2020 202d
660902c0 5b20 3130 5...
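
The dump above is easier to judge once it is turned back into text. This is a minimal sketch, assuming the dump came from hexdump's default output format (16-bit words printed in host byte order) on a little-endian x86 host; the function name is illustrative. Tokens that are not complete four-digit words, such as a truncated tail, are skipped.

```python
import string


def decode_hexdump_line(line):
    """Turn one line of default `hexdump` output back into the original bytes."""
    tokens = line.split()[1:]  # drop the leading offset column
    words = [t for t in tokens
             if len(t) == 4 and all(c in string.hexdigits for c in t)]
    # Each word is a 16-bit value; on a little-endian host the low byte
    # is the one that comes first in the file.
    return b"".join(int(w, 16).to_bytes(2, "little") for w in words)


first_lines = [
    "66090000 3231 2e37 2e30 2e30 2031 202d 2d20 2020",
    "66090010 305b 2f31 6553 2f70 3032 3331 313a 3a32",
]
text = b"".join(decode_hexdump_line(l) for l in first_lines)
print(text.decode("ascii", errors="replace"))
```

The decoded bytes begin with an IP address and a bracketed timestamp, which is what a web server access log looks like and not what a qcow2 snapshot table looks like.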


Rob Schultz (rob-schultzfamily) wrote :

I had the exact same issue with a VM after upgrading the host from 12.04 to 14.04.

Thank you Todd for the workaround. It would have been more work than I cared for to reassemble that machine (even if it was just a test machine).

I'm not sure what the status of this bug is. Is this something that is already fixed, but was an existing issue from a previous version?

I've attached the qemu-img I compiled. It may help someone else recover a bit quicker; getting everything in place to compile the binary wasted a good couple of hours, as I had to modify several dependencies, install additional packages, etc. But of course, the safer method is to build it yourself.

FWIW I had to install:
 sudo apt-get install autoconf automake autopoint autotools-dev dh-autoreconf libltdl-dev libtool m4 libglib2.0-0-dbg libglib2.0-bin libglib2.0-dev libpcre3-dev libpcrecpp0

(I think I could have just installed autoconf rather than dh-autoreconf.)

After the installation of libglib2.0-0-dbg remember to re-run ./configure

If you compile yourself you can kill the make process after the build of the qemu-img binary.

Peter Tonoli (peter+ubuntuone) wrote :

Thanks Todd - the recommended fix did work. Thanks Rob, I downloaded and used your qemu-img binary, and it worked perfectly :-)

D.Schäfer (trash4you) wrote :

We also had the same issue with a VM after reinstalling the host from scratch. We changed from Ubuntu 12.04 to Proxmox 3.3.

Unfortunately the qemu-img version compiled on Trusty Tahr doesn't work under Proxmox 3.3, so I compiled it under Proxmox :)

For everyone who runs into the same issue, here is the file.

Nelle (nelle-u) wrote :

I have never had a problem when using virsh snapshot-create or delete. The problem started with one VM when I used qemu-img snapshot. Thank you, Todd, for the work-around. It helped me too. The VM is working again.

Steve Fox (drfickle2) wrote :

I'll just add that the work-around worked for me too, but in addition I used the Trusty qemu-img to convert it back from RAW to QCOW2 and it was fine.

Kevin Wolf (kwolf-redhat) wrote :

I just had the chance to help debug another case of this, and it turned out that qemu-img snapshot had been used on the image while the VM was running. This was likely the root cause of the corruption. So if you're reading this, it's probably already too late, but I want to spell it out anyway:

WARNING: Never open the same disk image from two processes at the same time, except if both are read-only! If your VM is running, use the qemu monitor of that VM (or libvirt functionality, which will do the same internally) to do operations on the image. Use qemu-img only for images of shut down VMs.
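
One way to make this rule harder to break is to route snapshot requests through a small wrapper. This is a minimal sketch, assuming libvirt manages the VM and that the caller passes in the text printed by `virsh domstate`; the function name is hypothetical. It only chooses which command to run and does not execute anything.

```python
def snapshot_command(domain, image, name, domain_state):
    """Choose a snapshot command that keeps qemu-img away from an image in use.

    `domain_state` is the output of `virsh domstate <domain>`,
    for example "running" or "shut off".
    """
    if domain_state.strip() == "shut off":
        # No QEMU process of this domain has the image open,
        # so qemu-img may write to it.
        return ["qemu-img", "snapshot", "-c", name, image]
    # Running, paused, or any other state: go through libvirt, which
    # asks the QEMU process that already owns the image to do the work.
    return ["virsh", "snapshot-create-as", domain, name]
```

Any state other than "shut off" falls through to the libvirt path, so an unexpected state errs on the safe side. The sketch cannot tell whether some other domain or process has the same image open; that still has to be checked by hand.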

If you reported in this bug that you are affected, can you please reply and specify whether you ever used qemu-img (except for read-only subcommands) on your image while the VM was running?

Steve Fox (drfickle2) wrote :

Yes, we used qemu-img snapshot on the image while the VM was running. We did not read the man page closely enough.

Henrik Andersson (henrik-4e) wrote :

I stumbled upon the same problem. I made the same mistake of taking a snapshot with qemu-img on a running virtual machine.

It was easily fixed using qemu-img v1.7.2 binaries as mentioned earlier:

  ./qemu-img convert -O qcow2 bad.qcow2 fixed.qcow2

Nenad Cuturic (nenad-cuturic) wrote :

I got the same problem again while release-upgrading to 14.04.2, but this time I was careful to shut down the VMs and disable autostart before the release upgrade, to prevent anything from happening in the background. Nevertheless, after the upgrade one VM is not starting.
I'll repeat the procedure about converting the image and report the result.

Nenad Cuturic (nenad-cuturic) wrote :

I used Rob Schultz's binary to convert the image-files and it is working now.

Thomas Huth (th-huth) wrote :

It sounds like this was not really a bug, but rather file corruption caused by using qemu-img on a file that was already opened by QEMU. So could we close this ticket now?

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired