kvm-84: virtio on qcow2 broken (regression)

Bug #404394 reported by nentis on 2009-07-25
This bug affects 4 people
Affects                   Importance   Assigned to
Hardy Backports           Undecided    Unassigned
Intrepid Ibex Backports   Undecided    Unassigned
kvm (Ubuntu)              High         Dustin Kirkland
  (nominated for Lucid by Dustin Kirkland)
Hardy                     High         Dustin Kirkland
Intrepid                  High         Dustin Kirkland
Jaunty                    High         Dustin Kirkland
Karmic                    High         Dustin Kirkland

Bug Description

Binary package hint: kvm

Since the latest update to 1:84+dfsg-0ubuntu12, any existing or new kvm host using a qcow2 file encounters disk errors and eventual lockup inside the guest.

The errors inside the guest are shown below (root is /dev/vda1; the other filesystems are LVM on top of /dev/vda2). They can appear at bootup if there is enough I/O, or the guest may boot fine and fail once I/O picks up.
----
REISERFS: abort (device dm3): Journal write error in flush_commit_list
REISERFS: Aborting journal for filesystem on dm-3
request_module: runaway loop modprobe binfmt-0000
Buffer I/O error on device dm-3, logical block 1063056
Buffer I/O error on device dm-3, logical block 1063072
... ad infinitum
----

Converting the qcow2 image to raw and back to qcow2 does not fix the problem, nor does using the newer version of qemu-img (v0.10.0) from the 'qemu' package to convert the raw image to qcow2. Problems occur on 9.04 and 8.04; both are x86_64 and have all the latest updates applied.

KVM Host:
Linux ronto 2.6.28-13-server #45-Ubuntu SMP Tue Jun 30 22:56:18 UTC 2009 x86_64 GNU/Linux

KVM Guests:
Linux halfrun 2.6.24-24-server #1 SMP Tue Jun 30 20:24:57 UTC 2009 x86_64 GNU/Linux
Linux zack 2.6.28-13-server #45-Ubuntu SMP Tue Jun 30 22:56:18 UTC 2009 x86_64 GNU/Linux

LVM setup:
 * /kvm is an LV on the Host where the guest qcow2 files are stored
 * Each qcow2 file has two partitions: vda1 (root) and vda2 (a PV inside the guest that holds the var, tmp, and swap LVs)

============
SRU Verification
 1) Follow *precisely* the instructions in Comment #6 by Nathan. This reproduces the problem for me every time on the *ubuntu12.3 version of the package.
 2) Upgrade to the proposed version of this package, and retry those instructions. The problem does not occur.
============

Scobo (mk-binary-artworks) wrote :

The same here with 1:84+dfsg-0ubuntu12.1~rc5ppa1 (Backport) from https://edge.launchpad.net/~ubuntu-virt/+archive/ppa.

This was no problem with the old 1:62+dfsg-0ubuntu8.2 before the update.

KVM Host: 2.6.24-23-server #1 SMP Wed Apr 1 22:14:30 UTC 2009 x86_64 GNU/Linux
KVM Guest: 2.6.24-24-server #1 SMP Fri Jul 24 22:44:54 UTC 2009 x86_64 GNU/Linux
both ubuntu 8.04 LTS, only the host with the backport-repository for kvm.

Any suggestions what to do?

Scobo (mk-binary-artworks) wrote :

There's already a discussion at http://patchwork.kernel.org/patch/6615/

Scobo (mk-binary-artworks) wrote :

I found a workaround: convert your image to RAW and use that in the meantime. This is not the best solution, but at least it prevents data loss.
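
The raw-image workaround can be sketched roughly as follows. This is only an illustration, not a command taken from this report: the function names and the example path are hypothetical, and on some installs the tool may be shipped as kvm-img rather than qemu-img. Shut the guest down before converting.

```shell
#!/bin/sh
# Sketch only: convert a qcow2 guest image to raw as a stopgap.
# Function names and the example path are hypothetical.

# Derive the raw file name from the qcow2 name (disk0.qcow2 -> disk0.raw).
raw_name_for() {
    printf '%s.raw\n' "${1%.qcow2}"
}

# Convert with the guest shut down; the original qcow2 file is left untouched.
# Prints the path of the new raw image on success.
convert_to_raw() {
    src=$1
    dst=$(raw_name_for "$src")
    qemu-img convert -f qcow2 -O raw "$src" "$dst" && printf '%s\n' "$dst"
}

# Example (not run here): convert_to_raw /kvm/guest/disk0.qcow2
```

A raw image occupies its full virtual size on filesystems without sparse-file support, so check free space on the host first. The guest's drive definition then has to point at the new file.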

Dustin Kirkland  (kirkland) wrote :

See:
 * https://edge.launchpad.net/ubuntu/+source/kvm/+changelog

I'm pretty sure I've applied all the qcow2 corruption patches available.

:-Dustin

Dustin Kirkland  (kirkland) wrote :

Can you update to the latest packages, create a brand new vm, and reproduce the error?

I'm 99% sure we've fixed this. You may be seeing errors from your vm disk if it was created/used with the buggy version.

:-Dustin

Changed in kvm (Ubuntu):
importance: Undecided → High
status: New → Incomplete

I ran into this bug as well, fortunately on a VM with no important data on it. KVM 84+dfsg-0ubuntu11 is rock-solid for me in production, but 0ubuntu12.3 causes data corruption nearly immediately on VMs that use virtio and qcow2.

Here is a from-scratch reproduction procedure, which I have confirmed works:

(1) Install Ubuntu 9.04 Server on a 64-bit system with VMX or SVM support. I'm using a run-of-the-mill Intel 64-bit desktop PC. Accept the defaults for everything in the installer.

(2) Run "aptitude update; aptitude full-upgrade; aptitude install kvm; reboot"

(3) Get a copy of ubuntu-9.04-server-amd64.iso.

(4) Run "kvm-img create -f qcow2 test.img 2G ; kvm -m 512 -net none -curses -drive file=test.img,if=virtio,boot=on -cdrom ubuntu-9.04-server-amd64.iso -boot d"

(5) The installer uses framebuffer which doesn't work in Curses console mode. You'll end up at a screen that says "640 x 480 Graphic mode". Working blind: press enter, then escape. Wait five seconds for the Curses input layer to deliver the escape, then press enter again. The screen might flash briefly. Now type "install fb=false" and press enter. This should start booting the installer kernel in non-framebuffer mode.

(6) Go through the installer, selecting all the defaults again. Do keyboard layout selection manually because the auto-detection doesn't work in Curses mode.

(7) After formatting the filesystem, the installer will report, "The attempt to mount a file system with type ext3 in LVM VG ubuntu, LV root at / failed." In dmesg you will find a ton of "end_request: I/O error, dev vda, sector 3670331" messages.

For fun, also try installing Ubuntu 8.04.3 Server as the guest. The same problem occurs.

Using a raw disk image instead of qcow2 doesn't trigger this problem. Using an emulated IDE controller instead of virtio doesn't trigger this problem. KVM 84+dfsg-0ubuntu11 doesn't trigger this problem.
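
To make the outcome of step (7) easier to compare between package versions, the I/O errors can be counted instead of eyeballed. The helper below is a hypothetical sketch (the function name is made up for illustration); it reads dmesg-style text on stdin and counts the "end_request: I/O error" lines for one device.

```shell
#!/bin/sh
# Sketch only: count block-layer I/O errors for one device in dmesg output.
# count_io_errors is a hypothetical helper name.

count_io_errors() {
    dev=${1:-vda}
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c "end_request: I/O error, dev $dev," || true
}

# Example inside the guest (not run here): dmesg | count_io_errors vda
```

A count of 0 after the installer has formatted and mounted the filesystem would suggest the run was clean; any other value matches the failure described in step (7).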

Dustin Kirkland  (kirkland) wrote :

Okay, here's what I'll do...

I'll sync all qemu-0.10.6's block-qcow2.c to kvm-84, build that, and put it in a PPA for you. Please stay tuned, and test that when it lands. If that fixes your issues, we can probably get an SRU for this.

Thanks,
:-Dustin

Changed in kvm (Ubuntu):
assignee: nobody → Dustin Kirkland (kirkland)
status: Incomplete → Triaged
Dustin Kirkland  (kirkland) wrote :

Okay, that was easier said than done. I spent several hours trying to get 0.10.6's qcow2 ported to kvm-84. No luck.

I'm not going to have much time to look at this. If you can find the git commit id's of the qcow2 fixes, let me know.

:-Dustin

Dustin Kirkland  (kirkland) wrote :

Marking this against Jaunty, as there are no known qcow2 corruption issues in Karmic.

If I can isolate a fix, I will try to SRU it to Jaunty.

:-Dustin

Changed in kvm (Ubuntu):
status: Triaged → Fix Released
Changed in kvm (Ubuntu Jaunty):
status: New → Triaged
importance: Undecided → High
nentis (krisa-opensourcery) wrote :

I'm offering a $200 bounty if this is fixed and pushed out to Jaunty updates in the next week (completed by 2009-10-15, 14:38 PDT).

FYI... Updates have to bake in jaunty-proposed for at least a
week--sometimes more--before getting pushed to jaunty-updates.

:-Dustin

Changed in kvm (Ubuntu Jaunty):
status: Triaged → In Progress

I have confirmed the issue as described by Nathan on kvm in Jaunty.

I have also confirmed that this problem does *not* exist in Karmic.

:-Dustin

Dustin Kirkland  (kirkland) wrote :

The problem I see in the guest with the ubuntu12.3 version looks like the attached screenshot.

Dustin Kirkland  (kirkland) wrote :
Dustin Kirkland  (kirkland) wrote :

Okay, this problem is caused by the upstream git commit that we cherry picked:
  qcow2-corruption-Fix-alloc_cluster_link_l2-Kevin-W.patch
  641636d19e3d8eeb8fac31e20641eaf33befd6e7

It was fixed upstream by
  Fix-cluster-freeing-in-qcow2.patch
  d4d698f020e50333d6eae48ce323752613b5c3ea

We need to cherry pick and apply that commit to our Jaunty package, as well as the kvm-84 that lives in hardy-backports and intrepid-backports.

I'm attaching the debdiff here, and uploading to jaunty-proposed.

:-Dustin

Dustin Kirkland  (kirkland) wrote :

Attaching a patch against the hardy-backports version of kvm-84, which is also affected.

Changed in kvm (Ubuntu Karmic):
status: New → Fix Released
Changed in kvm (Ubuntu Hardy):
status: New → In Progress
Changed in kvm (Ubuntu Intrepid):
status: New → In Progress
Dustin Kirkland  (kirkland) wrote :

Attaching a patch against the intrepid-backports version of kvm-84, which is also affected.

Dustin Kirkland  (kirkland) wrote :

And the Jaunty patch for jaunty-proposed.

description: updated
Changed in kvm (Ubuntu Hardy):
importance: Undecided → High
Changed in kvm (Ubuntu Karmic):
importance: Undecided → High
Changed in kvm (Ubuntu Intrepid):
importance: Undecided → High
Changed in kvm (Ubuntu Karmic):
assignee: nobody → Dustin Kirkland (kirkland)
Changed in kvm (Ubuntu Intrepid):
assignee: nobody → Dustin Kirkland (kirkland)
Changed in kvm (Ubuntu Jaunty):
assignee: nobody → Dustin Kirkland (kirkland)
Dustin Kirkland  (kirkland) wrote :

Uploaded to:
 * jaunty-proposed
 * intrepid-backports
 * hardy-backports

Awaiting archive approval. Please test this ASAP.

Changed in kvm (Ubuntu Hardy):
assignee: nobody → Dustin Kirkland (kirkland)
summary: - qcow2 corruption regression
+ kvm-84: virtio on qcow2 broken (regression)
Changed in kvm (Ubuntu Jaunty):
status: In Progress → Fix Committed
Changed in kvm (Ubuntu Hardy):
status: In Progress → Fix Committed
Changed in kvm (Ubuntu Intrepid):
status: In Progress → Fix Committed
Mathias Gug (mathiaz) wrote :

Tested the hardy-backport patch and it fixes my issue. I can now run successful installs of karmic under a qcow2+virtio guest while the current version of kvm in hardy-backport fails.

Accepted kvm into jaunty-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Martin Pitt (pitti) wrote :

Seems this bug isn't relevant to hardy/intrepid, just for backports. Closing SRU tasks and opening backport tasks, which first need an ack from the backports team.

Changed in kvm (Ubuntu Hardy):
status: Fix Committed → Invalid
Changed in kvm (Ubuntu Intrepid):
status: Fix Committed → Invalid

Right, thanks, Martin.

Can I get an ack from the backports team?

nentis (krisa-opensourcery) wrote :

Do not deploy this updated version.

I installed kvm_84+dfsg-0ubuntu12.4_amd64.deb and it has made the problem worse. Now RAW disk images are having disk I/O errors and I'm still getting I/O errors in qcow2 images.

root@kvm-host:~# file /kvm/guest/disk0.raw
/kvm/guest/disk0.raw: x86 boot sector; partition 1: ID=0x83, active, starthead 1, startsector 63, 19502847 sectors; partition 2: ID=0x5, starthead 254, startsector 19502910, 963900 sectors

From guest dmesg:

[ 415.850777] end_request: I/O error, dev vda, sector 16466887
[ 415.850789] Buffer I/O error on device vda1, logical block 2058353
[ 415.851459] lost page write due to I/O error on vda1
[ 415.851463] Buffer I/O error on device vda1, logical block 2058354
[ 415.852167] lost page write due to I/O error on vda1

Mathias Gug (mathiaz) wrote :

On Fri, Nov 06, 2009 at 01:34:17AM -0000, nentis wrote:
> Do not deploy this updated version.
>
> I installed kvm_84+dfsg-0ubuntu12.4_amd64.deb and it has made the
> problem worse. Now RAW disk images are having disk I/O errors and I'm
> still getting I/O errors in qcow2 images.

Hm - I've been able to successfully perform a Karmic server install using raw files as the backend.

>
> From guest dmesg:
>

Could you give more information about the guest configuration?

Which release? Which kernel version is it running (uname -a)?

How are you running the guest? Via kvm or libvirt?

If you're using kvm could you provide the command line you're using?

If you're using libvirt could you provide the xml definition of the guest
(virsh dumpxml guest-name) as well as the kvm command line launched by libvirt
(ps -ef)?

> [ 415.850777] end_request: I/O error, dev vda, sector 16466887
> [ 415.850789] Buffer I/O error on device vda1, logical block 2058353
> [ 415.851459] lost page write due to I/O error on vda1
> [ 415.851463] Buffer I/O error on device vda1, logical block 2058354
> [ 415.852167] lost page write due to I/O error on vda1
>

Are these new raw files or existing raw files?

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

nentis (krisa-opensourcery) wrote :

I backed out from 12.4 to 11. I will try this again to check my sanity.

Dustin Kirkland  (kirkland) wrote :

I, too, tested raw images as well, and saw no problems.

Mathias Gug (mathiaz) wrote :

On Fri, Nov 06, 2009 at 02:31:24AM -0000, nentis wrote:
> I backed out from 12.4 to 11. I will try this again to check my sanity.
>

Isn't 12.3 the latest version available from jaunty-updates?

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Dustin Kirkland  (kirkland) wrote :

Yes, Mathias.

We're interested in changes in behavior between 12.3 and 12.4.

Scott Kitterman (kitterman) wrote :

Ack from ubuntu-backporters for Intrepid.

Changed in intrepid-backports:
status: New → Fix Released
Scott Kitterman (kitterman) wrote :

Ack for Hardy too.

Changed in hardy-backports:
status: New → Fix Released
Scott Kitterman (kitterman) wrote :

Note that the first Lucid autosync just happened and backports are low build priority, so it will take a while before these get built.

Martin Pitt (pitti) wrote :

Marking v-failed for now to flag the jaunty update as not to copy for now.

Karmic update seems to be fine?

tags: added: verification-failed
removed: verification-needed
Dustin Kirkland  (kirkland) wrote :

Martin-

Karmic is not affected by this. And in my testing, the Jaunty upload is good.

Dustin Kirkland  (kirkland) wrote :

nentis-

Okay, you are blocking this update from being published to Jaunty-Updates.

Please, with urgency, provide a very specific test case that shows
your failure against kvm 1:84+dfsg-0ubuntu12.4, while succeeding
against kvm 1:84+dfsg-0ubuntu12.3.

nentis (krisa-opensourcery) wrote :

Dustin,

Under 11, guests are working. I am going to move to 12.3 and I should see raw continue to work, and qcow2 break. I will then move on to 12.4. I have taken backup copies such that I'm working with clean instances under each version.

I have to run these tests in semi-maintenance windows. I would like to have results back to this ticket this afternoon (PST).

nentis (krisa-opensourcery) wrote :

Hey Dustin,

I wasn't able to reproduce the raw corruption, and qcow2 is working under 12.4. What I used to generate disk IO (with clean guest images for each test):

dd if=/dev/zero of=foo bs=1M count=1000

This generated failures under 12.3, and no errors under 12.4. Nice work! If you email me a paypal address or physical address to mail a check to, I will honor the bounty I put up (krisa AT opensourcery.com).

Please continue the process towards jaunty-updates.
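
The dd load test above could be wrapped in a small script so the same write is repeated under each package version. This is a minimal sketch with a hypothetical function name; it uses bs=1048576, which is the same block size as the bs=1M in the command above.

```shell
#!/bin/sh
# Sketch only: generate sequential write load inside a guest.
# io_stress is a hypothetical helper name.

io_stress() {
    file=$1
    count_mb=${2:-1000}   # default mirrors count=1000 from the thread
    # Write count_mb blocks of 1 MiB of zeros; fail if dd reports an error.
    dd if=/dev/zero of="$file" bs=1048576 count="$count_mb" 2>/dev/null || return 1
    # Flush dirty pages so the writes actually reach the virtual disk.
    sync
    # Report how many bytes ended up in the file.
    wc -c < "$file" | tr -d ' '
}

# Example inside the guest (not run here): io_stress foo 1000
```

dd can exit successfully even when the kernel later logs "lost page write" errors, so check the guest's dmesg for I/O error lines after each run.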

Martin Pitt (pitti) on 2009-11-07
tags: added: verification-done
removed: verification-failed
Dustin Kirkland  (kirkland) wrote :

nentis-

Thanks for retesting.

I'm employed by Canonical to do this sort of work, and therefore won't
accept your bounty ;-)

Though I appreciate the gesture.

:-Dustin

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package kvm - 1:84+dfsg-0ubuntu12.4

---------------
kvm (1:84+dfsg-0ubuntu12.4) jaunty-proposed; urgency=low

  * debian/patches/Fix-cluster-freeing-in-qcow2.patch: cherry-pick
    from upstream, fixes regression caused by
    qcow2-corruption-Fix-alloc_cluster_link_l2-Kevin-W.patch
    LP: #404394

 -- Dustin Kirkland <email address hidden> Wed, 04 Nov 2009 14:05:14 -0500

Changed in kvm (Ubuntu Jaunty):
status: Fix Committed → Fix Released
müzso (bit2) wrote :

I hope I'm not getting lynched :-), but I'd like to reopen this bug as not fixed. Previously you wrote that Karmic is not affected, however I've experienced filesystem corruption inside qcow2 images. It's a bit hard to reproduce since I had it in Windows 2003 guests, but I'll try to create a test case with an Ubuntu (eg. Karmic) guest too.
The host OS is Ubuntu 9.10 Karmic and kvm is version "84+dfsg-0ubuntu16+0.11.0+0ubuntu6.3".
The corruption occurs only during high I/O (as described by others before). It's irrelevant whether the high I/O is in the host OS or in the guest. I'll post again once I have an easily reproducible test case, preferably in a Linux guest OS.

müzso (bit2) wrote :

Btw: I do not use virtio in any of my guests, only emulated IDE controllers ... which is most probably a big difference compared to the original bug report.

müzso (bit2) wrote :

I'm also hit by bug#448694 which was the reason for using qcow2 images in the first place. Now I'm pretty much left in the cold ... cannot use RAW or LVM ... and neither qcow2. What am I supposed to store my guest filesystems in then? :-(

müzso (bit2) wrote :

I retract my previous comments. It has since turned out that the server I was testing with had HW problems (namely, it couldn't handle 6x2GB RAM despite the motherboard docs claiming it would). So my error reports are unreliable; they might have been caused by HW issues.
