Fresh VM installs via preseeded oneiric isos sometimes fail with filesystem issues

Bug #1040033 reported by Jamie Strandboge
This bug affects 3 people
Affects            Status        Importance  Assigned to   Milestone
qemu-kvm (Ubuntu)  Fix Released  Critical    Serge Hallyn
Quantal            Fix Released  Critical    Serge Hallyn

Bug Description

I am seeing filesystem corruption in guests when using qcow2 images via libvirt. For example, the security team and others use lp:ubuntu-qa-tools/vm-tools/uvt to create new VM images (using libvirt) by taking an ISO and adding preseed information to it so we can perform unattended VM installs. I hit this fairly regularly with 11.10 amd64 installs: out of ~10 installs, about 50% fail before ubiquity finishes installation.

I am using kernel Ubuntu 3.5.0-10.10-generic 3.5.1 with qemu-kvm 1.1~rc+dfsg-1ubuntu9 on 12.10. I have seen other filesystem issues in other existing VMs (i.e., non-oneiric) as well but have not been able to find the cause. This seems to coincide with upgrading my main machine to 12.10 a couple of weeks ago.

I have checked smartctl on my drive on the host (both long and short tests) and there are no errors. There are no errors in the host's dmesg. I have not noticed any other anomalies on the host, only guest VMs.

If needed, here are the contents of preseed/vmtools.seed:
ubiquity languagechooser/language-name select English
ubiquity countrychooser/shortlist select US
ubiquity time/zone select America/Chicago
ubiquity debian-installer/locale select en_US.UTF-8
ubiquity localechooser/supported-locales multiselect en_US.UTF-8
console-setup console-setup/layoutcode string us
console-setup console-setup/layout select U.S. English
console-setup console-setup/variantcode select U.S. English
console-setup console-setup/codeset select . Combined - Latin; Slavic Cyrillic; Hebrew; basic Arabic
ubiquity partman-auto/init_automatically_partition select Guided - use entire disk
ubiquity partman-auto/disk string /dev/vda
ubiquity partman-auto/method string regular
ubiquity partman-auto/choose_recipe select All files in one partition (recommended for new users)
ubiquity partman/confirm_write_new_label boolean true
ubiquity partman/choose_partition select Finish partitioning and write changes to disk
ubiquity partman/confirm boolean true
ubiquity passwd/user-fullname string REDACTED
user-setup passwd/user-fullname string REDACTED
ubiquity passwd/username string REDACTED
user-setup passwd/username string REDACTED
ubiquity passwd/user-password password REDACTED
ubiquity passwd/user-password-again password REDACTED
user-setup passwd/user-password password REDACTED
user-setup passwd/user-password-again password REDACTED
d-i netcfg/get_hostname string vmtools
d-i netcfg/get_domain string defaultdomain
ubiquity migration-assistant/partitions multiselect
ubiquity ubiquity/summary note
ubiquity ubiquity/reboot string true
d-i mirror/http/proxy string
ubiquity ubiquity/success_command string cp /cdrom/preseed/latecommand.sh /target/root/; chroot /target chmod +x /root/latecommand.sh; chroot /target bash /root/latecommand.sh;

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: ubiquity 2.8.7
ProcVersionSignature: Ubuntu 3.0.0-12.20-generic 3.0.4
Uname: Linux 3.0.0-12-generic x86_64
ApportVersion: 1.23-0ubuntu3
Architecture: amd64
CasperVersion: 1.287
Date: Wed Aug 22 06:49:51 2012
LiveMediaBuild: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
ProcEnviron:
 LANGUAGE=
 PATH=(custom, no user)
 LANG=en_US.UTF-8
SourcePackage: ubiquity
UpgradeStatus: No upgrade log present (probably fresh install)

Jamie Strandboge (jdstrand) wrote :
description: updated
Jamie Strandboge (jdstrand) wrote :

This is in UbiquitySyslog.txt:
Aug 21 22:12:59 ubuntu ubiquity: dpkg: unrecoverable fatal error, aborting:
Aug 21 22:12:59 ubuntu ubiquity: files list file for package 'dnsmasq-base' is missing final newline
Aug 21 22:12:59 ubuntu ubiquity: Error in function:
Aug 21 22:12:59 ubuntu ubiquity:
Aug 21 22:12:59 ubuntu plugininstall.py: Exception during installation:
Aug 21 22:12:59 ubuntu plugininstall.py: SystemError: E:Sub-process /usr/bin/dpkg returned an error code (2)
Aug 21 22:12:59 ubuntu plugininstall.py:

It is odd that this sometimes doesn't fail.
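
dpkg's complaint is that the package's file list does not end in a newline, so a quick scan of the dpkg info directory can show whether other lists are damaged in the same way. A minimal sketch, assuming a POSIX shell; the function names `lacks_final_newline` and `find_unterminated_lists` are made up for illustration, and the directory is passed in rather than assumed:

```shell
# Succeeds if FILE is non-empty and its last byte is not a newline.
# Command substitution strips a trailing newline, so a properly
# terminated file yields an empty string here.
lacks_final_newline() {
    [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]
}

# Print every *.list file directly under DIR that lacks a final newline.
find_unterminated_lists() {
    for f in "$1"/*.list; do
        [ -f "$f" ] || continue
        if lacks_final_newline "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

On a mounted guest image this could be called as `find_unterminated_lists /mnt/guest/var/lib/dpkg/info`, where `/mnt/guest` is a hypothetical mount point.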

Jamie Strandboge (jdstrand) wrote :

I kept the old image around and see that /var/lib/dpkg/info/dnsmasq-base.list has:
$ cat ./dnsmasq-base.list
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/cups-driver-gutenprint
/usr/share/doc/cups-driver-gutenprint/README.cups.gz
/usr/share/doc/cups-driver-gutenprint/README.gz
/usr/share/doc/cups-driver-gutenprint/copyright
/usr/share/doc/cups-driver-gutenprint/NEWS.Debian.gz
/usr/share/doc/cups-driver-gutenprint/changelog.Debian.gz
/usr/share/doc/cups-driver-gutenprint/FAQ.html
/usr/share/apport
/usr/share/apport/package-hooks
/usr/share/apport/package-hooks/source_gutenprint.py
/usr/share/man
/usr/share/man/man8
/usr/share/man/man8/cups-calibrate.8.gz
/usr/share/man/man8/cups-genppdupdate.8.gz
/usr/share/cups
/usr/share/cups/calibrate.ppm
/usr/share/cups/mime
/usr/share/cups/mime/command.types
/usr/sbin
/usr/sbin/cups-ge$

So there is some sort of disk problem here. I am going to move this away from ubiquity and into qemu-kvm for now.
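
The listing above is the file list for dnsmasq-base, yet every documentation path in it belongs to cups-driver-gutenprint. A rough way to spot this kind of cross-package content is to print the package names that appear under `/usr/share/doc/` in a list file and compare them with the file's own name. This is only a heuristic sketch: it assumes a package normally ships `/usr/share/doc/<package>`, which is common but not universal, and `doc_dirs_in_list` is a hypothetical helper name:

```shell
# Print the distinct names that appear as /usr/share/doc/<name>
# entries in a dpkg file list.
doc_dirs_in_list() {
    awk -F/ '$2 == "usr" && $3 == "share" && $4 == "doc" && NF >= 5 && $5 != "" { print $5 }' "$1" | sort -u
}
```

If `doc_dirs_in_list dnsmasq-base.list` prints anything other than `dnsmasq-base`, the file deserves a closer look.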

affects: ubiquity (Ubuntu) → qemu-kvm (Ubuntu)
summary: - Fresh VM install via preseeded oneiric iso failed
+ Fresh VM installs via preseeded oneiric isos fail with filesystem issues
description: updated
tags: removed: oneiric ubiquity-2.8.7
description: updated
Jamie Strandboge (jdstrand) wrote : Re: Fresh VM installs via preseeded oneiric isos fail with filesystem issues

I have downgraded qemu-common, qemu-kvm and qemu-utils to 1.0+noroms-0ubuntu14.1 and will be running the following repeatedly to see if there are any issues:
for i in `seq 1 4` ; do uvt new -t desktop -f oneiric amd64 test$i ; done

summary: - Fresh VM installs via preseeded oneiric isos fail with filesystem issues
+ Fresh VM installs via preseeded oneiric isos sometimes fail with
+ filesystem issues
description: updated
Jamie Strandboge (jdstrand) wrote :

This actually needs to be:
for i in `seq 1 4` ; do uvt new -r -t desktop -f oneiric amd64 test$i ; done
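
Because the failure is intermittent, it helps to count passes and failures rather than watch each run. A minimal sketch of such a loop follows. It assumes that `uvt new` exits non-zero when an install fails, which this report does not confirm; `run_trials` and `new_vm` are hypothetical names:

```shell
# Run "COMMAND... <i>" for i = 1..N and report how many runs passed
# or failed. The body runs in a subshell so its variables do not leak.
run_trials() (
    n=$1
    shift
    pass=0
    fail=0
    i=1
    while [ "$i" -le "$n" ]; do
        if "$@" "$i"; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    printf 'pass=%d fail=%d\n' "$pass" "$fail"
)

# Hypothetical wrapper around the command from this comment.
new_vm() {
    uvt new -r -t desktop -f oneiric amd64 "test$1"
}
```

`run_trials 4 new_vm` would then perform the same four installs as the loop above and print a single summary line.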

Jamie Strandboge (jdstrand) wrote :

After downgrading qemu to the version in precise (see comment #4), I have successfully created 9 oneiric VMs without error. I will continue my tests and report back if I have any issues.

Serge Hallyn (serge-hallyn) wrote :

Hi Jamie,

If I understand correctly, vmtools creates a pristine qcow2-based image, then clones it and makes some customizations, right? When you checked the file contents above for the ending newline, was that in the base image or the cloned image?

Also, would you feel confident saying this only happens with qcow2?

I will run some tests to see if I can reproduce this and narrow down where it happens.

Changed in qemu-kvm (Ubuntu Quantal):
importance: Undecided → Critical
Jamie Strandboge (jdstrand) wrote :

uvt from vmtools does the following (in essence):
1. creates a custom iso with preseed files, which, among other things, drop a script into the VM to be run later (see step 3)
2. uses that iso with virt-install. Eg:
/usr/bin/virt-install --quiet --connect=qemu:///system \
    --name=test1-oneiric-amd64 --arch=x86_64 --ram=768 \
    --disk=path=/home/jamie/vms/machines/test1-oneiric-amd64.qcow2,size=8,format=qcow2,sparse=true,bus=virtio \
    --virt-type=kvm --accelerate --hvm \
    --cdrom=/home/jamie/vms/isos/cache/vmtools-oneiric-desktop-amd64.iso \
    --os-type=linux --os-variant=generic26 --graphics=vnc \
    --network=network=default,model=virtio --video=cirrus --noreboot
3. On VM reboot, the customization script referenced in step 1 is run automatically and upon successful completion, removed
4. The VM is polled for lsb_release over ssh. Upon a successful response, the machine is shut down
5. A pristine snapshot is created using libvirt's 'snapshot' functionality
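
The polling in step 4 can be pictured as a plain retry loop. The sketch below is not taken from uvt; `retry_until` is a hypothetical helper, and the retry count and delay are arbitrary:

```shell
# Retry a command up to TRIES times, sleeping DELAY seconds between
# attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
retry_until() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For instance, `retry_until 30 10 ssh "$vm_host" lsb_release -a` would try for up to roughly five minutes, with `vm_host` standing in for however the guest is addressed.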

The installation failure occurs in step '2' during virt-install. Ubiquity shows an error that halts installation, which allows me to submit a bug report via apport (which I did for this report).

I gracefully shut down the machine after submitting the report and saved the qcow2. Later I used qemu-img to create a raw image, then used kpartx/mount to examine the contents, which revealed what I described in comments #2 and #3.
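
For anyone wanting to repeat that inspection, the sequence might look like the sketch below. The exact commands used for this report are not recorded, so the file names, the mount point, and the default mapper name `loop0p1` are all assumptions; check the names that `kpartx -av` actually prints. Setting `DRY_RUN=1` prints the commands instead of running them:

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert a qcow2 image to raw, map its partitions, and mount one of
# them read-only. Needs root when not in dry-run mode.
#   $1 = qcow2 image, $2 = raw output file, $3 = mount point,
#   $4 = mapper device name (assumed default: loop0p1)
inspect_qcow2() {
    run qemu-img convert -f qcow2 -O raw "$1" "$2" || return
    run kpartx -av "$2" || return
    run mount -o ro "/dev/mapper/${4:-loop0p1}" "$3"
}
```

When finished, `umount` the mount point and remove the mappings with `kpartx -d` on the raw file.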

Since yesterday, I created 16 more oneiric installs successfully, for a total of 25 successful installs and no failures with qemu-kvm from 12.04.
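
To put the 25 clean installs in perspective: if each install still failed about half the time, as estimated in the description, a streak this long would be extremely unlikely. The sketch below computes that probability, assuming the installs are independent; `p_all_pass` is a hypothetical helper name:

```shell
# Probability that N independent installs all succeed when each one
# fails with probability FAIL_RATE.
#   $1 = FAIL_RATE (between 0 and 1), $2 = N
p_all_pass() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%g\n", (1 - p) ^ n }'
}
```

`p_all_pass 0.5 25` evaluates 0.5^25, which is 1 in 33,554,432, or about 3 in 100 million.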

tags: added: rls-q-incoming
Serge Hallyn (serge-hallyn) wrote :

I haven't yet reproduced this, but looking through the git commit logs, the following seems a possible fix:

commit 206e6d8551839008b6858cf8f500d2e644d2b561
Author: Stefan Hajnoczi <email address hidden>
Date: Mon Jun 18 14:00:57 2012 +0100

    qcow2: preserve free_byte_offset when qcow2_alloc_bytes() fails

    When qcow2_alloc_clusters() error handling code was introduced in commit
    5d757b563d59142ca81e1073a8e8396750a0ad1a, the value of free_byte_offset
    was clobbered in the error case. This patch keeps free_byte_offset at 0
    so we will try to allocate clusters again next time this function is
    called.

    Signed-off-by: Stefan Hajnoczi <email address hidden>
    Signed-off-by: Kevin Wolf <email address hidden>

:100644 100644 66f3915... 5e3f915... M block/qcow2-refcount.c

Further encouraging is that the bug this fixes was introduced after 1.0 (the precise version).

It also could possibly be:

commit 166acf546f476d3594a1c1746dc265f1984c5c85
Author: Kevin Wolf <email address hidden>
Date: Fri May 11 18:18:36 2012 +0200

    qcow2: Support for fixing refcount inconsistencies

    Signed-off-by: Kevin Wolf <email address hidden>

Serge Hallyn (serge-hallyn) wrote :

@Jamie,

I've pushed a qemu-kvm package with the single potential fix commit to ppa:ubuntu-virt/ppa. Could you test whether that helps? If not, I'll try a package based on either upstream git HEAD or 1.2.0.

Serge Hallyn (serge-hallyn) wrote :

Setting status to incomplete pending feedback from Jamie.

Changed in qemu-kvm (Ubuntu Quantal):
status: New → Incomplete
Jamie Strandboge (jdstrand) wrote :

After installing 1.1~rc+dfsg-1ubuntu11~ppa1, 1 install completed successfully, then the next one failed during install (virt-install).

Changed in qemu-kvm (Ubuntu Quantal):
status: Incomplete → New
Changed in qemu-kvm (Ubuntu Quantal):
status: New → Triaged
James Page (james-page)
tags: removed: rls-q-incoming
Serge Hallyn (serge-hallyn) wrote :

Nothing in git log stands out to me. I will try to bisect to figure out where the bug was introduced. This is time-consuming because (a) qemu is not always buildable, (b) the bug is not 100% reproducible, and (c) there is a large number of commits to test. Hoping to have results by Friday.
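
Points (a) and (b) map onto the exit codes that `git bisect run` understands: 0 marks a commit good, 125 tells bisect to skip a commit that cannot be tested, and any other code from 1 to 127 marks it bad. A sketch of one bisect step under those rules follows. It is not the procedure actually used here, and `bisect_step` plus the command names passed to it are hypothetical:

```shell
# One bisect step: build, then run an intermittent test RUNS times.
#   $1 = RUNS, $2 = build command name, $3 = test command name
# Exit codes follow the `git bisect run` convention:
#   0 = good, 1 = bad, 125 = cannot test this commit (skip).
bisect_step() {
    runs=$1
    build_cmd=$2
    test_cmd=$3
    "$build_cmd" || return 125
    k=1
    while [ "$k" -le "$runs" ]; do
        "$test_cmd" || return 1
        k=$((k + 1))
    done
    return 0
}
```

A script that ends with `bisect_step 7 build_qemu install_vm` could be handed to `git bisect run`, where `build_qemu` and `install_vm` are placeholders for the real build and install steps. With a failure rate near 50%, seven clean runs leave under a 1% chance (0.5^7 = 1/128) of wrongly marking a bad commit as good.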

Changed in qemu-kvm (Ubuntu Quantal):
assignee: nobody → Serge Hallyn (serge-hallyn)
Serge Hallyn (serge-hallyn) wrote :

Using qemu upstream git HEAD seems to fix the bug (though so far I've only tried it once, so this isn't 100% certain).

Serge Hallyn (serge-hallyn) wrote :

@Jamie,

could you try the qemu-kvm package in ppa:ubuntu-virt/virt-daily-upstream? This seems to be working for me. If it fixes it for you, we can either try a last-minute jump to 1.2.0, or hopefully I can figure out which commit fixed it.

Jamie Strandboge (jdstrand) wrote :

Sorry for the delay. These packages seem to work well. I was able to create 20 11.10 VMs with no issues.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu-kvm - 1.2.0+noroms-0ubuntu1

---------------
qemu-kvm (1.2.0+noroms-0ubuntu1) quantal; urgency=low

  * merge upstream v1.2.0 (LP: #1052932) (LP: #1040033)
    - debian/rules: remove --enable-kvm-device-assignment - configure switch
      no longer supported
    - remaining patches:
      . 02_use_usr_share_kvm_fixed.patch
      . 04_use_etc_kvm_kvm-ifup.patch
      . disable-hpet-for-tcg.patch
      . use-libexecdir.patch
      . ubuntu/larger_default_ram_size.patch
      . ubuntu/fallback-to-tcg.patch - ported to new code
      . ubuntu/dont-try-to-hotplug-cpu.patch
      . ubuntu/expose_vmx_qemu64cpu.patch
      . ubuntu/fix-vmware-vga-negative-vals
      . ubuntu/99-allow-loading-u-boot-initrd-images.patch
 -- Serge Hallyn <email address hidden> Wed, 12 Sep 2012 10:46:28 -0500

Changed in qemu-kvm (Ubuntu Quantal):
status: Triaged → Fix Released