device mappings for partitions not removed after build using --raw, leading to filesystem corruption

Reported by Simon Huerlimann on 2010-03-03
This bug affects 3 people
Affects               Status         Importance   Assigned to
vm-builder (Ubuntu)   Fix Released   Undecided    Unassigned
  Lucid               Fix Released   Undecided    Unassigned

Bug Description

SRU Request for Lucid

[Impact] This bug occurs systematically when the --raw option is used to create a VM. It results in filesystem corruption messages in the VM that was created.

[Development/Stable Fix] Fixed in 0.12.4+bzr475-0ubuntu1 for precise, as stated in comment #9, with the following addition:
  [ Louis Bouchard ]
  * Remove dev maps created by parted. (LP: #531599)

[Test Case]

On an unmodified system with an existing 8 GB LVM logical volume named /dev/sysvg/testvmvol, run the following command:

ubuntu-vm-builder kvm natty --lang=en_US --hostname=testvm --rootsize=8000 --mem=10240 --cpus=2 --rootpass=X --arch=amd64 --libvirt=qemu:///system --raw=/dev/sysvg/testvmvol

On an unmodified system, once the build has completed, the dmsetup ls command shows the following:
$ dmsetup ls
sysvg-testvmvol1p1 (252, 2)
sysvg-testvmvol1 (252, 1)
sysvg-testvmvol (252, 0)

The sysvg-testvmvol1p1 table should not be there.
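
The check in this test case can be automated. The following is a minimal sketch, assuming dmsetup ls prints one "name (major, minor)" entry per line as in the listing above. The helper names parse_dmsetup_ls and partition_maps are hypothetical and not part of vm-builder, and the rule used (flag any map whose name is another listed map's name followed by p<N>) is only a heuristic.

```python
import re

# One `dmsetup ls` entry, e.g. "sysvg-testvmvol1p1 (252, 2)".
# Some dmsetup versions print "(252:2)" instead, so both separators are accepted.
ENTRY = re.compile(r"^(\S+)\s+\(\d+[,:]\s*\d+\)$")


def parse_dmsetup_ls(output):
    """Return the map names found in the text printed by `dmsetup ls`."""
    names = []
    for line in output.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def partition_maps(names):
    """Return names that look like <other listed map>p<N> (heuristic)."""
    known = set(names)
    flagged = []
    for name in names:
        match = re.match(r"^(.+)p\d+$", name)
        if match and match.group(1) in known:
            flagged.append(name)
    return flagged
```

Feeding the three lines shown above into parse_dmsetup_ls and then partition_maps flags only sysvg-testvmvol1p1. With the fixed package the flagged list should be empty.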

[Regression Potential] This fix needs to be backported to the LTS version to avoid potential file system corruption.

---

I tried to install a VM on an LVM logical volume using --raw. The problem was that the logical volume I used was too small.

vmbuilder told me:
2010-03-03 21:33:59,923 INFO : Adding partition table to disk image: /dev/mapper/leo01-test--disk
2010-03-03 21:34:00,007 INFO : Adding type 1 partition to disk image: /dev/mapper/leo01-test--disk
2010-03-03 21:34:00,879 INFO : Adding type 3 partition to disk image: /dev/mapper/leo01-test--disk
2010-03-03 21:34:00,890 INFO : [0] ../../libparted/filesys.c:147 (ped_file_system_type_get): File system alias linux-swap(new) is deprecated
2010-03-03 21:34:00,892 INFO : Cleaning up
Traceback (most recent call last):
  File "/usr/bin/vmbuilder", line 29, in <module>
    VMBuilder.run()
  File "/usr/lib/python2.6/dist-packages/VMBuilder/__init__.py", line 65, in run
    frontend.run()
  File "/usr/lib/python2.6/dist-packages/VMBuilder/plugins/cli/__init__.py", line 68, in run
    vm.create()
  File "/usr/lib/python2.6/dist-packages/VMBuilder/vm.py", line 480, in create
    disk.create_partitions(self)
  File "/usr/lib/python2.6/dist-packages/VMBuilder/disk.py", line 416, in create_partitions
    disk.create(vm.workdir)
  File "/usr/lib/python2.6/dist-packages/VMBuilder/disk.py", line 100, in create
    part.create(self)
  File "/usr/lib/python2.6/dist-packages/VMBuilder/disk.py", line 244, in create
    run_cmd('parted', '--script', '--', disk.filename, 'mkpart', 'primary', self.parted_fstype(), self.begin, self.end)
  File "/usr/lib/python2.6/dist-packages/VMBuilder/util.py", line 135, in run_cmd
    raise VMBuilderException, "Process (%s) returned %d. stdout: %s, stderr: %s" % (args.__repr__(), status, stdout, stderr)
VMBuilder.exception.VMBuilderException: Process (['parted', '--script', '--', '/dev/mapper/leo01-test--disk', 'mkpart', 'primary', 'linux-swap(new)', '9', '1032']) returned 1. stdout: Error: The location 1032 is outside of the device /dev/mapper/leo01-test--disk.
, stderr: [0] ../../libparted/filesys.c:147 (ped_file_system_type_get): File system alias linux-swap(new) is deprecated

While this is a lot of noise, the message 'location 1032 is outside of the device...' made me recognize my mistake.

The problem is that I wasn't able to just remove the logical volume, create a bigger one and start again:
 sudo lvremove /dev/mapper/leo01-test--disk
   Can't remove open logical volume "test-disk"

The reason is that there's another device map set up by vmbuilder: /dev/mapper/leo01-test--diskp1

This is because the first (root) partition did fit on the logical volume; only the second (swap) partition did not.

I had to drop the device mapping using
 sudo dmsetup remove /dev/mapper/leo01-test--diskp1

After that, I was able to lvremove the logical volume.
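
The manual workaround above can be scripted. This is a minimal sketch, assuming the leftover maps follow the <device>p<N> naming seen in this report and that the commands are run as root. The function names cleanup_commands and run_cleanup are hypothetical; only dmsetup remove and lvremove come from the report.

```python
import subprocess


def cleanup_commands(lv_path, partition_numbers):
    """Build the commands used above: drop each leftover map, then the LV."""
    commands = [
        ["dmsetup", "remove", "%sp%d" % (lv_path, number)]
        for number in partition_numbers
    ]
    commands.append(["lvremove", lv_path])
    return commands


def run_cleanup(lv_path, partition_numbers, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for argv in cleanup_commands(lv_path, partition_numbers):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.check_call(argv)
```

Calling run_cleanup("/dev/mapper/leo01-test--disk", [1]) with the default dry_run=True only prints the two commands, which makes it easy to review them before running anything destructive.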

I would expect vmbuilder to clean up on exceptions like this and remove the device mappings for the inner partitions.

On Wed, Mar 03, 2010 at 09:06:21PM -0000, Simon Huerlimann wrote:
> Public bug reported:
>
> I've tried to install a vm using an LVM logical volume with --raw. The
> problem was that I used a too small logical volume.

Which version of VMBuilder are you using?

If you use "ubuntu-bug python-vm-builder" to report the bug, all this
information gets included automatically.

 status incomplete

--
Soren Hansen
Ubuntu Developer
http://www.ubuntu.com/

Changed in vm-builder (Ubuntu):
status: New → Incomplete

We'd like to figure out what's causing this bug for you, but we haven't heard back from you in a while. Could you please provide the requested information? Thanks!

Simon Huerlimann (huerlisi) wrote :

I tried with the current version in Lucid (0.12.3.r435); the error is still present, with similar output.

Changed in vm-builder (Ubuntu):
status: Incomplete → Confirmed
Simon Huerlimann (huerlisi) wrote :

This is on Lucid with vmbuilder 0.12.3.r435.

After some testing I found out that this is not restricted to failed or interrupted vmbuilder runs. Additionally, leaving those device mappings behind seems to interfere with the filesystem in the VM.

If you use an LVM logical volume as the --raw device, a partition table is created on the device and partitions are set up. Device mapper entries are configured for those partitions. Example for --raw /dev/leo02/kvm-web01.test:
brw-rw---- 1 shuerlimann shuerlimann 251, 8 2010-05-01 07:17 leo02-kvm--web01.test
brw-rw---- 1 root disk 251, 9 2010-04-22 16:44 leo02-kvm--web01.testp1
brw-rw---- 1 root disk 251, 10 2010-04-22 16:44 leo02-kvm--web01.testp2
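
The names in this listing follow the device-mapper naming that LVM uses for logical volumes: the volume group and logical volume names are joined by a single hyphen, and any hyphen inside either name is doubled. A minimal sketch of that mapping, with dm_name as a hypothetical helper name:

```python
def dm_name(volume_group, logical_volume):
    """Return the /dev/mapper name for volume_group/logical_volume.

    Hyphens inside each component are doubled so that the single hyphen
    joining the two components stays unambiguous.
    """
    return "%s-%s" % (
        volume_group.replace("-", "--"),
        logical_volume.replace("-", "--"),
    )
```

For /dev/leo02/kvm-web01.test this gives leo02-kvm--web01.test, and the leftover partition maps then appear with p1 and p2 appended to that name.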

As far as I can tell, no such leftovers exist if qcow images are used...

If you start a VM created using --raw while those device mappings are still configured, filesystem corruption occurs in the VM. This is quite reliably reproducible. Here's a snippet of /var/log/kernel.log right after booting:

Apr 30 12:47:11 web01 kernel: [ 0.357236] Freeing unused kernel memory: 796k freed
Apr 30 12:47:11 web01 kernel: [ 0.357881] Write protecting the kernel read-only data: 7788k
Apr 30 12:47:11 web01 kernel: [ 0.380691] udev: starting version 151
Apr 30 12:47:11 web01 kernel: [ 0.540618] FDC 0 is a S82078B
Apr 30 12:47:11 web01 kernel: [ 0.586564] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
Apr 30 12:47:11 web01 kernel: [ 0.586573] EXT4-fs (sda1): write access will be enabled during recovery
Apr 30 12:47:11 web01 kernel: [ 0.603151] EXT4-fs (sda1): recovery complete
Apr 30 12:47:11 web01 kernel: [ 0.651570] EXT4-fs (sda1): mounted filesystem with ordered data mode
Apr 30 12:47:11 web01 kernel: [ 1.476513] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 1: 1110 blocks in bitmap, 1688 in gd
Apr 30 12:47:11 web01 kernel: [ 1.477954] JBD: Spotted dirty metadata buffer (dev = sda1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Apr 30 12:47:11 web01 kernel: [ 1.532262] udev: starting version 151
Apr 30 12:47:11 web01 kernel: [ 1.721612] vga16fb: initializing
Apr 30 12:47:11 web01 kernel: [ 1.721620] vga16fb: mapped to 0xffff8800000a0000
Apr 30 12:47:11 web01 kernel: [ 1.721680] fb0: VGA16 VGA frame buffer device
Apr 30 12:47:11 web01 kernel: [ 1.904983] Adding 999416k swap on /dev/sda2. Priority:-1 extents:1 across:999416k
Apr 30 12:47:11 web01 kernel: [ 2.241484] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
Apr 30 12:47:11 web01 kernel: [ 3.210207] Console: switching to colour frame buffer device 80x30
Apr 30 12:47:11 web01 kernel: [ 4.219962] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 2: 415 blocks in bitmap, 478 in gd
Apr 30 12:47:11 web01 kernel: [ 4.247451] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 3: 7494 blocks in bitmap, 19027 in gd
Apr 30 12:47:23 web01 kernel: [ 13.720135] eth0: no IPv6 routers present
Apr 30 12:47:58 web01 kernel: [ 49.482760] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 0...

summary: - tricky leftovers when using a too small --raw device
+ device mappings for partition not removed after build using --raw,
+ leading to filesystem corruption
summary: - device mappings for partition not removed after build using --raw,
+ device mappings for partitions not removed after build using --raw,
leading to filesystem corruption
Simon Huerlimann (huerlisi) wrote :

After some investigation, it looks like vmbuilder correctly registers a callback hook to unmap the devices it sets up using kpartx. The problem is that udev rules add device mappings using kpartx as soon as the raw device gets partitioned.

AFAIK the relevant udev rule is /lib/udev/rules.d/55-dm.rules. I don't know exactly what creates the mapping or how, but it follows the scheme /dev/mapper/{DEVICE}p{X}. I first thought it would be /lib/udev/rules.d/95-kpartx.rules, but that would create /dev/mapper/{DEVICE}-part{X} mappings.

The kpartx call in vmbuilder uses the scheme /dev/mapper/{DEVICE}{X} and thus doesn't clash with the mappings that udev sets up. But it also doesn't destroy them in the cleanup callback.
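
To make the three naming schemes mentioned in this comment easier to tell apart, here is a minimal sketch that classifies a map name relative to its parent device. The labels and the classify function are hypothetical; the patterns are taken only from the schemes described above.

```python
import re

# (label, suffix pattern) pairs, checked in order.
SCHEMES = [
    ("udev-p", r"p(\d+)"),             # {DEVICE}p{X}: the leftover mappings
    ("kpartx-rule", r"-part(\d+)"),    # {DEVICE}-part{X}: 95-kpartx.rules style
    ("vmbuilder-kpartx", r"(\d+)"),    # {DEVICE}{X}: vmbuilder's kpartx call
]


def classify(name, device):
    """Return (label, partition_number) for a partition map of device, else None."""
    if not name.startswith(device):
        return None
    suffix = name[len(device):]
    for label, pattern in SCHEMES:
        match = re.fullmatch(pattern, suffix)
        if match:
            return label, int(match.group(1))
    return None
```

A map classified as "udev-p" is the kind that vmbuilder's cleanup callback leaves behind. Note that the classification is ambiguous when the device name itself ends in a digit.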

Hurl (hurl) wrote :

I can confirm this bug too, using vmbuilder 0.12.4 on an up-to-date Lucid.
Vmbuilder completes guest creation without an error, but launching the guest with the mapping in place causes immediate filesystem corruption.
This has nothing to do with the LV size.
Simon has done all the investigation correctly, but I don't know how to fix that.
The only workaround is to remember to remove this mapping before launching the guest.
I would appreciate it if this could be fixed.
Regards

Louis Bouchard (louis-bouchard) wrote :

After doing some tests to identify what creates this duplicate set of dm maps, it seems that the following 'parted mkpart' command triggers the udev rule:

From /usr/share/pyshared/VMBuilder/disk.py :

        def create(self, disk):
            """Adds partition to the disk image (does not mkfs or anything like that)"""
            logging.info('Adding type %d partition to disk image: %s' % (self.type, disk.filename))
            run_cmd('parted', '--script', '--', disk.filename, 'mkpart', 'primary', self.parted_fstype(), self.begin, self.end)

Running the equivalent command while watching the dm tables shows a new table entry when the 'parted --script {dev} mkpart primary ext2 {beginning} {size}' command is invoked.

Currently, it is unclear to me whether the removal of those unnecessary dm entries should be done by a callback from create() or included in the existing unmap() callback for map_partitions.

Louis Bouchard (louis-bouchard) wrote :

I have been able to test a potential fix for this problem. The unmap() function is responsible for tearing down the device maps created by the kpartx commands in the map_partitions() function.

The device maps created by the parted command, which is invoked in the partition() function, might be better torn down by a callback registered in partition(). However, the patch is simpler if the removal is added to the existing unmap(), which is a callback registered by map_partitions(); map_partitions() is invoked right after partition() in hypervisor.py.
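
The idea of extending unmap() can be sketched as follows. This is not the patch from the proposed branch, only an illustration under stated assumptions: kpartx maps are named /dev/mapper/{DEVICE}{X}, the leftover maps are named {DEVICE}p{X}, and the function names parted_map_name and maps_to_remove are hypothetical. The sketch only selects names that are currently mapped, so that nothing nonexistent is ever passed to dmsetup remove.

```python
import re

MAPPER_PREFIX = "/dev/mapper/"


def parted_map_name(kpartx_path):
    """Derive the {DEVICE}p{X} name from a kpartx-style {DEVICE}{X} path."""
    name = kpartx_path
    if name.startswith(MAPPER_PREFIX):
        name = name[len(MAPPER_PREFIX):]
    # Split off the trailing partition number and insert a 'p' before it.
    match = re.match(r"^(.*?)(\d+)$", name)
    if not match:
        return None
    return "%sp%s" % (match.group(1), match.group(2))


def maps_to_remove(kpartx_paths, existing_maps):
    """Return derived names that are actually present in existing_maps."""
    existing = set(existing_maps)
    candidates = (parted_map_name(path) for path in kpartx_paths)
    return [name for name in candidates if name in existing]
```

Each returned name would then be passed to dmsetup remove during cleanup. Filtering against the maps that exist matters because the derived name is only meaningful for --raw devices; for a path such as /dev/mapper/loop0p1 the derivation yields loop0pp1, which is not a real map.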

Please let me know if the patch in the proposed branch is unacceptable.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vm-builder - 0.12.4+bzr475-0ubuntu1

---------------
vm-builder (0.12.4+bzr475-0ubuntu1) precise; urgency=low

  [ Louis Bouchard ]
  * Remove dev maps created by parted. (LP: #531599)

  [ Serge Hallyn ]
  * pre-hook to create vcsversion.py is failing, work around it.
 -- Serge Hallyn <email address hidden> Thu, 22 Dec 2011 09:18:12 -0600

Changed in vm-builder (Ubuntu):
status: Confirmed → Fix Released
Simon Chan (simonchan) wrote :

The new fix in 0.12.4+bzr475-0ubuntu1 seems to have introduced another bug:

To reproduce: use vmbuilder to create a VM; execution terminates right at "Unmounting target filesystem".
An error occurs when the system tries to unmount /dev/mapper/loop0pp1

It seems to be a typo; it should be /dev/mapper/loop0p1 (only one 'p').

Rolled back to 0.12.4+bzr471-0ubuntu1 and this problem is gone.

@Simon

Could you please provide an example of the vmbuilder command that you used to generate the problem, so I can test with it ?

Kind regards

@Simon

Never mind the previous request, I was able to reproduce the bug myself. Since the issue fixed by this patch only happens when --raw is used, the modification is not restrictive enough and tries to remove nonexistent maps. I'll work on refining the patch.

no longer affects: vmbuilder
description: updated
Micah Gersten (micahg) wrote :

Unsubscribing sponsors as I don't see a patch for lucid; please resubscribe when a patch or merge proposal is ready.

@micah

I'm sorry, I thought I had done so. Here is the debdiff against the ubuntu version. Let me know if this is sufficient.

Stefano Rivera (stefanor) wrote :

Corrected the bug number in the changelog entry, retargeted the upload at lucid-proposed, and uploaded.

It's now waiting for SRU team review.

Changed in vm-builder (Ubuntu Lucid):
status: New → Fix Committed

Hello Simon, or anyone else affected,

Accepted vm-builder into lucid-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/vm-builder/0.12.4-0ubuntu0.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed

Changed tag to verification-done. Tested in a Lucid environment with the initial reproducer, and it no longer leaves a lingering dm map. Behavior is as expected.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package vm-builder - 0.12.4-0ubuntu0.3

---------------
vm-builder (0.12.4-0ubuntu0.3) lucid-proposed; urgency=low

  * debian/patches/0001-lp531599.patch: Backport fix for
    LP: #531599 which removes dev maps created by parted.
 -- Louis Bouchard <email address hidden> Mon, 04 Jun 2012 12:33:21 +0100

Changed in vm-builder (Ubuntu Lucid):
status: Fix Committed → Fix Released