vmbuilder hangs or crashes when building images on ec2 instances

Bug #358098 reported by Cap Petschulat
This bug affects 1 person
Affects: vm-builder (Ubuntu)
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Binary package hint: python-vm-builder

I'm trying to build hardy vmw6 images on an EC2 instance. Depending on the AMI I'm building on, vmbuilder either hangs indefinitely or ends prematurely after something segfaults.

For instance, running on alestic/ubuntu-8.10-intrepid-base-20090216.manifest.xml with the following command, I get an indefinite hang:

root@domU-12-31-39-01-C4-03:~# vmbuilder vmw6 ubuntu --suite hardy --flavour virtual --arch i386 --verbose --debug --mirror http://ec2-us-east-mirror.rightscale.com/ubuntu > log 2>&1

At the time of the hang, top shows the following:

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18602 root 25 0 18412 15m 13m R 97 0.9 1:48.67 apt-get -y --force-yes dist-upgrade
19782 root 25 0 3484 1612 1276 R 97 0.1 1:29.01 /usr/bin/perl /usr/sbin/update-rc.d module-init-tools start 15 S .

The log from this is attached.
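
For anyone trying to spot the same spinning processes without watching top interactively, here is a minimal sketch. The helper name busy_procs and the 90% threshold are assumptions for illustration, not something from this report. Note that ps reports CPU averaged over the process lifetime, so its numbers can lag behind what top shows.

```shell
# List processes at or above a CPU threshold, busiest first.
# Reads "PID %CPU COMMAND..." lines on stdin.
# busy_procs is a hypothetical helper name; the default threshold is arbitrary.
busy_procs() {
    threshold="${1:-90}"
    awk -v t="$threshold" '$2 + 0 >= t { print }' | sort -k2,2nr
}

# Typical use on a live system (procps ps; the trailing "=" suppresses headers):
#   ps -eo pid=,pcpu=,args= | busy_procs 90
```

A perl process that stays in this list for minutes while running something as small as update-rc.d is a good sign of the hang described here.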

Within update-rc.d, the hang occurs on the following line within the checklinks sub, though my perl debugging skills get me no further than this:

foreach $_ (readdir(DIR)) {

Depending on the options I pass to vmbuilder, the hang can occur in different places, for example in /usr/sbin/update-mime, but it always seems to hang in perl scripts.

Building an intrepid image fails, too, but for a completely different reason that appears to already have been reported (and doesn't occur until writing the final image).

I originally encountered this problem when building the images on a hardy system with python-vm-builder debs manually installed. I've also reproduced this on a canonical AMI (canonical-beta-us/ubuntu-intrepid-beta2-20090226-i386.manifest.xml), where something segfaulted at the same time in the build as the hang usually occurs.
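
Since the same failure shows up as a hang on some AMIs and a segfault on others, a small probe that tells the two apart can help when comparing environments. This is only a sketch: probe_cmd is a hypothetical helper, it assumes GNU coreutils timeout (which exits with 124 when the limit is hit), and it relies on the shell convention that a process killed by SIGSEGV yields status 139 (128 + 11).

```shell
# Run a command with a time limit and classify how it ended.
# probe_cmd is a hypothetical helper; pick a limit that suits the command.
#   $1: limit in seconds, remaining arguments: the command to run
probe_cmd() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)   echo "ok" ;;
        124) echo "hang (no exit within ${limit}s)" ;;
        139) echo "segfault (SIGSEGV)" ;;
        *)   echo "failed (exit status $rc)" ;;
    esac
}

# Example (as root, against a chroot such as one made by debootstrap):
#   probe_cmd 60 chroot img /usr/sbin/update-rc.d foo start 15 S .
```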

I've tried various ubuntu mirrors, so it's not that, and I've successfully built with the same command locally on a hardy system.

More on the system from which the log originates:

root@domU-12-31-39-01-C4-03:~# lsb_release -rd
Description: Ubuntu 8.10
Release: 8.10
root@domU-12-31-39-01-C4-03:~# apt-cache policy python-vm-builder
python-vm-builder:
  Installed: 0.9-0ubuntu3.1

Cap Petschulat (cap-petschulat) wrote :

Changing the hypervisor doesn't seem to matter; xen fails similarly.

Eric Hammond (esh) wrote :

Am I correct in understanding that this process works on your local Ubuntu system, but fails when you run it on EC2 instances using various official and unofficial AMIs? If so, then it sounds like it may be more of a bug against the EC2 images and environment than against vmbuilder itself.

Cap Petschulat (cap-petschulat) wrote :

That is correct. It may well be a problem with the images, but I have a few datapoints that make me wonder:

1. The problem persists across different images from different AMI authors; I've tried a few different versions of hardy and intrepid AMIs with similar results.

2. The problem seems specific to creating hardy images; as I said, intrepid images mostly build, then fail because of a different problem. I think that by reading some bug reports or throwing a few hours at it, I could get intrepid images built. So in this sense vmbuilder works for some settings and fails for others.

Neither rules out a problem in the AMIs or Amazon in general, sure, but it seems sketchy.

I'm happy to dig in a bit further, provide more information, or redirect to an appropriate group, but on my own I don't quite know where to go next. When I saw process hangs with 100% cpu use on what appeared to be an innocuous perl call, I knew this was a problem a bit deeper than I know how to handle without another day or several days of serious dedication. Pointers are most welcome!

Eric Hammond (esh) wrote :

This is now ringing a bell with me. I ran into very similar behavior with simple Perl calls spinning when building images on EC2 instances. It was back in 2007, so I'm having a hard time remembering what the problem or the solution was, but the answer may be somewhere in this code (which now builds images ok on EC2):

  http://code.google.com/p/ec2ubuntu/

I think it was some odd bug between Perl and the EC2 kernel/environment.

I'm pretty sure the problem showed up in debootstrap or code run by debootstrap. I still see the commented out debug line in the above script right before running debootstrap. What's odd, though, is that the script does almost nothing of significance before running debootstrap.

Cap Petschulat (cap-petschulat) wrote :

Thanks for the pointer! I found the fix in your script, and then with a bit more searching discovered it's already been reported and dismissed here:

https://bugs.launchpad.net/ubuntu/+source/vm-builder/+bug/293067

In that bug, you mention that it's fixed in the latest ubuntu kernel, but it's clearly not fixed when I'm running the latest alestic images on ec2. I'm a bit new to ec2; does this mean I can fix everything by manually choosing a newer kernel when starting the instance, or is there some further bug to fix?

Eric Hammond (esh) wrote :

Cap, are you saying that you are able to apply a fix to a running Ubuntu instance on EC2 and get vm-builder to work correctly for you (what AMI and what steps)? Or are you just assuming that #293067 probably relates because of the description?

Note that the images listed on http://alestic.com are running Amazon's older 2.6.21fc8 kernel and that they are generally not of interest to the team reading launchpad.net bugs, though I am personally very interested and appreciate bug reports directly to me or through http://groups.google.com/group/ec2ubuntu

Bugs you find running on the official Ubuntu beta images should definitely continue to be posted here or on the ec2-beta mailing list so that they can be investigated.

Are you still experiencing this particular problem on the official Ubuntu beta images?

Cap Petschulat (cap-petschulat) wrote :

I just re-tested with the official ubuntu beta; the result is still a segfault instead of a hang, but the libc fix works. Here's what I did, boiled down as simply as I can get it.

Spin up a small instance of ami-69d73000 (canonical-beta-us/ubuntu-intrepid-beta2-20090226-i386.manifest.xml), apt-get upgrade and install debootstrap.

$ mkdir img
$ sudo debootstrap hardy img
$ sudo touch img/etc/init.d/foo
$ sudo chroot img /usr/sbin/update-rc.d foo start 15 S .
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
 LANGUAGE = (unset),
 LC_ALL = (unset),
 LANG = "en_CA.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Segmentation fault

Add restricted, universe, and multiverse to img/etc/apt/sources.list

$ sudo chroot img apt-get update
$ sudo chroot img apt-get install libc6-xen

Create img/etc/ld.so.conf.d/libc6-xen.conf as in https://bugs.launchpad.net/ubuntu/+source/vm-builder/+bug/293067.

$ sudo chroot img apt-get remove libc6-686
$ sudo chroot img ldconfig

$ sudo chroot img /usr/sbin/update-rc.d foo start 15 S .
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
 LANGUAGE = (unset),
 LC_ALL = (unset),
 LANG = "en_CA.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
 Adding system startup for /etc/init.d/foo ...
   /etc/rcS.d/S15foo -> ../init.d/foo

---------

I've also added the libc6-xen kludge to an appropriate spot in my python-vm-builder so that it works now, too. Since I'm building vmw6 images, it seems a bit perverse to be installing the xen libc6, though I'm not yet sure if it's actively harmful. The same modified python-vm-builder works on the alestic images, as well.
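
For reference, the manual steps above can be collected into one function. This is a rough sketch, not the change made to python-vm-builder: apply_libc6_xen_fix and DRY_RUN are hypothetical names, the -y flags are added so apt-get does not prompt, and the contents of libc6-xen.conf are not reproduced here, so the caller supplies a prepared file (see bug #293067). It assumes the chroot's sources list already has the extra components enabled and that it runs as root. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Apply (or, by default, only print) the libc6-xen workaround inside a chroot.
# apply_libc6_xen_fix and DRY_RUN are hypothetical names for this sketch.
#   $1: chroot directory, e.g. img
#   $2: a prepared libc6-xen.conf (contents per bug #293067, not shown here)
apply_libc6_xen_fix() {
    root="$1"
    conf="$2"
    # Print the command in dry-run mode, otherwise execute it.
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }
    run chroot "$root" apt-get update &&
    run chroot "$root" apt-get -y install libc6-xen &&
    run cp "$conf" "$root/etc/ld.so.conf.d/libc6-xen.conf" &&
    run chroot "$root" apt-get -y remove libc6-686 &&
    run chroot "$root" ldconfig
}

# Example: review the commands first, then run them for real.
#   apply_libc6_xen_fix img ./libc6-xen.conf
#   DRY_RUN=0 apply_libc6_xen_fix img ./libc6-xen.conf
```

The steps are chained with && so that a failed install stops the sequence before libc6-686 is removed.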

Chuck Short (zulcss)
Changed in vm-builder (Ubuntu):
status: New → Triaged
importance: Undecided → Low

Soren Hansen (soren)
Changed in vm-builder (Ubuntu):
importance: Low → Wishlist