20100222 images fail to boot in UEC (HTTP error 500 retrieving ephemeral0 metadata)

Bug #525675 reported by Thierry Carrez on 2010-02-22
This bug affects 1 person
Affects                 Status         Importance   Assigned to         Milestone
Eucalyptus              Fix Released   Undecided    chris grzegorczyk
cloud-init (Ubuntu)     Invalid        Low          Scott Moser
  Lucid                 Invalid        Low          Scott Moser
eucalyptus (Ubuntu)     Fix Released   High         Dustin Kirkland
  Lucid                 Fix Released   High         Dustin Kirkland
python-boto (Ubuntu)    Invalid        Wishlist     Unassigned
  Lucid                 Won't Fix      Wishlist     Unassigned

Bug Description

Binary package hint: cloud-init

Starting a 20100222 lucid cloud image on UEC, the instance boots, IP is up, but SSH is never started.
Got the following errors in euca-get-console-output:

FATAL: Could not load /lib/modules/2.6.32-14-server/modules.dep: No such file or directory
[ 4.525542] kjournald starting. Commit interval 5 seconds
[ 4.527080] EXT3-fs: mounted filesystem with ordered data mode.
Begin: Running /scripts/local-bottom ...
Done.
Done.
Begin: Running /scripts/init-bottom ...
Begin: Starting AppArmor profiles ...
chroot: cannot execute /etc/apparmor/initramfs: No such file or directory
Failure: AppArmor profiles failed to load
Done.
Caught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance dataCaught exception reading instance data

Might be a problem in eucalyptus rather than in the cloud image (metadata service not responding ?)

Thierry Carrez (ttx) wrote :

Reinstalled my UEC setup to today's archive:
- Karmic image works alright
- Lucid image still fails

So I suppose it's an issue in the image rather than in Eucalyptus.

Changed in cloud-init (Ubuntu):
assignee: nobody → Scott Moser (smoser)
importance: Undecided → High
milestone: none → lucid-alpha-3
status: New → Confirmed
Scott Moser (smoser) wrote :

It would seem that your metadata service is not up for some reason.

However, one thing to note here is that you seem to be trying to boot with an initramfs, but there is no initramfs in this build.

Thierry Carrez (ttx) wrote :

About the metadata service:
The metadata service seems to work well for the karmic cloud image ?

About the boot with initramfs:
The image was registered through uec-register-tarball, is there anything specific to do to avoid booting with an initramfs ?

Scott Moser (smoser) wrote :

I'm fairly sure I know what is wrong, or at least suspect something.

cloudinit/DataSourceEc2.py:
class DataSourceEc2(DataSource.DataSource):
    api_ver = '2009-04-04'

Whereas in ec2init (in karmic):
./ec2init/__init__.py:
class EC2Init:
    api_ver = '2008-02-01'

The result of the above is that lucid requests the newer version of the data service, which I don't think eucalyptus is making available.

You can verify this in your karmic instance with:

python -c 'import boto.utils; print(boto.utils.get_instance_userdata("2009-02-01"))'

I think that will work, but if you use the newer version (2009-04-04) I think it will fail.

Thierry Carrez (ttx) wrote :

In karmic instance:
boto.utils.get_instance_userdata("2009-02-01") --> returns ''
boto.utils.get_instance_userdata("2009-04-04") --> returns ''
boto.utils.get_instance_metadata("2009-02-01") --> returns a full dict
boto.utils.get_instance_metadata("2009-04-04") --> returns the same full dict

In karmic instance with boto 1.9:
boto.utils.get_instance_userdata("2009-02-01") --> returns ''
boto.utils.get_instance_userdata("2009-04-04") --> returns ''
boto.utils.get_instance_metadata("2009-02-01") --> hangs
boto.utils.get_instance_metadata("2009-04-04") --> hangs

I traced it back to the metadata enumeration. With boto 1.9b:

http://169.254.169.254/latest/meta-data/block-device-mapping/ returns:
'emi\nephemeral0\nroot\nswap'
http://169.254.169.254/latest/meta-data/block-device-mapping/emi returns:
sda1
http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0 returns error code 500.

So retry_url loops while trying to get http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0
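
The enumeration traced above can be sketched as follows. This is only an illustration, not boto's actual code: walk_metadata, fake_fetch and the fetch callable are hypothetical names, and the fake service simply mimics the responses quoted in this comment (the root and swap values are assumed). Instead of retrying forever, the sketch records the keys that fail.

```python
# Hypothetical sketch: walk a metadata listing and record keys that fail,
# instead of retrying forever. This is not boto's implementation.

def walk_metadata(fetch, prefix=""):
    """Return (values, failures) for every key listed under prefix.

    fetch(path) is assumed to return the response body as a string, or to
    raise IOError when the service answers with an error such as HTTP 500.
    """
    values, failures = {}, {}
    for name in fetch(prefix).split("\n"):
        if not name:
            continue
        path = prefix + name
        if name.endswith("/"):
            # A trailing slash marks a sub-listing: recurse into it.
            sub_values, sub_failures = walk_metadata(fetch, path)
            values.update(sub_values)
            failures.update(sub_failures)
            continue
        try:
            values[path] = fetch(path)
        except IOError as exc:
            failures[path] = str(exc)
    return values, failures


def fake_fetch(path):
    """Stand-in for the metadata service, mimicking the responses above."""
    responses = {
        "block-device-mapping/": "emi\nephemeral0\nroot\nswap",
        "block-device-mapping/emi": "sda1",
        "block-device-mapping/root": "/dev/sda1",  # assumed value
        "block-device-mapping/swap": "sda3",       # assumed value
    }
    if path not in responses:
        # The listing advertises ephemeral0, but nothing serves it.
        raise IOError("HTTP 500 for " + path)
    return responses[path]


values, failures = walk_metadata(fake_fetch, "block-device-mapping/")
print(sorted(failures))
```

With the fake service, only block-device-mapping/ephemeral0 ends up in failures, which matches the key that retry_url keeps looping on.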

summary: - 20100222 images fail to boot in UEC
+ 20100222 images fail to boot in UEC (no ephemeral0 metadata)
Thierry Carrez (ttx) on 2010-02-22
Changed in eucalyptus (Ubuntu Lucid):
assignee: nobody → Dustin Kirkland (kirkland)
importance: Undecided → High
milestone: none → lucid-alpha-3
status: New → Confirmed
Thierry Carrez (ttx) wrote :

Right, boto 1.9 implemented recursive metadata retrieval in get_instance_metadata (in boto/utils.py). That makes it fail if Eucalyptus lists a key (ephemeral0) that it cannot serve when queried (error 500).

summary: - 20100222 images fail to boot in UEC (no ephemeral0 metadata)
+ 20100222 images fail to boot in UEC (HTTP error 500 retrieving
+ ephemeral0 metadata)
Changed in cloud-init (Ubuntu Lucid):
importance: High → Low
milestone: lucid-alpha-3 → none
status: Confirmed → Invalid

On Mon, 22 Feb 2010, Thierry Carrez wrote:

> The image was registered through uec-register-tarball, is there anything specific to do to avoid booting with an initramfs ?
>

Does euca-describe-images show an eri?
If so, then uec-register-tarball is buggy.

Dustin Kirkland  (kirkland) wrote :

Adding a wishlist priority task against boto; it could fail more gracefully with a more informative message.
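
One way such graceful failure could look is a bounded retry that names the failing key when it gives up. This is a sketch under assumptions, not a patch against boto: fetch_with_retries, MetadataUnavailable and the fetch callable are hypothetical names, and the attempt count is arbitrary.

```python
import time


class MetadataUnavailable(Exception):
    """Raised when a metadata key cannot be read after all attempts."""


def fetch_with_retries(fetch, path, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(path) up to `attempts` times, then fail with a clear message.

    fetch is assumed to raise IOError on an HTTP error such as 500.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(path)
        except IOError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    raise MetadataUnavailable(
        "giving up on %r after %d attempts: %s" % (path, attempts, last_error)
    )


calls = []


def always_500(path):
    """Stand-in for a service that always answers with HTTP 500."""
    calls.append(path)
    raise IOError("HTTP Error 500")


try:
    fetch_with_retries(always_500, "block-device-mapping/ephemeral0", delay=0)
except MetadataUnavailable as exc:
    message = str(exc)
    print(message)
```

The point of the design is that the caller gets an exception naming the key and the last error, rather than an instance that silently never finishes booting.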

Changed in python-boto (Ubuntu Lucid):
status: New → Confirmed
importance: Undecided → Wishlist
Thierry Carrez (ttx) wrote :

I think I nailed it:
in clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java

    m.put( "block-device-mapping/", "emi\nephemeral0\nroot\nswap" );
    m.put( "block-device-mapping/emi", "sda1" );
    m.put( "block-device-mapping/ami", "sda1" );
    m.put( "block-device-mapping/ephemeral", "sda2" );

That last line should probably read
    m.put( "block-device-mapping/ephemeral0", "sda2" );

Regression introduced in r906:
- m.put( "block-device-mapping/", "emi\nephemeral\nroot\nswap" );
+ m.put( "block-device-mapping/", "emi\nephemeral0\nroot\nswap" );
without changing the ephemeral reference below.
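
The mismatch can be checked mechanically. The sketch below models the Java map as a Python dict and reports every key that a listing advertises but no entry serves. unserved_keys is a hypothetical helper, and the root and swap entries are assumed (they are not among the lines quoted above, but the service did not return an error for them).

```python
def unserved_keys(metadata):
    """Return listed child keys that have no entry of their own."""
    missing = []
    for key, listing in metadata.items():
        if not key.endswith("/"):
            continue  # only keys ending in "/" are listings
        for child in listing.split("\n"):
            if child and key + child not in metadata:
                missing.append(key + child)
    return sorted(missing)


# Entries as quoted from VmInstance.java after r906.
after_r906 = {
    "block-device-mapping/": "emi\nephemeral0\nroot\nswap",
    "block-device-mapping/emi": "sda1",
    "block-device-mapping/ami": "sda1",
    "block-device-mapping/ephemeral": "sda2",
    "block-device-mapping/root": "/dev/sda1",  # assumed entry
    "block-device-mapping/swap": "sda3",       # assumed entry
}

# Proposed fix: rename the "ephemeral" entry to "ephemeral0".
fixed = dict(after_r906)
fixed["block-device-mapping/ephemeral0"] = fixed.pop("block-device-mapping/ephemeral")

print(unserved_keys(after_r906))
print(unserved_keys(fixed))
```

Before the fix the check reports block-device-mapping/ephemeral0; after the rename it reports nothing.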

Changed in eucalyptus (Ubuntu Lucid):
status: Confirmed → Triaged
Thierry Carrez (ttx) wrote :

@Scott: no eri reference for the one without ramdisk, so it's ok.

Thierry Carrez (ttx) wrote :

For reference, regression introduced by an incomplete fix for bug 513842

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu3

---------------
eucalyptus (1.6.2-0ubuntu3) lucid; urgency=low

  [ Thierry Carrez ]
  * clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java:
    fix incomplete ephemeral block device mapping path, LP: #525675
 -- Dustin Kirkland <email address hidden> Mon, 22 Feb 2010 14:09:26 -0600

Changed in eucalyptus (Ubuntu Lucid):
status: Triaged → Fix Released
Dustin Kirkland  (kirkland) wrote :

Okay, I rolled a package with Thierry's fix. I bundled and uploaded today's Lucid image, ran that instance, and I was able to successfully ssh into it.

Note, I did file a couple of new bugs, notably that I had to manually register the image, rather than using uec-publish*
 * Bug #525989

And there are still a few errors in console-out:
 * Bug #525994

Dustin Kirkland  (kirkland) wrote :

Dan/upstream Eucalyptus-

I have a minor fix for this bug that I'd like for you to review and merge into your 1.6.2 branch.

See:
  lp:~kirkland/eucalyptus/525675

chris grzegorczyk (chris-grze) wrote :

Hi Dustin,

This was fixed in the below revno. Closing the bug against Eucalyptus.

------------------------------------------------------------
    revno: 1185.1.1
    committer: decker <decker@hawaii>
    branch nick: metadata
    timestamp: Tue 2010-02-09 17:27:01 -0800
    message:
      ephemeral -> ephemeral0 LP:#513842
------------------------------------------------------------

Changed in eucalyptus:
status: New → Fix Released
assignee: nobody → chris grzegorczyk (chris-grze)

Chris-

Almost, but not quite ... Looks to me like this commit fixed one
ephemeral reference, but missed the other... Can you confirm/deny,
Chris?

bzr diff -r 1185..1185.1.1

=== modified file 'clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java'
--- clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java 2010-02-05 12:04:49 +0000
+++ clc/modules/cluster-manager/src/main/java/edu/ucsb/eucalyptus/cloud/cluster/VmInstance.java 2010-02-10 01:27:01 +0000
@@ -342,7 +342,7 @@
     m.put( "ramdisk-id", this.getImageInfo( ).getRamdiskId( ) );
     m.put( "security-groups", this.getNetworkNames( ).toString( ).replaceAll( "[\\Q[]\\E]", "" ).replaceAll( ", ", "\n" ) );
 
-    m.put( "block-device-mapping/", "emi\nephemeral\nroot\nswap" );
+    m.put( "block-device-mapping/", "emi\nephemeral0\nroot\nswap" );
     m.put( "block-device-mapping/emi", "sda1" );
     m.put( "block-device-mapping/ami", "sda1" );
     m.put( "block-device-mapping/ephemeral", "sda2" );

chris grzegorczyk (chris-grze) wrote :

Indeed. Not sure how that slipped by. Thanks for catching it. See r1200.

Dustin Kirkland  (kirkland) wrote :

Erg, I think I'm hitting this again on an Alpha3 eucalyptus 1.6.2-0ubuntu4 installation.

Funny thing is that I did not hit it in topo1 (CLC+WC+CC+SC, NC), but I did hit it in topo3 (CLC+WC, CC+SC, NC).

I'm still investigating :-/

Thierry Carrez (ttx) wrote :

@Dustin: make sure it's the same bug. I'd rather think it's a global issue for the instance to access metadata in that topology, rather than specifically an issue with the "ephemeral0" key, and that would make it a new bug. Checking cloud-error.log should help.

Dustin Kirkland  (kirkland) wrote :

Upon further inspection, it is definitely a different bug.

I have not actually seen this particular bug resurface. Only similar
symptoms. Thanks.

Thierry Carrez (ttx) on 2010-03-05
Changed in python-boto (Ubuntu Lucid):
status: Confirmed → Won't Fix
Scott Moser (smoser) on 2014-07-29
Changed in python-boto (Ubuntu):
status: Confirmed → Invalid