raring instance failed to find EC2 datasource
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mountall (Ubuntu) | Fix Released | High | Steve Langasek |
Precise | Fix Released | Undecided | Unassigned |
Quantal | Fix Released | High | Steve Langasek |
Raring | Fix Released | High | Steve Langasek |

Bug Description
[Impact]
The previous SRU, while fixing the problem it was intended to fix, partially reintroduced the problem from before 2.41 in which some filesystem events end up blocking on one another when they shouldn't. This is particularly noticeable for filesystems that have been mounted by the initramfs when jobs are started on the 'mounted' event for one of them. In the particular case of cloud-init, the jobs that start on mounted MOUNTPOINT=/ block waiting for the network to come up, which needs the 'virtual-
[Test case]
1. Boot the current quantal daily cloud images.
2. Confirm that at least sometimes, the images take 5 minutes to boot.
3. Upgrade mountall to the quantal-proposed version.
4. Confirm that the images now boot without hitting the "wait for network" timeout.
[Regression potential]
Minimal, as this is correcting a regression from the previous SRU.
This seems to be a sporadic failure at best. I had seen it fail similarly on OpenStack.
Today I launched 7 instances of us-east-1 t1.micro from each of:
ami-be70f5d7 ebs/ubuntu-
ami-de27a2b7 ebs/ubuntu-
ami-f6c94d9f ebs/ubuntu-
ami-cc8307a5 ebs/ubuntu-
ami-5c43c735 ebs/ubuntu-
and then another one of 20121109 and 20121113.
One of the two 20121109 instances failed to boot, and I will attach its console log and cloud-init.log. The log was obtained by stopping the instance, attaching it to another system, and copying the log off. Then I restarted the instance, and it booted fine.
Things that were of interest:
* from cloud-init's perspective, the thing that failed was finding the ec2 metadata service.
* In the failure case, there were no messages to the console log after initramfs, while
cloud-init's log shows it WARNing, which should go to console.
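To tell "the metadata service is unreachable" apart from "cloud-init gave up too early" on an affected instance, a bounded retry probe can help. The following is a minimal sketch, not cloud-init's own lookup logic: `probe_metadata` and its `opener` and `sleep` parameters are hypothetical names introduced here for illustration, and the retry counts are arbitrary. The only external fact assumed is the well-known link-local address of the EC2 metadata service.

```python
import time
import urllib.error
import urllib.request

# Well-known link-local address of the EC2 metadata service.
METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def probe_metadata(url=METADATA_URL, attempts=5, delay=2.0, timeout=5.0,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body as text, or None if every attempt fails.

    `opener` and `sleep` are hypothetical, illustration-only hooks so the
    function can be exercised without a network.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read().decode("utf-8", "replace")
        except (urllib.error.URLError, OSError) as exc:
            # Report each failure so the console shows what happened.
            print("attempt %d/%d failed: %s" % (attempt, attempts, exc))
            if attempt < attempts:
                sleep(delay)
    return None
```

Calling `probe_metadata()` from a shell on the instance returns the instance id when networking is up, and `None` after the last failed attempt when it is not.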
Related bugs:
* mountall: bug 1059471 2.41 fails to mount root partition
* plymouth: bug 1086072 some output to /dev/console does not reach /dev/console
ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: cloud-init 0.7.0-0ubuntu2
ProcVersionSign
Uname: Linux 3.5.0-17-generic x86_64
ApportVersion: 2.6.2-0ubuntu3
Architecture: amd64
Date: Wed Nov 14 21:30:05 2012
Ec2AMI: ami-be70f5d7
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: t1.micro
Ec2Kernel: aki-825ea7eb
Ec2Ramdisk: unavailable
MarkForUpload: True
PackageArchitec
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)
tags: added: patch
affects: ubuntu → mountall (Ubuntu)
Changed in mountall (Ubuntu Quantal):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Steve Langasek (vorlon)
description: updated
Saw this on t1.micro on raring-daily-amd64-server-20121119
us-east-1 ami-3f70f756 canonical ebs/ubuntu-
same symptoms
* no console output after initramfs
* could not ssh in, as the None datasource was selected
Here, a reboot fixed the issue.
I'll attach /var/log/boot.log which had some interesting information. Notably, it seems to show that networking did not come up.
The stuff here generally *should* have gone to /dev/console.
Also interesting was that networking after the reboot was not very reliable.
'apt-get install pastebinit' would work, but 'pastebinit /var/log/boot.log' did not work.
Also did not work:
ssh $hostname cat /var/log/boot.log > /tmp/boot.log
scp user@$hostname:
The ssh and scp connections would basically block.
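When collecting logs over a connection that may hang like this, it can help to wrap the transfer in a hard timeout so the hang is recorded rather than waited on. This is a rough sketch using only Python's standard `subprocess` module; `run_with_timeout` is a hypothetical helper name, and the timeout values are arbitrary.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv; return (exit_code, stdout_bytes), or (None, b"") on timeout."""
    try:
        # subprocess.run kills the child if the timeout expires.
        done = subprocess.run(argv, stdout=subprocess.PIPE, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None, b""
    return done.returncode, done.stdout
```

For example, `run_with_timeout(["ssh", "-o", "ConnectTimeout=10", hostname, "cat", "/var/log/boot.log"], 60)` would give up after a minute instead of blocking indefinitely, where `hostname` stands for the affected instance.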