Ubuntu

cloud-init.conf never runs, instance not reachable via ssh

Reported by Scott Moser on 2011-02-02
This bug affects 2 people
Affects           Importance   Assigned to
linux (Ubuntu)    High         Unassigned
  Natty           High         Unassigned
udev (Ubuntu)     High         Ubuntu Server Team
  Natty           High         Ubuntu Server Team

Bug Description

Binary package hint: udev

In natty alpha-2 EC2 testing, I found several instances unreachable via ssh, that were "fixed" with a reboot.

I launched 182 instances across 4 regions. 87 of those were i386 instances. 7 exhibited this behavior.
All 7 that showed the error were i386 and m1.small. So, it's fairly rare.
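
To put "fairly rare" into numbers, here is a small sketch that computes the failure rates from the counts above. The variable names are only illustrative; the counts are the ones given in this report.

```python
# Counts taken from the report above.
total_instances = 182
i386_instances = 87
failed_instances = 7  # all of them i386 and m1.small

# Failure rate across every launched instance, and across i386 only.
overall_rate = failed_instances / total_instances
i386_rate = failed_instances / i386_instances

print(f"overall:   {overall_rate:.1%}")  # overall:   3.8%
print(f"i386 only: {i386_rate:.1%}")     # i386 only: 8.0%
```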

Of the 182 instances, only the 7 that failed had lines like this in their console log:

| udevd[191]: bind failed: Address already in use
| udevd[191]: error binding control socket, seems udevd is already running

(bug 712034 is related, covering the error messages)
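
For anyone wanting to check their own console output for the same symptom, a rough sketch of a scanner follows. It only looks for the two udevd messages quoted above; the function name and the sample text are made up for illustration, and the optional "udev " in the pattern is an assumption to tolerate small wording differences between udev versions.

```python
import re
from typing import List

# Matches the two udevd messages quoted above. The optional "udev " is an
# assumption, allowing for minor wording differences between udev versions.
UDEVD_BIND_FAILURE = re.compile(
    r"udevd\[\d+\]: (?:bind failed: Address already in use"
    r"|error binding (?:udev )?control socket)"
)


def find_udevd_bind_failures(console_log: str) -> List[str]:
    """Return the console log lines that report a udevd bind failure."""
    return [
        line.strip()
        for line in console_log.splitlines()
        if UDEVD_BIND_FAILURE.search(line)
    ]


# Illustrative sample: one unrelated line plus the two messages from this report.
sample = """\
eth0: no IPv6 routers present
udevd[191]: bind failed: Address already in use
udevd[191]: error binding control socket, seems udevd is already running
"""

for hit in find_udevd_bind_failures(sample):
    print(hit)
```

An empty result does not prove the instance is healthy; it only means these particular messages were not logged.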

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: udev 165-0ubuntu2
ProcVersionSignature: User Name 2.6.38-1.28-virtual 2.6.38-rc2
Uname: Linux 2.6.38-1-virtual i686
Architecture: i386
CurrentDmesg: [ 13.636015] eth0: no IPv6 routers present
Date: Wed Feb 2 17:40:10 2011
Ec2AMI: ami-c416e6ad
Ec2AMIManifest: ubuntu-images-testing-us/ubuntu-natty-daily-i386-server-20110202.manifest.xml
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.small
Ec2Kernel: aki-407d9529
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 LC_MESSAGES=en_US.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: root=LABEL=uec-rootfs ro console=hvc0
ProcModules: acpiphp 23425 0 - Live 0xedc10000
SourcePackage: udev

Scott Moser (smoser) wrote :

I should have noted that cloud-init.conf starts on:
  start on (mounted MOUNTPOINT=/ and net-device-up IFACE=eth0 and \
      stopped cloud-init-local )

cloud-init-local had already run in all cases. It starts on:
  start on mounted MOUNTPOINT=/
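
To illustrate why cloud-init.conf can end up never running, here is a deliberately simplified model of an "A and B and C" start condition: the job starts only once every listed event has been seen. This is a sketch, not Upstart's actual implementation, and the "failing" boot below is a hypothetical case in which net-device-up for eth0 is never emitted.

```python
# Events required by cloud-init.conf's "start on" stanza, as quoted above.
CLOUD_INIT_START_ON = {
    "mounted MOUNTPOINT=/",
    "net-device-up IFACE=eth0",
    "stopped cloud-init-local",
}


def would_start(required_events, emitted_events):
    """Return True if every required event appears among the emitted ones."""
    return set(required_events) <= set(emitted_events)


# Healthy boot: all three events are emitted (order does not matter here).
healthy = [
    "mounted MOUNTPOINT=/",
    "stopped cloud-init-local",
    "net-device-up IFACE=eth0",
]

# Hypothetical failing boot: eth0 never comes up, so the net-device-up
# event is missing and the "and" condition can never become true.
failing = [
    "mounted MOUNTPOINT=/",
    "stopped cloud-init-local",
]

print(would_start(CLOUD_INIT_START_ON, healthy))  # True
print(would_start(CLOUD_INIT_START_ON, failing))  # False
```

Under this model, a single missing event is enough to keep the job from ever starting, which would match an instance that stays unreachable until it is rebooted.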

tags: added: iso-testing
Changed in udev (Ubuntu Natty):
assignee: nobody → Canonical Server Team (canonical-server)
Dave Walker (davewalker) on 2011-03-02
tags: added: server-nrs
Scott Moser (smoser) wrote :

I hit this 3 times in alpha-3 testing for natty. Again, all i386 instances.

Changed in udev (Ubuntu Natty):
importance: Undecided → High
milestone: none → ubuntu-11.04-beta-1
status: New → Confirmed
Scott Moser (smoser) on 2011-03-08
description: updated
Martin Pitt (pitti) on 2011-03-31
Changed in udev (Ubuntu Natty):
milestone: ubuntu-11.04-beta-1 → ubuntu-11.04-beta-2
Scott Moser (smoser) wrote :

We're *hoping* this is related to bug 731878.

Andy Whitcroft (apw) wrote :

@scott -- as the reference bug is now Fix Released perhaps you could re-test and confirm.

Changed in udev (Ubuntu Natty):
assignee: Canonical Server Team (canonical-server) → Ubuntu Server Team (ubuntu-server)
James Page (james-page) wrote :

I ran several iterations of multiple instance testing across three regions over the last couple of days (see [0]); all instances started up first time which would indicate that this issue is resolved.

Beta-2 candidate testing (see [1]) will complete further instance testing so suggest that we review again at the end of today.

[0] http://tinyurl.com/5v44lwh
[1] http://tinyurl.com/5rwh5sw

Dave Walker (davewalker) wrote :

Tentatively marking Fix Released based on the previous comment, and on previous considerations that it may have been an infrastructure issue.

Changed in udev (Ubuntu Natty):
status: Confirmed → Fix Released
Scott Moser (smoser) wrote :

I'm tagging this as 'Affects' linux because that is where the bug/fix actually was. We're very close to certain that this is really just fallout of bug 731878.

Changed in linux (Ubuntu Natty):
importance: Undecided → High
milestone: none → ubuntu-11.04-beta-2
status: New → Fix Released
Scott Moser (smoser) wrote :

So, I marked this as fix released, and most definitely we're seeing it less.

However, we *did* see it once in today's beta2 testing. I'll get console log and attach later.

Scott Moser (smoser) wrote :

Attached is console of failed natty beta2 test.

Scott Moser (smoser) wrote :

Previously I attached the wrong console log. Here is the correct console log for the natty beta2 failure. Note, we see:

| Begin: Running /scripts/local-bottom ... done.
| done.
| Begin: Running /scripts/init-bottom ... done.
| udevd-work[156]: open /dev/null failed: No such file or directory
| udevd-work[159]: open /dev/null failed: No such file or directory
| udevd-work[158]: open /dev/null failed: No such file or directory
| lxcmount stop/pre-start, process 174
| udevd[220]: bind failed: Address already in use
| udevd[220]: error binding udev control socket
| init: udev main process (220) terminated with status 1
| init: udev main process ended, respawning
| cloud-init start-local running: Thu, 14 Apr 2011 11:19:24 +0000. up 1.32 seconds
| no instance data found in start-local
| init: cloud-init-local main process (243) terminated with status 1
| cloud-init-nonet waiting 60 seconds for a network device.
| cloud-init-nonet gave up waiting for a network device.

Scott Moser (smoser) wrote :

I'm attaching a similar failure in oneiric. It was fixed with reboot.
