job with "mounted MOUNTPOINT=/ and net-device-up IFACE=eth0" blocks boot

Bug #504883 reported by Scott Moser on 2010-01-08
This bug affects 2 people
Affects                Importance   Assigned to
cloud-init (Ubuntu)    High         Scott Moser
  Lucid                High         Scott Moser
udev (Ubuntu)          Medium       Scott James Remnant (Canonical)
  Lucid                Medium       Scott James Remnant (Canonical)
upstart (Ubuntu)       Medium       Scott James Remnant (Canonical)
  Lucid                Medium       Scott James Remnant (Canonical)

Bug Description

Binary package hint: upstart

If I boot an EC2 instance and then reboot it, it comes up fine.

If I add the following job, and reboot, the system will not come back up.

###/etc/init/early-task.conf
start on (mounted MOUNTPOINT=/ and net-device-up IFACE=eth0)

task
console output
script
echo ========== BEGIN ${UPSTART_JOB}: $(date) ====================
echo "hello world\n"
echo ========== END ${UPSTART_JOB}: $(date) ====================
end script
###

To further debug, I've enabled more verbosity in init with
###/etc/init/init-debug.conf
description "debug me"
start on startup
task
exec initctl log-priority debug
###

This appears to be ec2-specific, as I cannot reproduce it using the same disk image booted under kvm.
Differences between kvm and ec2 boot are:
a.) kvm and ec2 use different kernels/ramdisks
b.) ec2 boot has karmic initramdisk (to work around bug 503212)

ProblemType: Bug
Architecture: i386
Date: Fri Jan 8 18:00:41 2010
DistroRelease: Ubuntu 10.04
Ec2AMI: ami-a0c32ec9
Ec2AMIManifest: ubuntu-images-testing-us/ubuntu-lucid-daily-i386-server-20100107.manifest.xml
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.small
Ec2Kernel: aki-60cb2609
Ec2Ramdisk: ari-06c22f6f
Package: upstart 0.6.3-11
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-301.4-ec2
SourcePackage: upstart
Tags: lucid ec2-images
Uname: Linux 2.6.32-301-ec2 i686

Scott Moser (smoser) wrote :

console log from clean successful reboot before adding the described job.

Scott Moser (smoser) wrote :

attaching the console log of the boot after enabling the job

Scott Moser (smoser) wrote :

For reference, a successful boot console from a kvm boot of this filesystem.

Scott Moser (smoser) wrote :

In case you're wondering why I'm doing such a silly thing, it's part of ec2 boot hooks [https://blueprints.edge.launchpad.net/ubuntu/+spec/server-lucid-ec2-boothooks]. The primary goal is to run as early as possible with write access to root and to block as many other things as possible, so that this init script can modify their behavior.

Changed in upstart (Ubuntu):
importance: Undecided → Medium
Scott Moser (smoser) wrote :

See attachment for usage. This shows how to reproduce this bug.
Verified on the 20100129 build.

Steve Langasek (vorlon) wrote :

This turns out to be a bug in udev. udev_monitor_receive_device()'s call to recvmsg is returning ENOBUFS, apparently because the udev upstream code has changed such that only udevd can tweak the buffer size, resulting in upstart-udev-bridge missing some of the events. One of the events it seems to miss regularly on EC2 is the net-device-up IFACE=eth0 one. Needs fixed in udev, because upstart-udev-bridge doesn't even have the option of tweaking the buffer size because it's in an opaque struct and the setter isn't exported in libudev.so; but I've verified locally that bumping up the buffer size does get us this event reliably.
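
To make the buffer-size point concrete, here is a minimal Python sketch of enlarging a socket's receive buffer. It is not the udev or upstart code: the helper name `grow_receive_buffer`, the use of an ordinary UDP socket as a stand-in for the netlink uevent socket, and the 1 MiB request are all assumptions for illustration. The idea is the same as in the comment above: a larger kernel-side buffer gives a slow reader more room before messages are dropped and the receive call fails with ENOBUFS.

```python
import socket


def grow_receive_buffer(sock, requested_bytes):
    """Ask the kernel for a larger receive buffer; return (before, after) sizes."""
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # The kernel may clamp the request (on Linux, to net.core.rmem_max),
    # so the size actually granted is read back rather than assumed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after


if __name__ == "__main__":
    # Hypothetical stand-in socket; the real bridge listens on a netlink socket.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        before, after = grow_receive_buffer(s, 1024 * 1024)
        print(f"receive buffer: {before} -> {after} bytes")
```

Reading the value back matters because the kernel is free to grant less than was asked for, which is one reason a client may need a dedicated setter with elevated privileges rather than a plain socket option.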

affects: upstart (Ubuntu) → udev (Ubuntu)
Changed in udev (Ubuntu):
status: New → Triaged
Steve Langasek (vorlon) wrote :

Scott mentions he also believes this is the root cause of some missing events that have been causing problems for gdm...

Steve Langasek (vorlon) on 2010-02-10
Changed in udev (Ubuntu Lucid):
assignee: nobody → Scott James Remnant (scott)
Thierry Carrez (ttx) wrote :

Targeting to alpha3 since this blocks completion of the boothooks spec (High priority).

Changed in udev (Ubuntu Lucid):
milestone: none → lucid-alpha-3
Scott Moser (smoser) wrote :

Current status of this bug:
Summary:
 Due to this bug, cloud-init has to run later in the boot process than we would like. Ideally it would run on the stated conditions and block other upstart jobs from running, which allows the most modification of initial boot without restarting jobs. Currently, as a workaround, the cloud-init process runs much later.
 When this bug is fixed and cloud-init runs earlier in the boot process, there could quite likely be fallout because its environment will differ from the one it currently runs in (i.e., which filesystems are mounted and such). Because of that, I'd like to see this bug fixed sooner rather than later.

Notes:
 - Steve debugged this and found it is an issue with udev. Events are being lost, so upstart never receives them and the stated 'start on' condition never occurs.
 - Steve mentioned that Keybuk is working on finding a solution with upstream udev.
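
The effect of a lost event on an "and" condition can be sketched in a few lines of Python. This is a simplified model, not upstart's implementation: the class `AndCondition`, the function `would_start`, and the representation of events as plain strings are assumptions for illustration only.

```python
class AndCondition:
    """Becomes true only once every required event has been seen."""

    def __init__(self, *required):
        self.required = set(required)
        self.seen = set()

    def emit(self, event):
        # Record the event if it is one we are waiting for,
        # then report whether the whole condition is now satisfied.
        if event in self.required:
            self.seen.add(event)
        return self.seen == self.required


def would_start(delivered_events):
    """Return True if the job's 'start on' condition is ever satisfied."""
    cond = AndCondition("mounted MOUNTPOINT=/", "net-device-up IFACE=eth0")
    started = False
    for event in delivered_events:
        if cond.emit(event):
            started = True
    return started


if __name__ == "__main__":
    # Both events delivered: the job starts.
    print(would_start(["mounted MOUNTPOINT=/", "net-device-up IFACE=eth0"]))
    # The net-device-up event is dropped before delivery: the job never starts.
    print(would_start(["mounted MOUNTPOINT=/"]))
```

In this model nothing retries or times out, so a single dropped event leaves the condition permanently unsatisfied, which matches the behaviour described in the notes above.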

Changed in upstart (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → Medium
milestone: none → lucid-alpha-3
assignee: nobody → Scott James Remnant (scott)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package udev - 151-5

---------------
udev (151-5) lucid; urgency=low

  * Merge from GIT HEAD:
    - Force key release for volume keys on Dell Studio 1557.
    - keymap: Add Toshiba Satellite M30X. LP: #510019.
    - libudev: export udev_monitor_set_receive_buffer_size()
    - udevadm monitor: increase netlink buffer size
      (above two related to LP: #504883)
 -- Scott James Remnant <email address hidden> Wed, 17 Feb 2010 15:47:18 +0000

Changed in udev (Ubuntu Lucid):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.5-2

---------------
upstart (0.6.5-2) lucid; urgency=low

  * udev/upstart-udev-bridge.c:
    - Increase receiving buffer size for uevents so we don't miss any.
      LP: #504883.
 -- Scott James Remnant <email address hidden> Wed, 17 Feb 2010 15:50:40 +0000

Changed in upstart (Ubuntu Lucid):
status: Triaged → Fix Released
Scott Moser (smoser) wrote :

Tried to enable running on the specified "start on", but ran into bug 524484.
With that fixed in a new upstart build, the system now appears to come up, but ssh connections are refused.

I believe Steve said he was also seeing this in his tests.

Scott Moser (smoser) on 2010-02-19
Changed in cloud-init (Ubuntu Lucid):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Scott Moser (smoser)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.5.7-0ubuntu1

---------------
cloud-init (0.5.7-0ubuntu1) lucid; urgency=low

  * New upstream release.
  * run cloud-init early in boot process (LP: #504883, #524516)

cloud-init (0.5.6-0ubuntu1) lucid; urgency=low

  * New upstream release.
  * supports 'runcmd' in cloud-config
  * enable the update check code (LP: #524258)
  * fix retry_url in boto_utils.py when metadata service not around
    (LP: #523832)
  * run cloud-config-puppet.conf later (LP: #523625)
  [ Scott Moser 0.5.5 ]
  * New upstream release, supports checking for updates
 -- Scott Moser <email address hidden> Fri, 19 Feb 2010 18:27:45 -0500

Changed in cloud-init (Ubuntu Lucid):
status: In Progress → Fix Released